Chitika

Showing posts with label Datastage. Show all posts
Showing posts with label Datastage. Show all posts

Monday, 12 March 2012

Performance Tuning techniques in datastage

Apply the filter conditions at source level to avoid unnecessary data flow in the job design.
Extract the only required columns from source.
Create large no.of datastage jobs with less no.of stages rather then less no.of jobs with more no.of stages.
Use dataset stage to store temporary data.
Avoid using the transformer stage as much as possible because it needs to have additional C++ compiler to execute the ETL Program.
Insted of using Transformer stage use copy stage to drop the unwanted columns or to send it to multiple stages.
use filter stage to apply conditions.
use surrogate key generator stage to generate sequence number.
prefer database sorts than data stage sorts.
use transformer stage to avoid aggregator stage.

Saturday, 3 March 2012

Some useful tips of how to debug parallel jobs in datastage.



Enable the following environment variables in DataStage Administrator:
* APT_PM_PLAYER_TIMING – shows how much CPU time each stage uses
* APT_PM_SHOW_PIDS – show process ID of each stage
* APT_RECORD_COUNTS – shows record counts in log
* APT_CONFIG_FILE – switch configuration file (one node, multiple nodes)
* OSH_DUMP – shows OSH code for your job. Shows if any unexpected settings were set by the GUI.
* APT_DUMP_SCORE – shows all processes and inserted operators in your job
* APT_DISABLE_COMBINATION – do not combine multiple stages in to one process. Disabling this will make it easier to see where your errors are occurring.
Use a Copy stage to dump out data to intermediate peek stages or sequential debug files. Copy stages get removed during compile time so they do not increase overhead.
Use row generator stage to generate sample data.
Look at the phantom files for additional error messages: c:\datastage\project_folder\&PH&
To catch partitioning problems, run your job with a single node configuration file and compare the output with your multi-node run. You can just look at the file size, or sort the data for a more detailed comparison (Unix sort + diff commands).

What are the Main Features in DataStage


DataStage has the following features to aid the design and processing required to
build a data warehouse:
• Uses graphical design tools. With simple point-and-click techniques you
can draw a scheme to represent your processing requirements.
• Extracts data from any number or types of database.
• Handles all the meta data definitions required to define your data warehouse. You can view and modify the table definitions at any point during
the design of your application.
• Aggregates data. You can modify SQL SELECT statements used to extract
data.
• Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend the functionality by defining your own transforms to use.
• Loads the data warehouse

Wednesday, 15 February 2012

Differences between Filter & Switch

Filter:
  • It receives the data from one input link and send it to any no.of out put links.
  • It filters the records based on one or more where clause conditions.
  • Here every incoming record checks all the where clause conditions. So one input record goes to multiple output links.
  • Fallowing keywords are valid in defining where clause conditions  =,<,>,<>,<=,>=,AND,OR,NOT,BETWEEN
  • Output Reject = True or False.
Switch:   
  • It receives the data from one input link and send up to 128 output links.
  • It filters the records based on one or more case conditions.
  • Here every incoming record goes from one case condition only.
  • It supports only for equality operator.
  • If Not Found =Drop / Fail / Output.

Tuesday, 14 February 2012

Difference between Lookup, Join and Merge

Lookup stage supports:
  • One stream input link
  • 'n' reference links(n=1 to ...)
  • one output link
  • one optional reject link
Join stage supports:
  • one Left set 
  • one Right set
  • 'n' intermediate links(n=0 to ....)
  • one output link
  • no reject links
Merge stage supports:
  • one Master link
  • n-update links(n=1 to ...)
  • m-optional Reject links(m=0 to n)
  • one output link
In Lookup we can apply Conditions. But in Join & Merge not possible to apply Conditions
In Lookup joining columns may not be with same name. But in Join & Merge Joining columns must be with same name