Chitika

Monday 2 April 2012

Scenario13

The source has two fields:

COMPANY             LOCATION
IBM                  HYD
TCS                  BAN
IBM                  CHE
HCL                  HYD
TCS                  CHE
IBM                  BAN
HCL                  BAN
HCL                  CHE

The output should look like this:

Company  Loc  Count

TCS      HYD  3
         BAN
         CHE
IBM      HYD  3
         BAN
         CHE
HCL      HYD  3
         BAN
         CHE
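A minimal Python sketch of the grouping logic (in DataStage this would typically be a Sort on company followed by a Transformer using stage variables). Note that the sketch computes each count from the sample input, in which TCS actually appears in only two locations:

```python
def group_locations(rows):
    """Group locations under each company and show the company name and
    location count on the first row of each group only -- the kind of
    output a Sort + Transformer (stage variables) design produces."""
    groups = {}
    for company, loc in rows:
        groups.setdefault(company, []).append(loc)  # insertion-ordered (3.7+)
    out = []
    for company, locs in groups.items():
        for i, loc in enumerate(locs):
            out.append((company if i == 0 else "",
                        loc,
                        len(locs) if i == 0 else ""))
    return out

rows = [("IBM", "HYD"), ("TCS", "BAN"), ("IBM", "CHE"), ("HCL", "HYD"),
        ("TCS", "CHE"), ("IBM", "BAN"), ("HCL", "BAN"), ("HCL", "CHE")]
grouped = group_locations(rows)
```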

Monday 12 March 2012

Performance Tuning Techniques in DataStage

Apply filter conditions at the source level to avoid unnecessary data flow in the job design.
Extract only the required columns from the source.
Create a larger number of jobs with fewer stages each, rather than a few jobs with many stages.
Use the Data Set stage to store temporary data.
Avoid the Transformer stage where possible, because it needs an additional C++ compiler to build the ETL program.
Instead of a Transformer stage, use a Copy stage to drop unwanted columns or to send data to multiple stages.
Use the Filter stage to apply conditions.
Use the Surrogate Key Generator stage to generate sequence numbers.
Prefer database sorts to DataStage sorts.
Use a Transformer stage in place of the Aggregator stage where possible.

Saturday 3 March 2012

Some useful tips on how to debug parallel jobs in DataStage.



Enable the following environment variables in DataStage Administrator:
* APT_PM_PLAYER_TIMING – shows how much CPU time each stage uses
* APT_PM_SHOW_PIDS – shows the process ID of each stage
* APT_RECORD_COUNTS – shows record counts in log
* APT_CONFIG_FILE – switch configuration file (one node, multiple nodes)
* OSH_DUMP – shows OSH code for your job. Shows if any unexpected settings were set by the GUI.
* APT_DUMP_SCORE – shows all processes and inserted operators in your job
* APT_DISABLE_COMBINATION – do not combine multiple stages into one process. Setting this makes it easier to see where your errors are occurring.
Use a Copy stage to dump out data to intermediate peek stages or sequential debug files. Copy stages get removed during compile time so they do not increase overhead.
Use the Row Generator stage to generate sample data.
Look at the phantom files for additional error messages: c:\datastage\project_folder\&PH&
To catch partitioning problems, run your job with a single node configuration file and compare the output with your multi-node run. You can just look at the file size, or sort the data for a more detailed comparison (Unix sort + diff commands).

What are the Main Features of DataStage?


DataStage has the following features to aid the design and processing required to
build a data warehouse:
• Uses graphical design tools. With simple point-and-click techniques you
can draw a scheme to represent your processing requirements.
• Extracts data from any number or type of databases.
• Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
• Aggregates data. You can modify SQL SELECT statements used to extract
data.
• Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend the functionality by defining your own transforms to use.
• Loads the data warehouse.

Tuesday 28 February 2012

Scenario12

There are 5 columns in a flat file. How do you read only the 2nd and 5th columns?
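A small Python sketch of the column selection (in DataStage you would define only the wanted columns in the Sequential File stage, or drop the others with a Copy/Modify stage). The file content here is hypothetical sample data:

```python
import csv
import io

# Hypothetical 5-column, comma-delimited flat file content
data = "a1,b1,c1,d1,e1\na2,b2,c2,d2,e2\n"

# Keep only the 2nd and 5th columns; list indexing does the same
# selection that the stage's column definitions would do.
selected = [(row[1], row[4]) for row in csv.reader(io.StringIO(data))]
```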

Scenario11

What is the difference between Trim, TrimB, TrimL, TrimT, TrimF, TrimLeadingTrailing, and StripWhiteSpaces?
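Approximate Python analogues of these functions may help make the differences concrete; the mapping below is a sketch (TrimB/TrimT trim trailing characters, TrimF/TrimL trim leading ones), and the exact DataStage semantics are in the function reference:

```python
import re

s = "  Data   Stage  "

trim_b = s.rstrip()                    # TrimB / TrimT: remove trailing spaces
trim_f = s.lstrip()                    # TrimF / TrimL: remove leading spaces
trim_leading_trailing = s.strip()      # TrimLeadingTrailing: both ends
strip_whitespace = s.replace(" ", "")  # StripWhiteSpaces: remove all spaces
trim = re.sub(r" +", " ", s).strip()   # Trim: also collapse inner space runs
```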

Scenario10

How do you abort the job after 50 rows are processed (assume there are 500 records in a flat file)?
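One common answer is a Transformer constraint with an abort-after-rows limit; a Python sketch of that row-counter behavior:

```python
def process_with_abort(rows, abort_after=50):
    """Mimic a Transformer abort-after-N-rows constraint: raise once the
    row counter passes the limit, stopping the job."""
    processed = []
    for count, row in enumerate(rows, start=1):
        if count > abort_after:
            raise RuntimeError("Job aborted after %d rows" % abort_after)
        processed.append(row)
    return processed
```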

Scenario9

How many ways are there to remove duplicates without using the Remove Duplicates stage?
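Common answers include a Sort stage with Allow Duplicates set to False, an Aggregator grouping on all key columns, or a Transformer with stage variables on sorted input. A Python sketch of the last approach:

```python
def dedupe_keep_first(rows, key):
    """Sorted-input dedupe, as a Transformer with stage variables would
    do it: emit a row only when its key differs from the previous key."""
    out, prev = [], object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):
        k = key(row)
        if k != prev:
            out.append(row)
        prev = k
    return out
```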

Saturday 25 February 2012

Scenario8

How do you find the first and last salary in each department without using the Aggregator stage?
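A Python sketch of one approach (Sort on dept, then Transformer-style stage variables tracking the first and latest salary per group); the employee data is made up for illustration:

```python
def first_last_by_dept(rows):
    """Sort by dept (stable, so input order within a dept is kept),
    then track the first and latest salary seen for each dept."""
    result = {}
    for dept, sal in sorted(rows, key=lambda r: r[0]):
        if dept not in result:
            result[dept] = (sal, sal)              # first salary in the dept
        else:
            result[dept] = (result[dept][0], sal)  # keep updating the last
    return result

emp = [("A", 100), ("B", 200), ("A", 300), ("B", 50), ("A", 250)]
stats = first_last_by_dept(emp)
```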

Scenario7

How do you calculate Sum(sal), Avg(sal), Min(sal), and Max(sal) without using the Aggregator stage?
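One answer is a Transformer that accumulates running totals in stage variables and outputs the final row; a Python sketch of that accumulation:

```python
def running_stats(sals):
    """Accumulate sum/count/min/max row by row (the stage-variable way),
    then derive the average, instead of using an Aggregator."""
    total = cnt = 0
    mn = mx = None
    for sal in sals:
        total += sal
        cnt += 1
        mn = sal if mn is None else min(mn, sal)
        mx = sal if mx is None else max(mx, sal)
    return {"sum": total, "avg": total / cnt, "min": mn, "max": mx}
```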

Scenario6

Input is like this:
file1
1
2
3
4
5
6
7
8
9
10

Output is like:
file2(odd)         file3(even)
1                      2   
3                      4
5                      6
7                      8
9                      10
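A Python sketch of the routing logic (in DataStage, a Transformer with constraints Mod(col, 2) = 1 and Mod(col, 2) = 0, or an equivalent Filter stage):

```python
def split_odd_even(values):
    """Route each record to the 'odd' or 'even' output based on
    the value modulo 2."""
    odd = [v for v in values if v % 2 == 1]
    even = [v for v in values if v % 2 == 0]
    return odd, even

odd, even = split_odd_even(list(range(1, 11)))
```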

Scenario5

Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2               file3
10                  30
20 
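A Python sketch of the split: distinct values that repeat go to one output and values occurring exactly once go to the other (in DataStage, typically an Aggregator count joined back and a Filter on the count):

```python
from collections import Counter

def split_by_occurrence(values):
    """Distinct values with count > 1 in one output, values with
    count == 1 in the other."""
    counts = Counter(values)
    repeated = sorted(v for v, c in counts.items() if c > 1)
    single = sorted(v for v, c in counts.items() if c == 1)
    return repeated, single

repeated, single = split_by_occurrence([10, 20, 10, 10, 20, 30])
```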

Friday 17 February 2012

Scenario4

Input is like:
file1
10
20
10
10
20
30

Output: multiple occurrences in one file and single occurrences in another:
file2      file3
10         30
10
10
20
20
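A Python sketch of this variant: every copy of a repeating value goes to one output, while single-occurrence rows go to the other (count per key, join the count back, then filter):

```python
from collections import Counter

def split_rows_by_frequency(values):
    """Keep every copy of values that occur more than once in one
    output, and the single-occurrence rows in the other."""
    counts = Counter(values)
    multi = sorted(v for v in values if counts[v] > 1)
    single = sorted(v for v in values if counts[v] == 1)
    return multi, single

multi, single = split_rows_by_frequency([10, 20, 10, 10, 20, 30])
```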

Scenario3

Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2               file3(duplicates)
10                  10  
20                  10
30                  20 
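A Python sketch of this split: the first copy of each value goes to the distinct output and every extra copy goes to the duplicates output (sorted input plus a stage-variable compare in a Transformer):

```python
def split_distinct_and_duplicates(values):
    """First copy of each value to one output; every extra copy to
    the duplicates output."""
    seen, distinct, dups = set(), [], []
    for v in sorted(values):
        if v in seen:
            dups.append(v)
        else:
            seen.add(v)
            distinct.append(v)
    return distinct, dups

distinct, dups = split_distinct_and_duplicates([10, 20, 10, 10, 20, 30])
```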

Scenario2


input is like this:
no,char
1,a
2,b
3,a
4,b
5,a
6,a
7,b
8,a



But the output is in this form, with row numbering of duplicate occurrences:


output:

no,char,Count
"1","a","1"
"6","a","2"
"5","a","3"
"8","a","4"
"3","a","5"
"2","b","1"
"7","b","2"
"4","b","3"
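A Python sketch of the numbering logic (group on the char column, then a Transformer stage-variable counter that increments within each group). The shuffled row order in the sample output reflects parallel hash partitioning; this sketch simply numbers within each group in key order:

```python
from collections import defaultdict

def number_occurrences(rows):
    """Group rows by the char column and assign a running occurrence
    number within each group."""
    counter = defaultdict(int)
    out = []
    for no, ch in sorted(rows, key=lambda r: r[1]):  # group by char
        counter[ch] += 1
        out.append((no, ch, counter[ch]))
    return out

rows = [(1, "a"), (2, "b"), (3, "a"), (4, "b"),
        (5, "a"), (6, "a"), (7, "b"), (8, "a")]
numbered = number_occurrences(rows)
```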

Thursday 16 February 2012

Scenario1


Input is like this :
i/p
col1,col2
101,a
102,b
103,c
104,d

Output needs to be:
o/p
col1,col2
101,d
102,c
103,b
104,a
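A Python sketch of the transformation: col1 keeps its input order while col2 is paired in reverse. In DataStage one way is two links, one sorted ascending and one descending, joined on a generated row number:

```python
def reverse_second_column(rows):
    """Keep col1 in input order and pair it with col2 values taken
    in reverse order."""
    col1 = [r[0] for r in rows]
    col2_reversed = [r[1] for r in rows][::-1]
    return list(zip(col1, col2_reversed))

rows = [(101, "a"), (102, "b"), (103, "c"), (104, "d")]
flipped = reverse_second_column(rows)
```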

Wednesday 15 February 2012

Differences between Filter & Switch

Filter:
  • It receives data from one input link and sends it to any number of output links.
  • It filters records based on one or more where-clause conditions.
  • Every incoming record is checked against all the where-clause conditions, so one input record can go to multiple output links.
  • The following keywords are valid in where-clause conditions: =, <, >, <>, <=, >=, AND, OR, NOT, BETWEEN.
  • Output Reject = True or False.
Switch:   
  • It receives data from one input link and sends it to up to 128 output links.
  • It filters records based on one or more case conditions.
  • Every incoming record satisfies at most one case condition, so it goes to one output link only.
  • It supports the equality operator only.
  • If Not Found = Drop / Fail / Output.

Tuesday 14 February 2012

Difference between Lookup, Join and Merge

Lookup stage supports:
  • one stream input link
  • 'n' reference links (n = 1, 2, ...)
  • one output link
  • one optional reject link
Join stage supports:
  • one Left set
  • one Right set
  • 'n' intermediate links (n = 0, 1, ...)
  • one output link
  • no reject links
Merge stage supports:
  • one Master link
  • 'n' update links (n = 1, 2, ...)
  • 'm' optional reject links (m = 0 to n)
  • one output link
In Lookup you can apply conditions, but in Join and Merge it is not possible to apply conditions.
In Lookup the joining columns need not have the same name, but in Join and Merge the joining columns must have the same name.