Chitika

Monday 2 April 2012

Scenario13

The source has two fields:

COMPANY             LOCATION
IBM                  HYD
TCS                  BAN
IBM                  CHE
HCL                  HYD
TCS                  CHE
IBM                  BAN
HCL                  BAN
HCL                  CHE

The output should look like this:

Company  Loc  Count

TCS      HYD  3
         BAN
         CHE
IBM      HYD  3
         BAN
         CHE
HCL      HYD  3
         BAN
         CHE
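A minimal Python sketch of the grouping logic (in DataStage this would typically be a Sort on company followed by a Transformer using stage variables). Note that the sketch computes each count from the sample input, in which TCS actually appears in only two locations:

```python
def group_locations(rows):
    """Group locations under each company and show the company name and
    location count on the first row of each group only -- the kind of
    output a Sort + Transformer (stage variables) design produces."""
    groups = {}
    for company, loc in rows:
        groups.setdefault(company, []).append(loc)  # insertion-ordered (3.7+)
    out = []
    for company, locs in groups.items():
        for i, loc in enumerate(locs):
            out.append((company if i == 0 else "",
                        loc,
                        len(locs) if i == 0 else ""))
    return out

rows = [("IBM", "HYD"), ("TCS", "BAN"), ("IBM", "CHE"), ("HCL", "HYD"),
        ("TCS", "CHE"), ("IBM", "BAN"), ("HCL", "BAN"), ("HCL", "CHE")]
grouped = group_locations(rows)
```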

Monday 12 March 2012

Performance Tuning Techniques in DataStage

Apply filter conditions at the source level to avoid unnecessary data flow in the job design.
Extract only the required columns from the source.
Create a larger number of jobs with fewer stages each, rather than a few jobs with many stages.
Use the Data Set stage to store temporary data.
Avoid the Transformer stage where possible, because it needs an additional C++ compiler to build the ETL program.
Instead of a Transformer stage, use a Copy stage to drop unwanted columns or to send data to multiple stages.
Use the Filter stage to apply conditions.
Use the Surrogate Key Generator stage to generate sequence numbers.
Prefer database sorts to DataStage sorts.
Use a Transformer stage in place of the Aggregator stage where possible.

Saturday 3 March 2012

Some useful tips on how to debug parallel jobs in DataStage.



Enable the following environment variables in DataStage Administrator:
* APT_PM_PLAYER_TIMING – shows how much CPU time each stage uses
* APT_PM_SHOW_PIDS – shows the process ID of each stage
* APT_RECORD_COUNTS – shows record counts in log
* APT_CONFIG_FILE – switch configuration file (one node, multiple nodes)
* OSH_DUMP – shows OSH code for your job. Shows if any unexpected settings were set by the GUI.
* APT_DUMP_SCORE – shows all processes and inserted operators in your job
* APT_DISABLE_COMBINATION – do not combine multiple stages into one process. Setting this makes it easier to see where your errors are occurring.
Use a Copy stage to dump out data to intermediate peek stages or sequential debug files. Copy stages get removed during compile time so they do not increase overhead.
Use the Row Generator stage to generate sample data.
Look at the phantom files for additional error messages: c:\datastage\project_folder\&PH&
To catch partitioning problems, run your job with a single node configuration file and compare the output with your multi-node run. You can just look at the file size, or sort the data for a more detailed comparison (Unix sort + diff commands).

What are the Main Features of DataStage?


DataStage has the following features to aid the design and processing required to
build a data warehouse:
• Uses graphical design tools. With simple point-and-click techniques you
can draw a scheme to represent your processing requirements.
• Extracts data from any number or type of databases.
• Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
• Aggregates data. You can modify SQL SELECT statements used to extract
data.
• Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend the functionality by defining your own transforms to use.
• Loads the data warehouse.

Tuesday 28 February 2012

Scenario12

There are 5 columns in a flat file. How do you read only the 2nd and 5th columns?
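A small Python sketch of the column selection (in DataStage you would define only the wanted columns in the Sequential File stage, or drop the others with a Copy/Modify stage). The file content here is hypothetical sample data:

```python
import csv
import io

# Hypothetical 5-column, comma-delimited flat file content
data = "a1,b1,c1,d1,e1\na2,b2,c2,d2,e2\n"

# Keep only the 2nd and 5th columns; list indexing does the same
# selection that the stage's column definitions would do.
selected = [(row[1], row[4]) for row in csv.reader(io.StringIO(data))]
```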

Scenario11

What is the difference between Trim, TrimB, TrimL, TrimT, TrimF, TrimLeadingTrailing, and StripWhiteSpaces?
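Approximate Python analogues of these functions may help make the differences concrete; the mapping below is a sketch (TrimB/TrimT trim trailing characters, TrimF/TrimL trim leading ones), and the exact DataStage semantics are in the function reference:

```python
import re

s = "  Data   Stage  "

trim_b = s.rstrip()                    # TrimB / TrimT: remove trailing spaces
trim_f = s.lstrip()                    # TrimF / TrimL: remove leading spaces
trim_leading_trailing = s.strip()      # TrimLeadingTrailing: both ends
strip_whitespace = s.replace(" ", "")  # StripWhiteSpaces: remove all spaces
trim = re.sub(r" +", " ", s).strip()   # Trim: also collapse inner space runs
```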

Scenario10

How do you abort the job after 50 rows are processed (assume there are 500 records in a flat file)?
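One common answer is a Transformer constraint with an abort-after-rows limit; a Python sketch of that row-counter behavior:

```python
def process_with_abort(rows, abort_after=50):
    """Mimic a Transformer abort-after-N-rows constraint: raise once the
    row counter passes the limit, stopping the job."""
    processed = []
    for count, row in enumerate(rows, start=1):
        if count > abort_after:
            raise RuntimeError("Job aborted after %d rows" % abort_after)
        processed.append(row)
    return processed
```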

Scenario9

How many ways are there to remove duplicates without using the Remove Duplicates stage?
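Common answers include a Sort stage with Allow Duplicates set to False, an Aggregator grouping on all key columns, or a Transformer with stage variables on sorted input. A Python sketch of the last approach:

```python
def dedupe_keep_first(rows, key):
    """Sorted-input dedupe, as a Transformer with stage variables would
    do it: emit a row only when its key differs from the previous key."""
    out, prev = [], object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):
        k = key(row)
        if k != prev:
            out.append(row)
        prev = k
    return out
```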

Saturday 25 February 2012

Scenario8

How do you find the first and last salary in each department without using the Aggregator stage?
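A Python sketch of one approach (Sort on dept, then Transformer-style stage variables tracking the first and latest salary per group); the employee data is made up for illustration:

```python
def first_last_by_dept(rows):
    """Sort by dept (stable, so input order within a dept is kept),
    then track the first and latest salary seen for each dept."""
    result = {}
    for dept, sal in sorted(rows, key=lambda r: r[0]):
        if dept not in result:
            result[dept] = (sal, sal)              # first salary in the dept
        else:
            result[dept] = (result[dept][0], sal)  # keep updating the last
    return result

emp = [("A", 100), ("B", 200), ("A", 300), ("B", 50), ("A", 250)]
stats = first_last_by_dept(emp)
```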

Scenario7

How do you calculate Sum(sal), Avg(sal), Min(sal), and Max(sal) without using the Aggregator stage?
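One answer is a Transformer that accumulates running totals in stage variables and outputs the final row; a Python sketch of that accumulation:

```python
def running_stats(sals):
    """Accumulate sum/count/min/max row by row (the stage-variable way),
    then derive the average, instead of using an Aggregator."""
    total = cnt = 0
    mn = mx = None
    for sal in sals:
        total += sal
        cnt += 1
        mn = sal if mn is None else min(mn, sal)
        mx = sal if mx is None else max(mx, sal)
    return {"sum": total, "avg": total / cnt, "min": mn, "max": mx}
```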

Scenario6

Input is like this:
file1
1
2
3
4
5
6
7
8
9
10

Output is like:
file2(odd)         file3(even)
1                      2   
3                      4
5                      6
7                      8
9                      10
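A Python sketch of the routing logic (in DataStage, a Transformer with constraints Mod(col, 2) = 1 and Mod(col, 2) = 0, or an equivalent Filter stage):

```python
def split_odd_even(values):
    """Route each record to the 'odd' or 'even' output based on
    the value modulo 2."""
    odd = [v for v in values if v % 2 == 1]
    even = [v for v in values if v % 2 == 0]
    return odd, even

odd, even = split_odd_even(list(range(1, 11)))
```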

Scenario5

Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2               file3
10                  30
20 
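A Python sketch of the split: distinct values that repeat go to one output and values occurring exactly once go to the other (in DataStage, typically an Aggregator count joined back and a Filter on the count):

```python
from collections import Counter

def split_by_occurrence(values):
    """Distinct values with count > 1 in one output, values with
    count == 1 in the other."""
    counts = Counter(values)
    repeated = sorted(v for v, c in counts.items() if c > 1)
    single = sorted(v for v, c in counts.items() if c == 1)
    return repeated, single

repeated, single = split_by_occurrence([10, 20, 10, 10, 20, 30])
```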

Friday 17 February 2012

Scenario4

Input is like:
file1
10
20
10
10
20
30

Output: multiple occurrences in one file and single occurrences in another:
file2      file3
10         30
10
10
20
20
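A Python sketch of this variant: every copy of a repeating value goes to one output, while single-occurrence rows go to the other (count per key, join the count back, then filter):

```python
from collections import Counter

def split_rows_by_frequency(values):
    """Keep every copy of values that occur more than once in one
    output, and the single-occurrence rows in the other."""
    counts = Counter(values)
    multi = sorted(v for v in values if counts[v] > 1)
    single = sorted(v for v in values if counts[v] == 1)
    return multi, single

multi, single = split_rows_by_frequency([10, 20, 10, 10, 20, 30])
```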

Scenario3

Input is like this:
file1
10
20
10
10
20
30

Output is like:
file2               file3(duplicates)
10                  10  
20                  10
30                  20 
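A Python sketch of this split: the first copy of each value goes to the distinct output and every extra copy goes to the duplicates output (sorted input plus a stage-variable compare in a Transformer):

```python
def split_distinct_and_duplicates(values):
    """First copy of each value to one output; every extra copy to
    the duplicates output."""
    seen, distinct, dups = set(), [], []
    for v in sorted(values):
        if v in seen:
            dups.append(v)
        else:
            seen.add(v)
            distinct.append(v)
    return distinct, dups

distinct, dups = split_distinct_and_duplicates([10, 20, 10, 10, 20, 30])
```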

Scenario2


input is like this:
no,char
1,a
2,b
3,a
4,b
5,a
6,a
7,b
8,a



But the output is in this form, with row numbering of duplicate occurrences:


output:

no,char,Count
"1","a","1"
"6","a","2"
"5","a","3"
"8","a","4"
"3","a","5"
"2","b","1"
"7","b","2"
"4","b","3"
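A Python sketch of the numbering logic (group on the char column, then a Transformer stage-variable counter that increments within each group). The shuffled row order in the sample output reflects parallel hash partitioning; this sketch simply numbers within each group in key order:

```python
from collections import defaultdict

def number_occurrences(rows):
    """Group rows by the char column and assign a running occurrence
    number within each group."""
    counter = defaultdict(int)
    out = []
    for no, ch in sorted(rows, key=lambda r: r[1]):  # group by char
        counter[ch] += 1
        out.append((no, ch, counter[ch]))
    return out

rows = [(1, "a"), (2, "b"), (3, "a"), (4, "b"),
        (5, "a"), (6, "a"), (7, "b"), (8, "a")]
numbered = number_occurrences(rows)
```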

Thursday 16 February 2012

Scenario1


Input is like this :
i/p
col1,col2
101,a
102,b
103,c
104,d

Output needs to be:
o/p
col1,col2
101,d
102,c
103,b
104,a
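A Python sketch of the transformation: col1 keeps its input order while col2 is paired in reverse. In DataStage one way is two links, one sorted ascending and one descending, joined on a generated row number:

```python
def reverse_second_column(rows):
    """Keep col1 in input order and pair it with col2 values taken
    in reverse order."""
    col1 = [r[0] for r in rows]
    col2_reversed = [r[1] for r in rows][::-1]
    return list(zip(col1, col2_reversed))

rows = [(101, "a"), (102, "b"), (103, "c"), (104, "d")]
flipped = reverse_second_column(rows)
```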

Wednesday 15 February 2012

Differences between Filter & Switch

Filter:
  • It receives data from one input link and sends it to any number of output links.
  • It filters records based on one or more where-clause conditions.
  • Every incoming record is checked against all the where-clause conditions, so one input record can go to multiple output links.
  • The following keywords are valid in where-clause conditions: =, <, >, <>, <=, >=, AND, OR, NOT, BETWEEN.
  • Output Reject = True or False.
Switch:   
  • It receives data from one input link and sends it to up to 128 output links.
  • It filters records based on one or more case conditions.
  • Every incoming record satisfies at most one case condition, so it goes to one output link only.
  • It supports the equality operator only.
  • If Not Found = Drop / Fail / Output.

Tuesday 14 February 2012

Difference between Lookup, Join and Merge

Lookup stage supports:
  • one stream input link
  • 'n' reference links (n = 1, 2, ...)
  • one output link
  • one optional reject link
Join stage supports:
  • one Left set
  • one Right set
  • 'n' intermediate links (n = 0, 1, ...)
  • one output link
  • no reject links
Merge stage supports:
  • one Master link
  • 'n' update links (n = 1, 2, ...)
  • 'm' optional reject links (m = 0 to n)
  • one output link
In Lookup you can apply conditions, but in Join and Merge it is not possible to apply conditions.
In Lookup the joining columns need not have the same name, but in Join and Merge the joining columns must have the same name.