How To Implement CDC in Datastage 8.1!: Incremental Loading in The Datastage Can Be Performed by Using The Change Data
How To Implement CDC in Datastage 8.1!: Incremental Loading in The Datastage Can Be Performed by Using The Change Data
Incremental loading in the DataStage can be performed by using the Change Data Capture stage. CDC stage takes sorted input on the key fields and generates a change code which denotes the type of change. Based on change code required DMLs can be performed on target database. Figure attached can explain the basic design of a CDC stage. In the attached design we will have to 1) Define the SOURCE and TARGET datasets as after and before. This means the dataset which has got latest value is a after dataset and the one with it is being compared is the before dataset. 2) Define change key columns on which the sorted output would be compared. 3) Define the change mode "Explicit Keys, All Values" this means change in the whole record will be considered. Any change in the any columns of the record will be detected as a change. 4) Define the different DML codes ie DELETE code = 3, INSERT code = 2 and so on. 5) once the above settings done we are good to identify the changes between two datasets. This CDC stage will give us the change code by using change_code() functions in the derivation area of CDC output mapping. 6) Pass this output to filter stage which will be filtering the data based on the change code and would output in three different links. 7) These filtered data would then go to the modify stage to drop the change_code column from the schema to make it database compatible. 8) Finally the datasets would get the all the newly inserted, updated and deleted records. Use appropriate native database stages to read these datasets to load them into the database. Points to remember while using CDC stage: 1) CDC needs sorted input hence if you have large number of records to compare
against then it could impact the performance of the job. Also this kind of job could be very memory intensive hence its placement in the sequence should be correct during the execution. 2) DO NOT forget to drop the change_code field from the schema of the datasets otherwise the next job to load data in the database would generate error.