DataStage Interview Questions


Reading multiple files through sequential file

This can be achieved by selecting the File Pattern option and giving the path pattern of the files in the Sequential File stage. The metadata of all the files must be the same.

How can we improve performance using the Copy stage?
The Copy stage can improve performance. In this stage we can sort the data and remove unwanted columns, so later stages have less data to process.

Passing parameters from one job to another job using command prompt
We can pass parameters to a job in two ways: from the command line using dsjob, or from a sequencer.
Another way is to configure a single parameter set (version 8.0 onwards) and use it in both jobs so that they share the same set of parameters.
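
As a rough illustration of the command-line route, the sketch below builds a `dsjob -run` argument list with one `-param name=value` pair per job parameter. The project name, job name and parameter name are hypothetical placeholders, and the helper `build_dsjob_command` is not part of DataStage; it only assembles the command and does not run it.

```python
from typing import Dict, List


def build_dsjob_command(project: str, job: str, params: Dict[str, str]) -> List[str]:
    """Assemble a `dsjob -run` command with one -param option per parameter."""
    cmd = ["dsjob", "-run"]
    for name, value in params.items():
        # Each job parameter is passed as: -param name=value
        cmd += ["-param", f"{name}={value}"]
    # The project and job names come last on the command line.
    cmd += [project, job]
    return cmd


# Hypothetical names, for illustration only.
example = build_dsjob_command("MyProject", "LoadCustomers", {"RUN_DATE": "2024-01-31"})
print(" ".join(example))
```

The resulting list could be handed to `subprocess.run` on a machine where the DataStage client or engine is installed; a sequencer achieves the same effect by mapping its own parameters to those of the jobs it calls.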

How to create a new user in DataStage 7.5?
A user created in and for Windows is a DataStage user by default. You cannot create a user from within the DataStage environment. To create a user, do so through the user accounts in the Windows Control Panel. The Windows user ID is the DataStage login ID, and the corresponding password is the DataStage login password. In short, a user of the OS becomes a user of DataStage.

What is the difference between DataStage versions 7.1 and 8.0.1?

What are the initial values of stage variables? How can we set these values? Does a stage variable only obtain its value during execution, without passing any value to the target?

When we are extracting flat files, what are the basic required validations?
The following are some common validations:
a) Check for blank lines and remove them.
b) Check the number of columns in each row of the file.
c) If the flat file has a trailer line containing additional information such as the total number of records, cross-check that the number of records stated in the trailer matches the actual number of records.
d) Check whether a column contains a blank value where a value is expected.

If you have numeric and character data in the source, how will you load only the character data to the target? Which functions will you use in the Transformer stage?
Use the following function in the derivation of the Transformer:
Convert("0123456789", "", StringName)
Here StringName is the string from which the numeric characters are to be removed. Because the replacement list is empty, each digit is deleted, so if the value of StringName is "Test1234" it is converted to "Test". In this way only character data is loaded to the target. If the second argument is a string of ten spaces instead, each digit is replaced by a space and the result must then be trimmed.

What various validations do you perform on the data after extraction?
In data validation we check the data size and data type, remove extra spaces, and apply proper null handling.

With the Lookup stage we can combine data from multiple tables. With the Join stage we can also combine data from multiple tables. Then what is the need for the Lookup stage?
Lookup, Join and Merge are all used to combine data. Where the three stages differ is in the capture of unmatched data and in performance.
Join: unmatched data cannot be captured, but Join gives good performance compared with the Lookup stage because it works on sorted inputs.
Lookup: unmatched master data can be captured.
Merge: unmatched data from any number of reference data sets can be captured.

What are the types of nodes in DataStage?
The degree of parallelism of a parallel job depends on the number of nodes defined in the configuration file. Nodes are logical; they are realized as processes created by the OS. Basically two types of node exist:
a) Conductor node: the DataStage engine is loaded on the conductor node.
b) Processing nodes: one section leader is created per node. Section leaders fork the player processes.

How will performance improve by using Hash partitioning in the Aggregator stage?
With Hash partitioning, rows with the same key column value(s) are moved to the same partition for processing. Aggregation groups on the key columns, so with Hash partitioning all rows for a given key are already in one partition when they are aggregated, and the stage works better.

How will performance be affected if we use a large number of Transformer stages in DataStage parallel jobs?
Transformer stages are compiled into C++, whereas other stages are compiled into OSH (Orchestrate Shell scripts). With many Transformers, the first impact is on compilation time, because each Transformer stage takes longer to compile. In practice, the Transformer stage itself does not have much performance impact on DS jobs. Performance is affected when a job has a large number of stages of any kind, not necessarily Transformer stages. Hence, try to implement the job logic using as few stages as possible.

Will performance improve by using data sets in parallel jobs? How?
Yes. Data sets are the internal storage format of DataStage and are built by Orchestrate, so a data set can be processed without any conversion and is therefore very fast.
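
The Hash partitioning answer above can be illustrated outside DataStage with a small Python sketch. It is not how the Aggregator stage is implemented; the row layout (key, amount), the CRC32 hash and the three partitions are assumptions chosen for the example. The point is that once rows are routed by a hash of the key, each partition can be aggregated on its own and no key is split across partitions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # assumed layout: (key column, amount)


def hash_partition(rows: Iterable[Row], num_partitions: int) -> List[List[Row]]:
    """Route each row to a partition chosen by a hash of its key."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, amount in rows:
        # CRC32 is deterministic across runs, unlike Python's built-in str hash.
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, amount))
    return partitions


def aggregate(partition: Iterable[Row]) -> Dict[str, int]:
    """Sum the amounts per key within a single partition."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 2)]
partitions = hash_partition(rows, num_partitions=3)
results = [aggregate(p) for p in partitions]

# Each key lives in exactly one partition, so the per-partition
# results can simply be combined without re-aggregating.
merged: Dict[str, int] = {}
for partial in results:
    merged.update(partial)
print(merged)
```

With a partitioning method that ignores the key, such as round robin, the same key could land in several partitions and the partial sums would have to be combined in a further step.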

How does the configuration file help in parallel processing?

Why is array list used in the Oracle OCI stage? Briefly describe the important options in the OCI stage.
The option is presumably called "Array Size". Array Size is mainly used to enlarge the buffer for write operations to an Oracle DB in a server job using the Oracle OCI stage. It is very useful for improving job performance, especially when the volume of data to be written (Insert, Upsert, Update) is huge. Which options count as "important" in the OCI stage depends on the nature of the job, but another option of general importance is Transaction Size. In newer versions a separate tab called "Transaction Handling" takes care of it. The number of rows specified there is written to the target DB before a commit is made, which improves performance.
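
The effect of the two settings can be sketched in Python. This is only an analogy: the standard-library `sqlite3` module stands in for Oracle, the table `target` and the function `load_rows` are hypothetical, and the OCI stage itself is configured in the stage editor rather than coded this way. Here `array_size` is the number of rows sent in one batch, and `transaction_size` is the number of rows written before each commit.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_rows(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]],
              array_size: int, transaction_size: int) -> int:
    """Insert rows in batches of array_size, committing after at least
    transaction_size rows have been written. Returns the number of commits."""
    sql = "INSERT INTO target (id, name) VALUES (?, ?)"
    cur = conn.cursor()
    batch: List[Tuple[int, str]] = []
    uncommitted = 0
    commits = 0
    for row in rows:
        batch.append(row)
        if len(batch) == array_size:
            cur.executemany(sql, batch)  # one call for the whole batch
            uncommitted += len(batch)
            batch = []
            if uncommitted >= transaction_size:
                conn.commit()
                commits += 1
                uncommitted = 0
    if batch:  # flush a final, partially filled batch
        cur.executemany(sql, batch)
        uncommitted += len(batch)
    if uncommitted:  # commit whatever is still pending
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.commit()
commits = load_rows(conn, [(i, f"name{i}") for i in range(10)],
                    array_size=2, transaction_size=4)
print(commits)
```

Larger batches mean fewer calls to the database, and less frequent commits mean less commit overhead, at the cost of more work to redo if a failure occurs before the next commit.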
