Welcome To The Finest Collection of Informatica Interview Questions With Standard Answers That You Can Count On
Welcome to the finest collection of Informatica interview questions with standard answers that you can count on. Read and understand all the
questions and their answers below and in the following pages to get a good grasp of Informatica.
What are the differences between Connected and Unconnected Lookup?
The differences are shown in the table below.
Connected Lookup | Unconnected Lookup
Participates in the data flow and receives input directly from the pipeline. | Receives input values from the result of a :LKP expression in another transformation.
Can use both dynamic and static cache. | Cache can NOT be dynamic.
Can return more than one column value (output port). | Can return only one column value, i.e. one output port (the return port).
Caches all lookup columns. | Caches only the lookup output ports in the lookup conditions and the return port.
Supports user-defined default values (i.e. the value to return when lookup conditions are not satisfied). | Does not support user-defined default values.
What is meant by active and passive transformation?
An active transformation is one that performs any of the
following actions:
1) Changes the number of rows between transformation input and
output. Example: Filter transformation.
2) Changes the transaction boundary by defining commit or
rollback points. Example: Transaction Control transformation.
3) Changes the row type. Example: Update Strategy is active
because it flags the rows for insert, delete, update or reject.
A passive transformation, on the other hand, is one that does not change the number of rows that pass through it, and maintains the
transaction boundary and the row type. Example: Expression transformation.
What is the difference between Router and Filter?
The following differences can be noted:
Router | Filter
Router transformation divides the incoming records into multiple groups based on some conditions. Such groups can be mutually inclusive (different groups may contain the same record). | Filter transformation restricts or blocks the incoming record set based on one given condition.
Router transformation itself does not block any record. If a record does not match any of the routing conditions, it is routed to the default group. | Filter transformation does not have a default group. If a record does not match the filter condition, it is blocked.
Router acts like a CASE.. WHEN statement in SQL (or a switch().. case statement in C). | Filter acts like a WHERE condition in SQL.
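The contrast in the table can be sketched in a few lines of Python. This is only an illustration of the row-handling behaviour, not Informatica code; the function names, the group names and the sample rows are all made up for the example.

```python
def filter_rows(rows, condition):
    """Filter: rows that fail the single condition are dropped."""
    return [row for row in rows if condition(row)]


def route_rows(rows, groups):
    """Router: test every row against every group condition.

    `groups` maps a group name to a condition. A row may land in several
    groups; a row matching none of them goes to the DEFAULT group.
    """
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed


# Hypothetical sample data and group conditions.
rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}, {"id": 3, "amount": -5}]
groups = {
    "POSITIVE": lambda r: r["amount"] > 0,
    "LARGE": lambda r: r["amount"] >= 100,
}
routed = route_rows(rows, groups)
kept = filter_rows(rows, lambda r: r["amount"] >= 100)
```

Row 2 appears in both user-defined groups, row 3 ends up in the default group, and the filter simply discards everything except row 2.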
What can we do to improve the performance of Informatica Aggregator Transformation?
Aggregator performance improves dramatically if records are sorted before they are passed to the aggregator and the "sorted input" option under aggregator
properties is checked. The record set should be sorted on the columns that are used in the Group By operation.
It is often a good idea to sort the record set at the database level, e.g. inside a source qualifier transformation, unless there is a
chance that already sorted records from the source qualifier can become unsorted again before reaching the aggregator.
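Why sorting helps can be seen in a small Python sketch. It assumes nothing about Informatica internals; it only shows that when rows arrive sorted on the group-by column, an aggregate can be emitted as soon as the key changes, so only one group needs to be held at a time. The column names `dept` and `sal` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key, value):
    """Sum `value` per `key`, assuming rows arrive sorted on `key`.

    Only one group is held in memory at a time. If the input is not
    sorted, the same key is emitted more than once.
    """
    for group_key, group in groupby(rows, key=itemgetter(key)):
        yield group_key, sum(row[value] for row in group)


rows = [
    {"dept": 20, "sal": 300},
    {"dept": 10, "sal": 100},
    {"dept": 20, "sal": 50},
    {"dept": 10, "sal": 200},
]
sorted_rows = sorted(rows, key=itemgetter("dept"))  # the sort step
totals = list(aggregate_sorted(sorted_rows, "dept", "sal"))
unsorted_totals = list(aggregate_sorted(rows, "dept", "sal"))
```

With sorted input there is one total per department. With the unsorted input this sketch silently produces four fragments instead of two groups, which illustrates why the sort order has to match the group-by columns.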
What are the different lookup cache(s)?
Informatica lookups can be cached or un-cached (no cache). A cached lookup can be either static or dynamic. A static cache is one that
is not modified once it is built; it remains the same during the session run. A dynamic cache, on the other hand, is refreshed during the
session run by inserting or updating records in the cache based on the incoming source data. By default, the Informatica cache is static.
A lookup cache can also be classified as persistent or non-persistent, based on whether Informatica retains the cache after the
session run completes or deletes it.
How can we update a record in target table without using Update strategy?
A target table can be updated without using 'Update Strategy'. For this, we need to define the key of the target table at the Informatica level and then
connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to
"Update as Update" and check the "Update" check-box.
Let's assume we have a target table "Customer" with the fields "Customer ID", "Customer Name" and "Customer Address". Suppose we want to
update "Customer Address" without an Update Strategy. Then we have to define "Customer ID" as the primary key at the Informatica level and
connect the Customer ID and Customer Address fields in the mapping. If the session properties are set correctly as described above, the
mapping will only update the customer address field for all matching customer IDs.
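The net effect on the target can be sketched with Python's built-in sqlite3 module. This is an assumption-laden illustration, not the statement Informatica actually generates: the table layout, the sample rows and the UPDATE text are made up to mirror the Customer example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, "
    "customer_name TEXT, customer_address TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Sam", "Old Street 1"), (2, "John", "Old Street 2")],
)

# Incoming rows carry only the key and the field to update,
# mirroring the two ports connected to the target in the mapping.
incoming = [(1, "New Road 10"), (3, "Unknown Lane 5")]
for customer_id, address in incoming:
    conn.execute(
        "UPDATE customer SET customer_address = ? WHERE customer_id = ?",
        (address, customer_id),
    )

result = conn.execute(
    "SELECT customer_id, customer_name, customer_address "
    "FROM customer ORDER BY customer_id"
).fetchall()
```

Only the address of customer 1 changes. Customer 3 has no matching key, so nothing is updated or inserted for it, and the name column is never touched.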
Under what condition selecting Sorted Input in aggregator may fail the session?
If the input data is not sorted correctly, the session will fail.
Even if the input data is properly sorted, the session may fail if the sort order by ports and the group by ports of the aggregator are not in the same
order.
Why is Sorter an Active Transformation?
This is because we can select the "distinct" option in the sorter property.
When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration Service
discards duplicate rows compared during the sort operation. The number of Input Rows will vary as compared with the Output rows and hence it
is an Active transformation.
Is lookup an active or passive transformation?
From Informatica 9.x, the Lookup transformation can be configured as an "Active" transformation.
Find out How to configure lookup as active transformation
However, in older versions of Informatica, Lookup is a passive transformation.
What is the difference between Static and Dynamic Lookup Cache?
We can configure a Lookup transformation to cache the underlying lookup table. In case of static or read-only lookup cache the Integration
Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup
transformation.
In case of dynamic lookup cache the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target.
The dynamic cache is synchronized with the target.
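A rough Python model of the two cache behaviours follows. It is a sketch under simplifying assumptions: the cache is a plain dictionary keyed on a hypothetical `id` column, and the action labels are invented for the illustration rather than taken from Informatica.

```python
def build_cache(lookup_rows, key):
    """Cache the lookup table once, keyed on the lookup condition column."""
    return {row[key]: row for row in lookup_rows}


def process(source_rows, cache, key, dynamic):
    """Return the action decided for each source row.

    With a static cache the dictionary is only read. With a dynamic cache
    new keys are inserted and changed rows are updated as rows flow
    through, so later rows see the effect of earlier ones.
    """
    actions = []
    for row in source_rows:
        cached = cache.get(row[key])
        if cached is None:
            action = "insert"
        elif cached != row:
            action = "update"
        else:
            action = "no change"
        if dynamic and action != "no change":
            cache[row[key]] = row
        actions.append(action)
    return actions


target = [{"id": 1, "name": "Sam"}]
source = [
    {"id": 2, "name": "Tom"},
    {"id": 2, "name": "Tom"},
    {"id": 1, "name": "Samuel"},
]
static_actions = process(source, build_cache(target, "id"), "id", dynamic=False)
dynamic_actions = process(source, build_cache(target, "id"), "id", dynamic=True)
```

With the static cache the repeated row for id 2 is flagged for insert twice. With the dynamic cache the second occurrence is recognised because the first one was already added to the cache.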
What is the difference between STOP and ABORT options in Workflow Monitor?
When we issue the STOP command on an executing session task, the Integration Service stops reading data from the source. It continues processing,
writing and committing the data to the targets. If the Integration Service cannot finish processing and committing data, we can issue the ABORT
command.
The ABORT command is handled like STOP, except that it has a timeout period of 60 seconds. If the Integration Service cannot finish processing and
committing data within the timeout period, it kills the DTM process and terminates the session.
What are the new features of Informatica 9.x in developer level?
From a developer's perspective, some of the new features in Informatica 9.x are as follows:
Lookup can now be configured as an active transformation: it can return multiple rows on a successful match.
SQL override can now be written on an un-cached lookup as well. Previously it was possible only on a cached lookup.
You can control the size of your session log. In a real-time environment you can control the session log file size or time.
Database deadlock resilience: this feature ensures that your session does not immediately fail if it encounters a database deadlock; it will
retry the operation instead. You can configure the number of retry attempts.
How to Delete duplicate row using Informatica
Scenario 1: Duplicate rows are present in relational database
Suppose we have duplicate records in the source system and we want to load only the unique records into the target system, eliminating the duplicate
rows. What will be the approach?
Assuming that the source system is a relational database, to eliminate duplicate records we can check the Distinct option of the Source
Qualifier of the source table and load the target accordingly.
Scenario 2: Deleting duplicate records from a flat file
Deleting duplicate rows / selecting distinct rows for FLAT FILE sources
We saw above how to choose distinct records from relational sources. The next question is: how may we select the distinct
records for flat file sources?
Since the source system here is a flat file, we will not be able to select the Distinct option in the Source Qualifier, as it is disabled for a flat
file source. Hence the next approach may be to use a Sorter Transformation and check the Distinct option. When we select the Distinct
option, all the columns will be selected as keys, in ascending order by default.
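The effect of a Sorter with the Distinct option can be approximated in Python as a sort on all columns followed by dropping adjacent duplicates. The in-memory "flat file" and its two columns are hypothetical, and the sketch says nothing about how the Sorter is implemented internally.

```python
import csv
import io


def sorter_distinct(rows):
    """Sort on all columns ascending and drop adjacent duplicates."""
    output = []
    for row in sorted(rows):
        if not output or output[-1] != row:
            output.append(row)
    return output


# An in-memory stand-in for a comma-delimited flat file source.
flat_file = io.StringIO("2,Tom\n1,Sam\n2,Tom\n1,Sam\n3,John\n")
rows = [tuple(record) for record in csv.reader(flat_file)]
distinct_rows = sorter_distinct(rows)
```

Five input rows become three output rows, which is also why the Sorter counts as an active transformation when Distinct is checked.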
Deleting Duplicate Record Using Informatica Aggregator
Another way to handle duplicate records in a source batch run is to use an Aggregator Transformation and check the Group By checkbox on the
ports that contain duplicate data. Here you have the flexibility to select the last or the first of the duplicate records.
There is yet another option to ensure duplicate records are not inserted in the target: the Dynamic Lookup Cache. Using a Dynamic
Lookup Cache of the target table, associating the input ports with the lookup ports and checking the Insert Else Update option will help to
eliminate the duplicate records in the source and hence load only unique records in the target.
For more details check, Dynamic Lookup Cache
Loading Multiple Target Tables Based on Conditions
Scenario
Suppose we have some serial numbers in a flat file source. We want to load the serial numbers in two target files one containing the EVEN serial
numbers and the other file having the ODD ones.
Answer
After the Source Qualifier place a Router Transformation. Create two Groups namely EVEN and ODD, with filter conditions as:
MOD(SERIAL_NO,2)=0 and MOD(SERIAL_NO,2)=1
... respectively. Then output the two groups into two flat file targets.
Normalizer Related Questions
Scenario 1
Suppose in our Source Table we have data as given below:
Student Name | Maths | Life Science | Physical Science
Sam | 100 | 70 | 80
John | 75 | 100 | 85
Tom | 80 | 100 | 85
We want to load our Target Table as:
Student Name | Subject Name | Marks
Sam | Maths | 100
Sam | Life Science | 70
Sam | Physical Science | 80
John | Maths | 75
John | Life Science | 100
John | Physical Science | 85
Tom | Maths | 80
Tom | Life Science | 100
Tom | Physical Science | 85
Describe your approach.
Answer
Here, to convert the columns to rows, we have to use the Normalizer Transformation followed by an Expression Transformation to decode the
column taken into consideration. For more details on how the mapping is performed, please visit Working with Normalizer.
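The column-to-row conversion can be sketched in Python as follows. The occurrence index and the decode table are assumptions made for this illustration of the Normalizer plus Expression idea; they are not Informatica port names.

```python
# Decode table for the Expression step: occurrence index -> subject name.
SUBJECT_BY_INDEX = {1: "Maths", 2: "Life Science", 3: "Physical Science"}


def normalize(source_rows):
    """Emit one output row per subject column of each input row."""
    for row in source_rows:
        marks = [row["Maths"], row["Life Science"], row["Physical Science"]]
        # Normalizer step: one row per occurrence, numbered from 1.
        for index, mark in enumerate(marks, start=1):
            yield {
                "Student Name": row["Student Name"],
                "Subject Name": SUBJECT_BY_INDEX[index],  # decode step
                "Marks": mark,
            }


source = [
    {"Student Name": "Sam", "Maths": 100, "Life Science": 70, "Physical Science": 80},
    {"Student Name": "John", "Maths": 75, "Life Science": 100, "Physical Science": 85},
    {"Student Name": "Tom", "Maths": 80, "Life Science": 100, "Physical Science": 85},
]
target = list(normalize(source))
```

Three input rows with three subject columns each produce nine output rows, matching the target layout in the scenario.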
Question
Name the transformations which convert one row to many rows, i.e. increase the output row count relative to the input. Also, what is the name of the reverse
transformation?
Answer
Normalizer as well as Router transformations are active transformations which can increase the number of output rows relative to the input rows.
Aggregator Transformation performs the reverse action of the Normalizer transformation.
Scenario 2
Suppose we have a source table and we want to load three target tables based on source rows such that the first row moves to the first target table, the second
row to the second target table, the third row to the third target table, the fourth row again to the first target table, and so on. Describe your approach.
Answer
We can clearly see that we need a Router transformation to route or filter source data to the three target tables. Now the question is
what the filter conditions will be. First of all we need an Expression Transformation where we have all the source table columns and, along with
them, another i/o port, say seq_num, which gets a sequence number for each source row from the NextVal port of a Sequence Generator with
start value 0 and increment by 1. The filter conditions for the three router groups will then be:
MOD(SEQ_NUM,3)=1 connected to 1st target table
MOD(SEQ_NUM,3)=2 connected to 2nd target table
MOD(SEQ_NUM,3)=0 connected to 3rd target table
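The three MOD conditions amount to round-robin routing, which a short Python sketch can make concrete. The counter below stands in for NEXTVAL and is assumed to hand out 1 for the first row; the function name and sample rows are hypothetical.

```python
from itertools import count


def route_round_robin(source_rows):
    """Distribute rows over three targets using MOD(SEQ_NUM, 3)."""
    seq = count(start=1)  # stands in for NEXTVAL; first value assumed to be 1
    targets = {1: [], 2: [], 3: []}
    for row in source_rows:
        seq_num = next(seq)
        remainder = seq_num % 3
        # Remainder 1 -> 1st target, 2 -> 2nd target, 0 -> 3rd target.
        target_no = 3 if remainder == 0 else remainder
        targets[target_no].append(row)
    return targets


targets = route_round_robin(["A", "B", "C", "D", "E", "F", "G"])
```

Rows A, D and G land in the first target, B and E in the second, and C and F in the third.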
Loading Multiple Flat Files using one mapping
Scenario
Suppose we have ten source flat files of the same structure. How can we load all the files into the target database in a single batch run using a single
mapping?
Answer
After we create a mapping to load data into the target database from flat files, we move on to the session properties of the Source Qualifier. To load
a set of source files we need to create a file, say final.txt, containing the source flat file names (ten files in our case) and set the Source
filetype option to Indirect. Next, point to this flat file final.txt, fully qualified, through the Source file directory and Source filename properties.
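The idea of an indirect file list can be imitated in Python: one list file names the real source files, and every named file is read in turn with the same logic. The temporary directory, the file names other than final.txt and the file contents are all invented for this sketch.

```python
import csv
import os
import tempfile


def read_indirect(file_list_path):
    """Read every file named in the list file, one path per line."""
    with open(file_list_path) as file_list:
        paths = [line.strip() for line in file_list if line.strip()]
    for path in paths:
        with open(path, newline="") as source:
            yield from csv.reader(source)


# Build two small source files plus the list file in a temp directory.
workdir = tempfile.mkdtemp()
source_paths = []
for number, content in enumerate(["1,Sam\n", "2,Tom\n3,John\n"], start=1):
    path = os.path.join(workdir, f"source_{number}.txt")
    with open(path, "w") as handle:
        handle.write(content)
    source_paths.append(path)

list_path = os.path.join(workdir, "final.txt")
with open(list_path, "w") as handle:
    handle.write("\n".join(source_paths) + "\n")

loaded = list(read_indirect(list_path))
```

All rows from both files come out of a single reader, in the order in which the files are listed in final.txt.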
Aggregator Transformation Related Questions
How can we implement Aggregation operation without using an Aggregator Transformation in Informatica?
Answer
We will use a very basic property of the Expression Transformation: at any time we can access the previous row's data as well as the currently
processed row's data. All we need are simple Sorter, Expression and Filter transformations to achieve aggregation at the
Informatica level.
For detailed understanding visit Aggregation without Aggregator
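One possible arrangement of Sorter, Expression and Filter steps is sketched below in Python. It is an assumption about how such a mapping could be built, not necessarily the design used in the referenced article: a running total is kept using "previous row" state, the rows are re-sorted so the last row of each key comes first, and a filter keeps only the first row per key.

```python
def running_totals(sorted_rows):
    """Expression step: variables remember the previous row's key and total."""
    prev_key, total, position = None, 0, 0
    for key, value in sorted_rows:
        if key == prev_key:
            total, position = total + value, position + 1
        else:
            total, position = value, 1
        prev_key = key
        yield key, total, position


def first_of_each_key(sorted_rows):
    """Expression + Filter step: keep a row only when its key changes."""
    prev_key = object()  # sentinel that equals no real key
    for row in sorted_rows:
        if row[0] != prev_key:
            yield row
        prev_key = row[0]


rows = [("B", 5), ("A", 1), ("B", 7), ("A", 2), ("A", 3)]
# Sorter on the key, then the Expression computing running totals.
step1 = running_totals(sorted(rows, key=lambda r: r[0]))
# Second Sorter: within each key, the row with the full total comes first.
step2 = sorted(step1, key=lambda r: (r[0], -r[2]))
sums = [(key, total) for key, total, _ in first_of_each_key(step2)]
```

The result is one row per key carrying the group sum, obtained without any group-by operation.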
Scenario
Suppose in our Source Table we have data as given below:
Student Name | Subject Name | Marks
Sam | Maths | 100
Tom | Maths | 80
Sam | Physical Science | 80
John | Maths | 75
Sam | Life Science | 70
John | Life Science | 100
John | Physical Science | 85
Tom | Life Science | 100
Tom | Physical Science | 85
We want to load our Target Table as:
Student Name | Maths | Life Science | Physical Science
Sam | 100 | 70 | 80
John | 75 | 100 | 85
Tom | 80 | 100 | 85
Describe your approach.
Answer
Here our scenario is to convert many rows to one row, and the transformation which will help us to achieve this is the Aggregator.
The mapping works as follows:
We will sort the source data based on STUDENT_NAME ascending followed by SUBJECT ascending.
Now, with STUDENT_NAME in the GROUP BY clause, the following output subject columns are populated as
MATHS: MAX(MARKS, SUBJECT='Maths')
LIFE_SC: MAX(MARKS, SUBJECT='Life Science')
PHY_SC: MAX(MARKS, SUBJECT='Physical Science')
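The conditional MAX expressions can be mimicked in Python to check the logic. This is a sketch of the pivot only; the tuple layout of the input rows and the use of None for a missing subject are assumptions of the example.

```python
from itertools import groupby

# Output column for each subject, as named in the expressions above.
COLUMNS = {"Maths": "MATHS", "Life Science": "LIFE_SC", "Physical Science": "PHY_SC"}


def pivot(rows):
    """Group by student and take MAX(marks) under each subject condition."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # the sort step
    for student, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        out = {"STUDENT_NAME": student}
        for subject, column in COLUMNS.items():
            marks = [m for _, s, m in group if s == subject]
            out[column] = max(marks) if marks else None  # no matching row
        yield out


rows = [
    ("Sam", "Maths", 100), ("Tom", "Maths", 80), ("Sam", "Physical Science", 80),
    ("John", "Maths", 75), ("Sam", "Life Science", 70), ("John", "Life Science", 100),
    ("John", "Physical Science", 85), ("Tom", "Life Science", 100),
    ("Tom", "Physical Science", 85),
]
pivoted = list(pivot(rows))
```

Each student ends up as a single row with one column per subject. Because of the sort, the rows come out in alphabetical order of student name.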
Revisiting Source Qualifier Transformation
What is a Source Qualifier? What are the tasks we can perform using a SQ and why is it an ACTIVE transformation?
Ans. A Source Qualifier is an Active and Connected Informatica transformation that reads the rows from a relational database or flat file source.
We can configure the SQ to join [both INNER as well as OUTER JOIN] data originating from the same source database.
We can use a source filter to reduce the number of rows the Integration Service queries.
We can specify a number for sorted ports and the Integration Service adds an ORDER BY clause to the default SQL query.
We can choose the Select Distinct option for relational databases and the Integration Service adds a SELECT DISTINCT clause to the default SQL
query.
We can also write a Custom/User Defined SQL query which will override the default query in the SQ by changing the default settings of the
transformation properties.
We also have the option to write Pre as well as Post SQL statements to be executed before and after the SQ query in the source database.
Since the transformation provides the Select Distinct property, the Integration Service can add a SELECT DISTINCT clause to the
default SQL query, which in turn affects the number of rows returned by the database to the Integration Service. Hence it is an Active
transformation.
What happens to a mapping if we alter the datatypes between Source and its corresponding Source Qualifier?
Ans. The Source Qualifier transformation displays the transformation datatypes. The transformation datatypes determine how the source database
binds data when the Integration Service reads it.
Now if we alter the datatypes in the Source Qualifier transformation or the datatypes in the source definition and Source Qualifier
transformation do not match, the Designer marks the mapping as invalid when we save it.
Suppose we have used the Select Distinct and the Number Of Sorted Ports property in the SQ and then we add Custom SQL Query. Explain what
will happen.
Ans. Whenever we add a Custom SQL or SQL override query, it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and
Select Distinct settings in the Source Qualifier transformation. Hence only the user-defined SQL query will be fired in the database and all
the other options will be ignored.
Describe the situations where we will use the Source Filter, Select Distinct and Number Of Sorted Ports properties of Source Qualifier
transformation.
Ans. The Source Filter option is used to reduce the number of rows the Integration Service queries, so as to improve performance.
The Select Distinct option is used when we want the Integration Service to select unique values from a source, filtering out unnecessary data earlier in
the data flow, which might improve performance.
The Number Of Sorted Ports option is used when we want the source data to be sorted so that it can be used by subsequent
transformations like Aggregator or Joiner, which perform better when configured for sorted input.
What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS order in SQ transformation do
not match?
Ans. A mismatch between the order of the selected columns and the order of the connected transformation output ports may result in session
failure.
What happens if in the Source Filter property of SQ transformation we include the keyword WHERE, say, WHERE CUSTOMERS.CUSTOMER_ID
> 1000?
Ans. We use source filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration
Service fails the session.
Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation.
Ans. While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner transformation. Use the Joiner
transformation when we need to join the following types of sources:
Join data from different Relational Databases.
Join data from different Flat Files.
Join relational sources and flat files.
What is the maximum number we can use in Number Of Sorted Ports for a Sybase source system?
Ans. Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort more than 16 columns.
Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to Target tables TGT1 and TGT2 respectively. How do you
ensure TGT2 is loaded after TGT1?
Ans. If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which the Integration
Service loads data into the targets.
In the Mapping Designer, we need to configure the Target Load Plan based on the Source Qualifier transformations in a mapping to specify the
required loading order.
Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1?
Ans. In the Workflow Manager, we can Configure Constraint based load ordering for a session. The Integration Service orders the target load
on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the
primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships,
we will go for Constraint based load ordering.
Revisiting Filter Transformation
Q19. What is a Filter Transformation and why is it an Active one?
Ans. A Filter transformation is an Active and Connected transformation that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the pipeline. TRUE and FALSE
are the implicit return values from any filter condition we set. If the filter condition evaluates to NULL, the row is assumed to be FALSE.
The numeric equivalent of FALSE is zero (0) and any non-zero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or
FALSE for each row that passes through the transformation, depending on whether a row meets the specified condition. Only rows that return
TRUE pass through this transformation. Discarded rows do not appear in the session log or reject files.
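The TRUE/FALSE rules above can be written down as a tiny Python model. It is a sketch only: None stands in for NULL, and the `sal` column with the condition SAL > 1000 is a hypothetical example.

```python
def passes_filter(condition_value):
    """Apply the TRUE/FALSE rules described above to a condition result.

    None stands in for NULL and is treated as FALSE; for numeric results
    zero is FALSE and any non-zero value is TRUE.
    """
    if condition_value is None:
        return False
    return condition_value != 0


def condition(row):
    # SAL > 1000 is modelled as evaluating to NULL when SAL is NULL.
    return None if row["sal"] is None else row["sal"] > 1000


rows = [{"id": 1, "sal": 1200}, {"id": 2, "sal": None}, {"id": 3, "sal": 0}]
passed = [row["id"] for row in rows if passes_filter(condition(row))]
```

Only row 1 passes. The row with a NULL salary is dropped just like a row that fails the comparison.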
Q20. What is the difference between the Source Qualifier transformation's Source Filter and the Filter transformation?
Ans.
SQ Source Filter | Filter Transformation
Source Qualifier transformation filters rows when read from a source. | Filter transformation filters rows from within a mapping.
Source Qualifier transformation can only filter rows from relational sources. | Filter transformation filters rows coming from any type of source system at the mapping level.
Source Qualifier limits the row set extracted from a source. | Filter transformation limits the row set sent to a target.
Source Qualifier reduces the number of rows used throughout the mapping and hence it provides better performance. | To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible to filter out unwanted data early in the flow of data from sources to targets.
The filter condition in the Source Qualifier transformation only uses standard SQL as it runs in the database. | Filter Transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.
Revisiting Joiner Transformation
Q21. What is a Joiner Transformation and why is it an Active one?
Ans. A Joiner is an Active and Connected transformation used to join source data from the same source system or from two related
heterogeneous sources residing in different locations or file systems.
The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more
pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner
transformation, while the detail pipeline continues to the target.
In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and Sorted Input option to
improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of
join selected, the Integration Service either adds the row to the result set or discards the row.
The Joiner transformation produces result sets based on the join type, condition, and input data sources. Hence it is an Active transformation.
Q22. State the limitations where we cannot use Joiner in the mapping pipeline.
Ans. The Joiner transformation accepts input from most transformations. However, following are the limitations:
Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.
Q23. Out of the two input pipelines of a joiner, which one will you set as the master pipeline?
Ans. During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail
sources need to be configured for optimal performance.
To improve performance for an Unsorted Joiner transformation, use the source with fewer rows as the master source. The fewer unique rows in
the master, the fewer iterations of the join comparison occur, which speeds the join process.
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows before it reads the detail rows. The Integration
Service blocks the detail source while it caches rows from the master source. Once the Integration Service reads and caches all master rows, it
unblocks the detail source and reads the detail rows.
To improve performance for a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source.
When the Integration Service processes a sorted Joiner transformation, it blocks data based on the mapping configuration and it stores fewer
rows in the cache, increasing performance.
Blocking logic is possible if master and detail input to the Joiner transformation originate from different sources. Otherwise, it does not use
blocking logic. Instead, it stores more rows in the cache.
Q24. What are the different types of Joins available in Joiner Transformation?
Ans. In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to
an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins :
Normal
Master Outer
Detail Outer
Full Outer
Note: A normal or master outer join performs faster than a full outer or detail outer join.
Q25. Define the various Join Types of Joiner Transformation.
Ans.
In a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the
join condition.
A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the
unmatched rows from the master source.
A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the
unmatched rows from the detail source.
A full outer join keeps all rows of data from both the master and detail sources.
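The four join types can be compared with a small Python sketch. It is a simplified model, not the Integration Service algorithm: master rows are cached in a dictionary, detail rows are streamed past the cache, None marks the missing side of an unmatched row, and rows with a None key are never matched. The function name and the `id` column are hypothetical.

```python
def joiner(master_rows, detail_rows, key, join_type="normal"):
    """Join two row lists on `key` and return (master, detail) pairs."""
    # Cache the master rows by key, then stream the detail rows past them.
    cache = {}
    for m in master_rows:
        if m[key] is not None:
            cache.setdefault(m[key], []).append(m)

    result, matched_keys = [], set()
    for d in detail_rows:
        matches = cache.get(d[key], []) if d[key] is not None else []
        for m in matches:
            result.append((m, d))
        if matches:
            matched_keys.add(d[key])
        elif join_type in ("master outer", "full outer"):
            result.append((None, d))  # keep the unmatched detail row

    if join_type in ("detail outer", "full outer"):
        for m in master_rows:
            if m[key] not in matched_keys:
                result.append((m, None))  # keep the unmatched master row
    return result


master = [{"id": 1}, {"id": 2}]
detail = [{"id": 2}, {"id": 3}]
counts = {
    jt: len(joiner(master, detail, "id", jt))
    for jt in ("normal", "master outer", "detail outer", "full outer")
}
```

With one matching key, the normal join returns one row, each one-sided outer join returns two, and the full outer join returns three.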
Q26. Describe the impact of number of join conditions and join order in a Joiner Transformation.
Ans. We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must
have the same datatype.
If we need to use two ports in the join condition with non-matching datatypes we must convert the datatypes so that they match. The Designer
validates datatypes in a join condition.
Additional ports in the join condition increase the time necessary to join two sources.
The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple ports in the join condition,
the Integration Service compares the ports in the order we specified.
NOTE: Only equality operator is available in joiner join condition.
Q27. How does Joiner transformation treat NULL value matching.
Ans. The Joiner transformation does not match null values.
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does
not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and then join on the default values.
Note: If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null
values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, set a default value on the Ports tab for the
corresponding port.
Q28. Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in order: ITEM_NO,
ITEM_NAME, PRICE.
When we configure the join condition, what are the guidelines we need to follow to maintain the sort order?
Ans. If we have sorted both the master and detail pipelines in order of the ports say ITEM_NO, ITEM_NAME and PRICE we must ensure that:
Use ITEM_NO in the First Join Condition.
If we add a Second Join Condition, we must use ITEM_NAME.
If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in the Second Join Condition.
If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration Service fails the session.
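The guidelines above boil down to one rule: the ports used in the join conditions must form a leading prefix of the sorted ports, in the same order. A hypothetical helper (not part of Informatica) can express that check in Python:

```python
def keeps_sort_order(sorted_ports, join_ports):
    """True when the join ports are a leading prefix of the sorted ports."""
    return list(join_ports) == list(sorted_ports[: len(join_ports)])


sorted_ports = ["ITEM_NO", "ITEM_NAME", "PRICE"]
checks = {
    "ITEM_NO": keeps_sort_order(sorted_ports, ["ITEM_NO"]),
    "ITEM_NO, ITEM_NAME": keeps_sort_order(sorted_ports, ["ITEM_NO", "ITEM_NAME"]),
    "ITEM_NO, PRICE": keeps_sort_order(sorted_ports, ["ITEM_NO", "PRICE"]),
    "all three": keeps_sort_order(sorted_ports, ["ITEM_NO", "ITEM_NAME", "PRICE"]),
}
```

Only the combination that skips ITEM_NAME fails the check, which corresponds to the case where the text says the session fails.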
Q29. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that we do not lose the input
sort order.
Ans. The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However do not place any of the
following transformations between the sort origin and the Joiner transformation:
Custom
Unsorted Aggregator
Normalizer
Rank
Union transformation
XML Parser transformation
XML Generator transformation
Mapplet [if it contains any one of the above mentioned transformations]
Q30. Suppose we have the EMP table as our source. In the target we want to view those employees whose salary is greater than or equal to the
average salary for their departments. Describe your mapping approach.
Ans. Our Mapping will look like this:
[Mapping diagram: http://png.dwbiconcepts.com/images/tutorial/info_interview/info_interview10.png]
To start with the mapping we need the following transformations:
After the Source Qualifier of the EMP table place a Sorter Transformation. Sort based on the DEPTNO port.
Next we place a Sorted Aggregator Transformation. Here we will find out the AVERAGE SALARY for each (GROUP BY) DEPTNO.
When we perform this aggregation, we lose the data for individual employees.
To maintain employee data, we must pass a branch of the pipeline to the Aggregator Transformation and pass a branch with the same sorted
source data to the Joiner transformation to maintain the original data.
When we join both branches of the pipeline, we join the aggregated data with the original data.
So next we need a Sorted Joiner Transformation to join the sorted aggregated data with the original data, based on DEPTNO. Here we will be
taking the aggregated pipeline as the Master and the original data flow as the Detail pipeline.
After that we need a Filter Transformation to filter out the employees having salary less than the average salary for their department.
Filter Condition: SAL>=AVG_SAL
Lastly we have the Target table instance.
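The same sequence of steps (sort, aggregate one branch, join it back, filter) can be traced in Python. The sketch below is only a model of the data flow; the ENAME column and the sample salaries are assumptions added for the illustration, while DEPTNO, SAL and AVG_SAL follow the names used above.

```python
from itertools import groupby
from operator import itemgetter


def at_or_above_dept_average(emp_rows):
    # Sorter: order by DEPTNO so both branches stay sorted.
    ordered = sorted(emp_rows, key=itemgetter("DEPTNO"))
    # Sorted Aggregator branch: one AVG_SAL value per DEPTNO.
    avg_by_dept = {}
    for deptno, group in groupby(ordered, key=itemgetter("DEPTNO")):
        salaries = [row["SAL"] for row in group]
        avg_by_dept[deptno] = sum(salaries) / len(salaries)
    # Joiner: attach AVG_SAL to each original row; Filter: SAL >= AVG_SAL.
    result = []
    for row in ordered:
        joined = dict(row, AVG_SAL=avg_by_dept[row["DEPTNO"]])
        if joined["SAL"] >= joined["AVG_SAL"]:
            result.append(joined)
    return result


emp = [
    {"ENAME": "A", "DEPTNO": 10, "SAL": 1000},
    {"ENAME": "B", "DEPTNO": 10, "SAL": 3000},
    {"ENAME": "C", "DEPTNO": 20, "SAL": 2000},
    {"ENAME": "D", "DEPTNO": 20, "SAL": 2000},
]
selected = [row["ENAME"] for row in at_or_above_dept_average(emp)]
```

Employee A falls below the department 10 average and is filtered out, while C and D sit exactly on the department 20 average and pass the >= condition.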
Revisiting Sequence Generator Transformation
Q31. What is a Sequence Generator Transformation?
Ans. A Sequence Generator transformation is a Passive and Connected transformation that generates numeric values. It is used to create unique
primary key values, replace missing primary keys, or cycle through a sequential range of numbers. By default, this transformation
contains ONLY two OUTPUT ports, namely CURRVAL and NEXTVAL. We can neither edit nor delete these ports, nor can we
add ports to this transformation. We can create approximately two billion unique numeric values, with the widest range being from 1 to
2147483647.
Q32. Define the Properties available in Sequence Generator transformation in brief.
Ans.
Sequence Generator Properties and their descriptions:
Start Value: Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle,
the Integration Service cycles back to this value when it reaches the end value. Default is 0.
Increment By: Difference between two consecutive values from the NEXTVAL port. Default is 1.
End Value: Maximum value generated by SeqGen. After reaching this value the session will fail if the sequence generator is not configured to
cycle. Default is 2147483647.
Current Value: Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence.
Default is 1.
Cycle: If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again,
beginning with the configured Start Value.
Number of Cached Values: Number of sequential values the Integration Service caches at a time. Default value for a standard Sequence
Generator is 0. Default value for a reusable Sequence Generator is 1,000.
Reset: Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator
transformations.
Q33. Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the Sequence Generator to the surrogate
keys of both the target tables.
Will the Surrogate keys in both the target tables be same? If not how can we flow the same sequence values in both of them.
Ans. When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables,
the Sequence number will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target receives a block of sequence numbers from the
Sequence Generator transformation only after the first target table receives its block of sequence numbers.
Suppose we have 5 rows coming from the source, so the targets will have the sequence values as TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10). [Taking
into consideration Start Value 0, Current Value 1 and Increment By 1.]
Now suppose the requirement is that we need to have the same surrogate keys in both the targets.
Then the easiest way to handle the situation is to put an Expression Transformation in between the Sequence Generator and the Target tables.
The SeqGen will pass unique values to the expression transformation, and then the rows are routed from the expression transformation to the
targets.
Q34. Suppose we have 100 records coming from the source. Now for a target column population we used a Sequence generator.
Suppose the Current Value is 0 and End Value of Sequence generator is set to 80. What will happen?
Ans. End Value is the maximum value the Sequence Generator will generate. After it reaches the End value the session fails with the following
error message:
TT_11009 Sequence Generator Transformation: Overflow error.
Failing of session can be handled if the Sequence Generator is configured to Cycle through the sequence, i.e. whenever the Integration Service
reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
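The overflow and Cycle behaviour can be illustrated with a toy model. The Python class below is a hypothetical sketch, not Informatica's implementation: it assumes the first NEXTVAL equals the Current Value and that the failure occurs when the next value would exceed the End Value.

```python
class SequenceGeneratorSketch:
    """Toy model of NEXTVAL; not Informatica's actual implementation."""

    def __init__(self, start_value=0, current_value=1, end_value=2147483647,
                 increment_by=1, cycle=False):
        self.start_value = start_value
        self.end_value = end_value
        self.increment_by = increment_by
        self.cycle = cycle
        self._next = current_value  # assumed: first value handed out

    def nextval(self):
        if self._next > self.end_value:
            if not self.cycle:
                # Stands in for the TT_11009 overflow failure described above.
                raise OverflowError("sequence passed End Value")
            self._next = self.start_value  # wrap around to Start Value
        value = self._next
        self._next += self.increment_by
        return value


def run(rows, **props):
    seq = SequenceGeneratorSketch(**props)
    return [seq.nextval() for _ in range(rows)]


# Q34 scenario: 100 source rows, Current Value 0, End Value 80, no cycling.
try:
    run(100, current_value=0, end_value=80)
    failed = False
except OverflowError:
    failed = True

# Same properties with Cycle enabled: values wrap to Start Value (0 here).
cycled = run(100, current_value=0, end_value=80, cycle=True)
print(failed, cycled[80], cycled[81])
```

Under these assumptions the non-cycling run fails before all 100 rows are numbered, while the cycling run continues by restarting from the Start Value.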
Q35. What changes do we observe when we promote a non-reusable Sequence Generator to a reusable one? And what happens if we set the
Number of Cached Values to 0 for a reusable transformation?
Ans. When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1000 by
default, and the Reset property is disabled.
When we try to set the Number of Cached Values property of a Reusable Sequence Generator to 0 in the Transformation Developer we
encounter the following error message:
The number of cached values must be greater than zero for reusable sequence transformation.
Revisiting Aggregator Transformation
Q36. What is an Aggregator Transformation?
Ans. An aggregator is an Active, Connected transformation which performs aggregate calculations
like AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM and VARIANCE.
Q37. How an Expression Transformation differs from Aggregator Transformation?
Ans. An Expression Transformation performs calculation on a row-by-row basis. An Aggregator Transformation performs calculations on
groups.
Q38. Does an Informatica Aggregator Transformation support only Aggregate expressions?
Ans. Apart from aggregate expressions, the Informatica Aggregator also supports non-aggregate expressions and conditional clauses.
Q39. How does Aggregator Transformation handle NULL values?
Ans. By default, the aggregator transformation treats null values as NULL in aggregate functions. But we can specify to treat null values in
aggregate functions as NULL or zero.
Q40. What is Incremental Aggregation?
Ans. We can enable the session option, Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration
Service performs incremental aggregation, it actually passes changed source data through the mapping and uses the historical cache data to
perform aggregate calculations incrementally.
For reference check Implementing Informatica Incremental Aggregation
Q41. What are the performance considerations when working with Aggregator Transformation?
Ans.
Filter the unnecessary data before aggregating it. Place a Filter transformation in the mapping before the Aggregator transformation to
reduce unnecessary aggregation.
Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of
the data cache.
Use Sorted input which reduces the amount of data cached and improves session performance.
Q42. What differs when we choose Sorted Input for Aggregator Transformation?
Ans. The Integration Service creates the index and data cache files in memory to process the Aggregator transformation. If the Integration Service
requires more space than is allocated for the index and data cache sizes in the transformation properties, it stores overflow values in cache files,
i.e. it pages to disk. One way to increase session performance is to increase the index and data cache sizes in the transformation properties. But when
we check Sorted Input, the Integration Service uses memory to process the Aggregator transformation and does not use cache files.
Q43. Under what conditions selecting Sorted Input in aggregator will still not boost session performance?
Ans.
Incremental Aggregation, session option is enabled.
The aggregate expression contains nested aggregate functions.
Source data is data driven.
Q44. Under what condition selecting Sorted Input in aggregator may fail the session?
Ans.
If the input data is not sorted correctly, the session will fail.
Even if the input data is properly sorted, the session may fail if the sort order by ports and the group by ports of the aggregator are not in
the same order.
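Q45. Suppose we do not group by on any ports of the aggregator, what will be the output?
Ans. If we do not group values, the Integration Service treats all the input rows as one group and returns a single row, which carries the values
of the last input row for the ports that are not aggregated.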
Q46. What is the expected value if the column in an aggregator transform is neither a group by nor an aggregate expression?
Ans. Integration Service produces one row for each group based on the group by ports. The columns which are neither part of the key nor
aggregate expression will return the corresponding value of last record of the group received. However, if we specify particularly the FIRST
function, the Integration Service then returns the value of the specified first row of the group. So default is the LAST function.
Q47. Give one example for each of Conditional Aggregation, Non-Aggregate expression and Nested Aggregation.
Ans.
Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any
clause that evaluates to TRUE or FALSE.
SUM( SALARY, JOB = 'CLERK' )
Use non-aggregate expressions in group by ports to modify or replace groups.
IIF( PRODUCT = 'Brown Bread', 'Bread', PRODUCT )
The expression can also include one aggregate function within another aggregate function, such as:
MAX( COUNT( PRODUCT ))
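For readers more familiar with a general-purpose language, the three kinds of expression above can be mirrored in Python. This is only an analogy with hypothetical sample data; it does not use or reproduce the Informatica expression engine.

```python
from collections import Counter

# Hypothetical rows: (NAME, JOB, SALARY)
staff = [("A", "CLERK", 100), ("B", "MANAGER", 300), ("C", "CLERK", 150)]

# Conditional aggregation, analogous to SUM( SALARY, JOB = 'CLERK' ):
# only rows satisfying the condition contribute to the sum.
clerk_salary_total = sum(sal for _, job, sal in staff if job == "CLERK")

# Non-aggregate expression on a group-by port, analogous to
# IIF( PRODUCT = 'Brown Bread', 'Bread', PRODUCT ): groups are merged.
products = ["Brown Bread", "Bread", "Milk", "Brown Bread"]
group_keys = ["Bread" if p == "Brown Bread" else p for p in products]

# Nested aggregation, analogous to MAX( COUNT( PRODUCT ) ):
# count per group first, then take the maximum of those counts.
counts = Counter(group_keys)
max_count = max(counts.values())

print(clerk_salary_total, dict(counts), max_count)
```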
Revisiting Rank Transformation
Q48. What is a Rank Transform?
Ans. Rank is an Active Connected Informatica transformation used to select a set of top or bottom values of data.
Q49. How does a Rank Transform differ from Aggregator Transform functions MAX and MIN?
Ans. Like the Aggregator transformation, the Rank transformation lets us group information. The Rank Transform allows us to select a group of
top or bottom values, not just one value as in case of Aggregator MAX, MIN functions.
Q50. What is a RANK port and RANKINDEX?
Ans. Rank port is an input/output port used to specify the column for which we want to rank the source values. By default Informatica creates an
output port RANKINDEX for each Rank transformation. It stores the ranking position for each row in a group.
Q51. How can you get ranks based on different groups?
Ans. Rank transformation lets us group information. We can configure one of its input/output ports as a group by port. For each unique value in
the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank).
Q52. What happens if two rank values match?
Ans. If two rank values match, they receive the same value in the rank index and the transformation skips the next value.
Q53. What are the restrictions of Rank Transformation?
Ans.
We can connect ports from only one transformation to the Rank transformation.
We can select the top or bottom rank.
We need to select the Number of records in each rank.
We can designate only one Rank port in a Rank transformation.
Q54. How does a Rank Cache work?
Ans. During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the
Integration Service replaces the cached row with the input row. If we configure the Rank transformation to rank based on different groups, the
Integration Service ranks incrementally for each group it finds. The Integration Service creates an index cache to store the group information and
a data cache for the row data.
Q55. How does Rank transformation handle string values?
Ans. Rank transformation can return the strings at the top or the bottom of a session sort order. When the Integration Service runs in Unicode
mode, it sorts character data in the session using the selected sort order associated with the Code Page of IS which may be French, German, etc.
When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.
Revisiting Sorter Transformation
Q56. What is a Sorter Transformation?
Ans. Sorter Transformation is an Active, Connected Informatica transformation used to sort data in ascending or descending order according to
specified sort keys. The Sorter transformation contains only input/output ports.
Q57. Why is Sorter an Active Transformation?
Ans. When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration
Service discards duplicate rows compared during the sort operation. The number of Input Rows will vary as compared with the Output rows and
hence it is an Active transformation.
Q58. How does Sorter handle Case Sensitive sorting?
Ans. The Case Sensitive property determines whether the Integration Service considers case when sorting data. When we enable the Case
Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.
Q59. How does Sorter handle NULL values?
Ans. We can configure the way the Sorter transformation treats null values. Enable the property Null Treated Low if we want to treat null values
as lower than any other value when it performs the sort operation. Disable this option if we want the Integration Service to treat null values as
higher than any other value.
Q60. How does a Sorter Cache work?
Ans. The Integration Service passes all incoming data into the Sorter Cache before Sorter transformation performs the sort operation.
The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort
operation. If it cannot allocate enough memory, the Integration Service fails the session. For best performance, configure Sorter cache size with a
value less than or equal to the amount of available physical RAM on the Integration Service machine.
If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service temporarily stores data in the Sorter
transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the
work directory.
Revisiting Union Transformation
Q61. What is a Union Transformation?
Ans. The Union transformation is an Active, Connected, non-blocking, multiple input group transformation used to merge data from multiple
pipelines or sources into one pipeline branch. Similar to the UNION ALL SQL statement, the Union transformation does not remove duplicate
rows.
Q62. What are the restrictions of Union Transformation?
Ans.
All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
We can create multiple input groups, but only one default output group.
The Union transformation does not remove duplicate rows.
We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
The Union transformation does not generate transactions.
General questions
Q63. What is the difference between Static and Dynamic Lookup Cache?
Ans. We can configure a Lookup transformation to cache the corresponding lookup table. In case of static or read-only lookup cache the
Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup
transformation.
In case of dynamic lookup cache the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target.
The dynamic cache is synchronized with the target.
Q64. What is Persistent Lookup Cache?
Ans. Lookups are cached by default in Informatica. Lookup cache can be either non-persistent or persistent. The Integration Service saves or
deletes lookup cache files after a successful session run based on whether the Lookup cache is checked as persistent or not.
Q65. What is the difference between Reusable transformation and Mapplet?
Ans. Any Informatica Transformation created in the Transformation Developer, or a non-reusable transformation promoted to reusable
from the Mapping Designer, which can be used in multiple mappings is known as a Reusable Transformation. When we add a reusable
transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that
transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.
A Mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and lets us reuse the transformation
logic in multiple mappings. A Mapplet can contain as many transformations as we need. Like a reusable transformation when we use a mapplet in
a mapping, we use an instance of the mapplet and any change made to the mapplet is inherited by all instances of the mapplet.
Q66. What are the transformations that are not supported in Mapplet?
Ans. Normalizer, Cobol sources, XML sources, XML Source Qualifier transformations, Target definitions, Pre- and post- session Stored
Procedures, Other Mapplets.
Q67. What are the ERROR tables present in Informatica?
Ans.
PMERR_DATA- Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG- Stores metadata about an error and the error message.
PMERR_SESS- Stores metadata about the session.
PMERR_TRANS- Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error
occurs.
Q68. What is the difference between STOP and ABORT?
Ans. When we issue the STOP command on the executing session task, the Integration Service stops reading data from source. It continues
processing, writing and committing the data to targets. If the Integration Service cannot finish processing and committing data, we can issue the
abort command.
In contrast ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within
the timeout period, it kills the DTM process and terminates the session.
Q69. Can we copy a session to new folder or new repository?
Ans. Yes we can copy session to new folder or repository provided the corresponding Mapping is already in there.
Q70. What type of join does Lookup support?
Ans. Lookup behaves like an SQL LEFT OUTER JOIN.
SQL
What is the difference between inner and outer join? Explain with example.
Inner Join
Inner join is the most common type of Join which is used to combine the rows from two tables and create a result set containing only such records
that are present in both the tables based on the joining condition (predicate).
Inner join returns rows when there is at least one match in both tables
If no record matches between the two tables, then INNER JOIN will return an empty result set. Below is an example of INNER JOIN and the
resulting set.
SELECT dept.name DEPARTMENT, emp.name EMPLOYEE
FROM DEPT dept, EMPLOYEE emp
WHERE emp.dept_id = dept.id
Department Employee
HR Inno
HR Privy
Engineering Robo
Engineering Hash
Engineering Anno
Engineering Darl
Marketing Pete
Marketing Meme
Sales Tomiti
Sales Bhuti
Outer Join
Outer Join can be full outer or single outer
Outer Join, on the other hand, will return matching rows from both tables as well as any unmatched rows from one or both the tables (based on
whether it is single outer or full outer join respectively).
Notice in our record set that there is no employee in the department 5 (Logistics). Because of this if we perform inner join, then Department 5
does not appear in the above result. However in the below query we perform an outer join (dept left outer join emp), and we can see this
department.
SELECT dept.name DEPARTMENT, emp.name EMPLOYEE
FROM DEPT dept, EMPLOYEE emp
WHERE dept.id = emp.dept_id (+)
Department Employee
HR Inno
HR Privy
Engineering Robo
Engineering Hash
Engineering Anno
Engineering Darl
Marketing Pete
Marketing Meme
Sales Tomiti
Sales Bhuti
Logistics
The (+) sign on the emp side of the predicate indicates that emp is the outer table here. The above SQL can be alternatively written as below (will
yield the same result as above):
SELECT dept.name DEPARTMENT, emp.name EMPLOYEE
FROM DEPT dept LEFT OUTER JOIN EMPLOYEE emp
ON dept.id = emp.dept_id
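To try the difference without an Oracle instance, the ANSI join syntax can be run against an in-memory SQLite database from Python. This is a minimal sketch: the tables hold only a hypothetical subset of the sample data (the employee IDs are assumed), and SQLite does not support the Oracle (+) notation, so only the ANSI form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE EMPLOYEE (id INTEGER PRIMARY KEY, dept_id INTEGER, name TEXT);
    -- Hypothetical subset of the sample data; employee IDs are assumed
    INSERT INTO DEPT VALUES (4, 'Sales'), (5, 'Logistics');
    INSERT INTO EMPLOYEE VALUES (9, 4, 'Tomiti'), (10, 4, 'Bhuti');
""")

# Inner join: only departments that have at least one employee.
inner_rows = conn.execute("""
    SELECT dept.name, emp.name
    FROM DEPT dept INNER JOIN EMPLOYEE emp ON emp.dept_id = dept.id
    ORDER BY dept.id, emp.id
""").fetchall()

# Left outer join: every department, with NULL (None) where no employee matches.
outer_rows = conn.execute("""
    SELECT dept.name, emp.name
    FROM DEPT dept LEFT OUTER JOIN EMPLOYEE emp ON dept.id = emp.dept_id
    ORDER BY dept.id, emp.id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

Logistics appears only in the outer join result, paired with None because it has no employees.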
What is the difference between JOIN and UNION?
SQL JOIN allows us to lookup records on other table based on the given conditions between two tables. For example, if we have the
department ID of each employee, then we can use this department ID of the employee table to join with the department ID of department table to
lookup department names.
UNION operation allows us to add 2 similar data sets to create resulting data set that contains all the data from the source data sets. Union does
not require any condition for joining. For example, if you have 2 employee tables with same structure, you can UNION them to create one result
set that will contain all the employees from both of the tables.
SELECT * FROM EMP1
UNION
SELECT * FROM EMP2;
What is the difference between UNION and UNION ALL?
UNION and UNION ALL both combine two structurally similar data sets, but the UNION operation returns only the unique records from the
resulting data set, whereas UNION ALL will return all the rows, even if one or more rows are duplicates of each other.
In the following example, I am choosing exactly the same employee from the emp table and performing UNION and UNION ALL. Check the
difference in the result.
SELECT * FROM EMPLOYEE WHERE ID = 5
UNION ALL
SELECT * FROM EMPLOYEE WHERE ID = 5
ID MGR_ID DEPT_ID NAME SAL DOJ
5.0 2.0 2.0 Anno 80.0 01-Feb-2012
5.0 2.0 2.0 Anno 80.0 01-Feb-2012
SELECT * FROM EMPLOYEE WHERE ID = 5
UNION
SELECT * FROM EMPLOYEE WHERE ID = 5
ID MGR_ID DEPT_ID NAME SAL DOJ
5.0 2.0 2.0 Anno 80.0 01-Feb-2012
What is the difference between WHERE clause and HAVING clause?
WHERE and HAVING both filter out records based on one or more conditions. The difference is that the WHERE clause can only be applied to a
static non-aggregated column, whereas we need to use HAVING for aggregated columns.
To understand this, consider this example.
Suppose we want to see only those departments where department ID is greater than 3. There is no aggregation operation and the condition needs
to be applied on a static field. We will use WHERE clause here:
SELECT * FROM DEPT WHERE ID > 3
ID NAME
4 Sales
5 Logistics
Next, suppose we want to see only those Departments where Average salary is greater than 80. Here the condition is associated with a non-static
aggregated information which is average of salary. We will need to use HAVING clause here:
SELECT dept.name DEPARTMENT, avg(emp.sal) AVG_SAL
FROM DEPT dept, EMPLOYEE emp
WHERE dept.id = emp.dept_id (+)
GROUP BY dept.name
HAVING AVG(emp.sal) > 80
DEPARTMENT AVG_SAL
Engineering 90
As you see above, there is only one department (Engineering) where average salary of employees is greater than 80.
What is the difference among UNION, MINUS and INTERSECT?
UNION combines the results from 2 tables and eliminates duplicate records from the result set.
MINUS operator when used between 2 tables, gives us all the rows from the first table except the rows which are present in the second table.
INTERSECT operator returns us only the matching or common rows between 2 result sets.
To understand these operators, let's see some examples. We will use two different queries to extract data from our emp table and then we will
perform UNION, MINUS and INTERSECT operations on these two sets of data.
UNION
SELECT * FROM EMPLOYEE WHERE ID = 5
UNION
SELECT * FROM EMPLOYEE WHERE ID = 6
ID MGR_ID DEPT_ID NAME SAL DOJ
5 2 2.0 Anno 80.0 01-Feb-2012
6 2 2.0 Darl 80.0 11-Feb-2012
MINUS
SELECT * FROM EMPLOYEE
MINUS
SELECT * FROM EMPLOYEE WHERE ID > 2
ID MGR_ID DEPT_ID NAME SAL DOJ
1 (null) 2 Hash 100.0 01-Jan-2012
2 1 2 Robo 100.0 01-Jan-2012
INTERSECT
SELECT * FROM EMPLOYEE WHERE ID IN (2, 3, 5)
INTERSECT
SELECT * FROM EMPLOYEE WHERE ID IN (1, 2, 4, 5)
ID MGR_ID DEPT_ID NAME SAL DOJ
5 2 2 Anno 80.0 01-Feb-2012
2 1 2 Robo 100.0 01-Jan-2012
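The same set operators can be tried from Python with SQLite. This is a sketch under two stated assumptions: the table is trimmed to the id and name columns of four sample employees, and EXCEPT is used because SQLite spells Oracle's MINUS that way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO EMPLOYEE VALUES (1, 'Hash'), (2, 'Robo'), (5, 'Anno'), (6, 'Darl');
""")

def ids(sql):
    """Run a query and return the first column of every row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# UNION ALL keeps the duplicate, UNION removes it.
union_all = ids("SELECT id FROM EMPLOYEE WHERE id = 5 "
                "UNION ALL SELECT id FROM EMPLOYEE WHERE id = 5")
union = ids("SELECT id FROM EMPLOYEE WHERE id = 5 "
            "UNION SELECT id FROM EMPLOYEE WHERE id = 5")
# EXCEPT (MINUS in Oracle): rows of the first query not present in the second.
minus = ids("SELECT id FROM EMPLOYEE "
            "EXCEPT SELECT id FROM EMPLOYEE WHERE id > 2")
# INTERSECT: rows common to both queries.
intersect = ids("SELECT id FROM EMPLOYEE WHERE id IN (2, 5) "
                "INTERSECT SELECT id FROM EMPLOYEE WHERE id IN (1, 2, 5)")

print(union_all, union, minus, intersect)
```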
What is Self Join and why is it required?
Self Join is the act of joining one table with itself.
Self Join is often very useful to convert a hierarchical structure into a flat structure
In our employee table example above, we have kept the manager ID of each employee in the same row as that of the employee. This is an
example of how a hierarchy (in this case employee-manager hierarchy) is stored in the RDBMS table. Now, suppose if we need to print out the
names of the manager of each employee right beside the employee, we can use self join. See the example below:
SELECT e.name EMPLOYEE, m.name MANAGER
FROM EMPLOYEE e, EMPLOYEE m
WHERE e.mgr_id = m.id (+)
EMPLOYEE MANAGER
Pete Hash
Darl Hash
Inno Hash
Robo Hash
Tomiti Robo
Anno Robo
Privy Robo
Meme Pete
Bhuti Tomiti
Hash
The only reason we have performed a left outer join here (instead of INNER JOIN) is we have one employee in this table without a manager
(employee ID = 1). If we perform inner join, this employee will not show-up.
How can we transpose a table using SQL (changing rows to column or vice-versa) ?
The usual way to do it in SQL is to use CASE statement or DECODE statement.
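As an illustration of the CASE approach, the sketch below turns rows into columns with one aggregated CASE expression per output column, run against SQLite from Python. The MARKS table and its contents are hypothetical. DECODE is Oracle-specific and is not available in SQLite, so only CASE is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one row per (student, subject)
    CREATE TABLE MARKS (student TEXT, subject TEXT, score INTEGER);
    INSERT INTO MARKS VALUES
        ('Ann', 'Math', 90), ('Ann', 'Physics', 80),
        ('Bob', 'Math', 70), ('Bob', 'Physics', 60);
""")

# Each CASE picks the score for one subject (NULL otherwise);
# MAX collapses the rows of a student into a single output row.
pivoted = conn.execute("""
    SELECT student,
           MAX(CASE WHEN subject = 'Math'    THEN score END) AS math,
           MAX(CASE WHEN subject = 'Physics' THEN score END) AS physics
    FROM MARKS
    GROUP BY student
    ORDER BY student
""").fetchall()

print(pivoted)
```

The subject names have to be known in advance, since every output column needs its own CASE expression.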
How to generate row number in SQL Without ROWNUM
Generating a row number that is a running sequence of numbers for each row is not easy using plain SQL. In fact, the method I am going to
show below is not very generic either. This method only works if there is at least one unique column in the table. This method will also work if
there is no single unique column, but collection of columns that is unique. Anyway, here is the query:
SELECT name, sal, (SELECT COUNT(*) FROM EMPLOYEE i WHERE o.name >= i.name) row_num
FROM EMPLOYEE o
order by row_num
NAME SAL ROW_NUM
Anno 80 1
Bhuti 60 2
Darl 80 3
Hash 100 4
Inno 50 5
Meme 60 6
Pete 70 7
Privy 50 8
Robo 100 9
Tomiti 70 10
The column that is used in the row number generation logic is called sort key. Here sort key is name column. For this technique to work, the
sort key needs to be unique. We have chosen the column name because this column happened to be unique in our Employee table. If it was not
unique but some other collection of columns was, then we could have used those columns as our sort key (by concatenating those columns to
form a single sort key).
Also notice how the rows are sorted in the result set. We have done an explicit sorting on the row_num column, which gives us all the row
numbers in the sorted order. But notice that name column is also sorted (which is probably the reason why this column is referred as sort-key). If
you want to change the order of the sorting from ascending to descending, you will need to change >= sign to <= in the query.
As I said before, this method is not very generic. This is why many databases already implement other methods to achieve this. For example, in
Oracle database, every SQL result set contains a hidden column called ROWNUM. We can just explicitly select ROWNUM to get sequence
numbers.
How to select first 5 records from a table?
This question, often asked in many interviews, does not make any sense to me. The problem here is how do you define which record is first and
which is second. Which record is retrieved first from the database is not deterministic. It depends on many uncontrollable factors such as how
database works at that moment of execution etc. So the question should really be how to select any 5 records from the table? But whatever it
is, here is the solution:
In Oracle,
SELECT *
FROM EMP
WHERE ROWNUM <= 5;
In SQL Server,
SELECT TOP 5 * FROM EMP;
Generic solution,
I believe a generic solution can be devised for this problem if and only if there exists at least one distinct column in the table. For example, in our
EMP table ID is distinct. We can use that distinct column in the below way to come up with a generic solution of this question that does not
require database specific functions such as ROWNUM, TOP etc.
SELECT name
FROM EMPLOYEE o
WHERE (SELECT count(*) FROM EMPLOYEE i WHERE i.name < o.name) < 5
name
Inno
Anno
Darl
Hash
Bhuti
I have taken the name column in the above example since name happens to be unique in this table. I could very well have taken the ID column as well.
In this example, if the chosen column was not distinct, we would have got more than 5 records returned in our output.
What is the difference between ROWNUM pseudo column and ROW_NUMBER() function?
ROWNUM is a pseudo column in Oracle database that is assigned to the rows of the result set before ORDER BY is evaluated. So we cannot use
ROWNUM to number the rows in the order given by ORDER BY in the same query.
ROW_NUMBER() is an analytical function which is used in conjunction to OVER() clause wherein we can specify ORDER BY and also
PARTITION BY columns.
Suppose you want to generate the row numbers in the order of descending employee salaries, for example; ROWNUM will not work. But you
may use ROW_NUMBER() OVER() as shown below:
SELECT name, sal, row_number() over(order by sal desc) rownum_by_sal
FROM EMPLOYEE o
name Sal ROWNUM_BY_SAL
Hash 100 1
Robo 100 2
Anno 80 3
Darl 80 4
Tomiti 70 5
Pete 70 6
Bhuti 60 7
Meme 60 8
Inno 50 9
Privy 50 10
What are the differences among ROW_NUMBER, RANK and DENSE_RANK?
ROW_NUMBER assigns contiguous, unique numbers from 1..N to a result set.
RANK does not assign unique numbers, nor does it assign contiguous numbers. If two records tie for second place, no record will be assigned
the 3rd rank as no one came in third, according to RANK. See below:
SELECT name, sal, rank() over(order by sal desc) rank_by_sal
FROM EMPLOYEE o
name Sal RANK_BY_SAL
Hash 100 1
Robo 100 1
Anno 80 3
Darl 80 3
Tomiti 70 5
Pete 70 5
Bhuti 60 7
Meme 60 7
Inno 50 9
Privy 50 9
DENSE_RANK, like RANK, does not assign unique numbers, but it does assign contiguous numbers. Even though two records tied for second
place, there is a third-place record. See below:
SELECT name, sal, dense_rank() over(order by sal desc) dense_rank_by_sal
FROM EMPLOYEE o
name Sal DENSE_RANK_BY_SAL
Hash 100 1
Robo 100 1
Anno 80 2
Darl 80 2
Tomiti 70 3
Pete 70 3
Bhuti 60 4
Meme 60 4
Inno 50 5
Privy 50 5
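The three numbering schemes can be compared side by side with a few lines of Python. This is only a sketch of the numbering logic, using the names and salaries from the result sets above; the function name is hypothetical and no database is involved.

```python
# Names and salaries from the sample EMPLOYEE table, ordered by sal desc.
rows = [("Hash", 100), ("Robo", 100), ("Anno", 80), ("Darl", 80),
        ("Tomiti", 70), ("Pete", 70), ("Bhuti", 60), ("Meme", 60),
        ("Inno", 50), ("Privy", 50)]

def number_rows(ordered):
    """Return (name, sal, row_number, rank, dense_rank) for each row."""
    out = []
    rank = dense = 0
    prev = object()  # sentinel that equals no salary
    for row_number, (name, sal) in enumerate(ordered, start=1):
        if sal != prev:
            rank = row_number  # RANK jumps to the row position after a tie
            dense += 1         # DENSE_RANK advances by one per distinct value
            prev = sal
        out.append((name, sal, row_number, rank, dense))
    return out

numbered = number_rows(rows)
for row in numbered:
    print(row)
```

Anno, for example, gets row number 3, rank 3 and dense rank 2, which matches the three result sets shown above.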
How to print/display the first line of a file?
There are many ways to do this. However the easiest way to display the first line of a file is using the [head] command.
$> head -1 file.txt
No prize in guessing that if you specify [head -2] then it would print first 2 records of the file.
Another way is to use the [sed] command. [sed] is a very powerful stream editor which can be used for various text manipulation purposes like
this.
$> sed '2,$ d' file.txt
You may be wondering how the above command works. The 'd' parameter tells [sed] to delete from the displayed output all the records from line
no. 2 to the last line of the file (the last line is represented by the $ symbol). Of course it does not actually delete those lines from the file;
it just does not display them on standard output. So you only see the remaining line, which is the first line.
How to print/display the last line of a file?
The easiest way is to use the [tail] command.
$> tail -1 file.txt
If you want to do it using [sed] command, here is what you should write:
$> sed -n '$ p' file.txt
From our previous answer, we already know that '$' stands for the last line of the file. So '$ p' basically prints (p for print) the last line in standard
output screen. '-n' switch takes [sed] to silent mode so that [sed] does not print anything else in the output.
How to display n-th line of a file?
The easiest way to do it will be by using [sed] I guess. Based on what we already know about [sed] from our previous examples, we can quickly
deduce this command:
$> sed -n '<n> p' file.txt
You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> sed -n '4 p' file.txt
Of course you can do it by using [head] and [tail] command as well like below:
$> head -<n> file.txt | tail -1
You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> head -4 file.txt | tail -1
How to remove the first line / header from a file?
We already know how [sed] can be used to delete a certain line from the output by using the 'd' switch. So if we want to delete the first line the
command should be:
$> sed '1 d' file.txt
But the issue with the above command is, it just prints out all the lines except the first line of the file on the standard output. It does not really
change the file in-place. So if you want to delete the first line from the file itself, you have two options.
Either you can redirect the output of the command to some other file and then rename it back to the original file like below:
$> sed '1 d' file.txt > new_file.txt
$> mv new_file.txt file.txt
Or, you can use the inbuilt [sed] switch '-i' which changes the file in-place. See below:
$> sed -i '1 d' file.txt
How to remove the last line/ trailer from a file in Unix script?
Always remember that the [sed] address '$' refers to the last line. So using this knowledge we can deduce the below command:
$> sed -i '$ d' file.txt
How to remove certain lines from a file in Unix?
If you want to remove line <m> to line <n> from a given file, you can accomplish the task with the same method shown above. Here is an
example:
$> sed -i '5,7 d' file.txt
The above command will delete line 5 to line 7 from the file file.txt
How to remove the last n lines from a file?
This is a bit tricky. Suppose your file contains 100 lines and you want to remove the last 5 lines. Now if you know how many lines there are in the
file, then you can simply use the method shown above and remove all the lines from 96 to 100 like below:
$> sed -i '96,100 d' file.txt # alternative to command [head -95 file.txt]
But you will not always know the number of lines present in the file (the file may be generated dynamically, etc.) In that case there are many
different ways to solve the problem. There are some ways which are quite complex and fancy. But let's first do it in a way that we can understand
easily and remember easily. Here is how it goes:
$> tt=`wc -l file.txt | cut -f1 -d' '`;sed -i "`expr $tt - 4`,$tt d" file.txt
As you can see there are two commands. The first one (before the semi-colon) calculates the total number of lines present in the file and stores it
in a variable called tt. The second command (after the semi-colon) uses the variable and works in exactly the same way as shown in the previous
example.
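The same idea can be written with the number of lines to drop held in a variable. This is a minimal sketch under a few assumptions: the sample file is made up, n is smaller than the number of lines in the file, and the result is written to a new file and moved back (as in the header example) instead of using '-i'.

```shell
# Hypothetical 10-line sample file; the real file name would replace it.
file=$(mktemp)
i=1
while [ "$i" -le 10 ]; do echo "$i"; i=$((i + 1)); done > "$file"

n=3                            # number of trailing lines to drop
total=$(wc -l < "$file")       # reading from stdin keeps the file name out of the output
first=$((total - n + 1))       # first line of the block to delete

# Delete from line $first up to the last line ($), then replace the original file.
sed "${first},\$ d" "$file" > "$file.new" && mv "$file.new" "$file"

result=$(cat "$file")
echo "$result"
rm -f "$file"
```

Reading the file through 'wc -l < file' means no [cut] is needed to strip the file name from the count.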
How to check the length of any line in a file?
We already know how to print one line from a file which is this:
$> sed -n '<n> p' file.txt
Where <n> is to be replaced by the actual line number that you want to print. Now once you know it, it is easy to print out the length of this line
by using [wc] command with '-c' switch.
$> sed -n '35 p' file.txt | wc -c
The above command will print the length of the 35th line in file.txt. Note that [wc -c] counts bytes and includes the newline character at the end of the line.
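To see the effect of the trailing newline on the count, the sketch below compares [wc -c] with the length() function of [awk], which measures the line without its newline. The sample file is made up for illustration, and using [awk] here is an alternative, not part of the original answer.

```shell
# Hypothetical two-line sample file.
file=$(mktemp)
printf '%s\n' 'short' 'a longer line' > "$file"

# wc -c counts bytes and includes the newline that sed prints after the line.
with_newline=$(sed -n '2 p' "$file" | wc -c)
# awk's length() measures the line itself, without the newline.
without_newline=$(awk 'NR == 2 { print length($0) }' "$file")

echo "wc -c: $with_newline, awk length: $without_newline"
rm -f "$file"
```

The two numbers should differ by exactly one, the newline.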
How to get the nth word of a line in Unix?
Assuming the words in the line are separated by a space, we can use the [cut] command. [cut] is a very powerful and useful command and it's really
easy. All you have to do to get the n-th word from the line is issue the following command:
cut -f<n> -d' '
'-d' switch tells [cut] what the delimiter (or separator) in the file is, which is space ' ' in this case. If the separator was comma, we could have
written -d',' then. So, suppose I want to find the 4th word from the below string: A quick brown fox jumped over the lazy cat, we will do
something like this:
$> echo A quick brown fox jumped over the lazy cat | cut -f4 -d' '
And it will print fox
How to reverse a string in unix?
Pretty easy. Use the [rev] command.
$> echo "unix" | rev
xinu
How to get the last word from a line in Unix file?
We will make use of two commands that we learnt above to solve this. The commands are [rev] and [cut]. Here we go.
Let's imagine the line is: C for Cat. We need Cat. First we reverse the line. We get taC rof C. Then we cut the first word, we get 'taC'. And
then we reverse it again.
$>echo "C for Cat" | rev | cut -f1 -d' ' | rev
Cat
How to get the n-th field from a Unix command output?
We know we can do it by [cut]. For example, the below command extracts the first field from the output of the [wc -c] command
$>wc -c file.txt | cut -d' ' -f1
109
But I want to introduce one more command to do this here. That is by using [awk] command. [awk] is a very powerful command for text pattern
scanning and processing. Here we will see how we may use [awk] to extract the first field (or first column) from the output of another
command. Like above, suppose I want to print the first column of the [wc -c] output. Here is how it goes:
$>wc -c file.txt | awk ' {print $1}'
109
The basic syntax of [awk] is like this:
awk 'pattern {action}'
The pattern can be left blank or omitted, in which case the action runs for every input line, like below:
$>wc -c file.txt | awk '{print $1}'
109
In the action space, we have asked [awk] to take the action of printing the first column ($1). More on [awk] later.
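As a further sketch of the [awk] syntax above: [awk] splits each line on runs of blanks and keeps the number of fields in the built-in variable NF, so $NF is the last field. That gives another way to get the last word or the n-th word of a line. The sample strings below are taken from, or modelled on, the earlier examples.

```shell
line="C for Cat"

# NF is the number of fields on the line, so $NF is the last field.
last_word=$(echo "$line" | awk '{ print $NF }')
# awk treats several spaces in a row as one separator, unlike cut -d' '.
fourth=$(echo "A quick   brown fox jumped" | awk '{ print $4 }')
# A pattern restricts the action: print field 1 only for lines with more than 2 fields.
first_of_long=$(printf '%s\n' 'a b' 'x y z' | awk 'NF > 2 { print $1 }')

echo "$last_word"
echo "$fourth"
echo "$first_of_long"
```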
How to replace the n-th line in a file with a new line in Unix?
This can be done in two steps. The first step is to remove the n-th line. And the second step is to insert a new line in n-th line position. Here we
go.
Step 1: remove the n-th line
$>sed -i'' '10 d' file.txt # d stands for delete
Step 2: insert a new line at n-th line position
$>sed -i'' '10 i This is the new line' file.txt # i stands for insert
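One thing to watch in the two-step method: if the n-th line is the last line of the file, nothing is left at position n after step 1, so step 2 has no line to insert before. A possible one-step alternative is to substitute the whole content of line n. The sketch below assumes a made-up sample file and a replacement text that contains no '/', '&' or backslash characters.

```shell
# Hypothetical three-line sample file.
file=$(mktemp)
printf '%s\n' alpha beta gamma > "$file"

n=2
new_text="This is the new line"
# s/.*/.../ rewrites the whole of line n; all other lines pass through unchanged.
sed "${n}s/.*/${new_text}/" "$file" > "$file.new" && mv "$file.new" "$file"

result=$(cat "$file")
echo "$result"
rm -f "$file"
```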
How to show the non-printable characters in a file?
Open the file in VI editor. Go to VI command mode by pressing [Escape] and then [:]. Then type [set list]. This will show you all the non-
printable characters, e.g. Ctrl-M characters (^M) etc., in the file.
How to zip a file in Linux?
Use the inbuilt [zip] command in Linux.
$> zip file.zip file.txt
How to unzip a file in Linux?
Use inbuilt [unzip] command in Linux.
$> unzip -j file.zip
How to test if a zip file is corrupted in Linux?
Use -t switch with the inbuilt [unzip] command
$> unzip -t file.zip
How to check if a file is zipped in Unix?
In order to know the file type of a particular file use the [file] command like below:
$> file file.txt
file.txt: ASCII text
If you want to know the technical MIME type of the file, use -i switch.
$>file -i file.txt
file.txt: text/plain; charset=us-ascii
If the file is zipped, following will be the result
$> file -i file.zip
file.zip: application/x-zip
How to connect to Oracle database from within shell script?
You will be using the same [sqlplus] command to connect to database that you use normally even outside the shell script. To understand this, let's
take an example. In this example, we will connect to database, fire a query and get the output printed from the unix shell. Ok? Here we go
$>res=`sqlplus -s username/password@database_name <<EOF
SET HEAD OFF;
select count(*) from dual;
EXIT;
EOF`
$> echo $res
1
The advantage of connecting to the database in this way is that you can pass the values of Unix shell variables to the database. See the below
example, where $1 is the first argument passed to the script:
$>res=`sqlplus -s username/password@database_name <<EOF
SET HEAD OFF;
select count(*) from student_table t where t.last_name='$1';
EXIT;
EOF`
$> echo $res
12
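This works because the here-document delimiter (EOF) is unquoted, so the shell replaces variables inside the block before [sqlplus] ever reads it. The sketch below shows only that expansion step, using [cat] in place of [sqlplus] so that it can run without a database. The function name build_query is hypothetical, and $1 is wrapped in single quotes on the assumption that last_name is a character column.

```shell
# build_query is a hypothetical helper: it prints the SQL text that would be
# sent to sqlplus, with $1 already replaced by the shell.
build_query() {
cat <<EOF
SET HEAD OFF;
select count(*) from student_table t where t.last_name='$1';
EXIT;
EOF
}

sql=$(build_query Smith)
echo "$sql"
```

Printing the generated text like this is a handy way to check a query before pointing the script at a real database.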
How to execute a database stored procedure from Shell script?
$> SqlReturnMsg=`sqlplus -s username/password@database<<EOF
BEGIN
Proc_Your_Procedure( your-input-parameters );
END;
/
EXIT;
EOF`
$> echo $SqlReturnMsg
How to check the command line arguments in a UNIX command in Shell Script?
In a shell script, you can access the command line arguments using the $0, $1, $2, ... variables, where $0 holds the name of the script, $1 holds the first
input parameter of the command, $2 the second input parameter of the command and so on.
How to fail a shell script programmatically?
Just put an [exit] command in the shell script with a return value other than 0. This is because the exit code of a successful Unix program is zero.
So, suppose you write
exit 1
inside your program, then your program will stop immediately and report a failure to its caller. Exit codes range from 0 to 255, so a negative value such as -1 is best avoided.
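A minimal sketch of this in practice is shown below. The function check_input is a hypothetical validation step, and the exit code 3 is an arbitrary non-zero value chosen for the illustration. The failing case runs inside a subshell so that [exit] ends only the subshell and the status can be inspected afterwards.

```shell
# check_input is a hypothetical validation step that fails for an empty argument.
check_input() {
    if [ -z "$1" ]; then
        echo "error: no input given" >&2
        return 1
    fi
}

status=0
# The subshell exits with code 3 when validation fails; capture that code.
( check_input "" || exit 3 ) 2>/dev/null || status=$?
echo "exit status: $status"
```

In a real script the line would simply read: check_input "$1" || exit 3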
How to list down file/folder lists alphabetically?
Normally [ls -lt] command lists down file/folder list sorted by modified time. If you want to list them alphabetically, then you should simply
specify: [ls -l]
How to check if the last command was successful in Unix?
To check the status of the last executed command in UNIX, you can check the value of the special shell variable [$?]. See the below example:
$> echo $?
How to check if a file is present in a particular directory in Unix?
We can do it in many ways. Based on what we have learnt so far, we can make use of the [ls] command and the [$?] variable to do this. See
below:
$> ls -l file.txt; echo $?
If the file exists, the [ls] command will be successful. Hence [echo $?] will print 0. If the file does not exist, then [ls] command will fail and hence
[echo $?] will print a non-zero value (1 or 2, depending on the [ls] implementation).
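Inside a script, the more usual way is the shell's own [test] command, written as [ -f <path> ], which succeeds only when the path exists and is a regular file. A small sketch, using a made-up temporary directory:

```shell
# Hypothetical scratch directory containing one file.
dir=$(mktemp -d)
touch "$dir/file.txt"

# [ -f path ] succeeds only if path exists and is a regular file.
if [ -f "$dir/file.txt" ]; then present="yes"; else present="no"; fi
if [ -f "$dir/missing.txt" ]; then missing="yes"; else missing="no"; fi

echo "file.txt present: $present"
echo "missing.txt present: $missing"
rm -rf "$dir"
```

Use -d instead of -f to check for a directory, or -e to accept any kind of file.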
How to check all the running processes in Unix?
The standard command to see this is [ps]. But [ps] only shows you a snapshot of the processes at that instant. If you need to monitor the
processes for a certain period of time and need to refresh the results in each interval, consider using the [top] command.
$> ps -ef
If you wish to see the % of memory usage and CPU usage, then consider the below switches
$> ps aux
If you wish to use this command inside some shell script, or if you want to customize the output of [ps] command, you may use -o switch like
below. By using -o switch, you can specify the columns that you want [ps] to print out.
$>ps -e -o stime,user,pid,args,%mem,%cpu
How to tell if my process is running in Unix?
You can list down all the running processes using [ps] command. Then you can grep your user name or process name to see if the process is
running. See below:
$>ps -e -o stime,user,pid,args,%mem,%cpu | grep "opera"
14:53 opera 29904 sleep 60 0.0 0.0
14:54 opera 31536 ps -e -o stime,user,pid,arg 0.0 0.0
14:54 opera 31538 grep opera 0.0 0.0
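Notice that the [ps] and [grep] commands themselves show up in the listing above, because their command lines also contain the search text. When the process ID is already known (for example, because the script started the process itself), a check that avoids this is 'kill -0', which sends no signal and only reports whether the process can be signalled. The sketch below assumes a throwaway [sleep] process started just for the illustration.

```shell
# Start a throwaway background process and remember its process ID.
sleep 30 &
pid=$!

# kill -0 sends no signal; it succeeds only if the process exists
# and we are allowed to signal it.
if kill -0 "$pid" 2>/dev/null; then running="yes"; else running="no"; fi
echo "process $pid running: $running"

# Clean up the background process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```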
How to get the CPU and Memory details in Linux server?
In Linux based systems, you can easily access the CPU and memory details from the /proc/cpuinfo and /proc/meminfo, like this:
$>cat /proc/meminfo
$>cat /proc/cpuinfo
Just try the above commands in your system to see how it works
DWH------------------------------------------------------------------------------------------------------------------------------------------------------------
What is data warehouse?
A data warehouse is an electronic store of an organization's
historical data for the purpose of reporting, analysis and data
mining or knowledge discovery.
Other than that a data warehouse can also be used for the purpose
of data integration, master data management etc.
According to Bill Inmon, a data warehouse should be subject-oriented,
non-volatile, integrated and time-variant.
Explanatory Note
Note here, Non-volatile means that the data once loaded in the warehouse will not get deleted later. Time-variant means the data will change with
respect to time.
The above definition of the data warehousing is typically considered as "classical" definition. However, if you are interested, you may want to
read the article - What is a data warehouse - A 101 guide to modern data warehousing - which opens up a broader definition of data warehousing.
What are the benefits of a data warehouse?
A data warehouse helps to integrate data (see Data integration) and store them historically so that we can analyze different aspects of business,
including performance analysis, trends, prediction etc., over a given time frame and use the result of our analysis to improve the efficiency of
business processes.
Why is a data warehouse used?
Data warehouses have long been built, and are still built today, to facilitate reporting on the key performance indicators (KPIs) of an
organization's business processes. Data warehouses also help to integrate data from different sources and present a single point of truth for the
business measures.
A data warehouse can further be used for data mining, which helps with trend prediction, forecasts, pattern recognition etc. Check this article to know
more about data mining.
What is the difference between OLTP and OLAP?
OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data.
OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. On the other hand, OLAP systems are
deliberately denormalized for fast data retrieval through SELECT operations.
Explanatory Note:
In a department store, when we pay at the check-out counter, the sales person at the counter keys in all the data into a "Point-Of-Sales"
machine. That data is transaction data and the related system is an OLTP system.
On the other hand, the manager of the store might want to view a report on out-of-stock materials, so that he can place purchase orders for them.
Such a report will come out of the OLAP system.
What is data mart?
Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR,
Marketing etc. stored in the data warehouse and each department may have a separate data mart. These data marts can be built on top of the data
warehouse.
What is ER model?
ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by
reducing redundancy. This is different than dimensional modeling where the main goal is to improve the data retrieval mechanism.
What is dimensional modeling?
A dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from
the dimension tables that qualify the data. The goal of a dimensional model is not to achieve a high degree of normalization but to facilitate easy and
faster data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which is often used in many enterprise level data
warehouses.
If you want to read a quick and simple guide on dimensional modeling, please check our Guide to dimensional modeling.
What is dimension?
A dimension is something that qualifies a quantity (measure).
For example, consider this: If I just say 20kg, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh
(customer) on 5th April (date)", then that makes sense. The product, customer and date are dimensions that qualify the
measure - 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-
overlapping regions.
What is Fact?
A fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.
What are additive, semi-additive and non-additive measures?
Non-additive Measures
Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of a
non-additive fact is any kind of ratio or percentage, for example a 5% profit margin, a revenue to asset ratio etc. Non-numerical data can also be a
non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.
Semi Additive Measures
Semi-additive measures are those where only a subset of aggregation functions can be applied. Let's say account balance. A sum() function on
balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate;
however, the average function might be useful.
Additive Measures
Additive measures can be used with any aggregation function like Sum(), Avg() etc. An example is sales quantity.
At this point, I will request you to pause and make some time to read this article on "Classifying data for successful modeling". This article helps
you to understand the differences between dimensional data, factual data etc. from a fundamental perspective.
What is Star-schema?
This schema is used in data warehouse models where one centralized fact table references a number of dimension tables, so that the keys (primary
keys) from all the dimension tables flow into the fact table (as foreign keys) where measures are stored. The entity-relationship diagram looks like
a star, hence the name.
Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys
from customer, product and time dimension tables will flow into the fact table.
If you are not very familiar with Star Schema design or its use, we strongly recommend you read our article on this subject - different
schema in dimensional modeling.
What is snow-flake schema?
This is another logical arrangement of tables in dimensional modeling where a centralized fact table references a number of other dimension tables;
however, those dimension tables are further normalized into multiple related tables.
Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys
from customer, product and time dimension tables will flow into the fact table. Additionally all the products can be further grouped under
different product families stored in a different table, so that the primary key of the product family table also goes into the product table as a foreign key.
Such a construct is called a snow-flake schema, as the product table is further snow-flaked into product family.
Note
Snow-flake increases degree of normalization in the design.
What are the different types of dimension?
In a data warehouse model, dimension can be of following types,
1. Conformed Dimension
2. Junk Dimension
3. Degenerated Dimension
4. Role Playing Dimension
Based on how frequently the data inside a dimension changes, we can further classify dimension as
1. Unchanging or static dimension (UCD)
2. Slowly changing dimension (SCD)
3. Rapidly changing Dimension (RCD)
You may also read, Modeling for various slowly changing dimension and Implementing Rapidly changing dimension to know more about SCD,
RCD dimensions etc.
What is a 'Conformed Dimension'?
A conformed dimension is a dimension that is shared across multiple subject areas. Consider the 'Customer' dimension. Both the marketing and sales
departments may use the same customer dimension table in their reports. Similarly, a 'Time' or 'Date' dimension will be shared by different subject
areas. These dimensions are conformed dimensions.
Theoretically, two dimensions which are either identical or strict mathematical subsets of one another are said to be conformed.
What is degenerated dimension?
A degenerated dimension is a dimension that is derived from the fact table and does not have its own dimension table.
A dimension key, such as a transaction number, receipt number or invoice number, does not have any more associated attributes and hence cannot
be designed as a dimension table.
What is junk dimension?
A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and
can be junked into an abstract dimension table.
These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes
which you could not fit into the different dimension tables otherwise. Junk dimensions are often used to implement Rapidly Changing
Dimensions in data warehouse.
What is a role-playing dimension?
Dimensions are often reused for multiple applications within the same database with different contextual meanings. For instance, a "Date"
dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a 'role-playing dimension'.
What is SCD?
SCD stands for slowly changing dimension, i.e. the dimensions where data is slowly changing. These can be of many types, e.g. Type 0, Type 1,
Type 2, Type 3 and Type 6, although Type 1, 2 and 3 are most common. Read this article to gather in-depth knowledge on various SCD tables.
What is rapidly changing dimension?
This is a dimension where data changes rapidly. Read this article to know how to implement RCD.
Describe different types of slowly changing Dimension (SCD)
Type 0:
A Type 0 dimension is one where dimensional changes are not considered. This does not mean that the attributes of the dimension do not change in
the actual business situation. It just means that, even if the values of the attributes change, the changes are not applied: no history is kept and the table continues to hold the original data.
Type 1:
A type 1 dimension is where history is not maintained and the table always shows the recent data. This effectively means that such dimension
table is always updated with recent data whenever there is a change, and because of this update, we lose the previous values.
Type 2:
A type 2 dimension table tracks the historical changes by creating separate rows in the table with different surrogate keys. Consider there is a
customer C1 under group G1 first and later on the customer is changed to group G2. Then there will be two separate records in dimension table
like below,
Key  Customer  Group  Start Date    End Date
1    C1        G1     1st Jan 2000  31st Dec 2005
2    C1        G2     1st Jan 2006  NULL
Note that separate surrogate keys are generated for the two records. NULL end date in the second row denotes that the record is the current
record. Also note that, instead of start and end dates, one could also keep version number column (1, 2 etc.) to denote different versions of the
record.
Type 3:
A type 3 dimension stores the history in a separate column instead of separate rows. So unlike a type 2 dimension which is vertically growing, a
type 3 dimension is horizontally growing. See the example below,
Key  Customer  Previous Group  Current Group
1    C1        G1              G2
This is only good when you need not store many consecutive histories and when date of change is not required to be stored.
Type 6:
A type 6 dimension is a hybrid of type 1, 2 and 3 (1+2+3). It acts very similarly to type 2, except that you add one extra column to denote which
record is the current record.
Key  Customer  Group  Start Date    End Date       Current Flag
1    C1        G1     1st Jan 2000  31st Dec 2005  N
2    C1        G2     1st Jan 2006  NULL           Y
What is a mini dimension?
Mini dimensions can be used to handle the rapidly changing dimension scenario. If a dimension has a huge number of rapidly changing attributes it is
better to separate those attributes into a different table called a mini dimension. This is done because if the main dimension table is designed as SCD
type 2, the table will soon grow very large and create performance issues. It is better to segregate the rapidly changing members into a different table,
thereby keeping the main dimension table small and fast.
What is a fact-less-fact?
A fact table that does not contain any measure is called a fact-less fact. This table will only contain keys from different dimension tables. This is
often used to resolve a many-to-many cardinality issue.
Explanatory Note:
Consider a school, where a single student may be taught by many teachers and a single teacher may have many students. To model this situation
in dimensional model, one might introduce a fact-less-fact table joining teacher and student keys. Such a fact table will then be able to answer
queries like,
1. Who are the students taught by a specific teacher?
2. Which teacher teaches the maximum number of students?
3. Which student has the highest number of teachers? etc.
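To make the idea concrete, the sketch below treats a small comma-separated file as a fact-less fact table and answers the first two queries with [awk]. The teacher and student keys (T1, S1 etc.) are made up for illustration, and a real warehouse would of course run such queries in SQL against the database.

```shell
# Hypothetical fact-less fact rows: teacher_key,student_key (no measures).
fact=$(mktemp)
cat > "$fact" <<'EOF'
T1,S1
T1,S2
T2,S1
T1,S3
T2,S3
EOF

# Query 1: students taught by teacher T2.
students_of_t2=$(awk -F',' '$1 == "T2" { print $2 }' "$fact" | sort)
# Query 2: count students per teacher, highest count first, keep the top row.
top_teacher=$(awk -F',' '{ n[$1]++ } END { for (t in n) print n[t], t }' "$fact" | sort -rn | head -n 1)

echo "$students_of_t2"
echo "$top_teacher"
rm -f "$fact"
```

Every answer comes from counting or filtering rows, since the table holds no measure column to sum.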
What is a coverage fact?
A fact-less-fact table can only answer 'optimistic' queries (positive queries) but cannot answer a negative query. Again consider the illustration in
the above example. A fact-less fact containing the keys of teachers and students cannot answer queries like the ones below,
1. Which teacher did not teach any student?
2. Which student was not taught by any teacher?
Why not? Because the fact-less fact table only stores the positive scenarios (like a student being taught by a teacher), but if there is a student who
is not being taught by any teacher, then that student's key does not appear in this table, thereby reducing the coverage of the table.
A coverage fact table attempts to answer this, often by adding an extra flag column. Flag = 0 indicates a negative condition and flag = 1 indicates a
positive condition. To understand this better, let's consider a class where there are 100 students and 5 teachers. The coverage fact table will ideally
store 100 x 5 = 500 records (all combinations) and if a certain teacher is not teaching a certain student, the corresponding flag for that record will
be 0.
What are incident and snapshot facts?
A fact table stores some kind of measurements. Usually these measurements are stored (or captured) against a specific time and these
measurements vary with respect to time. Now it might so happen that the business is not able to capture all of its measures for every
point in time. Then those unavailable measurements can be kept empty (Null) or can be filled up with the last available measurements. The first
case is the example of incident fact and the second one is the example of snapshot fact.
What is aggregation and what is the benefit of aggregation?
A data warehouse usually captures data with the same degree of detail as is available in the source. The "degree of detail" is termed granularity. But not all
reporting requirements from that data warehouse need the same degree of detail.
To understand this, let's consider an example from the retail business. A certain retail chain has 500 shops across Europe. All the shops record detail
level transactions regarding the products they sell and those data are captured in a data warehouse.
Each shop manager can access the data warehouse and they can see which products are sold by whom and in what quantity on any given date.
Thus the data warehouse helps the shop managers with the detail level data that can be used for inventory management, trend prediction etc.
Now think about the CEO of that retail chain. He does not really care about which salesperson in London sold the highest number of
chopsticks or which shop is the best seller of 'brown breads'. All he is interested in is, perhaps, checking the percentage increase of his revenue
margin across Europe, or maybe the year to year sales growth in eastern Europe. Such data is aggregated in nature, because sales of goods in East
Europe is derived by summing up the individual sales data from each shop in East Europe.
Therefore, to support different levels of data warehouse users, data aggregation is needed.
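The summing-up step can be sketched with [awk] on a small comma-separated file. The regions, shop names and quantities below are made up for illustration, and in a real data warehouse this would be a GROUP BY query or a pre-built aggregate table.

```shell
# Hypothetical detail-level sales rows: region,shop,quantity
sales=$(mktemp)
cat > "$sales" <<'EOF'
East,Shop1,10
East,Shop2,5
West,Shop3,7
West,Shop4,8
East,Shop5,2
EOF

# Sum the quantity per region; sort makes the output order deterministic.
summary=$(awk -F',' '{ total[$1] += $3 } END { for (r in total) print r "," total[r] }' "$sales" | sort)
echo "$summary"
rm -f "$sales"
```

Five detail rows collapse into one row per region, which is the coarser granularity the CEO's report needs.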
What is slicing-dicing?
Slicing means showing a slice of the data, given a certain dimension (e.g. Product), a value (e.g. Brown Bread) and measures (e.g. sales).
Dicing means viewing the slice with respect to different dimensions and at different levels of aggregation.
Slicing and dicing operations are part of pivoting.
What is drill-through?
Drill through is the process of going to the detail level data from summary data.
Consider the above example on retail shops. If the CEO finds out that sales in East Europe have declined this year compared to last year, he then
might want to know the root cause of the decrease. For this, he may start drilling through his report to a more detailed level and eventually find out
that even though individual shop sales have actually increased, the overall sales figure has decreased because a certain shop in Turkey has stopped
operating. The detail level of data, which the CEO was not much interested in earlier, has this time helped him to pinpoint the root
cause of the declined sales. And the method he has followed to obtain the details from the aggregated data is called drill through.
The Professional Services Group (PSG) Informatica Developer provides ETL (Extract, transform, load)
expertise as well as design and development support for data integration processes. This role is responsible for
streamlining the processes to acquire data, analyzing data, creating ETL mappings and developing SQL statements,
routines and procedures to integrate data from multiple sources.
The Informatica Developer will:
Design and develop the ETL Interface using the Informatica tool
Develop and implement the coding of Informatica mappings for different stages of ETL
Analyze user requirements and propose potential system solutions
Understand and comply with development standards and the Software Development Life Cycle (SDLC) to ensure
consistency across the project
Collaborate with Client subject matter experts (SMEs), Client teams and other vendor teams
Create and maintain Informatica ETL routines and procedures for various Commercial off the Shelf (COTS) and
State Systems
Develop and configure Informatica software to process data web services and/or conversion
Create mappings and mapplets in Informatica
Work with technical and functional analysts to translate functional and technical requirements, into a design, and
a design into a realized and tested solution
Create data mappings and models to integrate data from multiple sources
Conduct analysis and problem solving to develop, deploy and maintain processes and methodologies
Analyze and modify existing programs to improve existing program performance
Review and update technical design documents
Write and maintain documentation to describe program development, logic, coding, testing changes and
corrections
Create ad hoc reports and provide expertise with data mining methodologies
Work in an Agile development environment and collaborate with Vendor Partners, Architects, Developers and
Business Analysts to create data mappings from source systems to the target systems, data warehouses and data
marts
Maintain industry/technical knowledge base and facilitate/maintain industry relationships
Demonstrate commitment to providing customer-focused quality service
Respond to Client requests within agreed upon timeframes
Perform other relevant duties based upon experience