
Data Warehouse Schemas

- The document discusses different types of data warehouse schemas, including star schemas and snowflake schemas.
- In a star schema, facts are classified along dimensions and stored in a central fact table linked to multiple dimension tables. Dimension tables contain attributes describing the facts.
- A snowflake schema is similar to a star schema but with dimensions normalized into multiple related tables rather than a single denormalized table per dimension. This increases the number of joins needed.

Uploaded by

snivas1

Data Warehouse Schemas

A schema is a collection of database objects, including tables, views, indexes, and synonyms.
You can arrange schema objects in the schema models designed for data warehousing in a
variety of ways.
Star Schemas:
The star schema (also called star-join schema, data cube, or multi-dimensional schema) is
the simplest style of data warehouse schema. The star schema consists of one or more fact
tables referencing any number of dimension tables.

The facts that the data warehouse helps analyze are classified along different dimensions:

• The fact table holds the main data. It includes a large amount of aggregated data, such
as price and units sold. There may be multiple fact tables in a star schema.
• Dimension tables, which are usually smaller than fact tables, include the attributes that
describe the facts. Often there is a separate table for each dimension. Dimension tables
can be joined to the fact table(s) as needed.

Dimension tables have a simple primary key, while fact tables have a set of foreign keys
that together make up a compound primary key consisting of the relevant dimension keys.
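
The layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (fact_sales, dim_product, dim_location), the columns and the sample rows are hypothetical and do not come from the original notes.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, region_name  TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold   INTEGER,
    sales_dollar REAL,
    PRIMARY KEY (product_key, location_key)  -- compound key built from dimension keys
);
INSERT INTO dim_product  VALUES (1, 'Flashlight'), (2, 'Regulator');
INSERT INTO dim_location VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales   VALUES (1, 1, 10, 150.0), (1, 2, 5, 75.0), (2, 1, 2, 200.0);
""")

def sales_by_region(connection):
    """Typical star query: join the fact table to one dimension and aggregate."""
    rows = connection.execute("""
        SELECT l.region_name, SUM(f.sales_dollar)
        FROM fact_sales AS f
        JOIN dim_location AS l ON l.location_key = f.location_key
        GROUP BY l.region_name
        ORDER BY l.region_name
    """).fetchall()
    return dict(rows)

print(sales_by_region(conn))  # {'East': 350.0, 'West': 75.0}
```

Each dimension is reached with a single join from the fact table, which is what keeps typical star queries simple.
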
Advantages:

• Provide a direct and intuitive mapping between the business entities being analyzed
by end users and the schema design.
• Provide highly optimized performance for typical star queries.
• Are widely supported by a large number of business intelligence tools, which may
anticipate or even require that the data-warehouse schema contain dimension tables.

Snowflake Schemas:
The snowflake schema is represented by centralized fact tables which are connected to
multiple dimensions. In the snowflake schema, dimensions are normalized into multiple
related tables, whereas the star schema's dimensions are denormalized with each dimension
represented by a single table.
Snowflake schemas are often better with more sophisticated query tools that isolate
users from the raw table structures and for environments having numerous queries with
complex criteria.

Advantages:

• Some OLAP multidimensional database modeling tools that use dimensional data
marts as data sources are optimized for snowflake schemas.
• A snowflake schema can sometimes reflect the way in which users think about data.
Users may prefer to generate queries using a star schema in some cases, although this may or
may not be reflected in the underlying organization of the database.
• A multidimensional view is sometimes added to an existing transactional database to
aid reporting. In this case, the tables which describe the dimensions will already exist and
will typically be normalized. A snowflake schema will therefore be easier to implement.
• If a dimension is very sparse (i.e. most of the possible values for the dimension have
no data) and/or a dimension has a very long list of attributes which may be used in a query,
the dimension table may occupy a significant proportion of the database and snowflaking
may be appropriate.

Star Schema v/s Snowflake Schema:


Star Schema is a relational database schema for representing multidimensional data.
It is the simplest form of data warehouse schema that contains one or more dimensions and
fact tables. It is called a star schema because the entity-relationship diagram between
dimensions and fact tables resembles a star where one fact table is connected to multiple
dimensions. The center of the star schema consists of a large fact table, and it points towards
the dimension tables. The advantages of the star schema are easy slicing of the data, better
query performance and easy understanding of the data.

Steps in designing a Star Schema:
* Identify a business process for analysis (like sales).
* Identify measures or facts (sales dollar).
* Identify dimensions for facts (product dimension, location dimension, time dimension,
organization dimension).
* List the columns that describe each dimension (region name, branch name).

A snowflake schema is a star schema structure normalized through the use of outrigger
tables, i.e. dimension table hierarchies are broken into simpler tables. In the star
schema example we had four dimensions (location, product, time, organization) and a fact
table (sales).
In OLAP, the snowflake schema approach increases the number of joins and degrades
performance in retrieval of data. A few organizations normalize the dimension
tables to save space. Since dimension tables take up relatively little space, the snowflake
schema approach may be avoided.

Important aspects of Star Schema & Snowflake Schema:

* In a star schema every dimension will have a primary key.
* In a star schema, a dimension table will not have any parent table, whereas in a
snowflake schema, a dimension table will have one or more parent tables.
* Hierarchies for the dimensions are stored in the dimension table itself in a star schema,
whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies
help to drill down the data from the topmost level to the lowermost level.

Fact Table

The centralized table in a star schema is called the fact table. A fact table typically
has two types of columns: those that contain facts and those that are foreign keys to
dimension tables. The primary key of a fact table is usually a composite key that is
made up of all of its foreign keys.
E.g.: "Sales Dollar" is a fact (measure) and it can be added across several dimensions.
Fact tables store different types of measures: additive, non-additive and semi-additive
measures.

Measure Types:
* Additive - Measures that can be added across all dimensions.
* Non Additive - Measures that cannot be added across all dimensions.
* Semi Additive - Measures that can be added across some dimensions but not
others.

A fact table might contain either detail level facts or facts that have been aggregated
(fact tables that contain aggregated facts are often instead called summary tables). In
the real world, it is possible to have a fact table that contains no measures or facts.
These tables are called factless fact tables.

Steps in designing a Fact Table:
* Identify a business process for analysis (like sales).
* Identify measures or facts (sales dollar).
* Identify dimensions for facts (product dimension, location dimension, time
dimension, organization dimension).
* List the columns that describe each dimension (region name, branch name).
* Determine the lowest level of summary in a fact table (sales dollar).

Informatica Functions

TEST FUNCTIONS

1.1 ISNULL
The ISNULL function returns whether a value is NULL. It is available in the Designer and
the Workflow Manager.

ISNULL( value )

Example : The following example checks for null values in the items table:

ISNULL( ITEM_NAME )

ITEM_NAME     RETURN VALUE
Flashlight    0 (FALSE)
NULL          1 (TRUE)
''            0 (FALSE)  Empty string is not NULL

1.2 IS_DATE
The IS_DATE function returns whether a value is a valid date. It is available in the Designer
and the Workflow Manager.

IS_DATE( value )

Example : The following expression checks the INVOICE_DATE port for valid dates:

IS_DATE( INVOICE_DATE )

This expression returns data similar to the following:

INVOICE_DATE RETURN VALUE


NULL NULL
180 0 (FALSE)
'04/01/98' 0 (FALSE)
'04/01/1998 00:12:15' 1 (TRUE)
'02/31/1998 12:13:55' 0 (FALSE) (February does not have 31 days)
'John Smith' 0 (FALSE)
This function can also be used to validate a date for a specified format for which the syntax is

IS_DATE( value, format )


If the format is not specified, ‘MM/DD/YYYY’ is taken as the default format.
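
The return values in the IS_DATE table can be imitated with a rough Python sketch. It is not Informatica's implementation: the function name is_date and the Python format string '%m/%d/%Y %H:%M:%S' are assumptions, chosen only so that the sample rows above reproduce.

```python
from datetime import datetime

def is_date(value, fmt="%m/%d/%Y %H:%M:%S"):
    """Return None for a NULL input, otherwise True/False for a valid date string.

    The default format is an assumption made for this sketch.
    """
    if value is None:
        return None
    try:
        datetime.strptime(str(value), fmt)
        return True
    except ValueError:
        # Wrong layout, non-date text, or an impossible date such as Feb 31.
        return False

samples = [None, 180, "04/01/98", "04/01/1998 00:12:15",
           "02/31/1998 12:13:55", "John Smith"]
results = [is_date(s) for s in samples]
print(results)  # [None, False, False, True, False, False]
```
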

1.3 IS_NUMBER
The IS_NUMBER function returns whether a string is a valid number. It is available in the
Designer and the Workflow Manager.

IS_NUMBER( value )

Example : The following expression checks the ITEM_PRICE port for valid numbers:

IS_NUMBER( ITEM_PRICE )

ITEM_PRICE RETURN VALUE


123.00 1 (True)
-3.45e+3 1 (True)
'' 0 (False) Empty string
+123abc 0 (False)
ABC 0 (False)
-ABC 0 (False)
NULL NULL
1.4 IS_SPACES
The IS_SPACES function returns whether a value consists entirely of spaces. It is available in
the Designer and the Workflow Manager.

IS_SPACES( value )

Example : The following expression checks the ITEM_NAME port for rows that consist
entirely of spaces:

IS_SPACES( ITEM_NAME )

ITEM_NAME           RETURN VALUE
Flashlight          0 (False)
'   '               1 (True)
Regulator system    0 (False)

Special Functions
4.1 DECODE

The DECODE function searches a port for the specified value. It is available in the Designer
and the Workflow Manager.

DECODE( value, first_search, first_result [, second_search, second_result ]…[, default ] )

Example: We might use DECODE in an expression that searches for a particular ITEM_ID
and returns the ITEM_NAME:

DECODE( ITEM_ID, 10, 'Flashlight',
                 14, 'Regulator',
                 20, 'Knife',
                 40, 'Tank',
                 'NONE' )

ITEM_ID    RETURN VALUE
10         Flashlight
14         Regulator
17         NONE

4.2 IIF

The IIF function returns one of two values we specify, based on the results of a condition. It
is available in the Designer and the Workflow Manager.

IIF( condition, value1 [, value2 ] )


Example : IIF( SALES < 100, 0, SALARY )

SALES SALARY RETURN VALUE


150 50,000.00 50,000
50 20,000.00 0
NULL 50,000.41 50,000

IIF functions can be nested if there is more than one condition to be tested. When the
number of conditions is large, however, DECODE is usually the better option, since the
DECODE function is less costly than nested IIF functions.

For example consider the following expression

IIF(MARKS>=90,'A', (IIF(MARKS>=75,'B', (IIF(MARKS>=65,'C',
(IIF(MARKS>=55,'D', IIF(MARKS>=45,'E', 'F'))))))))

The same result can be obtained with

DECODE(TRUE,

MARKS>=90,'A',

MARKS>=75,'B',

MARKS>=65,'C',

MARKS>=55,'D',

MARKS>=45,'E',
'F')

When the number of conditions increases, the simplicity of the DECODE function compared
with the complexity of nested IIF functions becomes apparent.

In both cases, if MARKS>=90 the expression returns 'A' even though that value also satisfies
all the other conditions, because evaluation stops at the first condition that is satisfied.
Even if a port value satisfies two or more conditions, only the first one is used. Therefore,
ordering is important in IIF and DECODE functions.
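
The first-match behaviour can be illustrated with a small Python stand-in. The helper names decode and grade are hypothetical; this only mimics the search/result pairing described above and is not how the Informatica Server evaluates DECODE.

```python
def decode(value, *args):
    """Return the result paired with the first search equal to value, else the default."""
    pairs, default = args, None
    if len(args) % 2 == 1:            # odd count: the last argument is the default
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if search == value:           # stop at the first match; later pairs are ignored
            return result
    return default

def grade(marks):
    # Same shape as DECODE(TRUE, MARKS>=90,'A', ... ,'F')
    return decode(True,
                  marks >= 90, "A",
                  marks >= 75, "B",
                  marks >= 65, "C",
                  marks >= 55, "D",
                  marks >= 45, "E",
                  "F")

print(grade(95), grade(80), grade(40))  # A B F
print(decode(17, 10, "Flashlight", 14, "Regulator", 20, "Knife", 40, "Tank", "NONE"))  # NONE
```

A mark of 95 satisfies every condition, but only the first pair is used; if the pairs were listed in ascending order, every passing mark would come back as 'E'.
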

4.3 ERROR:

The ERROR function causes the Informatica Server to skip a record and throw an error
message defined by the user. It is available in the Designer.

ERROR( string )

Example : The following example shows how you can reference a mapping that calculates
the average salary for employees in all departments of your company, but skips negative
values. The following expression nests the ERROR function in an IIF expression so that if the
Informatica Server finds a negative salary in the Salary port, it skips the row and displays an
error:

IIF( SALARY < 0, ERROR( 'Error. Negative salary found. Row skipped.' ), EMP_SALARY )

SALARY RETURN VALUE

10000 10000

-15000 'Error. Negative salary found. Row skipped.'


The example below combines two special functions (IIF and ERROR), a test function (IS_DATE)
and a conversion function (TO_DATE).

IIF( IS_DATE( DATE_PROMISED, 'MM/DD/YY' ), TO_DATE( DATE_PROMISED, 'MM/DD/YY' ), ERROR( 'Invalid Date' ) )

4.4 LOOKUP:

The LOOKUP function searches for a particular value in a lookup source column. It is
available in the Designer.

LOOKUP( result, search1, value1 [, search2, value2]… )

Example : The following expression searches the lookup source :TD.SALES for a
specific itemID and price, and returns the item name if both searches find a match:

LOOKUP( :TD.SALES.ITEM_NAME, :TD.SALES.ITEM_ID, 10, :TD.SALES.PRICE, 15.99 )

ITEM_NAME     ITEM_ID    PRICE
Regulator     5          100.00
Flashlight    10         15.99

Date Functions

Date Format Strings in the Transformation Reference


D, DD, DDD, DAY, DY, J

Days (01-31). We can use any of these format strings to specify the entire day portion of
a date. For example, if we pass 12-APR-1997 to a date function, we can use any of these
format strings to specify 12.

HH, HH12, HH24

Hour of day (0 to 23), where zero is 12 AM (midnight). We can use any of these formats to
specify the entire hour portion of a date. For example, if we pass the date 12-APR-1997
2:01:32 PM, we can use HH, HH12, or HH24 to specify the hour portion of the date.

MI

Minutes (0 to 59).

MM, MON, MONTH

Month portion of date (01 to 12). We can use any of these format strings to specify the entire
month portion of a date. For example, if we pass 12-APR-1997 to a date function, we can use
MM, MON, or MONTH to specify APR.

SS , SSSS

Second portion of date (0 to 59).

Y, YY, YYY, YYYY , RR

Year portion of date (1753 to 9999). We can use any of these format strings to specify the
entire year portion of a date. For example, if we pass 12-APR-1997 to a date function, we can
use Y, YY, YYY, or YYYY to specify 1997.

3.1 ADD_TO_DATE

The ADD_TO_DATE function adds a specified amount to one part of a date/time value, and
returns a date in the same format as the specified date.

Note: If we do not specify the year as YYYY, the Informatica Server assumes the date is in
the current century. It is available in the Designer and the Workflow Manager.

ADD_TO_DATE( date, format, amount )

Example : The following expression adds one month to each date in the DATE_SHIPPED
port. If we pass a value that creates a day that does not exist in a particular month, the
Informatica Server returns the last day of the month. For example, if we add one month to Jan
31 1998, the Informatica Server returns Feb 28 1998.

Also note, ADD_TO_DATE recognizes leap years and adds one month to Jan 29 2000:

ADD_TO_DATE( DATE_SHIPPED, 'MM', 1 )

DATE_SHIPPED RETURN VALUE


Jan 12 1998 12:00:30AM Feb 12 1998 12:00:30AM

The following expression subtracts 10 days from each date in the DATE_SHIPPED port:

ADD_TO_DATE( DATE_SHIPPED, 'D', -10 )

DATE_SHIPPED RETURN VALUE


Jan 1 1997 12:00:30AM Dec 22 1996 12:00:30AM

The following expression subtracts 15 hours from each date in the DATE_SHIPPED port:

ADD_TO_DATE( DATE_SHIPPED, 'HH', -15 )

DATE_SHIPPED RETURN VALUE


Jan 1 1997 12:00:30AM Dec 31 1996 9:00:30AM

In ADD_TO_DATE function, if the argument passed evaluates to a date that does not exist in
a particular month, the Informatica Server returns the last day of the month.

The following expression reveals this.

ADD_TO_DATE( DATE_SHIPPED, 'MON', 3 )

DATE_SHIPPED RETURN VALUE


Jan 31 1998 6:24:45PM    Apr 30 1998 6:24:45PM
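
The last-day-of-month rule can be sketched in Python with the standard calendar module. The function name add_months is hypothetical, and the sketch covers only the month case of ADD_TO_DATE.

```python
import calendar
from datetime import datetime

def add_months(date, amount):
    """Add `amount` months, clamping the day to the last day of the target month."""
    month_index = date.year * 12 + (date.month - 1) + amount
    year, month = divmod(month_index, 12)
    month += 1                                        # back to 1..12
    last_day = calendar.monthrange(year, month)[1]    # number of days in target month
    return date.replace(year=year, month=month, day=min(date.day, last_day))

print(add_months(datetime(1998, 1, 31, 18, 24, 45), 3))  # 1998-04-30 18:24:45
print(add_months(datetime(1998, 1, 31), 1))              # 1998-02-28 00:00:00
print(add_months(datetime(2000, 1, 29), 1))              # 2000-02-29 00:00:00 (leap year)
```
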

3.2 DATE_COMPARE

The DATE_COMPARE function returns a value indicating the earlier of two dates. It is
available in the Designer and the Workflow Manager.

DATE_COMPARE( date1, date2 )


Example : The following expression compares each date in the DATE_PROMISED and
DATE_SHIPPED ports, and returns an integer indicating which date is earlier:

DATE_COMPARE ( DATE_PROMISED, DATE_SHIPPED )

DATE_PROMISED DATE_SHIPPED RETURN VALUE


Jan 1 1997 Jan 13 1997 -1
Feb 1 1997 Feb 1 1997 0
Dec 22 1997 Dec 15 1997 1

3.3 DATE_DIFF

The DATE_DIFF function returns the length of time between two dates, measured in the
specified increment (years, months, days, hours, minutes, or seconds). It is available in the
Designer and the Workflow Manager.

DATE_DIFF( date1, date2, format )

Example: The following expressions return the number of days between the
DATE_PROMISED and the DATE_SHIPPED ports:

DATE_DIFF( DATE_PROMISED, DATE_SHIPPED, 'D' )

DATE_DIFF( DATE_PROMISED, DATE_SHIPPED, 'DD' )


DATE_PROMISED DATE_SHIPPED RETURN VALUE
Jan 1 1997 12:00:00AM Mar 29 1997 12:00:00PM -87.5
Mar 29 1997 12:00:00PM Jan 1 1997 12:00:00AM 87.5
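
The sign conventions of DATE_COMPARE and DATE_DIFF shown in the tables can be checked with a short Python sketch. The function names are hypothetical, and only the day increment of DATE_DIFF is modelled.

```python
from datetime import datetime

def date_compare(date1, date2):
    """Return -1 if date1 is earlier, 0 if the dates are equal, 1 if date1 is later."""
    return (date1 > date2) - (date1 < date2)

def date_diff_days(date1, date2):
    """Return date1 minus date2 in days, including the fractional part."""
    return (date1 - date2).total_seconds() / 86400

promised = datetime(1997, 1, 1, 0, 0, 0)
shipped = datetime(1997, 3, 29, 12, 0, 0)

print(date_compare(promised, shipped))    # -1
print(date_diff_days(promised, shipped))  # -87.5
print(date_diff_days(shipped, promised))  # 87.5
```
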

We can combine DATE functions and TEST functions so as to validate the dates.

For example, while using the DATE functions like DATE_COMPARE and DATE_DIFF, the
dates given as inputs can be validated using the TEST function IS_DATE and then passed to
them if valid.

3.4 GET_DATE_PART

The GET_DATE_PART function returns the specified part of a date as an integer value,
based on the default date format of MM/DD/YYYY HH24:MI:SS. It is available in the
Designer and the Workflow Manager.

GET_DATE_PART( date, format )

Example: The following expressions return the day for each date in the DATE_SHIPPED
port:

GET_DATE_PART( DATE_SHIPPED, 'D' )

GET_DATE_PART( DATE_SHIPPED, 'DD' )

DATE_SHIPPED RETURN VALUE


Mar 13 1997 12:00:00AM 13
June 3 1997 11:30:44PM 3
NULL NULL

3.5 LAST_DAY

The LAST_DAY function returns the date of the last day of the month for each date in a port.
It is available in the Designer and the Workflow Manager.

LAST_DAY( date )

Example : The following expression returns the last day of the month for each date in the
ORDER_DATE port:

LAST_DAY( ORDER_DATE )

ORDER_DATE RETURN VALUE


Apr 1 1998 12:00:00AM Apr 30 1998 12:00:00AM
Jan 6 1998 12:00:00AM Jan 31 1998 12:00:00AM

Date functions can also be combined with conversion functions.

The following expression nests the TO_DATE function inside the LAST_DAY function:

LAST_DAY( TO_DATE( GIVEN_DATE, 'DD-MON-YY' ))


3.6 MAX

The MAX function returns the latest date found in a group. It is available in the Designer.

MAX( date, filter_condition )

We can return the maximum date for a port or group.

Example: The following expression returns the maximum order date for flashlights:

MAX( ORDER_DATE, ITEM_NAME='Flashlight' )

ITEM_NAME ORDER_DATE
Flashlight Apr 20 1998
Regulator System May 15 1998
Flashlight Sep 21 1998
Diving Hood Aug 18 1998
Halogen Flashlight Feb 1 1998
Flashlight Oct 10 1998

RETURN VALUE: Oct 10 1998

3.7 MIN

The MIN function returns the earliest date found in a group. It is available in the Designer.

MIN( date, filter_condition )


Example: The following expression returns the oldest order date for flashlights:

MIN( ORDER_DATE, ITEM_NAME='Flashlight' )

ITEM_NAME ORDER_DATE
Flashlight Apr 20 1998
Regulator System May 15 1998
Flashlight Sep 21 1998
Diving Hood Aug 18 1998
Halogen Flashlight Feb 1 1998
Flashlight Oct 10 1998

RETURN VALUE: Apr 20 1998
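
The effect of the filter condition on MAX and MIN can be traced in Python using the rows from the tables above. The helper filtered_dates is hypothetical, and the sketch assumes the filter is an exact string comparison. Under that assumption the 'Halogen Flashlight' row (Feb 1 1998) is excluded, so the earliest matching date is Apr 20 1998.

```python
from datetime import date

orders = [
    ("Flashlight", date(1998, 4, 20)),
    ("Regulator System", date(1998, 5, 15)),
    ("Flashlight", date(1998, 9, 21)),
    ("Diving Hood", date(1998, 8, 18)),
    ("Halogen Flashlight", date(1998, 2, 1)),
    ("Flashlight", date(1998, 10, 10)),
]

def filtered_dates(rows, item_name):
    """Keep only the dates of rows whose item name matches exactly."""
    return [order_date for name, order_date in rows if name == item_name]

latest = max(filtered_dates(orders, "Flashlight"))
earliest = min(filtered_dates(orders, "Flashlight"))
print(latest)    # 1998-10-10
print(earliest)  # 1998-04-20
```
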

3.8 ROUND

The ROUND function rounds one part of a date. It is available in the Designer and the
Workflow Manager.

ROUND( date [, format ] )

Example: The following expressions round the month portion of each date in the
DATE_SHIPPED port.

ROUND( DATE_SHIPPED, 'MM' )

ROUND( DATE_SHIPPED, 'MON' )

DATE_SHIPPED RETURN VALUE


Jan 15 1998 2:10:30AM Jan 1 1998 12:00:00AM

Similarly, the ROUND function can be used to round off Year, Day or Time portions.

3.9 SET_DATE_PART

The SET_DATE_PART function sets one part of a date/time value to a specified value. It is
available in the Designer and the Workflow Manager.

SET_DATE_PART( date, format, value )

Example: The following expressions change the month to June for the dates in the
DATE_PROMISED port. The Informatica Server displays an error when we try to create a
date that does not exist, such as changing March 31 to June 31:

SET_DATE_PART( DATE_PROMISED, 'MM', 6 )

SET_DATE_PART( DATE_PROMISED, 'MON', 6 )

DATE_PROMISED RETURN VALUE


Jan 1 1997 12:15:56AM Jun 1 1997 12:15:56AM
NULL NULL

Similarly, the SET_DATE_PART function can be used to set the Year, Day or Time
portions.

3.10 TRUNC
The TRUNC function truncates dates to a specific year, month, day, hour, or minute. It is
available in the Designer and the Workflow Manager.

TRUNC( date [, format ] )

Example: The following expressions truncate the year portion of dates in the
DATE_SHIPPED port:

TRUNC( DATE_SHIPPED, 'Y' )

TRUNC( DATE_SHIPPED, 'YY' )

DATE_SHIPPED RETURN VALUE


Jan 15 1998 2:10:30AM Jan 1 1998 12:00:00AM

Similarly the TRUNC function can be used to truncate Month , Day or Time portions.
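
Truncation to year, month or day can be sketched in Python as follows. The function name trunc and the grouping of format strings are assumptions made for illustration, and the hour and minute formats are left out.

```python
from datetime import datetime

def trunc(date, part):
    """Reset every date part below `part` to its lowest value."""
    midnight = dict(hour=0, minute=0, second=0, microsecond=0)
    if part in ("Y", "YY", "YYY", "YYYY"):
        return date.replace(month=1, day=1, **midnight)
    if part in ("MM", "MON", "MONTH"):
        return date.replace(day=1, **midnight)
    if part in ("D", "DD", "DDD", "DY", "DAY"):
        return date.replace(**midnight)
    raise ValueError(f"unsupported format string in this sketch: {part}")

shipped = datetime(1998, 1, 15, 2, 10, 30)
print(trunc(shipped, "Y"))   # 1998-01-01 00:00:00
print(trunc(shipped, "MM"))  # 1998-01-01 00:00:00
print(trunc(shipped, "DD"))  # 1998-01-15 00:00:00
```
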

The functions TRUNC & ROUND can be nested in order to manipulate dates.

Filter Transformation
• Active and connected transformation.
We can filter rows in a mapping with the Filter transformation. We pass all the rows from a
source transformation through the Filter transformation, and then enter a Filter condition for
the transformation. All ports in a Filter transformation are input/output and only rows that
meet the condition pass through the Filter Transformation.
Example: to filter records where SAL>2000
• Import the source table EMP in Shared folder. If it is already there, then don’t Import.
• In shared folder, create the target table Filter_Example. Keep all fields as in EMP table.
• Create the necessary shortcuts in the folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_filter_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select Filter from list. Give name and click Create. Now
click done.
6. Pass ports from SQ_EMP to Filter Transformation.
7. Edit Filter Transformation. Go to Properties Tab
8. Click the Value section of the Filter condition, and then click the Open button.
9. The Expression Editor appears.
10. Enter the filter condition you want to apply.
11. Click Validate to check the syntax of the conditions you entered.
12. Click OK -> Click Apply -> Click Ok.
13. Now connect the ports from Filter to target table.
14. Click Mapping -> Validate
15. Repository -> Save
Create Session and Workflow as described earlier. Run the workflow and see the data in
target table.
How to filter out rows with null values?
To filter out rows containing null values or spaces, use the ISNULL and IS_SPACES
functions to test the value of the port. For example, if we want to filter out rows that contain
NULLs in the FIRST_NAME port, use the following condition:
IIF (ISNULL (FIRST_NAME), FALSE, TRUE)
This condition states that if the FIRST_NAME port is NULL, the return value is FALSE and
the row should be discarded. Otherwise, the row passes through to the next Transformation.
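
The row-level effect of a filter condition can be pictured with a short Python sketch. The helper filter_rows and the sample rows are hypothetical; a row passes only when its condition evaluates to true.

```python
def filter_rows(rows, condition):
    """Pass through only the rows for which the condition holds."""
    return [row for row in rows if condition(row)]

# Hypothetical sample rows, not taken from the EMP table in the notes.
emp = [
    {"FIRST_NAME": "Anna", "SAL": 3000},
    {"FIRST_NAME": None, "SAL": 2500},
    {"FIRST_NAME": "Raj", "SAL": 1500},
]

# Counterpart of IIF( ISNULL( FIRST_NAME ), FALSE, TRUE )
named = filter_rows(emp, lambda row: row["FIRST_NAME"] is not None)
# Counterpart of the condition SAL > 2000
high_paid = filter_rows(emp, lambda row: row["SAL"] > 2000)

print([row["FIRST_NAME"] for row in named])      # ['Anna', 'Raj']
print([row["FIRST_NAME"] for row in high_paid])  # ['Anna', None]
```
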
Performance tuning:
The Filter transformation is used to filter out unwanted rows based on conditions we specify.
1. Use the Filter transformation as close to the source as possible so that unwanted data is
eliminated sooner.
2. If unwanted data can be eliminated by the Source Qualifier instead of a Filter, then
eliminate it at the Source Qualifier itself.
3. Use conditional filters and keep the filter condition simple, involving TRUE/FALSE or 1/0.

Expression Transformation
• Passive and connected transformation.
Use the Expression transformation to calculate values in a single row before we write to the
target.
For example, we might need to adjust employee salaries, concatenate first and last names, or
convert strings to numbers.
Use the Expression transformation to perform any non-aggregate calculations.
Example: Addition, Subtraction, Multiplication, Division, Concat, Uppercase conversion,
lowercase conversion etc. We can also use the Expression transformation to test conditional
statements before we output the results to target tables or other transformations.

Example: IF, Then, Decode

There are 3 types of ports in Expression Transformation:
• Input
• Output
• Variable: Used to store any temporary calculation.

Calculating Values : To use the Expression transformation to calculate values for a single
row, we must include the following ports:
• Input or input/output ports for each value used in the calculation: For example: To calculate
Total Salary, we need salary and commission.
• Output port for the expression: We enter one expression for each output port. The return
value for the output port needs to match the return value of the expression. We can enter
multiple expressions in a single Expression transformation. We can create any number of
output ports in the transformation.
Example: Calculating Total Salary of an Employee
• Import the source table EMP in Shared folder. If it is already there, then don’t import.
• In shared folder, create the target table Emp_Total_SAL. Keep all ports as in EMP table
except Sal and Comm in target table. Add Total_SAL port to store the calculation.
• Create the necessary shortcuts in the folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_totalsal
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select Expression from list. Give name and click
Create. Now click done.
6. Link ports from SQ_EMP to Expression Transformation.
7. Edit Expression Transformation. As we do not want Sal and Comm in target, remove check
from output port for both columns.
8. Now create a new port out_Total_SAL. Make it as output port only.
9. Click the small button that appears in the Expression section of the dialog box and enter
the expression in the Expression Editor.
10. Enter expression SAL + COMM. You can select SAL and COMM from Ports tab in
expression editor.
11. Check the expression syntax by clicking Validate.
12. Click OK -> Click Apply -> Click Ok.
13. Now connect the ports from Expression to target table.
14. Click Mapping -> Validate
15. Repository -> Save
Create Session and Workflow as described earlier. Run the workflow and see the data in
target table.
Where COMM is null, Total_SAL will also be null, which is the case for most rows. Now open
your mapping and Expression transformation. Select the COMM port and, in Default Value, give 0.
Now apply changes. Validate Mapping and Save. Refresh the session and validate workflow again.
Run the workflow and see the result again. Now use ERROR in Default value of COMM to skip rows
where COMM is null.
Syntax: ERROR('Any message here')
Similarly, we can use the ABORT function to abort the session if COMM is null.
Syntax: ABORT('Any message here')
Make sure to double click the session after doing any changes in mapping. It will prompt that
the mapping has changed. Click OK to refresh the mapping. Run workflow after validating and
saving the workflow.
Performance tuning:
The Expression transformation is used to perform simple calculations and also to do
source lookups.
1. Use operators instead of functions.
2. Minimize the usage of string functions.
3. If we use a complex expression multiple times in the Expression transformation,
then make that expression a variable. Then we need to use only this variable for all
computations.
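
The effect of a null COMM on the SAL + COMM expression, and of a default value of 0, can be sketched in Python. The function total_sal and its comm_default parameter are hypothetical stand-ins for the port and its Default Value setting.

```python
def total_sal(sal, comm, comm_default=None):
    """Return sal + comm, propagating NULL (None) unless a default replaces it."""
    if comm is None:
        comm = comm_default          # plays the role of the port's Default Value
    if sal is None or comm is None:
        return None                  # NULL in, NULL out
    return sal + comm

print(total_sal(1600, 300))                  # 1900
print(total_sal(800, None))                  # None
print(total_sal(800, None, comm_default=0))  # 800
```
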

Router Transformation
• Active and connected transformation.
A Router transformation is similar to a Filter transformation because both transformations
allow you to use a condition to test data. A Filter transformation tests data for one condition
and drops the rows of data that do not meet the condition. However, a Router transformation
tests data for one or more conditions and gives you the option to route rows of data that do
not meet any of the conditions to a default output group.
Example: If we want to keep employees of France, India, US in 3 different tables, then
we can use 3 Filter transformations or 1 Router transformation.

Mapping A uses three Filter transformations while Mapping B produces the same result with
one Router transformation.
A Router transformation consists of input and output groups, input and output ports, group
filter conditions, and properties that we configure in the Designer.
Working with Groups
A Router transformation has the following types of groups:
• Input: The Group that gets the input ports.
• Output: User Defined Groups and Default Group. We cannot modify or delete Output ports
or their properties.

User-Defined Groups: We create a user-defined group to test a condition based on incoming
data. A user-defined group consists of output ports and a group filter condition. We can
create and edit user-defined groups on the Groups tab with the Designer. Create one
user-defined group for each condition that we want to specify.

The Default Group: The Designer creates the default group after we create one new user-
defined group. The Designer does not allow us to edit or delete the default group. This group
does not have a group filter condition associated with it. If all of the conditions evaluate to
FALSE, the IS passes the row to the default group.
Example: Filtering employees of Department 10 to EMP_10, Department 20 to EMP_20 and
rest to EMP_REST
• Source is EMP Table.
• Create 3 target tables EMP_10, EMP_20 and EMP_REST in shared folder. Structure should
be same as EMP table.
• Create the shortcuts in your folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_router_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select Router from list. Give name and
Click Create. Now click done.
6. Pass ports from SQ_EMP to Router Transformation.
7. Edit Router Transformation. Go to Groups Tab.
8. Click the Add button to create a user-defined group. The default group is created
automatically.
9. Click the Group Filter Condition field to open the Expression Editor.
10. Enter a group filter condition. Ex: DEPTNO=10
11. Click Validate to check the syntax of the conditions you entered.
12. Create another group for EMP_20. Condition: DEPTNO=20
13. The rest of the records not matching the above two conditions will be passed to
DEFAULT group. See sample mapping.
14. Click OK -> Click Apply -> Click Ok.
15. Now connect the ports from router to target tables.
16. Click Mapping -> Validate
17. Repository -> Save
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all 3 target tables.
Sample Mapping:
Difference between Router and Filter :

We cannot pass rejected data forward in filter but we can pass it in router. Rejected data is in
Default Group of router.
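
The grouping behaviour can be sketched in Python. The function route and the sample rows are hypothetical. The sketch assumes that a row is sent to every user-defined group whose condition it meets, and to the default group only when it meets none.

```python
def route(rows, groups):
    """Send each row to every group whose condition holds, else to DEFAULT."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)   # rows a Filter would have dropped
    return routed

# Hypothetical sample rows with only the columns needed for the example.
emp = [
    {"ENAME": "CLARK", "DEPTNO": 10},
    {"ENAME": "SMITH", "DEPTNO": 20},
    {"ENAME": "ALLEN", "DEPTNO": 30},
]
result = route(emp, {
    "EMP_10": lambda row: row["DEPTNO"] == 10,
    "EMP_20": lambda row: row["DEPTNO"] == 20,
})
print({name: [row["ENAME"] for row in rows] for name, rows in result.items()})
# {'EMP_10': ['CLARK'], 'EMP_20': ['SMITH'], 'DEFAULT': ['ALLEN']}
```
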

Sorter Transformation
• Connected and Active Transformation
• The Sorter transformation allows us to sort data.
• We can sort data in ascending or descending order according to a specified sort key.
• We can also configure the Sorter transformation for case-sensitive sorting, and specify
whether the output rows should be distinct.
When we create a Sorter transformation in a mapping, we specify one or more ports as a sort
key and configure each sort key port to sort in ascending or descending order. We also
configure sort criteria the Power Center Server applies to all sort key ports and the system
resources it allocates to perform the sort operation.
The Sorter transformation contains only input/output ports. All data passing through the
Sorter transformation is sorted according to a sort key. The sort key is one or more ports that
we want to use as the sort criteria.

Sorter Transformation Properties

1. Sorter Cache Size:


The Power Center Server uses the Sorter Cache Size property to determine the maximum
amount of memory it can allocate to perform the sort operation. The Power Center Server
passes all incoming data into the Sorter transformation before it performs the sort operation.
• We can specify any amount between 1 MB and 4 GB for the Sorter cache size.
• If it cannot allocate enough memory, the Power Center Server fails the Session.
• For best performance, configure Sorter cache size with a value less than or equal to the
amount of available physical RAM on the Power Center Server machine.
• Informatica recommends allocating at least 8 MB of physical memory to sort data using the
Sorter transformation.
2. Case Sensitive:
The Case Sensitive property determines whether the Power Center Server considers case
when sorting data. When we enable the Case Sensitive property, the Power Center Server
sorts uppercase characters higher than lowercase characters.
3. Work Directory
Directory Power Center Server uses to create temporary files while it sorts data.
4. Distinct:
Check this option if we want to remove duplicates. Sorter will sort data according to all the
ports when it is selected.
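As a rough illustration of the sort key and Distinct options, here is a small Python sketch. It is an assumption-laden model rather than the server's algorithm: the function name sorter and the row layout are hypothetical, and cache size, work directory and case sensitivity are not modelled.

```python
def sorter(rows, sort_keys, distinct=False):
    """Sort rows (dicts) by one or more sort key ports.

    sort_keys is a list of (port_name, "asc" | "desc"), most significant first.
    With distinct=True, duplicate rows are removed, comparing all ports.
    """
    if distinct:
        seen = set()
        unique = []
        for row in rows:
            signature = tuple(sorted(row.items()))  # every port takes part
            if signature not in seen:
                seen.add(signature)
                unique.append(row)
        rows = unique

    result = list(rows)
    # Apply the keys from least to most significant; Python's sort is stable,
    # so earlier (more significant) keys win when applied last.
    for port, direction in reversed(sort_keys):
        result.sort(key=lambda r: r[port], reverse=(direction == "desc"))
    return result


emp = [
    {"ENAME": "SMITH", "SAL": 800},
    {"ENAME": "ALLEN", "SAL": 1600},
    {"ENAME": "SMITH", "SAL": 800},
]

by_name = sorter(emp, [("ENAME", "asc")], distinct=True)
by_salary = sorter(emp, [("SAL", "desc")])
```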

Example: Sorting data of EMP by ENAME


• Source is EMP table.
• Create a target table EMP_SORTER_EXAMPLE in target designer. Structure same as EMP
table.
• Create the shortcuts in your folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_sorter_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select Sorter from list. Give name and click Create.
Now click done.
6. Pass ports from SQ_EMP to Sorter Transformation.
7. Edit Sorter Transformation. Go to Ports Tab
8. Select ENAME as sort key. CHECK mark on KEY in front of ENAME.
9. Click Properties Tab and Select Properties as needed.
10. Click Apply -> Ok.
11. Drag target table now.
12. Connect the output ports from Sorter to target table.
13. Click Mapping -> Validate
14. Repository -> Save
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all tables.
Sample Sorter Mapping :

Performance Tuning:
The Sorter transformation is used to sort the input data.
1. While using the Sorter transformation, configure the Sorter cache size to be larger than the
input data size.
2. At the Sorter transformation, use hash auto-keys partitioning or hash user-keys partitioning.
Rank Transformation
• Active and connected transformation
The Rank transformation allows us to select only the top or bottom rank of data. It allows us
to select a group of top or bottom values, not just one value.
During the session, the Power Center Server caches input data until it can perform the rank
calculations.

Rank Transformation Properties :


• Cache Directory where cache will be made.
• Top/Bottom Rank as per need
• Number of Ranks Ex: 1, 2 or any number
• Case Sensitive Comparison can be checked if needed
• Rank Data Cache Size can be set
• Rank Index Cache Size can be set
Ports in a Rank Transformation:

Port  Number Required  Description
I     1 minimum        Port to receive data from another transformation.
O     1 minimum        Port we want to pass to another transformation.
V     Not needed       Can be used to store values or calculations to use in an expression.
R     Only 1           Rank port. Rank is calculated according to it. The Rank port is an
                       input/output port. We must link the Rank port to another
                       transformation. Example: Total Salary

Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation. The
Power Center Server uses the Rank Index port to store the ranking position for each row in a
group.
For example, if we create a Rank transformation that ranks the top five salaried employees,
the rank index numbers the employees from 1 to 5.
• The RANKINDEX is an output port only.
• We can pass the rank index to another transformation in the mapping or directly to a target.
• We cannot delete or edit it.

Defining Groups

Rank transformation allows us to group information. For example: If we want to select the
top 3 salaried employees of each Department, we can define a group for Department.
• By defining groups, we create one set of ranked rows for each group.
• We define a group in Ports tab. Click the Group By for needed port.
• We cannot Group By on port which is also Rank Port.
1) Example: Finding Top 5 Salaried Employees
• EMP will be source table.
• Create a target table EMP_RANK_EXAMPLE in target designer. Structure should be same
as EMP table. Just add one more port Rank_Index to store RANK INDEX.
• Create the shortcuts in your folder.

Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_rank_example
4. Drag EMP from source in mapping.
5. Create an EXPRESSION transformation to calculate TOTAL_SAL.
6. Click Transformation -> Create -> Select RANK from list. Give name and click Create.
Now click done.
7. Pass ports from Expression to Rank Transformation.
8. Edit Rank Transformation. Go to Ports Tab
9. Select TOTAL_SAL as rank port. Check R type in front of TOTAL_SAL.
10. Click Properties Tab and Select Properties as needed.
11. Set Top/Bottom to Top and Number of Ranks to 5.
12. Click Apply -> Ok.
13. Drag target table now.
14. Connect the output ports from Rank to target table.
15. Click Mapping -> Validate
16. Repository -> Save
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all tables.

2) Example: Finding Top 2 Salaried Employees for every DEPARTMENT

• Open the mapping made above. Edit Rank Transformation.


• Go to Ports Tab. Select Group By for DEPTNO.
• Go to Properties tab. Set Number of Ranks as 2.
• Click Apply -> Ok.
• Mapping -> Validate and Repository Save.
Refresh the session by double clicking. Save the changes and run the workflow to see the new
result.

Sample Rank Mapping

RANK CACHE

When the Power Center Server runs a session with a Rank transformation, it compares an
input row with rows in the data cache. If the input row out-ranks a stored row, the Power
Center Server replaces the stored row with the input row.
Example: Power Center caches the first 5 rows if we are finding the top 5 salaried employees.
When the 6th row is read, it compares it with the 5 rows in the cache and places it in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are using Group By
on DEPTNO, then this cache stores values 10, 20, 30 etc.
• All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
It holds row data until the Power Center Server completes the ranking and is generally larger
than the index cache. To reduce the data cache size, connect only the necessary input/output
ports to subsequent transformations.
• All variable ports (if any), the Rank port, and all ports going out from the Rank
transformation are stored in RANK DATA CACHE.
• Example: All ports except DEPTNO in our mapping example.
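The compare-and-replace behaviour of the rank cache can be sketched with a small heap per group. This is a minimal Python model under stated assumptions: the function name rank_top_n is hypothetical, ties are resolved by arrival order (the real tie handling is not described here), and cache sizes are ignored.

```python
import heapq
from collections import defaultdict


def rank_top_n(rows, rank_port, n, group_by=None):
    """Keep the top-n rows per group and attach a RANKINDEX to each."""
    # Group key (index cache role) -> min-heap of (rank value, arrival order, row)
    cache = defaultdict(list)
    for order, row in enumerate(rows):
        key = row[group_by] if group_by else None
        entry = (row[rank_port], order, row)
        heap = cache[key]
        if len(heap) < n:
            heapq.heappush(heap, entry)       # cache not full yet: store the row
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)    # input row out-ranks the weakest stored row

    output = []
    for key, heap in cache.items():
        ranked = sorted(heap, key=lambda e: (-e[0], e[1]))
        for index, (_, _, row) in enumerate(ranked, start=1):
            output.append({**row, "RANKINDEX": index})
    return output


emp = [
    {"ENAME": "A", "DEPTNO": 10, "TOTAL_SAL": 1000},
    {"ENAME": "B", "DEPTNO": 10, "TOTAL_SAL": 3000},
    {"ENAME": "C", "DEPTNO": 10, "TOTAL_SAL": 2000},
    {"ENAME": "D", "DEPTNO": 20, "TOTAL_SAL": 500},
]

top2_per_dept = rank_top_n(emp, "TOTAL_SAL", 2, group_by="DEPTNO")
```

Here employee A is cached first and later replaced by C, which is the replacement step described above.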
Source Qualifier Transformation

• Active and Connected Transformation.


• The Source Qualifier transformation represents the rows that the Power Center Server reads
when it runs a session.
• It is the only transformation that is not reusable.
• Default transformation except in case of XML or COBOL files.

Tasks performed by Source Qualifier:

• Join data originating from the same source database:


We can join two or more tables with primary key-foreign key relationships by linking the
sources to one Source Qualifier transformation.
• Filter rows when the Power Center Server reads source data:
If we include a filter condition, the Power Center Server adds a WHERE clause to the
default query.
• Specify an outer join rather than the default inner join:
If we include a user-defined join, the Power Center Server replaces the join information
specified by the metadata in the SQL query.
• Specify sorted ports: If we specify a number for sorted ports, the Power Center Server adds
an ORDER BY clause to the default SQL query.
• Select only distinct values from the source: If we choose Select Distinct, the Power Center
Server adds a SELECT DISTINCT statement to the default SQL query.
• Create a custom query to issue a special SELECT statement for the Power Center Server to
read source data: For example, we might use a custom query to perform aggregate
calculations.
All of the above can be configured in the Properties tab of the Source Qualifier transformation.
SAMPLE MAPPING TO BE MADE:
• Source will be EMP and DEPT tables.
• Create the target table as shown in the picture above.
• Create shortcuts in your folder as needed.

Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_SQ_example
4. Drag EMP, DEPT, Target.
5. Right Click SQ_EMP and Select Delete from the mapping.
6. Right Click SQ_DEPT and Select Delete from the mapping.
7. Click Transformation -> Create -> Select Source Qualifier from List -> Give Name ->
Click Create
8. Select EMP and DEPT both. Click OK.
9. Link all as shown in above picture.
10. Edit SQ -> Properties Tab -> Open User defined Join -> Give Join condition
EMP.DEPTNO=DEPT.DEPTNO. Click Apply -> OK
11. Mapping -> Validate
12. Repository -> Save
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all tables.

SQ PROPERTIES TAB

1) SOURCE FILTER:

We can enter a source filter to reduce the number of rows the Power Center Server queries.
Note: When we enter a source filter in the session properties, we override the customized
SQL query in the Source Qualifier transformation.
Steps:
1. In the Mapping Designer, open a Source Qualifier transformation.
2. Select the Properties tab.
3. Click the Open button in the Source Filter field.
4. In the SQL Editor Dialog box, enter the filter. Example: EMP.SAL>2000
5. Click OK.
Validate the mapping. Save it. Now refresh session and save the changes. Now run the
workflow and see output.

2) NUMBER OF SORTED PORTS:

When we use sorted ports, the Power Center Server adds the ports to the ORDER BY clause
in the default query.
By default it is 0. If we change it to 1, then the data will be sorted by column that is at the top
in SQ. Example: DEPTNO in above figure.
• If we want to sort as per ENAME, move ENAME to top.
• If we change it to 2, then data will be sorted by top two columns.
Steps:
1. In the Mapping Designer, open a Source Qualifier transformation.
2. Select the Properties tab.
3. Enter any number instead of zero for Number of Sorted ports.
4. Click Apply -> Click OK.
Validate the mapping. Save it. Now refresh session and save the changes. Now run the
workflow and see output.

3) SELECT DISTINCT:

If we want the Power Center Server to select unique values from a source, we can use the
Select Distinct option.
• Just check the option in Properties tab to enable it.
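To see how the Source Filter, Number of Sorted Ports and Select Distinct settings shape the default query, the following Python sketch assembles a query string from them. It is only an approximation for illustration: build_default_query is a hypothetical helper, the exact SQL text generated by the server may differ, and only the equi-join form of a user-defined join is handled.

```python
def build_default_query(ports, tables, source_filter=None,
                        user_defined_join=None, sorted_ports=0, distinct=False):
    """Assemble a SELECT statement from Source Qualifier style settings.

    ports lists the connected ports in the order they appear in the
    Source Qualifier, e.g. ["DEPT.DEPTNO", "DEPT.DNAME", "EMP.ENAME"].
    """
    select = "SELECT DISTINCT" if distinct else "SELECT"
    query = f"{select} {', '.join(ports)} FROM {', '.join(tables)}"

    # An equi-join condition and a source filter both end up in the WHERE clause.
    conditions = [c for c in (user_defined_join, source_filter) if c]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)

    # Number of Sorted Ports = N sorts by the top N ports.
    if sorted_ports > 0:
        query += " ORDER BY " + ", ".join(ports[:sorted_ports])
    return query


plain = build_default_query(["EMP.SAL", "EMP.DEPTNO"], ["EMP"])

tuned = build_default_query(
    ["DEPT.DEPTNO", "DEPT.DNAME", "EMP.ENAME"],
    ["EMP", "DEPT"],
    source_filter="EMP.SAL>2000",
    user_defined_join="DEPT.DEPTNO=EMP.DEPTNO",
    sorted_ports=1,
    distinct=True,
)
```

Moving a different port to the top of the list changes the ORDER BY column, which matches the advice to move ENAME to the top when sorting by ENAME.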

4) PRE and POST SQL Commands

• The Power Center Server runs pre-session SQL commands against the source database
before it reads the source.
• It runs post-session SQL commands against the source database after it writes to the target.
• Use a semi-colon (;) to separate multiple statements.

5) USER DEFINED JOINS

Entering a user-defined join is similar to entering a custom SQL query. However, we only
enter the contents of the WHERE clause, not the entire query.
• We can specify equi join, left outer join and right outer join only. We cannot specify full
outer join. To use full outer join, we need to write an SQL query.
Steps:
1. Open the Source Qualifier transformation, and click the Properties tab.
2. Click the Open button in the User Defined Join field. The SQL Editor Dialog Box appears.
3. Enter the syntax for the join.
4. Click OK -> Again Ok.
Validate the mapping. Save it. Now refresh session and save the changes. Now run the
workflow and see output.
Join Type          Syntax
Equi Join          DEPT.DEPTNO=EMP.DEPTNO
Left Outer Join    {EMP LEFT OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}
Right Outer Join   {EMP RIGHT OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}

6) SQL QUERY

For relational sources, the Power Center Server generates a query for each Source Qualifier
transformation when it runs a session. The default query is a SELECT statement for each
source column used in the mapping. In other words, the Power Center Server reads only the
columns that are connected to another transformation.
In mapping above, we are passing only SAL and DEPTNO from SQ_EMP to Aggregator
transformation. Default query generated will be:
• SELECT EMP.SAL, EMP.DEPTNO FROM EMP
Viewing the Default Query
1. Open the Source Qualifier transformation, and click the Properties tab.
2. Open SQL Query. The SQL Editor displays.
3. Click Generate SQL.
4. The SQL Editor displays the default query the Power Center Server uses to select source
data.
5. Click Cancel to exit.
Note: If we do not cancel the SQL query, the Power Center Server overrides the default query
with the custom SQL query.
We can enter an SQL statement supported by our source database. Before entering the query,
connect all the input and output ports we want to use in the mapping.
Example: Since we can’t use full outer join in a user defined join,
we can write an SQL query for FULL OUTER JOIN:
SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC, EMP.EMPNO, EMP.ENAME,
EMP.JOB, EMP.SAL, EMP.COMM, EMP.DEPTNO FROM EMP FULL OUTER JOIN
DEPT ON DEPT.DEPTNO=EMP.DEPTNO WHERE SAL>2000
• We also added a WHERE clause. We can enter more conditions and write more complex
SQL.
We can write any query. We can join as many tables in one query as required if all are in the
same database. It is very handy and used in most projects.

Important Points:

• When creating a custom SQL query, the SELECT statement must list the port names in the
order in which they appear in the transformation.
Example: DEPTNO is the top column and DNAME is second in our SQ mapping.
So when we write the SQL query, the SELECT statement must name DEPTNO first, DNAME
second and so on: SELECT DEPT.DEPTNO, DEPT.DNAME
• Once we have written a custom query like the one above, this query will always be used to
fetch data from the database. In our example, we used WHERE SAL>2000. If we now use
Source Filter and give the condition SAL>1000 or any other, it will not work. Informatica
will always use the custom query only.
• Make sure to test the query in the database before using it in SQL Query. If the query does
not run in the database, it won’t work in Informatica either.
• Also always connect to the database and validate the SQL in the SQL query editor.
Aggregator Transformation

• Connected and Active Transformation


• The Aggregator transformation allows us to perform aggregate calculations, such as
averages and sums.
• Aggregator transformation allows us to perform calculations on groups.
Components of the Aggregator Transformation
1. Aggregate expression
2. Group by port
3. Sorted Input
4. Aggregate cache

1) Aggregate Expressions

• Entered in an output port.


• Can include non-aggregate expressions and conditional clauses.
The transformation language includes the following aggregate functions:
• AVG, COUNT, MAX, MIN, SUM
• FIRST, LAST
• MEDIAN, PERCENTILE, STDDEV, VARIANCE

Single Level Aggregate Function: MAX(SAL)


Nested Aggregate Function: MAX( COUNT( ITEM ))

Nested Aggregate Functions

• In an Aggregator transformation, there can be multiple single-level functions or multiple
nested functions.
• An Aggregator transformation cannot have both types of functions together.
• An aggregate expression can include one aggregate function nested within another
aggregate function.
• MAX( COUNT( ITEM )) is correct.
• MIN(MAX( COUNT( ITEM ))) is not correct, because it nests more than one level deep.

Conditional Clauses

We can use conditional clauses in the aggregate expression to reduce the number of rows
used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or
FALSE.
• SUM( COMMISSION, COMMISSION > QUOTA )
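The effect of the conditional clause can be illustrated with a short Python sketch. The helper conditional_sum and the sample figures are assumptions for illustration only; the point is that rows failing the condition are left out of the sum.

```python
def conditional_sum(rows, value_port, condition):
    """Sum value_port over only the rows for which condition(row) is true."""
    return sum(row[value_port] for row in rows if condition(row))


sales = [
    {"COMMISSION": 500, "QUOTA": 300},   # counted: 500 > 300
    {"COMMISSION": 200, "QUOTA": 300},   # skipped: 200 is not above quota
    {"COMMISSION": 400, "QUOTA": 100},   # counted: 400 > 100
]

# Rough equivalent of SUM( COMMISSION, COMMISSION > QUOTA )
total = conditional_sum(sales, "COMMISSION", lambda r: r["COMMISSION"] > r["QUOTA"])
```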

Non-Aggregate Functions

We can also use non-aggregate functions in the aggregate expression.


• IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )

2) Group By Ports

• Indicates how to create groups.


• When grouping data, the Aggregator transformation outputs the last row of each group
unless otherwise specified.
The Aggregator transformation allows us to define groups for aggregations, rather than
performing the aggregation across all input data.
For example, we can find Maximum Salary for every Department.
• In Aggregator Transformation, Open Ports tab and select Group By as needed.
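A minimal Python sketch of grouping by DEPTNO and computing several aggregates per group follows. The function name aggregate_salary_by_dept and the sample rows are hypothetical; the dictionary of groups plays the role that the index and data caches play in the transformation.

```python
from collections import defaultdict


def aggregate_salary_by_dept(rows):
    """Return one output row per DEPTNO with MAX, MIN, AVG and SUM of SAL."""
    groups = defaultdict(list)           # DEPTNO -> list of SAL values
    for row in rows:
        groups[row["DEPTNO"]].append(row["SAL"])

    return [
        {
            "DEPTNO": deptno,
            "MAX_SAL": max(salaries),
            "MIN_SAL": min(salaries),
            "AVG_SAL": sum(salaries) / len(salaries),
            "SUM_SAL": sum(salaries),
        }
        for deptno, salaries in groups.items()
    ]


emp = [
    {"DEPTNO": 10, "SAL": 1000},
    {"DEPTNO": 20, "SAL": 800},
    {"DEPTNO": 10, "SAL": 3000},
]

summary = aggregate_salary_by_dept(emp)
```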

3) Using Sorted Input

• Use to improve session performance.


• To use sorted input, we must pass data to the Aggregator transformation sorted by group by
port, in ascending or descending order.
• When we use this option, we tell Aggregator that data coming to it is already sorted.
• We check the Sorted Input Option in Properties Tab of the transformation.
• If the option is checked but we are not passing sorted data to the transformation, then the
session fails.
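Why sorted input helps can be sketched as a streaming aggregation: when rows arrive sorted by the group by port, each group can be finished and emitted as soon as the key changes, so only one group is held at a time. The Python below is an illustrative model that assumes ascending order; max_salary_sorted_input is a hypothetical name, and raising an error stands in for the session failure on unsorted data.

```python
from itertools import groupby
from operator import itemgetter


def max_salary_sorted_input(sorted_rows):
    """Yield (DEPTNO, MAX_SAL) per group, holding one group at a time."""
    previous = None
    for deptno, group in groupby(sorted_rows, key=itemgetter("DEPTNO")):
        if previous is not None and deptno < previous:
            # The input was promised to be sorted but is not.
            raise ValueError("input is not sorted by DEPTNO")
        previous = deptno
        yield deptno, max(row["SAL"] for row in group)


sorted_emp = [
    {"DEPTNO": 10, "SAL": 1000},
    {"DEPTNO": 10, "SAL": 3000},
    {"DEPTNO": 20, "SAL": 800},
]

result = list(max_salary_sorted_input(sorted_emp))
```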
4) Aggregator Caches

• The Power Center Server stores data in the aggregate cache until it completes aggregate
calculations.
• It stores group values in an index cache and row data in the data cache. If the Power Center
Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with
sorted ports. It does not use cache memory. We do not need to configure cache memory for
Aggregator transformations that use sorted ports.

1) Aggregator Index Cache:

The index cache holds group information from the group by ports. If we are using Group By
on DEPTNO, then this cache stores values 10, 20, 30 etc.
• All Group By Columns are in AGGREGATOR INDEX CACHE. Ex. DEPTNO

2) Aggregator Data Cache:

DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.


Columns in Data Cache:
• Variable ports if any
• Non group by input/output ports.
• Non group by input ports used in non-aggregate output expression.
• Port containing aggregate function

1) Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table.
• EMP will be source table.
• Create a target table EMP_AGG_EXAMPLE in target designer. Table should contain
DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
• Create the shortcuts in your folder.

Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_agg_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select AGGREGATOR from list. Give name and click
Create. Now click done.
6. Pass SAL and DEPTNO only from SQ_EMP to AGGREGATOR Transformation.
7. Edit AGGREGATOR Transformation. Go to Ports Tab
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL,
OUT_SUM_SAL
9. Open Expression Editor one by one for all output ports and give the
calculations. Ex: MAX(SAL), MIN(SAL), AVG(SAL),SUM(SAL)
10. Click Apply -> Ok.
11. Drag target table now.
12. Connect the output ports from Aggregator to target table.
13. Click Mapping -> Validate
14. Repository -> Save
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all tables.
Joiner Transformation
• Connected and Active Transformation
• Used to join source data from two related heterogeneous sources residing in different
locations or file systems. Or, we can join data from the same source.
• If we need to join 3 tables, then we need 2 Joiner Transformations.
• The Joiner transformation joins two sources with at least one matching port. The Joiner
transformation uses a condition that matches one or more pairs of ports between the two
sources.

Example: To join EMP and DEPT tables.

• EMP and DEPT will be source table.


• Create a target table JOINER_EXAMPLE in target designer. Table should contain all ports
of EMP table plus DNAME and LOC as shown below.
• Create the shortcuts in your folder.

Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_joiner_example
4. Drag EMP, DEPT, and Target. Create Joiner Transformation. Link as shown below.

5. Specify the join condition in Condition tab. See the steps below.
6. Set Master in Ports tab. See the steps below.
7. Mapping -> Validate
8. Repository -> Save.
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all tables.

JOIN CONDITION:
The join condition contains ports from both input sources that must match for the Power
Center Server to join two rows.
Example: DEPTNO=DEPTNO1 in above.
1. Edit Joiner Transformation -> Condition Tab
2. Add condition
• We can add as many conditions as needed.
• Only = operator is allowed.
If we join Char and Varchar data types, the Power Center Server counts any spaces that pad
Char values as part of the string. So if you try to join the following:
Char (40) = “abcd” and Varchar (40) = “abcd”
Then the Char value is “abcd” padded with 36 blank spaces, and the Power Center Server
does not join the two fields because the Char field contains trailing spaces.
Note: The Joiner transformation does not match null values.
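The two matching rules above (padded Char values and nulls) can be modelled in a few lines of Python. This is a sketch under assumptions: join_keys_match is a hypothetical helper, and Python strings stand in for the Char and Varchar values.

```python
def join_keys_match(master_value, detail_value):
    """Equality test used for the join condition in this model."""
    if master_value is None or detail_value is None:
        return False              # null values are never matched
    return master_value == detail_value


char_value = "abcd".ljust(40)     # Char(40): "abcd" padded with 36 trailing spaces
varchar_value = "abcd"            # Varchar(40): stored without padding

padded_match = join_keys_match(char_value, varchar_value)
trimmed_match = join_keys_match(char_value.rstrip(), varchar_value)
null_match = join_keys_match(None, None)
```

In this model, trimming the trailing spaces before the comparison is one way to make the two values match.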
MASTER and DETAIL TABLES
In Joiner, one table is called the MASTER and the other the DETAIL.
• MASTER table is always cached. We can make any table the MASTER.
• Edit Joiner Transformation -> Ports Tab -> Select M for Master table.
The table with fewer rows should be made MASTER to improve performance.
Reason:
• When the Power Center Server processes a Joiner transformation, it reads rows from both
sources concurrently and builds the index and data cache based on the master rows. So the
table with fewer rows is read quickly, and the cache can be built while the table with more
rows is still being read.
• The fewer unique rows in the master, the fewer iterations of the join comparison occur,
which speeds the join process.

JOINER TRANSFORMATION PROPERTIES TAB


• Case-Sensitive String Comparison: If selected, the Power Center Server uses case-sensitive
string comparisons when performing joins on string columns.
• Cache Directory: Specifies the directory used to cache master or detail rows and the index
to these rows.
• Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
• Tracing Level
• Joiner Data Cache Size
• Joiner Index Cache Size
• Sorted Input

JOIN TYPES

In SQL, a join is a relational operator that combines data from multiple tables into a single
result set. The Joiner transformation acts in much the same manner, except that tables can
originate from different databases or flat files.

Types of Joins:

• Normal
• Master Outer
• Detail Outer
• Full Outer
Note: A normal or master outer join performs faster than a full outer or detail outer join.
Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we have
DEPTNO 10, 20, 30 and 40. DEPT will be MASTER table as it has fewer rows.
Normal Join:
With a normal join, the Power Center Server discards all rows of data from the master and
detail source that do not match, based on the condition.
• All employees of 10, 20 and 30 will be there as only they are matching.

Master Outer Join:

This join keeps all rows of data from the detail source and the matching rows from the master
source. It discards the unmatched rows from the master source.
• All data of employees of 10, 20 and 30 will be there.
• Employees of DEPTNO 50 will also be there, and the corresponding DNAME and LOC columns
will be NULL.

Detail Outer Join:

This join keeps all rows of data from the master source and the matching rows from the detail
source. It discards the unmatched rows from the detail source.
• All employees of 10, 20 and 30 will be there.
• There will be one record for DEPTNO 40 and corresponding data of EMP columns will be
NULL.

Full Outer Join:

A full outer join keeps all rows of data from both the master and detail sources.
• All data of employees of 10, 20 and 30 will be there.
• Employees of DEPTNO 50 will also be there, and the corresponding DNAME and LOC columns
will be NULL.
• There will be one record for DEPTNO 40 and the corresponding EMP columns will be
NULL.
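The four join types can be compared with a small Python model that caches the master rows and streams the detail rows. It is a sketch, not the server's implementation: the function joiner is hypothetical, both inputs are assumed non-empty, and both sources share the key name DEPTNO for simplicity (the mapping itself uses DEPTNO and DEPTNO1).

```python
def joiner(master, detail, key, join_type="normal"):
    """join_type: "normal", "master_outer", "detail_outer" or "full_outer"."""
    # Cache built from the master rows: key value -> list of master rows.
    cache = {}
    for m in master:
        cache.setdefault(m[key], []).append(m)

    master_nulls = {column: None for column in master[0]}
    detail_nulls = {column: None for column in detail[0]}
    matched_keys = set()
    output = []

    for d in detail:
        matches = cache.get(d[key], [])
        if matches:
            matched_keys.add(d[key])
            for m in matches:
                output.append({**m, **d})
        elif join_type in ("master_outer", "full_outer"):
            # Keep the unmatched detail row; master columns become NULL.
            output.append({**master_nulls, **d})

    if join_type in ("detail_outer", "full_outer"):
        for m in master:
            if m[key] not in matched_keys:
                # Keep the unmatched master row; detail columns become NULL.
                output.append({**detail_nulls, **m})
    return output


dept = [{"DEPTNO": n, "DNAME": f"DEPT_{n}"} for n in (10, 20, 30, 40)]   # master
emp = [{"DEPTNO": n, "ENAME": f"EMP_{n}"} for n in (10, 20, 30, 50)]     # detail

normal = joiner(dept, emp, "DEPTNO", "normal")
master_outer = joiner(dept, emp, "DEPTNO", "master_outer")
detail_outer = joiner(dept, emp, "DEPTNO", "detail_outer")
full_outer = joiner(dept, emp, "DEPTNO", "full_outer")
```

With this sample data, master outer keeps the DEPTNO 50 employee with a NULL DNAME, and detail outer keeps DEPTNO 40 with a NULL ENAME, as in the example above.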

USING SORTED INPUT

• Use to improve session performance.


• To use sorted input, we must pass data to the Joiner transformation sorted by the ports that
are used in Join Condition.
• We check the Sorted Input Option in Properties Tab of the transformation.
• If the option is checked but we are not passing sorted data to the transformation, then the
session fails.
• We can use SORTER to sort data or Source Qualifier in case of Relational tables.

JOINER CACHES

Joiner always caches the MASTER table. We cannot disable caching. It builds Index cache
and Data Cache based on MASTER table.
1) Joiner Index Cache:
• All Columns of MASTER table used in Join condition are in JOINER INDEX CACHE.
• Example: DEPTNO in our mapping.
2) Joiner Data Cache:
• Master column not in join condition and used for output to other transformation or target
table are in Data Cache.
• Example: DNAME and LOC in our mapping example.

Performance Tuning:

• Perform joins in a database when possible.


• Join sorted data when possible.
• For a sorted Joiner transformation, designate as the master source the source with fewer
duplicate key values.
• Joiner can't be used in following conditions:
1. Either input pipeline contains an Update Strategy transformation.
2. We connect a Sequence Generator transformation directly before the Joiner transformation.

Sequence Generator Transformation

Passive and Connected Transformation.


The Sequence Generator transformation generates numeric values.
Use the Sequence Generator to create unique primary key values, replace missing primary
keys, or cycle through a sequential range of numbers.
We use it mostly to generate surrogate keys in a DWH environment. When we want to
maintain history, we need a key other than the Primary Key to uniquely identify the record.
So we create a sequence 1, 2, 3, 4 and so on, and use this sequence as the key.

Example: If EMPNO is the key, we can keep only one record in target and can’t maintain
history. So we use Surrogate key as Primary key and not EMPNO.

Sequence Generator Ports :

The Sequence Generator transformation provides two output ports:


NEXTVAL and CURRVAL.

We cannot edit or delete these ports.


Likewise, we cannot add ports to the transformation.
NEXTVAL:

Use the NEXTVAL port to generate sequence numbers by connecting it to a transformation
or target.

For example, we might connect NEXTVAL to two target tables in a mapping to generate
unique primary key values.

Sequence in Table 1 will be generated first. When table 1 has been loaded, only then
Sequence for table 2 will be generated.

CURRVAL:
CURRVAL is NEXTVAL plus the Increment By value.

We typically only connect the CURRVAL port when the NEXTVAL port is Already
connected to a downstream transformation.
If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration
Service passes a constant value for each row.
When we connect the CURRVAL port in a Sequence Generator transformation, the
Integration Service processes one row in each block.
We can optimize performance by connecting only the NEXTVAL port in a mapping.

Example: To use Sequence Generator transformation

EMP will be source.


Create a target EMP_SEQ_GEN_EXAMPLE in shared folder. Structure same as EMP. Add
two more ports NEXT_VALUE and CURR_VALUE to the target table.
Create shortcuts as needed.
Creating Mapping:

1. Open folder where we want to create the mapping.

2. Click Tools -> Mapping Designer.

3. Click Mapping-> Create-> Give name. Ex: m_seq_gen_example

4. Drag EMP and Target table.


5. Connect all ports from SQ_EMP to target table.

6. Transformation -> Create -> Select Sequence Generator from list -> Create -> Done

7. Connect NEXTVAL and CURRVAL from Sequence Generator to NEXT_VALUE and CURR_VALUE in the target.

8. Validate Mapping

9. Repository -> Save

Create Session and then workflow.


Give connection information for all tables.
Run workflow and see the result in table.
Sequence Generator Properties:

Setting (Required/Optional): Description

• Start Value (Required): Start value of the generated sequence that we want the Integration
Service to use if we use the Cycle option. Default is 0.
• Increment By (Required): Difference between two consecutive values from the NEXTVAL port.
• End Value (Optional): Maximum value the Integration Service generates.
• Current Value (Optional): First value in the sequence. If the Cycle option is used, the value
must be greater than or equal to the Start Value and less than the End Value.
• Cycle (Optional): If selected, the Integration Service cycles through the sequence range.
Ex: With Start Value 1 and End Value 10, the sequence runs from 1 to 10 and then starts
again from 1.
• Reset (Optional): By default, the last value of the sequence during a session is saved to the
repository, and the next session starts the sequence from the saved value. If Reset is
selected, the Integration Service generates values based on the original current value
for each session.

Points to Ponder:

1) Current Value is 1 and End Value is 10, Cycle option not selected. There are 17 records in
the source.
a) In this case the session will fail.
2) We connect only CURRVAL.
a) The value will be the same for all records.
3) Current Value is 1 and End Value is 10, Cycle option selected. Start Value is 0.
There are 17 records in the source.
a) Sequence: 1, 2, ... 10, then 0, 1, 2, 3, ...
To make the above sequence run as 1-10, 1-10, give Start Value as 1. Start Value is used
along with the Cycle option only.
4) Current Value is 1 and End Value is 10, Cycle option selected. Start Value is 1.
There are 17 records in the source. The session runs and generates 1-10, then 1-7. 7 will be
saved in the repository. If we run the session again, the sequence will start from 8.
Use the Reset option if you want to start the sequence from the Current Value every time.
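The scenarios above can be replayed with a small Python model of the NEXTVAL logic. It is a sketch under assumptions: the class SequenceGenerator and its attribute names are hypothetical, an exception stands in for the session failure, and cached values, Reset and the repository are not modelled (the object simply remembers the next value to hand out).

```python
class SequenceGenerator:
    """Toy model of Start Value, Increment By, End Value, Current Value and Cycle."""

    def __init__(self, start_value=0, increment_by=1, end_value=2147483647,
                 current_value=1, cycle=False):
        self.start_value = start_value
        self.increment_by = increment_by
        self.end_value = end_value
        self.current_value = current_value
        self.cycle = cycle

    def nextval(self):
        if self.current_value > self.end_value:
            if not self.cycle:
                # Stands in for the session failing at the end value.
                raise OverflowError("end value reached and Cycle is not selected")
            self.current_value = self.start_value   # wrap around to Start Value
        value = self.current_value
        self.current_value += self.increment_by
        return value


# Point 3: Current Value 1, End Value 10, Cycle selected, Start Value 0, 17 records.
gen_start_0 = SequenceGenerator(start_value=0, end_value=10, current_value=1, cycle=True)
with_start_0 = [gen_start_0.nextval() for _ in range(17)]

# Point 4: same settings, but Start Value 1.
gen_start_1 = SequenceGenerator(start_value=1, end_value=10, current_value=1, cycle=True)
with_start_1 = [gen_start_1.nextval() for _ in range(17)]
next_run_first_value = gen_start_1.nextval()
```

Creating the generator with cycle=False and asking for 17 values raises the error on the 11th call, which corresponds to point 1.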
Define the Properties available in Sequence Generator transformation in brief.

Ans.

Sequence Generator Properties: Description

• Start Value: Start value of the generated sequence that we want the Integration Service
to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back
to this value when it reaches the end value. Default is 0.
• Increment By: Difference between two consecutive values from the NEXTVAL port.
Default is 1.
• End Value: Maximum value generated by SeqGen. After reaching this value the session
will fail if the sequence generator is not configured to cycle. Default is 2147483647.
• Current Value: Current value of the sequence. Enter the value we want the Integration
Service to use as the first value in the sequence. Default is 1.
• Cycle: If selected, when the Integration Service reaches the configured end value for the
sequence, it wraps around and starts the cycle again, beginning with the configured
Start Value.
• Number of Cached Values: Number of sequential values the Integration Service caches at
a time. Default value for a standard Sequence Generator is 0. Default value for a reusable
Sequence Generator is 1,000.
• Reset: Restarts the sequence at the current value each time a session runs. This option is
disabled for reusable Sequence Generator transformations.
SQL Transformation
You can pass the database connection information to the SQL transformation as input data at
run time. The transformation processes external SQL scripts or SQL queries that you create in
an SQL editor. The SQL transformation processes the query and returns rows and database
errors.

When you create an SQL transformation, you configure the following options:

• Mode. The SQL transformation runs in one of the following modes:
• Script mode. The SQL transformation runs ANSI SQL scripts that are externally located.
You pass a script name to the transformation with each input row. The SQL transformation
outputs one row for each input row.

• Query mode. The SQL transformation executes a query that you define in a query editor.
You can pass strings or parameters to the query to define dynamic queries or change the
selection parameters. You can output multiple rows when the query has a SELECT statement.
• Passive or active transformation. The SQL transformation is an active transformation by
default. You can configure it as a passive transformation when you create the transformation.

• Database type. The type of database the SQL transformation connects to.

• Connection type. Pass database connection information to the SQL transformation or use a
connection object.

Script Mode

An SQL transformation running in script mode runs SQL scripts from text files. You pass
each script file name from the source to the SQL transformation Script Name port. The script
file name contains the complete path to the script file.
When you configure the transformation to run in script mode, you create a passive
transformation. The transformation returns one row for each input row. The output row
contains results of the query and any database error.

Rules and Guidelines for Script Mode

Use the following rules and guidelines for an SQL transformation that runs in script mode:
• You can use a static or dynamic database connection with script mode.
• To include multiple query statements in a script, you can separate them with a semicolon.
• You can use mapping variables or parameters in the script file name.
• The script code page defaults to the locale of the operating system. You can change the
locale of the script.
• The script file must be accessible by the Integration Service. The Integration Service must
have read permissions on the directory that contains the script.
• The Integration Service ignores the output of any SELECT statement you include in the
SQL script. The SQL transformation in script mode does not output more than one row of
data for each input row.
• You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in
the script.
• You cannot use nested scripts where the SQL script calls another SQL script.
• A script cannot accept run-time arguments.

Query Mode

• When you configure the SQL transformation to run in query mode, you create an active
transformation.
• When an SQL transformation runs in query mode, it executes an SQL query that you define
in the transformation.
• You pass strings or parameters to the query from the transformation input ports to change
the query statement or the query data.
You can create the following types of SQL queries in the SQL transformation:

• Static SQL query. The query statement does not change, but you can use query parameters
to change the data. The Integration Service prepares the query once and runs the query for all
input rows.

• Dynamic SQL query. You can change the query statements and the data. The Integration
Service prepares a query for each input row.
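
The difference between a static and a dynamic query can be illustrated outside Informatica. The sketch below uses Python's built-in sqlite3 module as a stand-in database; the emp table, the input rows and the table_name field are made up for the example. The static query keeps one statement text and only binds a new value per input row, while the dynamic query substitutes a string into the statement, so a new statement is built for each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("SMITH", 20), ("ALLEN", 30), ("WARD", 30)])

# Hypothetical input rows: one deptno value and one table name per row.
input_rows = [{"deptno": 20, "table_name": "emp"},
              {"deptno": 30, "table_name": "emp"}]

# Static query: the statement text never changes; only the bound value does.
STATIC_SQL = "SELECT ename FROM emp WHERE deptno = ? ORDER BY ename"
static_results = [
    [ename for (ename,) in conn.execute(STATIC_SQL, (row["deptno"],))]
    for row in input_rows
]

# Dynamic query: string substitution changes the statement itself, so a new
# statement is built for every input row. The substituted text is checked
# against an allowlist because it becomes part of the SQL.
ALLOWED_TABLES = {"emp"}
dynamic_results = []
for row in input_rows:
    if row["table_name"] not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {row['table_name']}")
    sql = f"SELECT COUNT(*) FROM {row['table_name']} WHERE deptno = ?"
    (count,) = conn.execute(sql, (row["deptno"],)).fetchone()
    dynamic_results.append(count)

print(static_results)   # [['SMITH'], ['ALLEN', 'WARD']]
print(dynamic_results)  # [1, 2]
```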

Rules and Guidelines for Query Mode

Use the following rules and guidelines when you configure the SQL transformation to run in
query mode:
• The number and the order of the output ports must match the number and order of the fields
in the query SELECT clause.
• The native data type of an output port in the transformation must match the data type of the
corresponding column in the database. The Integration Service generates a row error when
the data types do not match.
• When the SQL query contains an INSERT, UPDATE, or DELETE clause, the
transformation returns data to the SQL Error port, the pass-through ports, and the Num Rows
Affected port when it is enabled. If you add output ports, those ports receive NULL data values.
• When the SQL query contains a SELECT statement and the transformation has a pass-
through port, the transformation returns data to the pass-through port whether or not the
query returns database data. The SQL transformation returns a row with NULL data in the
output ports.
• You cannot add the "_output" suffix to output port names that you create.
• You cannot use the pass-through port to return data from a SELECT query.
• When the number of output ports is more than the number of columns in the SELECT
clause, the extra ports receive a NULL value.
• When the number of output ports is less than the number of columns in the SELECT clause,
the Integration Service generates a row error.
• You can use string substitution instead of parameter binding in a query. However, the input
ports must be string data types.

SQL Transformation Properties

After you create the SQL transformation, you can define ports and set attributes in the
following transformation tabs:
• Ports. Displays the transformation ports and attributes that you create on the SQL Ports tab.
• Properties. SQL transformation general properties.
• SQL Settings. Attributes unique to the SQL transformation.
• SQL Ports. SQL transformation ports and attributes.
Note: You cannot update the columns on the Ports tab. When you define ports on the SQL
Ports tab, they display on the Ports tab.

Properties Tab

Configure the SQL transformation general properties on the Properties tab. Some
transformation properties do not apply to the SQL transformation or are not configurable.

Create Mapping :

Step 1: Creating a flat file and importing the source from the flat file.
• Create a text file in Notepad (C:\bikes.txt) containing the script that creates a table named
bikes with three columns and three records in it.
• Create one more text file to hold the path to the bikes script. Inside it, just type
C:\bikes.txt and save it.
• Import the source (the second text file) using Sources -> Import from File. A wizard with
three consecutive windows opens; follow the on-screen instructions to complete the import of
the source.

Step 2: Importing the target and applying the transformation.


• In the same way, go to Targets -> Import from File and select an empty text file named
targetforbikes (one more blank text file, which we create and save under this name
in C:\).
• Create two columns in the target table named report and error.
• We are all set here. Now apply the SQL transformation.
• In the first window that appears when we create the SQL transformation, select script
mode.
• Connect the SQ to the ScriptName port under inputs, and connect the other two fields to the
corresponding output ports.
Snapshot for the above discussed things is given below.

Step 3: Design the work flow and run it.


• Create the task and the work flow using the naming conventions.
• Go to the mappings tab and click on the Source on the left hand pane to specify the path for
the output file.

Step 4: Preview the output data on the target table.

Stored Procedure Transformation


• Passive Transformation
• Connected and Unconnected Transformation
• Stored procedures are stored and run within the database.
A Stored Procedure transformation is an important tool for populating and maintaining
databases. Database administrators create stored procedures to automate tasks that are too
complicated for standard SQL statements.

Use of Stored Procedure in mapping:

• Check the status of a target database before loading data into it.
• Determine if enough space exists in a database.
• Perform a specialized calculation.
• Drop and recreate indexes. This is the most common use in projects.

Data Passes Between IS and Stored Procedure

One of the most useful features of stored procedures is the ability to send data to the stored
procedure and receive data from it. Three types of data pass between the Integration Service
and the stored procedure:

Input/output parameters: Parameters we give as input and the parameters returned from
Stored Procedure.

Return values: Value returned by Stored Procedure if any.

Status codes: Status codes provide error handling for the IS during a workflow. The stored
procedure issues a status code that indicates whether or not the stored procedure completed
successfully. We cannot see this value. The IS uses it to determine whether to continue
running the session or stop.

Specifying when the Stored Procedure Runs

Normal: The stored procedure runs where the transformation exists in the mapping on a row-
by-row basis. We pass some input to procedure and it returns some calculated values.
Connected stored procedures run only in normal mode.

Pre-load of the Source: Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or performing joins of data
in a temporary table.
Post-load of the Source: After the session retrieves data from the source, the stored
procedure runs. This is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored procedure runs.
This is useful for dropping indexes or disabling constraints.
Post-load of the Target: After the session sends data to the target, the stored procedure runs.
This is useful for re-creating indexes on the database.

Using a Stored Procedure in a Mapping :

1. Create the stored procedure in the database.


2. Import or create the Stored Procedure transformation.
3. Determine whether to use the transformation as connected or unconnected.
4. If connected, map the appropriate input and output ports.
5. If unconnected, either configure the stored procedure to run pre- or post-session, or
configure it to run from an expression in another transformation.
6. Configure the session.

Stored Procedures:

Connect to Source database and create the stored procedures given below:

-- Returns the MAX, MIN, AVG and SUM of SAL for one department.
CREATE OR REPLACE PROCEDURE sp_agg
(
  in_deptno IN NUMBER,
  max_sal OUT NUMBER,
  min_sal OUT NUMBER,
  avg_sal OUT NUMBER,
  sum_sal OUT NUMBER
)
AS
BEGIN
  SELECT MAX(sal), MIN(sal), AVG(sal), SUM(sal)
  INTO max_sal, min_sal, avg_sal, sum_sal
  FROM emp
  WHERE deptno = in_deptno
  GROUP BY deptno;
END;
/

-- Returns only the MAX of SAL for one department.
CREATE OR REPLACE PROCEDURE sp_unconn_1_value (in_deptno IN NUMBER, max_sal OUT NUMBER)
AS
BEGIN
  SELECT MAX(sal) INTO max_sal FROM emp WHERE deptno = in_deptno;
END;
/
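
To see what the sp_agg procedure above returns for one input row, the same query can be tried against a throwaway database. The sketch below uses Python's sqlite3 module instead of Oracle; the sample salaries are invented, and the Python function sp_agg is only a stand-in that returns the four OUT parameters as a tuple.

```python
import sqlite3


def sp_agg(conn, in_deptno):
    """Return (max_sal, min_sal, avg_sal, sum_sal) for one department."""
    row = conn.execute(
        "SELECT MAX(sal), MIN(sal), AVG(sal), SUM(sal) "
        "FROM emp WHERE deptno = ? GROUP BY deptno",
        (in_deptno,),
    ).fetchone()
    if row is None:
        # Because of the GROUP BY, a department with no employees yields no row
        # (in PL/SQL the SELECT ... INTO would raise NO_DATA_FOUND here).
        raise LookupError(f"no employees found for deptno {in_deptno}")
    return row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 1000, 10), (2, 2000, 10), (3, 3000, 10), (4, 1500, 20)])

print(sp_agg(conn, 10))  # (3000, 1000, 2000.0, 6000)
```

Each source row that passes its DEPTNO to the transformation gets one such set of values back, which is what the connected example writes to the target.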
1. Connected Stored Procedure T/F

Example: To give input as DEPTNO from DEPT table and find the MAX, MIN, AVG and
SUM of SAL from EMP table.
• DEPT will be source table. Create a target table SP_CONN_EXAMPLE with fields
DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL & SUM_SAL.
• Write Stored Procedure in Database first and Create shortcuts as needed.

Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_SP_CONN_EXAMPLE
4. Drag DEPT and Target table.
5. Transformation -> Import Stored Procedure -> Give Database Connection -> Connect ->
Select the procedure sp_agg from the list.
6. Drag DEPTNO from SQ_DEPT to the stored procedure input port and also to DEPTNO
port of target.
7. Connect the ports from procedure to target as shown below:
8. Mapping -> Validate
9. Repository -> Save
• Create Session and then workflow.
• Give connection information for all tables.
• Give connection information for Stored Procedure also.
• Run workflow and see the result in table.

2. Unconnected Stored Procedure T/F :


An unconnected Stored Procedure transformation is not directly connected to the flow of data
through the mapping. Instead, the stored procedure runs either:
• From an expression: Called from an expression transformation.
• Pre- or post-session: Runs before or after a session.
Method of returning the value of output parameters to a port:
• Assign the output value to a local variable.
• Assign the output value to the system variable PROC_RESULT. (See Later)
Example 1: DEPTNO as input and get MAX of Sal as output.
• DEPT will be source table.
• Create a target table with fields DEPTNO and MAX_SAL of decimal data type.
• Write Stored Procedure in Database first and Create shortcuts as needed.

Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_sp_unconn_1_value
4. Drag DEPT and Target table.
5. Transformation -> Import Stored Procedure -> Give Database Connection -> Connect ->
Select the procedure sp_unconn_1_value from the list. Click OK.
6. Stored Procedure has been imported.
7. T/F -> Create Expression T/F. Pass DEPTNO from SQ_DEPT to Expression T/F.
8. Edit expression and create an output port OUT_MAX_SAL of decimal data type.
9. Open the Expression Editor for OUT_MAX_SAL and call the stored procedure as
:SP.SP_UNCONN_1_VALUE(DEPTNO, PROC_RESULT). Click OK and connect the
port from the expression to the target as in the mapping below:

10. Mapping -> Validate


11. Repository -> Save
• Create Session and then workflow.
• Give connection information for all tables.
• Give connection information for Stored Procedure also.
• Run workflow and see the result in table.

PROC_RESULT use:

• If the stored procedure returns a single output parameter or a return value, we use the
reserved variable PROC_RESULT as the output variable.
Example: DEPTNO as Input and MAX Sal as output :
:SP.SP_UNCONN_1_VALUE(DEPTNO,PROC_RESULT)

• If the stored procedure returns multiple output parameters, you must create variables for
each output parameter.
Example: DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
as output then:
1. Create four variable ports in expression VAR_MAX_SAL, VAR_MIN_SAL,
VAR_AVG_SAL and VAR_SUM_SAL.
2. Create four output ports in expression OUT_MAX_SAL, OUT_MIN_SAL,
OUT_AVG_SAL and OUT_SUM_SAL.
3. Call the procedure in the last variable port, say VAR_SUM_SAL.
:SP.SP_AGG (DEPTNO, VAR_MAX_SAL,VAR_MIN_SAL, VAR_AVG_SAL,
PROC_RESULT)

Example 2:

DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL as O/P.
Stored Procedure to drop index in Pre Load of Target.
Stored Procedure to create index in Post Load of Target.
• DEPT will be source table. Create a target table SP_UNCONN_EXAMPLE with fields
DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL & SUM_SAL.
• Write Stored Procedure in Database first and Create shortcuts as needed. Stored procedures
to drop and create the index on the target are given below. Make sure to create the target table first.
Stored procedures to be created in the Target Database for this example:
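-- Re-creates the index on the target table (run in Post Load of Target).
CREATE OR REPLACE PROCEDURE CREATE_INDEX
AS
BEGIN
  EXECUTE IMMEDIATE 'create index unconn_dept on SP_UNCONN_EXAMPLE(DEPTNO)';
END;
/

-- Drops the index on the target table (run in Pre Load of Target).
CREATE OR REPLACE PROCEDURE DROP_INDEX
AS
BEGIN
  EXECUTE IMMEDIATE 'drop index unconn_dept';
END;
/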
Creating Mapping:

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_sp_unconn_1_value
4. Drag DEPT and Target table.
5. Transformation -> Import Stored Procedure -> Give Database Connection -> Connect ->
Select the procedure sp_agg from the list. Click OK.
6. Stored Procedure has been imported.
7. T/F -> Create Expression T/F. Pass DEPTNO from SQ_DEPT to Expression T/F.
8. Edit Expression and create 4 variable ports and 4 output ports as shown below:

9. Call the procedure in last variable port VAR_SUM_SAL.


10. :SP.SP_AGG (DEPTNO, VAR_MAX_SAL, VAR_MIN_SAL, VAR_AVG_SAL,
PROC_RESULT)
11. Click Apply and Ok.
12. Connect to target table as needed.
13. Transformation -> Import Stored Procedure -> Give Database Connection for target ->
Connect -> Select the procedure CREATE_INDEX and DROP_INDEX from the list. Click
OK.
14. Edit DROP_INDEX -> Properties Tab -> Select Target Pre Load as Stored Procedure
Type and in call text write drop_index. Click Apply -> Ok.
15. Edit CREATE_INDEX -> Properties Tab -> Select Target Post Load as Stored Procedure
Type and in call text write create_index. Click Apply -> Ok.
16. Mapping -> Validate
17. Repository -> Save
• Create Session and then workflow.
• Give connection information for all tables.
• Give connection information for Stored Procedures also.
• Also make sure that you execute the procedure CREATE_INDEX on the database once before
running the mapping. If there is no index on the target table, DROP_INDEX will
fail and the session will fail with it.
• Run workflow and see the result in table.
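
The pre-load and post-load pattern used in this example (drop the index, load the rows, re-create the index) can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not the Informatica session itself: the function load_target is hypothetical and the target table is reduced to two columns.

```python
import sqlite3


def load_target(conn, rows):
    # Target Pre Load: drop the index so the inserts do not maintain it row by row.
    conn.execute("DROP INDEX unconn_dept")
    conn.executemany(
        "INSERT INTO sp_unconn_example (deptno, max_sal) VALUES (?, ?)", rows)
    # Target Post Load: re-create the index once all rows are in place.
    conn.execute("CREATE INDEX unconn_dept ON sp_unconn_example (deptno)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sp_unconn_example (deptno INTEGER, max_sal REAL)")
# The index must exist before the first run, otherwise the DROP fails.
conn.execute("CREATE INDEX unconn_dept ON sp_unconn_example (deptno)")
load_target(conn, [(10, 5000.0), (20, 3000.0), (30, 2850.0)])
```

Calling load_target on a table that has no unconn_dept index raises an error at the DROP step, which mirrors the session failure mentioned in the note above.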

Transaction Control Transformation


PowerCenter lets you control commit and roll back transactions based on a set of rows that
pass through a Transaction Control transformation. A transaction is the set of rows bound by
commit or roll back rows. You can define a transaction based on a varying number of input
rows. You might want to define transactions based on a group of rows ordered on a common
key, such as employee ID or order entry date.
In PowerCenter, you define transaction control at the following levels:

• Within a mapping. Within a mapping, you use the Transaction Control transformation to
define a transaction. You define transactions using an expression in a Transaction Control
transformation. Based on the return value of the expression, you can choose to commit, roll
back, or continue without any transaction changes.

• Within a session. When you configure a session, you configure it for user-defined commit.
You can choose to commit or roll back a transaction if the Integration Service fails to
transform or write any row to the target.
When you run the session, the Integration Service evaluates the expression for each row that
enters the transformation. When it evaluates a commit row, it commits all rows in the
transaction to the target or targets. When the Integration Service evaluates a roll back row, it
rolls back all rows in the transaction from the target or targets. If the mapping has a flat file
target you can generate an output file each time the Integration Service starts a new
transaction. You can dynamically name each target flat file.

Properties Tab
On the Properties tab, you can configure the following properties:
• Transaction control expression
• Tracing level
Enter the transaction control expression in the Transaction Control Condition field. The
transaction control expression uses the IIF function to test each row against the condition.
Use the following syntax for the expression:

IIF (condition, value1, value2)

The expression contains values that represent actions the Integration Service performs based
on the return value of the condition. The Integration Service evaluates the condition on a row-
by-row basis. The return value determines whether the Integration Service commits, rolls
back, or makes no transaction changes to the row.
When the Integration Service issues a commit or roll back based on the return value of the
expression, it begins a new transaction. Use the following built-in variables in the Expression
Editor when you create a transaction control expression:

• TC_CONTINUE_TRANSACTION. The Integration Service does not perform any
transaction change for this row. This is the default value of the expression.
• TC_COMMIT_BEFORE. The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the new transaction.
• TC_COMMIT_AFTER. The Integration Service writes the current row to the target,
commits the transaction, and begins a new transaction. The current row is in the committed
transaction.
• TC_ROLLBACK_BEFORE. The Integration Service rolls back the current transaction,
begins a new transaction, and writes the current row to the target. The current row is in the
new transaction.
• TC_ROLLBACK_AFTER. The Integration Service writes the current row to the target,
rolls back the transaction, and begins a new transaction. The current row is in the rolled back
transaction.
If the transaction control expression evaluates to a value other than commit, roll back, or
continue, the Integration Service fails the session.
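
The effect of the five built-in variables on transaction boundaries can be simulated with a short Python sketch. It is a simplified model, not the Integration Service: the function run_transaction_control is hypothetical, rows are plain dictionaries, and rows still open when the input ends are assumed to be committed.

```python
TC_CONTINUE_TRANSACTION = "TC_CONTINUE_TRANSACTION"
TC_COMMIT_BEFORE = "TC_COMMIT_BEFORE"
TC_COMMIT_AFTER = "TC_COMMIT_AFTER"
TC_ROLLBACK_BEFORE = "TC_ROLLBACK_BEFORE"
TC_ROLLBACK_AFTER = "TC_ROLLBACK_AFTER"


def run_transaction_control(rows, expression):
    """Return the committed transactions, each as a list of rows."""
    committed = []   # transactions already committed to the target
    open_rows = []   # rows in the current, still-open transaction
    for row in rows:
        action = expression(row)
        if action == TC_CONTINUE_TRANSACTION:
            open_rows.append(row)
        elif action == TC_COMMIT_BEFORE:
            if open_rows:
                committed.append(open_rows)
            open_rows = [row]          # current row starts the new transaction
        elif action == TC_COMMIT_AFTER:
            open_rows.append(row)      # current row is in the committed transaction
            committed.append(open_rows)
            open_rows = []
        elif action == TC_ROLLBACK_BEFORE:
            open_rows = [row]          # earlier rows are discarded
        elif action == TC_ROLLBACK_AFTER:
            open_rows = []             # current row is discarded with the rest
        else:
            raise ValueError(f"invalid transaction control value: {action!r}")
    if open_rows:
        # Assumption: rows still open at end of input are committed.
        committed.append(open_rows)
    return committed


rows = [{"EMPNO": n} for n in (7369, 7499, 7654, 7698)]
transactions = run_transaction_control(
    rows,
    lambda r: TC_COMMIT_BEFORE if r["EMPNO"] == 7654 else TC_CONTINUE_TRANSACTION)
print([[r["EMPNO"] for r in txn] for txn in transactions])
# [[7369, 7499], [7654, 7698]]
```

The commit issued before EMPNO 7654 closes the first transaction, and 7654 becomes the first row of the next one.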
Mapping Guidelines and Validation

Use the following rules and guidelines when you create a mapping with a Transaction Control
transformation:
• If the mapping includes an XML target, and you choose to append or create a new
document on commit, the input groups must receive data from the same transaction control
point.
• Transaction Control transformations connected to any target other than relational, XML, or
dynamic MQSeries targets are ineffective for those targets.
• You must connect each target instance to a Transaction Control transformation.
• You can connect multiple targets to a single Transaction Control transformation.
• You can connect only one effective Transaction Control transformation to a target.
• You cannot place a Transaction Control transformation in a pipeline branch that starts with a
Sequence Generator transformation.
• If you use a dynamic Lookup transformation and a Transaction Control transformation in
the same mapping, a rolled-back transaction might result in unsynchronized target data.
• A Transaction Control transformation may be effective for one target and ineffective for
another target. If each target is connected to an effective Transaction Control transformation,
the mapping is valid.
• Either all targets or none of the targets in the mapping should be connected to an effective
Transaction Control transformation.

Example to Transaction Control:

Step 1: Design the mapping.

Step 2: Creating a Transaction Control Transformation.


• In the Mapping Designer, click Transformation > Create. Select the Transaction Control
transformation.
• Enter a name for the transformation. The naming convention for Transaction Control
transformations is TC_TransformationName.
• Enter a description for the transformation.
• Click Create.
• Click Done.
• Drag the ports into the transformation.
• Open the Edit Transformations dialog box, and select the Ports tab.
• Select the Properties tab. Enter the transaction control expression that defines the commit and
roll back behavior.
• To enter the expression, click the down arrow in the Transaction Control Condition field to open
the Expression Editor. Type IIF(EMPNO=7654, and then select the built-in variables on the
Variables tab to complete the expression as follows:

IIF (EMPNO=7654,TC_COMMIT_BEFORE,TC_CONTINUE_TRANSACTION)

• Connect all the columns from the transformation to the target table and save the mapping.
• Select the Metadata Extensions tab. Create or edit metadata extensions for the Transaction
Control transformation.
• Click OK.
Step 3: Create the task and the work flow.
Step 4: Preview the output in the target table.
Lookup Transformation
A Lookup is a Passive, Connected or Unconnected Transformation used to look up data in a
relational table, view, synonym or flat file. The Integration Service queries the lookup table to
retrieve a value based on the input source value and the lookup condition.

All about Informatica LookUp Transformation

A connected lookup receives source data, performs a lookup and returns data to the pipeline,
while an unconnected lookup is not connected to a source or target. It is called by a
transformation in the pipeline through a :LKP expression and returns only one column
value to the calling transformation.

A lookup can be cached or uncached. A cached lookup can further use a static, dynamic or
persistent cache, and a named or unnamed cache.
By default, lookup transformations are cached and static.
Lookup Ports Tab
The Ports tab of the Lookup Transformation contains:

• Input Ports:
Create an input port for each lookup port we want to use in the lookup condition. We must
have at least one input or input/output port in a lookup transformation.

• Output Ports:
Create an output port for each lookup port we want to link to another transformation. For
connected lookups, we must have at least one output port. For unconnected lookups, we must
select a lookup port as a return port (R) to pass a return value.

• Lookup Port:
The Designer designates each column of the lookup source as a lookup port.

• Return Port:
An unconnected Lookup transformation has one return port, which returns one column of data to
the calling transformation.

Notes:
We can delete lookup ports from a relational lookup if the mapping does not use them,
which gives a performance gain. If the lookup source is a flat file, however, deleting
lookup ports fails the session.

Lookup Properties Tab

Now let us have a look at the Properties Tab of the Lookup Transformation.

• Lookup Sql Override:
Override the default SQL statement to add a WHERE clause or to join multiple tables.

• Lookup table name:
The base table on which the lookup is performed.

• Lookup Source Filter:
We can apply filter conditions on the lookup table to reduce the number of records. For
example, to select only the active records of the lookup table, we may use the
condition CUSTOMER_DIM.ACTIVE_FLAG = 'Y'.

• Lookup caching enabled:
If this option is checked, the lookup table is cached during the session run. Otherwise each
lookup goes to the relational database uncached. Remember to create a database index on the
columns used in the lookup condition to get better performance when the lookup is uncached.

• Lookup policy on multiple match:
If the Integration Service finds multiple matches during the lookup, we can configure the lookup
to return the First Value, Last Value, Any Value or to Report Error.

• Lookup condition:
The condition used to look up values from the lookup table based on source input data. For
example, IN_EmpNo=EmpNo.

• Connection Information:

Query the lookup table from the source or target connection. In case of a flat file lookup we can
give the file path and name, whether direct or indirect.

• Source Type:
Determines whether the source is a relational database table, flat file or source qualifier
pipeline.

• Tracing Level:
Sets the amount of detail in the session log for the transformation. Options available
are Normal, Terse, Verbose Initialization, Verbose Data.

• Lookup cache directory name:
Determines the directory name where the lookup cache files will reside.

• Lookup cache persistent:
Indicates whether we are going for persistent cache or non-persistent cache.

• Dynamic Lookup Cache:
When checked, a dynamic lookup cache is used; otherwise a static lookup cache is used.

• Output Old Value On Update:
Defines whether the old value for output ports will be used to update an existing row in
dynamic cache.

• Cache File Name Prefix:

The lookup uses this named persistent cache file, based on the base lookup table.

• Re-cache from lookup source:
When checked, the Integration Service rebuilds the lookup cache from the lookup source when the
lookup instance is called in the session.

• Insert Else Update:
Insert the record if not found in cache, else update it. Option is available when using dynamic
lookup cache.

• Update Else Insert:
Update the record if found in cache, else insert it. Option is available when using dynamic
lookup cache.

• Datetime Format:
Used when source type is file to determine the date and time format of lookup columns.

• Thousand Separator:
By default it is None. Used when source type is file to determine the thousand separator.

• Decimal Separator:
By default it is "."; we can also use ",". Used when source type is file to determine the
decimal separator.

• Case Sensitive String Comparison:
Check this when we want case-sensitive comparison of string values in the lookup.
Used when source type is file.

• Null ordering:
Determines whether NULL is the highest or lowest value. Used when source type is file.

• Sorted Input:
Checked whenever we expect the input data to be sorted. Used when the source type is
flat file.

• Lookup source is static:
When checked, it assumes that the lookup source is not going to change during the session
run.

• Pre-build lookup cache:
Default option is Auto. If we want the Integration Service to start building the cache as soon as
the session begins, we can choose the option Always allowed.
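
How a cached lookup and the multiple-match policy behave can be sketched in plain Python. This is a simplified model with hypothetical function names and invented sample data. In particular, "first" and "last" here simply follow the order of the rows in the list, and "any" is treated like "first"; the order the Integration Service uses may differ.

```python
from collections import defaultdict


def build_lookup_cache(lookup_rows, condition_column):
    """Read the lookup source once and index it by the lookup condition column."""
    cache = defaultdict(list)
    for row in lookup_rows:
        cache[row[condition_column]].append(row)
    return cache


def lookup(cache, key_value, return_port, policy="first"):
    """Return one column value for the key, or None (NULL) when nothing matches."""
    matches = cache.get(key_value, [])
    if not matches:
        return None
    if len(matches) > 1 and policy == "error":
        raise LookupError(f"multiple matches for key {key_value!r}")
    chosen = matches[-1] if policy == "last" else matches[0]
    return chosen[return_port]


# Invented lookup source with a duplicate key.
customer_dim = [
    {"CUST_ID": 1, "CITY": "Pune"},
    {"CUST_ID": 1, "CITY": "Delhi"},
    {"CUST_ID": 2, "CITY": "Goa"},
]
cache = build_lookup_cache(customer_dim, "CUST_ID")

print(lookup(cache, 1, "CITY", policy="first"))  # Pune
print(lookup(cache, 1, "CITY", policy="last"))   # Delhi
print(lookup(cache, 3, "CITY"))                  # None
```

Building the cache once and answering every input row from memory is the point of a static cache; an uncached lookup would instead query the database for each row.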

Union Transformation
• Active and Connected transformation.
Union transformation is a multiple input group transformation that you can use to merge data
from multiple pipelines or pipeline branches into one pipeline branch. It merges data from
multiple sources similar to the UNION ALL SQL statement, which combines the results from two
or more SQL statements.

Union Transformation Rules and Guidelines

• We can create multiple input groups, but only one output group.
• We can connect heterogeneous sources to a Union transformation.
• All input groups and the output group must have matching ports. The precision, data type,
and scale must be identical across all groups.
• The Union transformation does not remove duplicate rows. To remove duplicate rows, we
must add another transformation such as a Router or Filter Transformation.
• We cannot use a Sequence Generator or Update Strategy transformation upstream from a
Union transformation.
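
The UNION ALL behaviour described above (rows from all input groups are merged and duplicates are kept) can be shown with a short Python sketch. The function union_all and the sample rows are illustrative only.

```python
from itertools import chain


def union_all(*input_groups):
    """Merge rows from several input groups into one output group, keeping duplicates."""
    return list(chain.from_iterable(input_groups))


# Invented sample rows: (EMPNO, ENAME, DEPTNO). CLARK appears in two groups.
emp_10 = [(7782, "CLARK", 10), (7839, "KING", 10)]
emp_20 = [(7369, "SMITH", 20)]
emp_rest = [(7499, "ALLEN", 30), (7782, "CLARK", 10)]

merged = union_all(emp_10, emp_20, emp_rest)
print(len(merged))  # 5

# Removing duplicates is a separate step after the merge.
distinct = list(dict.fromkeys(merged))
print(len(distinct))  # 4
```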

Union Transformation Components

When we configure a Union transformation, define the following components:


Transformation tab: We can rename the transformation and add a description.
Properties tab: We can specify the tracing level.
Groups tab: We can create and delete input groups. The Designer displays groups we create
on the Ports tab.
Group Ports tab: We can create and delete ports for the input groups. The Designer displays
ports we create on the Ports tab.
We cannot modify the Ports, Initialization Properties, Metadata Extensions, or Port Attribute
Definitions tabs in a Union transformation.
Create input groups on the Groups tab, and create ports on the Group Ports tab. We can create
one or more input groups on the Groups tab. The Designer creates one output group by
default. We cannot edit or delete the default output group.
Example: to combine data of tables EMP_10, EMP_20 and EMP_REST
• Import tables EMP_10, EMP_20 and EMP_REST in shared folder in Sources.
• Create a target table EMP_UNION_EXAMPLE in target designer. Structure should be the same
as the EMP table.
• Create the shortcuts in your folder.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_union_example
4. Drag EMP_10, EMP_20 and EMP_REST from source in mapping.
5. Click Transformation -> Create -> Select Union from list. Give name and click Create.
Now click done.
6. Pass ports from SQ_EMP_10 to Union Transformation.
7. Edit Union Transformation. Go to Groups Tab
8. One group will already be there, as we dragged ports from SQ_EMP_10 to the Union
Transformation.
9. As we have 3 source tables, we need 3 input groups. Click the add button to add 2 more
groups. See Sample Mapping
10. We can also modify ports in ports tab.
11. Click Apply -> Ok.
12. Drag target table now.
13. Connect the output ports from Union to target table.
14. Click Mapping -> Validate
15. Repository -> Save
• Create Session and Workflow as described earlier. Run the Workflow and see the data in
target table.
• Make sure to give connection information for all 3 source Tables.

Normalizer Transformation
• Active and Connected Transformation.
• The Normalizer transformation normalizes records from COBOL and relational sources,
allowing us to organize the data.
• Use a Normalizer transformation instead of the Source Qualifier transformation when we
normalize a COBOL source.
• We can also use the Normalizer transformation with relational sources to create multiple
rows from a single row of data.
Example 1: To create 4 records of every employee in EMP table.
• EMP will be source table.
• Create target table Normalizer_Multiple_Records. Structure same as EMP and datatype of
HIREDATE as VARCHAR2.
• Create shortcuts as necessary.

Creating Mapping :

1. Open folder where we want to create the mapping.


2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_Normalizer_Multiple_Records
4. Drag EMP and Target table.
5. Transformation->Create->Select Expression-> Give name, Click create, done.
6. Pass all ports from SQ_EMP to Expression transformation.
7. Transformation-> Create-> Select Normalizer-> Give name, create & done.
8. Try dragging ports from Expression to Normalizer. Not Possible.
9. Edit the Normalizer and go to the Normalizer tab. Add columns equal to the columns in
the EMP table, with the same datatypes.
10. The Normalizer doesn't have a DATETIME datatype, so convert HIREDATE to char in the
Expression transformation. Create output port out_hdate and do the conversion.
11. Connect ports from Expression to Normalizer.
12. Edit the Normalizer and go to the Normalizer tab. As EMPNO identifies source records
and we want 4 records of every employee, give OCCUR for EMPNO as 4.
13.

14. Click Apply and then OK.


15. Add link as shown in mapping below:
16. Mapping -> Validate
17. Repository -> Save
• Make session and workflow.
• Give connection information for source and target table.
• Run workflow and see result.

Example 2: To break columns into rows

Source:
Roll_Number Name ENG HINDI MATHS
100 Amit 78 76 90
101 Rahul 76 78 87
102 Jessie 65 98 79

Target :
Roll_Number Name Marks
100 Amit 78
100 Amit 76
100 Amit 90
101 Rahul 76
101 Rahul 78
101 Rahul 87
102 Jessie 65
102 Jessie 98
102 Jessie 79

• Make source as a flat file. Import it and create target table.


• Create Mapping as before. In Normalizer tab, create only 3 ports Roll_Number, Name and
Marks as there are 3 columns in target table.
• Also as we have 3 marks in source, give Occurs as 3 for Marks in Normalizer tab.
• Connect accordingly and connect to target.
• Validate and Save
• Make Session and workflow and Run it. Give Source File Directory and Source File name
for source flat file in source properties in mapping tab of session.
• See the result.
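
The effect of giving Occurs as 3 for Marks can be sketched in plain Java. This is only an illustration of the row-per-occurrence idea, not PowerCenter code; the class name NormalizerSketch and the method normalize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NormalizerSketch {
    // One source row has two key columns plus a repeating group of marks.
    // The result has one output row per occurrence of the repeating column.
    static List<String[]> normalize(String rollNumber, String name, int... marks) {
        List<String[]> rows = new ArrayList<>();
        for (int mark : marks) {
            rows.add(new String[] {rollNumber, name, String.valueOf(mark)});
        }
        return rows;
    }

    public static void main(String[] args) {
        // First source row from Example 2: ENG, HINDI and MATHS marks.
        for (String[] row : normalize("100", "Amit", 78, 76, 90)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Running it for the first source row prints the first three target rows from the example (100 Amit 78, 100 Amit 76, 100 Amit 90).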
Update Strategy Transformation
• Active and Connected Transformation
Till now, we have only inserted rows in our target tables.

What if we want to update, delete or reject rows coming from source based on some
condition?
Example: If the Address of a CUSTOMER changes, we can either update the old address or
keep both the old and the new address, with one row for each. Keeping both rows is how we
maintain the historical data.
Update Strategy is used with Lookup Transformation. In DWH, we create a Lookup on target
table to determine whether a row already exists or not. Then we insert, update, delete or reject
the source record as per business need.
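
The lookup-then-decide logic described above can be sketched in plain Java. This is only an illustration under stated assumptions: the Map stands in for the Lookup on the target table, and the names LookupDecisionSketch, Action and decide, along with the sample customer row, are hypothetical and not part of PowerCenter.

```java
import java.util.HashMap;
import java.util.Map;

class LookupDecisionSketch {
    enum Action { INSERT, UPDATE }

    // targetByKey stands in for the lookup on the target table:
    // a key that is found means the row already exists in the target.
    static Action decide(Map<Integer, String> targetByKey, int customerId) {
        return targetByKey.containsKey(customerId) ? Action.UPDATE : Action.INSERT;
    }

    public static void main(String[] args) {
        Map<Integer, String> target = new HashMap<>();
        target.put(1, "12 Old Street");          // hypothetical existing customer
        System.out.println(decide(target, 1));   // key found in target
        System.out.println(decide(target, 2));   // key not found in target
    }
}
```

A real mapping would typically also compare the looked-up columns with the incoming ones before choosing to update; the sketch only covers the exists / does not exist check.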
In Power Center, we set the update strategy at two different levels:
1. Within a session
2. Within a Mapping
1. Update Strategy within a session:
When we configure a session, we can instruct the Integration Service (IS) to either treat
all rows in the same way or use instructions coded into the session mapping to flag rows
for different database operations.

Session Configuration:

Edit Session -> Properties -> Treat Source Rows as: (Insert, Update, Delete, and Data
Driven). Insert is default.

Specifying Operations for Individual Target Tables:

You can set the following update strategy options:

Insert: Select this option to insert a row into a target table.


Delete: Select this option to delete a row from a table.
Update: We have the following options in this situation:
• Update as Update. Update each row flagged for update if it exists in the target table.
• Update as Insert. Insert each row flagged for update.
• Update else Insert. Update the row if it exists. Otherwise, insert it.
Truncate table: Select this option to truncate the target table before loading data.

2. Flagging Rows within a Mapping

Within a mapping, we use the Update Strategy transformation to flag rows for insert, delete,
update, or reject.
Operation Constant Numeric Value
INSERT DD_INSERT 0
UPDATE DD_UPDATE 1
DELETE DD_DELETE 2
REJECT DD_REJECT 3

Update Strategy Expressions:

Frequently, the update strategy expression uses the IIF or DECODE function from the
transformation language to test each row to see if it meets a particular condition.
IIF( ( ENTRY_DATE > APPLY_DATE), DD_REJECT, DD_UPDATE )
Or
IIF( ( ENTRY_DATE > APPLY_DATE), 3, 1 )
• The above expression is written in the Properties tab of the Update Strategy transformation.
• DD means DATA DRIVEN
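
To make the constants and the IIF test concrete, here is a minimal Java sketch of the same decision. It is not transformation-language code. The class name UpdateStrategySketch and the method flagRow are hypothetical, and the sample dates are made up; the numeric values are the ones listed in the table above.

```java
import java.time.LocalDate;

class UpdateStrategySketch {
    // Numeric values of the data-driven constants, as listed in the table above.
    static final int DD_INSERT = 0;
    static final int DD_UPDATE = 1;
    static final int DD_DELETE = 2;
    static final int DD_REJECT = 3;

    // Java equivalent of IIF(ENTRY_DATE > APPLY_DATE, DD_REJECT, DD_UPDATE).
    static int flagRow(LocalDate entryDate, LocalDate applyDate) {
        return entryDate.isAfter(applyDate) ? DD_REJECT : DD_UPDATE;
    }

    public static void main(String[] args) {
        LocalDate apply = LocalDate.of(2024, 1, 10);                   // hypothetical dates
        System.out.println(flagRow(LocalDate.of(2024, 1, 15), apply)); // entry is later: reject
        System.out.println(flagRow(LocalDate.of(2024, 1, 5), apply));  // entry is earlier: update
    }
}
```

Using the named constants rather than the bare numbers keeps the expression readable and avoids mixing up the values for update and delete.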

Forwarding Rejected Rows:

We can configure the Update Strategy transformation to either pass rejected rows to the next
transformation or drop them.
Steps:
1. Create Update Strategy Transformation
2. Pass all ports needed to it.
3. Set the Expression in Properties Tab.
4. Connect to other transformations or target.

Performance tuning:

1. Use the Update Strategy transformation as little as possible in the mapping.
2. Do not use the Update Strategy transformation if we just want to insert into the target
table; instead use direct mapping, direct filtering etc.
3. For updating or deleting rows from the target table, we can use the Update Strategy
transformation itself.

Java Transformation in Informatica

Informatica PowerCenter extended its functionality to support the
Java language from version 8.0. Informatica PowerCenter
provides a simple programming interface to implement transformation
functionality in the Java programming language.

Informatica PowerCenter lets developers import Java packages
and use them to implement specific business rules
that are defined in those packages. Looping over data is now easily
possible with the help of the Java transformation.

The main advantage of this transformation is that multiple rows
can be generated for a single input row based on a condition.
The Normalizer can only generate a static number of rows, based on
the 'Occurs' value, where we specify a fixed number. If we want to
generate multiple rows based on a dynamic value, then we have to
depend on the Java transformation.

The Informatica PowerCenter Client uses the JDK to compile the Java
code and generate byte code. The PowerCenter Client stores the
byte code in the PowerCenter repository. The Integration Service
uses the JRE to execute the generated byte code at run time.

Java transformation can be an Active or Passive transformation.
Java transformation has common input/output ports
like other transformations.

Let’s have a simple example to implement java transformation.

Source: PATIENT_PRE_DTL

Target: PATIENT_PRE_TBLS
The source table contains the patient ID, Name and tablets
(separated by #) that the doctor has prescribed for a patient.
The data from the source should be populated into the target as
shown in the example above. A single row from the source table
has to be converted into multiple rows based on the number of
tablets and loaded into the target table. This is a typical scenario
where we cannot use the Normalizer Transformation, since we
have no information about the number of occurrences (the number of
tablets prescribed by the doctor may vary from patient to patient).

We can achieve this functionality in a simple way by
using Java transformation.

1. Create source definition.
2. Create Target Definition.
3. Create a mapping.
a. Create source and target instances.
b. Place Java Transformation -> Choose Active.
c. Drag and drop all the ports from Source Qualifier
to Java Transformation. An input/output port will automatically be
created in the Java transformation under the output tag.
d. Under the “On Input Row” tab, place the following code:

// Tablets is the port holding the '#'-separated tablet list.
String str = Tablets;
String delimiter = "#";
String[] temp = str.split(delimiter);
for (int i = 0; i < temp.length; i++) {
    // Emit one output row per tablet.
    Tablets = temp[i];
    generateRow();
}

generateRow(): This is a built-in function which is used
to generate an output row. As per our code, the Java transformation takes
a single record as input and creates multiple records based
on the number of tablets.
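
Outside PowerCenter, the same logic can be tried as a standalone Java program. The sketch below is an assumption-laden stand-in, not the transformation itself: adding to a list takes the place of generateRow(), and the class name TabletSplitSketch, the method splitTablets and the sample patient data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TabletSplitSketch {
    // Produce one output row per tablet; adding to the list stands in
    // for the generateRow() call made inside the Java transformation.
    static List<String[]> splitTablets(String patientId, String name, String tablets) {
        List<String[]> out = new ArrayList<>();
        // Pattern.quote treats the delimiter as literal text, not as a regex.
        for (String tablet : tablets.split(Pattern.quote("#"))) {
            out.add(new String[] {patientId, name, tablet});
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical source row: the number of tablets is not known in advance.
        for (String[] row : splitTablets("P1", "PatientA", "TabletA#TabletB#TabletC")) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Because the loop runs once per element returned by split, the number of output rows follows the data, which is the dynamic behaviour the Normalizer's fixed Occurs value cannot provide.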
