Optimizing Lookup Transformations: Using Optimal Database Drivers

Optimizing Lookup Transformations Page 1 of 4

Optimizing Lookup Transformations

If the lookup table is on the same database as the source table in your mapping and caching is not
feasible, join the tables in the source database rather than using a Lookup transformation.
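
As a rough illustration of joining in the source database instead of looking up row by row, the sketch below uses Python's built-in sqlite3 module as a stand-in database. The ORDERS table and its columns are hypothetical; ITEMS_DIM borrows its name from the ORDER BY example later in this topic. A single joined query returns the looked-up value alongside each source row.

```python
import sqlite3

# In-memory SQLite stands in for the shared source database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS (ORDER_ID INTEGER, ITEM_ID INTEGER);   -- hypothetical source table
    CREATE TABLE ITEMS_DIM (ITEM_ID INTEGER, ITEM_NAME TEXT);  -- lookup table
    INSERT INTO ORDERS VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO ITEMS_DIM VALUES (10, 'bolt'), (20, 'nut');
""")

# A LEFT OUTER JOIN keeps source rows that have no match and returns NULL
# for them, which mirrors a lookup that finds no matching row.
joined_rows = conn.execute("""
    SELECT o.ORDER_ID, o.ITEM_ID, i.ITEM_NAME
    FROM ORDERS o
    LEFT OUTER JOIN ITEMS_DIM i ON i.ITEM_ID = o.ITEM_ID
    ORDER BY o.ORDER_ID
""").fetchall()
print(joined_rows)
```
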
If you use a Lookup transformation, perform the following tasks to increase performance:

Use the optimal database driver.

Cache lookup tables.

Optimize the lookup condition.

Filter lookup rows.

Index the lookup table.

Optimize multiple lookups.

Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup
source.

Using Optimal Database Drivers


The Integration Service can connect to a lookup table using a native database driver or an ODBC driver.
Native database drivers provide better session performance than ODBC drivers.

Caching Lookup Tables


If a mapping contains Lookup transformations, you might want to enable lookup caching. When you
enable caching, the Integration Service caches the lookup table and queries the lookup cache during the
session. When this option is not enabled, the Integration Service queries the lookup table on a
row-by-row basis.
The result of the Lookup query and processing is the same, whether or not you cache the lookup table.
However, using a lookup cache can increase session performance for smaller lookup tables. In general,
you want to cache lookup tables that need less than 300 MB.
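
The difference between the two modes can be sketched in Python, with sqlite3 standing in for the lookup database. This is a simplified model, not the Integration Service's implementation: the function names and sample rows are hypothetical. The uncached version issues one query per source row, while the cached version reads the table once and answers from memory, and both return the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS_DIM (ITEM_ID INTEGER, ITEM_NAME TEXT);
    INSERT INTO ITEMS_DIM VALUES (10, 'bolt'), (20, 'nut'), (30, 'washer');
""")
source_item_ids = [10, 30, 10, 99]  # hypothetical source rows


def lookup_uncached(item_ids):
    """Issue one SELECT per source row (row-by-row lookup)."""
    results = []
    for item_id in item_ids:
        row = conn.execute(
            "SELECT ITEM_NAME FROM ITEMS_DIM WHERE ITEM_ID = ?", (item_id,)
        ).fetchone()
        results.append(row[0] if row else None)
    return results


def lookup_cached(item_ids):
    """Read the lookup table once, then answer every row from memory."""
    cache = dict(conn.execute("SELECT ITEM_ID, ITEM_NAME FROM ITEMS_DIM"))
    return [cache.get(item_id) for item_id in item_ids]


print(lookup_uncached(source_item_ids))
print(lookup_cached(source_item_ids))
```
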
Complete the following tasks to further enhance performance for Lookup transformations:

Use the appropriate cache type.

Enable concurrent caches.

Optimize Lookup condition matching.

Reduce the number of cached rows.

Override the ORDER BY statement.

Use a machine with more memory.

Related Topics:
Caches

mk:@MSITStore:C:\Informatica\PowerCenter8.6.0\client\bin\Help\en\PT.chm::/Optimize... 9/26/2010

Types of Caches
Use the following types of caches to increase performance:

Shared cache. You can share the lookup cache between multiple transformations. You can share an
unnamed cache between transformations in the same mapping. You can share a named cache
between transformations in the same or different mappings.

Persistent cache. To save and reuse the cache files, you can configure the transformation to use a
persistent cache. Use this feature when you know the lookup table does not change between session
runs. Using a persistent cache can improve performance because the Integration Service builds the
memory cache from the cache files instead of from the database.
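
The idea behind a persistent cache can be sketched as follows. This is a loose Python analogy, assuming a pickle file in place of the real cache files; the function name, file name, and sample data are hypothetical. The first run builds the cache from the (simulated) database and saves it, and the second run loads it from the saved file.

```python
import os
import pickle
import tempfile


def load_or_build_cache(cache_path, build_from_database):
    """Reuse a saved cache file when present; otherwise build and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), "cache file"
    cache = build_from_database()
    with open(cache_path, "wb") as fh:
        pickle.dump(cache, fh)
    return cache, "database"


def build_from_database():
    """Stand-in for reading the lookup table from the database."""
    return {10: "bolt", 20: "nut"}


cache_path = os.path.join(tempfile.mkdtemp(), "items_dim.cache")  # hypothetical name

first_cache, first_origin = load_or_build_cache(cache_path, build_from_database)
second_cache, second_origin = load_or_build_cache(cache_path, build_from_database)
print(first_origin, second_origin)
```

As the text notes, this only makes sense when the lookup table does not change between runs, because the saved file is never refreshed in this sketch.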

Enabling Concurrent Caches


When the Integration Service processes sessions that contain Lookup transformations, the Integration
Service builds a cache in memory when it processes the first row of data in a cached Lookup
transformation. If there are multiple Lookup transformations in a mapping, the Integration Service
creates the caches sequentially when the first row of data is processed by the Lookup transformation.
This slows Lookup transformation processing.
You can enable concurrent caches to improve performance. When the number of additional concurrent
pipelines is set to one or more, the Integration Service builds caches concurrently rather than
sequentially. Performance improves greatly when the sessions contain a number of active
transformations that may take time to complete, such as Aggregator, Joiner, or Sorter transformations.
When you enable multiple concurrent pipelines, the Integration Service no longer waits for active
sessions to complete before it builds the cache. Other Lookup transformations in the pipeline also build
caches concurrently.
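
Sequential versus concurrent cache building can be modeled loosely in Python. The sketch below is an analogy only: the lookup names, the sample rows, and the mapping of "additional concurrent pipelines" to extra worker threads are assumptions, not a description of how the Integration Service schedules work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup sources: lookup name -> rows of (key, value).
LOOKUP_SOURCES = {
    "LKP_ITEMS": [(10, "bolt"), (20, "nut")],
    "LKP_STORES": [(1, "north"), (2, "south")],
}


def build_cache(name):
    """Stand-in for reading one lookup source and building its cache."""
    return name, dict(LOOKUP_SOURCES[name])


def build_sequentially(names):
    """Build each cache one after another."""
    return dict(build_cache(name) for name in names)


def build_concurrently(names, additional_pipelines=1):
    """Build caches in parallel; extra pipelines are modeled as extra workers."""
    with ThreadPoolExecutor(max_workers=1 + additional_pipelines) as pool:
        return dict(pool.map(build_cache, names))


names = list(LOOKUP_SOURCES)
print(build_concurrently(names) == build_sequentially(names))
```
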

Optimizing Lookup Condition Matching


When the Lookup transformation matches lookup cache data with the lookup condition, it sorts and
orders the data to determine the first matching value and the last matching value. You can configure the
transformation to return any value that matches the lookup condition. When you configure the Lookup
transformation to return any matching value, the transformation returns the first value that matches the
lookup condition. It does not index all ports as it does when you configure the transformation to return
the first matching value or the last matching value. When you use any matching value, performance can
improve because the transformation skips indexing on all ports, a step that can slow performance.

Reducing the Number of Cached Rows


You can reduce the number of rows included in the cache to increase performance. Use the Lookup SQL
Override option to add a WHERE clause to the default SQL statement.
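
A minimal sketch of the effect, using sqlite3 as a stand-in database. The DISCONTINUED column and the sample rows are hypothetical, and the "default" statement here is only an approximation of what would be generated. Adding a WHERE clause to the statement that fills the cache means fewer rows are read and held in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS_DIM (
        ITEM_ID INTEGER, ITEM_NAME TEXT, PRICE REAL,
        DISCONTINUED INTEGER  -- hypothetical flag column
    );
    INSERT INTO ITEMS_DIM VALUES
        (10, 'bolt', 0.25, 0), (20, 'nut', 0.10, 0), (30, 'rivet', 0.40, 1);
""")

# Approximation of a default lookup statement, and an override with a filter.
default_sql = "SELECT ITEM_ID, ITEM_NAME, PRICE FROM ITEMS_DIM"
override_sql = default_sql + " WHERE DISCONTINUED = 0"

default_cache = conn.execute(default_sql).fetchall()
filtered_cache = conn.execute(override_sql).fetchall()
print(len(default_cache), len(filtered_cache))
```
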

Overriding the ORDER BY Statement


By default, the Integration Service generates an ORDER BY statement for a cached lookup. The ORDER
BY statement contains all lookup ports. To increase performance, suppress the default ORDER BY
statement and enter an override ORDER BY with fewer columns.
The Integration Service always generates an ORDER BY statement, even if you enter one in the override.
Place two dashes ‘--’ after the ORDER BY override to suppress the generated ORDER BY statement. The
dashes turn the generated statement into a SQL comment.
For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and
PRICE. When you enter the ORDER BY statement, enter the columns in the same order as the ports in
the lookup condition. You must also enclose all database reserved words in quotes.
Enter the following lookup query in the lookup SQL override:

SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM
ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
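
To see why the trailing dashes matter, the sketch below appends a generated ORDER BY to the override and runs the combined statement in sqlite3. The exact text of the generated clause is an assumption (the topic only says it contains all lookup ports), and the sample rows are hypothetical. Because `--` starts a SQL comment, the appended clause is ignored and only the override's ORDER BY takes effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS_DIM (ITEM_ID INTEGER, ITEM_NAME TEXT, PRICE REAL);
    INSERT INTO ITEMS_DIM VALUES (20, 'nut', 0.10), (10, 'bolt', 0.30), (10, 'bolt', 0.25);
""")

# The override from the example above, ending in the two dashes.
override = (
    "SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM "
    "ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --"
)
# Assumed shape of the generated clause: an ORDER BY over all lookup ports.
generated_order_by = " ORDER BY ITEM_ID, ITEM_NAME, PRICE"

# The generated clause lands after the dashes, so the database treats it as a comment.
final_sql = override + generated_order_by
rows = conn.execute(final_sql).fetchall()
print(final_sql)
print(rows)
```

Without the dashes, the combined statement would contain two ORDER BY clauses and would not be valid SQL.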

Using a Machine with More Memory


To increase session performance, run the session on an Integration Service machine with a large amount
of memory. Increase the index and data cache sizes as high as you can without straining the machine. If
the Integration Service machine has enough memory, increase the cache so it can hold all data in
memory without paging to disk.

Optimizing the Lookup Condition


If you include more than one lookup condition, place the conditions in the following order to optimize
lookup performance:

Equal to (=)

Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)

Not equal to (!=)
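
The ordering rule above can be expressed as a small sort. This is an illustrative Python sketch: the tuple layout, the function name, and the STATUS condition are hypothetical, while the ITEM_ID and PRICE conditions come from the ORDER BY example earlier in this topic.

```python
# Rank each operator by the group it belongs to in the recommended order.
OPERATOR_RANK = {"=": 0, "<": 1, ">": 1, "<=": 1, ">=": 1, "!=": 2}


def order_conditions(conditions):
    """Sort (lookup_port, operator, input_port) tuples by operator group.

    sorted() is stable, so conditions in the same group keep their
    original relative order.
    """
    return sorted(conditions, key=lambda cond: OPERATOR_RANK[cond[1]])


conditions = [
    ("PRICE", "<=", "IN_PRICE"),
    ("STATUS", "!=", "IN_STATUS"),  # hypothetical condition
    ("ITEM_ID", "=", "IN_ITEM_ID"),
]
ordered = order_conditions(conditions)
for port, operator, input_port in ordered:
    print(port, operator, input_port)
```
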

Filtering Lookup Rows


Create a filter condition to reduce the number of lookup rows retrieved from the source when the lookup
cache is built.

Indexing the Lookup Table


The Integration Service needs to query, sort, and compare values in the lookup condition columns. The
index needs to include every column used in a lookup condition.
You can improve performance for the following types of lookups:

Cached lookups. To improve performance, index the columns in the lookup ORDER BY statement.
The session log contains the ORDER BY statement.

Uncached lookups. To improve performance, index the columns in the lookup condition. The
Integration Service issues a SELECT statement for each row that passes into the Lookup
transformation.
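
The effect of indexing the lookup condition columns can be checked with a query plan. The sketch below uses sqlite3 and its EXPLAIN QUERY PLAN statement as a stand-in, so the plan text is SQLite's and other databases report plans differently. The index name and the per-row SELECT are assumptions modeled on an uncached lookup with the conditions ITEM_ID = IN_ITEM_ID and PRICE <= IN_PRICE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEMS_DIM (ITEM_ID INTEGER, ITEM_NAME TEXT, PRICE REAL)")

# Assumed shape of the SELECT issued for each row of an uncached lookup.
lookup_sql = "SELECT ITEM_NAME FROM ITEMS_DIM WHERE ITEM_ID = ? AND PRICE <= ?"


def query_plan(sql):
    """Return SQLite's plan description for the statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (10, 1.0)).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(lookup_sql)

# Index every column used in the lookup condition (hypothetical index name).
conn.execute("CREATE INDEX IDX_ITEMS_LOOKUP ON ITEMS_DIM (ITEM_ID, PRICE)")
plan_after = query_plan(lookup_sql)

print("before:", plan_before)
print("after: ", plan_after)
```

For a cached lookup, the same check would apply to the columns in the ORDER BY statement rather than the lookup condition.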

Optimizing Multiple Lookups


If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the
lookups can slow performance. Tune the Lookup transformations that query the largest amounts of data
to improve overall performance.
To determine which Lookup transformations process the most data, examine the
Lookup_rowsinlookupcache counters for each Lookup transformation. The Lookup transformations that
have a large number in this counter might benefit from tuning their lookup expressions. If those
expressions can be optimized, session performance improves.

Related Topics:
Optimizing Expressions

Creating a Pipeline Lookup Transformation


A mapping that contains a pipeline Lookup transformation includes a partial pipeline that contains the
lookup source and a source qualifier. The Integration Service processes the lookup source data in this
pipeline. It passes the lookup source data to the pipeline that contains the Lookup transformation and it
creates the cache.
The partial pipeline is a separate target load order group in session properties. You can configure multiple
partitions in this pipeline to improve performance.

Informatica Corporation
http://www.informatica.com
Voice: 650-385-5000
Fax: 650-385-5500
