Coding Standards


Informatica Coding Standards

May, 2006

Submitted By

Document 9387709.doc Version.Rev : 1.0


Name:
Authorized by: Suresh Babu Varadarajan
Signature/Date:
REVISION HISTORY

| Ver. | Date | Prepared By | Reviewed By | Description |
|---|---|---|---|---|
| 1.00 | May 20, 2006 | Srinivasan Ganesh | Suresh Babu Varadarajan, Sastry K. V. S. N., Radhika Katti | Initial Version |
| 1.1 | May 22, 2006 | Srinivasan Ganesh | | Incorporated the review comments |

9387709.doc Ver.Rev. 1.0 2 of 9


1. NAMING CONVENTIONS FOR TRANSFORMATIONS
1.1 MAPPINGS
1.2 MAPPLETS
1.3 WORKFLOWS
1.4 SESSIONS
1.5 FOR VARIABLES
1.6 PARAMETER FILES
1.7 LOG AND BAD FILES
1.8 SERVER VARIABLES
1.9 PERFORMANCE IMPROVEMENT RELATED STANDARDS
1.10 MISCELLANEOUS STANDARDS
1.11 REFERENCE



1. NAMING CONVENTIONS FOR TRANSFORMATIONS

| Type of Transform | Naming Convention | Description/Example |
|---|---|---|
| Aggregator Transformation | ‘agg_’ + target_table_name or descriptive_process_name | Aggregate the column(s). e.g.: If the revenue column is being aggregated, then the transform should be named as agg_REVENUE |
| Expression Transformation | ‘exp_’ + target_table_name or descriptive_process_name | If the object of the expression is to generate a sales summary, then the transform should be named as exp_SALES_SUMMARY |
| Advanced External Procedure Transform | ‘aep_’ + external procedure name | An advanced external procedure transform name should be a concatenation of ‘aep_’ + the name of the external procedure being called. |
| External Procedure Transformation | ‘ext_’ + external procedure name | An external procedure transform name should be a concatenation of ‘ext_’ + the name of the external procedure being called. |
| Filter Transformation | ‘fil_’ + target_table_name or descriptive_process_name | If the filter transform is based on a filter condition for the STATE attribute, then the transform should be named as fil_STATE |
| Joiner Transformation | ‘jnr_’ + source_table1_source_table2, or filename1 and filename2 if joining files | Join two disparate sources. e.g.: If a file custfile.dat is to be joined with the customer table, the transform should be named as jnr_CUSTFILE_CUSTOMER |
| Connected Lookup Transformation | ‘lkp_’ + table name on which lookup is being performed | If the lookup is being performed on PRODUCT_ID using the table DIM_PRODUCT, then the transform should be named as lkp_DIM_PRODUCT |
| Unconnected Lookup Transformation | ‘ulkp_’ + table name on which lookup is being performed + field returned | If the lookup is being performed on PRODUCT_ID using the table DIM_PRODUCT to return PRODUCTCODE, then the transform should be named as ulkp_DIM_PRODUCT_PRODUCTCODE |
| Normalizer Transformation | ‘nrm_’ + file name being normalized | If the MONTHLY_SALES.dat file is being normalized, then the transform should be named as nrm_MONTHLY_SALES |
| Rank Transformation | ‘rnk_’ + target_table_name or descriptive_process_name | e.g.: rnk_MONTHLY_SALES |
| Sequence Transformation | ‘seq_’ + table name | If the sequence is being generated for column PRODUCT_ID of table DIM_PRODUCT, then the transform should be named as seq_DIM_PRODUCT |
| Source Qualifier Transformation | ‘sq_’ + table name | If the source qualifier is defined on table DIM_PRODUCT, then the transform should be named as sq_DIM_PRODUCT |
| Stored Procedure | ‘sp_’ + stored procedure name | The stored procedure (sp) transform name should be a concatenation of ‘sp_’ + the name of the stored procedure being called. NOTE: If a stored procedure name already begins with sp, then there is no need to prefix it with ‘sp_’. |
| Update Strategy Transformation | ‘upd_’ + table name for which update strategy is being defined | If the update strategy is being defined for column CUST_NAME of table DIM_CUSTOMER, then the transform should be named as upd_DIM_CUSTOMER |
| Union Transformation | ‘un_’ + description_of_merge | e.g.: un_merge_transaction_records |
| Router Transformation | ‘rtr_’ + description_of_routing | e.g.: rtr_route_wrap_error_valid_records |

| Ports | Naming convention to be followed |
|---|---|
| Input Ports | i_<fieldname> if port is created explicitly |
| Output Ports | o_<fieldname> if port is created explicitly |
| Variable Ports | v_<fieldname> |
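
The prefixes listed above lend themselves to an automated check. The following is a minimal sketch in Python, not part of Informatica: the function name `follows_convention`, the dictionary names, and the rule that a name may contain only letters, digits, and underscores are assumptions made for illustration. Only the prefixes themselves come from the tables above.

```python
import re

# Prefixes copied from the transformation naming table above.
TRANSFORMATION_PREFIXES = {
    "aggregator": "agg_",
    "expression": "exp_",
    "advanced external procedure": "aep_",
    "external procedure": "ext_",
    "filter": "fil_",
    "joiner": "jnr_",
    "connected lookup": "lkp_",
    "unconnected lookup": "ulkp_",
    "normalizer": "nrm_",
    "rank": "rnk_",
    "sequence": "seq_",
    "source qualifier": "sq_",
    "stored procedure": "sp_",
    "update strategy": "upd_",
    "union": "un_",
    "router": "rtr_",
}

# Prefixes copied from the ports table above.
PORT_PREFIXES = {"input": "i_", "output": "o_", "variable": "v_"}

# Assumption: names consist of letters, digits, and underscores only.
_NAME_CHARS = re.compile(r"[A-Za-z0-9_]+")


def follows_convention(kind, name, prefixes=TRANSFORMATION_PREFIXES):
    """Return True if `name` carries the prefix required for `kind`."""
    kind = kind.lower()
    if not _NAME_CHARS.fullmatch(name):
        return False
    if kind == "stored procedure":
        # The standard waives 'sp_' when the procedure name already begins with 'sp'.
        return name.lower().startswith("sp") and len(name) > 2
    prefix = prefixes[kind]
    # The prefix must be followed by at least one more character.
    return name.startswith(prefix) and len(name) > len(prefix)


if __name__ == "__main__":
    print(follows_convention("Aggregator", "agg_REVENUE"))      # True
    print(follows_convention("Aggregator", "REVENUE_agg"))      # False
    print(follows_convention("input", "i_STATE", PORT_PREFIXES))  # True
```

A check like this could run over an exported list of object names during code review; how the names are exported is outside the scope of this sketch.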

1.1 Mappings

The naming convention for mappings that relate to specific systems is:
m_<Project name>_<System name>_<Description>

Example:
m_trex_rps_create_transaction_stf_format

1.2 Mapplets

The naming convention for mapplets is: mplt_<Description>

Mapplet-Input: input_<SourceName>
Mapplet-Output: output_<TargetName>

Example:
mplt_acr_report_generation

1.3 Workflows

Workflows should begin with ‘wf_’ and represent the functionality of the workflow.

The naming convention for workflows is:

wf_<Project name>_<System name>_<Description>

Example:
wf_trex_rps_create_transaction_stf_format

1.4 Sessions

The name of a session is derived from the mapping it is based upon. By default, the session
name generated by Informatica is ‘s_’ + mapping name. The default name should be
used as the session name.
For example: A session based on the mapping m_trex_gsr_dim_geography will be named
s_m_trex_gsr_dim_geography.

The naming convention for sessions is:

s_<mappingname>

If one mapping is used by two or more sessions, then each session name should be suffixed
suitably to indicate why the same mapping is used by more than one session.

Example:
s_m_trex_rps_create_transaction_stf_format
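
The mapping and session conventions can be expressed as two small helpers. This is a minimal Python sketch for illustration only: the function names `mapping_name` and `session_name` are hypothetical, and joining the optional suffix with an underscore is an assumption, since the standard only says the name should be "suffixed suitably".

```python
def mapping_name(project, system, description):
    """Build m_<Project name>_<System name>_<Description>."""
    return "_".join(["m", project, system, description])


def session_name(mapping, suffix=""):
    """Build s_<mappingname>, with an optional suffix for a reused mapping.

    Assumption: the suffix is appended with an underscore separator.
    """
    if not mapping.startswith("m_"):
        raise ValueError("mapping names are expected to start with 'm_'")
    name = "s_" + mapping
    return name + "_" + suffix if suffix else name


if __name__ == "__main__":
    m = mapping_name("trex", "rps", "create_transaction_stf_format")
    print(m)                # m_trex_rps_create_transaction_stf_format
    print(session_name(m))  # s_m_trex_rps_create_transaction_stf_format
```

Deriving the session name from the mapping name in one place keeps the two in step when a mapping is renamed.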

1.5 For Variables

The naming convention for variables is: v_<fieldname/PurposeOfVariable>

• A variable name should be easy to read and convey the purpose of the variable. This
helps when a number of variables are defined in a single transformation.

• Variable names should follow the sentence case format, i.e. the first letter of the name
(after the v_ prefix) will be in upper case.

• The use of an underscore is recommended as part of a variable name for clarity and
readability. e.g.: The variable for first quarter sales should be named v_Q1_sales
instead of v_Q1sales.

Example:

| Variable name | Variable Expression | Description |
|---|---|---|
| v_Month | GET_DATE_PART(DATE_ENTERED, 'mm') | Extract the month from DATE_ENTERED |
| v_Q1_sales | SUM(QUANTITY * PRICE - DISCOUNT, v_Month = 1 OR v_Month = 2 OR v_Month = 3) | Calculate Q1 sales |
| v_State_counter | IIF(PREVIOUS_STATE = STATE, v_State_counter + 1, 1) | Increment v_State_counter if STATE in the previous data record is the same as in the current data record |
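
The v_State_counter row relies on a variable keeping its value from one row to the next. The sketch below traces that behaviour in Python rather than in Informatica's expression language. It assumes that PREVIOUS_STATE is a variable holding the STATE of the previous row, that it is updated after the counter is evaluated, and that it starts as an empty string; the function name `state_counters` is hypothetical.

```python
def state_counters(states):
    """Return the counter value produced for each row of STATE values."""
    previous_state = ""  # assumed initial value of the character variable
    counter = 0          # assumed initial value of the numeric variable
    result = []
    for state in states:
        # Equivalent of IIF(PREVIOUS_STATE = STATE, v_State_counter + 1, 1)
        counter = counter + 1 if previous_state == state else 1
        # Remember the current row's STATE for the next row.
        previous_state = state
        result.append(counter)
    return result


if __name__ == "__main__":
    print(state_counters(["CA", "CA", "NY", "CA", "CA", "CA"]))
    # [1, 2, 1, 1, 2, 3]
```

The counter restarts at 1 whenever STATE changes, which is why the input is normally sorted by STATE before such a counter is applied.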



1.6 Parameter files

The naming convention for parameter files is:

Prm_<description>

Example:
Prm_RPS_010_job.prm

1.7 Log and Bad files

The naming conventions for log and bad files are:

Log files : <session_name>.log
Bad files : <session_name>.bad

1.8 Server Variables

The naming conventions for server variables are:

$PMRootDir = the root directory (Example: \\Server\C:\Program Files\Informatica PowerCenter 7.1\Server\).
$PMRootDir/SessLogs = Session Logs directory.
$PMRootDir/BadFiles = Bad Files directory.
$PMRootDir/Cache = Cache Files directory.
$PMRootDir/TgtFiles = Target Flat files directory (output).
$PMRootDir/SrcFiles = Source Flat files directory (input).
$PMRootDir/ExtProc = External Procedures directory.
$PMRootDir/LkpFiles = Lookup Files directory.
$PMRootDir/WorkflowLogs = Workflow Logs directory.

1.9 Performance improvement related standards:

1. Turn off verbose logging.
2. Turn off ‘collect performance statistics’ after performance testing.
3. Try not to read a file over the network.
4. In places where an opportunity for code reuse is not present, consider output
expressions in place of variables.
5. Using a Sequence generator provides an edge in performance over a stored
procedure call to get the sequence from the database.
6. An Update strategy slows down the performance of the session.
7. Lookups and aggregators slow down the performance since they involve caching. It is
advisable to calculate and check the index and data cache sizes when using them.
8. Consider partitioning of files wherever necessary.

Note: Refer to annexure 1.

1.10 Miscellaneous standards:

1. All ports to be in CAPITAL LETTERS for flat files.
2. The description of every transformation (including sources and targets) is to be filled in
appropriately.
3. There should be no unused ports in any transformation.
4. The connection information of a lookup should be $SOURCE or $TARGET.
5. For output-only ports, all error (Transformation error) messages must be removed.
6. When strings are read from a fixed-width flat file source and compared, ensure that
the strings are trimmed properly (LTRIM, RTRIM) for blank spaces.
7. Wherever the fields returned from a lookup transformation are used, null values for
these fields should be handled accordingly.
8. When ports are dragged to the successive transformations, the names should not be
changed.
9. Common source and target tables are to be used from the ‘COMMON’ folder.
10. By default, the “edit null characters to space” option in the session definition for file
targets has to be selected.
11. Check the return status of pmcmd. If it is non-zero, then exit.
12. ALL CONTROL REPORTS TO BE IN CAPS. Please modify the necessary scripts. Example:
RECORDS PROCESSED: 45
RECORDS REJECTED: 0
13. In control reports, a ":" separates the description and the value, as in the example
under item 12. This should be standard for all control reports.
14. The default values of all the output ports should be removed.
15. All the components in the mapping should have comments.
16. All ports should be used and connected.
17. The Tracing level value should be set to 'Terse' in all transformations across the
mapping.
18. All the 'Error (Transformation error)' messages should be removed from the default
value in the Expression transformation.
19. The mapping should not contain any unused Source/Target Definitions.
20. The Override Tracing should be set to 'TERSE' in the Session properties of Error
handling.
21. The 'Enable High Precision' option should be disabled in the Session properties of
Performance.
22. The Stop on error property should be greater than 0. There may be exceptions to the
rule; all such cases should be documented in the design document.
23. The Session log file directory, Cache directory, File name, database connection strings,
reject file directory, and target connection string should start with a $.
24. The port names should be as per Standards.
25. The data types of ports in the source qualifier and the source should match.
26. The sequence of fields in the SQL query of a source qualifier should be in the same
order as the ports in it.
27. The filter expression should be coded properly in the following format:
IIF(condition, TRUE, FALSE).
28. The lookup transformation should not have fields that are neither checked as
output ports nor used as lookup-only ports.
29. Usage of variables
• Variable expressions can reference any input port. So as a matter of good
design principle, in a transform, variables should be defined after all the input
ports have been defined.
• Variables that will be used by other variable(s) should be the first to be
defined.
• Variables can reference other variable(s) in an expression. NOTE: the ordering
of variables in a transform is very important. All the variables using other
variables in their expression should be defined in the order of the dependency.
In the table in section 1.5, v_Month is used to calculate v_Q1_sales. Hence,
v_Q1_sales must always be defined after v_Month.
• Variables are initialized to 0 for numeric variables or empty string for
character/date variables.
• Variables can be used for temporary storage (for example, PREVIOUS_STATE
is a temporary storage variable which will be overwritten when a new data
record is read and processed).
• Local variables also reduce the overhead of recalculations. If a complex
calculation is to be used throughout a mapping, then it is recommended to
write the expression once and designate it as a variable. The variable can
then be passed to other transforms as an input port. This will increase
performance, as the Informatica Server will perform the calculation only once.
For example, the variable v_Q1_sales can be passed from one transformation to
another rather than redefining the same formula in different transforms.
• Variables can remember values across rows, and they retain their value until
the next evaluation of the variable expression. Hence, the order of the
variables can be used to implement procedural computations. For example, let
V1 and V2 be two variables defined with the following expressions, where V1
occurs before V2:
V1 has the expression ‘V2+1’
V2 has the expression ‘V1+1’
In this case, V1 gets the previous row's value of V2, since V2 occurs after V1
and is evaluated after V1. But V2 gets the current row's value of V1, since V1
is already evaluated by the time V2 is evaluated.
30. Informatica can perform data sorting based on a key defined for a source, but it
would be beneficial to have an index (matching the sort criteria) on a table for better
performance. This is especially helpful while performing key-based aggregations as
well as lookups.
31. An external procedure/function call should always return a value. The return value
will be used to determine the next logical step in the mapping execution.
32. Performance of an aggregator can be improved by presorting the data before passing
it to the aggregator.
33. Filter data as early as possible in the ETL.
34. Use custom SQL in the Source Qualifier for the WHERE clause to filter data. This lets
the database do the work, where it can be handled most efficiently.
35. A stored procedure should always return a value. The return value will be used by
Informatica to evaluate the success or failure of the event. It can also be used to
control the outcome of the mapping execution.
36. Stored procedures used inside a mapping usually perform poorly.
Therefore, it is recommended to avoid using stored procedures inside a mapping.
37. Mapping a string to an integer, or an integer to a string, will perform the conversion;
however, it will be slower than creating an output port with an expression like
to_integer(xxxx) and mapping an integer to an integer. This is because PMServer is left
to decide whether the conversion can be done mid-stream, which seems to slow things down.
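
The evaluation order described for V1 and V2 under item 29 can be traced with a short simulation. This is a Python sketch of the behaviour described there, not Informatica code; it assumes both variables are numeric and therefore start at 0, and the function name `evaluate_rows` is hypothetical.

```python
def evaluate_rows(row_count):
    """Return the (V1, V2) pair computed for each of `row_count` rows."""
    v1 = 0  # numeric variables are initialized to 0
    v2 = 0
    history = []
    for _ in range(row_count):
        # V1 is evaluated first, so it sees V2 as left by the previous row.
        v1 = v2 + 1
        # V2 is evaluated second, so it sees the current row's V1.
        v2 = v1 + 1
        history.append((v1, v2))
    return history


if __name__ == "__main__":
    print(evaluate_rows(3))  # [(1, 2), (3, 4), (5, 6)]
```

Swapping the order of the two assignments changes every value in the trace, which is why the order of variable ports in a transform matters.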

1.11 Reference:

S.No. Description
1. PerformanceTips
