This article gives an overview of ETL testing and of what is done to test an ETL process.
Independent Verification and Validation is gaining significant market potential, and many companies now see it as a prospective source of business. Customers are offered a range of service offerings, distributed across many areas based on technology, process, and solutions. ETL, or data warehouse, testing is one of the offerings that is developing rapidly and successfully.
Why do organizations need a Data Warehouse?

Organizations with organized IT practices are looking to reach the next level of technology transformation. They are trying to become more operational with data that is easy to interoperate. Data is the most important part of any organization, whether it is everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all vital management decisions are taken. Most companies are taking a step forward by constructing a data warehouse to store and monitor real-time data as well as historical data. Crafting an efficient data warehouse is not an easy job. Many organizations have distributed departments with different applications running on distributed technology. An ETL tool is employed to integrate the different data sources from the different departments. The ETL tool works as an integrator: it extracts data from the different sources, transforms it into the preferred format based on the business transformation rules, and loads it into a cohesive database known as the data warehouse.
A well-planned, well-defined, and effective testing scope helps ensure a smooth transition of the project to production. A business gains real confidence once the ETL processes are verified and validated by an independent group of experts to make sure that the data warehouse is concrete and robust.
ETL or data warehouse testing is categorized into four different engagements, irrespective of the technology or ETL tools used:
- New Data Warehouse Testing: A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and the new data warehouse is built and verified with the help of ETL tools.
- Migration Testing: In this type of project the customer has an existing DW and an ETL tool performing the job, but is looking to adopt a new tool in order to improve efficiency.
- Change Request: In this type of project new data is added from different sources to an existing DW. The customer might also need to change an existing business rule or integrate a new rule.
- Report Testing: Reports are the end result of any data warehouse and the basic purpose for which the DW is built. A report must be tested by validating its layout, its data, and its calculations.
Apart from these four main ETL testing methods, other testing methods such as integration testing and user acceptance testing are also carried out to make sure everything is smooth and reliable.
ETL testing goes through the following phases:
1. Business and requirement understanding
2. Validating
3. Test estimation
4. Test planning based on the inputs from test estimation and the business requirements
5. Designing test cases and test scenarios from all the available inputs
6. Once all the test cases are ready and approved, the testing team performs the pre-execution check and prepares the test data
7. Execution is performed until the exit criteria are met
8. Upon successful completion, a summary report is prepared and the closure process is carried out
It is necessary to define a test strategy that is mutually accepted by the stakeholders before actual testing starts. A well-defined test strategy makes sure that the correct approach is followed and that the testing goals are met. ETL testing might require the testing team to write SQL statements extensively, or to tailor the SQL provided by the development team. In either case, the testing team must know the results they are trying to get from those SQL statements.
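To illustrate what knowing the expected result of a test query can look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table `stg_customer`, its columns, and the rule that `full_name` is the first and last name joined by a space are all hypothetical, chosen only for the example. The query is written so that a correct load returns no rows at all.

```python
import sqlite3

# Hypothetical staging table; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (
        customer_id INTEGER,
        first_name  TEXT,
        last_name   TEXT,
        full_name   TEXT
    );
    INSERT INTO stg_customer VALUES
        (1, 'Ada',   'Lovelace', 'Ada Lovelace'),
        (2, 'Alan',  'Turing',   'Alan Turing'),
        (3, 'Grace', 'Hopper',   'Grace');
""")

# Assumed rule: full_name = first_name || ' ' || last_name.
# The query selects only rows that break the rule, so the expected
# result for a correct load is an empty list.
VIOLATIONS_SQL = """
    SELECT customer_id
    FROM stg_customer
    WHERE full_name <> first_name || ' ' || last_name
    ORDER BY customer_id
"""


def rule_violations(connection):
    """Return the customer_ids whose full_name breaks the assumed rule."""
    return [row[0] for row in connection.execute(VIOLATIONS_SQL)]


violations = rule_violations(conn)
print(violations)  # [3]
```

Note that a plain `<>` comparison does not flag rows where one of the columns is NULL, so a real test would probably need a separate NULL check.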
The ETL process under test consists of the following stages:
- Data file loads from the source system onto the source tables
- The transform process that is designed to extract data from the source tables and move it to the staging tables
- Data validation of all mapping rules and transformation rules within the staging tables
- Data validation within the target tables to ensure that data is present in the required format and that there is no data loss from the source to the target tables

ETL testing therefore means testing this entire process, either with a tool or at table level, with the help of test cases and the rules mapping document.
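The check for data loss between source and target tables can be sketched at table level as follows. This is only an illustration under stated assumptions: the tables `src_orders` and `tgt_orders` and the key column `order_id` are hypothetical, and an in-memory SQLite database stands in for the real source and target systems.

```python
import sqlite3

# Hypothetical source and target tables; the target is missing one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE tgt_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5), (3, 7.25);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 25.5);
""")


def row_counts(connection):
    """Return (source_count, target_count) as a quick first check."""
    src = connection.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
    tgt = connection.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
    return src, tgt


def missing_in_target(connection):
    """Return the source keys that have no matching row in the target."""
    sql = """
        SELECT s.order_id
        FROM src_orders AS s
        LEFT JOIN tgt_orders AS t ON t.order_id = s.order_id
        WHERE t.order_id IS NULL
        ORDER BY s.order_id
    """
    return [row[0] for row in connection.execute(sql)]


counts = row_counts(conn)
missing = missing_in_target(conn)
print(counts, missing)  # (3, 2) [3]
```

Matching counts alone do not prove that nothing was lost, since a dropped row and a duplicated row cancel out. That is why the sketch also lists the missing keys.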
Typically, data loaded into a data warehouse is derived from diverse sources of operational data, such as databases, feeds, application files, or flat files. The data must be extracted from these diverse sources, transformed to a common format, and loaded into the data warehouse. It is often further aggregated into a data mart for efficient reporting. The ETL (extract, transform, and load) process is a critical step in any data warehouse implementation, and it remains an area of major significance whenever the ETL code is updated.
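The extract, transform, load, and aggregate sequence described above can be sketched in a few lines. Everything named here is an assumption made for illustration: the CSV feed, the application records, and the tables `dw_orders` and `mart_daily_sales`. It is a toy pipeline, not a representation of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Two hypothetical operational sources with different layouts.
CSV_FEED = "id,amount,day\n1,10.50,2024-01-01\n2,4.50,2024-01-01\n"
APP_RECORDS = [{"order": 3, "total_cents": 2000, "date": "2024-01-02"}]


def extract():
    """Read raw rows from both sources without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_FEED)))
    return csv_rows, APP_RECORDS


def transform(csv_rows, app_rows):
    """Map both layouts onto one format: (order_id, amount, order_date)."""
    common = [(int(r["id"]), float(r["amount"]), r["day"]) for r in csv_rows]
    common += [(r["order"], r["total_cents"] / 100, r["date"]) for r in app_rows]
    return common


def load(connection, rows):
    """Load the warehouse table, then aggregate it into a small data mart."""
    connection.execute(
        "CREATE TABLE dw_orders (order_id INTEGER, amount REAL, order_date TEXT)"
    )
    connection.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)
    connection.execute("""
        CREATE TABLE mart_daily_sales AS
        SELECT order_date, SUM(amount) AS total_amount
        FROM dw_orders
        GROUP BY order_date
    """)


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
daily = conn.execute(
    "SELECT order_date, total_amount FROM mart_daily_sales ORDER BY order_date"
).fetchall()
print(daily)  # [('2024-01-01', 15.0), ('2024-01-02', 20.0)]
```

Keeping the three steps as separate functions makes each one testable on its own, which is useful whenever the ETL code is updated.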
An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:
- New Data Warehouse Testing: A new data warehouse is built from the ground up, gathering inputs from the customer and extracting data from different sources. It is verified with the help of ETL tools.
- Migration Testing: In this type of engagement the customer migrates from the current ETL tool to a better option in order to improve efficiency.
- Change Request: In this type of project new data is added from different sources to an existing DW. The customer might also need to change an existing business rule or integrate a new rule.
- Report Testing: Validating the report layout, the data in the report, and the calculations.
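For migration testing, one plausible approach is to run the old and the new ETL logic on the same input and compare the outputs key by key. The sketch below assumes this approach. `legacy_transform` and `new_transform` are hypothetical stand-ins for the two jobs, and the 1.2 multiplier is an invented business rule.

```python
def legacy_transform(rows):
    """Stand-in for the existing ETL job (hypothetical)."""
    return {r["id"]: round(r["net"] * 1.2, 2) for r in rows}


def new_transform(rows):
    """Stand-in for the replacement ETL job (hypothetical).

    It silently drops negative amounts, which the comparison should expose.
    """
    return {r["id"]: round(r["net"] * 1.2, 2) for r in rows if r["net"] >= 0}


def compare_outputs(old, new):
    """Report keys missing from, added to, or changed in the new output."""
    missing = sorted(old.keys() - new.keys())
    extra = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"missing": missing, "extra": extra, "changed": changed}


SAMPLE = [
    {"id": 1, "net": 100.0},
    {"id": 2, "net": -5.0},
    {"id": 3, "net": 40.0},
]

report = compare_outputs(legacy_transform(SAMPLE), new_transform(SAMPLE))
print(report)  # {'missing': [2], 'extra': [], 'changed': []}
```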
Challenges in ETL testing include:
- Environment instability
- Response time of the queries executed, job failures, the data setup required for FIT testing, and volume testing
- Data selection from multiple source systems, and the analysis that follows, pose a great challenge
- Volume and complexity of the data
- Inconsistent and redundant data in a data warehouse
- Inconsistent and inaccurate reports
- Non-availability of history data
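Inconsistent and redundant data, one of the challenges listed above, can often be surfaced with simple grouping queries. The following sketch assumes a hypothetical `dw_customer` table in which `customer_id` should identify exactly one email address. The table and column names are invented for the example.

```python
import sqlite3

# Hypothetical warehouse table containing one exact duplicate (id 1)
# and one conflicting pair of rows (id 3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_customer (customer_id INTEGER, email TEXT);
    INSERT INTO dw_customer VALUES
        (1, 'a@example.com'),
        (1, 'a@example.com'),
        (2, 'b@example.com'),
        (3, 'c@example.com'),
        (3, 'c2@example.com');
""")


def redundant_rows(connection):
    """Return (customer_id, email, copies) for rows stored more than once."""
    sql = """
        SELECT customer_id, email, COUNT(*) AS copies
        FROM dw_customer
        GROUP BY customer_id, email
        HAVING COUNT(*) > 1
        ORDER BY customer_id
    """
    return connection.execute(sql).fetchall()


def inconsistent_keys(connection):
    """Return customer_ids that map to more than one distinct email."""
    sql = """
        SELECT customer_id
        FROM dw_customer
        GROUP BY customer_id
        HAVING COUNT(DISTINCT email) > 1
        ORDER BY customer_id
    """
    return [row[0] for row in connection.execute(sql)]


duplicates = redundant_rows(conn)
conflicts = inconsistent_keys(conn)
print(duplicates, conflicts)  # [(1, 'a@example.com', 2)] [3]
```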
