The Process of Data Mapping For Data Integration Projects

Data Mapping - A Key Work Product for Data Warehouse, Data Integration, and Data Migration Projects
Wayne Yaddow
Data Quality Analyst - Consultant

Contents
Introduction
When and How Data Maps Are Used
Common Development Phases for Data Mapping
Data Mappings Described
Meeting the Challenges of Data Mapping Projects
Steps to Success in the Data Mapping Planning Process
Planning Considerations for Data Mapping Projects
Data Mapping Automation and Associated Tools
Conclusion

Introduction

Data mapping is among the most important design steps in data migration, data integration, and business intelligence projects. Mapping source data to target data greatly affects project success, perhaps more than any other task. The outcome of the mapping process is a primary tool for communication among project architects, developers, and testers.

We've come a long way from the time when "data mapping" was a dirty word in e-discovery. But as data becomes more dispersed and voluminous across organizations, having a centralized resource for quickly identifying where certain electronically stored information (ESI) resides is extremely valuable.

Data mapping is the process of creating data element mappings between source and target data models. It is used as the first step for a variety of data movement tasks, including:

• Data transformation or data mediation between a data source and a destination
• Consolidation of multiple databases into a single database and identifying redundant columns of data for consolidation or elimination
• Mappings that document the origins of data, the processing paths through which data flows, and the descriptions of the transformations applied to the data along those different paths
• Specifying business transformation/conversion rules to be applied to source data
• Identification of data relationships as part of data lineage analysis

Data mapping bridges the differences between two systems, or data models, so that when data is moved from a source, it is accurate and usable at the target destination.

Data mapping is the first step in a range of data integration tasks, one of them being data transformation between the source and destination. A data mapping tool, or data mapper, connects the distinct applications and governs how data from the source application will look when it is mapped to the destination application. It also supports the data manipulation functions that are applied to data as it is transformed from source to destination. Along with data, a data mapper should handle multiple structured and unstructured files and formats to map the corresponding fields, creating the output in the desired schemas. Thus, it should support complex data integration tasks.

Data transformations are among the most common problems facing systems integrators, because source data is often in a format or structure that is inconsistent with the target systems needing to use that data. This requires integrators and migrators to design and implement code for the mapping operations required to convert the data from one form to another (e.g., from one relational database format to another).

A simple example of data mapping is moving the value from a source 'address' field in a customer database to a target 'client address' field in a sales department database, while changing the target field length and "cleaning" those addresses at the same time.

Data mapping is required at many stages of data integration, data migration, and data warehouse life cycles. Consequently, data integration professionals must learn data mapping in order to move and test data, often using an ETL (extract, transform, and load) process.

When and How Data Maps Are Used

Data mapping is used for many types of data movement projects. However, all of the tasks fall into one of two categories:

• Data migration projects: the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one IT storage system to another
• Data integration and conversion projects: combining data residing in different sources and providing users with a unified view in a target system

For data mapping success, an important heuristic is the relationship between the source and the data target: it could be one source to many targets, many sources to one target, or many sources to many targets.

Combined with a well-documented use case describing the need for a map and its intended purpose, heuristics are essential for mapping success. It is imperative that decisions made regarding business rules and map heuristics are clearly documented so the evidence is available to describe why each decision was made and by whom. This serves as an audit trail of decisions made during the mapping process.

Mapping development rules must be verifiable in order to validate mappings, regardless of whether the mapping was accomplished by automated or manual means. Assumptions must be used with caution while mapping data; full documentation should be relied on instead.

Common Development Phases for Data Mapping

Step 1: Discover and define data to be moved, including data sets, the fields within each table, and the format of each field after movement. For data integrations, the frequency of data transfer is also defined.

Step 2: Map the data. Map source fields to target destination fields.

Step 3: Transform the data. When fields require transformations/conversions, formulas or rules are designed and coded.

Step 4: Test. Using a test system and sample data from sources, run the transfers to see how it all works and make adjustments as necessary.

Step 5: Deploy. Once it's determined that data transformations are working as planned, schedule a migration or integration go-live event.

Step 6: Maintain and update. For ongoing data integration, data maps will require updates and changes as new data sources are added, as data sources change, or as requirements at the destination change.
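
The mapping, transformation, and test phases can be sketched in a few lines of Python. This is only an illustration built on the address example from the Introduction: the field names (`client_address`, `client_name`), the 40-character target length, and the whitespace-based cleaning rule are all assumptions made for the sketch, not details of any real system.

```python
import re

# Assumed length of the target 'client address' field (hypothetical value).
TARGET_ADDRESS_LENGTH = 40


def clean_address(value: str) -> str:
    """Collapse runs of whitespace, trim the ends, and fit the target length."""
    collapsed = re.sub(r"\s+", " ", value).strip()
    return collapsed[:TARGET_ADDRESS_LENGTH]


# Step 2: map each source field to a target field.
# Step 3: attach a transformation rule to each mapped field.
FIELD_MAP = {
    "address": ("client_address", clean_address),
    "name": ("client_name", str.strip),
}


def apply_mapping(source_row: dict) -> dict:
    """Build a target row by applying every mapping rule to a source row."""
    return {
        target: transform(source_row[source])
        for source, (target, transform) in FIELD_MAP.items()
    }


# Step 4: run the mapping against sample data and inspect the result.
sample = {"name": "  Ada Lovelace ", "address": "12  St James's   Square,\nLondon "}
mapped = apply_mapping(sample)
print(mapped)
```

Keeping the map as data (a dictionary of source field, target field, and rule) rather than burying it in procedural code makes each rule easy to review and test on its own, which supports the requirement that mapping rules be verifiable.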

Data Mappings Described

Source-to-target mappings should be assumed to be key for any ETL solution. In addition to containing the mapping of fields from sources to targets, data mappings should define the following important basic information. See Figure 1 for a high-level example of information commonly documented as a source-to-target data mapping.

Data modelers, data and business analysts, ETL developers, and testers have a keen interest in:

• How tables are joined (the type of SQL join)
• Database connections for source and target tables
• Source and target data descriptions: what each data set represents
• Source and target field descriptions: what each field represents
• Examples of field/attribute contents
• Source and target data types, dates, and times (metadata)
• Null, not Null, default indicators
• Transformation, aggregation, enrichment description rules for each field
• Error handling conditions and logic for each record, each field
• Columns participating in referential integrity
• Primary / foreign key columns that assure source records are unique
• Slowly changing dimension (SCD) and change data capture (CDC) attributes and logic
• Change and version log entries to describe additions and changes to the mappings

Figure 1: Sample data mapping template
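
As a rough illustration of how one row of such a mapping document might be held in code, the sketch below defines a small record type and a completeness check. The class name `FieldMapping`, its attribute names, and the choice of which attributes count as required documentation are assumptions for this example; they are not the layout of the template in Figure 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldMapping:
    """One hypothetical row of a source-to-target mapping document."""
    source_table: str
    source_field: str
    source_type: str
    target_table: str
    target_field: str
    target_type: str
    nullable: bool = True
    default: Optional[str] = None
    transformation_rule: str = ""
    error_handling: str = ""
    is_key: bool = False
    description: str = ""
    sample_values: List[str] = field(default_factory=list)
    change_log: List[str] = field(default_factory=list)


# Attributes this sketch treats as mandatory documentation (an assumption).
REQUIRED_TEXT = ("description", "transformation_rule", "error_handling")


def missing_documentation(mapping: FieldMapping) -> List[str]:
    """Return the names of documentation attributes that were left empty."""
    missing = [name for name in REQUIRED_TEXT if not getattr(mapping, name).strip()]
    if not mapping.sample_values:
        missing.append("sample_values")
    return missing


row = FieldMapping(
    source_table="customer", source_field="address", source_type="VARCHAR(100)",
    target_table="client", target_field="client_address", target_type="VARCHAR(40)",
    nullable=False,
    transformation_rule="collapse whitespace; truncate to 40 characters",
    description="Postal address of the customer",
    change_log=["v1: initial mapping"],
)
print(missing_documentation(row))  # prints ['error_handling', 'sample_values']
```

A check like this can flag incomplete rows before a mapping is handed to developers and testers.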

Planning for Data Mapping Projects

Planning is arguably the most important stage of the entire data mapping project.

Documentation needed for a project's sources should include data maps and data dictionaries to deliver a complete definition and the intended use of each source data component. Data dictionaries help to assure that the interpreted meaning of each source data element is correct. For example, a field for "Provider ID" could have many different definitions across data elements, such as billing identification number, national provider identifier, or social security number. Data definitions with similar or identical names could differ considerably in meaning.

As with any complex project, planning for data maps requires the following:

• Define objectives for the mapping project
• Gain IT and business management buy-in
• Define specific mapping deliverables
• Assign mapping roles and responsibilities

Meeting the Challenges of Data Mapping Projects

Source-to-target mappings describe how one or more attributes in source data sets are related to one or more attributes in a target data set. These source-to-target mappings are derived from the ETL transformation rules described in a requirements specification document, comments inside the transformation scripts, spreadsheets, ER diagrams, or SQL scripts.

Data mapping is complex and challenging. So what makes data mapping so difficult? The following are common challenges and shortcomings associated with data mapping and how they can be mitigated.

• The time, people, and tools needed to build data maps can be substantial

The process of connecting data sources, building mappings for data transformation and integration, and validating the transformed data often requires significant resources, particularly when the entire process is done manually.

There are several ways to ease the data mapping burden significantly. It starts by defining the process for gathering information to be documented for each source and target. In most cases, systematic interviews with data stewards are the most efficient way to collect information for a data map. Interviews with subject matter experts (SMEs) should be direct and should use data mapping templates. Metadata and data mapping tools should be used to automate as much as possible.

• The information needed is not always available for building data maps

A common mistake organizations make with data maps is omitting important information, which renders the data map far less useful than it should be. Before data mapping initiatives get off the ground, project organizers should assemble key stakeholders and gather feedback on what information needs to be included in mappings for sources and targets, for example retention schedules, litigation risk profiles, and accessibility constraints of particular data sources. Privacy officers will want to know which data sources contain sensitive customer information that must be carefully protected.

• Substantial efforts needed to maintain data maps

As with all important project documents, data maps should be constantly evaluated, updated, and assessed for quality. One method to ensure the data mappings are maintained is to make sure the process is fully integrated into the organization's master data management program. With every change to requirements, data maps should be reviewed to assess the impact.

• Data mapping with spreadsheets can pose long-term issues

Creating manual mappings using spreadsheets is often difficult and time-consuming:

    Mapping specifications built using spreadsheets cannot be easily managed.
    Data mappings cannot be easily versioned, and auditability of what has changed in the mappings, and who changed it, remains a constant issue.

Creating maps internally and using unqualified personnel for map development compromises the integrity of results. Use skilled personnel familiar with data mapping requirements, limitations, and pitfalls to ensure reliable results.

Steps to Success in the Data Mapping Planning Process

1. Determine which data sources are needed to meet requirements for the target system

General steps to source data discovery:

I. Identify the data needed to meet required business tasks
II. Identify potential internal and external sources of that data
III. Assure that each source meets the privacy and regulatory requirements
IV. Assure that each source will be adequately available and accessible according to required frequencies

2. Identify tools for data analysis, data preparation, and data mapping

It will be necessary to load data sources (frequently, samples of them) into an environment of data preparation (DP) tools where the data can be analyzed and manipulated. It's important to get the data into an environment where it can be examined and readied for the next steps.

3. Conduct data profiling on potential and selected source data

This is the vital (but often discounted) step in DP. The project team must analyze source data before it can be properly prepared for downstream consumption. Beyond simple visual examination, projects often need to profile data, detect outliers, and find null values and other unwanted data among sources. This can provide insight into the state of data quality.

4. Cleanse and screen source data

Based on knowledge of the business goals, experiment with different data cleansing strategies that will get the relevant data into a usable format. Start with a small, statistically valid sample to iteratively experiment with different data prep strategies, refine data record filters, and discuss with business stakeholders.

Planning Considerations for Data Mapping Projects

− A typical plan begins with meeting IT to gain an understanding of systems, assets, retention policy and practice, employee separation procedures, archives, backup systems, and outsourcing of data storage or management. IT is the primary authority on sources such as corporate email and backups.
− The next step is to consult with business unit leaders about needs and general data practices. They will flag the seemingly inevitable data repositories and associated software programs IT doesn't know about. Also, don't forget to meet with records managers about specific document management systems, databases, and file rooms.
− At this point in the planning process, enough information has been gathered to build the data map in outline form. Request information about the format, volume, security, etc. from IT, and continue to narrow the focus by gradually filling in gaps and resolving inconsistencies.
− Enterprise data mapping software solutions automate some parts of the process and can be used to generate the map instead of manually creating a spreadsheet. Large corporations with complex IT systems and companies in highly regulated industries should evaluate investing in data mapping software.
− Processes and procedures must be clearly defined, and documentation must be prepared to explain how the map was developed and tested to work correctly for its intended purpose.
− All data maps of any kind must be identified, inventoried, maintained with schema changes, and verified. Poorly designed and out-of-date mappings create significant data integrity problems. Undetected errors in data maps have the potential to introduce many problems, both immediately and "up the line" as data is propagated downstream to other systems.
− When evaluating data integrity issues that involve mapping, it's critical to understand the elements of all the code sets or data sets that will be mapped. The characteristics of each source or code set, their intended use, and how the map is created are all important to building a successful data map. Using a map for a purpose not intended or misunderstanding the construct of the source and target can lead to incomplete, incorrect, and inappropriate maps.

Planning for data mappings should begin after project requirements are "ready" and after all data sources have been identified and data appropriately "prepared" to meet the needs of requirements and target data.

Plan enough time to evaluate the source data, to compare the available data to the needed data, and to drill down to the detail needed for source-to-target mapping. One goal is to ensure that data migration development is the shortest task in the project plan. The source data and target data should be well understood before coding begins. Project leaders don't want to experience the very costly surprise of learning, at integration testing or implementation, that the source data is not "fit to use."

Best Practices for Data Mapping Projects

All data maps require an investment of time and resources, some more than others. When the source data and the target data are similar in structure, with a high percentage of exact matches in content and meaning, the time needed for data mapping validation will be minimal.

However, when the source and target do not result in an exact match, time must be spent to determine which of the mapping choices are appropriate based on the map's use case. Unless the mapping is a one-to-one match from source to target, decisions must be made to meet the intent of the map.

In order to optimize the use of data maps, the following practices are recommended:

• Document the map heuristics and business rules surrounding the development of each mapping. Include use cases for each data mapping; identify applications that use the maps; document how each mapping rule is created and deployed in the workflow.

  Mapping heuristics represent "rule of thumb" guidance that provides rules for how to map from source to target in a consistent manner for a specific project. Detailed instructions should be provided so that consistency is ensured between map developers throughout the project. Every mapping project must have clear instructions to assure map results are "understandable, reproducible, and reliable."

• Perform a data mapping assessment: What are we moving, and what are the transformation rules? How much time will it take the team to complete data mapping? It's often necessary to have a general idea of what it will take to create the final design and implementation. The captured business needs, the source and target system metadata, and the data profiling results from the data quality assessment all create the information needed to understand the mapping effort.

• Prepare a process to test the validity and reproducibility of the mapped data. A verification process should cover the data mapping development process, including the tools used, from map development through end-user acceptance testing and approval.

• There is no one-size-fits-all data map template. IT professionals should select an appropriate data mapping template to manage their data integration or migration.

• Authoritative maps save development costs. Data mapping templates supported by standards development organizations or mandated by government agencies usually have been validated and tested to ensure they work for the purposes for which they were developed. This saves the cost of creating and testing locally developed maps.

• An identified organization, department, or individual should be in charge of implementing, maintaining, and updating each data map.

• There is an increasing range of data mapping tools and software solutions available, both commercial and open source. These tools should be assessed to aid the mapping process.

• Mappings should be reviewed and then revised when sources and targets are updated. This may require updating maps multiple times per year. Each update must be clearly identified as a different version, and documentation should detail the revisions for both the source and target.

Data Mapping Automation and Associated Tools

Data mapping is complex and can be accomplished in a variety of ways. Many software providers offer data mapping software. However, these solutions do not all provide comparable or comprehensive features. When comparing data mapping software programs, the following key factors should be considered:

• Tools should offer advanced data visualization capabilities for selecting functions, selecting sources and targets, and reports for review of mapping results

• Tools should be customizable. Such features allow users to better adapt the software to fit the particulars of their technologies and business needs

• Tools should be easy to use and not require extensive training in order to implement. This improves adoption rates and saves on costs associated with the implementation

• Tools should support a large variety of data sets (e.g., RDBMS, JMS, SOAP) and formats (CSV, XML, etc.). This assures that users are able to retrieve and map their variety of data easily

Some organizations continue documenting data mappings on spreadsheets. However, modern data integrations and migrations are too complex and varied for manual efforts to be effective. With more data, more mappings, and constant changes, such "manual" processes should be reconsidered. They often lack transparency and don't easily allow tracking the inevitable changes that occur in project requirements, data models, and schemas.

Choosing the Best Data-Mapping Solutions for the Project

A key to choosing the correct data-mapping solution is product research. Software providers who offer free trial periods make it easier to understand what kind of value is offered. Online reviews may be useful for determining which data mapping programs to investigate further, but business leaders should remember that not every solution is a perfect fit for every user, and some negative reviews may be a result of incompatibility between the users and the software for mapping data. The best data mapping software should be customizable and adaptable enough to provide value to businesses of all kinds.
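
To make the change-tracking concern with spreadsheet-based mappings concrete, the sketch below compares two versions of a mapping held as plain data and produces change-log entries. It is a minimal illustration under assumed conditions: the function name `diff_mappings`, the field names, and the representation of a mapping as a source-to-target dictionary are all hypothetical, and real mapping tools track far more than field names.

```python
from typing import Dict, List


def diff_mappings(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List additions, removals, and changes between two mapping versions."""
    changes = []
    for source in sorted(old.keys() | new.keys()):
        if source not in old:
            changes.append(f"added {source} -> {new[source]}")
        elif source not in new:
            changes.append(f"removed {source} -> {old[source]}")
        elif old[source] != new[source]:
            changes.append(f"changed {source}: {old[source]} -> {new[source]}")
    return changes


# Two hypothetical versions of the same source-to-target field map.
v1 = {"address": "client_address", "name": "client_name"}
v2 = {"address": "client_addr", "name": "client_name", "phone": "client_phone"}

change_log = diff_mappings(v1, v2)
for entry in change_log:
    print(entry)
```

When each version of a map is stored as data, a comparison like this can be run on every update, so that each revision is identified and documented rather than silently overwritten.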
Conclusion

Data mapping is always resource-intensive requiring hands-on development, review, and knowledge about all sources and targets. Human intervention is necessary for mapping design and validation of map outcomes. Commercial and open source mapping tools can assist in the process by providing varying degrees of automation. Manual review is required, to a varying extent, to map the portions that failed automated mapping and to validate the results of automated mapping.

Data mapping can make a difference when it comes to getting data under control. It makes it easier to generate reports and to figure out how the data coming into the organization’s workplace is organized. This can make a real difference in terms of getting information ready for data integrations or migrations.

