Talend Examples MDM EN 7.2.1M6


Master Data Management

Examples

7.2.1M6
Contents

Copyright

Theory into practice: Using Integrated Matching to reconciliate customer data
Integrated Matching in Talend MDM
Prerequisites
Configuring the integration of Talend Data Stewardship with Talend MDM
Setting up a project in Talend Studio to create golden records using integrated matching
Working with the data in the Talend MDM Web UI
Data transfer between Talend MDM and Talend Data Stewardship
Mapping tables between Talend MDM and Talend Data Stewardship
Copyright

Adapted for 7.2.1M6. Supersedes previous releases.
Publication date: May 23, 2019
Copyright © 2019 Talend. All rights reserved.
The content of this document is correct at the time of publication.
However, more recent updates may be available in the online version that can be found on Talend
Help Center.
Notices
Talend is a trademark of Talend, Inc.
All brands, product names, company names, trademarks and service marks are the properties of their
respective owners.
End User License Agreement
The software described in this documentation is provided under Talend's End User Software and
Subscription Agreement ("Agreement") for commercial products. By using the software, you are
considered to have fully understood and unconditionally accepted all the terms and conditions of the
Agreement.
To read the Agreement now, visit
http://www.talend.com/legal-terms/us-eula?utm_medium=help&utm_source=help_content

Theory into practice: Using Integrated Matching to reconciliate customer data

This appendix describes an end-to-end scenario to demonstrate the Integrated Matching feature using
Talend Data Stewardship integrated with Talend MDM.
Talend Data Stewardship can be integrated with Talend MDM to perform the integrated matching
tasks. In this scenario, Talend Data Stewardship is used to handle the merging tasks generated during
the integrated matching process.
For more information about the integrated matching tasks, see the Talend Data Stewardship
documentation on Talend Help Center (https://help.talend.com).
Assume that, in a real-life project, a company has customer information gathered from different
sources. To reconciliate the customer data, you can design a match rule so that customer records
with similar first names and last names are matched and merged into consolidated customer records
(golden records), which are then stored in the master database of the company.

Integrated Matching in Talend MDM


Integrated matching groups together similar records and creates a "golden" record which is a
consolidated version of all records in the group.
Integrated matching in Talend MDM enables you to deduplicate or reconciliate data by doing the
following:
• Identify similar records and select the values to survive. This is known as the match and
survivorship process.
• Build golden records.
• Create the merging tasks for records that need human intervention to be merged, and list them
automatically in Talend Data Stewardship.
• Handle the merging tasks in Talend Data Stewardship, for example, change the survived values or
remove extra records.
• Run the validation process again in the staging area.
Integrated matching only deals with records within the staging area, which is actually a mirror of the
SQL storage used to store master data records.
Once the staging area has data records waiting for validation, MDM users can run the validation
process to transfer data from the "staging area" to the "master database".
Integrated matching in MDM differs from data matching with Jobs: it is a native, fully integrated
feature, and all the matching logic is owned by MDM rather than spread across various Jobs.

Prerequisites
Before you begin this scenario, make sure:
• You have configured the integration of Talend Data Stewardship with Talend MDM. For more
information, see Configuring the integration of Talend Data Stewardship with Talend MDM.
• You have created an empty project in Talend Studio.

• You have established a connection to an MDM server in Talend Studio.


Pay attention to the following:
• The integrated matching feature supports only primary keys of the type string.
• The integrated matching feature does not support matching on entities with composite primary
keys.
• The mapping between Talend MDM and Talend Data Stewardship applies only to simple type
elements in an MDM entity.
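
The restrictions above can be expressed as a small pre-flight check. The sketch below is only an illustration: the Entity and Element classes and the SIMPLE_TYPES set are hypothetical stand-ins for an MDM entity definition, not part of any Talend API.

```python
from dataclasses import dataclass

# Hypothetical, partial set of simple types used only for this illustration.
SIMPLE_TYPES = {"string", "int", "boolean", "date", "decimal"}


@dataclass
class Element:
    name: str
    type: str


@dataclass
class Entity:
    name: str
    key_fields: list   # names of the elements that form the primary key
    elements: list     # list of Element


def matching_issues(entity):
    """Return the reasons why an entity would not fit integrated matching."""
    issues = []
    types = {e.name: e.type for e in entity.elements}
    if len(entity.key_fields) > 1:
        issues.append("composite primary keys are not supported")
    for key in entity.key_fields:
        if types.get(key) != "string":
            issues.append(f"primary key '{key}' is not of type string")
    return issues


def mappable_elements(entity):
    """Names of the simple type elements, the only ones covered by the mapping."""
    return [e.name for e in entity.elements if e.type in SIMPLE_TYPES]


customer = Entity(
    name="Customer",
    key_fields=["id"],
    elements=[Element("id", "string"), Element("fname", "string"),
              Element("address", "AddressType")],  # AddressType: hypothetical complex type
)
print(matching_issues(customer))
print(mappable_elements(customer))
```

An empty list from matching_issues means that none of the listed restrictions is violated in this simplified model.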

Configuring the integration of Talend Data Stewardship with Talend MDM

Before you can use Talend Data Stewardship to perform integrated matching tasks, you need to
configure the integration of Talend Data Stewardship with MDM.

Procedure
1. If the MDM server is up and running, stop it first. Otherwise, skip this step.
2. Browse to the file <$INSTALLDIR>/conf/mdm.conf and open it.
INSTALLDIR indicates the path where the MDM server has been installed.
3. Locate the properties related to Talend Data Stewardship, configure the URL, username and
password to access Talend Data Stewardship respectively, and leave the other options unchanged.

# TDS settings
######################################################
tds.root.url=http://localhost:19999
tds.user=[email protected]
tds.password=owner1
tds.core.url=/data-stewardship
tds.schema.url=/schemaservice
tds.api.version=/api/v1
tds.batchsize=50

The value of the property tds.user must be a valid username of a Talend Administration Center
user who also serves as a Data Stewardship User and has the Campaign Owner data stewardship
role.
The password is encrypted during the MDM server startup.
For more information about creating users, see Creating Data Stewardship users.
For more information about the properties, see Talend Installation Guide.
4. Save your changes and restart the MDM server so that your updates take effect.
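
Before restarting the server, it can be useful to verify that the three settings edited in step 3 are present. The following is a minimal sketch that parses key=value lines; the sample text is an inline copy for illustration, the tds.user value is a hypothetical placeholder, and the helper names are not part of any Talend tool. After the first startup the password value is stored in encrypted form, so this check only looks for presence.

```python
REQUIRED = ("tds.root.url", "tds.user", "tds.password")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_tds_settings(props):
    """Return the required Talend Data Stewardship keys that are absent or empty."""
    return [key for key in REQUIRED if not props.get(key)]


# Inline sample; in practice the text would be read from <$INSTALLDIR>/conf/mdm.conf.
SAMPLE = """\
# TDS settings
tds.root.url=http://localhost:19999
tds.user=steward@example.com
tds.password=owner1
tds.core.url=/data-stewardship
tds.schema.url=/schemaservice
tds.api.version=/api/v1
tds.batchsize=50
"""

props = parse_properties(SAMPLE)
print(missing_tds_settings(props))
```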

Setting up a project in Talend Studio to create golden records using integrated matching

In this scenario, preparing a simple project involves working with data models, data containers, views
and match rules.

Building a simple project in Talend Studio


The beginning of any MDM project involves setting up a data model, and creating business entities in
this data model.

Procedure
1. In Talend Studio, create a data model Customer and its corresponding data container
Customer.
For more information about how to create a data model, see the Talend Studio User Guide on
Talend Help Center (https://help.talend.com).
2. Add a new entity Customer in the Customer data model.
3. Add elements to the Customer entity.
In this example, elements that describe a customer are added, such as lname, fname, city, and
gender.
4. Create a view Customer for the Customer entity, which you can use to interact with the data
from Talend MDM Web UI.
Next, you need to create and define a match rule, attach the match rule to the data model, and
then verify that the match rule works well from Talend MDM Web UI.

Creating and defining a match rule


In this scenario, you need to create and define a match rule MatchCustomer to match the staging
data records that belong to the Customer entity based on the fname and lname fields.
In MDM, match rules are used to decide whether two or more data records match, and how to handle
them if they do.

Procedure
1. In the MDM Repository tree view, right-click Match Rule and then select New from the contextual
menu.
2. In the dialog box that opens, define a name for the new match rule.
If needed, enter information in the Purpose and Description fields to better describe your match
rule.
3. Click Finish to close the dialog box.
The newly created match rule is displayed under the Match Rule node. You need to further define
the characteristics of the match rule in the Match Rule Editor that opens.
4. In the Record linkage algorithm section, select T-Swoosh.
You can use the T-Swoosh algorithm to find duplicates and to define how two similar records are
merged to create a master record, using a survivorship function.
5. In the Match and Survivor section, define the criteria to use when matching staging data records.

In this example, add two match keys Firstname and Lastname, select Jaro-Winkler as the
matching function, set both thresholds to 0.8, and select Longest (for strings) as the survivorship
function.
6. In the Default Survivorship Rules section, define how to survive matches for certain data types:
Boolean, Number and Date.
If you do not specify the behavior for any or all data types, the default behavior is applied.
Once you define the match rule, you must attach it to a specific entity of a data model.
You cannot deploy a match rule directly to the MDM server. Rather, match rules are deployed
along with the data model to which they are attached.
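
To make the rule settings above more concrete, the sketch below computes Jaro-Winkler similarity, applies a 0.8 threshold to the fname and lname fields, and keeps the longest string as the surviving value. It is a simplified greedy match-and-merge loop written for illustration only; it is not Talend's T-Swoosh implementation, and it assumes that each match key must reach its threshold individually and that the Longest function is applied to every field.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)


THRESHOLD = 0.8
MATCH_KEYS = ("fname", "lname")


def is_match(a, b):
    """Assumption: every match key must reach the threshold on its own."""
    return all(jaro_winkler(a[k].lower(), b[k].lower()) >= THRESHOLD for k in MATCH_KEYS)


def merge(a, b):
    """Survivorship 'Longest': keep the longer string for each field."""
    return {k: max([a[k], b[k]], key=len) for k in a}


def build_golden_records(records):
    """Greedy grouping: merge each record into the first group it matches."""
    golden = []  # list of (merged record, number of source records)
    for rec in records:
        for i, (merged, size) in enumerate(golden):
            if is_match(merged, rec):
                golden[i] = (merge(merged, rec), size + 1)
                break
        else:
            golden.append((dict(rec), 1))
    return golden


staging = [
    {"fname": "Jon", "lname": "Smith", "city": "Paris"},
    {"fname": "John", "lname": "Smith", "city": "Paris"},
    {"fname": "Mary", "lname": "Jones", "city": "Lyon"},
]
for record, size in build_golden_records(staging):
    print(size, record)
```

In this sample data, "Jon Smith" and "John Smith" fall into one group whose surviving first name is "John", while "Mary Jones" stays in a group of its own.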

Attaching the match rule to the data model


You need to attach the match rule MatchCustomer to the specific entity Customer of the data
model Customer.

Procedure
1. In the MDM Repository tree view, open the data model Customer to which you want to attach
the match rule MatchCustomer.
2. Select the entity Customer to which you want to attach the match rule and, in the Properties
view, open the Rules tab.
3. In the Match Rule section, select the match rule you want to attach to this data model from the
drop-down list.
If needed, click Open Match Rule to open and view the details of the match rule.
4. In the table, map each match key to the corresponding simple type element at the root level in
the entity using the selection window.

In this example, the match key Firstname is mapped to Customer/fname, and the match key
Lastname is mapped to Customer/lname.
5. Save your changes.

Results
Now you need to deploy the data model Customer to the MDM server to take into account the
changes you have made, so that the match rule is deployed together with the data model to which it
is attached.

Deploying the data model to the MDM server


You must always deploy your data model to the MDM server for any changes you made to be taken
into account at runtime.

Procedure
1. In the MDM Repository tree view, right-click the data model you want to deploy, and select
Deploy To....
2. In the dialog box that opens, click Add Dependencies and click Continue after adding the
dependencies.
3. In the Select a server location definition dialog box, select the server where you want to deploy
the data model, and then click OK.
A dialog box pops up, indicating that the deployment is successful.
4. Click OK to complete deploying the data model.

In this example, since Talend MDM is configured with Talend Data Stewardship, when you deploy
the data model Customer to an MDM server, the following changes are propagated to Talend
Data Stewardship automatically:
• A new data stewardship data model named customer - customer - tmdm is created with
attributes that correspond to each simple type element in the Customer entity;
• A new merging campaign named customer - customer - tmdm is created using the new data
stewardship data model.
Meanwhile, MDM data types are mapped into Talend Data Stewardship data types, and MDM
element constraints (if any) are translated into Talend Data Stewardship attribute constraints. For
more information, see Mapping tables between Talend MDM and Talend Data Stewardship.
For more information about how the changes of an MDM data model are propagated to Talend
Data Stewardship upon its deployment to an MDM server, see Talend Studio User Guide.
You can also create the relevant campaign(s) and data stewardship data model(s) for an MDM
data model through the REST API. For more information, see Talend Help Center
(https://help.talend.com).

Results
You have now finished preparing your simple MDM project in Talend Studio.

Working with the data in the Talend MDM Web UI


Once you have prepared your simple project in Talend Studio, you can start to interact with the data
in Talend MDM Web UI, as your business users would do in a real-life project.

Accessing the Talend MDM Web UI


Do the following to access the Talend MDM Web UI.

Procedure
1. In a web browser, enter the URL for your MDM Server.
For example, http://localhost:8180/talendmdm/ui.
2. On the authentication page, enter the default administrator user name and password,
administrator/administrator, and then click the Login button.
The Welcome Page opens. If your MDM role includes a data stewardship role, alerts about newly-
assigned Talend Data Stewardship tasks will be shown on the Welcome page, since you have
configured Talend Data Stewardship with Talend MDM. Otherwise, the alerts say that no tasks are
assigned, which is the case in this example.
3. In the Domain Configuration area, select the data container and data model you want to work
with, Customer in this example, and click Save.

Validating data and creating tasks in Talend Data Stewardship


Once you have defined a match rule and then attached the match rule to a specific entity of a data
model in Talend Studio, you can check whether the match rule works well from Talend MDM Web UI.
When you run the validation task on the Customer data records in the staging area:
• Similar staging data records that belong to the same Customer entity are matched according to
the criteria you have defined in the match rule.
• If two or more data records are similar enough, they are merged into a golden record.
• A task is created in Talend Data Stewardship for each golden record whose status is not 205. The
status 205 indicates that the record successfully passed the MDM validation phase and also exists
in the master database.
However, the Talend Data Stewardship task is not created if the golden record is built by a group
with only one staging data record.
• The validated golden record whose status is 205 is written into the master database.
For more information about the integrated matching process, see the Talend MDM Web UI User Guide.
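
The outcome rules listed above can be summarized as two small predicates. This is a minimal sketch of the decision logic as described in this section, not MDM code; the function names are hypothetical, and OTHER is only a placeholder for any status other than 205.

```python
VALIDATED = 205  # record passed MDM validation and exists in the master database
OTHER = 0        # placeholder for any status other than 205


def needs_stewardship_task(golden_status, group_size):
    """A task is created for a non-205 golden record built from more than one staging record."""
    return golden_status != VALIDATED and group_size > 1


def written_to_master(golden_status):
    """Only golden records with status 205 are written into the master database."""
    return golden_status == VALIDATED


print(needs_stewardship_task(OTHER, 2))      # group of two records, not validated
print(needs_stewardship_task(OTHER, 1))      # single-record group
print(needs_stewardship_task(VALIDATED, 3))  # validated golden record
```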

Before you begin


• You have been assigned an MDM role with the appropriate user authorization and access rights.
• The staging Customer data container has already been populated with customer data collected
from different sources.
• You have been granted access rights to the merging tasks.
You can also find more information about credentials in Talend Data Stewardship on Talend Help
Center (https://help.talend.com).

Running the match rule from Talend MDM Web UI

Procedure
1. In the Menu panel, click Browse > Staging Data Browser to open the Staging Data Browser page.
2. Select the entity Customer from the list, and click Search.
In this example, the Customer data records in the staging area are listed.
3. In the Menu panel, click Govern > Staging Area to open the Staging Area page.
The details about the records that may require validation are displayed in the Status area.
4. Click the Start Validation button to start the validation process.

Checking match results and match details

Procedure
1. Go back to the Staging Data Browser page, and click Search to refresh the staging data records
of the Customer entity. You can then check the results of the match and survivorship
process.

2. Click one newly generated golden record to view its details.

If the status of a golden record is 205, no Talend Data Stewardship task is created. In this
case, the Open Task option is not available for the golden record or its source staging record(s).
3. Click More... > Match Plan to open the Match Plan dialog box.
You can check how two similar staging data records are matched and merged according to the
match rule.
4. Click More... > Open Task to go to the associated Talend Data Stewardship task page.

After one or more tasks are resolved, you have to run the Staging validation again to take into
account the record status changes and the record value updates (if any).
For more information about the integrated matching tasks, see the Talend Data Stewardship
documentation on Talend Help Center (https://help.talend.com).
5. In the Menu panel, click Govern > Data Stewardship to check all the unassigned tasks listed in
Talend Data Stewardship.

6. In the Menu panel, click Browse > Master Data Browser to open the Master Data Browser page.
7. Select the entity Customer from the list, and click Search.
You can see that there are new master Customer data records that are the result of merging
similar data records according to the match rule attached to the Customer entity.
If a master data record comes from a golden record, any updates made on the master data record
will be synchronized to the golden record in the staging area automatically only if the current
status of the golden record is still 205. However, the associated task will remain unchanged.

Data transfer between Talend MDM and Talend Data Stewardship

Integrated matching processes involve both Talend MDM and Talend Data Stewardship. Once an MDM
user validates data records in Talend MDM Web UI, tasks are created in Talend Data Stewardship to
list duplicate records which are not merged automatically.

When working with integrated matching, any modification or validation of data done on one side
triggers changes to the data on the other side as follows:
• Every time an MDM user starts a validation process on a record in the Staging Area and if the
record matches an existing group:
• The record is added to the group in the Staging Area,
• The task is reopened in Talend Data Stewardship, if it is already closed, and the new duplicate
record is added to the task.
• Every time an MDM user updates a record field in Talend MDM Web UI, updates are not
automatically propagated to Talend Data Stewardship. The changes are propagated only if the
match rule is applied and the group turns to "suspect", when a new record is integrated for
example.
• Every time an MDM user redeploys a data model on MDM, updates are automatically propagated
to the data model and campaign in Talend Data Stewardship.
• Every time a data steward updates a field in a master record in a task in Talend Data Stewardship,
updates are propagated automatically to the master record in Talend MDM Web UI when an MDM
user starts the validation process on data records in the Staging Area.

Mapping tables between Talend MDM and Talend Data Stewardship

This section describes the mapping tables between Talend MDM and Talend Data Stewardship.
Upon the deployment of a data model with one or more match rules attached to it, the changes to the
MDM data model are propagated to Talend Data Stewardship automatically. Meanwhile, MDM data
types are mapped into Talend Data Stewardship data types, and MDM element constraints (if any) are
translated into Talend Data Stewardship attribute constraints.
The table below shows how Talend MDM data types are mapped into Talend Data Stewardship data
types:

MDM data type          Talend Data Stewardship data type

boolean                boolean
date                   date
time                   time
dateTime               timestamp
int                    integer
integer                integer
short                  integer
long                   integer
base64Binary           integer
byte                   integer
negativeInteger        integer
nonNegativeInteger     integer
positiveInteger        integer
nonPositiveInteger     integer
unsignedByte           integer
unsignedInt            integer
unsignedLong           integer
unsignedShort          integer
decimal                decimal
double                 decimal
float                  decimal
string                 text
anyURI                 text
normalizedtext         text
tokentext              text
language               text
hexBinary              text
duration               text
AUTO_INCREMENT         text
MULTILINGUAL           text
PICTURE                text
URL                    text
UUID                   text
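
For scripting purposes, the table above can be captured as a lookup. The sketch below only transcribes the table into a Python dictionary; the names _GROUPS, MDM_TO_TDS and tds_type are hypothetical and do not belong to any Talend API.

```python
# Target Talend Data Stewardship type -> MDM types that map to it (from the table above).
_GROUPS = {
    "boolean": ["boolean"],
    "date": ["date"],
    "time": ["time"],
    "timestamp": ["dateTime"],
    "integer": [
        "int", "integer", "short", "long", "base64Binary", "byte",
        "negativeInteger", "nonNegativeInteger", "positiveInteger",
        "nonPositiveInteger", "unsignedByte", "unsignedInt",
        "unsignedLong", "unsignedShort",
    ],
    "decimal": ["decimal", "double", "float"],
    "text": [
        "string", "anyURI", "normalizedtext", "tokentext", "language",
        "hexBinary", "duration", "AUTO_INCREMENT", "MULTILINGUAL",
        "PICTURE", "URL", "UUID",
    ],
}

# Flatten into a direct MDM type -> Talend Data Stewardship type lookup.
MDM_TO_TDS = {mdm: tds for tds, names in _GROUPS.items() for mdm in names}


def tds_type(mdm_type):
    """Return the mapped type, or raise KeyError for a type the table does not list."""
    return MDM_TO_TDS[mdm_type]


print(tds_type("dateTime"))
print(tds_type("float"))
```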

In an MDM data model, an entity may have elements with constraints. Upon deployment of the MDM
data model, those constraints will be translated into the corresponding Talend Data Stewardship
attribute constraints based on their mapped Talend Data Stewardship data types in the data
stewardship data model. However, some MDM element constraints may be ignored because they have
no corresponding Talend Data Stewardship attribute constraints.
The table below shows how MDM element constraints are translated into constraints supported by
Talend Data Stewardship attributes:

Talend Data Stewardship   MDM element       Talend Data Stewardship
data type                 constraint        attribute constraint

text                      minLength         minLengthText
text                      maxLength         maxLengthText
text                      length            minLengthText and maxLengthText
text                      pattern           patternText
text                      enumeration       allowedValues
integer                   minInclusive      minInteger
integer                   maxInclusive      maxInteger
decimal                   minInclusive      minDecimal
decimal                   maxInclusive      maxDecimal
decimal                   fractionDigits    scaleDecimal
date                      minInclusive      minDate
date                      maxInclusive      maxDate
time                      minInclusive      minTime
time                      maxInclusive      maxTime
datetime                  minInclusive      minDatetime
datetime                  maxInclusive      maxDatetime
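
The constraint table can be read as a two-level lookup: first by the mapped Talend Data Stewardship data type, then by the MDM constraint name. The sketch below transcribes the table and drops any constraint that has no counterpart, as described above. The function name is hypothetical, and whiteSpace is used only as an example of a constraint that the table does not list.

```python
# Talend Data Stewardship data type -> {MDM constraint -> attribute constraint(s)}.
CONSTRAINTS = {
    "text": {
        "minLength": ["minLengthText"],
        "maxLength": ["maxLengthText"],
        "length": ["minLengthText", "maxLengthText"],
        "pattern": ["patternText"],
        "enumeration": ["allowedValues"],
    },
    "integer": {"minInclusive": ["minInteger"], "maxInclusive": ["maxInteger"]},
    "decimal": {
        "minInclusive": ["minDecimal"],
        "maxInclusive": ["maxDecimal"],
        "fractionDigits": ["scaleDecimal"],
    },
    "date": {"minInclusive": ["minDate"], "maxInclusive": ["maxDate"]},
    "time": {"minInclusive": ["minTime"], "maxInclusive": ["maxTime"]},
    "datetime": {"minInclusive": ["minDatetime"], "maxInclusive": ["maxDatetime"]},
}


def translate_constraints(tds_data_type, mdm_constraints):
    """Translate MDM constraints; constraints without a counterpart are ignored."""
    table = CONSTRAINTS.get(tds_data_type, {})
    translated = {}
    for name, value in mdm_constraints.items():
        for target in table.get(name, []):
            translated[target] = value
    return translated


# A fixed length becomes both a minimum and a maximum length; whiteSpace is dropped.
print(translate_constraints("text", {"length": 5, "whiteSpace": "collapse"}))
print(translate_constraints("decimal", {"fractionDigits": 2, "minInclusive": 0}))
```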
