Talend Examples MDM EN 7.2.1M6
Examples
7.2.1M6
Contents
Copyright
Copyright
Adapted for 7.2.1M6. Supersedes previous releases.
Publication date: May 23, 2019
Copyright © 2019 Talend. All rights reserved.
The content of this document is correct at the time of publication.
However, more recent updates may be available in the online version that can be found on Talend
Help Center.
Notices
Talend is a trademark of Talend, Inc.
All brands, product names, company names, trademarks and service marks are the properties of their
respective owners.
End User License Agreement
The software described in this documentation is provided under Talend's End User Software and
Subscription Agreement ("Agreement") for commercial products. By using the software, you are
considered to have fully understood and unconditionally accepted all the terms and conditions of the
Agreement.
To read the Agreement now, visit http://www.talend.com/legal-terms/us-eula?utm_medium=help&utm_source=help_content
Theory into practice: Using Integrated Matching to reconciliate customer data
Prerequisites
Before you begin this scenario, make sure that:
• You have configured the integration of Talend Data Stewardship with Talend MDM. For more
information, see Configuring the integration of Talend Data Stewardship with Talend MDM on
page 5.
• You have created an empty project in Talend Studio.
Procedure
1. If the MDM server is up and running, stop it first. Otherwise, skip this step.
2. Browse to the file <$INSTALLDIR>/conf/mdm.conf and open it.
<$INSTALLDIR> is the path where the MDM server is installed.
3. Locate the Talend Data Stewardship properties, set the URL, username, and password used to access
Talend Data Stewardship, and leave the other options unchanged.
# TDS settings
######################################################
tds.root.url=http://localhost:19999
tds.user=<Talend Data Stewardship username>
tds.password=owner1
tds.core.url=/data-stewardship
tds.schema.url=/schemaservice
tds.api.version=/api/v1
tds.batchsize=50
The tds.user property must be set to the username of a Talend Administration Center user who is
also a Data Stewardship user with the Campaign Owner data stewardship role.
The password is encrypted when the MDM server starts.
For more information about creating users, see Creating Data Stewardship users.
For more information about the properties, see Talend Installation Guide.
4. Save your changes and restart the MDM server so that your updates take effect.
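If you want to sanity-check the edited file before restarting the server, the following Python sketch parses key=value lines and reports which of the properties set in step 3 are missing or empty. It assumes that mdm.conf uses plain key=value lines with # comments; the helper names parse_conf and missing_tds_settings and the sample username owner1@example.com are hypothetical and not part of Talend MDM.

```python
# Minimal sketch (assumption: mdm.conf uses simple "key=value" lines and "#" comments).
# Checks that the TDS properties edited in step 3 are present and non-empty.

REQUIRED_TDS_KEYS = ("tds.root.url", "tds.user", "tds.password")


def parse_conf(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first "=" only
        props[key.strip()] = value.strip()
    return props


def missing_tds_settings(props: dict[str, str]) -> list[str]:
    """Return the required TDS keys that are absent or empty."""
    return [key for key in REQUIRED_TDS_KEYS if not props.get(key)]


# Usage with a hypothetical excerpt of the TDS settings:
sample = """
# TDS settings
######################################################
tds.root.url=http://localhost:19999
tds.user=owner1@example.com
tds.password=owner1
tds.core.url=/data-stewardship
tds.batchsize=50
"""
props = parse_conf(sample)
print(missing_tds_settings(props))  # []
print(props["tds.core.url"])        # /data-stewardship
```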
Procedure
1. In Talend Studio, create a data model Customer and its corresponding data container
Customer.
For more information about how to create a data model, see the Talend Studio User Guide on
Talend Help Center (https://help.talend.com).
2. Add a new entity Customer to the Customer data model.
3. Add elements to the Customer entity.
In this example, elements describing a customer are added, such as lname, fname, city, and
gender.
4. Create a view Customer for the Customer entity, which you can use to interact with the data
from Talend MDM Web UI.
Next, create and define a match rule, attach it to the data model, and then verify from Talend
MDM Web UI that the match rule works as expected.
Procedure
1. In the MDM Repository tree view, right-click Match Rule and then select New from the contextual
menu.
2. In the dialog box that opens, define a name for the new match rule.
If needed, enter information in the Purpose and Description fields to better describe your match
rule.
3. Click Finish to close the dialog box.
The newly created match rule is displayed under the Match Rule node. You need to further define
the characteristics of the match rule in the Match Rule Editor that opens.
4. In the Record linkage algorithm section, select T-Swoosh.
You can use the T-Swoosh algorithm to find duplicates and to define how two similar records are
merged to create a master record, using a survivorship function.
5. In the Match and Survivor section, define the criteria to use when matching staging data records.
In this example, add two match keys Firstname and Lastname, select Jaro-Winkler as the
matching function, set both thresholds to 0.8, and select Longest (for strings) as the survivorship
function.
6. In the Default Survivorship Rules section, define the survivorship behavior for the Boolean,
Number, and Date data types.
If you do not specify the behavior for any or all data types, the default behavior is applied.
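To make the match and survivorship settings of steps 4 and 5 more concrete, the following Python sketch computes Jaro-Winkler similarity, treats two records as a match when both match keys score at least 0.8, and keeps the longest string for each key. It is a simplified, single-pass illustration, not Talend's T-Swoosh implementation. Reading "both thresholds" as a per-key threshold plus a threshold on the average key score is an assumption, as are the element names fname and lname and all function names.

```python
# Simplified illustration of the example match rule. NOT Talend's T-Swoosh code.
# Assumption: "both thresholds" = a per-key threshold and a threshold on the
# average key score, both set to 0.8.

MATCH_FIELDS = ("fname", "lname")  # elements assumed to back Firstname / Lastname
KEY_THRESHOLD = 0.8
MATCH_THRESHOLD = 0.8


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = transposed = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                transposed += 1
            k += 1
    t = transposed / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted by the length of the common prefix (at most 4 chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def is_match(a: dict, b: dict) -> bool:
    """Case-insensitive comparison of every match key against both thresholds."""
    scores = [jaro_winkler(a[f].lower(), b[f].lower()) for f in MATCH_FIELDS]
    return all(s >= KEY_THRESHOLD for s in scores) and sum(scores) / len(scores) >= MATCH_THRESHOLD


def survive_longest(a: dict, b: dict) -> dict:
    """Merge two records, keeping the longest value of each match key."""
    merged = dict(a)
    for f in MATCH_FIELDS:
        if len(b[f]) > len(a[f]):
            merged[f] = b[f]
    return merged


def group_records(records: list) -> list:
    """Single pass: merge each record into the first golden record it matches."""
    golden: list = []
    for rec in records:
        for i, g in enumerate(golden):
            if is_match(g, rec):
                golden[i] = survive_longest(g, rec)
                break
        else:
            golden.append(dict(rec))
    return golden


customers = [
    {"fname": "Jon", "lname": "Smith"},
    {"fname": "John", "lname": "Smith"},
    {"fname": "Mary", "lname": "Jones"},
]
print(group_records(customers))
# [{'fname': 'John', 'lname': 'Smith'}, {'fname': 'Mary', 'lname': 'Jones'}]
```

Comparing each new record against the already-merged golden records, rather than against the raw inputs, loosely mirrors the idea of merging as you match; a real T-Swoosh run also re-examines merged records against each other, which this sketch omits.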
Once you define the match rule, you must attach it to a specific entity of a data model.
You cannot deploy a match rule directly to the MDM server. Rather, match rules are deployed
along with the data model to which they are attached.
Procedure
1. In the MDM Repository tree view, open the data model Customer to which you want to attach
the match rule MatchCustomer.
2. Select the entity Customer to which you want to attach the match rule and, in the Properties
view, open the Rules tab.
3. In the Match Rule section, select from the drop-down list the match rule you want to attach to
this entity.
If needed, click Open Match Rule to open and view the details of the match rule.
4. In the table, map each match key to the corresponding simple type element at the root level in
the entity using the selection window.
In this example, the match key Firstname is mapped to Customer/fname, and the match key
Lastname is mapped to Customer/lname.
5. Save your changes.
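The key-to-element mapping can be pictured as a lookup from each match key to an element directly under the entity root. The following Python sketch reads the mapped values from a single record, assuming the record is serialized as XML with a <Customer> root element; that record layout and the names MATCH_KEY_PATHS and match_key_values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Match key -> element path, as mapped in the Rules tab in this example.
MATCH_KEY_PATHS = {"Firstname": "Customer/fname", "Lastname": "Customer/lname"}


def match_key_values(record_xml: str) -> dict[str, Optional[str]]:
    """Return the value of each match key's element (None if it is absent)."""
    root = ET.fromstring(record_xml)
    values: dict[str, Optional[str]] = {}
    for key, path in MATCH_KEY_PATHS.items():
        entity, _, element = path.partition("/")
        if root.tag != entity:
            raise ValueError(f"expected a <{entity}> record, got <{root.tag}>")
        node = root.find(element)  # direct child only: keys map to root-level elements
        values[key] = node.text if node is not None else None
    return values


record = "<Customer><fname>John</fname><lname>Smith</lname><city>Paris</city></Customer>"
print(match_key_values(record))  # {'Firstname': 'John', 'Lastname': 'Smith'}
```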
Results
Now deploy the Customer data model to the MDM server so that your changes take effect. The match
rule is deployed together with the data model to which it is attached.
Procedure
1. In the MDM Repository tree view, right-click the data model you want to deploy, and select
Deploy To....
2. In the dialog box that opens, click Add Dependencies and click Continue after adding the
dependencies.
3. In the Select a server location definition dialog box, select the server where you want to deploy
the data model, and then click OK.
A dialog box opens, indicating that the deployment succeeded.
4. Click OK to complete deploying the data model.
In this example, because Talend MDM is integrated with Talend Data Stewardship, deploying the
Customer data model to an MDM server automatically propagates the following changes to Talend
Data Stewardship:
• A new data stewardship data model named customer - customer - tmdm is created with
attributes that correspond to each simple type element in the Customer entity;
• A new merging campaign named customer - customer - tmdm is created using the new data
stewardship data model.
Meanwhile, MDM data types are mapped into Talend Data Stewardship data types, and MDM
element constraints (if any) are translated into Talend Data Stewardship attribute constraints. For
more information, see Mapping tables between Talend MDM and Talend Data Stewardship on
page 13.
For more information about how the changes of an MDM data model are propagated to Talend
Data Stewardship upon its deployment to an MDM server, see Talend Studio User Guide.
You can also create the relevant campaign(s) and data stewardship data model(s) for an MDM
data model through the REST API. For more information, see Talend Help Center
(https://help.talend.com).
Results
You have now finished preparing your simple MDM project in Talend Studio.
Procedure
1. In a web browser, enter the URL for your MDM Server.
For example, http://localhost:8180/talendmdm/ui.
2. On the authentication page, enter the default administrator user name and password,
administrator/administrator, and then click the Login button.
The Welcome page opens. Because Talend Data Stewardship is integrated with Talend MDM, if your MDM
role includes a data stewardship role, the Welcome page shows alerts about newly assigned Talend
Data Stewardship tasks. Otherwise, the alerts state that no tasks are assigned, which is the case
in this example.
3. In the Domain Configuration area, select the data container and data model you want to work
with, Customer in this example, and click Save.
Procedure
1. In the Menu panel, click Browse > Staging Data Browser to open the Staging Data Browser page.
2. Select the entity Customer from the list, and click Search.
In this example, the Customer data records in the staging area are listed.
3. In the Menu panel, click Govern > Staging Area to open the Staging Area page.
The details about the records that may require validation are displayed in the Status area.
4. Click the Start Validation button to start the validation process.
Procedure
1. Go back to the Staging Data Browser page, and click Search to refresh the staging data records
of the Customer entity so that you can check the results of the match and survivorship process.
If the status of a golden record is 205, no Talend Data Stewardship task is created, so this
option is not available for the golden record or its source staging record(s).
3. Click More... > Match Plan to open the Match Plan dialog box.
You can check how two similar staging data records are matched and merged according to the
match rule.
4. Click More... > Open Task to go to the associated Talend Data Stewardship task page.
After one or more tasks are resolved, run the staging validation again so that record status
changes and record value updates (if any) are taken into account.
For more information about integrated matching tasks, see the Talend Data Stewardship
documentation on Talend Help Center (https://help.talend.com).
5. In the Menu panel, click Govern > Data Stewardship to check all the unassigned tasks listed in
Talend Data Stewardship.
6. In the Menu panel, click Browse > Master Data Browser to open the Master Data Browser page.
7. Select the entity Customer from the list, and click Search.
You can see that there are new master Customer data records that are the result of merging
similar data records according to the match rule attached to the Customer entity.
If a master data record comes from a golden record, updates made to the master data record are
synchronized automatically to the golden record in the staging area only if the golden record's
status is still 205. The associated task, however, remains unchanged.
When you work with integrated matching, modifying or validating data on one side triggers
changes to the data on the other side as follows:
• Every time an MDM user starts a validation process on a record in the Staging Area and the
record matches an existing group:
  • The record is added to the group in the Staging Area.
  • The task in Talend Data Stewardship is reopened if it is already closed, and the new duplicate
    record is added to the task.
• Every time an MDM user updates a record field in Talend MDM Web UI, the update is not
automatically propagated to Talend Data Stewardship. The change is propagated only when the match
rule is applied and the group becomes "suspect", for example when a new record is integrated.
• Every time an MDM user redeploys a data model on MDM, updates are automatically propagated
to the data model and campaign in Talend Data Stewardship.
• Every time a data steward updates a field in a master record in a task in Talend Data Stewardship,
updates are propagated automatically to the master record in Talend MDM Web UI when an MDM
user starts the validation process on data records in the Staging Area.
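The first rule in the list above (validating a record that matches an existing group) can be modeled as follows. This Python sketch is a hypothetical in-memory model: the Group and Task classes and the on_staging_validation function are illustrative names, and the matches argument stands in for whatever match rule is attached to the entity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    # Hypothetical stand-in for a Talend Data Stewardship merging task.
    closed: bool = False
    records: list = field(default_factory=list)


@dataclass
class Group:
    # Hypothetical stand-in for a group of matching records in the Staging Area.
    records: list
    task: Task


def on_staging_validation(
    record: dict,
    groups: list,
    matches: Callable[[dict, dict], bool],
) -> Optional[Group]:
    """Add a validated record to the first group it matches.

    A closed task is reopened, and the record is added to the task as a new
    duplicate. Returns the matched group, or None when the record matches no
    existing group (that case is not modeled here).
    """
    for group in groups:
        if any(matches(record, existing) for existing in group.records):
            group.records.append(record)
            if group.task.closed:
                group.task.closed = False  # reopen the task
            group.task.records.append(record)
            return group
    return None


same_lname = lambda a, b: a["lname"] == b["lname"]  # toy matcher for the example
group = Group(records=[{"lname": "Smith"}], task=Task(closed=True, records=[{"lname": "Smith"}]))
on_staging_validation({"fname": "Jo", "lname": "Smith"}, [group], same_lname)
print(group.task.closed, len(group.task.records))  # False 2
```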
MDM data type       Talend Data Stewardship data type
boolean             boolean
date                date
time                time
dateTime            timestamp
int                 integer
integer             integer
short               integer
long                integer
base64Binary        integer
byte                integer
negativeInteger     integer
nonNegativeInteger  integer
positiveInteger     integer
nonPositiveInteger  integer
unsignedByte        integer
unsignedInt         integer
unsignedLong        integer
unsignedShort       integer
decimal             decimal
double              decimal
float               decimal
string              text
anyURI              text
normalizedtext      text
tokentext           text
language            text
hexBinary           text
duration            text
AUTO_INCREMENT      text
MULTILINGUAL        text
PICTURE             text
URL                 text
UUID                text
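If you need to predict the Talend Data Stewardship type of an element before deploying, the table can be expressed as a lookup. The following Python sketch copies the type names verbatim from the table; the names MDM_TO_TDS_TYPE and tds_type are illustrative, and raising an error for unlisted types is a choice made by this sketch, not documented product behavior.

```python
# MDM data type -> Talend Data Stewardship data type, copied from the table above.
_INTEGER_TYPES = (
    "int", "integer", "short", "long", "base64Binary", "byte",
    "negativeInteger", "nonNegativeInteger", "positiveInteger",
    "nonPositiveInteger", "unsignedByte", "unsignedInt",
    "unsignedLong", "unsignedShort",
)
_TEXT_TYPES = (
    "string", "anyURI", "normalizedtext", "tokentext", "language",
    "hexBinary", "duration", "AUTO_INCREMENT", "MULTILINGUAL",
    "PICTURE", "URL", "UUID",
)

MDM_TO_TDS_TYPE: dict[str, str] = {
    "boolean": "boolean",
    "date": "date",
    "time": "time",
    "dateTime": "timestamp",
    "decimal": "decimal",
    "double": "decimal",
    "float": "decimal",
    **{t: "integer" for t in _INTEGER_TYPES},
    **{t: "text" for t in _TEXT_TYPES},
}


def tds_type(mdm_type: str) -> str:
    """Return the TDS data type for an MDM data type listed in the table."""
    try:
        return MDM_TO_TDS_TYPE[mdm_type]
    except KeyError:
        raise ValueError(f"MDM type {mdm_type!r} is not listed in the mapping table") from None


print(tds_type("dateTime"))  # timestamp
print(tds_type("UUID"))      # text
```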
In an MDM data model, an entity may have elements with constraints. When the MDM data model is
deployed, those constraints are translated into the corresponding Talend Data Stewardship
attribute constraints, based on the Talend Data Stewardship data types to which the elements are
mapped in the data stewardship data model. Some MDM element constraints are ignored, however,
because they have no corresponding Talend Data Stewardship attribute constraint.
The table below shows how MDM element constraints are translated into constraints supported by
Talend Data Stewardship attributes:
Talend Data Stewardship data type  MDM element constraint  Talend Data Stewardship attribute constraint
text                               maxLength               maxLengthText
text                               pattern                 patternText
text                               enumeration             allowedValues
integer                            maxInclusive            maxInteger
decimal                            maxInclusive            maxDecimal
decimal                            fractionDigits          scaleDecimal
date                               maxInclusive            maxDate
time                               maxInclusive            maxTime
timestamp                          maxInclusive            maxDatetime
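The following Python sketch applies the constraint table to a set of element facets and reports which constraints would be ignored on deployment. Grouping the rows by Talend Data Stewardship data type is inferred from the attribute-constraint names (for example, maxInteger for integer attributes), and placing enumeration under text is an assumption; the names CONSTRAINT_MAP and translate_constraints are hypothetical.

```python
# (TDS data type, MDM element constraint) -> TDS attribute constraint.
# Assumption: the data-type grouping is inferred from the constraint names,
# and "enumeration" is assumed to apply to text attributes.
CONSTRAINT_MAP: dict[tuple[str, str], str] = {
    ("text", "maxLength"): "maxLengthText",
    ("text", "pattern"): "patternText",
    ("text", "enumeration"): "allowedValues",
    ("integer", "maxInclusive"): "maxInteger",
    ("decimal", "maxInclusive"): "maxDecimal",
    ("decimal", "fractionDigits"): "scaleDecimal",
    ("date", "maxInclusive"): "maxDate",
    ("time", "maxInclusive"): "maxTime",
    ("timestamp", "maxInclusive"): "maxDatetime",
}


def translate_constraints(data_type: str, facets: dict) -> tuple[dict, list]:
    """Split MDM facets into translated TDS constraints and ignored facets."""
    translated: dict = {}
    ignored: list = []
    for facet, value in facets.items():
        target = CONSTRAINT_MAP.get((data_type, facet))
        if target is None:
            ignored.append(facet)  # no TDS counterpart, so it is dropped
        else:
            translated[target] = value
    return translated, ignored


print(translate_constraints("text", {"maxLength": 50, "minLength": 1}))
# ({'maxLengthText': 50}, ['minLength'])
```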