ETL - ELT Using Anypoint Platform


ETL/ELT data integration using Anypoint Platform

Contents

Introduction
ETL/ELT architectural principles
Implementing ETL/ELT with Anypoint Platform
    How does Anypoint Platform implement ETL/ELT?
Why should organizations use Anypoint Platform to implement ETL/ELT?
    Enabling advanced data transformation and aggregation
    Providing out-of-the-box, pre-built connectivity to SaaS and on-premise systems
    Scheduling and implementing Change Data Capture (CDC)
    Maximizing the value of APIs in ETL/ELT implementations
    Implementing other advanced capabilities, such as events and streaming
Achieving ETL/ELT success with Anypoint Platform
Conclusion
About MuleSoft

Introduction

Modern-day organizations rely heavily on data-driven decision
making. Traditionally, they have accomplished this by creating
reports on top of historical data, through approaches such as
Extract Transform Load (ETL) or Extract Load Transform (ELT).

In both approaches, organizations need to extract data from
many data stores and transform it into a format that is ready
to consume for downstream analysis. Given the variety of data
stores and formats in enterprises, the architecture of systems
that could accomplish these objectives can quickly become
complex to build as well as costly to maintain.

In this whitepaper, we will first discuss the key architectural
principles within ETL and ELT, then cover how MuleSoft’s
Anypoint Platform can provide companies with the necessary
components to successfully realize this architecture.

ETL/ELT architectural principles

Extract Transform Load (ETL) and Extract Load Transform (ELT)
are very similar in terms of architectural principles. The main
difference is whether teams know the business requirements when
the process starts: in ETL they typically do, whereas in ELT
they typically do not. As a result, in ETL, teams extract
structured data and use analytics and dashboards to present the
information, whereas in ELT, teams extract both structured and
unstructured data, then present the information to the business.

Essentially, ETL and ELT provide organizations with the same
outcome: actionable insights from relevant data that they
want to analyze. Many organizations, however, are facing
setbacks because traditional approaches to implementing
ETL/ELT lead to a patchwork of legacy and modern systems,
abandoned code, and duplicate work.

As a result, some organizations are turning to a new solution
to implement these approaches: Anypoint Platform.

Implementing ETL/ELT with Anypoint Platform

MuleSoft’s Anypoint Platform is a unified platform that helps
teams implement a variety of connectivity patterns for ETL/ELT.
Anypoint Platform provides all the components necessary to
implement ETL/ELT processes. As a unified solution, Anypoint
Platform allows teams to use the same building blocks to
implement ETL/ELT processes in a consistent way and, in turn,
avoid error-prone processes.

How does Anypoint Platform implement ETL/ELT?


At a very high level, teams implement an ETL/ELT process in
Anypoint Platform as a batch job. This requires the batch
module, which combines a variety of message processors and
Anypoint Connectors (pre-built connectors for generic protocols,
transports, databases, and applications) in a logical sequence.
Teams can then deploy the batch application on Mule Runtime, a
lightweight integration engine.

The diagram below shows how the overall batch process works in
Anypoint Platform:

[Figure: batch job phases. Input (optional) -> Load and Dispatch (implicit) -> Process (required) -> On Complete (optional)]

The process is divided into four phases:


1. Input phase: The objective of this phase is to load the data that
will be processed by the batch job. This is typically achieved using
Anypoint Connectors, which are pre-built generic application
connectors. The data is then loaded onto a persistent disk so that
processing can continue if a failure occurs.

2. Load and dispatch phase: This phase is implicit (i.e. batch
process developers do not have access to it). The objective of
this phase is to divide the data loaded during the input phase into
individual records, then distribute them for further processing via
a persistent message queue.

3. Process phase: This phase consists of one or more batch steps.
A batch step is a sequence of message processors that process each
record individually, taking it from the persistent queue that was
filled during the load and dispatch phase. Steps in this phase
include transforming data before updating or inserting it into
another application, enriching data via API calls, etc.

4. Completion phase: This is the final phase, and it is typically used
to provide a summary report on the batch process. Teams can also
use this phase to perform post-processing steps, such as cleaning
up data, sending notifications to initiate new processes, etc.
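
The four phases above can be sketched in a few lines of Python. This is only a conceptual illustration, not Mule configuration or any Anypoint Platform API: the function `run_batch_job`, the two example steps, and the in-memory queue (standing in for the persistent queue) are all assumptions made for the example.

```python
from queue import Queue
from typing import Callable, Iterable, Sequence

Record = dict
Step = Callable[[Record], Record]


def run_batch_job(source: Iterable[Record], steps: Sequence[Step]) -> dict:
    """Run a toy batch job through the four phases (hypothetical helper)."""
    # Input phase: load the data the job will process.
    loaded = list(source)

    # Load and dispatch phase: split the input into individual records and
    # queue them (in memory here; the text describes a persistent queue).
    queue: Queue = Queue()
    for record in loaded:
        queue.put(record)

    # Process phase: apply every batch step to each record in turn.
    results, failures = [], []
    while not queue.empty():
        record = queue.get()
        try:
            for step in steps:
                record = step(record)
            results.append(record)
        except Exception as exc:  # a failing record does not stop the job
            failures.append((record, repr(exc)))

    # Completion phase: summarize the run.
    return {
        "total": len(loaded),
        "successful": len(results),
        "failed": len(failures),
        "results": results,
    }


def to_upper_name(record: Record) -> Record:
    """Example step: transform a field."""
    return {**record, "name": record["name"].upper()}


def add_country(record: Record) -> Record:
    """Example step: enrich the record with a constant value."""
    return {**record, "country": "US"}


report = run_batch_job(
    [{"name": "ada"}, {"name": "grace"}, {"id": 3}],  # last record lacks "name"
    [to_upper_name, add_country],
)
print(report["total"], report["successful"], report["failed"])
```

The record without a `name` field fails in the first step and is counted in the summary instead of aborting the run, which mirrors the record-by-record processing described for the process phase.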

Why should organizations use Anypoint
Platform to implement ETL/ELT?

MuleSoft has a unique approach towards modern connectivity,
which differentiates its solutions from standard ETL
platforms. Anypoint Platform is uniquely positioned to help
organizations implement ETL/ELT processes because it provides
a variety of capabilities, including:
1. Enabling advanced data transformation and aggregation
2. Providing out-of-the-box, pre-built connectivity to SaaS and
on-premise systems
3. Scheduling and implementing Change Data Capture (CDC)
4. Maximizing the value of APIs in ETL/ELT implementations
5. Implementing other advanced capabilities, such as events and
streaming

Enabling advanced data transformation and aggregation
Anypoint Platform provides DataWeave, a powerful language that
is optimized for data transformation. You can use DataWeave not
only to transform data, but also to aggregate, filter, and join
it, a common requirement in any ETL/ELT process.

For example, you can use DataWeave aggregation to calculate the
total of an invoice from its line items. The advantage of using
DataWeave for this use case is that it is agnostic to all data
sources; in other words, you can apply the same DataWeave
aggregation, as well as other functions, in a consistent manner
irrespective of the data source that you are using, whether it
is a legacy database, a SaaS application, etc. DataWeave also
lets users join heterogeneous data sources with built-in
functions such as groupBy, filter, and orderBy, in the same way
they would join database tables using SQL. An example is shown
below:

[Figure: Anypoint Studio, Transform Message component. The input payload (an order with product, item_amount, payment, and buyer fields) is mapped to an output object (address1, city, country, email, name, postal code, stateOrProvince) by the DataWeave script below.]

    %dw 2.0
    output application/json
    ---
    {
        address1: payload.order.buyer.address,
        city: payload.order.buyer.city,
        country: payload.order.buyer.nationality,
        email: payload.order.buyer.email,
        name: payload.order.buyer.name,
        postalCode: payload.order.buyer.postalCode,
        stateOrProvince: payload.order.buyer.state
    }
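
The aggregation, filtering, and joining described in this section can be illustrated with a small Python sketch. It is written in Python rather than DataWeave purely to show the logic; the sample records, field names, and the threshold of 50 are invented for the example and do not come from any real data source.

```python
from collections import defaultdict

# Hypothetical invoice line items, e.g. rows read from a legacy database.
line_items = [
    {"invoice_id": "INV-1", "customer_id": 10, "quantity": 2, "unit_price": 15.0},
    {"invoice_id": "INV-1", "customer_id": 10, "quantity": 1, "unit_price": 40.0},
    {"invoice_id": "INV-2", "customer_id": 11, "quantity": 3, "unit_price": 5.0},
]

# Hypothetical customer records, e.g. fetched from a SaaS application.
customers = [
    {"id": 10, "name": "Acme"},
    {"id": 11, "name": "Globex"},
]

# Aggregate: group line items by invoice and sum quantity * unit price.
totals = defaultdict(float)
customer_of = {}
for item in line_items:
    totals[item["invoice_id"]] += item["quantity"] * item["unit_price"]
    customer_of[item["invoice_id"]] = item["customer_id"]

# Join: look up the customer name for each invoice by customer id.
names = {customer["id"]: customer["name"] for customer in customers}
invoices = [
    {"invoice_id": invoice_id, "customer": names[customer_of[invoice_id]], "total": total}
    for invoice_id, total in sorted(totals.items())
]

# Filter: keep only invoices at or above an arbitrary threshold.
large_invoices = [invoice for invoice in invoices if invoice["total"] >= 50]

print(invoices)
print(large_invoices)
```

The same three operations apply regardless of where the two lists came from, which is the source-agnostic property the section attributes to DataWeave.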

Providing out-of-the-box, pre-built connectivity
to SaaS and on-premise systems
Beyond providing users with the ability to easily aggregate
and transform data, MuleSoft’s Anypoint Platform also offers
Anypoint Exchange, a repository of pre-built generic protocol,
transport, database, SaaS, and legacy application connectors.
There are over 180 connectors available on Anypoint Exchange,
ranging from connectors created by MuleSoft to those created by
partners and the community.

Connectors simplify connectivity to applications and make it
consistent, because they let developers focus on data
transformation rather than on writing onerous custom code to
connect to a data store. Available connectors include those for
Salesforce, SAP, Workday, NetSuite, ServiceNow, Database, File,
Hadoop (HDFS), Kafka, Amazon S3, SparkSQL, and more.
Scheduling and implementing Change Data Capture (CDC)
Beyond DataWeave and Anypoint Connectors, Anypoint Platform
also provides out-of-the-box polling and watermarking
capabilities. Polling can be driven by a simple fixed-interval
scheduler or by a more advanced cron-based scheduler.

The watermarking feature reliably stores a unique mark
designating the last set of records fetched from a data source,
allowing you to retrieve only changed or new records in
each run of the process. This enables the “extract” part of
the ETL/ELT process and ensures that the right records are
read from the data source.
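
The watermarking idea can be sketched as follows. This is a conceptual Python illustration rather than the platform's own watermarking feature: the class `InMemoryWatermark`, the function `poll_once`, and the `updated_at` field are assumptions made for the example, and a real deployment would keep the mark in persistent storage.

```python
from datetime import datetime
from typing import Optional


class InMemoryWatermark:
    """Holds the highest 'updated_at' value seen so far (hypothetical store)."""

    def __init__(self) -> None:
        self.value: Optional[datetime] = None


def poll_once(rows: list, watermark: InMemoryWatermark) -> list:
    """Return rows changed since the last run and advance the watermark."""
    changed = [
        row for row in rows
        if watermark.value is None or row["updated_at"] > watermark.value
    ]
    if changed:
        # Remember the newest timestamp so the next run skips these rows.
        watermark.value = max(row["updated_at"] for row in changed)
    return changed


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 0)},
]
watermark = InMemoryWatermark()

first_run = poll_once(source, watermark)   # both rows are new
second_run = poll_once(source, watermark)  # nothing changed since the first run

source.append({"id": 3, "updated_at": datetime(2024, 1, 1, 11, 0)})
third_run = poll_once(source, watermark)   # only the newly added row

print(len(first_run), len(second_run), [row["id"] for row in third_run])
```

Each call to `poll_once` stands for one scheduled run; only the third run sees the new row, because the stored mark excludes everything fetched earlier.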

Maximizing the value of APIs in ETL/ELT implementations


Modern APIs are lightweight, simple, and serve as a consistent
method to represent the resources and data that are used
within an enterprise. MuleSoft’s Anypoint Platform supports
the full API lifecycle, from designing and managing APIs to
testing and deploying them. Organizations can significantly
improve agility by taking advantage of the resource models
that APIs provide for ETL/ELT implementations; and by using
Anypoint Platform, users can leverage the same building
blocks to incorporate API calls within ETL/ELT processes.

Implementing other advanced capabilities,
such as events and streaming
Anypoint Platform’s comprehensive support for messaging,
events, and streaming allows users to mix advanced integration
patterns into ETL processing using the same building blocks,
which enables broader and more innovative coverage of ETL
processes.

Achieving ETL/ELT success
with Anypoint Platform

The previous sections highlighted the top capabilities of
Anypoint Platform, specifically as they relate to implementing
ETL/ELT. However, there are various other capabilities
that organizations often look for when selecting a solution
for ETL/ELT. The table below summarizes these “success”
capabilities, and shows how Anypoint Platform fulfills them:

Success capability    How Anypoint Platform fulfills the capability
------------------    ---------------------------------------------
Performance           Within Anypoint Platform, batch steps execute in
                      parallel threads. Tuning parameters are also
                      available for fine-tuning performance.

Monitoring            Users can take advantage of logging capabilities
                      to track the progress of a batch process.

Data enrichment       With Anypoint Platform, users can embed API calls
                      or use direct lookups against an external data
                      source via connectors in order to retrieve data.
                      They can then use DataWeave’s transformation
                      capabilities to enrich the data being processed
                      by the batch process.

Data quality          Users can also embed API calls or connectors to
                      call an external data quality service. For
                      example, an app may receive the state code as
                      California, CA, or Calif in the records; to
                      consistently represent the state code as CA,
                      users can invoke a custom or third-party data
                      quality API.

Data validation       Anypoint Platform provides a few options for data
                      validation, including a “validations component”
                      that can validate simple values (e.g. emails,
                      numbers, etc.), as well as a “JSON schema
                      validator” and an “XML schema validator.”

Cloud and hybrid      Organizations have a wide variety of modern and
                      legacy applications, which is why they need to
                      have flexible deployment options. The platform
                      should be able to connect to SaaS and on-prem
                      applications, as well as act as an
                      integration-platform-as-a-service (iPaaS) to
                      enable users to deploy on-prem or on the cloud.
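
The data quality and data validation rows can be illustrated with a short Python sketch. The local lookup table below merely stands in for the external data quality service mentioned in the table, and the email pattern is a deliberately simple check; `STATE_ALIASES`, `normalize_state`, and `is_valid_email` are names invented for this example.

```python
import re

# Illustrative alias table; a real data quality service would cover far more.
STATE_ALIASES = {"california": "CA", "ca": "CA", "calif": "CA"}

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_state(raw: str) -> str:
    """Map a free-form state value to its two-letter code."""
    key = raw.strip().rstrip(".").lower()
    if key not in STATE_ALIASES:
        raise ValueError(f"unknown state value: {raw!r}")
    return STATE_ALIASES[key]


def is_valid_email(value: str) -> bool:
    """Return True if the value looks like an email address."""
    return bool(EMAIL_PATTERN.match(value))


records = [
    {"state": "California", "email": "ada@example.com"},
    {"state": "Calif", "email": "not-an-email"},
    {"state": " ca ", "email": "grace@example.com"},
]

# Data quality: represent every state consistently as a two-letter code.
cleaned = [{**record, "state": normalize_state(record["state"])} for record in records]

# Data validation: keep only records whose email passes the simple check.
valid = [record for record in cleaned if is_valid_email(record["email"])]

print([record["state"] for record in cleaned], len(valid))
```

In a batch job, both checks would typically run as steps in the process phase, before the records are loaded into the target system.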

Conclusion

The reality is that most organizations have a wide variety of
modern and legacy systems that hold valuable data. In order
to tap into that information and further data-driven decision
making, many organizations turn to ETL or ELT.

Both of these approaches are widely implemented; however,
many organizations are facing setbacks because traditional
approaches to implementing ETL/ELT create a patchwork of
legacy and modern systems, abandoned code, and duplicate work.

This is why some organizations are turning to a new solution
to implement these approaches: Anypoint Platform. With
Anypoint Platform, organizations can better implement ETL/ELT
using the platform’s unique capabilities, from data
transformation using DataWeave to polling and watermarking.

Learn more

Blog: How-to series on ETL processing

Whitepaper: Anypoint Platform performance

About MuleSoft

MuleSoft’s mission is to help organizations change and innovate
faster by making it easy to connect the world’s applications,
data, and devices. With its API-led approach to connectivity,
MuleSoft’s market-leading Anypoint Platform™ is enabling over
1,000 organizations in more than 60 countries to build
application networks. For more information, visit mulesoft.com.

MuleSoft is a registered trademark of MuleSoft, Inc. All other marks are those of respective owners.
