Teradata To Snowflake Migration Guide
Don’t let your past determine your future
Why Migrate?
Strategy—Thinking About Your Migration
Migrating Your Existing Teradata Warehouse
Need Help Migrating?
Appendix A—Data Type Conversion Table
Appendix B—SQL Considerations
CHAMPION GUIDES
WHY MIGRATE?

Decades ago, Teradata identified the need to manage and analyze large volumes of data. But just as the volume, velocity, and variety of data have since changed, the cloud has enabled what’s possible today with modern data analytics. For example, by separating compute from storage, Snowflake Cloud Data Platform has developed a modern cloud data platform that automatically and instantly scales storage and compute capacity in a way not possible with Teradata, whether the current Teradata system is on-premises or hosted in the cloud. Snowflake accomplishes this with its multi-cluster, shared data architecture.

4. Cost: Snowflake allows true storage and compute scalability without the need for complex reconfiguration as your data or workloads grow.

WHY SNOWFLAKE?

Snowflake’s innovations break down the technology and architecture barriers that organizations still experience with traditional data warehouse vendors. Only Snowflake has achieved all six of the defining qualities of a true cloud data platform, as displayed in the chart below.

THE CORE OF SNOWFLAKE

Snowflake delivers the performance, concurrency, and simplicity needed to store and analyze all data available to an organization in one location. Snowflake’s technology combines the power of data warehousing, the flexibility of big data platforms, scalable compute and storage, and live data sharing at a fraction of the cost of traditional solutions.
WHAT YOU DON’T NEED TO WORRY ABOUT

When you migrate to Snowflake from Teradata, you can ignore the following factors because they are no longer relevant:

Data distribution and primary indexes
Snowflake does not need primary indexes. Since compute is separate from storage in Snowflake’s architecture, the data is not pre-distributed to the MPP compute nodes. Snowflake has MPP compute nodes that do not rely on the data being distributed ahead of time.

Since Snowflake’s data is not pre-distributed, it can scale to more parallel compute nodes instantly. With Teradata, you would have to run a reconfig, adding new AMPs/nodes and creating new hashmaps, before the data from the physical table could be redistributed to the new AMPs. This process requires significant planning and resources and impacts performance, but it is not necessary with Snowflake.

Workload management
Workload management is unnecessary in a Snowflake environment due to its multi-cluster architecture, which allows you to create separate virtual warehouses for your disparate workloads and avoid resource contention completely.

Statistics collection
Snowflake automatically captures statistics, relieving DBAs from having to set up jobs to collect statistics for performance tuning. Because it’s automatic in Snowflake, you no longer have to remember to add new tables to the process when your data warehouse grows.

Capacity planning
With Snowflake, you pay for only what you use. Snowflake is a SaaS product, with the option of further cost reductions for customers who want to pre-purchase usage. On the flip side, with capacity planning for an on-premises Teradata system, you run a risk of over- or under-configuring your system. Even with Teradata Vantage, you face a similar capacity planning risk because compute and storage are fixed per instance. If you need more capacity, you must buy it in predefined increments. With Snowflake’s elastic storage and compute architecture, you never have this risk, so you can save money and avoid the time previously spent on extensive planning.
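As a sketch of how separate workloads map to separate compute, the following statements create two hypothetical virtual warehouses (the names and sizes are illustrative, not taken from this guide):

```sql
-- One warehouse per workload, so ETL and BI never contend for resources.
-- AUTO_SUSPEND/AUTO_RESUME means you pay only while queries actually run.
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND   = 60      -- suspend after 60 seconds of inactivity
  AUTO_RESUME    = TRUE;

CREATE WAREHOUSE bi_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1    -- multi-cluster: scale out automatically
  MAX_CLUSTER_COUNT = 3    --   during peak concurrency
  AUTO_SUSPEND      = 60
  AUTO_RESUME       = TRUE;
```

Because each warehouse is billed only while running, resizing or adding warehouses later requires no capacity planning up front.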
Disaster recovery
Teradata has several disaster recovery scenarios.
Many of them require purchasing another system, as well as software such as Unity, to implement these scenarios. With Snowflake, none of
this is necessary. Snowflake leverages many of the
built-in features of the cloud, such as the automatic
replication of data built into AWS. Snowflake is
implemented in multiple regions on AWS, Azure, and
Google Cloud and supports cross-cloud replication
for disaster recovery to your cloud provider and
region of choice. There is no work on your part to
establish this.
MIGRATING YOUR EXISTING TERADATA WAREHOUSE

To successfully migrate your enterprise data warehouse to Snowflake, you need to develop and follow a logical plan that includes the items in this section.

MOVING YOUR DATA MODEL

As a starting point for your migration, you’ll need to move your database objects from Teradata to Snowflake. This includes the databases, tables, views, and sequences in your existing data warehouse that you want to move to Snowflake Cloud Data Platform. In addition, you may want to include all of your user account names, roles, and object grants. At a minimum, create the user who owns the Teradata database on the target Snowflake system before migrating data.

Which objects you decide to move will be highly dependent on the scope of your initial migration. There are several options for making this happen. The following sections outline three possible approaches for moving your data model from Teradata to Snowflake.

Keep in mind, Snowflake is self-tuning and has a unique architecture. You won’t need to generate code for any indexes, partitions, or storage clauses of any kind that you may have needed in Teradata. You only need basic DDL, such as CREATE TABLE, CREATE VIEW, and CREATE SEQUENCE. Once you have these scripts, you can log into your Snowflake account to execute them.

If you have a data modeling tool, but the model is not current, we recommend you reverse engineer the current design into your tool, then follow the approach outlined above.

Using existing DDL scripts

You can begin with your existing DDL scripts if you don’t have a data modeling tool. But you’ll need the most recent version of the DDL scripts (in a version control system). You’ll also want to edit these scripts to remove code for extraneous features and options not needed in Snowflake, such as primary indexes and other storage or distribution related clauses. Depending on the data types you used in Teradata, you may also need to do a search-and-replace in the scripts to change some of the data types to Snowflake optimized types. For a list of these data types, see Appendix A.
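As an illustration of the kind of edits involved, here is a hypothetical Teradata table definition next to a Snowflake version with the storage and index clauses stripped (the table and column names are invented for the example):

```sql
-- Teradata original (hypothetical):
-- CREATE MULTISET TABLE sales.orders ,NO FALLBACK
--      (order_id   INTEGER NOT NULL,
--       order_ts   TIMESTAMP(0),
--       note       CLOB)
-- PRIMARY INDEX (order_id);

-- Snowflake equivalent: no PRIMARY INDEX, FALLBACK, or storage clauses,
-- and CLOB becomes VARCHAR (see Appendix A).
CREATE TABLE sales.orders (
  order_id INTEGER NOT NULL,
  order_ts TIMESTAMP,
  note     VARCHAR
);
```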
Creating new DDL scripts

If you don’t have current DDL scripts or a data modeling tool, you will need to extract the metadata needed from the Teradata data dictionary to generate these scripts. But for Snowflake, this task is simpler since you won’t need to extract metadata for indexes and storage clauses.

As mentioned above, depending on the data types in your Teradata design, you may also need to change some of the data types to Snowflake optimized types. You will likely need to write a SQL extract script of some sort to build the DDL scripts. Rather than do a search and replace after the script is generated, you can code these data type conversions directly into the metadata extract script. The benefit is that you have automated the extract process, so you can do the move iteratively. Plus, you will save time editing the script after the fact. Additionally, coding the conversions into the script is less error-prone than any manual cleanup process, especially if you are migrating hundreds or even thousands of tables.

MOVING YOUR EXISTING DATA SET

Once you have built your objects in Snowflake, you’ll want to move the historical data already loaded in your Teradata system over to Snowflake. To do this, you can use a third-party migration tool, an ETL (extract, transform, load) tool, or a manual process to move the historical data. When choosing among these options, you should consider how much data you have to move. For example, to move 10s or 100s of terabytes, or even a few petabytes of data, a practical approach may be to extract the data to files and move it via a service such as AWS Snowball, Azure Data Box, or Google Transfer Appliance. If you have to move 100s of petabytes or even exabytes of data, AWS Snowmobile is likely the more appropriate option.

If you choose to move your data manually, you will need to extract the data for each table to one or more delimited flat files in text format using Teradata Parallel Transporter (TPT). Then upload these files using the PUT command into your cloud provider’s blob storage. We recommend these files be between 100 MB and 1 GB to take advantage of Snowflake’s parallel bulk loading.

After you have extracted the data and moved it to your cloud provider’s blob storage, you can begin loading the data into your table in Snowflake using the COPY command. You can check out more details about the COPY command in online documentation.
Migrating BI tools

Many of your queries and reports are likely to use an existing business intelligence (BI) tool. Therefore, you’ll need to account for migrating those connections from Teradata to Snowflake. You’ll also have to test those queries and reports to be sure you’re getting the expected results.

This should not be too difficult since Snowflake supports standard ODBC and JDBC connectivity, which most modern BI tools use. Many of the mainstream tools have native connectors to Snowflake. Don’t worry if your tool of choice is not available: you should be able to establish a connection using either ODBC or JDBC. If you have questions about a specific tool, your Snowflake contact will be happy to help.

Handling workload management

As stated earlier, the workload management required in Teradata is unnecessary with Snowflake. The multi-cluster architecture of Snowflake allows you to create separate virtual warehouses for your disparate workloads to avoid resource contention completely. Your workload management settings in Teradata (TASM or TIWM) will give you a good idea of how you’ll want to set up Snowflake virtual warehouses. However, you’ll need to consider the optimal way to distribute these in Snowflake. As a starting point, create a separate virtual warehouse for each workload, sized according to the resources required to meet the SLA for that workload. To do so, consider the following:

• Is there a specific time period in which this workload needs to complete? Between certain hours? You can easily schedule any Snowflake virtual warehouse to turn on and off, or just auto-suspend and automatically resume when needed.

• How much compute will you need to meet that window? Use that to determine the appropriately sized virtual warehouse.

• How many concurrent connections will this workload need? If you normally experience bottlenecks, you may want to use a Snowflake multi-cluster warehouse for those workloads to allow automatic scale-out during peak workloads.

• Think about dedicating at least one large virtual warehouse for tactical, high-SLA workloads.

• If you discover a new workload, you can easily add it on demand with Snowflake’s ability to instantly provision a new virtual warehouse.

MOVING THE DATA PIPELINE AND ETL PROCESSES

Snowflake is optimized for an ELT (extract, load, transform) approach. However, Snowflake supports many traditional ETL (extract, transform, load) and data integration solutions. Unless you are planning to significantly enhance or modify your existing data pipelines and ETL processes, we recommend migrating them as is to minimize the impact to your project. Given that testing and data validation are key elements of any changes to the data pipeline, maintaining these processes as is will reduce the need for extensive validation.
Snowflake has worked diligently to ensure that the migration of processes running on traditional ETL platforms is as painless as possible. Native connectors for tools such as Talend and Informatica make the process quick and easy.

Run the data pipeline in both Snowflake and Teradata during the initial migration. This way, you can simplify the validation process by enabling a quick comparison of the results from the two systems. Once you’re sure queries running against Snowflake are producing identical results to queries from Teradata, you can be confident that the migration did not affect data quality. But you should see a dramatic improvement in performance.

For data pipelines that require re-engineering, you can leverage Snowflake’s scalable compute and bulk-loading capabilities to modernize your processes and increase efficiency. You may consider taking advantage of Snowpipe for loading data continuously as it arrives at your cloud storage provider of choice, without any resource contention or impact to performance. Snowflake makes it easy to bring in large datasets and perform transformations at any scale.

CUT OVER

Once you migrate your data model, your data, your loads, and your reporting over to Snowflake, you must plan your switch from Teradata to Snowflake. Here are the fundamental steps:

1. Execute a historic, one-time load to move all the existing data.

2. Set up ongoing, incremental loads to collect new data.

3. Communicate the cut-over to all Teradata users, so they know what’s changing and what they should expect.

4. Ensure all development code is checked in and backed up, which is a good development practice.

5. Point production BI reports to pull data from Snowflake.

6. Run Snowflake and Teradata in parallel for a few days and perform verifications.

7. Turn off the data pipeline and access to Teradata for the affected users and BI tools.
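The continuous loading with Snowpipe described above can be sketched as follows (the stage, pipe, and table names are invented for illustration):

```sql
-- A pipe that auto-loads new files as they land in cloud storage.
-- AUTO_INGEST relies on the cloud provider's event notifications.
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
    FROM @orders_stage
    FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|');
```

Because Snowpipe uses serverless compute, the ongoing loads do not contend with the virtual warehouses serving your queries.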
NEED HELP MIGRATING?
Snowflake’s solution partners and Snowflake’s Professional Services team offer several services to accelerate your migration and ensure a successful implementation. The Snowflake Alliances team is working with top-tier system integrators that have experience performing platform migrations.

Snowflake solution partners and Snowflake’s Professional Services team understand the benefits of Snowflake and apply their experience and knowledge to the specific challenges that your organization may face during the migration process. They offer services ranging from high-level architecture recommendations to manual code conversion. Additionally, many Snowflake partners have built tools to automate and accelerate the migration process.

Whether your organization is fully staffed for a platform migration or you need additional staffing, Snowflake’s solution partners and Snowflake’s Professional Services team have the skills and tools to make this process easier, so you can reap the full benefits of Snowflake.

To find out more, contact Snowflake’s solutions partner team or the Snowflake sales team. To understand the business benefits of migrating from Teradata to Snowflake, click here.
APPENDIX A:
DATA TYPE CONVERSION TABLE
This appendix contains a sample of some of the data type mappings you need to know when moving
from Teradata to Snowflake. Many are the same, but you will need to change a few.
Teradata                     Snowflake
BYTEINT                      BYTEINT
SMALLINT                     SMALLINT
INTEGER                      INTEGER
BIGINT                       BIGINT
DECIMAL                      DECIMAL
FLOAT                        FLOAT
NUMERIC                      NUMERIC
CHAR (up to 64K)             CHAR (up to 16 MB)
VARCHAR (up to 64K)          VARCHAR (up to 16 MB)
LONG VARCHAR (up to 64K)     VARCHAR (up to 16 MB)
CHAR VARYING(n)              CHAR VARYING(n)
REAL                         REAL
DATE                         DATE
TIME                         TIME
TIMESTAMP                    TIMESTAMP
BLOB                         BINARY (up to 8 MB)
CLOB                         VARCHAR (up to 16 MB)
BYTE                         BINARY
VARBYTE                      VARBINARY
GRAPHIC                      VARBINARY
JSON                         VARIANT
ARRAY                        ARRAY
APPENDIX B:
SQL CONSIDERATIONS
Below are examples of some changes you may need to make to your Teradata SQL queries so
they will run correctly in Snowflake. Note that this is not an all-inclusive list.
DELETE ALL SYNTAX

Teradata supports adding ALL to the end of a DELETE statement. In Snowflake, ALL at the end of a DELETE statement isn’t supported and needs to be removed.

Teradata also has SQL syntax with views (DDL) that isn’t used in Snowflake:

• LOCKING ROW FOR ACCESS
• SEL (must be spelled out as SELECT)
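As a small, hypothetical illustration of these rewrites (the table name is invented):

```sql
-- Teradata:
--   LOCKING ROW FOR ACCESS
--   SEL * FROM orders;
--   DELETE FROM orders ALL;

-- Snowflake equivalents:
SELECT * FROM orders;   -- SEL spelled out; no LOCKING modifier
DELETE FROM orders;     -- trailing ALL removed
```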
ABOUT SNOWFLAKE
Snowflake Cloud Data Platform shatters the barriers that prevent organizations from unleashing the true value from their
data. Thousands of customers deploy Snowflake to advance their businesses beyond what was once possible by deriving all
the insights from all their data by all their business users. Snowflake equips organizations with a single, integrated platform
that offers the only data warehouse built for any cloud; instant, secure, and governed access to their entire network of data;
and a core architecture to enable many other types of data workloads, including a single platform for developing modern
data applications. Snowflake: Data without limits. Find out more at snowflake.com.