Migration Project: Background and Objectives


Background and objectives:

One of the drawbacks to being an early adopter is the risk that technological advances will
impact on your platform, necessitating updates and amends. Like straws on the proverbial
camel's back, the march of progress eventually weighs down your application / website
until the time comes to rebuild from the bottom up.

The rebuild project is typically fun and exciting… a new and modern look-and-feel, fancy
responsive user interfaces and some of those functional components you’ve seen your
competitor offer and have been salivating over: all are within reach. From user testing to
design briefs, specifying functionality to prototyping, the typical redesign project, handled
well, can be a genuinely motivating and unifying force across an organisation.

But then the deformed uncle who has been banished to the attic knocks on the ceiling and a
bucket full of spanners rains down on your great work! Or, to be more prosaic, the realisation
surfaces that years of data that has built up cannot be abandoned and will need to be migrated
to the new platform.

And so the data migration project is born.

The objective of this article is to provide a survival kit for project managers who find
themselves taking up the reins of a data migration project. Key concepts, terms and
considerations that are common to every migration project will be outlined below.

Some key terminology / core concepts:


As with any niche, a lexicon of terms has been adopted by the migrationists. Most important
amongst these are:

As-is: This refers to the current (pre-migration, pre-rebuild) items. Data, application,
functionality, all can be prefixed with ‘as-is’ to distinguish them from ‘to-be’.
To-be: This refers to the new, post-migration items: data, application etc.
Schema: Your schema is the pattern that your data conforms to. All data will conform to
a pattern – from well-constructed relational databases to seemingly unconnected sets of
HTML pages, an underlying pattern can be found. Understanding the pattern (schema) that
underlies / defines both as-is and to-be data sets is key to a successful migration.
Mapping: This term refers to the way that as-is and to-be items are linked. This mapping
tends to encapsulate a set of rules that we apply to the as-is data, thereby converting it to
to-be data.
Migration engine: This term refers to the process(es) / application(s) used to apply
the mapping rules to the as-is data set to generate the to-be data.

It follows that a data migration project involves using a migration engine to convert an as-is
data set into a to-be data set. Key to doing so is understanding the as-is schema and the
to-be schema and creating a mapping between them.
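
To make these terms concrete, below is a minimal sketch of a migration engine in Python. Everything here is illustrative: the field names (headline, fullname, title and so on) and the rules are hypothetical, not taken from any real system. The point is simply that a mapping is a set of rules, and the engine applies them to every as-is record to produce the to-be data set.

    # A minimal, illustrative migration engine: the mapping is a list of rules,
    # and the engine applies every rule to every as-is record to build the
    # to-be data set. All field names are hypothetical.

    def rule_rename_title(as_is, to_be):
        # as-is field 'headline' becomes to-be field 'title'
        to_be["title"] = as_is["headline"]

    def rule_split_name(as_is, to_be):
        # as-is 'fullname' is split into to-be 'first_name' / 'last_name'
        first, _, last = as_is["fullname"].partition(" ")
        to_be["first_name"], to_be["last_name"] = first, last

    MAPPING = [rule_rename_title, rule_split_name]

    def migrate(as_is_records, mapping=MAPPING):
        # Apply every mapping rule to every as-is record.
        to_be_records = []
        for as_is in as_is_records:
            to_be = {}
            for rule in mapping:
                rule(as_is, to_be)
            to_be_records.append(to_be)
        return to_be_records

    print(migrate([{"headline": "Welcome", "fullname": "Ada Lovelace"}]))
    # -> [{'title': 'Welcome', 'first_name': 'Ada', 'last_name': 'Lovelace'}]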
A simple example of a data migration:
Consider the Y2K example below:

As-is dates were in the format DD/MM/YY (so the 'schema' for as-is date data was
DD/MM/YY).
To-be dates are in the format dd/mm/yyyy (so the 'schema' for to-be date data is dd/mm/yyyy).
As-is data was mapped to to-be data using the simple mapping rules below:
Rule 1: dd = DD
Rule 2: mm = MM
Rule 3: yyyy = "19" + YY
[Of course you’ll already be wondering about years that begin ‘20’ – hold that
thought!]
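
As a sketch of how such a mapping might be implemented in a migration engine (Python here, purely for illustration; the function name is not from the article), the three rules translate almost line for line into code:

    def migrate_date(as_is_date):
        # Apply the three mapping rules to an as-is date in DD/MM/YY format.
        dd, mm, yy = as_is_date.split("/")
        # Rule 1: dd = DD, Rule 2: mm = MM, Rule 3: yyyy = "19" + YY
        return f"{dd}/{mm}/19{yy}"

    print(migrate_date("07/03/86"))   # -> 07/03/1986
    print(migrate_date("01/01/05"))   # -> 01/01/1905, the problem hinted at above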

Complexity of a data migration project:


The core concepts of a data migration project are therefore relatively easy to conceptualise.
The problem is that this can (and typically does) lead to the impression that migrating the
data itself is simple; whilst this may be true in some cases, I've yet to come across such a
project. The majority are a minefield of politics, conflicting rules and compromises that
will see the migration promoted from being a peripheral activity of the rebuild to sitting right
at the centre of the programme.

So what makes a seemingly straightforward process so complex?

An exhaustive answer to this question would require a much more extensive article than this
one, but the items listed below illustrate some of the factors that influence the complexity of a
migration project:

Number of rules required to map as-is and to-be schemas.


Each mapping rule carries some overhead:

- Analysis to define the rule.
- Implementation of the rule within the migration engine.
- Processing data using the engine.
- Testing the converted data.

In addition, with every rule added to the mapping there is an increased risk of rule collision.
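
One plausible reading of rule collision is two mapping rules attempting to populate the same to-be field in conflicting ways. Under that assumption, a simple pre-flight check (sketched in Python with hypothetical rule names) can flag collisions before the engine is run:

    from collections import defaultdict

    # Hypothetical rule metadata: each mapping rule declares which to-be
    # fields it writes to. A collision is two rules writing the same field.
    RULES = {
        "rule_rename_title": ["title"],
        "rule_copy_summary": ["summary"],
        "rule_build_title_from_url": ["title"],   # collides with rule_rename_title
    }

    def find_collisions(rules):
        writers = defaultdict(list)
        for rule_name, fields in rules.items():
            for field in fields:
                writers[field].append(rule_name)
        return {field: names for field, names in writers.items() if len(names) > 1}

    print(find_collisions(RULES))
    # -> {'title': ['rule_rename_title', 'rule_build_title_from_url']}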

Number of edge cases not covered by the mapping rules.


Edge cases occur when there is as-is data that does not map neatly to to-be data.

- The Y2K example above (where the year could have been in the 20th or 21st century) is an
example of an edge case: the rule (to convert all 2 digit years to 19XX) will not work all the
time. Ideally a secondary rule to discriminate when to substitute '19' or '20' needs to be
derived. If no rule can be defined then human judgement will be needed to resolve this.

Not only do edge cases need to be identified and defined, there is also usually some manual
overhead associated with dealing with these.
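
As an illustration of such a secondary rule, one common approach (shown here as a Python sketch, not something prescribed by the article) is a pivot year: two-digit years at or below the pivot are treated as 20XX, the rest as 19XX. The pivot value itself is a business decision, and choosing it does not remove the need for human judgement over genuinely ambiguous records.

    PIVOT = 30   # assumption: 00-30 -> 20xx, 31-99 -> 19xx

    def expand_year(yy, pivot=PIVOT):
        # Secondary rule for the Y2K edge case: choose the century for a 2-digit year.
        return f"20{yy}" if int(yy) <= pivot else f"19{yy}"

    def migrate_date(as_is_date):
        dd, mm, yy = as_is_date.split("/")
        return f"{dd}/{mm}/{expand_year(yy)}"

    print(migrate_date("01/01/05"))   # -> 01/01/2005
    print(migrate_date("07/03/86"))   # -> 07/03/1986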

Volume of data.
The volume of as-is data impacts on the complexity of the migration, most saliently in
relation to the number of instances of edge cases.

Sheer volume will also impact on processing requirements, and the volume of to-be data has a
direct relationship to the amount of testing required.

It is also likely (but by no means guaranteed) that more data means a more complex mapping.

% completion of the definition of the to-be schema.


If the to-be schema is 100% defined before the migration project commences then the
complexity of the project is significantly less than if the schema is being developed in
parallel.

Having an incomplete schema to migrate into is an example of the classic ‘moving goalposts’
problem, but is typical of a migration project.

Sacredness of the as-is data.


As-is data comes in many forms – articles, images, personal data, web pages, transaction
data, etc. Typically some of the as-is data will be obsolete and need not be migrated, but this
data will not be included in the mapping (and strictly speaking this data is not part of the as-is
data we are interested in).

Data that is in scope generally needs to be migrated without loss of meaning, but a lot of
meaning comes from context (e.g. an article that refers to an adjacent image) and preserving
all this meaning within the to-be system is a challenge for both the to-be schema design and
the migration.

Automated versus manual migration of data:


So you’ve got a chunk of as-is data and you need to convert this to to-be data. You have your
2 schemas and your mapping between the 2. You are ready to begin!

Before you commission developers to work on your engine, it's important to investigate
whether a manual migration is more appropriate. Admittedly a manual approach does sound
archaic, but you should give it serious consideration – for reasons too extensive to go into
here, I would be very tempted to go manual if there were tangible benefits in doing so.

Bear in mind that even if you choose to migrate automatically, it is inevitable that some
manual intervention will be required. Edge cases need to be handled, some remediation
undertaken and testing conducted.

Role of the Data Owner:


By and large the structure of a data migration project is similar to any other project, but there
is one key role that should be filled properly: that of the data owner.

Typically the data is a tangible asset that is of huge value to the organisation. It is important
to have someone in a position of ownership over this data whose remit is to protect it and
ensure that the to-be state is within tolerance. Indeed, ‘tolerance’ (or the extent to which loss
of data is acceptable) needs to be defined.

As an example of where data loss would be deemed acceptable, consider a migration of
organisational data. It is feasible that the fax number could be lost without 'cost', but losing
the phone number would not be acceptable.
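
One simple way to make 'tolerance' operational is to express it as a per-field policy that UAT can check automatically. The sketch below (Python, with hypothetical field names based on the fax/phone example above) flags any record where a must-keep field was lost in migration:

    # Hypothetical tolerance policy derived from the example above:
    # losing 'fax' is acceptable, losing 'phone' is not.
    TOLERANCE = {"phone": "must_keep", "fax": "may_drop"}

    def check_tolerance(as_is_record, to_be_record, policy=TOLERANCE):
        # Return the list of must-keep fields that were lost during migration.
        lost = []
        for field, rule in policy.items():
            if rule == "must_keep" and as_is_record.get(field) and not to_be_record.get(field):
                lost.append(field)
        return lost

    as_is = {"name": "Acme Ltd", "phone": "0123 456789", "fax": "0123 456790"}
    to_be = {"name": "Acme Ltd", "phone": None}   # phone lost, fax dropped
    print(check_tolerance(as_is, to_be))          # -> ['phone'], outside tolerance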

Key features of a successful Data Owner:

- This role is NOT a technical one. It is important that someone can speak out on behalf of
the data when technical issues put pressure on what can be migrated and how.
- The data owner should have some veto power. The data owner needs to have clout within
the project team and needs to be able to exercise a veto over development decisions that
will impact on the integrity of the migrated data.
- The data owner is responsible for UAT and ultimately signs off the migrated data (bear in
mind that signing off the data is a completely different exercise from signing off the new
development into which the data has been migrated).
Choosing a project management methodology:
Do you go with an Agile or Waterfall approach?
What you should do is weigh up all your options, speak to the project owners, assess the
situation, then choose Agile.

Flippancy aside, the migration project is going to present daily challenges and unforeseen
pitfalls, and being able to respond to these in an agile manner will significantly de-risk the
project. Don't even consider waterfall unless the to-be schema is fixed and immutable from
the specification stage of the migration project.

Typical risks:
It is inevitable that some of the risks below will apply to your project:

Risk: Changes to the to-be schema will impact on the mapping, potentially impacting on
the migration engine or even data that has already been migrated.
Mitigation: If the to-be schema is not locked down, the migration team and data owner
need to be at the centre of any decisions that will impact on the to-be schema.
Impact: Additional time.
Note that there will be a number of different situations which could lead to schema changes
(e.g. scope creep), and this risk will underlie all of them.

Risk: There will be some data that cannot be mapped across.
Mitigation: The business should be prepared for a manual intervention phase (or the
loss of data).
Impact: Additional time for manual intervention, or data that is not migrated.

Risk: Amends made to the as-is data during the migration project will not be migrated.
Mitigation: Formal change freezes should be implemented as required, well
communicated and properly 'policed'.
Impact: None (if mitigation is put in place early enough).
A final word:
The messiest of all migrations occur when the as-is data has evolved and developed over
years and years without any guiding principles or long term vision. Usually the ‘schema’ for
such a data set is disparate and inconsistent and therefore complicated to map to the to-be
schema. A website of HTML pages typically falls into this category.

By contrast, a migration from a well-defined and logical as-is schema into a similarly
well-defined to-be schema is orders of magnitude simpler.

So, bear in mind that at some point in the future the technological landscape will have
changed to the extent that another migration exercise will be necessary. Careful design of the
to-be schema will have a major impact on the ease of future migrations, and it will pay
dividends in the long run to ensure that your current to-be schema does not become a thing of
loathing when the next migration exercise is necessary.

