Migration Project - Background and Objectives
One of the drawbacks to being an early adopter is the risk that technological advances will
impact your platform, necessitating updates and amendments. Like straws on the proverbial
camel’s back, the march of progress eventually weighs down your application or website
until the time comes to rebuild from the ground up.
The rebuild project is typically fun and exciting… a new and modern look-and-feel, fancy
responsive user interfaces and some of those functional components you’ve seen your
competitor offer and have been salivating over: all are within reach. From user testing to
design briefs, specifying functionality to prototyping, the typical redesign project, handled
well, can be a real motivating and unifying force across an organisation.
But then the deformed uncle who has been banished to the attic knocks on the ceiling and a
bucket full of spanners rains down on your great work! Or, to be more prosaic, the realisation
surfaces that years of data that has built up cannot be abandoned and will need to be migrated
to the new platform.
The objective of this article is to provide a survival kit for project managers who find
themselves taking on the reins of a data migration project. Key concepts, terms and
considerations that are common to every migration project will be outlined below.
As-is: This refers to the current (pre-migration, pre-rebuild) items. Data, application,
functionality, all can be prefixed with ‘as-is’ to distinguish them from ‘to-be’.
To-be: This refers to the new, post-migration items: data, application, etc.
Schema: Your schema is the pattern that your data conforms to. All data will conform to
a pattern – from well-constructed relational databases to seemingly unconnected sets of
HTML pages, an underlying pattern can be found. Understanding the pattern (schema) that
underlies / defines both as-is and to-be data sets is key to a successful migration.
Mapping: This term refers to the way that as-is and to-be items are linked. This mapping
tends to encapsulate a set of rules that we apply to the as-is data, thereby converting it to to-
be data.
Migration engine: This term refers to the process(es) / application(s) used to apply
the mapping rules to the as-is data set to generate the to-be data.
To see how these terms fit together, consider a simple date migration:
As-is dates were in the format DD/MM/YY (so the ‘schema’ for as-is date data was
DD/MM/YY).
To-be dates are in the format dd/mm/yyyy (so the ‘schema’ for to-be date data is dd/mm/yyyy).
As-is data was mapped to to-be data using the simple mapping rules below:
Rule1: dd = DD
Rule2: mm = MM
Rule3: yyyy = “19” + YY
[Of course you’ll already be wondering about years that begin ‘20’ – hold that
thought!]
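To make the worked example concrete, below is a minimal sketch in Python (purely illustrative, not drawn from any particular toolset) of a migration engine applying the three rules. The pivot-year handling for dates that belong in the 2000s is an assumption added here, precisely because the rules above leave that question open:

    # Minimal sketch of a 'migration engine' for the date example above.
    # The pivot-year handling (two-digit years up to 49 treated as 20xx) is an
    # illustrative assumption, not part of the mapping rules in the article.

    def migrate_date(as_is: str, pivot: int = 49) -> str:
        """Map an as-is date (DD/MM/YY) to the to-be schema (dd/mm/yyyy)."""
        dd, mm, yy = as_is.split("/")                  # parse the as-is schema
        century = "20" if int(yy) <= pivot else "19"   # the 'years beginning 20' case
        return f"{dd}/{mm}/{century}{yy}"              # Rule1, Rule2, extended Rule3

    print(migrate_date("07/03/86"))   # -> 07/03/1986
    print(migrate_date("07/03/12"))   # -> 07/03/2012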
An exhaustive answer to the question of what makes a migration complex would require a
much more extensive article than this one, but the items listed below illustrate some of the
factors that influence the complexity of a migration project:
Volume of data.
The volume of as-is data impacts the complexity of the migration, most saliently because
more data means more instances of edge cases.
Sheer volume will also impact processing requirements, and the volume of to-be data has a
direct relationship to the amount of testing required.
It is also likely (but by no means guaranteed) that more data means a more complex mapping.
Having an incomplete schema to migrate into is an example of the classic ‘moving goalposts’
problem, but is typical of a migration project.
Data that is in scope generally needs to be migrated without loss of meaning, but a lot of
meaning comes from context (e.g. an article that refers to an adjacent image) and preserving
all this meaning within the to-be system is a challenge for both the to-be schema design and
the migration.
Before you commission developers to work on your engine, it’s important to investigate
whether a manual migration is more appropriate. Admittedly a manual approach does sound
archaic, but you should give it serious consideration – for reasons too extensive to go into
here, I would be very tempted to go manual if there were tangible benefits in doing so.
Bear in mind that even if you choose to migrate automatically, it is inevitable that some
manual intervention will be required. Edge cases need to be handled, some remediation
undertaken and testing conducted.
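To illustrate that mix of automatic and manual work, the sketch below (Python, with a hypothetical record structure) applies the date mapping where it can and sets anything it cannot confidently handle aside for manual review rather than guessing:

    # Sketch of an automatic migration pass that sets edge cases aside for manual
    # intervention instead of guessing. The record structure ('date' field) is a
    # hypothetical illustration.

    def migrate_record(record: dict) -> dict:
        """Apply the date mapping rule; reject anything unexpected."""
        dd, mm, yy = record["date"].split("/")
        if not (dd.isdigit() and mm.isdigit() and yy.isdigit()):
            raise ValueError(f"unparseable date: {record['date']}")
        return {**record, "date": f"{dd}/{mm}/19{yy}"}

    def run_migration(as_is_records):
        migrated, needs_manual_review = [], []
        for record in as_is_records:
            try:
                migrated.append(migrate_record(record))
            except (ValueError, KeyError):
                needs_manual_review.append(record)   # edge case: handle by hand
        return migrated, needs_manual_review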
Typically the data is a tangible asset that is of huge value to the organisation. It is important
to have someone in a position of ownership over this data whose remit is to protect it and
ensure that the to-be state is within tolerance. Indeed, ‘tolerance’ (or the extent to which loss
of data is acceptable) needs to be defined.
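One way to make ‘tolerance’ concrete is a reconciliation check after each migration run. The sketch below (Python; the 0.1% threshold is an assumed figure for illustration only) compares as-is and to-be record counts and flags the run if losses exceed the agreed level:

    # Sketch of a post-migration reconciliation check. The 0.1% threshold is an
    # illustrative assumption; the real figure is whatever the data owner signs off.

    def within_tolerance(as_is_count, to_be_count, max_loss_ratio=0.001):
        """Return True if the proportion of records lost is within tolerance."""
        lost = as_is_count - to_be_count
        return lost / as_is_count <= max_loss_ratio

    assert within_tolerance(100_000, 99_950)        # 0.05% loss: acceptable
    assert not within_tolerance(100_000, 99_000)    # 1% loss: investigate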
Flippancy aside, the migration project is going to present daily challenges and unforeseen
pitfalls, and being able to respond to these in an agile manner will significantly de-risk the
project. Don't even consider waterfall unless the to-be schema is fixed and immutable from
the specification stage of the migration project.
Typical risks:
It is inevitable that some of the risks below will apply to your project:
Risk: Changes to the to-be schema will impact on the mapping, potentially impacting on
the migration engine or even data that has already been migrated.
Mitigation: If the to-be schema is not locked down, the migration team and data owner
need to be at the centre of any decisions that will impact the to-be schema.
Impact: Additional time.
Note that there are a number of different situations that could lead to schema changes
(e.g. scope creep), and this risk underlies all of them.
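At the engineering level, one way to absorb such changes is to hold the mapping rules as data rather than hard-coding them into the engine, so a to-be schema change means editing a rule table instead of reworking the engine itself. A minimal sketch of that separation (the field names are hypothetical):

    # Sketch: mapping rules held as data, separate from the engine that applies
    # them. A change to the to-be schema is absorbed by editing FIELD_MAPPING
    # rather than reworking the engine. Field names here are hypothetical.

    FIELD_MAPPING = {
        # to-be field:    (as-is field, transform applied during migration)
        "title":          ("headline",  str.strip),
        "published_date": ("date",      lambda d: f"{d[:6]}19{d[6:]}"),  # DD/MM/YY -> dd/mm/yyyy
        "body":           ("body_text", str),
    }

    def apply_mapping(as_is_record):
        """Build a to-be record by applying each rule in FIELD_MAPPING."""
        return {
            to_be_field: transform(as_is_record[as_is_field])
            for to_be_field, (as_is_field, transform) in FIELD_MAPPING.items()
        }

    print(apply_mapping({"headline": " Budget 86 ", "date": "07/03/86", "body_text": "..."}))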
By contrast, a migration from a well-defined and logical as-is schema into a similarly
well-defined to-be schema is orders of magnitude simpler.
So, bear in mind that at some point in the future the technological landscape will have
changed to the extent that another migration exercise will be necessary. Careful design of the
to-be schema will have a major impact on the ease of future migrations, and it will pay
dividends in the long run to ensure that your current to-be schema does not become a thing of
loathing when the next migration exercise is necessary.