Leading The Transformation
“It’s long past the time when executives who are looking for better performance
from software development can expect an “Agile transformation” to solve their
problems. Today’s wise executive will know enough about the underlying prin-
ciples of software systems to ask the right questions and make sure that their
organization is solving the right problems. That’s where this book comes in—it
contains just enough theory to inform executives about critical issues and just
enough detail to clarify what’s important and why.”
—Mary Poppendieck, author of The Lean Mindset
and the Lean Software Development series
“Leading the Transformation is a critical asset for any leadership in a large devel-
opment environment seeking to transform the organization from the swamp of
restriction to the freeway of efficient delivery. The book provides real-life data
and solid advice for any leader embarking on or in the middle of an enterprise
delivery transformation.”
—Lance Sleeper, Senior Manager of
Operations, American Airlines
“Before you undertake a major change in your development process, you want
to learn from people who have gone before you. Gary and Tommy draw on their
experience to prepare you for how to plan and what to expect as you roll out Agile/
DevOps methodology in your enterprise. Reading this book, I learnt valuable
lessons on planning a scaled-out Agile transformation and what signposts to look
for along the way as we embarked on the transformation journey at Cisco.”
—Vinod Peris, VP Engineering, Routing & Optical, Cisco
Gary Gruver and Tommy Mouser
Foreword by Gene Kim
IT Revolution
Portland, OR
Copyright © 2015 Gary Gruver and Tommy Mouser
ISBN 978-1-942788-01-0
IT Revolution
Portland, OR
[email protected]
FOREWORD
I first came across Gary and Tommy’s amazing work transforming how HP
developed firmware code for the entire HP LaserJet Enterprise product line when
Jez Humble told me about it. It was an astounding story of Continuous Delivery
and DevOps for many reasons. Their story of being able to increase the develop-
ment of new customer functionality by two to three times was a breathtaking leap
in developer productivity. But that this success story was for embedded firmware
for printers made it almost defy belief.
I believe that the HP LaserJet case study is important because it shows how
Continuous Delivery and DevOps principles transcend all technology and that
DevOps isn’t just for open source software. Instead, it should be equally applicable
for complex enterprises systems, systems of record—even those 30-plus-year-old
COBOL applications that run on mainframes.
This theory was put to the test and proven when Gary became VP of QE,
Release, and Operations at Macys.com. For three years, he helped contribute to
the transformation that went from doing thousands of manual tests every ten days
to thousands of automated tests being run daily, increasing the ability to have all
applications closer to a deployable state.
In fifteen years, I suspect that everything Gary and Tommy have done for large,
complex organizations will be common knowledge for technology executives.
However, in the meantime, the challenges of how large, complex organizations
adopt DevOps will create incredible competitive advantage, and I hope this book
becomes a catalyst for making that knowledge more commonplace.
—Gene Kim
Portland, Oregon
April 2015
CHAPTER 1
UNDERSTANDING THE TRANSFORMATION
[Figure: the transformation model — business objectives driving enterprise-level continuous improvement and applying DevOps principles at scale, spanning requirements, scheduling, development, and integration & qualification, balanced against scope and schedule]
technical solutions. While required, they represent a smaller portion of the effort.
Most of the real challenges are in organizational change management and shifting
the culture. Executives need to understand that the capacity of the organization
to absorb change is the biggest constraint to rolling out these improvements. This
means that the organizational change management capacity is the most precious
resource, and it should be actively managed and used sparingly.
The approach to rolling out an enterprise-level Agile transition should focus
on the business breakthroughs the Agile principles are intended to achieve while
taking into consideration the capacity of an organization to change. This is where
executives can add knowledge and expertise.
Summary
What we hope executives walk away with after reading this example is that most
Agile implementations struggle to provide expected business results because they
focus on rolling out Agile teams the “right” way instead of applying Agile princi-
ples at scale. This approach creates a lot of change management challenges in an
organization without fundamentally addressing the basic Agile business principles
of an enterprise backlog and always-releasable code. We believe our approach
BUSINESS OBJECTIVES
The reason these two Agile authors say “don’t do Agile” is that we don’t think you can ever be successful or get all the possible business improvements if your objective is simply to do Agile and be done. Agile is such a broad and evolving methodology that it can’t ever be implemented completely. Someone in your organization can at any time Google “What is Agile development” and then argue for pair programming or Extreme Programming or less planning, and you begin a never-ending journey to try all the latest ideas without any clear reason why. Additionally, Agile is about continuous improvement, so by definition you will never be done.
At HP we never set out to do Agile. Our focus was simply on improving pro-
ductivity. The firmware organization had been the bottleneck for the LaserJet
business for a couple of decades. In the few years before this transformation
started, HP tried to spend its way out of the problem by hiring developers around
the world, to no avail. Since throwing money at the problem didn’t work, we
needed to engineer a solution.
We set off on a multiyear journey to transform the way we did develop-
ment with the business objective of freeing up capacity for innovation and
ensuring that, after the transformation, firmware would not be the bottleneck
for shipping new products. This clear objective really helped guide our journey
and prioritize the work along the way. Based on this experience and others like it,
we think the most important first step in any transformation is to develop a clear
set of business objectives tuned to your specific organization to ensure you are
well positioned to maximize the impact of the transformation on business results.
We see many companies that embark on a “do Agile” journey. They plan a big
investment. They hire coaches to start training small Agile teams and plan a big
organizational change. They go to conferences to benchmark how well they are
“doing DevOps or Agile.” They see and feel improvements, but the management
teams struggle to show bottom-line business results to the CFO. Not having clear
business objectives is a key source of the problem. If they started out by focusing on
the business and just using DevOps ideas or implementing some Agile methods that
would provide the biggest improvements, they would find it much easier to show
bottom-line financial results. This worked at HP. When we started, firmware had
been the bottleneck in the business for a couple of decades and we had no capacity
for innovation. At the end of a three-plus-year journey, adding a new product
to our plans was not a major cost driver. We had dramatically reduced costs
from $100M to $55M per year and increased our capacity for innovation by
eight times.
To be clear, achieving these results was a huge team effort. For example, it
required us to move to a common hardware platform so that a single trunk of code
could be applied to the entire product lineup. Without the collaboration with our
partners throughout the business we could not have achieved these results. Having
a set of high-level business objectives that the entire organization is focused on is
the only way to get this type of cross-organizational cooperation and partnership.
These types of results will not happen when you “do Agile.” It takes a laser-like
focus on business objectives, a process for identifying inefficiencies in the current
process, and an ongoing, continuous improvement process.
Where to Start
Once you have a clear set of business objectives in place, the next step is
determining where to start the transformation. You can’t do everything at once, and this
is going to be a multiyear effort, so it is important to start where you will get the
biggest benefits.
From our perspective, there are two options that make sense for determining
where to start. The first is the activity-based accounting and cycle-time approach
that we used at HP. You start with a clear understanding of how people are
spending their time and the value the software is intended to provide to your
business. This approach addresses the biggest cost and cycle-time drivers that
are not key to your business objectives. The challenge with this approach is that
sometimes it can be very time-consuming to get a good understanding of all the
cost and cycle-time drivers.
The other approach is to focus on the areas that are typically the biggest sources
of inefficiencies in most enterprise software development efforts: maintaining
[Figure: cycle-time and cost drivers]
Our build, integration, and testing process was driving both cost and cycle-times,
so we figured that was a good place to start. When we looked in the DevOps and
Agile toolbox, we picked continuous integration as one of our first objectives. We
also realized we were spending a lot of effort doing detailed planning for devel-
opment that never went as planned. We knew we had to do something different
for planning, so the idea of iterations with short-term objectives felt like a great
second improvement idea. We never did set off with the objective to transition
from Waterfall to Agile development or implement DevOps. We never went in
and justified to yet higher levels of management that we wanted to fund a big
transformation. We just set in play a continuous improvement process where we
would set objectives and review results each iteration. Additionally, we started
focusing on improving our build and integration process because this was
where we thought we would get the best improvements in productivity.
This approach led us on a three-plus-year journey, one monthly iteration at a
time, which ended up providing some pretty dramatic improvements. Over time
[Figure: 2008 vs. 2011 comparison — in 2008, costs were out of control; by 2011, firmware development cost per program had been reduced by ~70%. The 2011 capacity was intentionally not 100%; the difference was used for further process improvements.]
Summary
It is difficult to address the unique and business-specific results that come out
of the activity-based accounting and cycle-time driver approach. Therefore, the
rest of this book will focus on applying Agile and DevOps principles at scale
to provide ideas and recommendations to address the traditional challenges of
transforming these parts of your development process. Whether you do the
detailed activity-based accounting and cycle-time view of your process or start
with applying DevOps and Agile principles at scale, it is important that you begin
with clear business objectives. This transformation is going to take a lot of effort,
and if you don’t have clear business objectives driving the journey, you can’t expect
the transformation to provide the expected business results. At HP we were able
to get two to three times the improvement in business results because that was the
focus of all our changes. If we couldn’t see clearly how a change would help meet
those goals, we did not waste any time with it, even though it might have been
popular in the Agile community. It is this type of focus on the business objectives
that enables the expected business results. These objectives also help in the organi-
zational change management process, where you can constantly remind the team
of where you are, where you are going, and the expected benefits.
ENTERPRISE-LEVEL CONTINUOUS IMPROVEMENT
Executives need to establish strategic objectives that make sense and that can be used
to drive plans and track progress at the enterprise level. These should include key
deliverables for the business and process changes for improving the effectiveness
of the organization.
At HP we made sure we had a set of objectives that we used to drive the
organization during each iteration. There were typically four to seven high-level
objectives with measurable sub-bullets we felt were most important to achieve,
things like shipping the next set of printers or taking the next step forward in
our test automation framework. It also included things like improving our test
passing rates, since stability had dropped, or a focus on feature throughput, since
development was getting behind.
The table in figure 5 is a scrubbed version of the actual objectives from MM30,
our 30th monthly iteration. During MM30 we were completing the rearchitecture
of the codebase and getting ready to release the first scanner on Windows XPe with
a MIPS processor. This required our highest priority to be completing the bit release
and improving stability. We were also in the process of supporting product testing
[Figure: the mini-milestone process — objectives, learnings, and conversations]
Summary
A culture of continuous improvement at the enterprise level is important for any
large successful transformation. At HP, we never set off to implement a three-year
detailed plan to transform our business. We just started with our business objec-
tives and worked down a path of continuous improvement one month at a time.
In the end, we looked back and realized we had delivered dramatic business
results. It required setting enterprise-level objectives each iteration, tracking
progress, and making the right adjustments for the next iteration. You are never
going to know all the right answers in the beginning, so you are going to need
an effective enterprise-level process for learning and adjusting along the way. An
enterprise-level continuous improvement process is a key tool executives use for
leading transformations and learning from the organization during the journey.
Convincing large, traditional organizations to embrace Agile principles for planning is difficult because most executives are unfamiliar with the unique characteristics of software development or the advantages of Agile. As we have discussed, they expect to successfully manage software development using a Waterfall planning and delivery process,
just like they successfully manage other parts of their business. This approach,
though, does not utilize the unique advantages of software or address its
inherent difficulties. Software is infinitely flexible. It can be changed right up
to the time the product is introduced. Sometimes it can be changed even later
than that with things like software or firmware upgrades, websites, and software
as a service (SaaS).
Software does have its disadvantages, too. Accurately scheduling long-term
deliveries is difficult, and more than 50% of all software developed is either not
used or does not meet its business intent. If executives managing software do
not take these differences into account in their planning processes, they are
likely to make the classic mistake of creating detailed, inaccurate plans for
developing unused features. At the same time they are eliminating flexibility,
which is the biggest advantage of software, by locking in commitments to
these long-range plans.
[Figure: plan accuracy (approaching 100%) versus planning investment]
planning processes is that this push eats up valuable capacity the organization could be
using for identifying and delivering actual business value. Therefore, organizations
need to decide whether their primary objective is to deliver long-term accurate
plans to their executives or to deliver business value to their customers.
There are two other approaches traditional organizations tend to use when
addressing this dilemma of long-term accuracy in plans. The first, used widely in
most successful Waterfall organizations, is to put ample buffer into the schedule.
This is typically defined as the time between a milestone like “functionality
complete” and the software release. If the analysis shows the development is going
to take six months, then commit to the delivery in nine to twelve months. The
other is to commit to the schedule and when it is not on track, just add resources
and work the Development teams night and day on a death march until the
program ships.
In either case you are not changing the curve in the graph. Your schedule is still
inaccurate. You are just trying to fight that reality with different techniques. Getting
Process Intent
Now that you have a better understanding of the unique characteristics of
software, the key to designing a good software planning process in an enterprise is
being very clear about the Process Intent. The planning process is used primarily
to support different business decisions. So you should be clear about the decisions
required and invest the least amount possible to obtain enough information to
support these decisions.
While the curve of the graph shows it is hard to get a high level of accuracy
for long-range plans, that does not mean that businesses can live without firm
long-range commitments for software. The challenge is to determine what requires
a long-term commitment and ensure these commitments don’t take up the majority
of your capacity. Ideally, if these long-term commitments require less than 50% of
your development capacity, a small planning investment can provide the information
required to make a commitment.
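The 50% guideline above can be expressed as a simple check. This is an illustrative sketch with invented helper names and numbers, not a calculation from the book:

```python
def commitment_load(committed_weeks, total_capacity_weeks):
    """Return the fraction of development capacity consumed by firm
    long-range commitments."""
    return committed_weeks / total_capacity_weeks

def can_commit(committed_weeks, total_capacity_weeks, threshold=0.5):
    """Flag whether firm commitments stay under the threshold that leaves
    room for discovery during development and for new opportunities."""
    return commitment_load(committed_weeks, total_capacity_weeks) < threshold

# A hypothetical team with 400 engineer-weeks per release cycle:
print(can_commit(150, 400))  # 37.5% committed -> True, safe to commit
print(can_commit(380, 400))  # 95% committed -> False, overcommitted
```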
On the other hand, if these firm long-range commitments require 90–110%
of the capacity, this creates two significant problems. First, when the inevitable
discovery does occur during development, you don’t have any built-in capacity to
handle the adjustments. Second, you don’t have any capacity left to respond to new
[Figure 8: per-component capacity estimates for the iteration — Component 1 (25–30), Component 2 (20–25), Component 3 (30–40), Component 4 (30–40), Component 5 (20–30), Component 6 (20–30), Component 7 (20–30), Component 8 (15–25), Component 9 (20–30), Component 10 (40–50), Component 11 (20–30), plus other items]

[Figure 9: initiatives A–M ranked 1–13, with high-level estimates spread across the components and row totals (e.g., Initiative A: 30; Initiative J: 150; grand total: 401), showing where each component's time maxes out]

[Figure: queue and delivery-cycle trend across mini-milestones MM081–MM094 (average 152)]
DELIVERY CYCLE
the long-range commitments, the only requirements details created were the
unique characteristics of the new printer. Then, in the initiative phase, the new
features were only defined in enough detail to support the high-level estimates
described in figure 9. Once these initiatives were getting closer to development,
the system engineers would break the initiatives into more detailed user stories
so that everyone better understood what was expected. Then right before devel-
opment started, these user stories were reviewed in feature kickoff meetings
with all the engineers involved in the development along with the Marketing
person and system engineers. At this time the engineers had a chance to ask any
clarifying questions or recommend potentially better approaches to the design.
Then after everyone was aligned, the Development engineers would break down
the high-level user stories into more detailed developer requirements, including
their schedule estimates.
This just-in-time approach to managing our requirements provided a
couple of key advantages. First, the investment in breaking the requirements
down into more detail was delayed until we knew they would get prioritized
for development. Second, since there were not large piles of requirements
inventory in the system when the understanding of the market changed, there
were not a lot of requirements that needed to be reworked.
Summary
Creating an enterprise planning process that leverages the principles of Agile starts
with embracing the characteristics that are unique to software. It also requires
planning for different time horizons and designing a planning process that takes
advantage of the flexibility software provides. These changes in the planning
processes can also help to eliminate waste in the requirements process by reducing
the amount of inventory that isn’t ever prioritized or requires rework as the under-
standing of the market changes. The HP LaserJet example shows how embracing
the principles of Agile can provide significant business advantages. This example is
not a prescription for how to do it, but it does highlight some important concepts
every organization should consider.
First, the planning process should be broken down into different planning
horizons to support the business decisions required for different time frames.
Second, if the items that must have long-term commitments require more than
50% of your capacity, you should look for architectural and process improve-
ments so that those commitments are not a major capacity driver. Third, since
the biggest inherent advantage of software is its flexibility, you should not
eliminate that benefit by allowing your planning process to lock in all of your
capacity to long-range commitments—especially since these long-range features
are the ones most likely to end up not meeting the business objectives. Lastly
you should consider moving to just-in-time creation of requirements detail to
minimize the risk of rework and investment in requirements that will never get
prioritized for development.
The specifics of how the planning process is designed need to be tuned to
meet the needs of the business. It is the executives’ role in the organization to
appreciate how software development is different from the rest of their business
Summary
Applying DevOps principles in the enterprise has huge advantages but can take
several months to years, depending on the size of the organization. Therefore, you
need a clear set of business objectives to help define priorities and track progress
to ensure you implement the most valuable changes first.
Because it is not a physical device that has to be manufactured, software can
deliver new capabilities with very low manufacturing and distribution costs.
Most large, traditional organizations, though, have trouble taking advantage of
software’s flexibility because their development processes do not allow them to
economically release small batches of new capabilities. Applying DevOps prin-
ciples at scale is all about evolving the development process to make it easy to
support frequent releases of new capabilities. In the next few chapters, we will
show how to use these objectives to guide your implementation of applying
DevOps principles at scale.
Developing on Trunk
Getting a team to apply DevOps principles at scale is a challenge. In traditional
organizations when you describe the vision and direction of large-scale CD on
trunk to the engineers, they immediately will tell you why it won’t work and
how it will break when bringing in large changes. Most large, traditional orga-
nizations have been working so far away from CD for so long they can’t imagine
how it could possibly work. Therefore, the executive team is going to have to
be very active in leading and driving this change. This happens by sharing the
strategy/vision and showing how other traditional organizations have success-
fully transformed their development processes.
Executives need to understand that driving trunk to a stable state on a day-to-
day basis in the enterprise is going to be a big change management challenge but
this is probably the most important thing they can do to help coordinate the work
across teams and improve the effectiveness of the organization. Once engineers
have worked in an environment like this they can’t imagine having worked any
other way. Before they have experienced it, though, they can’t imagine how it
could ever work. Executives have to help lead this cultural transformation by
starting with an achievable goal that includes a small set of automated tests, then
increasing the minimal level of stability allowed in the system over time.
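One way to operationalize a goal that ratchets up over time is a CI gate whose minimum pass rate rises on a schedule. This is a sketch of the idea with invented names and thresholds, not HP's actual tooling:

```python
def stability_gate(passed, total, min_pass_rate):
    """Return True if the test pass rate meets the current minimum bar."""
    return total > 0 and passed / total >= min_pass_rate

# Start with an achievable bar and raise it as the team stabilizes trunk.
schedule = {"Q1": 0.80, "Q2": 0.90, "Q3": 0.95, "Q4": 0.98}

results = {"passed": 930, "total": 1000}  # 93% passing today
for quarter, bar in schedule.items():
    ok = stability_gate(results["passed"], results["total"], bar)
    print(f"{quarter}: bar={bar:.0%} -> {'green' if ok else 'red'}")
```

A 93% pass rate clears the early bars but fails the later ones, which is the point: the team must keep improving stability to stay green.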
From a technical perspective the team will have to learn development practices
like versioning services, rearchitecture through abstraction, feature flags, and evo-
lutionary database design techniques.
In versioning services, you don’t modify a service if it is going to break the
existing code. Instead you create a new version of that service with the new capa-
bility. The application code is then written to ensure it calls the version of the
service it is expecting. Then over time as all the application code is updated to the
new service the old version of the service can then be deprecated. This approach
enables always-releasable code and ensures the application layer and services layers
of the code can move independently.
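A minimal sketch of the versioning idea described above, with hypothetical service and route names (real systems typically do this with an API gateway or a URL scheme such as `/v1/...`, `/v2/...`):

```python
# Each service version stays available until every caller has migrated.
def get_order_v1(order_id):
    # Original contract: returns a flat total.
    return {"id": order_id, "total": 42.0}

def get_order_v2(order_id):
    # New contract: itemized breakdown. v1 callers are unaffected.
    return {"id": order_id, "items": [{"sku": "A1", "price": 42.0}]}

ROUTES = {
    ("orders", 1): get_order_v1,
    ("orders", 2): get_order_v2,
}

def call(service, version, *args):
    """Dispatch to the version of the service the caller expects."""
    return ROUTES[(service, version)](*args)

print(call("orders", 1, 7))   # old application code keeps working
print(call("orders", 2, 7))   # new code opts in to the new contract
# Once no caller requests version 1, remove it from ROUTES (deprecate it).
```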
Rearchitecture through abstraction is a technique that allows you to
refactor major parts of the code without breaking existing functionality. In
this case, you find an interface in the code where you can start the refactoring.
This interface should have a nice set of automated tests so you can ensure the
new and old code will behave the same way with the rest of the system and
you can test both the old and new code paths through this interface to make
sure they are working. The old interface is used to keep the code base working
until the new refactored code is ready. Then once the refactoring is complete,
the old code is deprecated and the new code takes over running with the
broader system.
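This technique is often called "branch by abstraction." A toy sketch with invented class names: the abstraction seam lets the old and new implementations run behind the same interface until the automated tests show they behave identically.

```python
class LegacyRenderer:
    def render(self, doc):
        return f"legacy:{doc}"

class NewRenderer:
    def render(self, doc):
        # Must match legacy behavior exactly before cutover.
        return f"legacy:{doc}"

class RendererSeam:
    """Abstraction layer; callers depend on this interface,
    not on either implementation."""
    def __init__(self, impl):
        self._impl = impl
    def render(self, doc):
        return self._impl.render(doc)

# The same automated tests exercise both code paths through the seam:
for impl in (LegacyRenderer(), NewRenderer()):
    assert RendererSeam(impl).render("page1") == "legacy:page1"
# When the new path passes everywhere, delete LegacyRenderer.
print("both implementations agree")
```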
Feature flags are another technique that enables developers to write new code
directly on trunk but not expose it to the rest of the system by using feature flags
to turn it off until it is ready.
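A feature flag can be as simple as a guarded dispatch. In this sketch (the flag name and functions are invented), unfinished code lives on trunk but stays dark until the flag is flipped:

```python
# Flags default to off, so unfinished code can ship on trunk safely.
FLAGS = {"new_checkout": False}

def checkout(cart):
    if FLAGS.get("new_checkout", False):
        return new_checkout(cart)   # in-progress code, hidden in production
    return old_checkout(cart)

def old_checkout(cart):
    return sum(cart)

def new_checkout(cart):
    return round(sum(cart) * 1.0, 2)  # not finished; safe behind the flag

print(checkout([10, 20]))        # old path runs in production
FLAGS["new_checkout"] = True     # flip on in a test environment
print(checkout([10, 20]))        # new path exercised without a branch
```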
Finally, evolutionary database is a technique like versioned services that enables
you to make database schema changes without breaking existing functionality.
Similar to versioning services, instead of modifying the existing data, you add
new versions with the schema changes and then deprecate the old versions when
all the applications are ready. These are not really technically challenging changes,
but they are different ways of working that will require coordination and process
changes across the development groups.
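The additive, expand-then-contract pattern this describes can be sketched with SQLite; the table and column names are illustrative only:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('ada')")

# Expand: add the new column alongside the old one; nothing breaks.
db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
db.execute("UPDATE users SET full_name = name")  # backfill existing rows

# Old application code still reads `name`; new code reads `full_name`.
old = db.execute("SELECT name FROM users").fetchone()[0]
new = db.execute("SELECT full_name FROM users").fetchone()[0]
print(old, new)
# Contract: drop `name` only after every application uses `full_name`.
```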
Setting these rules and driving the changes in behavior required a fair amount
of focus and energy by the leadership team. We found that it was taking a lot of
our change management capacity to keep the builds green. Therefore, over time we
learned how to have our tools automatically enforce the behavior we wanted. We
did this by creating gated commits or auto-revert.
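A gated commit reduces to "only merge if the candidate build is green." This toy sketch, with invented function names standing in for the real build and test tooling, shows the control flow:

```python
def run_tests(code):
    """Stand-in for the automated test suite; returns True on green."""
    return "bug" not in code

def gated_commit(trunk, candidate):
    """Merge the candidate onto trunk only if the combined build is green;
    otherwise auto-revert by leaving trunk untouched."""
    merged = trunk + [candidate]
    if run_tests(" ".join(merged)):
        return merged, "accepted"
    return trunk, "rejected"  # trunk stays green; author fixes offline

trunk = ["feature_a"]
trunk, status = gated_commit(trunk, "feature_b")
print(status, trunk)            # accepted; trunk advances
trunk, status = gated_commit(trunk, "bugfix_with_bug")
print(status, trunk)            # rejected; trunk unchanged and still green
```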
Gated commits enabled us to constantly have green builds and avoid train
wrecks of broken builds without using the management team’s change manage-
ment capacity. If a code commit or group of commits did not result in a green
build, instead of having everyone swarm to fix the issue on the trunk, we gated
Shifting Mindsets
When we set out the vision of one main branch for all current and future products
using continuous integration at HP, most of the engineers thought we had lost
our minds. They would avoid making eye contact when we walked down the hall
Summary
Transforming software development processes in a large organization is a big
change management challenge. While technology can help, if the executives are
not willing to lead and drive cultural changes like developing on trunk, then no
amount of technology is going to provide the necessary breakthroughs. Developers
need to know they can develop stable code on trunk and should take responsi-
bility for ensuring the code base is always close to release quality by keeping the
Architecture
Executives need to understand the characteristics of their current architecture
before starting to apply DevOps principles at scale. Having software based off
of a clean, well-defined architecture provides a lot of advantages. Almost all of
the organizations presenting leading-edge delivery capabilities at conferences have
architectures that enable them to quickly develop, test, and deploy components
of large systems independently. These smaller components with clean interfaces
enable them to run automated unit or subsystem tests against any changes and to
independently deploy changes for different components. In situations like this,
applying DevOps principles simply involves enabling better collaboration at the
team level.
On the other hand, large, traditional organizations frequently have tightly
coupled legacy applications that can’t be developed and deployed independently.
Ideally traditional organizations would clean up the architecture first, so that they
could have the same benefits of working with smaller, faster-moving independent
teams. The reality is that most organizations can’t hold off process improvements
waiting for these architectural changes. Therefore, executives are going to have to
find a pragmatic balance between improving the development processes in a large,
complex system and fixing the architecture so the systems are less complex over
time. We encourage you to clean up the architecture when and where you can, and
we also appreciate that this is not very realistic in the short term for most tradi-
tional organizations. As a result, we will focus on how to apply DevOps principles
at scale assuming you still have a tightly coupled legacy architecture. In these
situations where you are coordinating the work across hundreds to thousands
of people, the collaboration across Development and Operations requires much
more structured approaches like Continuous Delivery.
Embedded software and firmware has the unique architectural challenge of
leveraging common stable code across the range of products it needs to support.
If the product differences are allowed to propagate throughout the code base,
the Development team will be overwhelmed porting the code from one product
to another. In these cases it is going to be important to either minimize the
product-to-product hardware differences and/or isolate the code differences
to smaller components that support the product variation. The architectural
challenge is to isolate the product variation so as much of the code as possible can
be leveraged unchanged across the product line.
Test Automation
A large amount of test automation is necessary when changing the development
processes for large, traditional organizations. Without a solid foundation here,
your feedback loops are going to be broken and there won’t be an effective method
for determining when to promote code forward in your pipeline. Writing good
test automation is even more difficult than writing good code because it requires
strong coding skills plus a devious mind to think about how to break the code. It is
frequently done poorly because organizations don’t give it the time and attention
that it requires. Because we know it is important we always try to focus a lot of
attention on test automation. Still, in almost every instance we look back on, we
wish we had invested more because it is so critical.
Test Environment
Running a large number of automated tests on an ongoing basis is going to
require creating environments where it is economically feasible to run all these
tests. These test environments also need to be as much like production as possible
so you quickly find any issues that would impact delivery to the customer.
For websites, software as a service, or packaged software this is fairly straightfor-
ward with racks of servers. For embedded software or firmware this is a different
story. There the biggest challenge is running a large number of automated tests
cost-effectively in an environment that is as close as possible to the real opera-
tional environment.
Since the code is being developed in unison with the product it is typically
cost prohibitive, if not impossible, to create a large production-like test farm.
Therefore, the challenge is to create simulators and emulators that can be used
for real-time feedback and a deployment pipeline that builds up to testing on the
product. A simulator is code that can be run on a blade server or virtual machine
that can mimic how the product interacts with the code being developed.
The advantage here is that you can set up a server farm that can quickly run
thousands of hours of testing a day in a cost-effective way. The disadvantage is that
it is not fully representative of your product, so you are likely to continue finding
different defects as you move to the next stages of your testing. The objective
here is to speed up and increase the test coverage and provide feedback to enable
Summary
Transforming the software development practices of large, traditional organiza-
tions is a very big task, which has the potential for dramatic improvements in
productivity and efficiency. This transformation requires a lot of changes to how
the software is developed and deployed. Before going too far down this path,
it is important to make sure there is a solid foundation in place to support the
changes. Executives need to understand the basic challenges of their current archi-
tecture and work to improve it over time. The build process needs to support
managing different artifacts in the system as independent entities. Additionally, a
solid, maintainable test automation framework needs to be in place so developers
can trust the ability to quickly localize defects in their code when it fails. Until
these fundamentals are in place, you will have limited success effectively trans-
forming your processes.
The deployment pipeline defines how new code is integrated into the system,
deployed into different environments, and promoted through various stages of
testing. This can start with static code analysis and unit testing at the component
level. If these tests pass, the code can progress to more integrated application-level
testing, starting with basic build acceptance tests. Once these build acceptance
tests are passing, the code can progress to the next stage, typically full regression
and various forms of non-functional testing like performance and security testing.
If full regression and non-functional testing results reach an acceptable pass rate,
the code is ready to deploy into production. The deployment pipeline in CD
defines the stages and progression model for your software changes.
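The stage progression described above can be sketched as data plus a promotion loop. This is an illustrative model, not a real CD tool; the stage names and pass-rate thresholds are assumptions chosen to mirror the text.

```python
# A hedged sketch of a deployment pipeline: ordered stages, where a build is
# promoted only when the current stage's gate passes. Stage names and
# thresholds are hypothetical examples.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    gate: Callable[[dict], bool]  # returns True if the build may be promoted

def promote(build: dict, stages: List[Stage]) -> str:
    """Run the build through each stage in order; stop at the first failed gate."""
    for stage in stages:
        if not stage.gate(build):
            return f"stopped at {stage.name}"
    return "ready for production"

pipeline = [
    Stage("static analysis + unit tests", lambda b: b["unit_pass_rate"] == 1.0),
    Stage("build acceptance tests", lambda b: b["bat_pass_rate"] >= 0.98),
    Stage("full regression + non-functional", lambda b: b["regression_pass_rate"] >= 0.95),
]

# A build that passes early gates but fails full regression stops there:
result = promote({"unit_pass_rate": 1.0, "bat_pass_rate": 0.99,
                  "regression_pass_rate": 0.90}, pipeline)
assert result == "stopped at full regression + non-functional"
```

The value of the model is that the stages and their gates are explicit data, which is what makes the progression repeatable rather than a matter of judgment per release.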
[Figure: the orchestrator triggering scripted environment and deployment (EDD) steps and automated testing]
Once you have thought through what tools to use for each process, it is important
to think through how to architect scripted environments and scripted deploy-
ments. The objective is to leverage, as much as is possible, a well-architected
set of common scripts across different stages in the deployment pipeline
from Development to Production. These scripts need to be treated just like
application code that is being developed and promoted through the deployment
pipeline. Ideally you would want the exact same script used in each stage of the
deployment pipeline.
While ideal, using the same script is frequently not possible because the
environments tend to change as you progress up the deployment pipeline on
the way to the full production environment. Therefore, your architectural
approach needs to address how to handle the differences across environ-
ments while creating one common script that can be qualified and promoted
through the deployment pipeline. This typically is accomplished by having
one common script where variables are passed in to define the differences
between the environments.
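A toy illustration of this pattern, with invented environment names and hosts: the deployment function is identical in every stage, and only the environmental descriptor passed to it changes.

```python
# Hypothetical sketch of "one common script, per-environment descriptors":
# the deployment logic is the same everywhere; only the descriptor (hosts,
# replica counts) differs. Environment names and hosts are invented.

ENVIRONMENTS = {
    "dev": {"hosts": ["dev-app-01"], "replicas": 1},
    "qa": {"hosts": ["qa-app-01", "qa-app-02"], "replicas": 2},
    "production": {"hosts": ["prod-app-%02d" % i for i in range(1, 11)],
                   "replicas": 10},
}

def deploy(artifact: str, env_name: str) -> list:
    """The same deployment steps run in every stage; only the descriptor varies."""
    env = ENVIRONMENTS[env_name]
    actions = []
    for host in env["hosts"]:
        actions.append(f"copy {artifact} to {host}")
        actions.append(f"restart service on {host}")
    return actions

# The identical function is qualified in dev/qa and promoted to production:
assert len(deploy("app-1.2.3.tar.gz", "qa")) == 4
```

Because the function itself never changes between stages, qualifying it in the test environments genuinely qualifies what will run in production.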
[Figure: environmental descriptors feeding one common script]
Post-Deployment Validation
The other key principle to include as part of the scripted deployment process
is creating post-deployment validation tests to ensure the deployment was
successful on a server-by-server basis. The traditional approach is to create envi-
ronments, deploy the code, and then run system tests to ensure it is working
correctly. The problem with this approach is that once a system test fails, it leads
to a long and painful triage process. The triage process needs to determine if the
test is failing due to bad code, a failed deployment, or other environmental dif-
ferences. The scripted environment process ensures environmental consistency.
This still leaves differentiating between code and deployment issues, which can
be challenging. The scripted deployment process ensures the same automated
approach is used for every server; however, it does not naturally ensure the
deployments were successful for every server or that the various servers can
actually interface with one another as expected. This becomes especially prob-
lematic when the deployment only fails for one of several common servers,
because this leads to the system tests failing intermittently, depending on whether that
run of the tests happens to hit the server where the deployment failed. When
this happens, it results in a very long and frustrating triage process to find the
bad server and/or routing configurations amongst tens to hundreds of servers.
The good news is that there is a better approach. There are techniques to
isolate code issues from deployment issues during the deployment process. You
can create post-deployment validations to ensure the deployment was
successful and to localize deployment issues down to the smallest set
of potentially offending servers or routing devices as quickly as possible. Each
step in the process is validated before moving on to the next step. Once the servers
and routers are configured, there should be tests to ensure that the configuration has
occurred correctly. If not, fix the issues before deploying the code.
The next step is to validate that the deployment was successful for every server
individually. This can be done by automatically checking log files for warnings or
errors and writing specific tests to ensure the deployment was successful. These
checks should be done for every server and routing configuration before starting
to roll it out broadly.

[Figure: the deployment flow: (1) configure servers/routing devices and validate, (2) deploy code, improved so that step 2 becomes deploy code and validate successful deployment]

This will work well if that component can be developed,
qualified, and deployed independently. If instead you are working with a tightly
coupled architecture, this approach is not going to work. It will take a while to
implement all those steps for one application, and since it still can’t be deployed
independently, it won’t provide any benefits to the business. When this happens,
your transformation is at risk of losing momentum because everyone knows you
are investing in CD but they are not seeing or feeling any improvements in their
day-to-day work.
This is why it is important to use the business objectives from chapter 6 to
prioritize the implementation. Providing feedback to developers in an operations-like
environment is a key first step that starts helping the organization right away. It
forces the different teams to resolve the integration and production issues on a
daily basis when they are cheaper and easier to fix. It also encourages the cultural
change of having the organization prioritize coordinating the work across teams
to deliver value to the customer instead of local optimizations in dedicated
environments. It gets Development and Operations teams focused on the common
objective of proving the code will work in a production-like environment, which
Summary
Applying DevOps principles at scale for enterprise solutions delivered across
servers is going to require implementing CD. This is a big effort for most large
organizations. Because it requires cultural changes, you want to make sure this
transition is as efficient as possible to avoid resistance to the changes that happen
when it takes too long. Architecting the approach to efficiently leverage common
code and knowing where to start is essential. The pipeline should be designed with
Providing quick feedback to developers and maintaining a more stable code base
are essential to improving the productivity of software development in large,
traditional organizations. Developers want to do a good job, and they assume they
have until they get feedback to the contrary. If this feedback is delayed by weeks
or months, then it can be seen as beating up developers for defects they don’t even
remember creating. If feedback comes within a few hours of the developer commit
and the tools and tests can accurately identify which commits introduced the
problem, the feedback gets to engineers while they are still thinking about and
working on that part of the code.
This type of feedback actually helps the developers become better coders
instead of just beating them up for creating defects. Even better, if it is very good
feedback with an automated test that they can replicate on their desktop, then
they can quickly identify the problem and verify the fix before recommitting the
code. For large, traditional organizations with tightly coupled architectures, it is
going to take time and effort to design the test stages of the deployment pipeline
to appropriately build up a stable system.
Testing Layers
The first step in any testing process is unit testing and static code analysis to catch
as many defects as early as possible. In fact, we can't think of any good reason
why a developer would ever be allowed to check in code with broken unit tests.
The advantage of unit tests is that if properly designed, they run very fast and
can quickly localize any problems. The challenges with unit tests for lots of large,
traditional organizations are twofold: First, due to their tightly coupled architec-
tures, traditional organizations are not able to effectively find most of the defects
with unit tests. Second, going back through lots of legacy code to add unit tests is
seldom realistic until that part of the code is being updated.
The second step in the testing process to qualify code before final integration is
component- or service-layer testing at clean interfaces. This should be used where
possible because these tests run faster, and the interfaces tend to be more stable
than the user interfaces exercised by system testing.
The final step that can be very effective for finding the important integration
issues in large, traditional organizations is user interface-based system testing. The
challenge with these tests is that they typically run slowly and can require running
the entire enterprise software system. Therefore you have to think carefully about
when and where they are used.
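A toy example of the three layers, using an invented pricing service: the unit test pins down one function, the service-layer test exercises a clean interface without the UI, and only a thin set of slow UI-driven system tests (not shown) would cover the same flow end to end.

```python
# Sketch of the testing layers described above, using a toy pricing service.
# All names are invented for illustration.

def apply_discount(price: float, pct: float) -> float:
    """Unit under test: pure logic, instantly testable in isolation."""
    return round(price * (1 - pct / 100), 2)

class PricingService:
    """Service layer: a clean interface, more stable than the UI above it."""
    def quote(self, price: float, customer_tier: str) -> float:
        pct = {"gold": 10, "silver": 5}.get(customer_tier, 0)
        return apply_discount(price, pct)

# Layer 1: unit test. Fast, and it localizes a failure to one function.
assert apply_discount(100.0, 10) == 90.0

# Layer 2: service-layer test. Still fast; no UI or full system required.
assert PricingService().quote(100.0, "gold") == 90.0

# Layer 3 (not shown): a handful of UI-driven system tests would cover the
# same flow end to end, run far less often because they are slow and need
# the whole enterprise system.
```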
[Figure: service virtualization, with waterfall components (Comp 1, Comp 4, IT 1) repeated across release cycles and virtualized for testing]
take a subset of the automated tests that will fully test the interface and have them
run against the full enterprise system with the virtual service removed. Running
a subset of tests that fully exercise this interface in the entire system, ideally on a
daily basis, has the advantage of ensuring there are no disconnects between the
virtual service and the actual code. The other advantage of this approach is that
the majority of the full regression testing can be run on smaller, less expensive
environments that only represent the Agile components. The more expensive
and complex enterprise environment is only used for the subset of tests that
are required to validate the interface. For example, there may be hundreds of
automated tests required to validate the Agile components that have the same
use case against the interface. In this case, run the hundreds against the virtual
service and then pick one of them to be part of the enterprise system test.
This saves money and reduces the complexity of managing end-to-end testing.
This is a simple example, but it shows how you can use build frequency and
service virtualization in combination to help break down and build up complex
software systems to localize feedback while moving the organization to a more
stable system.
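Here is a minimal sketch of that pattern with an invented inventory service: the bulk of tests run against a cheap virtual (stub) service, and a small representative subset also runs against the real service so any drift between stub and reality is caught quickly.

```python
# Minimal sketch of the service-virtualization pattern: most regression tests
# run against a cheap stub, and one representative test per interface use case
# re-runs against the real service. All names here are illustrative.

class RealInventoryService:
    def stock_level(self, sku: str) -> int:
        # In reality this would call the enterprise system.
        return {"A100": 7}.get(sku, 0)

class VirtualInventoryService:
    """Hand-maintained stub: fast and cheap, but it can drift from reality."""
    def stock_level(self, sku: str) -> int:
        return {"A100": 7}.get(sku, 0)

def can_ship(service, sku: str, qty: int) -> bool:
    return service.stock_level(sku) >= qty

# Hundreds of tests like these run daily against the stub on small, cheap
# environments:
assert can_ship(VirtualInventoryService(), "A100", 5)
assert not can_ship(VirtualInventoryService(), "B200", 1)

# One representative test per interface use case also runs against the real
# service in the enterprise environment, so stub drift surfaces quickly:
assert can_ship(RealInventoryService(), "A100", 5)
```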
[Figure: components 1 through 9 flowing through the deployment pipeline, with component gating at the SCM, subsystem gating at the SCM, application gating, and regression, security, and performance gating into production]
This has the advantage of keeping the enterprise system stable while allowing
components to quickly get code onto trunk. The disadvantage occurs when there
is code that makes it through subsystem testing but breaks the enterprise-level
system testing. The broken artifact creates a train wreck for this application by
blocking new versions of the application from integrating into the enterprise
system. The developers, however, can still keep developing and committing code
to the SCM as long as they pass the application build acceptance tests. In this
case, you need someone working to ensure the application is fixed and then to let
the build process know this new version of the application is ready for another
enterprise-level integration, as depicted in figure 18. This is a manual process
and, therefore, you are at risk of nobody taking ownership of fixing the issue and
having stale applications in the enterprise integration.
[Figure 18: a stale application waiting for the next integration train: “I think we fixed it with this next train. Try again.”]
To avoid this, you are going to need a good process for providing visibility
of the artifact aging to everyone in the organization. Figure 17 depicts using
your deployment pipeline to build up stable components into a stable enterprise
system. This supports the basic Agile principle of keeping the code base stable so
it is economical to do more frequent smaller releases.
FIGURE 19: LOOSELY COUPLED ARCHITECTURE
DEPLOYMENT PIPELINE
[Figure: components 1 through 7 each deploying independently into production]
Summary
Building up a stable enterprise system is a key component of the DevOps or Agile
principle of enabling more frequent and smaller releases. Even if your business
does not require or support more frequent releases, it helps with developer
productivity and takes uncertainty out of the endgame. That said, it is not a natural
outcome of most enterprise Agile implementations that start with a focus on the
team. Therefore, it is going to require focus from the leadership team to design
and implement these improvements.
Once the fundamentals are in place and you have started applying DevOps
principles at scale, you are ready to start improving stability. The basics of
starting this change management process were covered in chapter 7.
This chapter covers how to increase stability of the enterprise system using
build acceptance tests and the deployment pipeline. This next step is important
because the closer the main code base gets to production-level quality, the more
economical it is to do more frequent, smaller releases and get feedback from
customers. Additionally, having a development process that integrates stable
code across the enterprise is one of the most effective ways of aligning the
work across the teams, which, as we have discussed before, is the first-order
effect for improving productivity.
Depending on the business, your customers may not require or even allow
overly frequent releases, but keeping the code base more stable will help find
integration issues early before there are large investments in code that won’t work
well together. It will also make your Development teams much more productive
because they will be making changes to a stable and functioning code base.
Even if you are not going to release more frequently than you currently do, you will
find keeping the code base more stable will help improve productivity. If your organi-
zation does want to move to continuous deployment, trunk (not branches) should be
driven to production levels of stability, such that any new development can be easily
released into production. Regardless of your need for CD, the process for improving
stability on the trunk is going to be a big transformation that requires managing the
automated tests and the deployment pipeline. In this chapter we will show how to
improve stability over time.
[Figure: daily stabilization trend across days 1 through 13, plotting counts on the left axis (0 to 700) against a second measure on the right axis (0 to 160), with per-day percentages ranging from 72 to 97]
minute that stability will tend to take a hit. If instead the release branch requires
all features to be signed off on, with no open defects and test automation at an
acceptable level of coverage and passing rates, then there will be a more stable
trunk and, therefore, less work in the end game.
If the organization has historically focused on features, it will take some time
to adjust to this new way. Your project-management group will have to work to
steadily increase these criteria over time. They must be willing to kick projects out
of the release if they are not ready. A good example of the value of this is shown
in figure 22, where kicking a small portion of stories out of the release on the
branching day resulted in a big drop in defects and an increase in test passing on
the day after. This last step is usually hard for traditional organizations because
the business may want the new capability really badly. The problem with this is
that when the management team wants a new capability really badly, it tends
to get it that way in terms of quality.
Allowing new features that are not ready into a release tends to delay the
release and impacts the productivity of all the other projects in the release. The
key is helping the organization understand those tradeoffs and see the value of
[Figure: test pass rate (%) for components 2, 3, and 4 across days 1 through 14]
[Figure: the staged pipeline: L1 Sim runs 10 to 14 times a day with auto-revert of failing commits; small groupings of passing individual commits are promoted to L2 Sim (12x/day), L3 Emu (6x/day), and L4 Sim (1x/day); failures lead to manual intervention for merge conflicts, defect submission and resolution, fixes, and test reruns]
for stability over speed provided the best breakthroughs in productivity, which is
contrary to what some consultants would recommend.
We designed the next stages of testing, using the componentization approach, to
localize the offending code that broke each stage. The first stage's quality bar was a
fairly small subset of testing that defined the minimal level of acceptable
stability. L2 got into more extensive testing spread across a couple dozen
simulators and ran every two hours. The intent of this stage was to provide much better
code coverage on a slightly less frequent basis. Each L2 build included a few L1
builds. In this case we were looking for commits that broke a significant percent-
age of L2 tests that were not caught by L1. L3 ran every four hours on emulators
to catch interactions with the hardware. Then L4 was a full regression test that was
run every day on the simulators. In addition, while not shown on this graphic, we
ran basic acceptance testing every day on the final products under development.
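One way to see why the staged cadence pays off: when a stage fails, the suspect set is only the commits since that stage's last green run. The arithmetic below uses the cadences from the text (L2 every two hours, L3 every four, L4 daily); the commit volume is an assumption.

```python
# Illustrative arithmetic, not from the book: how staged cadences shrink the
# set of commits suspected when a test stage fails. Cadences follow the text;
# the commit volume (120/day) is an assumption for the example.

L2_RUNS_PER_DAY = 12  # every two hours
L3_RUNS_PER_DAY = 6   # every four hours
L4_RUNS_PER_DAY = 1   # full regression, once a day

def suspects(commits_per_day: int, stage_runs_per_day: int) -> float:
    """Average number of commits under suspicion when one stage run fails."""
    return commits_per_day / stage_runs_per_day

# With 120 commits a day, a failing daily full regression implicates all 120,
# while a failing two-hourly L2 run implicates only about 10 of them:
assert suspects(120, L4_RUNS_PER_DAY) == 120
assert suspects(120, L2_RUNS_PER_DAY) == 10
```

The more frequent the stage, the cheaper the triage, which is exactly why the fast, small L1 bar runs most often and the expensive full regression runs least often.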
This example shows that we worked to optimize the amount of testing and
defects we were able to find using simulators and emulators. Since we had over
15,000 hours of automated testing to run each day, it would have been impossible
to run them all on real printers. It is impractical to get that many printers and
you would have to print millions of pages a day on real paper. It would have cost
Summary
Applying DevOps principles at scale really requires the executive to drive the
cultural and technical changes for improving the stability of trunk. This is vitally
important because of the productivity gains that are possible when you eliminate
branches and integrate all work across the teams on trunk. To get your code trunk
to this level of quality and to keep it there, you need to begin using a CD pipeline
Executives need to make sure they don't let the magnitude of the overall change
keep them from starting on improving the processes. We understand that
taking on this big of a change in a large organization can be a bit overwhelming,
which might cause some executives to question leading the effort. They need to
understand that while leading the transformation is a big challenge, trying to
compete in the market with traditional software development processes is going
to be even more challenging as companies around them improve.
Instead of worrying about the size of the challenge or how long it is going to
take, executives just need to make sure they start the enterprise-level continuous
improvement process. There is really no bad software development process. There
is only how you are doing it today and better.
The key is starting the process of continually improving. The first step is
making sure you have a clear set of business objectives that you can use to
prioritize your improvements and show progress along the journey. Next is
forming the team that will lead the continuous improvement process. The
team should include the right leaders across Development, QA, Operations,
and the business that will need to help support the priorities and lead the
transformation. You need to help them understand the importance of this
transformation and get them to engage in the improvement process.
As much as possible, if you can get these key leaders to help define the plan so
they take ownership of its success, it is going to help avoid resistance from these
groups over time. Nobody is going to know all the right answers up front, so the
key is going to be moving through the learning and adjusting processes as a group.
[Figure: business objectives driving both applying DevOps principles at scale and enterprise-level continuous improvement]
Executives need to understand that other than engaging with the organization
in the continuous improvement process, their most important role is leading the
cultural shifts for the organization. They will need to be involved in helping
to prioritize the technical improvements, but they need to understand that these
investments will be a waste of time if they can’t get the organization to embrace
the following cultural shifts:
These are big changes that will take time, but without the executives driving these
cultural shifts the technical investments will be of limited value.
Summary
There are lots of ideas for improving software development processes for tradi-
tional enterprise organizations. There are also documented case studies that show
the dramatic impact these transformations can have on traditional organizations.
The biggest thing missing in the industry is engaged executives who are willing
to lead the journey and drive the cultural changes. It is a big challenge with the
opportunity for big improvements. Are you up to the challenge? Who do you need
to get to join you in leading the transformation? What are your plans for taking
the first step? What are your business objectives? What organizational barriers do
you think you will need to address? What do you think your organization can
commit to completing in the first iteration of your enterprise-level continuous
improvement process? What are you waiting for? Let’s get the journey started!
This is the most important book executives should be reading as soon as they have
developed the ability to release on a more frequent basis. It shows how to take
advantage of this new capability to address the 50% of features that are never used
or do not meet their intended business objectives.
This is a good, easy-to-read case study that will give the reader a good feel for the
transformation journey of one organization. It provides more details on the HP
experience referenced heavily in this book.
Toyota Kata: Managing People for Improvement, Adaptiveness, and Superior Results
Mike Rother
This book should be read by lead developers and lead testers to ensure you are
creating an automated testing framework that is maintainable and that quickly
localizes defects.
Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment
Automation
Jez Humble and David Farley
This is a must-read for all your database administrators and anyone telling you that
trunk can’t be always releasable due to database schema changes.