The Complete Guide to an
Enterprise DataOps Transformation
INTRODUCTION
EDUCATE
Why Do DataOps
Better than Shake ‘n Bake!
For Data Team Success, What You Do Is Less Important Than How You Do It
“Chicken & Rice Guys” Chicken and Rice
6 Steps to an Enterprise DataOps Transformation
Bungeoppang
The Business Case for DataOps
Red Lentil Curry / Dal
FIND
Launch Your DataOps Journey with the DataOps Maturity Model
Slovak Sunday Bone Broth Soup
Jump-Starting Your DataOps Journey
Peanut Butter Energy Bites
4 Easy Ways to Start DataOps Today
Chocolate Stout Cupcakes with Irish Whiskey Filling and Baileys Frosting
ESTABLISH
Finding an Executive Sponsor for Your DataOps Initiative
Gil’s Easy Chicken Cacciatore
Pitching a DataOps Project That Matters
Risotto alla Monzese
DEMONSTRATE
Prove Your Team’s Awesomeness with DataOps Process Analytics
Grandma’s Italian Meatballs
ITERATE
Eliminate Your Analytics Development Bottlenecks
Pav Bhaji
EXPAND
Do You Need a DataOps Dojo?
Spinach Madeline
DataOps Engineer Will Be the Sexiest Job in Analytics
Kerala Style Chicken Stew
Improving Teamwork in Data Analytics with DataOps
Spinach-Mushroom Quiche
Governance as Code
Slow Cooker Hangi Pork
CONCLUSION
Why Are There So Many -Ops Terms?
Mom’s Keto Chocolate Peanut Butter Fat Bombs
A Guide to Understanding DataOps Solutions
Gil’s Old Fashion Fudge
What a DataOps Platform Can Do For You
Chapssal Doughnuts
Business agility separates the leaders from the laggards. An agile business
monitors the environment, quickly detects change, forms and executes plans
and makes adjustments based on feedback. As data professionals, we see the
foundational role that data and analytics play throughout this process. If ana-
lytics are bureaucratic and error-prone, people will naturally seek workarounds,
resulting in diminished agility. Business agility depends upon analytics agility.
We are entering an era where analytics agility will be a key competitive differ-
entiator for enterprises. Organizations bogged down by data errors and sluggish
analytics team productivity will find themselves at a significant disadvantage. If
companies want to be more agile, they must start with the data analytics team.
Agile analytics can transform an enterprise from the inside.
When data and analytics are accurate, people learn to trust data. When a data
team responds to requests immediately and on-demand, business stakeholders
work more closely with the analytics team. When users and data professionals
work closely together, it unlocks creativity spurred by insights that drive orga-
nizations toward new products and services, innovative marketing strategies and
new markets.
DataOps is a data analytics methodology that serves as the vehicle for transforma-
tional change led by analytics. It emphasizes observability and meta-orchestration
to produce error-free analytics that can be created and updated at lightning speed.
DataOps is the secret sauce that can build market-leading analytics capabilities
that will raise a company’s business agility. We’ve written extensively about
DataOps over the past years. If you are new to the topic, please see our first book,
“The DataOps Cookbook” (over 12,000 downloads and counting) and the other
resources listed in the Appendix section.
Many people ask us how to begin their DataOps journey. We used to answer that
question by talking about the “Seven Steps to Implement DataOps.” Over time,
we understood that some people were asking a broader question about using
DataOps to lead an organization-wide transformation.
Imagine if a person could time travel back to the 1980s and try to evangelize
Agile development. That person would face a lot of naysayers. “We’ve never done
it that way.” “I don’t get how that benefits us.” “Your methods don’t align with
how we allocate resources for projects.” From our perspective, we know that the
Agile advocate is correct, but our intrepid time traveler would need a way to con-
vince skeptics.
We hope this book will help you evangelize and lead a DataOps transformation at
your organization. We’ve included all of the insight that we’ve gained from our
own experiences coaching data analytics professionals on the best way to lead
organizational change using DataOps. We hope that these materials will help you
on your DataOps journey.
Our book’s title (“Recipes for DataOps Success”) refers to the orchestrated
pipelines that drive DataOps. One of the DataKitchen Platform’s lesser-known
features is how it helps data teams share development and operations “Recipes,”
improving collaboration and promoting reuse throughout the organization. To
have some fun with this metaphor, we asked our coworkers at DataKitchen (our
data chefs) to share their favorite recipes with you. You’ll find these sprinkled
throughout the book. Enjoy and bon appetit!
If you are frustrated with your enterprise’s data analytics, you are not alone.
VentureBeat reported that 87% of data science projects never make it into
production. It’s no surprise then that, despite soaring investments in AI and data
science, the percentage of organizations that describe themselves as “data driven”
has fallen from 37% to 31% since 2017.
Data teams can learn a lot from the quality methods used in automotive and other
industrial manufacturing. Methodologies like Lean manufacturing and the Theory
of Constraints apply just as well to data operations and analytics development
as traditional factories. Analytics is a pipeline process. Data sources enter the
enterprise, are loaded into databases, undergo processing and transformation, and
then feed into charts, graphs and predictive analytics. From a process perspective,
data analytics closely resembles a manufacturing pipeline, and it can be managed
with the same quality methods.
The data science industry refers to these methods under the umbrella term
DataOps. Just to be clear, DataOps is not a single vendor. It is not a particular
tool. You do not have to throw away your existing infrastructure and start over.
DataOps augments your existing operations. It is a new approach to data science
which draws upon three widely-adopted methodologies that are supported
by tools and software automation: Agile Software Development, DevOps and
statistical process controls (SPC).
AGILE DEVELOPMENT
One axiom in the Theory of Constraints is that small lot sizes reduce inventory,
minimize waste and increase the overall system throughput of a manufacturing
plant. Agile development applies the same insight to analytics: delivering new
features in small increments shortens feedback loops and raises overall throughput.
DEVOPS
Imagine clicking a button in order
to fully test and publish new analytics
into the production pipeline. That’s how Amazon and others deploy software
releases in minutes or seconds. This approach to releasing software is called
DevOps.
DevOps also automates testing. An extensive battery of tests verifies and validates
new analytics before they are released to users.
When environment creation, test and deployment are placed under software
control, they can happen in seconds or minutes. This is how companies like
Amazon attain such rapid cycle time.
Agile development and DevOps work hand in hand. Agile enables enterprises to
quickly specify and commit to developing new features, while DevOps speeds
execution, test and release of those features. Neither of these methods would be
as effective without the other. Additionally, it’s impossible to move quickly when
a team is plagued by quality errors.
DataOps approaches data errors the same way that a manufacturing operation
controls supplier quality, work-in-progress and finished goods. DataOps borrows
a methodology, straight from lean manufacturing, called statistical process
control (SPC). Tests monitor data flowing through the pipeline and verify it to
be valid, complete and within statistical limits. Every stage of the data pipeline
monitors inputs, outputs and business logic. Input tests can catch process drift at
a data supplier or upstream processing stage. Output tests can catch incorrectly
processed data before it is passed downstream. Tests ensure the integrity of the
final output by verifying that work-in-progress (the results of intermediate steps
in the data pipeline) matches expectations.
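To make this concrete, here is a minimal sketch, in Python, of what per-stage tests might look like. The column names, thresholds, and the one-percent null limit are illustrative assumptions, not prescriptions.

```python
import pandas as pd

def check_stage_output(df: pd.DataFrame, min_rows: int, required_cols: list) -> list:
    """Return a list of test failures for one pipeline stage's output."""
    failures = []
    # Volume test: catches process drift at a supplier or upstream stage.
    if len(df) < min_rows:
        failures.append(f"row count {len(df)} below expected minimum {min_rows}")
    # Completeness tests: required columns present and mostly populated.
    for col in required_cols:
        if col not in df.columns:
            failures.append(f"missing column: {col}")
        elif df[col].isna().mean() > 0.01:  # illustrative 1% null threshold
            failures.append(f"column '{col}' exceeds 1% null values")
    # Business-logic test: a domain rule the data must satisfy.
    if "order_total" in df.columns and (df["order_total"] < 0).any():
        failures.append("negative order totals found")
    return failures

# In production, a non-empty result would trigger an automated alert
# (email, chat, or pager) rather than a print statement.
failures = check_stage_output(pd.DataFrame({"order_total": [10.0, -2.0]}),
                              min_rows=1, required_cols=["order_total"])
print(failures)  # ['negative order totals found']
```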
If an anomaly occurs at any point in the workflow or pipeline, the data team will
be the first to know, through an automated alert, and they can take action. Test
results can also be displayed in dashboards, making the state of the data pipeline
transparent to the entire team.
Better than Shake ‘n Bake! Easy, inexpensive & tastes better too! Common ingredients come together in this
copycat Shake ‘n Bake recipe that’s even better than the original.
INGREDIENTS
• 3 cups dried bread crumbs, ground very fine
• 1 Tbsp granulated onion (powder can work too)
• 1 tsp finely ground black pepper
• 1 tsp ground dry thyme
INSTRUCTIONS
1. Mix together all of the ingredients very well. I use a food processor or mixer to make sure everything
is very well blended.
2. Simply wet chicken pieces with water, drain well and drop them, one at a time into a plastic bag
containing some of the homemade shake ‘n bake. I usually start with a half cup of the coating in
the bag, which is equivalent to what is in an envelope if you bought it at the supermarket. You can
always add a little extra if you need it at the end, but I find this is the best way to maximize the use
you get out of a batch.
3. Shake the bag and press the coating onto the individual chicken pieces. Place the coated pieces on
a parchment paper-lined baking sheet. Don’t crowd the pieces, they will crisp much better if there is
space between them.
4. Bake for about 45-55 minutes depending upon the size of the chicken pieces being used. Boneless skinless
chicken breasts can be ready in as little as 25 minutes depending on size. I use my meat thermometer to
ensure that the internal temperature is 175-180 degrees F to ensure they are fully cooked.
In today’s on-demand economy, the ability to derive business value from data
is the secret sauce that will separate the winners from the losers. Data-driven
decision-making is now more critical than ever. Analytics could mean the
difference between finding the right mix of strategic moves or falling behind. In
fact, Forrester Research predicted that insight-driven companies would grow
seven to 10 times faster than the global GDP through 2021.
Most enterprise companies recognize the need to be data-driven, yet 60% of data
projects fail to move past preliminary stages, and 87% of science projects never
make it to production. More surprisingly the number of data-driven companies
has actually fallen from 37% to 31% since 2017, despite increased investment
in AI and data science.
Taken together, the need to manage complex toolchains and data, as well as
collaborate with other organizations, roles, locations, and data centers, saps the
data team’s time. In fact, most data teams spend more time fixing errors and
addressing operational issues than innovating and providing business value.
According to Gartner, only 22% of a data team’s time is spent on new initiatives
and innovation. As a result, many data teams are not meeting expectations, or
worse, are beaten down and disempowered.
Figure 1: According to Gartner, only 22% of a data team’s time is spent on new initiatives
and innovation.
This mindset shift was more recently highlighted by Elon Musk, who said “we
realized the true problem, the true difficulty, and where the greatest potential is
— is building the machine that builds the machine. In other words, it’s building
the factory. I’m really thinking of the factory like a product.” Successful data
organizations are also wise to think of their data pipelines like a factory where
quality and efficiency must be managed. But how can a data team shift its focus
from the next big tool, technology or data feature to the people and process?
All of this creates the time and space for the data team to focus on what they
signed up for in the first place — creating innovative analytics and delivering
business value.
INGREDIENTS
MARINADE AND CHICKEN
○ 1 tsp kosher salt
○ 1 tsp paprika (sweet, hot, or smoked)
SIDE SALAD
○ 1⁄2 small white or red onion, chopped (optional)
○ 1 small cucumber, chopped (optional)
INSTRUCTIONS
1. Marinate the chicken: combine marinade and pour into a bowl. Add the chicken, coat evenly and
leave in the fridge, covered, for 30 minutes
2. Cook the chicken: once the chicken is done marinating, heat a large deep skillet with a lid over
medium-high heat. Add the chicken pieces in a single layer and brown them, 5-8 minutes per side.
Check the internal temp with a meat thermometer: it should reach at least 165 F.
3. Boil the rice: Add 1 Tbsp olive oil to the pot, heat it, add spices and rice and toast together for 1 minute,
stirring frequently. Add the stock and salt and cook.
4. Make the sauces: while the rice cooks, make the yogurt and hot sauces
5. Make the salad: chop lettuce, tomatoes, and other (optional) veggies. Combine and season with salt
and pepper.
6. Chop the chicken: when the chicken is done cooking, remove it and chop into bite-sized pieces on a
cutting board. Return the pieces to the pan, coating them with the oil and spices in the pan.
As a career data professional, you may find it fairly straightforward to wrap your
mind around the tools that implement DataOps. However, leading a DataOps
initiative is about more than technologies and workflows. DataOps champions are
leading cultural change, which also involves overcoming skepticism.
EDUCATE
DataOps introduces new
methodologies, supported by tools
and automation, that shorten data
analytics cycle time, improve
collaboration, virtually eliminate
errors and provide unprecedented
transparency into data operations.
DataOps can support your current
toolchain or ease migration to new
tools and technologies.
FIND
A mini or pilot project can serve as a proof of concept for potential DataOps
benefits. Choose your first project in consultation with your team and, if possible,
an executive sponsor. Ideally, a first project demonstrates meaningful improvement
in a key performance parameter and delivers a quick win; a short schedule is
preferable to an extended development effort. That’s not to say that you have
to get it perfect in one shot. Iterated
improvements demonstrate how value builds by using Agile development. If you
can’t decide where to begin, our DataOps Maturity Model may be helpful.
The DataOps Maturity Model can help organizations understand their DataOps
strengths and weaknesses. Maturity models are commonly used to measure
an organization’s ability to improve in a particular discipline continuously.
DataKitchen’s DataOps Maturity Model outlines a measurement approach for
building, monitoring, and deploying data and analytics according to DataOps
principles. With this model, teams can understand where they are today and how
to move up the curve of DataOps excellence.
Eliminating Bottlenecks
Most data teams are interested in DataOps because they seek to accelerate the
creation and deployment of new data analytics (data, models, transformation,
visualizations) without introducing errors. Reducing project cycle time or
eliminating errors are both excellent starting points. Errors are a major source of
unplanned work, which is a bottleneck that limits the throughput of the overall
system. To minimize errors, start tracking errors and form a quality circle to explore
root causes. Add tests to your data operations pipelines and continuous deployment
pipelines so that your data team can address errors before they affect users.
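A continuous deployment test can be as small as the sketch below, written for pytest; the transformation module and function it imports are hypothetical stand-ins for whatever code your team promotes to production.

```python
import pandas as pd
# 'build_revenue_summary' is a hypothetical transformation under test.
from my_transforms import build_revenue_summary

def test_revenue_summary_reconciles():
    # A small known-good fixture stands in for production inputs.
    orders = pd.DataFrame({
        "region": ["east", "east", "west"],
        "revenue": [100.0, 50.0, 75.0],
    })
    summary = build_revenue_summary(orders)
    # Reconciliation: aggregated revenue must equal the input total.
    assert summary["revenue"].sum() == orders["revenue"].sum()
    # Shape: exactly one summary row per region.
    assert set(summary["region"]) == {"east", "west"}
```

Run in the deployment pipeline, a test like this blocks a broken transformation from ever reaching users.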
To reduce project cycle time, study and measure the workflow processes from the
inception of an analytics requirement to the delivery of published analytics. Every
workflow process includes constraints and bottlenecks. Improve overall cycle time
by mitigating these constraints in your development processes (Figure 3).
Whatever your choice of projects, invest in activities that will garner support and
demonstrate how DataOps produces measurable results.
The DataKitchen DataOps Platform and other DataOps tools can play a critical
role in shortening the cycle time of your DataOps model project. A DataOps
Platform is purpose-built to augment an existing toolchain with DataOps
automation. It can help you hit the ground running.
DEMONSTRATE
DataOps will deliver an unprecedented level of transparency into your operations
and analytics development. DataOps automated orchestration provides a natural
point from which to measure and report on every step of your workflows.
Choose your metrics to reflect your DataOps project objectives. The metric gives the
entire team a goal to rally around. The number of possible DataOps metrics is as varied as
the architectures that enterprises use to produce analytics. When your team focuses
on a metric and iterates on it, you’ll see significant improvements in each sprint.
EXPAND
As your DataOps initiative grows beyond the early stages, you will expand
to incorporate more staff, resources, and a broader scope. One best practice
incorporates DataOps into the organization chart. A sign of DataOps maturity
is building a common technical infrastructure and tools for DataOps using
centralized teams. It’s also important to establish enterprise-wide measurements
and metrics. Work with other teams throughout the organization to bring DataOps
benefits to every corner of the enterprise.
One approach standardizes a set of software services that support the rollout of
Agile/DataOps. The DataOps Technical Services (DTS) group provides a set of
central services leveraged by other groups. Examples of technologies that can be
delivered ‘as a service’ include:
• Source code control repository
• Agile ticketing/Kanban tools
• Deploy to production
• Product monitoring
• Develop/execute regression testing
• Development sandboxes
• Collaboration and training portals/wikis
• Test data management and other functions provided ‘as a service’
The DTS group can also act as a services organization, offering services to other
teams. Below are some examples of services that a DTS group can provide:
• Reusable deployment services that integrate, deliver and deploy end-to-end
analytic pipelines to production.
• Central code repository where all data engineering/science/analytic work can
be tracked, reviewed and shared.
• Central DataOps process measurement function with reports
• ‘Mission Control’ for data-production metrics and data-team development
metrics to demonstrate progress on the DataOps transformation
DataOps COE
The Center of Excellence (COE) model leverages the DataOps team to solve
real-world challenges. The goal of a COE is to take a large, widespread,
deep-rooted organizational problem and solve it in a smaller scope,
proof-of-concept project, using an open-minded approach. The COE then
attempts to leverage small wins across the larger organization at scale. A COE
typically has a full-time staff that focuses on delivering value for customers in
an experimentation-driven, iterative, result-oriented, customer-focused way.
COE teams try to show what “good” looks like by establishing common technical
standards and best practices. They can also provide education and training
enterprise-wide. The COE approach is used in many enterprises, but the DevOps
industry has more often standardized on Dojos as a best practice.
CHAMPIONING DATAOPS
DataOps can serve as a positive agent of change in an otherwise slow and process-
heavy organization. Remember that leading change in technical organizations
is equal parts people, technology and processes. DataOps offers the potential
to reinvigorate data team productivity and agility while improving quality and
predictability. Our six-step program should help you introduce and establish
DataOps in your data organization. In our experience, many data organizations
desperately need the benefits that DataOps offers. They need people to champion
a DataOps initiative. Can your organization count on you?
• Sweet red beans (canned or homemade): for homemade, use the method
from my patbingsu recipe
INSTRUCTIONS
• Combine flour, kosher salt, baking soda, and sugar in a bowl. Add water and mix it well.
• Sieve the mixture through a strainer to get a silky batter without any lumps.
• Heat up the bungeoppang pan and turn the heat down to low.
• Open the pan and grease both the upper and lower fish molds with a light coating of vegetable oil.
• Pour the batter into one side of the fish mold until it’s 1/3 full. Add 1 tablespoon of sweet red beans to
the center, and then gently fill up the rest of the fish mold to totally cover the red beans.
• Close the mold and cook for about 3 minutes over low heat.
• Turn the pan over and let it cook another 3 minutes. Open it and turn it over again for another 30
seconds, to make the bread a little more crispy.
Attribution: Maangchi
Savvy executives maximize the value of every budgeted dollar. Decisions to invest
in new tools and methods must be backed up with a strong business case. As data
professionals, we know the value and impact of DataOps: streamlining analytics
workflows, reducing errors, and improving data operations transparency. Being
able to quantify the value and impact helps leadership understand the return
on past investments and supports alignment with future enterprise DataOps
transformation initiatives. Below we discuss three approaches to articulating the
return on investment of DataOps.
Gartner describes “operational execution” as time the data team spends
implementing and maintaining production initiatives. A big percentage of
the time that data scientists spend on operational effort is consumed servicing
data errors.
Thirty-six percent of the total time of a ten-person team, at a fully loaded
full-time employee (FTE) cost of $156,000, amounts to roughly $561,000 per year
(10 FTEs × $156,000 × 36% ≈ $561,600). This significant sum can be redeployed to
higher value-add activities.
Analytics agility leads to business agility. When the data team delivers analytics
rapidly and accurately, analytics do a better job supporting decision-makers. When an
organization can make decisions faster and better, it is able to capture opportunities
that it would have otherwise missed or misjudged. With analytics playing a central role
in corporate strategy, analytics agility can be a competitive advantage.
INGREDIENTS
1 tomato
1 tsp ginger
cilantro leaves
salt
INSTRUCTIONS
• Wash 1.5 cup lentils and soak them in water for about 2 hours.
• Add the soaked lentils, with 1 chopped tomato, 1 tsp turmeric powder and salt into an instant pot.
• Set the instant pot to Pressure Cook mode with a timer of 7 minutes and let it naturally release the
steam for 5 minutes before opening the lid.
• Note: an Instant Pot is quicker, but the lentils can easily be cooked in a large saucepan instead.
Combine the lentils with about 3.5 cups of water and cook for about 25-35 minutes.
• Once the lentils are cooked, heat a pan and add 1 tbsp ghee / clarified butter. Once the ghee is
hot, add 1 tsp cumin seeds, 1/2 cup chopped onion, 1 tsp ginger finely chopped, 1 tsp garlic finely
chopped, 1 tsp chili powder and cook them until the onion turns translucent.
WHY?
Becoming data-driven is hard. Data teams are caught between the competing
demands of data consumers, data providers, and supporting teams. Typically,
data consumers live in an “Amazon world” and expect trusted, original insight
on-demand. Yet data providers often send inaccurate, late, or error-prone data
sets. The flawless collaboration demanded of stakeholders often just isn’t there.
Taken together, the need to manage complex toolchains and data, as well as
collaborate with other organizations, roles, locations, and data centers, saps
the data team’s time. In fact, most data teams spend more time fixing errors
and addressing operational issues than innovating and providing business
value. According to Gartner, only 22% of a data team’s time is spent on new
initiatives and innovation (Figure 1). As a result, many data teams are not meeting
expectations, or worse, are beaten down and disempowered.
In data analytics, DataOps provides the path forward. Research shows that
“organizations that adopt a DevOps- and DataOps-based approach are more
successful in implementing end-to-end, reliable, robust, scalable and repeatable
solutions,” says Gartner’s Sumit Pal. (Gartner, November 2018)
WHAT IS DATAOPS?
DataOps is a set of technical practices, cultural norms, and architectures that
enables:
• Rapid experimentation and innovation for the fastest delivery of new
insights to customers
• Low error rates
• Collaboration across complex sets of people, technology, and environments
• Clear measurement and monitoring of results
Level 1 Struggle: Processes are unrepeatable, poorly controlled, manual, and reactive.
Level 3 Consistent: Automated processes are being applied across the entire data
analytics development lifecycle.
To reduce the level of errors, robust DataOps programs use automated testing,
monitoring, and orchestration in their production pipelines. Inspired by statistical
process control, they will have tests running in production across all pipelines,
sources, and tools, multiple types of tests per process step, and error alerts in place.
In contrast, teams that struggle will have no automated tests in production. This
results in costly and embarrassing errors, often discovered by customers. (Figure 4)
TEAM CULTURE
DataOps draws upon the principles of Agile and Lean manufacturing to transform
processes that manage data on its journey toward value creation. Successful
DataOps teams follow Agile principles which are a strong part of the overall
company culture. These organizations are focused on continuous learning and
optimization and errors are viewed as an opportunity for improvement.
On the opposite end of the spectrum are companies that follow waterfall
principles. Errors go undiscovered or are hidden and blame is passed around when
things go wrong. (Figure 8)
The good news is that success in one particular dimension does not have to be
traded off against others. Organizations that don’t practice DataOps commonly
make that trade, sacrificing speed for quality (or vice versa). For
example, in order to reduce fear and uncertainty over errors, a team may establish
practices, like documentation, checks and balances, and lots of meetings, that
lengthen their cycle time and reduce productivity. With DataOps practices in
place, you can excel in both speed and quality. Best-in-class data organizations
do well across the board, leading to overall greater productivity and lower costs.
By focusing on the right areas, a data team can start to look more like Bristol
Myers Squibb (formerly Celgene), a company that is now several years into their
DataOps journey (Figure 11). This team initially overcame obstacles that prevented
analytics responsiveness and quality. Data was organized in silos – using a variety
of technologies and isolated platforms. Without the right processes and tools in
place, the data engineering and analytics teams spent a majority of their time on
data engineering and pipeline maintenance. This distracted them from their main
mission – producing analytic insights that help the business attain its objectives.
After implementing DataOps, they now achieve excellence across all critical
dimensions:
• Very, very few errors or missed SLAs
• Weekly cycle time of new changes/features/data
• Detailed process metrics
• Agile culture
• High inter- and intra-team coordination
• High customer satisfaction
INGREDIENTS
salt
1 onion
parsley top
4-6 carrots
1 small zucchini
Egg noodles
INSTRUCTIONS
1. Wash bones, place in pot and fill 3/4 with cold water. Add about 1 1/2 tbsp of salt (more or
less to preference) and set over low heat. The broth should never boil away, only have the
occasional bubble rise to the top. If it does boil, of course, it’s still tasty, but the broth will be
cloudy instead of clear.
2. Add a peeled onion cut in half, the first vegetables (celery root, kohlrabi, cabbage core, etc), and
parsley top (if you have one). Leave the broth on low heat for at least 3-4 hrs, if not longer.
3. About an hour before serving, add carrots and parsley root. Do not slice, although you can cut them
in half lengthwise if they are bigger.
4. Make zucchini noodles and cut into 2 inch/5 cm lengths. I like to put them in a sieve and put the
sieve into the broth for a few minutes to warm up and soften the noodles but not cook them.
5. Strain out the carrots and parsley root, cool for a minute, and chop. Put carrots, parsley, and
zucchini noodles in a soup tureen, large mixing bowl, or another pot.
6. Ladle the hot broth through a sieve into the soup tureen, sprinkle some dried vegetable flavoring
and/or salt to taste, add a handful of chopped parsley.
7. Serve piping hot over cooked egg noodles. Hot pepper can be added to individual bowls if desired.
• Use bones from any animal, preferably raised in a sustainable manner. Beef Marrow Bones
preferred.
• A bit of fat (or skin) and meat on the bones adds flavor.
• Note that the vegetables are put in whole, or cut in half, don’t cut them up when putting the soup
together.
• If you don’t have parsley root, parsnips would do as well. If you don’t have either, leave it out.
• The more vegetables go in at the beginning, the sweeter the broth will be, you can choose as many
or as little as you like. I save green cabbage cores or cauliflower stems in the freezer and throw
them in as well.
Data teams everywhere are trying to answer two simple questions. First, how can a team
collaborate to reduce the cycle time to create and deploy new data analytics (data,
models, transformation, visualizations, etc.) without introducing errors? And
second, where to start this process? We’ve written about how to apply the ‘Theory
of Constraints’ to choosing your first DataOps win. The answer relates to finding
and eliminating the bottlenecks that slow down analytics development.
What follows are examples of different types of bottlenecks, why they were
selected first, and the benefits of resolving those bottlenecks with DataKitchen.
In some ways, the bottleneck is not just technical in nature. Quickly moving code
from development into production can be scary – with sometimes painful and costly
business implications. What if we make a mistake? Will we get yelled at by the
business? Will the business make a critical (and wrong!) decision based on erroneous
data? Ensuring new feature deployment success requires both a platform like
DataKitchen and an effective approach to writing tests. As part of working with the
customer we spend time educating them on how to write great tests as well as the
core principles of DataOps.
So, what bottleneck did they focus on first? Orchestration of the toolchain for
low error execution and enhanced collaboration. They started with a variety of
technologies that the company currently uses, including:
On-prem:
• Oracle
• SQL Server
• Informatica
• Apache Nifi
• Hadoop/HDFS
• PySpark
Cloud:
• Redshift
• S3
• Tableau online
• MLflow
DataKitchen and the company identified another bottleneck to address next.
Talented (and expensive) data scientists can create the first version of an idea
but have no interest in running it on a day-to-day basis. How can that ML model
and all its associated data transformation, code, and UI be put into operation
following a DataOps approach? To address this bottleneck, the
team created a Recipe in DataKitchen that provides the ability to orchestrate the
components developed in the ML use case. That Recipe has the ability to run data
tests and detect errors in the data in the production environment. It also has the
ability to detect operational errors (e.g., the web scraping issues), alert users to
those errors, and permit re-start of the processing of the data pipeline (Recipe).
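The DataKitchen Recipe itself is product-specific, but the orchestration pattern it implements (run steps in order, test at each step, alert on failure, allow a restart) can be sketched in plain Python. The step functions and alert hook below are hypothetical stand-ins.

```python
import logging

def alert(message: str) -> None:
    # Stand-in for an email/chat/pager integration.
    logging.error("PIPELINE ALERT: %s", message)

def run_pipeline(steps, max_retries: int = 1) -> bool:
    """Run ordered steps; alert on failure and allow a bounded restart."""
    for i, step in enumerate(steps):
        attempts = 0
        while True:
            try:
                step()  # each step raises on data or operational errors
                break
            except Exception as exc:  # e.g., a web-scraping outage
                attempts += 1
                alert(f"step {i} ({step.__name__}) failed: {exc}")
                if attempts > max_retries:
                    return False  # halt so operators can restart from step i
    return True

# Usage with hypothetical step functions:
# run_pipeline([extract_web_data, run_data_tests, score_ml_model, publish_results])
```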
2 tablespoons honey
INSTRUCTIONS
1. Combine all 5 ingredients in a medium bowl. Stir to combine.
2. Place in the refrigerator for 15-30 minutes so they are easier to roll.
3. Roll into 12 bites and store in the fridge for up to a week. An ice cream scoop
makes a good measure for portioning.
If you are a CDO or a VP, you have the power to institute broad change, but what
if you are an individual contributor? What can you do? This is a common question
that we hear from our conversations with data scientists, engineers and analysts.
An individual contributor has assigned duties and usually no ability to approve
purchases. How can one get started given these limitations?
DATAOPS OBJECTIVES
DataOps includes four key objectives:
• Measure Your Process — As data professionals, we advocate for the benefits
of data-driven decision making. Yet, many are surprisingly unanalytical
about the activities relating to their own work.
• Improve Collaboration, both Inter- and Intra-team — If the individuals in
your data analytics team don’t work together, it can impact analytics cycle
time, data quality, governance, security and more. Perhaps more importantly,
it’s fun to work on a high-achieving team.
• Lower Error Rates in Development and Operations — Finding your errors is
the first step to eliminating them.
• Decrease the Cycle Time of Change — Reduce the time that elapses from the
conceptualization of a new idea or question to the delivery of robust analytics.
If that’s too abstract, we’ll suggest four projects, one in each of the areas above,
that will start the ball rolling on your DataOps initiative. These tasks illustrate
how an individual contributor can start to implement DataOps on their own.
The data arrival report shows which data sources meet their target service levels.
When you bring these reports to the team, it will help everyone understand where
time and resources are being wasted. Perhaps this will inspire a project to mitigate
your worst bottleneck, leading to another project in one of the next areas.
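A data arrival report can start out very simple. This sketch assumes a hypothetical delivery log with an agreed delivery hour per supplier:

```python
import pandas as pd

# Hypothetical delivery log: one row per expected file from each supplier.
arrivals = pd.DataFrame({
    "supplier":   ["acme", "acme", "globex", "globex"],
    "sla_hour":   [6, 6, 7, 7],  # agreed latest delivery hour (24h clock)
    "arrived_at": pd.to_datetime([
        "2022-03-01 05:55", "2022-03-02 06:40",
        "2022-03-01 06:58", "2022-03-02 07:05",
    ]),
})
arrivals["on_time"] = arrivals["arrived_at"].dt.hour < arrivals["sla_hour"]

# Percent of on-time deliveries per supplier -- the core of the report.
print(arrivals.groupby("supplier")["on_time"].mean().mul(100))
# supplier: acme 50.0, globex 50.0
```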
IMPROVE COLLABORATION
Conceptually, the data analytics pipeline is a set of stages implemented using a
wide variety of tools. All of the artifacts associated with these tools (JSON, XML,
scripts, …) are just source code. Code deterministically controls the entire data
analytics pipeline from end to end.
If the code that runs your data pipeline is not in source control, it may be spread
across different systems, unversioned, or even misplaced. You can take a
big step toward establishing a controlled, repeatable data pipeline by putting all
your code in a source code repository. For example, Git is a free and open-source,
distributed version control system used by many software developers. With version
control, your team will be better able to reuse code, work in parallel and trace bugs
back to source code changes. Version control also serves as the foundation for
DataOps continuous deployment, which is an excellent long-term goal.
Every processing or transformation step should include tests that check inputs,
outputs and evaluate results against business logic.
When you have started counting and cataloging your errors, start a quality circle,
find patterns and aim to fix one error per month.
Figure: Factors that derail the development team and lengthen analytics cycle time
The image below shows the many different kinds of tests that should be performed.
We explain each of these types of tests in our Guide to DataOps Tests.
A broad set of tests can validate that the analytics work and fit into the overall system.
Tests that validate and monitor new analytics enable you to deploy with
confidence. When you have certainty, you can deploy and integrate new analytics
more quickly.
CONCLUSION
There are many small yet effective projects that you can start today that will serve
your DataOps goals. Hopefully, we’ve given you a few ideas.
INGREDIENTS
1/2 cup Guinness (or stout of your choice)
1 large egg
BAILEYS FROSTING
2 cups confectioners’ sugar
INSTRUCTIONS
Stout Cupcakes
1. Preheat oven to 350 degrees and line a 12-cavity cupcake tin with papers.
2. Bring stout and butter to a simmer in a large, heavy saucepan over medium heat. Add cocoa pow-
der to the saucepan and whisk the mixture until it’s smooth. Remove saucepan from heat.
3. In a separate medium bowl, whisk together the flour, sugar, baking soda, and salt.
4. In the bowl of a stand mixer or in a separate large bowl with a hand mixer (or whisk) beat together
egg and sour cream, until combined.
5. Add the chocolate stout mixture to the egg mixture and beat until just combined.
6. Add the dry mixture to the wet mixture and mix until just combined, taking care not to over-mix.
7. Divide batter among cupcake liners, filling them about ¾ of the way.
8. Bake for about 17-20 minutes, until a toothpick stuck into the center of a cupcake comes out clean.
9. Let the cupcakes cool in the pan for a few minutes and then take them out to cool completely on
a wire rack. Once cupcakes are cooled completely, core out a small section from the middle using
either a knife or a cupcake corer.
10. Spoon Irish whiskey filling into centers of cupcakes. Frost cupcakes with Bailey’s frosting. I used a
Wilton 1A pastry tip for mine.
Irish Whiskey Filling
2. In a small saucepan, bring cream just to a boil (keep a close eye on it and remove from heat right
when it starts boiling). Pour cream over chocolate in bowl and let sit for 1 minute. Then, stir until
chocolate is completely melted and smooth.
Baileys Frosting
1. In the bowl of a mixer or in a large bowl with a hand mixer, mix butter on medium speed until it’s
nice and fluffy. Add confectioners’ sugar one cup at a time and beat until well-combined.
2. Add the Baileys and beat until combined. If frosting is too thin, add more confectioners’ sugar a
couple tablespoons at a time.
Notes
For an easy way to core cupcakes, see the link for a cupcake corer.
DataOps revolutionizes how data analytics work gets done. Like many other “big
ideas,” it sometimes faces resistance from within the organization. For most
organizations, data is a means to an end. The organization’s primary focus is
on its mission, whether that is a product or a service. As data professionals, we
communicate the value of data-driven insights. Although many of our colleagues
appreciate the value of insight, they generally pay little attention to the process of
uncovering that insight unless there is an issue or error.
If you are launching a DataOps initiative, executive sponsorship can give you
air cover while building DataOps capabilities on the ground. A C-level sponsor
can tie the project’s activities into the larger organization’s strategic goals.
An executive can explain the value to others and provide guidance as the
project team faces obstacles or grapples with trade-offs. The executive sponsor
provides resources and budget as a skunkworks matures into an official
project. To pitch a transformational concept like DataOps to an executive, put
yourself in his or her shoes.
Translate DataOps’ impact into benefits that your executive understands and cares
about. DataOps offers ways to slash analytics development cycle time, streamline
workflows, and virtually eliminate errors in data operations. These capabilities
help business leaders rapidly capitalize on opportunities and gain insight into the
marketplace, often well before the competition.
An executive is always on the lookout for ways to grow revenue and maximize
resources. Circumstances present the business with an endless stream of
opportunities to make investments that spur growth or implement efficiencies.
Companies can’t jump on every opportunity. They have to select the best of
the bunch based on return-on-investment (ROI), risk assessment, or another
preferred metric.
If you can cut things into pieces, you can make this easy recipe.
INGREDIENTS
• 6-8 boneless, skinless chicken thighs (about 2-4 pounds) – or an equivalent type
of chicken
• 1 large onion
• 2 teaspoons turmeric
INSTRUCTIONS
• Cut chicken, peppers, and onion into pieces
• Simmer covered on the stove for 45-60 minutes, or bake in an oven-proof pot (e.g. a Dutch oven)
at 350 degrees for 45-60 minutes, until the chicken is cooked and the vegetables are soft and tender.
DataOps addresses a broad set of use cases because it applies workflow process
automation to the end-to-end data analytics lifecycle. DataOps reduces
errors, shortens cycle time, eliminates unplanned work, increases innovation,
improves teamwork, and more. Each of these improvements can be measured
and iterated upon.
These benefits are hugely important for data professionals, but if you made
a pitch like this to a typical executive, you probably wouldn’t generate much
enthusiasm. Your data consumers are focused on business objectives. They need
to grow sales, pursue new business opportunities, or reduce costs. They have
very little understanding of what it means to create development environments
in a day versus several weeks. How does that help them “evaluate a new M&A
opportunity by Friday?”
User feedback may feel concrete to users, but as a data professional, you will have
to translate these requirements into metrics. For example, users may not trust
the data. That may seem abstract and not directly actionable. Try measuring
your errors per week. If you can show users that you are lowering that number,
you can build trust. A test coverage dashboard can illustrate progress in quality
controls. Demonstrating your success with data can help gradually win over
detractors. What other problems have eroded trust? You may need to look for
more than one contributing factor.
Another common user complaint is that data analytics teams take too long to
deliver requested features. The length of time required to deliver analytics can
be expressed in a metric called cycle time. Benchmark how fast you can deploy
new ideas or requests into production. To reduce cycle time, examine the data
science/engineering/analytic development process. For example, how long does
it take to create a development environment? How up-to-date are development
environments? How well-governed are development environments?
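Cycle time is straightforward to compute once request and deployment dates are captured. A minimal sketch, assuming a hypothetical export from your ticketing tool:

```python
import pandas as pd

# Hypothetical export from an Agile ticketing tool.
tickets = pd.DataFrame({
    "ticket":    ["DATA-101", "DATA-102", "DATA-103"],
    "requested": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-12"]),
    "deployed":  pd.to_datetime(["2022-01-21", "2022-01-24", "2022-02-02"]),
})
tickets["cycle_days"] = (tickets["deployed"] - tickets["requested"]).dt.days

# Track the median over time to verify that process changes are working.
print(tickets["cycle_days"].median())  # 18.0 for this sample
```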
INGREDIENTS
320g of large-grain rice
1 red onion
1 sausage
100g of butter
1 packet of saffron
Salt
INSTRUCTIONS
1. Chop the red onion. Put a spoon and a half of olive oil into a pot, wait for it to be hot and fry the red
onion.
2. In the meantime boil the broth in another pot and keep it hot for the whole recipe time as you will
need it.
3. Cut the sausage, when the onion changes color, put the sausage into the pot and fry it, then add
the rice.
4. Once the rice becomes transparent, add broth until the rice is just covered. At the same time
add the packet of saffron.
5. When the broth is absorbed completely, add another ladle of broth and continue this way for
18-20 minutes (add more broth only when the last ladle is absorbed!). After 18-20 minutes the rice
will be cooked. Now turn off the stove (mandatory!) and add salt, butter (straight from the fridge) and cheese.
Do you deserve a promotion? You may think to yourself that your work is
exceptional. Could you prove it?
As a Chief Data Officer (CDO) or Chief Analytics Officer (CAO), you serve as an
advocate for the benefits of data-driven decision making. Yet, many CDOs are
surprisingly unanalytical about the activities relating to their own department.
Why not use DataOps analytics to shine a light on yourself?
Internal analytics could help you pinpoint areas of concern or provide a big-
picture assessment of the state of the analytics team. We call this set of analytics
the CDO Dashboard. If you are as good as you think you are, the CDO Dashboard
will show how simply awesome you are at what you do. You might find it helpful
to share this information with your boss when discussing the data analytics
department and your plans to take it to the next level. Below are some reports
that you might consider including in your CDO dashboard:
BURN DOWN CHART
The burn down chart graphically represents the completion of backlog tasks over
time. It shows whether a team is on schedule and sheds light on the productivity
achieved in each development iteration. It can also show a team’s accuracy in
forecasting its own schedule.
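A burn down chart takes only a few lines to produce; the sprint numbers below are invented for illustration:

```python
import matplotlib.pyplot as plt

# Invented sprint data: story points remaining at the end of each day.
days = range(11)
ideal = [50 - 5 * d for d in days]  # straight-line reference
actual = [50, 48, 45, 45, 38, 33, 30, 24, 20, 12, 3]

plt.plot(days, ideal, "--", label="Ideal")
plt.plot(days, actual, marker="o", label="Actual")
plt.xlabel("Sprint day")
plt.ylabel("Story points remaining")
plt.title("Sprint Burn Down")
plt.legend()
plt.show()
```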
VELOCITY CHART
The velocity chart shows the amount of work completed during each sprint — it
displays how much work the team is doing week in and week out. This chart
can illustrate how improved processes and indirect investments (training, tools,
process improvements, …) increase velocity over time.
TORNADO REPORT
The Tornado Report is a stacked bar chart that displays a weekly representation of
the operational impact of production issues and the time required to resolve them.
The Tornado Report provides an easy way to see how issues impacted projects and
development resources.
DATA ARRIVAL REPORT
A large organization might receive hundreds of data sets from suppliers and each
one could represent dozens of files. All of the data has to arrive error-free in order
to, for example, build the critical Friday afternoon report. The Data Arrival report
tracks how vendors perform relative to their respective service level agreements
(SLA).
The Data Arrival report enables you to track data suppliers and quickly spot
delivery issues. Any partner that causes repeated delays can be targeted for
coaching and management. The Tornado Report mentioned above can help
quantify how much time is spent managing these issues in order to articulate
impact. These numbers are quite useful when coaching a peer organization or
vendor to improve its quality.
STATISTICAL PROCESS CONTROL
The data analytics pipeline is a complex process with steps often too numerous to
be monitored manually. Statistical Process Control (SPC) tests inputs, outputs
and business logic at each stage of the pipeline. It allows the data analytics team
to monitor the pipeline end-to-end from a big-picture perspective, ensuring that
everything is operating as expected.
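In its simplest form, an SPC check compares today's run against statistical limits derived from history. A minimal sketch with invented row counts:

```python
import pandas as pd

# Invented history of daily row counts for one pipeline step.
history = pd.Series([10210, 10180, 10350, 10290, 10240, 10310, 10275])
mean, std = history.mean(), history.std()
lower, upper = mean - 3 * std, mean + 3 * std  # classic 3-sigma control limits

todays_count = 8900  # value from today's run
if not lower <= todays_count <= upper:
    # In practice this would page the data team, not print to a console.
    print(f"ALERT: {todays_count} outside control limits "
          f"({lower:.0f} to {upper:.0f})")
```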
CONCLUSION
One of the main goals of analytics is to improve decision-making. The CDO
DataOps Dashboard puts information at the fingertips of executives, so they
have a complete picture of what is happening in the data analytics domain.
When it’s time to review performance, the CDO DataOps Dashboard can help
you show others that the analytics department is a well-oiled machine. Now,
about that promotion…
INGREDIENTS
• 1 lb. Ground Beef
• 2 Eggs
• 2 slices of bread, crust removed, soaked in water (wring out well before adding)
• Very little oil: less than a tablespoon (I usually drizzle quickly over the mixture)
• Salt & Pepper (I usually shake both to cover the ingredients above)
INSTRUCTIONS
Put all ingredients in a big bowl. Mix/Knead well. I always make a marble size tasting ball that I cook in
the microwave for about 20-30 seconds – rotating halfway through. I sometimes find I need to add more
salt.
Spray a cookie sheet with olive oil spray (or wipe on olive oil). Roll into ping pong or golf ball-sized
spheres.
Bake for 20 minutes (I do 11 minutes, then flip over and bake another 9 minutes). Eat them while they are
hot (by themselves or butter a piece of scali bread and put a warm ball in there) or place them in your
tomato sauce.
Bonus material: simmer these meatballs for hours in plain tomato sauce or crushed
tomatoes to add joyful flavoring.
Each pound makes about 18-20 golf ball size meatballs (I usually make 3 lbs at a time)
Analytics teams need to move faster, but cutting corners invites problems in
quality and governance. How can you reduce cycle time to create and deploy
new data analytics (data, models, transformation, visualizations, etc.) without
introducing errors? The answer relates to finding and eliminating the bottlenecks
that slow down analytics development.
Each of the groups shown in figure 1 tracks their own projects. Figure 3 shows the
data analytics groups again, but each with their own Kanban boards to track the
progress of work items. To serve the end goal of creating analytics for users, the data
teams are desperately trying to move work items from the backlog (left column) to
the done column at the right, and then pass it off to the next group in line.
Data professionals are smart and talented. They work hard. Why does it take so long
to move work tickets to the right? Why does the system become overloaded with so
many unfinished work items forcing the team to waste cycles context switching?
To address these questions, we need to think about the creation and deployment
of analytics like a manufacturing process. The collective workflows of all of
the data teams are a linked sequence of steps, not unlike what you would see
in a manufacturing operation. When we conceptualize the development of
new analytics in this way, it offers the possibility of applying manufacturing
management tools that uncover and implement process improvements.
THE BOTTLENECK
In the novel “The Goal,” the plant’s complex manufacturing process, with its
long sequence of interdependent stages, was throughput limited by one particular
operation — a certain machine with limited capacity. This machine was the “constraint” or
bottleneck. The Theory of Constraints views every process as a series of linked
activities, one of which acts as a constraint on the overall throughput of the entire
system. The constraint could be a human resource, a process, or a tool/technology.
In “The Goal,” Alex learned that “an improvement at any point in the system, not
at the constraint, is an illusion.” An improvement made at a stage that feeds work
to the bottleneck just increases the queue of work waiting for the bottleneck.
Improvements after the bottleneck will always remain starved. Every loss of
productivity at the bottleneck is a loss in the throughput of the entire system.
Losses in productivity in any other step in the process don’t matter as long as that
step still produces faster than the bottleneck.
When managers talk to data analysts, scientists and engineers, they can quickly
discover the issues that slow them down. Figure 5 shows some common constraints.
For example, data errors in analytics cause unplanned work that upsets a carefully
crafted Kanban board. Work-in-progress (WIP) is placed on hold and key personnel
context switch to address the high-severity outages. Data errors cause the Kanban
boards to be flooded with new tasks which can overwhelm the system. Formerly
high priority tasks are put on hold, and management is burdened, having to manage
the complexity of many more work items. Data errors also affect the culture of the
organization. After a series of interruptions from data errors, the team becomes
accustomed to moving more slowly and cautiously. From a Theory of Constraints
perspective, data errors severely impact the overall throughput of the data organization.
A related problem, also shown in figure 5, occurs when deployment of new analytics
breaks something unexpectedly. Unsuccessful deployments can be another cause
of unplanned work which can lead to excessive caution, and burdensome manual
operations and testing.
Another common constraint is team coordination. The teams may all be furiously
rowing the boat, but perhaps not in the same direction. In a large organization,
each team’s work is usually dependent on each other. The result can be a
serialized pipeline. Tasks could be parallelized if the teams collaborated better.
New analytics wouldn’t break existing data operations with proper coordination
between and among teams.
1. Identify the constraint — Find the step that limits the throughput of the
overall system.
2. Exploit the constraint — Get the most out of the constraint as it currently
exists, before investing in anything new.
3. Subordinate everything to the constraint — Review all activities and make sure
that they benefit (or do not negatively impact) the constraint. Remember, any loss
in productivity at the constraint is a loss in throughput for the entire system.
4. Elevate the constraint — If, after steps 2-3, the constraint remains in the same
place, consider what other steps, such as investing resources, will help alleviate
this step as a bottleneck.
5. Repeat — Once a constraint is resolved, return to step 1 and find the next one.
Figure 7: Errors, deployment and team coordination are bottlenecks that inhibit
the flow of analytics innovation
Author Gene Kim has described his leading DevOps book, “The Phoenix Project,”
as essentially an adaptation of “The Goal” to IT operations. To alleviate their
bottleneck, the team in the book implements Agile development
(small lot sizes) and DevOps (automation). One important bottleneck was a bright
programmer named Brent who was needed for every system enhancement and
was constantly being pulled into unplanned work. When the team got better at
relieving and managing their constraints, the output of the whole department
dramatically improved.
The problem is that customers don’t actually know what products or services they
want. What customer would have asked for Velcro or Post-It notes or Twitter?
Many data professionals can relate to the experience of working diligently to
deliver what customers say they want only to receive a lukewarm response.
There is much debate about how to listen to the voice of the customer (Dorothy
Leonard, Harvard Business School, The Limitations of Listening). Customer
preferences are reliable when you ask them to make selections within a familiar
product category. If you venture outside of the customer’s experience, you tend to
encounter two blocks. First, people fixate on the way that products are normally used,
preventing them from thinking outside the box. Second, customers have seemingly
contradictory needs. Your data analytics customers want analytics to be error-free,
which requires a lot of testing, but they dislike waiting for lengthy QA activities to
complete. Data professionals might feel like they are in a no-win situation.
Deconstruct, step by step, the underlying processes behind your delivery of data
analytics. It may make sense to interview users like data analysts who leverage
data to create analytics for business colleagues.
Note that if Satisfaction is greater than Importance, the term (Importance -
Satisfaction) is set to zero, not a negative value.
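For reference, the note above comes from Anthony Ulwick’s opportunity algorithm, which scores each candidate outcome as:

Opportunity = Importance + max(Importance - Satisfaction, 0)

Outcomes with the highest opportunity scores become the strongest candidates for improvement.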
When you are done, you should have produced something like the below example.
Table 1 reveals which outcomes are important to users and deprecates those
outcomes that are already well served by the existing analytics development
process. The outcomes which are both important and unsatisfied will rise to
the top of the priority list. This data can be used as a guide to prioritize process
improvements in the data analytics development pipeline and process.
If you have multiple bottlenecks, you can’t address them all at once. The
opportunity algorithm enables the data organization to prioritize process
improvements that produce outcomes that are recognized as valued by users.
It avoids the requirement for users to understand the technology, tools, and
processes behind the data analytics pipeline. For DataOps proponents, it can
provide a clear path forward for analytics projects that are both important and
appreciated by users.
INGREDIENTS
2 large potatoes, diced
2 teaspoons ginger-garlic paste (if you don't have paste, finely grate ginger and garlic instead)
1 tablespoon oil
1 teaspoon red chili powder (adjust to your preferred spice level)
3 tablespoons pav bhaji masala powder (available in any Indian store; the best brand I've used is Everest)
Salt
GARNISH
A squeeze of lemon
Chopped onions
Coriander leaves
Butter
Cooking Veggies
1. Add cauliflower, potatoes, carrots to a pressure cooker. Add 2 cups of water or just enough to soak
the veggies. Let it whistle twice. When the pressure releases, open the lid and mash them well.
> You can also cook the veggies in a pot till they are soft/tender. They need to be mashed, so
make sure they are fully cooked.
3. Add ginger-garlic paste and chopped green chili (if using). Let the raw smell of ginger-garlic go
away
5. Next add tomatoes. Let them sauté on a low flame for 10-15 minutes - this is important, do not
rush this step. Tomatoes must be soft and mushy.
6. Next add peas (mash them with hands while adding) and let them cook for a few minutes
7. Add red chili powder, turmeric (very little if using) and pav bhaji masala.
8. Let the spices cook for 3-4 minutes, till you see oil releasing from the sides. It becomes fragrant!
10. Pour in some water to bring it to the right consistency (it should not be too runny or too thick).
12. Cook for 10 minutes till the gravy thickens, stirring in between
13. After 10 minutes, add another tablespoon of pav bhaji masala and some butter.
14. Cook for 3-5 minutes and turn off the stove.
Pav/Dinner rolls
2. Heat butter in a pan. Open buns and place on the pan and toast them for a minute.
Toast both sides.
3. Garnish the gravy with onions, butter, lemon and coriander. Serve hot with the toasted dinner rolls.
Enjoy!!
As DataOps activity takes root within an enterprise, managers face the question
of whether to build centralized or decentralized DataOps capabilities. Centralizing
analytics brings it under control, but granting analysts free rein is necessary to
foster innovation and stay competitive. The beauty of DataOps is that you don’t
have to choose between centralization and freedom. You can choose to do one
or the other — or both. Below we’ll discuss some standard DataOps technical
services that could be developed and supported by a centralized team. We’ll also
discuss building DataOps expertise around the data organization, in a decentralized
fashion, using DataOps centers of excellence (COE) or DataOps Dojos.
A centralized team can publish a set of software services that support the rollout
of Agile/DataOps. The DataOps Technical Services (DTS) group provides a set of
central services leveraged by other groups. DTS services bring the benefits of
DataOps to groups that aren’t ready to implement DataOps themselves. Examples of
technologies that can be delivered ‘as a service’ include:
• Source code control repository
• Agile ticketing/Kanban tools
• Production deployment
• Product monitoring
• Regression test development and execution
• Development sandboxes
• Collaboration and training portals/wikis
• Test data management and other functions provided ‘as a service’
The DTS group can also act as a services organization, offering services to other
teams. Below are some examples of services that a DTS group can provide:
• Reusable deployment services that integrate, deliver, and deploy end-to-end
analytic pipelines to production
• Central code repository where all data engineering/science/analytics work can be
tracked, reviewed and shared
• Central DataOps process measurement function with reports
• ‘Mission Control’ for data production metrics and data team development
metrics to demonstrate progress on the DataOps transformation
DTS creates robust DataOps services and capabilities, but if an organization wishes
to seed DataOps practices throughout the organization, it should plan methods to
transfer DataOps solutions and “know-how” to data scientists and engineers in the
periphery of the organization.
DATAOPS TRANSFORMATION
Each of the approaches described above can deliver DataOps benefits to the
enterprise. Nevertheless, it can be challenging to grow DataOps expertise
in-house without the benefit of mentorship. DataKitchen offers DataOps
Transformation Advisory Services that address DataOps methodologies, strategy,
tools, automation, and cultural change.
4 tablespoons butter
Salt to taste
6-ounce roll of jalapeno cheese (or substitute Velveeta with 2 minced jalapenos), cut into
small pieces
Cayenne to taste
INSTRUCTIONS
1. Cook the spinach according to package directions. Drain and reserve the liquid from the
pot for the butter-flour roux in the next step.
2. Melt the butter in a saucepan over low heat. Add the flour, stirring constantly until blended
and smooth, but not brown. Add the onions and cook until soft but not brown. Add the
milk and one-half cup of the reserved liquid from the spinach pot. Stir constantly to avoid
any lumps. Cook, stirring, until smooth and thick. Add the seasonings and cheese and stir
until the cheese is completely melted.
3. Pour into a casserole dish and top with buttered bread crumbs (optional).
Attribution: Spinach Madeline is from River Road Recipes, first published in 1959 by the Junior
League of Baton Rouge. From nola.com.
Years ago, prior to the advent of Agile development, a friend of mine worked as
a release engineer. His job was to ensure a seamless build and release process
for the software development team. He designed and developed builds, scripts,
installation procedures and managed the version control and issue tracking
systems. He played a mean mandolin at company parties too.
The role of release engineer was (and still is) critical to completing a successful
software release and deployment, but as these things go, my friend was valued less
than the software developers who worked beside him. The thinking went something
like this — developers could make or break schedules and that directly contributed
to the bottom line. Release engineers, on the other hand, were never noticed, unless
something went wrong. As you might guess, in those days the job of release engineer
was compensated less generously than that of a development engineer. Often, the best
people vied for positions in development, where compensation was better.
Whereas a release engineer used to work off in a corner tying up loose ends,
the DevOps engineer is a high-visibility role coordinating the development,
test, IT and operations functions. If a DevOps engineer is successful, the wall
between development and operations melts away and the dev team becomes
more agile, efficient and responsive to the market. This has a huge impact
on the organization’s culture and ability to innovate. With so much at stake,
it makes sense to get the best person possible to fulfill the DevOps engineer
role, and compensate them accordingly. When DevOps came along, the release
engineer went from fulfilling a secondary supporting role to occupying the
most sought-after position in the department. Many release engineers have
successfully rebranded themselves as DevOps engineers and significantly
upgraded their careers.
Data engineers, data analysts, data scientists — these are all important roles,
but they will be valued even more under DataOps. Too often, data analytics
professionals are trapped into relying upon non-scalable methods: heroism,
hope or caution. DataOps offers a way out of this no-win situation.
The capabilities unlocked by DataOps impact everyone who uses data analytics
— all the way to the top levels of the organization. DataOps breaks down the
barriers between data analytics and operations. It makes data more easily
accessible to users by redesigning the data analytics pipeline to be more flexible
and responsive. It will completely change what people think of as possible in
data analytics.
And watch out, Data Scientist: the real sexiest job of the 21st century is
DataOps Engineer.
2 green cardamom pods
3 cloves
1-inch cinnamon stick
1 bay leaf
2 green chilies
1.5 lb chicken
Salt to taste
INSTRUCTIONS
1. In a pan, heat oil. Once the oil is hot, add cardamom, cloves, cinnamon, peppercorn, and bay leaf.
4. Add ginger and garlic and fry until the raw smell is gone.
5. Add green chilies and curry leaves and fry for a minute.
9. Add potato and carrot and cook until chicken and vegetables are done.
10. Add the remaining 1 cup coconut milk and cook for another 5 minutes.
11. Pour a little (1 teaspoon) coconut oil on top.
Imagine that a Vice President of Marketing makes an urgent request to the data
analytics team: “I need new data on profitability ASAP.” At many organizations
the process for creating and deploying these new analytics would go something
like this:
1. The new requirement falls outside the scope of the development “plan of
record” for the analytics team. Changing the plan requires departmental meetings
and the approval of a new budget and schedule. Meetings ensue.
2. Padma, a Data Engineer, requests access to new data. The request goes on the
IT backlog. IT grants access after several weeks.
6. Once the fires are extinguished, Eric returns to testing on the target and
uncovers some issues in the analytics. Eric feeds error reports back to Padma.
She can’t easily reproduce the issues because the code doesn’t fail in the “dev”
environment. She spends significant effort replicating the errors so she can
address them. The cycle is repeated a few times until the analytics are debugged.
7. Analytics are finally ready for deployment. Production schedules the update.
The next deployment window available is in three weeks.
8. After several months have elapsed (total cycle time), the VP of Marketing
receives the new analytics, wondering why it took so long. This information could
have boosted sales for the current quarter if it had been delivered when she had
initially asked.
Every organization faces unique challenges, but the issues above are ubiquitous.
The situation we described is not meeting anyone’s needs. Data engineers went
to school to learn how to create analytic insights. They didn’t expect that it would
take six months to deploy twenty lines of SQL. The process is a complete hassle
for IT. They have to worry about governance and access control and their backlog
is entirely unmanageable. Users are frustrated because they wait far too long for
new analytics. We could go on and on. No one here is enjoying themselves.
The frustration sometimes expresses itself as conflict and stress. From the
outside, it looks like a teamwork problem. No one gets along. People are rowing
the boat in different directions. If managers want to blame someone, they will
point at the team leader.
At this point, a manager might try beer, donuts and trust exercises (hopefully
not in that order) to solve the “teamwork issues” in the group. Another common
mistake is to coach the group to work more slowly and carefully. This thinking
stems from the fallacy that you have to choose between quality and cycle time. In
reality, you can have both.
The development team receives its own separate but equivalent release
environment, managed by the third key member of our team: Chris, a
DataOps Engineer. Chris also implements the infrastructure that abstracts the
release environments so that analytics move easily between dev and production.
We’ll describe this further below. Any existing team member with DataOps
skills can perform the DataOps engineering function, but in our simplified case
study, adding a person will better illustrate how the roles fit together.
Chris uses DataOps to create and implement the processes that enable successful
teamwork. This activity puts him right at the nexus between data analytics
development and operations. Chris is one of the most important and respected
members of the data team. He creates the mechanisms that enable work to flow
seamlessly from development to production. Chris makes sure that environments
are aligned and that everyone has the hardware, software, data, network and
other resources that they need. He also makes available software components,
created by team members, to promote reuse — a considerable multiplier of
productivity. In our simple example, Chris manages the tasks that comprise
the pre-release process. Padma appreciates having Chris on the team because
now she has everything that she needs to create analytics efficiently on a self-
service basis. Eric is happy because DataOps has streamlined deployment, and
expanded testing has raised both data and analytics quality. Additionally, there is
much greater visibility into the artifacts and logs related to analytics, whether in
development, pre-release or in production. It’s clear that Chris is a key player in
implementing DataOps. Let’s dive deeper into how it really works.
The processing pipelines for analytics consist of a series of steps that operate on
data and produce a result. We use the term “Pipeline” to encompass all of these
tasks. A DataOps Pipeline encapsulates all the complexity of these sequences,
performs the orchestration work, and tests the results. The idea is that any
analytic tool that is invokable under software control can be orchestrated by a
DataOps Pipeline. Kitchens enable team members to access, modify and execute
workflow Pipelines. A simple Pipeline is shown in Figure 4.
Pipelines, and the components that comprise them, are made visible within a
Kitchen. This encourages the reuse of previously developed analytics or services.
Code reuse can be a significant factor in reducing cycle time.
Kitchens also tightly couple to version control. When the development team
wants to start work on a new feature, they instantiate a new child Kitchen which
creates a corresponding Git branch. When the feature is complete, the Kitchen is
merged back into its parent Kitchen, initiating a Git merge. The Kitchen hierarchy
aligns with the source control branch tree. Figure 5 shows how Kitchen creation/
deletion corresponds to a version control branch and merge.
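The version-control mechanics underneath can be sketched with ordinary Git
operations. This is only an illustration of the branch-per-Kitchen idea, not
DataKitchen's actual API; a real platform would also provision test data,
credentials, and toolchain access.

```python
# Illustration of the branch-per-Kitchen idea using plain Git commands.
# Kitchen names match the example in the text; run inside a Git repository.
import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True)

def create_kitchen(child, parent):
    """Creating a child Kitchen corresponds to branching from its parent."""
    git("checkout", parent)
    git("checkout", "-b", child)

def merge_kitchen(child, parent):
    """Merging a Kitchen back corresponds to a Git merge into the parent."""
    git("checkout", parent)
    git("merge", "--no-ff", child)
    git("branch", "-d", child)

create_kitchen("dev_Kitchen", "demo_dev")   # start feature work
# ... develop and commit new analytics ...
merge_kitchen("dev_Kitchen", "demo_dev")    # hand off for pre-release
```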
A DATAOPS PROCESS
Now let’s look at how to use a DataOps Platform to develop and deliver analytics with
minimal cycle time and unsurpassed quality. We’ll walk through an example of how
DataOps helps team members work together to deploy analytics into production.
Think back to the earlier request by the VP of Marketing for “new analytics.”
DataOps coordinates this multi-step, multi-person and multi-environment
workflow and manages it from inception to deployment.
The Agile Sprint meeting commits to the new feature for the VP of Marketing in the
upcoming iteration. The project manager creates a JIRA ticket.
In a few minutes, Padma creates a development Kitchen for herself and gets to work.
Chris has automated the creation of Kitchens to provide developers with the test data,
resources, and Git branch that they need. Padma’s Kitchen is called “dev_Kitchen” (see
Figure 6). If Padma takes a technical risk that doesn’t work out, she can abandon this
Kitchen and start over with a new one. That effectively deletes the first Git branch and
starts again with a new one.
Step 3 — Implementation
Padma’s Kitchen provides her with pipelines that serve as a significant head start
on the new profitability analytics. Padma receives the test data (de-identified) she
needs as part of Kitchen creation and configures toolchain access (SFTP, S3, Redshift,
…) for her Kitchen. Padma implements the new analytics by modifying an existing
Pipeline. She adds additional tests to the existing suite, checking that incoming data
is clean and valid. She writes tests for each stage of ETL/processing to ensure that
the analytics are working from end to end. The tests verify her work and will also run
as part of the production flow. Her new pipelines include orchestration of the data
and analytics as well as all tests. The tests direct messages and alerts to her Kitchen-
specific Slack channel. With the extensive testing, Padma knows that her work will
migrate seamlessly into production with minimal effort on Eric’s part. Now that
release environments have been aligned, she’s confident that her analytics work in the
target environment.
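Tests of the kind Padma writes can be as simple as assertions that run after each
pipeline stage. The following sketch uses pandas with hypothetical column names;
it illustrates the idea and is not DataKitchen's test framework.

```python
# Sketch of per-stage pipeline tests (hypothetical column names).
import pandas as pd

def test_incoming_data(df):
    """Input tests: run before the data enters the pipeline."""
    assert not df.empty, "input extract is empty"
    assert df["customer_id"].notna().all(), "null customer_id values"
    assert df["customer_id"].is_unique, "duplicate customer_id values"
    assert (df["revenue"] >= 0).all(), "negative revenue values"

def test_profitability_output(df):
    """Business-logic test: margin must equal revenue minus cost."""
    recomputed = df["revenue"] - df["cost"]
    assert (df["margin"] - recomputed).abs().max() < 0.01, "margin mismatch"

raw = pd.DataFrame({"customer_id": [1, 2], "revenue": [100.0, 250.0]})
test_incoming_data(raw)  # raises AssertionError if the data is bad
```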
Before she hands off her code for pre-production staging, Padma first has to merge
down from the “demo_dev” Kitchen so that she can integrate any relevant changes her
coworkers have made since she branched. She reruns all her tests to ensure a clean
merge. If there is a conflict in the code merge, the DataOps Platform will pop up a
three-panel UI to enable further investigation and resolution. When Padma is ready,
she updates and reassigns the JIRA ticket. If the data team were larger, the new
analytics could be handed off from person to person, in a line, with each person
adding their piece or performing their step in the process.
Step 4 — Pre-Release
In our simple example, Chris serves as the pre-release engineer. With a few clicks,
Chris merges Padma’s Kitchen “dev_Kitchen” back into the main development Kitchen
“demo_dev,” initiating a Git merge. After the merge, the Pipelines that Padma updated
are visible in Chris’ Kitchen. If Chris is hands-on, he can review Padma’s work, check
artifacts, rerun her tests, or even add a few tests of his own, providing one last step of
QA or governance. Chris creates a schedule that, once enabled, will automatically run the
new Pipeline every Monday at 6 am. When Chris is satisfied, he updates and reassigns
the JIRA ticket, letting Eric know that the feature is ready for deployment.
Step 5 — Deployment
Eric easily merges the main development Kitchen “demo_dev” into the production
Kitchen, “demo_production,” corresponding to a Git merge. Eric can now see the new
Pipelines that Padma created. He inspects the test logs and reruns the new analytics
and tests to be 100% sure. The release environments match so the new Pipelines
work perfectly. He’s also happy to see tests verifying the input data using DataOps
statistical process control. Tests will detect erroneous data before it enters the
production pipeline. When he’s ready, Eric enables the schedule that Chris created,
integrating the new analytics into the operations pipeline. DataOps redirects any Slack
messages generated by the new analytics to the production Slack channels.
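A statistical process control test of this kind can be sketched in a few lines:
compare today's input against control limits derived from history, and halt the
pipeline on an outlier. The history values below are illustrative.

```python
# Sketch of a statistical process control check on incoming data: flag a
# load whose row count drifts more than three standard deviations from
# the historical mean.
import statistics

def spc_check(history, todays_rows, sigmas=3.0):
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    lower, upper = mean - sigmas * sd, mean + sigmas * sd
    if not (lower <= todays_rows <= upper):
        raise ValueError(
            f"row count {todays_rows} outside control limits "
            f"[{lower:.0f}, {upper:.0f}]; halting before production")

spc_check(history=[9800, 10150, 9920, 10060, 10010], todays_rows=10040)
```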
The VP of Marketing sees the new analytics and she’s delighted. She
then has an epiphany. If she could see this new data combined with a report that
Padma delivered last week, it could open up a whole new approach to marketing
— something that she is sure the competitors haven’t discovered. She calls the
analytics team and…back to Step 1.
DATAOPS BENEFITS
As our short example demonstrated, the DataOps Teamwork Process delivers
these benefits:
• Ease movement of work between team members, tools, and environments —
Kitchens align the production and development environment(s) and abstract
the machine, tools, security and networking resources underlying analytics.
Analytics easily migrate from one team member to another or from dev to
production. Kitchens also bind changes to source control.
• Collaborate and coordinate work — DataOps provides teams with the
compelling direction, strong structure, supportive context and shared mindset
that are necessary for effective teamwork.
• Automate work and reduce errors — Automated orchestration reduces process
variability and errors resulting from manual steps. Input, output and business
logic tests at each stage of the workflow ensure that analytics are working
correctly, and that data is within statistical limits. DataOps runs tests both in
development and production, continuously monitoring quality. Warnings and
errors are forwarded to the right person/channel for follow up.
• Maintain security — Kitchens are secured with access control. Kitchens then
access a release environment toolchain using a security Vault which stores
unique usernames/passwords.
• Leverage best practices and re-use — Kitchens include Pipelines and other
reusable components which data engineers can leverage when developing new
features.
• Self-service — Data professionals can move forward without waiting for
resources or committee approval.
Manager: Good morning, everyone. I’m pleased to report that the VP of Marketing
called the CDO thanking him for a great job on the analytics last week.
Chris (DataOps Engineer): Once I set up Kitchen creation, Padma was able to start
being productive immediately. With matching release environments, we quickly
migrated the new analytics from dev to production.
Eric (Production Engineer): The tests are showing that all data remains within
statistical limits. The dashboard indicators are all green.
DataOps helps our band of frustrated and squabbling data professionals achieve
a much higher level of overall team productivity by establishing processes and
providing resources that support teamwork. With DataOps, two key performance
parameters improve dramatically — the development cycle time of new analytics
and quality of data and analytics code. We’ve seen it happen time and time again.
What’s even more exciting is the business impact of DataOps. When users request
new analytics and receive them in a timely fashion, it initiates new ideas and
uncharted areas of exploration. This tight feedback loop can help analytics achieve
its true aim, stimulating creative solutions to an enterprise’s greatest challenges.
Now that’s teamwork!
INGREDIENTS
4 eggs
1 cup 1% milk
2 tablespoons flour
INSTRUCTIONS
1. Preheat oven to 350°F.
2. Whisk eggs, milk, mayo, 4 grinds of sea salt & flour in a mixing bowl.
Data teams using inefficient, manual processes often find themselves working
frantically to keep up with the endless stream of analytics updates and the
exponential growth of data. If the organization also expects busy data scientists
and analysts to implement data governance, the work may be treated as an
afterthought, if not forgotten altogether. Enterprises using manual procedures
need to carefully rethink their approach to governance.
DATA GOVERNANCE
In her book, “Disrupting Data Governance: A Call to Action,” data governance
expert Laura Madsen envisions a more agile model for data governance by redirecting
the focus of governance towards value creation through promoting the usage of data
(Figure 1). Instead of focusing on how to limit users, governance should be concerned
with promoting the safe and controlled use of data at scale. Data governance is then
more about active enablement than rule enforcement. In other words, can we design
data quality, management and protection workflows in such a way that they empower,
not limit, data usage? This can be done if we take a DataOps approach to governance.
When these various methodologies are backed by a technical platform and applied
to data analytics, it’s called DataOps. DataOps automation can enable a data
organization to be more agile. It reduces cycle time and virtually eliminates data
errors, which distract data professionals from their highest priority task – creating
new analytics that add value for the enterprise.
DATAGOVOPS
All of the new Ops terms (Figure 2) are simply an effort to run organizations in
a more iterative way. Enterprises seek to build automated systems to run those
iterations more efficiently. In data governance, this comes down to finding the right
balance between centralized control and decentralized freedom. When governance
is enforced through manual processes, policies and enforcement interfere with
freedom and creativity. With DataOps automation, control and creativity can coexist.
DataGovOps uniquely addresses the DataOps needs of data governance teams
who strive to implement robust governance without creating innovation-killing
bureaucracy. If you are a governance professional, DataGovOps will not put you
out of a job. Instead, you’ll focus on managing change in governance policies and
implementing the automated systems that enforce, measure, and report governance.
In other words, governance-as-code.
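As a toy illustration of governance-as-code, the sketch below expresses a policy
("PII columns must be masked before deployment") as an automated check rather
than a manual review. The column names and the policy itself are assumptions,
not a prescribed standard.

```python
# Toy governance-as-code check: fail deployment if unmasked PII is present.
import pandas as pd

PII_COLUMNS = {"ssn", "email", "date_of_birth"}  # assumed naming convention

def enforce_pii_policy(df):
    """Raise if any unmasked PII column would leave the secure zone."""
    violations = PII_COLUMNS & set(df.columns)
    if violations:
        raise PermissionError(f"unmasked PII columns: {sorted(violations)}")

masked = pd.DataFrame({"customer_hash": ["a1f3"], "revenue": [120.0]})
enforce_pii_policy(masked)  # passes: policy satisfied, deployment proceeds
```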
Governance is, first and foremost, concerned with policies and compliance. Some
governance initiatives are somewhat akin to policing traffic by handing out
speeding tickets. Focusing on violations positions governance in conflict with
analytics development. Data governance advocates can get much farther with
positive incentives and enablement rather than punishments.
1. Business Glossary & Data Catalog → Business Glossary & Data Catalog as Code
Figure 5 depicts a data pipeline that ingests data from SFTP, builds facts and
dimensions, forecasts sales, visualizes data and updates a data catalog. Many
data organizations use a mix of tools across numerous locations and data
centers. They may use hybrid cloud with some centralized data teams and
decentralized development using self-service tools. Data lineage helps the data
team keep track of this end-to-end process. Which team owns which steps in
the process? Which tools are used? Who made changes and when?
DataGovOps records and organizes all of the metadata related to data – including the
code that acts on the data. Test results, timing data, data quality assessments and all
other artifacts generated by execution of the data pipeline document the lineage of data.
All metadata is stored in version control so that you have as complete a picture of your
data journey as possible. DataGovOps documents the exact process lineage of every tool
and step that happened along the data’s journey to value.
If your users see an error in charts, graphs or models, they won’t care whether
the error originated with data or the transformations that operate on that data.
DataGovOps tests the code that operates on data so that ETL operations and models
are validated during deployment and monitored in production.
All of this testing reduces errors to virtually zero, eliminating the stress and
embarrassment of having to explain mistakes. When analytics are correct, data is
trusted, and the data team has more time for the fun and innovative work that they
love doing.
Self-Service Sandboxes
Note that the self-service sandbox includes test data. Access to test data is a
significant pain point for many enterprises. It sometimes takes several months
to obtain clean, accurate, and privacy-aware test data that has passed security
checks. Once set up, a self-service environment provides test data on demand.
The self-service sandbox enables data teams to deploy faster and lower their error
rate. This capability empowers them to iterate more quickly and find solutions
to business challenges. The provision of test data on demand is called Test Data
Management.
In data science and analytics, test data management (TDM) is the process of
managing the data necessary for fulfilling the needs of automated tests, with zero
human intervention (or as little as possible).
That means that the TDM solution is responsible for creating the required test
data, according to the requirements of the tests. It should also ensure that
the data is of the highest possible quality. Poor-quality test data is worse than
having no data at all, since it will generate results that can’t be trusted. Another
important requirement for test data is fidelity. Test data should resemble, as
closely as possible, the real data found in the production servers.
Finally, the TDM process must also guarantee the security and privacy of test
data. It’s no use having high-quality, realistic data if it is not also secure and
privacy-aware.
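One common TDM building block is deterministic pseudonymization: identifiers are
replaced with stable tokens so the test data keeps its shape and join keys while
shedding direct identifiers. A minimal sketch, with illustrative field names and salt:

```python
# Deterministic pseudonymization: the same input always yields the same
# token, so joins between test tables still line up.
import hashlib

SALT = b"rotate-me-per-environment"  # illustrative; manage in a vault

def pseudonymize(value):
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

production_row = {"email": "jane@example.com", "plan": "enterprise"}
test_row = {**production_row, "email": pseudonymize(production_row["email"])}
print(test_row)  # same structure, no real identifier
```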
Figure 9: DataGovOps mission control view: The Data Arrival report enables you to track
data suppliers and quickly spot delivery issues.
CONCLUSION
The concept of governance as a policing function that restricts development
activity is outmoded and places governance at odds with freedom and
innovation. DataGovOps provides a better approach that actively promotes the
safe use of data with automation that improves governance while freeing data
analysts and scientists from manual tasks. DataGovOps is a prime example of how
DataOps can optimize the execution of workflows without burdening the team.
DataGovOps transforms governance into a robust, repeatable process that executes
alongside development and data operations.
Hangi is a traditional New Zealand Māori method of cooking food in an umu, a type of earth oven
made with heated rocks buried in a pit. This method, typically using meats like pork, beef, lamb
and chicken, is usually reserved for special occasions.
Prep Time: 15 minutes — Cook Time: 8 hours — Total Time: 8 hours 15 minutes — Yield: 8 servings
INGREDIENTS
2 kg fatty pork (pork shoulder or belly) cut into large chunks
banana leaf
INSTRUCTIONS
1. Make your stuffing according to packet instructions, form into a large ball then set it aside.
2. Lay large banana leaf on a table, arrange bacon in one layer on the bottom, place stuffing ball in
the middle, then place meat, sweet potatoes and pumpkin. Season with smoked paprika, smoked
salt and freshly ground black pepper.
3. Wrap the meats and vegetables with the banana leaf then secure it with another wrap of aluminum
foil. Set it aside.
4. Crumple four small balls of aluminum foil and place them on the bottom of the slow cooker.
Pour in enough water to cover the balls, then place the wrapped meat on top.
5. Cover with a damp cloth, with the sides hanging outside the slow cooker, and slow cook for
8 hours on low heat.
In the past couple of years, there has been a tremendous proliferation of acronyms
with the “Ops” suffix. This was started in the software space by the merger of
development (dev) and IT operations (Ops). Since then people have been creating
new Ops terms at a pretty rapid pace:
There are probably even more Ops terms out there (honestly, we got tired of googling).
Naturally, people have found this confusing and have questioned whether all these
acronyms are necessary. As students of management methodology and lovers of
software tools, we thought we might take a stab at trying to sort this all out.
When terms point to the same team members and the same genre of tools, the
Ops terms are synonymous. For example, ModelOps, MLOps, and AnalyticOps
focus on the unique problems of data scientists creating, deploying and
maintaining models and AI-assisted analytics using ML and AI tools and
methods. Maybe the industry doesn’t need all three of these terms.
STAY LEAN
Whenever a term or acronym gains momentum, marketers go to great lengths
to associate their existing offerings with whatever is being hyped. Sometimes
that creates a backlash that drowns out some good ideas. You may believe that
you do not need a new Ops term or you may find that it helps to galvanize your
target audience and increases focus on the technical environment critical to
your projects. Stay focused on the goals of lean manufacturing. Anything that
eliminates errors, streamlines workflow processes, improves collaboration and
enhances transparency aligns with DevOps, DataOps, and all the other possible
‘Ops’ out there.
INGREDIENTS
16 ounces (2 pkg) softened full fat cream cheese
6 tablespoons crunchy peanut butter (Ingredients should list just peanuts. No added sugar)
8 tablespoons Lakanto Monkfruit Sweetener with Erythritol or another granulated sugar substitute
with Erythritol such as Swerve (not one with maltodextrin or sucralose such as Splenda)
4 tablespoons Lily's Sugar-Free (stevia sweetened) Dark Chocolate Chips finely chopped
INSTRUCTIONS
1. Place chocolate chips in a food processor and chop finely. Add softened cream cheese, crunchy
peanut butter, sugar substitute, cocoa powder, and process until well combined. You can also chop
the chips by hand with a sharp knife and mix everything together in a bowl if you prefer.
2. With a small cookie scoop or a spoon, scoop about one tablespoon and place onto a
parchment-lined baking sheet.
3. Freeze for 20-30 minutes to firm up. Remove and place in freezer bags to keep frozen.
4. Makes about 36 servings or you can make them larger for fewer servings.
Fiber 0.8g
Protein 1.6g
Fat 6.2g
Calories 67
Note: I suggest you make the recipe as written the first time and then adjust it according to your taste by
adding a little more or less sweetener, cocoa or peanut butter, although that may alter the macros.
As a result, vendors that market DataOps capabilities have grown in step with the
popularity of the practice. To date, we count over 100 companies in the DataOps
ecosystem. However, the rush to rebrand existing products as related to DataOps
has created some marketplace confusion. Because it is such a new category,
both overly narrow and overly broad definitions of DataOps abound. As a result,
it is easy to get overwhelmed when trying to evaluate different solutions and
determine whether they will help you achieve your DataOps goals.
This sounds great and you are ready to get started, but the next big question is
how can your organization best achieve this transformation? How can you sift
through all the marketing speak and find the solutions that will truly help you?
When evaluating DataOps solutions, consider the following ways that companies
are marketing their capabilities.
The Data Toolchain – Many tools being marketed today as DataOps solutions
are simply independent components of the data toolchain that collect, store,
transform, visualize, and govern the data running through the pipeline. Although
all of these technologies play an important role in the value pipeline, they do not
ensure that each step in the data pipeline is executed and coordinated as a single,
integrated, and accurate process or help people and teams better collaborate.
Remember that a DataOps process automates the orchestration and testing of
these tools across the pipeline. In fact, in a true DataOps environment, it does
not matter which data tools you use. Your team can continue to use the ETL or
analytics tools they like best or add new tools at any time. Typically, components
of the toolchain are being marketed as DataOps solutions in two different ways.
Data Process Tools – Data process and automation tools are being correctly
marketed as important components of a DataOps solution. You’ll need some
combination of these tools if you decide to implement DataOps yourself. Many
popular DevOps tools can also be used.
• Orchestration of end-to-end multi-tool, multi-environment pipelines can be
facilitated by tools like Apache Airflow or Saagie (see the sketch after this list).
• Automated Testing and Monitoring at every step in production and
development pipelines is important to catch and address errors before they
reach the business user. iCEDQ is a leading testing and monitoring platform.
• Environment and Deployment technologies allow teams to spin-up self-
service work environments and innovate without breaking production. New
features can be deployed with the push of a button. There are a host of tools
built for this purpose, including well-known open-source tools such as Git
(version control), Docker (containerization), and Jenkins (CI/CD).
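For a sense of what orchestration looks like in practice, here is a minimal
Apache Airflow sketch of a four-step pipeline; the DAG name, scripts, and
schedule are hypothetical. Because each task runs only after its upstream task
succeeds, a failed data test blocks publication.

```python
# Minimal Airflow DAG: extract -> transform -> test -> publish.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="profitability_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    transform = BashOperator(task_id="transform", bash_command="python transform.py")
    test = BashOperator(task_id="test_data", bash_command="python run_tests.py")
    publish = BashOperator(task_id="publish", bash_command="python publish.py")

    extract >> transform >> test >> publish
```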
DataOps, when implemented correctly, holds exciting promise for data teams
to be able to reclaim control of their data pipelines and deliver value instantly
without errors. It is easy to get confused by all the marketing noise, but
remember that DataOps, at its core, is a collaborative process that orchestrates
data pipelines, automates testing and monitoring, and speeds new feature
deployment. Whether you use an all-in-one tool like DataKitchen or build it
yourself, the right combination of tools, processes, and people is critical to making
DataOps a success.
This tastes as good as the fudge they sell at tourist destinations. This is a multi-hour project and is
great for a rainy or snowy day. Make sure you have a candy thermometer and parchment paper
before you start.
INGREDIENTS
4 cups sugar
2 cups milk
INSTRUCTIONS
1. Line an 8-or 9-inch square pan with parchment paper.
2. Mix sugar, cocoa and salt in heavy 4-quart saucepan or larger; stir in milk. Cook over medium heat,
stirring constantly until mixture comes to full rolling boil. Boil, without stirring, until mixture reaches
234°F on candy thermometer or until small amount of mixture dropped into very cold water, forms a
soft ball which flattens when removed from water. This can take a while.
3. Remove from heat. Add butter and vanilla. DO NOT STIR. Cool at room temperature to 110°F (1-2
hours). Fold with wooden DataKitchen spoon until fudge thickens and just begins to lose some of its
gloss (about 7 minutes).
NOTE: For best results, do not double this recipe. The directions must be followed exactly. In the third step,
beat too little and the fudge is too soft. Beat too long and it becomes hard and sugary.
Leading software companies perform millions of code releases per year. Typical
data analytics organizations perform fewer than 10. This gap explains why most
data analytics projects fail to deliver. Without the capability to move at lightning
speed, data analytics can’t adapt to fast-paced markets and keep up with the
endless stream of requests generated by business users. Despite soaring levels of
investment, the percentage of organizations that describe themselves as “data-
driven” has fallen since 2017.
Software teams have faced similar challenges and found answers. The methods
that yielded tremendous improvements in software development productivity can
deliver similar results for data organizations. In the data industry, the process of
going from 10 releases per year to millions is called “DataOps.”
DATAOPS PLATFORM
A DataOps Platform unifies the end-to-end workflow and processes related to
data analytics planning, development and operations into a single, common
framework, improving overall collaboration. It incorporates your existing tools
into automated orchestrations that drive analytics creation and the transformation
of raw data to insights. The DataOps Platform accomplishes this goal by managing
the creation, deployment and production execution of analytics. DataOps
Platforms offer four fundamental capabilities:
INGREDIENTS
1 cup glutinous rice flour (aka sweet rice flour, Mochiko powder, or chapssalgaru)
1 tablespoon flour
1 tablespoon sugar
1 cup dried azuki beans (aka red beans, or pat) 7 ounces or 200 grams
¼ cup sugar
vegetable oil
• Combine glutinous rice flour, flour, kosher salt, baking soda, and melted butter in a large
bowl. Add hot water and mix with a wooden spoon for 1 minute.
• Knead the lump by hand for 2 minutes, until smooth. Put it in a plastic bag to keep it from
drying out.
• Wash the azuki beans in cold water and strain. Put them into a solid,
heavy-bottomed pot.
• Add 7 cups of water. Cover and boil for 30 minutes over medium-high heat.
• Turn off the heat and let the beans soak in the hot water for 30 minutes.
• Turn on the heat to medium and cook for 1 hour until the beans are very soft.
• Remove from the heat and mash the beans with a wooden spoon or potato masher.
• Set a strong mesh strainer over a large bowl and strain the paste through it to remove the
bean skins.
• Use your hands to squeeze every drop out of the skins as best you can. Discard the empty
skins and wash the strainer to use it again.
• Put the strainer over an empty bowl and line it with a clean cotton cloth. Strain the paste by
pouring it through the cloth and strainer.
• Lift up the edges of the cloth and gently squeeze it to force all the water through.
• When all the water has passed, you’ll be left with a solid lump of finely ground, cooked beans
inside the cloth.
• Put it into the pot, and turn on the heat to medium-high. Add sugar, rice syrup, kosher salt,
and vanilla extract.
• Stir well with a wooden spoon for about 6 to 7 minutes until the bean paste moves together as
a lump. Remove from the heat and let cool.
• Use about 200 grams (7 ounces) of the red bean paste for this recipe and freeze the rest for
another day.
• Divide the paste into 10 pieces and roll each piece into a smooth ball. Cover with plastic wrap so
they don’t dry out while you work.
• Divide the dough into 10 pieces (each one about 1 ounce, or 28 grams) and roll each piece into a
smooth ball. Cover with plastic wrap.
• Put one of the dough balls on the cutting board and flatten it out with your hand into a disk about
2½ inches in diameter. Make a circle with your thumb and forefinger and put the disk on top of it.
• With your other hand put one red bean paste ball in the center of the disk and push and pull the
dough around it, so the red bean ball is completely covered by the dough.
• Seal the dough gently and tightly around the red bean, and softly roll the ball on your cutting
board to smooth out any lumps. Repeat this with the rest of the dough and red beans to make 10
balls.
• I usually use my 7-inch stainless steel saucepan with 3 cups of oil and fry 5 balls at a time to save
on oil, but you can use more oil and fry them all at once in a larger pan if you want.
• Fry the balls for 6 to 7 minutes over medium-low heat, until light golden brown. As they fry, stir
gently with a wooden spoon so they’re cooked evenly and don’t stick to the bottom of the pot.
Serve:
• Roll in sugar to coat, and serve. Eat them within a few hours for the best chewiness!
Attribution: Maangchi
Previously, Chris was Regional Vice President in the Revenue Management Intelligence
group at Model N. Before Model N, Chris was COO of LeapFrogRx, a descriptive
and predictive analytics software and service provider. Chris led the acquisition of
LeapFrogRx by Model N in January 2012. Prior to LeapFrogRx, Chris was CTO and VP
of Product Management at MarketSoft (now part of IBM), an innovative Enterprise
Marketing Management software company. Prior to that, Chris developed Microsoft
Passport, the predecessor to Windows Live ID, a distributed authentication system used
by hundreds of millions of users today. He was awarded a US Patent for his work on that project.
Before joining Microsoft, he led the technical architecture and implementation of
Firefly Passport, an early leader in Internet Personalization and Privacy. Microsoft
subsequently acquired Firefly. Chris led the development of the first travel-related
e-commerce website at NetMarket. Chris began his career at the Massachusetts
Institute of Technology’s (MIT) Lincoln Laboratory and NASA Ames Research Center.
There he created software and algorithms that provided aircraft arrival optimization
assistance to Air Traffic Controllers at several major airports in the United States.
Chris served as a Peace Corps Volunteer Math Teacher in Botswana, Africa. Chris has
an M.S. from Columbia University and a B.S. from the University of Wisconsin-
Madison. He is an avid cyclist, hiker, reader, and father of two college-age children.
Eran Strod is a Marketing Chef at DataKitchen where he writes white papers, case
studies and contributes to the DataOps blog. He is passionate about applying
process-oriented management science to data and analytics.
Eran was previously Director of Marketing for Atrenne Integrated Solutions (now
Celestica) and has held product marketing and systems engineering roles at
Curtiss-Wright, Black Duck Software (now Synopsys), Mercury Systems, Motorola
Computer Group (now Artesyn), and Freescale Semiconductor (now NXP), where
he was a contributing author to the book “Network Processor Design, Issues and
Practices.” Eran began his career as a software developer at CSPi working in the
field of embedded computing.
Eran holds a B.A. in Computer Information Science and Psychology from the
University of California at Santa Cruz (Stevenson College) and an M.B.A. from
Northeastern University. He is a proud dad and enjoys hiking, travel and watching
the New England Patriots.
2 Eggs
¼ cup Butter
INSTRUCTIONS
1. Beat eggs with salt and pepper in a medium bowl. Combine breadcrumbs and cheese in a small
bowl.
6. Cook over medium heat 2-3 min. each side or until chicken has a light golden crust.
10. If the sauce looks too dry, add a little more wine.
INGREDIENTS — Serves 2 to 3
2 tablespoons milk
½ cup water
¾ cup peeled sweet potato, sliced into ¼ inch thick bite-size pieces.
½ cup water
MARINATE CHICKEN:
• Combine the chicken, milk, soy sauce, and ground black pepper in a bowl and mix all together with
a spoon.
• Add onion, carrot, green chili pepper, sweet potato, rice cake, and perilla leaves in that order.
• Add the chicken in the center. Pour the seasoning sauce over the chicken and spread it with a
wooden spoon. Add 1/2 cup water.
• Cover and cook for 3 to 4 minutes over medium-high heat until it starts boiling. Turn down the heat
to medium. Open and stir with a (DataKitchen) wooden spoon so that the pan doesn’t burn and
the ingredients and sauce mix evenly. Cover and cook another 13 to 15 minutes over medium heat,
stirring occasionally until the chicken and sweet potato are cooked thoroughly.
• Keep the heat low during the meal. Cook, stir, eat, and talk. The pieces will be hot, so be careful!
Turn off the heat when the chicken and potato are totally cooked.
• Give a bowl to each diner. They can each take some out of the pan into their bowl, and eat. When
it’s almost totally finished, make some fried rice by adding some rice and chopped kimchi to what’s
left on the grill. Stir with a wooden spoon over medium heat for a few minutes. Serve in separate
bowls, or give everyone a spoon and let them eat from the pan together.
Attribution: Maangchi
This 1960s-style recipe combines processed ingredients into a fast and easy dish.
INGREDIENTS
1-2 Chickens cut in quarters or pieces or parts that you like
INSTRUCTIONS
1. Mix ingredients in a baking dish
2. Roll chicken in it
48 Lady Fingers
INSTRUCTIONS
1. Place a mixing bowl in the freezer to chill. Combine 3 cups of strong, cooled, brewed coffee and ¼
cup of Kahlua in a container. Set aside.
2. Using a stand mixer combine mascarpone, cream cheese, white sugar, brown sugar, and
remaining ¼ cup of Kahlua. Beat until smooth.
3. Dip a ladyfinger into the coffee-Kahlua mixture. Place the ladyfinger dipped-side down in a 13" x 9"
pan. Repeat until the bottom of the pan is covered in dipped ladyfingers.
4. Spread a layer of the cheese mixture over the ladyfingers. Dust with cocoa powder. Repeat for a
second layer. Set aside.
5. Remove the chilled mixing bowl from the freezer; using a stand mixer combine heavy whipping
cream, vanilla and confectioners’ sugar. Using the whisk attachment beat on medium speed until
soft peaks form and the mixture is firm. Do not overbeat.
6. Spread whipped cream on top of the pan containing the ladyfinger/cheese mixture. Dust with
cocoa powder.
Keep refrigerated.
The authors of “Recipes for DataOps” have understood what most of the industry has yet to
learn - the key to data success lies not in having large data science teams or the latest
machinery, components, and tools, but in establishing efficient, value-driven work
processes. This book is a great step-by-step guide to unlocking that capability and
achieving a DataOps culture - the data and AI equivalent of Lean manufacturing. It is
a long journey, but rewarding from the start. This book is one of the few good DataOps
guides available, and I recommend it to everyone who works with data on a daily
basis - data engineers, analysts, data scientists, product owners, and data team
managers. Moreover, that Māori slow-cooked pork seems delicious.
— Lars Albertsson
Founder of Scling
Chris Bergh, Eran Strod, and James Royster have written a unique book that is the go-to
guide for DataOps transformation. It covers an impressive breadth and scope of
topics and explains them in a highly accessible way. If your organization is in any way
struggling to deliver high-quality data analytics at speed, you owe it to yourself to read
this book; there is something in it for everyone to learn.
— Harvinder Atwal
Author, Practical DataOps: Delivering Agile Data Science at Scale
DataOps is one of the most important innovations in the data industry in the last decade.
It will transform how your organization delivers analytic capabilities, drives value, and
shifts to data-supported decisions. The latest book from DataKitchen is the “how-to”
manual that you need to start your DataOps transformation.
— Laura Madsen
Author, Disrupting Data Governance
Takes the path to success with DataOps to a whole new level of understanding. There are
so many actionable insights in the book.
— Jesse Anderson
Author, Data Teams
This book is a great read and really important for any organization that wants to transform
with DataOps rather than tinker around the edges. The book covers important concepts
critical to the success of DataOps such as the Theory of Constraints, Process Measurement,
and DataGovOps. It clarifies the full approach – business requirements first, tools second
– so you are not creating more constraints before you have even started. This is a
must-read for anyone open to finding better ways of working through DataOps.
— Simon Trewin
Author, The DataOps Revolution: Delivering the Data-Driven Enterprise
www.datakitchen.io