DevOpsSec
by Jim Bird
Copyright 2016 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
First Edition
978-1-491-95899-5
[LSI]
Table of Contents
5. Compliance as Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
    Defining Policies Upfront  70
    Automated Gates and Checks  70
    Managing Changes in Continuous Delivery  71
    Separation of Duties in the DevOps Audit Toolkit  72
    Using the Audit Defense Toolkit  73
    Code Instead of Paperwork  73
CHAPTER 1
Introduction
Some people see DevOps as another fad, the newest new-thing overhyped by Silicon Valley and by enterprise vendors trying to stay relevant. But others believe it is an authentically disruptive force that is radically changing the way that we design, deliver, and operate systems.

In the same way that Agile, Test-Driven Development (TDD), and Continuous Integration have changed the way that we write code and manage projects and product development, DevOps, Infrastructure as Code, and Continuous Delivery are changing IT service delivery and operations. And just as Scrum and XP have replaced CMMI and Waterfall, DevOps is replacing ITIL as the preferred way to manage IT.

DevOps organizations are breaking down the organizational silos between the people who design and build systems and the people who run them: silos that were put up because of ITIL and COBIT to improve control and stability, but that have become an impediment when it comes to delivering value for the organization.
About DevOps

This paper is written for security analysts, security engineers, pen testers, and their managers who want to understand how to make security work in DevOps. But it can also be used by DevOps engineers, developers, and testers and their managers who want to understand the same thing.

You should have a basic understanding of application and infrastructure security, as well as some familiarity with DevOps and Agile development practices and tools, including Continuous Integration and Continuous Delivery. There are several resources to help you with this. Some good places to start:

- The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford is a good introduction to the hows and whys of DevOps, and is surprisingly fun to read.
- Watch "10+ Deploys per Day," John Allspaw and Paul Hammond's presentation on Continuous Deployment, which introduced a lot of the world to DevOps ideas back in 2009.2
- And, if you want to understand how to build your own Continuous Delivery pipeline, read Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation by Jez Humble and Dave Farley.
CHAPTER 2
1 AWS re:Invent 2015 | (DVO202) DevOps at Amazon: A Look at Our Tools and Processes. https://www.youtube.com/watch?v=esEFaY0FDKc
Microservices

Microservices are another part of many DevOps success stories. Microservices (designing small, isolated functions that can be changed, tested, deployed, and managed completely independently) let developers move fast and innovate without being held up by the rest of the organization. This architecture also encourages developers to take ownership of their part of the system, from design to delivery and ongoing operations. Amazon and Netflix have had remarkable success with building their systems, as well as their organizations, around microservices.
But the freedom and flexibility that microservices enable come with some downsides:

Operational complexity
Understanding an individual microservice is simple (that's the point of working this way). Understanding and mapping traffic flows and runtime dependencies between different microservices, and debugging runtime problems or trying to prevent cascading failures, is much harder. As Michael Nygard says: "An individual microservice fits in your head, but the interrelationships among them exceed any human's understanding."

Attack surface
The attack surface of any microservice might be tiny, but the total attack surface of the system can be enormous and hard to see. Unlike a tiered web application, there is no clear perimeter, no obvious choke points where you can enforce authentication or access control rules. You need to make sure that trust boundaries are established and consistently enforced.

The polyglot programming problem
If each team is free to use what they feel are the right tools for the job (like at Amazon), it can become extremely hard to understand and manage security risks across many different languages and frameworks. Unless all of the teams agree to standardize on a consistent activity logging strategy, forensics and auditing across different services with different logging approaches can be a nightmare.
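One practical mitigation is to agree on a small, structured logging contract that every service emits no matter what language it is written in. Here is a minimal sketch in Python (the field names and the service name are illustrative, not prescribed anywhere in this paper):

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object so that logs from many
    different services can be aggregated and searched consistently."""
    def format(self, record):
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "service": getattr(record, "service", "unknown"),
            "level": record.levelname,
            "event": record.getMessage(),
        })

logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Every service logs the same shape of record, whatever its language.
logger.warning("login.failed", extra={"service": "payments"})
```

Even a contract this small gives forensics and audit tooling one consistent schema to query across every service.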
Containers

Containers (LXC, rkt, and especially Docker) have exploded in DevOps.

Container technologies like Docker make it much easier for developers to package and deploy all of the runtime dependencies that their application requires. This eliminates the "works on my machine" configuration management problem, because you can ship the same runtime environment from development to production along with the application.

Using containers, operations can deploy and run multiple different stacks on a single box with much less overhead and less cost than using virtual machines. Used together with microservices, this makes it possible to support microsegmentation; that is, individual microservices each running in their own isolated, individually managed runtime environments.
Containers have become so successful, Docker in particular, because they make packaging and deployment workflows easy for developers and for operations. But this also means that it is easy for developers, and for operations, to introduce security vulnerabilities without knowing it.

The ease of packaging and deploying apps using containers can also lead to unmanageable container sprawl, with many different stacks (and different configurations and versions of these stacks) deployed across many different environments. Finding them all (even knowing to look for them in the first place), checking them for vulnerabilities, and making sure they are up-to-date with the latest patches can become overwhelming.

And while containers provide some isolation and security protection by default, helping to reduce the attack surface of an application, they also introduce a new set of security problems. Adrian Mouat, author of Using Docker, lists five security concerns with using Docker that you need to be aware of and find a way to manage:
Kernel exploits
The kernel is shared between the host and all of the containers, which means that a vulnerability in the kernel exposes everything running on the machine to attack.
2 http://www.kitchensoap.com/2009/06/23/slides-for-velocity-talk-2009/
Change Control
How can you prove that changes are under control if developers are
pushing out changes 10 or 50 times each day to production? How
does a Change Advisory Board (CAB) function in DevOps? How
and when is change control and authorization being done in an
environment where developers push changes directly to production?
How can you prove that management was aware of all these changes
before they were deployed?
ITIL change management and the associated paperwork and meetings were designed to deal with big changes that were few and far between. Big changes require you to work out operational dependencies in advance and to understand operational risks and how to mitigate them, because big, complex changes done infrequently are risky. In ITIL, smaller changes were the exception and flowed under the bar.
DevOps reverses this approach to change management by optimizing for small and frequent changes: breaking big changes down into small incremental steps, and streamlining and automating how these small changes are managed. Compliance and risk management need to change to fit with this new reality.
CHAPTER 3
Now let's look at how to solve these problems and challenges, and how you can wire security and compliance into DevOps.
and you can fix it, fix it and ship the fix out right away. Security engineers don't throw problems over the wall to dev or ops if they don't have to. They work with other teams to understand problems and get them fixed, or fix the problem themselves if they can. Everyone uses the Continuous Deployment pipelines and the same tools to push changes out to production, including the security team.

Security cannot be a blocker. The word "No" is a finite resource; use it only when you must. Security's job is to work with development and operations to help them to deliver, but deliver safely. This requires security to be practical and make realistic trade-offs when it comes to security findings. Is this problem serious enough that it needs to block code from shipping now? Can it be fixed later? Or, does it really need to be fixed at all? Understand the real risk to the system and to the organization and deal with problems appropriately. By not crying wolf, the security team knows that when serious problems do come up, they will be taken seriously by everyone.
Etsy's security team takes a number of steps to build relationships between the security team and engineering.

"Designated Hackers" is a system by which each security engineer supports four or five development teams across the organization and is involved in design and standups. The security engineer tries to understand what these teams are doing and raises a signal if a security risk or question comes up that needs to be resolved. They act as a channel and advocate between security and the development teams. This helps to build relationships, and builds visibility into design and early-stage decisions, when security matters most.

Every new engineering hire participates in one-week boot camps where they can choose to work with the security team to understand what they do and help to solve problems. And each year every engineer does a "senior rotation" where they spend a month with another team and can choose to work with the security team. These initiatives build understanding and relationships between organizations and seed security champions in engineering.1
Secure by Default

Shifting Security Left begins by making it easy for engineers to write secure code and difficult for them to make dangerous mistakes: wiring secure defaults into their templates and frameworks, and building in the proactive controls listed previously. You can prevent SQL injection at the framework level by using parameterized queries, hide or simplify the output encoding work needed to protect applications from XSS attacks, enforce safe HTTP headers, and provide simple and secure authentication functions. You can do all of this in ways that are practically invisible to the developers using the framework.
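To make the parameterized-query point concrete, here is a minimal sketch using Python's standard sqlite3 module (the users table and the find_user helper are invented for illustration):

```python
import sqlite3

def find_user(conn, username):
    # The ? placeholder sends the value separately from the SQL text,
    # so attacker-controlled input is never parsed as SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('alice')")

# A classic injection payload comes back as plain data and simply
# matches no rows, instead of rewriting the query.
print(find_user(conn, "alice"))
print(find_user(conn, "' OR '1'='1"))
```

When the framework only exposes this style of query, the safe path is also the easy path, which is the whole idea of secure by default.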
tive findings. Security checks become just another part of their coding cycle.2
2 Put your Robots to Work: Security Automation at Twitter. OWASP AppSec USA
2012, https://vimeo.com/54250716
3 http://www.slideshare.net/jallspaw/go-or-nogo-operability-and-contingency-planning-at-etsycom
CHAPTER 4
Continuous Delivery
Agile ideas and principles (working software over documentation, frequent delivery, face-to-face collaboration, and a focus on technical excellence and automation) form the foundation of DevOps. And Continuous Delivery, which is the control framework for DevOps, is also built on top of a fundamental Agile development practice: Continuous Integration.

In Continuous Integration, each time a developer checks in a code change, the system is automatically built and tested, providing fast and frequent feedback on the health of the code base. Continuous Delivery takes this to the next step.
Continuous Delivery is not just about automating the build and unit testing, which are things that the development team already owns. Continuous Delivery is provisioning and configuring test environments to match production as closely as possible, automatically. This includes packaging the code and deploying it to test environments; running acceptance, stress, and performance tests, as well as security tests and other checks, with pass/fail feedback to the team, all automatically; and auditing all of these steps and communicating status to a dashboard. Later, you use the same pipeline to deploy the changes to production.

Continuous Delivery is the backbone of DevOps and the engine that drives it. It provides an automated framework for making software and infrastructure changes, pushing out software upgrades, patches, and changes to configuration in a way that is repeatable, predictable, efficient, and fully audited.
Putting a Continuous Delivery pipeline together requires a high degree of cooperation between developers and operations, and a much greater shared understanding of how the system works, what production really looks like, and how it runs. It forces teams to begin talking to one another, exposing and exploring details about how they work and how they want to work.

There is a lot of work that needs to be done: understanding dependencies, standardizing configurations, and bringing configuration into code; cleaning up the build (getting rid of inconsistencies, hardcoding, and jury-rigging); putting everything into version control: application code and configuration, binary dependencies, infrastructure configuration (recipes, manifests, playbooks, CloudFormation templates, and Dockerfiles), database schemas, and configurations for the Continuous Integration/Continuous Delivery pipeline itself; and, finally, automating testing (getting all of the steps for deployment together and automating them carefully). And you may need to do all of this in a heterogeneous environment, with different architectures and technology platforms and languages.
the tooling on its own using scripts and simple workflow conventions, before today's DevOps tools were available.
Precommit

These are the steps before and until a change to software or configuration is checked in to the source code repo. Additional security checks and controls to be added here include the following:

- Lightweight, iterative threat modeling and risk assessments
- Static analysis (SAST) checking in the engineer's IDE
- Peer code reviews (for defensive coding and security vulnerabilities)

1 For software that is distributed externally, this should involve signing the code with a code-signing certificate from a third-party CA. For internal code, a hash should be enough to ensure code integrity.
Acceptance Stage

This stage is triggered by a successful commit. The latest good commit build is picked up and deployed to an acceptance test environment. Automated acceptance (functional, integration, performance, and security) tests are executed. To minimize the time required, these tests are often fanned out to different test servers and executed in parallel. Following a fail-fast approach, the more expensive and time-consuming tests are left until as late as possible in the test cycle, so that they are only executed if other tests have already passed.

Security controls and tests in this stage include the following:

- Secure, automated configuration management and provisioning of the runtime environment (using tools like Ansible, Chef, Puppet, Salt, and/or Docker). Ensure that the test environment is clean and configured to match production as closely as possible.
- Automatically deploy the latest good build from the binary artifact repository.
- Smoke tests (including security tests) designed to catch mistakes in configuration or deployment.
- Targeted dynamic scanning (DAST).
- Automated functional and integration testing of security features.
- Automated security attacks, using Gauntlt or other security tools.
- Deep static analysis scanning (can be done out of band).
- Fuzzing (of APIs, files). This can be done out of band.
- Manual pen testing (out of band).
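A smoke test for secure configuration can be as simple as asserting that every response from the freshly deployed environment carries the safe HTTP headers you expect. A minimal sketch in Python (the required header set and the helper function are illustrative assumptions, not a standard; a real test would check live responses from the test environment):

```python
# Headers this hypothetical deployment is required to send.
REQUIRED_HEADERS = {
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Content-Security-Policy",
}

def missing_security_headers(response_headers):
    """Return the set of required security headers absent from a response."""
    present = set(response_headers)
    return REQUIRED_HEADERS - present

# Example: a deployment that forgot to configure a CSP header.
headers = {
    "Strict-Transport-Security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
}
print(missing_security_headers(headers))  # {'Content-Security-Policy'}
```

A check like this runs in seconds and fails the pipeline immediately on a configuration or deployment mistake, which is exactly what a smoke test is for.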
2 Agile Security Field of Dreams. Laksh Raghavan, PayPal, RSA Conference 2016.
https://www.rsaconference.com/events/us16/agenda/sessions/2444/agile-security-field-ofdreams
3 At Netflix, where they follow a similar risk-assessment process, this is called the "paved road," because the path ahead should be smooth, safe, and predictable.
new ports or new APIs, adding new data stores, making calls out to new services?

- Are you changing authentication logic or access control rules or other security plumbing?
- Are you adding data elements that are sensitive or confidential?
- Are you changing code that has anything to do with secrets or sensitive or confidential data?

Answering these questions will tell you when you need to look more closely at the design or technology, or when you should review and verify trust assumptions. The key to threat modeling in DevOps is recognizing that because design and coding and deployment are done continuously in a tight, iterative loop, you will be caught up in the same loops when you are assessing technical risks. This means that you can make, and you need to make, threat modeling efficient, simple, pragmatic, and fast.
and a lot of this code has serious problems in it. Sonatype looked at 17 billion download requests from 106,000 different organizations in 2014. Here's what it found:

Large software and financial services companies are using an average of 7,600 suppliers. These companies sourced an average of 240,000 software parts in 2014, of which 15,000 included known vulnerabilities.
With just a little training, developers can learn to look out for bad practices like hardcoding credentials or attempts at creating custom crypto. With more training, they will be able to catch more vulnerabilities early on in the process.

In some cases (for example, session management, secrets handling, or crypto), you might need to bring in a security specialist to examine the code. Developers can be encouraged to ask for security code reviews. You can also identify high-risk code through simple static code scanning, looking for specific strings such as credentials and dangerous functions like crypto functions and crypto primitives.

To identify high-risk code, Netflix maps out call sequences for microservices. Any services that are called by many other services or that fan out to many other services are automatically tagged as high risk. At Etsy, as soon as high-risk code is identified through reviews or scanning, they hash it and create a unit test that automatically alerts the security team when the code hash value has been changed.
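The Etsy technique described above can be sketched in a few lines of Python: fingerprint the high-risk file, pin the hash in a unit test, and treat any change as a signal to re-review. (The file path, the pinned value, and the alerting behavior here are all illustrative, not Etsy's actual implementation.)

```python
import hashlib
from pathlib import Path

def code_fingerprint(path):
    """SHA-256 of a source file, used to detect changes to high-risk code."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# In a real repo this would be the pinned hash of, say, the session
# management module, updated deliberately after each security review.
PINNED = "<hash recorded at the last security review>"

def check_high_risk_code_unchanged(path, pinned):
    # Failing this check is the alert: it tells the security team that
    # high-risk code changed and needs to be looked at again.
    if code_fingerprint(path) != pinned:
        raise AssertionError(
            "High-risk code changed since last security review: " + str(path)
        )
```

Because the check runs with the rest of the unit tests, a change to high-risk code surfaces on the very next build rather than months later.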
Code review practices also need to be extended to infrastructure code: to Puppet manifests and Chef cookbooks and Ansible playbooks, Dockerfiles, and CloudFormation templates.
will sometimes miss, and that might be hard to find through other
kinds of testing.
But rather than relying on a centralized security scanning factory run by infosec, DevOps organizations like Twitter and Netflix implement self-service security scanning for developers, fitting SAST scanning directly into different places along the engineering workflow.

Developers can take advantage of built-in checkers in their IDE, using plug-ins like FindBugs or Find Security Bugs, or commercial plug-ins from Coverity, Klocwork, HPE Fortify, Checkmarx, or Cigital's SecureAssist, to catch security problems and common coding mistakes as they write code.
You can also wire incremental static analysis precommit and commit checks into Continuous Integration to catch common mistakes and antipatterns quickly by only scanning the code that was changed. Full system scanning might still be needed to catch interprocedural problems that some incremental scans can't find. You will need to run these scans, which can take several hours or sometimes days to run on a large code base, outside of the pipeline. But the results can still be fed back to developers automatically, into their backlog or through email or other notification mechanisms.
Different kinds of static code scanning tools offer different value:

- Tools that check for code consistency, maintainability, and clarity (PMD and Checkstyle for Java, Ruby-lint for Ruby) help developers to write code that is easier to understand, easier to review, and safer to change.
- Tools that look for common coding bugs and bug patterns (tools like FindBugs and RuboCop) will catch subtle logic mistakes and errors that could lead to runtime failures or security vulnerabilities.
- Tools that identify security vulnerabilities through taint analysis, control flow and data flow analysis, pattern analysis, and other techniques (Find Security Bugs, Brakeman) can find many common security issues such as mistakes in using crypto functions, configuration errors, and potential injection vulnerabilities.
You should not rely on only one tool; even the best tools will catch only some of the problems in your code. Good practice would be to run at least one of each kind to look for different problems in the code, as part of an overall code quality and security program.
There are proven SAST tools available today for popular languages like Java, C/C++, and C#, as well as for common frameworks like Struts and Spring and .NET, and even for some newer languages and frameworks like Ruby on Rails. But it's difficult to find tool support for other new languages such as Golang, and it's especially difficult to find for dynamic scripting languages. Most static analyzers for these languages, especially open source tools, are still limited to linting and basic checking for bad practices, which helps to make for better code but isn't enough to ensure that your code is secure.

Static analysis checking tools for configuration management languages (like Foodcritic for Chef or puppet-lint for Puppet) are also limited to basic checking for good coding practices and some common semantic mistakes. They help to ensure that the code works, but they won't find serious security problems in your system configuration.
SonarQube

SonarQube wraps multiple SAST scanning tools for multiple languages. It originally wrapped open source tools, and now includes proprietary checkers written by SonarSource. Some of these checkers, for languages like Objective-C and Swift, C/C++, and other legacy languages, are only available in the commercial version. But there is good support in the open source version of SonarQube for Java, JavaScript, PHP, and other languages like Erlang.

One of SonarQube's main purposes is to assess and track technical debt in applications. This means that most of the code checkers are focused on maintainability (for style and coding conventions) as well as on coding correctness and common bug patterns. However, SonarSource has recently started to include more security-specific checkers, especially for Java.

SonarQube runs in Continuous Integration/Continuous Delivery, with plug-ins for Jenkins and GitHub. You can set quality gates and automatically notify the team when the quality gates fail. It collects metrics and provides reports and dashboards to analyze these metrics over time, to identify where bugs are clustered and to compare metrics across projects.
To ensure that the feedback loops are effective, it's important to tune these tools to minimize false positives and provide developers with clear, actionable feedback on real problems that need to be fixed. Noisy checkers that generate a lot of false positives and that need review and triage can still be run periodically, with the results fed back to development after they have been picked through.
Then, write negative tests for these cases: tests which prove that unauthenticated users can't access admin functions, that a user can't see or change information for a different account, that they can't tamper with a return value, and so on. If these tests fail, something is seriously broken and you want to learn about it as early as possible.
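Negative tests like these can be expressed against whatever authorization layer you have. The sketch below invents a toy can_access() check purely for illustration; the point is that the tests assert what must not be allowed:

```python
# Hypothetical role- and account-based access check, standing in for
# your real authorization layer.
ADMIN_ONLY = {"/admin/users", "/admin/config"}

def can_access(user, resource):
    if resource in ADMIN_ONLY:
        return user.get("role") == "admin"
    # Non-admin resources are scoped to the owning account.
    return user.get("account") == resource.rsplit("/", 1)[-1]

def test_unauthenticated_user_cannot_reach_admin():
    anonymous = {}
    for resource in ADMIN_ONLY:
        assert not can_access(anonymous, resource)

def test_user_cannot_read_another_account():
    mallory = {"role": "user", "account": "mallory"}
    assert not can_access(mallory, "/accounts/alice")
    assert can_access(mallory, "/accounts/mallory")

test_unauthenticated_user_cannot_reach_admin()
test_user_cannot_read_another_account()
```

Because these are ordinary unit tests, they run on every build; a failure means an access-control hole just opened, and the pipeline stops.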
Automated Attacks

Even with these tests in place, you should still go further and try to attack your system. Bad guys are going to attack your system, if they haven't done so already. You should try to beat them to it.

There are a few test frameworks that are designed to make this easy and that behave well in Continuous Integration and Continuous Delivery:

- Gauntlt
- Mittn
- BDD-Security
Using one of these tools, you will be able to set up and run a basic
set of targeted automated pen tests against your system as part of
your automated test cycle.
Vulnerability Management

Infosec needs its own view into the pipeline and into the system, and across all of the pipelines and systems and portfolios, to track vulnerabilities, assess risk, and understand trends. You need metrics for compliance and risk-management purposes, to understand where you need to prioritize your testing and training efforts, and to assess your application security program.

Collecting data on vulnerabilities lets you ask some important questions:

- How many vulnerabilities have you found?
- How were they found? What tools or testing approaches are giving you the best returns?
- What are the most serious vulnerabilities?
- How long are they taking to get fixed? Is this getting better or worse over time?
You can get this information by feeding security testing results from
your Continuous Delivery pipelines into a vulnerability manager,
such as Code Dx or ThreadFix.
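The questions above are simple aggregations over findings data, which is exactly what a vulnerability manager automates. A toy sketch in Python (the findings records and their fields are invented for illustration):

```python
from datetime import date
from statistics import mean

# Invented example findings; a vulnerability manager would consolidate
# these from SAST/DAST tools and pen tests across all your pipelines.
findings = [
    {"severity": "high", "source": "SAST",
     "opened": date(2016, 1, 4), "closed": date(2016, 1, 9)},
    {"severity": "medium", "source": "DAST",
     "opened": date(2016, 1, 11), "closed": date(2016, 2, 1)},
    {"severity": "high", "source": "pen test",
     "opened": date(2016, 2, 2), "closed": None},
]

# How many, and which testing approaches are finding them?
total = len(findings)
by_source = {}
for f in findings:
    by_source[f["source"]] = by_source.get(f["source"], 0) + 1

# What is still open and serious, and how long do fixes take?
open_high = sum(1 for f in findings
                if f["severity"] == "high" and f["closed"] is None)
days_to_fix = mean((f["closed"] - f["opened"]).days
                   for f in findings if f["closed"])

print(total, by_source, open_high, days_to_fix)
```

Tracked over time, the same aggregations show whether time-to-remediation is improving or getting worse.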
ThreadFix

ThreadFix is a vulnerability management tool (available in open source and enterprise versions) that consolidates vulnerability information to provide a view into vulnerability risks and remediation across tools, pipelines, and apps, and over time. ThreadFix automatically takes vulnerability findings from SAST and DAST tools (and manual pen tests), deduplicates the results, and lets an analyst review and triage vulnerabilities and easily turn them into bug reports that can be fed back into a developer's bug tracking system or IDE. ThreadFix is designed to facilitate the feedback loop from testing to developers while providing analytical and reporting tools to security analysts so that they can compare vulnerability risks across a portfolio and track program performance, such as time to remediation.

The open source engine includes integration with different testing tools, Continuous Integration/Continuous Delivery servers, and development toolsets. The commercial enterprise version offers
Hardening.io

Hardening.io is an open source infrastructure hardening framework from Deutsche Telekom for Linux servers. It includes practical hardening steps for the base OS and common components such as ssh, Apache, nginx, MySQL, and Postgres.

Hardening templates are provided for Chef and Puppet as well as Ansible (only the base OS and ssh are currently implemented in Ansible). The hardening rules are based on Deutsche Telekom's internal guidelines, BetterCrypto, and the NSA hardening guide.
Security in Production

Security doesn't end after systems are in production. In DevOps, automated security checks, continuous testing, and monitoring feedback loops are integral parts of production operations.
Signal Sciences

Signal Sciences is a tech startup that offers a next-generation SaaS-based application firewall for web systems. It sets out to "make security visible" by providing increased transparency into attacks in order to understand risks. It also provides the ability to identify anomalies and block attacks at runtime.

Signal Sciences was started by the former leaders of Etsy's security team. The firewall takes advantage of the ideas and techniques that they developed for Etsy. It is not signature-based like most web application firewalls (WAFs). It analyzes traffic to detect attacks, and aggregates attack signals in its cloud backend to determine when to block traffic. It also correlates attack signals with runtime errors to identify when the system might be in the process of being breached.

Attack data is made visible to the team through dashboards, alert notifications over email, or through integration with services like Slack, HipChat, PagerDuty, and Datadog. The dashboards are built API-first so that data can be integrated into log analysis tools like Splunk or ELK, or into tools like ThreadFix or Jira.

The firewall and its rules engine are being continuously improved and updated, through Continuous Delivery.
Runtime Defense

If you can't successfully shift security left, earlier into design and coding and Continuous Integration and Continuous Delivery, you'll need to add more protection at the end, after the system is in production. Network IDS/IPS tools like Tripwire or signature-based WAFs aren't designed to keep up with rapid system and technology changes in DevOps. This is especially true for cloud IaaS
Alert Logic
CloudPassage Halo
Dome9 SecOps
Evident.io
Illumio
Palerra LORIC
Threat Stack
Another kind of runtime defense technology is Runtime Application Security Protection/Self-Protection (RASP), which uses runtime instrumentation to catch security problems as they occur. Like application firewalls, RASP can automatically identify and block attacks. And like application firewalls, you can extend RASP to legacy apps for which you don't have source code.
But unlike firewalls, RASP is not a perimeter-based defense. RASP instruments the application runtime code and can identify and block attacks at the point of execution. Instead of creating an abstract model of the code (like static analysis tools), RASP tools have visibility into the code and runtime context, and use taint analysis, data flow and control flow analysis, and lexical analysis techniques, directly examining data variables and statements to detect attacks. This means that RASP tools have a much lower false positive (and false negative) rate than firewalls.

You can also use RASP tools to inject logging and auditing into legacy code to provide insight into the running application and attacks against it. They trade off runtime overheads and runtime costs against the costs of making coding changes and fixes upfront.
There are only a small number of RASP solutions available today, mostly limited to applications that run in the Java JVM and .NET CLR, although support for other languages like Node.js, Python, and Ruby is emerging. These tools include the following:

- Immunio
- Waratek
- Prevoty
Contrast Security

Contrast is an Interactive Application Security Testing (IAST) and RASP solution that directly instruments running code and uses control flow and data flow analysis and lexical analysis to trace and catch security problems at the point of execution. In IAST mode, Contrast can run on a developer's workstation, in a test environment, or in Continuous Integration/Continuous Delivery to alert if a security problem like SQL injection or XSS is found during functional testing, all while adding minimal overhead. You can automatically find security problems simply by executing the code; the more thorough your testing, and the more code paths you cover, the more chances you have to find vulnerabilities. And because these problems are found as the code is executing, the chances of false positives are much lower than with static analysis.

Contrast deduplicates findings and notifies you of security bugs through different interfaces such as email, Slack, or HipChat, or by recording a bug report in Jira. In RASP mode, Contrast runs in production to trace and catch the same kinds of security problems, and then alerts operations or automatically blocks the attacks.

It works in Java, .NET (C# and Visual Basic), Node.js, and a range of runtime environments.
Security in Production
The Blue Team is made up of the people who are running, supporting,
and monitoring the system. Their responsibility is to identify
when an attack is in progress, understand the attack, and come up
with ways to contain it. Their success is measured by the Mean Time
to Detect the attack and by their ability to work together to come up
with a meaningful response.
Here are the goals of these exercises:
Identify gaps in testing and in design and implementation by
hacking your own systems to find real, exploitable vulnerabilities.
Exercise your incident response and investigation capabilities,
and identify gaps or weaknesses in monitoring and logging, in
playbooks, and in escalation procedures and training.
Build connections between the security team and development
and operations by focusing on the shared goal of making the
system more secure.
After a Game Day or Red Team exercise, just like after a real
production outage or a security breach, the team needs to get together
to understand what happened and learn how to get better. They do
this in Blameless Postmortem reviews. Here, everyone meets in an
open environment to go over the facts of the event: what happened,
when it happened, how people reacted, and then what happened
next. By focusing calmly and objectively on understanding the facts
and on the problems that came up, the team can learn more about
the system, about themselves, and about how they work; they can
begin to understand what went wrong, ask why things went wrong,
and look for ways to improve, whether in how the system is
designed, how it is tested, how it is deployed, or how it is run.
To be successful, you need to create an environment in which people
feel safe to share information, be honest, truthful, and transparent,
and think critically without being criticized or blamed (what
Etsy calls a Just Culture). This requires buy-in from management
down, understanding and accepting that accidents can and will
happen and that they offer an important learning opportunity. When
done properly, Blameless Postmortems not only help you to learn
from failures and understand and resolve important problems, but
they can also bring people together and reinforce openness and
trust, making the organization stronger.9
Security at Netflix
Netflix is another of the DevOps unicorns. Like Etsy, Amazon, and
Facebook, it has built its success on a culture of Freedom and
Responsibility (employees, including engineers, are free to do what
they think is the right thing, but they are also responsible for the
outcome) and a massive commitment to automation, especially in
security.
After experiencing serious problems running its own IT infrastructure,
Netflix made the decision to move its online business to the
cloud. It continues to be one of the largest users of Amazon's AWS
platform.
Netflix's approach to IT operations is sometimes called "NoOps"
because they don't have operations engineers or system admins.
They have effectively outsourced that part of their operations to
Amazon AWS because they believe that data center management
and infrastructure operations are "undifferentiated heavy lifting": or,
put another way, work that is hard to do right but that does not add
direct value to their business.
Here are the four main pillars of Netflix's security program:10
Undifferentiated heavy lifting and shared responsibility
Netflix relies heavily on the capabilities of AWS and builds on or
extends these capabilities as necessary to provide additional
security and reliability features. It relies on its cloud provider for
automated provisioning, platform vulnerability management,
data storage and backups, and physical data center protections.
Netflix built its own PaaS layer on top of this, including an
extensive set of security checks and analytic and monitoring
services. Netflix also bakes secure defaults into its base
infrastructure images, which are used to configure each instance.
blameless-postmortems/
10 See "Splitting the Check on Compliance and Security: Keeping Developers and Auditors
Happy in the Cloud." Jason Chan, Netflix, AWS re:Invent, October 2015. https://www.youtube.com/watch?v=Io00_K4v12Y
Traceability in development
Source control, code reviews through Git pull requests, and the
Continuous Integration and Continuous Delivery pipeline
provide a complete trace of all changes from check-in to deployment.
Netflix uses the same tools to track information for its
own support purposes as well as for auditors, instead of wasting
time creating audit trails just for compliance purposes. Engineers
and auditors both need to know who made what changes
when, how the changes were tested, when they were deployed,
and what happened next. This provides visibility and traceability
for support and continuous validation of compliance.
Continuous security visibility
Recognize that the environment is continuously changing, and
use automated tools to identify and understand security risks
and to watch for and catch problems. Netflix has written a set of
its own tools to do this, including Security Monkey, Conformity
Monkey, and Penguin Shortbread (which automatically identifies
microservices and continuously assesses the risk of each service
based on runtime dependencies).
Compartmentalization
Take advantage of cloud account segregation, data tokenization,
and microservices to minimize the system's attack surface and
contain attacks, and implement least privilege access policies.
Recognizing that engineers will generally ask for more privileges
than they need "just in case," Netflix has created an automated
tool called Repoman, which uses AWS CloudTrail activity
history to reduce account privileges to what is actually
needed, based on what each account has done over a period of
time. Compartmentalization and building up bulkheads also
contain the blast radius of a failure, reducing the impact on
operations when something goes wrong.
Whether you are working in the cloud or following DevOps in your
own data center, these principles are all critical to building and
operating a secure and reliable system.
CHAPTER 5
Compliance as Code
Chef Compliance
Chef Compliance is a tool from Chef that scans infrastructure and
reports on compliance issues, security risks, and outdated software.
It provides a centrally managed way to continuously and automatically
check and enforce security and compliance policies.
Compliance profiles are defined in code to validate that systems are
configured correctly, using InSpec, an open source testing framework
for specifying compliance, security, and policy requirements.
You can use InSpec to write high-level, documented tests/assertions
to check things such as password complexity rules, database
configuration, whether packages are installed, and so on. Chef
Compliance comes with a set of predefined profiles for Linux and
Windows environments as well as common packages like Apache,
MySQL, and Postgres.
When variances are detected, they are reported to a central dashboard
and can be automatically remediated using Chef.
when the change was tested, to when it was deployed. Except for the
discipline of setting up a ticket for every change and tagging
changes with a ticket number, compliance becomes automatic and
seamless to the people who are doing the work.
Just as beauty is in the eye of the beholder, compliance is in the
opinion of the auditor. Auditors might not understand or agree with
this approach at first. You will need to walk them through it and
prove that the controls work. But that shouldn't be too difficult, as
Dave Farley, one of the original authors of Continuous Delivery,
explains:
I have had experience in several finance firms converting to
Continuous Delivery. The regulators are often wary at first, because
Continuous Delivery is outside of their experience, but once they
understand it, they are extremely enthusiastic. So regulation is not
really a barrier, though it helps to have someone that understands
the theory and practice of Continuous Delivery to explain it to
them at first.
If you look at the implementation of a deployment pipeline, a core
idea in Continuous Delivery, it is hard to imagine how you could
implement such a thing without great traceability. With very little
additional effort the deployment pipeline provides a mechanism for
a perfect audit trail. The deployment pipeline is the route to
production. It is an automated channel through which all changes are
released. This means that we can automate the enforcement of
compliance regulations: "No release if a test fails," "No release if a
trading algorithm wasn't tested," "No release without sign-off by an
authorised individual," and so on. Further, you can build in mechanisms
that audit each step, and any variations. Once regulators see
this, they rarely wish to return to the bad old days of paper-based
processes.2
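The enforcement Farley describes can be as simple as a scripted gate that the pipeline runs before promoting a release. The sketch below is generic rather than tied to any CI product, and the metadata field names (`tests_passed`, `signed_off_by`, `ticket`) are illustrative assumptions:

```python
def release_allowed(change):
    """Evaluate named, auditable compliance rules against a change's pipeline metadata."""
    rules = [
        ("all tests passed", change.get("tests_passed") is True),
        ("release signed off", bool(change.get("signed_off_by"))),
        ("change ticket attached", bool(change.get("ticket"))),
    ]
    failures = [name for name, ok in rules if not ok]
    return (len(failures) == 0, failures)

ok, failures = release_allowed(
    {"tests_passed": True, "signed_off_by": "alice", "ticket": "OPS-1234"}
)
print("release allowed" if ok else f"blocked: {failures}")
```

Because every rule is code, each evaluation can be logged, giving auditors per-release evidence automatically instead of through paperwork.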