Microsoft Azure Security Response in The Cloud
Abstract
Acknowledgments
Authors
Ben Ridgway
Frank Simorjay
(c) 2016 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in
this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using
it. Some examples are for illustration only and are fictitious. No real association is intended or inferred.
This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may
copy and use this document for your internal, reference purposes.
Microsoft Azure Security Incident Management
Table of Contents
ABSTRACT
ACKNOWLEDGMENTS
1 INTRODUCTION
CONCLUSION
1 Introduction
Events such as natural disasters, hardware failures, or service outages are all considered high-impact
issues, but only a limited number of these issues are considered to be security incidents. Microsoft defines
a security incident in the Online Services as illegal or unauthorized access that results in the loss,
disclosure, or alteration of Customer Data.
This white paper examines how Microsoft investigates, manages, and responds to security incidents within
Azure. Other service-impacting issues that are not security incidents are addressed by a separate response
plan (or business continuity plan), and will not be discussed in this paper.
Not a security incident:
  Routine response to security vulnerabilities that has not resulted in inappropriate disclosure of customer data
  A security issue that affects Azure but has not resulted in inappropriate disclosure of customer data
  Investigation of internal alarms or monitoring alerts which are shown to be false positives
  Operations by Azure's own Red Team
  Security issues within a customer deployment caused by a flaw or weakness introduced by the customer (failure to patch, brute force, configuration error)
  Denial-of-service (DoS) attack against Azure infrastructure or customers
  Compliance events that do not affect confidentiality, integrity, or availability of service or customer data
Security incident:
  Unauthorized access to Azure infrastructure systems and exfiltration of customer data
  Unauthorized disclosure of sensitive control data, such as credentials, encryption keys, or API keys, which could be used to alter or access customer data
  Physical intrusion into a datacenter hosting Azure properties which results in theft of unencrypted customer data
  Bug in Azure code which has resulted in malicious alteration or exposure of customer data
  Intrusion into a customer deployment caused by a flaw or weakness introduced by the Azure infrastructure
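The distinction drawn in the examples above can be expressed as a simple predicate. The following is a minimal sketch, not Microsoft's tooling: the `Event` fields and the `is_security_incident` function are hypothetical names introduced here, and the `azure_responsible` flag is an assumption used to separate flaws in Azure from flaws introduced by the customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical summary of an investigated event (illustrative fields only)."""
    description: str
    unauthorized_access: bool     # illegal or unauthorized access occurred
    customer_data_affected: bool  # Customer Data was lost, disclosed, or altered
    azure_responsible: bool       # assumption: the flaw lies in Azure, not the customer deployment


def is_security_incident(event: Event) -> bool:
    """Return True only when all three conditions hold."""
    return (
        event.unauthorized_access
        and event.customer_data_affected
        and event.azure_responsible
    )


# Examples paraphrased from the text above.
examples = [
    Event("Unauthorized access to Azure infrastructure with data exfiltration", True, True, True),
    Event("Monitoring alert shown to be a false positive", False, False, True),
    Event("Customer deployment breached through an unpatched customer VM", True, True, False),
]

for e in examples:
    print(f"{is_security_incident(e)!s:5}  {e.description}")
```

Only the first example satisfies every condition; the other two would be handled outside the Azure security incident process.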
This whitepaper is a distillation of the salient points from Microsoft's Security Incident Management
procedures for Azure. It provides you with the highlights of how the Azure Security Response team
operates during the investigation and response to security incidents.
The goal of security incident management is to identify and remediate threats quickly, investigate
thoroughly, and notify affected parties. Microsoft's process evolves constantly by tuning out false
positives and automating responses, and it includes a framework for evaluating the effectiveness of the
program.
Security incident management is an essential part of an effective risk management strategy and critical for
compliance efforts. Having clearly documented processes is crucial because it allows the business to
plan ahead for the worst, rather than figuring it out in the heat of the moment. Security incident
management is called out by multiple risk and compliance frameworks; for example, ISO/IEC 27035:2011
addresses security incident management.
A holistic security incident response plan enumerates the steps, owners, and timelines for assessing and
remediating threats using a repeatable and standardized operating procedure (SOP). Such a procedure
ensures that security staff follow a process consistently through manual or automated steps. A security
incident response plan is a living document, and it works in concert with other information security
management guidelines and standard operating procedures.
The security incident response SOP is designed to be clear and auditable. Responsibilities should map
appropriately to roles; individuals who fulfill those roles should have the experience, training, and
authority to carry out tasks designated in the plan.
In a traditional on-premises datacenter, the organization that owns the datacenter is responsible for
managing security incidents end-to-end, including the mitigation and remediation of any security
incident.
Conversely, if the same company is using an IaaS offering such as Azure Virtual Machines, security of the
physical hosts is the responsibility of Microsoft. The customer tenant can expect to be notified if it
is affected by a security incident within the infrastructure hosting that VM. Activity within the
confines of the IaaS VM is outside the service provider's scope, and is therefore a customer
responsibility.
Microsoft Azure does not monitor for or respond to security incidents within the customer's area of
responsibility. We do provide many tools (such as Azure Security Center) that can be used for this purpose.
There is also an effort to help make every service as secure as possible by default. That is, it comes with a
baseline which is already designed to provide security for most common use cases. This is not a
guarantee, however, because there is no way to predict how a service will be used. One must review these
security controls to evaluate whether they adequately mitigate risks.
As such, not all security incidents that occur in a cloud environment necessarily involve Microsoft Azure
services. A customer-only security compromise would not be processed as an Azure security incident.
Instead, the customer tenant would manage the response effort, potentially working with Microsoft
customer support (with appropriate service contracts).
These teams work closely with one another when incidents transcend these product boundaries. The
Microsoft Cyber Defense Operations Center is a single location which houses responders from all over the
company. It is a way that we can run coordinated incident response as a unified One Microsoft.
This is important to the scope of this document. For instance, Office 365 and Dynamics CRM have their
own dedicated security response teams. They are some of Azure's closest partners and are often involved
in joint investigations with Azure. The specifics of response outside Microsoft Azure are outside the scope
of this document. However, similar processes are often implemented across all Microsoft products and
services when the technology warrants.
Role                      Responsibilities
Communications Manager    Develops communication content with input from the security incident manager
                          and other experts.
                          Provides notification updates to customers.
                          Provides updates to Microsoft customers and support and service organizations.
Service Team Experts      Work with the incident response team and incident manager to diagnose and
                          remediate the identified issue.
                          Provide expertise about the operation of their own service should that service be
                          impacted.
Stage                   Description
2 Assess                An on-call incident response team member assesses the impact and severity of the
                        event. Based on evidence, the assessment may or may not result in further
                        escalation to the security response team.
3 Diagnose              Security response experts conduct the technical or forensic investigation and
                        identify containment, mitigation, and workaround strategies.
                        If the security team believes that customer data may have become exposed to an
                        unlawful or unauthorized individual, the Customer Incident Notification process
                        begins in parallel.
4 Stabilize, Recover    The incident response team creates a recovery plan to mitigate the issue. Crisis
                        containment steps such as quarantining impacted systems may occur immediately
                        and in parallel with diagnosis. Longer-term mitigations may be planned which
                        occur after the immediate risk has passed.
5 Close/Post Mortem     The incident response team creates a post-mortem that outlines the details of the
                        incident, with the intention to revise policies, procedures, and processes to prevent
                        a recurrence of the event.
Table 2. Incident response stages
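The stage progression in Table 2 can be modeled as an ordered enumeration. This is a minimal sketch under stated assumptions: `Stage` and `next_stage` are hypothetical names, `DETECT` is an assumed name for stage 1 (which does not appear in the table as shown here), and treating a non-escalated assessment as the end of the workflow is a simplification rather than documented behavior.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Stages numbered as in Table 2."""
    DETECT = 1             # assumption: stage 1 is not shown in the table
    ASSESS = 2
    DIAGNOSE = 3
    STABILIZE_RECOVER = 4
    CLOSE_POST_MORTEM = 5


def next_stage(current: Stage, escalate: bool = True) -> Optional[Stage]:
    """Return the stage that follows `current`, or None when the workflow ends."""
    if current is Stage.CLOSE_POST_MORTEM:
        return None
    if current is Stage.ASSESS and not escalate:
        # Assumption: an assessment that does not escalate leaves this workflow.
        return None
    return Stage(current + 1)


stage: Optional[Stage] = Stage.DETECT
while stage is not None:
    print(stage.value, stage.name)
    stage = next_stage(stage)
```

Crisis containment can run in parallel with diagnosis, so a real tracker would likely need more than a single linear pointer; the linear model only illustrates the numbered ordering.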
The detection processes used by Azure are designed to discover events that risk the confidentiality,
integrity, and availability of Azure services. Several events can trigger an investigation, such as:
Customer reports via the Customer Support Portal that describe suspicious activity attributed to
the Azure infrastructure (as opposed to activity occurring within the customer's scope of
responsibility)
Security vulnerabilities are reported to the Microsoft Security Response Center via
[email protected]. MSRC works with partners and security researchers around the world to
help prevent security incidents and to advance Microsoft product security.
Security Blue and Red team activity. This strategy uses a highly skilled Red team of experts to
attack potential weaknesses in Azure and the security response (Blue) team to uncover the Red
team's activity. Both Red and Blue team actions are treated as a means to verify that Azure
security response efforts are managing security incidents. Security Red team and Blue team
activities are operated under requirements of responsibility to help ensure the protection of
Customer Data.
Detections of suspicious activities by internal monitoring and diagnostic systems within the Azure
service. These alerts may come from signature-based alarms, such as antimalware or intrusion
detection, or from algorithms designed to profile expected activity and alert upon
anomalies.
Escalations from operators of Azure services. Microsoft employees are trained to identify and
escalate potential security issues.
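To illustrate what "profiling expected activity and alerting upon anomalies" can mean in practice, the sketch below applies a basic z-score test. It is a generic textbook technique chosen for brevity, not a description of the algorithms Azure uses; the function name, the threshold of 3.0, and the sample counts are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` when it lies more than `threshold` sample standard
    deviations from the mean of `baseline` (a simple z-score test)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unexpected.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Fictitious hourly counts of failed sign-ins for one service account.
hourly_failed_logins = [4, 6, 5, 7, 5, 6, 4, 5]

print(is_anomalous(hourly_failed_logins, 6))   # within the usual range
print(is_anomalous(hourly_failed_logins, 40))  # far outside the usual range
```

A signature-based alarm, by contrast, matches known-bad patterns and needs no baseline; the two approaches complement each other.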
been identified at risk in the investigation. The Security Incident Manager role is often assumed by
the Security Response Team's on-call team member. However, sometimes a more senior
manager will be pulled in to assume this role.
Throughout this process, the Security Incident Manager is ultimately responsible for managing and
tracking the investigative process. The Security Incident Manager will ensure that alerts, events, and
forensic data generated from multiple sources are investigated and cataloged. They are also responsible
for communicating with partner teams to continue the triage process, including the engineering and
operations teams to determine whether a given event may affect customers and/or production
environments.
Continue to troubleshoot the incident with the help of service teams and additional security
personnel.
Ensure that artifacts are stored in a forensically sound manner.
Document the investigation with as much technical detail as possible.
Determine whether customer data is impacted and, if so, how and which customers it belongs to.
The information gathered in this stage will be used as the basis of the stabilization and recovery effort
(stage 4) if necessary.
This phase may involve forensic examinations of impacted systems. Because investigating forensic images
can be sensitive, the ability to do so is tightly controlled and audited. The security response team works
closely with global legal advisors to help ensure that forensics are done in accordance with legal
obligations and commitments to our customers.
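One common practice for keeping artifacts forensically sound is to record a cryptographic digest at collection time so that later changes can be detected. The sketch below shows that idea only; it is an assumption about general practice, not a description of Azure's evidence handling, and `record_artifact`, `verify_artifact`, and the record fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def record_artifact(name: str, content: bytes, collected_by: str) -> dict:
    """Build a custody record whose SHA-256 digest lets later readers detect changes."""
    return {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "collected_by": collected_by,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_artifact(record: dict, content: bytes) -> bool:
    """Return True when `content` still matches the digest stored at collection time."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


evidence = b"fictitious log line: 2016-01-01T00:00:00Z login failed"
record = record_artifact("auth.log", evidence, collected_by="on-call responder")

print(verify_artifact(record, evidence))              # unchanged copy
print(verify_artifact(record, evidence + b" edited"))  # modified copy
```

A digest detects modification but does not control who can read the artifact; access control and auditing, as described above, remain separate concerns.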
If at any time the investigation determines that unauthorized or unlawful access resulted in the loss,
disclosure, or alteration of any Customer Data, the Security Incident Manager will immediately begin
executing the Customer Incident Notification Process. In the course of the investigation, Microsoft
may also determine that other compliance and security risks exist that do not result in unauthorized or
unlawful access to customer data. In those cases, the Security Incident Manager will continue driving
these issues to closure even though the Customer Incident Notification Process is not necessarily
triggered.
If necessary, take emergency mitigation steps to resolve immediate security risks associated with
the event.
Verify that customer and business risk has been successfully contained, and that corrective
measures are being implemented.
Identify additional mitigation and corrective measures and long-term solutions if needed.
Mitigation action
During the Diagnose and Stabilize stages, the response team may identify an
emergency mitigation or containment step to minimize the impact of an event. The executive incident
manager, service owner, and security incident manager may jointly choose to take immediate emergency
mitigation steps when needed. These actions may, for instance, result in a temporary
outage. Such decisions are not taken lightly. When such an aggressive mitigation occurs, the standard
processes for notifying customers of outages and recovery timelines would apply.
Technical or communications lapses, procedural failures, manual errors, and process flaws that might
have caused the security incident, or that surfaced during the post-mortem, are identified.
Identified technical lapses that are captured and can be followed up on with engineering teams.
Response procedures that are evaluated for sufficiency and completeness of operating
procedures.
Updates that may be necessary to the Security Incident Response SOP or any related security
response processes.
Internal post-mortems for security events are highly confidential records which are not available to
customers. They may, however, be summarized and included in other customer event notifications. These
reports are provided to external auditors for review as part of Azure's routine audit cycle.
The incident manager is accountable for drafting the post mortem report and maintaining an inventory of
all repair items, their owners, and completion dates.
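The inventory of repair items described above maps naturally onto a small record type. The following is a minimal sketch with a hypothetical schema: `RepairItem`, its field names, and the `overdue` helper are illustrative, and the sample entries are fictitious.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class RepairItem:
    """One follow-up action recorded in a post-mortem (hypothetical schema)."""
    title: str
    owner: str
    completion_date: date  # target date for finishing the repair
    done: bool = False


def overdue(items: List[RepairItem], today: date) -> List[RepairItem]:
    """Return open items whose completion date has already passed."""
    return [item for item in items if not item.done and item.completion_date < today]


# Fictitious inventory for illustration.
inventory = [
    RepairItem("Rotate exposed service credentials", "identity-team", date(2016, 3, 1), done=True),
    RepairItem("Add alert for anomalous key access", "monitoring-team", date(2016, 3, 15)),
    RepairItem("Update incident response SOP", "security-response", date(2016, 4, 30)),
]

late = overdue(inventory, today=date(2016, 4, 1))
for item in late:
    print(item.owner, item.title)
```

Keeping owner and date on every item is what makes the inventory auditable: each open repair can be traced to a person and a deadline.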
The goal of the customer security incident notification process is to provide impacted customers with
accurate, actionable, and timely notice when their customer data has been breached. Such notices may
also be required to meet specific legal requirements.
Generally, the process of drafting notifications occurs as the incident investigation is ongoing. The
security response team will move quickly and accurately. Additional experts in security communications
and legal are often brought in to assist with this process.
If the designated executive is satisfied that unauthorized access or a security incident has occurred, an
incident declaration will occur. This declaration triggers the process of sending official notifications.
Notification of security incidents will be delivered to the listed security contacts provided in Azure Security
Center, which can be configured by following the implementation guidelines. If contact
information is not provided in Security Center, notification will be sent to one or more of a customer's
administrators. Notification will be sent by any means Microsoft selects, including via email. Email is
considered the most desirable approach for most issues because it lets the security response team
notify many customers quickly.
To ensure that notification can be successfully delivered, the customer is responsible for ensuring that the
security contact and administrative contact information on each of their subscription(s) and online
services portal(s) is correct. Emails may also be distributed to the subscription co-administrators of the
impacted subscription(s).
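The contact fallback described above can be sketched as a short lookup. This is an illustration only: `Subscription`, its fields, and `notification_recipients` are hypothetical names rather than an Azure API, and returning every administrator is a simplification of "one or more of a customer's administrators."

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subscription:
    """Hypothetical view of the contact data held for one subscription."""
    security_contacts: List[str] = field(default_factory=list)
    administrators: List[str] = field(default_factory=list)
    co_administrators: List[str] = field(default_factory=list)


def notification_recipients(sub: Subscription, include_co_admins: bool = False) -> List[str]:
    """Prefer configured security contacts; fall back to administrators."""
    recipients = list(sub.security_contacts) or list(sub.administrators)
    if include_co_admins:
        recipients += sub.co_administrators
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(recipients))


# Fictitious addresses for illustration.
configured = Subscription(
    security_contacts=["secops@contoso.example"],
    administrators=["admin@contoso.example"],
)
unconfigured = Subscription(
    administrators=["admin@contoso.example"],
    co_administrators=["admin@contoso.example", "coadmin@contoso.example"],
)

print(notification_recipients(configured))
print(notification_recipients(unconfigured, include_co_admins=True))
```

The fallback only helps if the administrator details are current, which is why keeping contact information correct is a customer responsibility.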
Microsoft security response personnel receive specialized training for their roles. There are numerous
training curriculums offered commercially to prepare security response and forensics personnel for their
duties. The MSRC Azure Security Response team uses an additional apprenticeship period to train new
Security Incident Managers. It is only after a long period of working with a senior member of the team
that a new member is considered ready to run a security event.
Multiple times throughout the year, Azure performs tests of the incident response capability. Some of
these occur in response to operations by the Azure Red Team. The Azure Red Team is continuously
testing the security posture of the Azure Infrastructure. When Red Team activity is detected, the Security
Response team (also known in this case as the Blue Team) acts as if the adversary were real in all ways. The only difference
between the Red Team and an outside adversary is that the Red Team is prohibited from actually
accessing customer data, such that the Red Team should not create an actual security incident.
In addition, Microsoft periodically conducts cross-company Red Team versus Blue Team exercises. These
exercises extend across multiple Microsoft services. They ensure that all security responders are able to
act as one cohesive unit when an incident transcends one product line.
The security response team devotes significant resources to preparing for incidents before they occur.
Besides the aforementioned exercises, Azure security response has a significant development team that
creates tooling and procedures in response to prior or anticipated incidents. Tabletop exercises also
occur, particularly with new features, to help teams think through scenarios during their design and
development phases. That way they are prepared, should the worst happen.
Conclusion
The security incident management program is a critical responsibility for Microsoft, and represents an
investment that customers using Microsoft Online Services can count on. The five-stage process
presented in this white paper has evolved over many years and continues to do so. It
involves a team of dedicated experts with the skill and commitment to protect Microsoft customers.