Netflix OSS
April 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
Cloud Native
Perfect code
Perfect hardware
Perfectly operated
But perfection takes too long…
So we compromise
Time to market vs. Quality
Utopia remains out of reach
Where time to market wins big
Web services
Agile infrastructure - cloud
Continuous deployment
How Soon?
Utopia Dystopia
A new engineering challenge
CDN
• CDN Management and Steering
• Content Encoding
• OpenConnect CDN Boxes: Content Delivery Service
• Open Source Hardware Design + FreeBSD, bird, nginx
November 2012 Traffic
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Each icon is three to a few hundred instances across three AWS zones.
Diagram labels: Cassandra, memcached, Web service, S3 bucket, Start Here
Diagram labels: Clients, Things, Cloudy, App Servers, Buffer, Datacenter, MySQL, Legacy Apps, Dinosaurs
New Anti-Fragile Patterns
Micro-services
Chaos engines
Highly available systems composed from ephemeral components
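A chaos engine checks that "highly available systems composed from ephemeral components" really survive losing a component, by killing instances on purpose. The sketch below is written in the spirit of Netflix's Chaos Monkey but is not its code or API: `AutoScalingGroup`, `chaos_round`, and the `terminate` callback are hypothetical names, and skipping single-instance groups is an assumption made so the sketch never empties a group.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AutoScalingGroup:
    """Hypothetical stand-in for a group of interchangeable, ephemeral instances."""
    name: str
    instances: List[str] = field(default_factory=list)


def chaos_round(groups: List[AutoScalingGroup],
                terminate: Callable[[AutoScalingGroup, str], None],
                probability: float = 1.0,
                rng: Optional[random.Random] = None) -> List[str]:
    """Terminate at most one randomly chosen instance per group.

    `terminate` is a caller-supplied callback; a real chaos tool would call the
    cloud provider's API there. Groups with fewer than two instances are skipped
    so this sketch never empties a group.
    """
    rng = rng or random.Random()
    victims = []
    for group in groups:
        if len(group.instances) < 2 or rng.random() >= probability:
            continue
        victim = rng.choice(group.instances)
        terminate(group, victim)
        victims.append(victim)
    return victims
```

The exercise only works if services are stateless: any surviving instance must be able to take over the victim's traffic without losing data.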
Stateless Micro-Service Architecture
• Tomcat: application war file, base servlet, platform, client interface jars, Astyanax
• Healthcheck, status servlets, JMX interface, Servo autoscale
• Monitoring: AppDynamics appagent and machineagent, Epic/Atlas
• GC and thread dump logging
• Log rotation to S3
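Each stateless instance exposes a healthcheck so that load balancers and discovery services can stop routing to it when it is unhealthy. A minimal sketch of the aggregation logic behind such an endpoint, assuming a healthcheck is "all dependency checks pass"; the `healthcheck` function, its check names, and the 200/503 convention are illustrative choices, not the Netflix base servlet's actual interface.

```python
import json
from typing import Callable, Dict, Tuple


def healthcheck(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check and return an (HTTP status, JSON body) pair.

    Returns 200 only when every check passes, otherwise 503 so the instance can
    be taken out of rotation. A check that raises is treated as failing.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, json.dumps({"healthy": status == 200, "checks": results})
```

Wiring this into an actual HTTP handler (a servlet under Tomcat, in the architecture above) is left out; the sketch only shows the decision.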
Cassandra Instance Architecture
• Tomcat and Priam on JDK: healthcheck, status
• Cassandra Server on Java (JDK 7)
• Monitoring: AppDynamics appagent and machineagent, Epic/Atlas
• GC and thread dump logging
• Local ephemeral disk space (2TB of SSD or 1.6TB disk) holding commit log and SSTables
Cloud Native
Autoscale Up
Autoscale Down
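Autoscaling up and down follows load. One simple way to size a group is target tracking: scale capacity in proportion to how far average CPU is from a target. This is a sketch under assumptions: the 50% CPU target and the 3 to 100 instance bounds are hypothetical parameters, and this is not the policy Netflix or AWS Auto Scaling actually uses.

```python
import math


def desired_capacity(current: int, avg_cpu: float, target_cpu: float = 50.0,
                     min_size: int = 3, max_size: int = 100) -> int:
    """Return the instance count that would bring average CPU near target_cpu.

    Assumes load spreads evenly, so total work (current * avg_cpu) stays constant
    while the instance count changes. The result is clamped to [min_size, max_size].
    """
    if current <= 0:
        return min_size
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, wanted))
```

For example, 10 instances at 80% CPU scale up to 16, while 10 instances at 20% scale down to 4. Keeping a floor of three leaves one instance per zone even at the quietest time.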
Leveraging Public Scale
Public, Grey Area, Private
Is it running yet?
How many places is it running in?
How far apart are those places?
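The three questions above amount to a rough availability calculation. If each place is up with probability a and places fail independently, the service is down only when all n places are down at once, so availability is 1 − (1 − a)^n. Distance is what makes the independence assumption plausible: places that are close together tend to share failures. The function name and the sample numbers are illustrative.

```python
def availability(per_place: float, places: int) -> float:
    """Probability that at least one of `places` is up.

    Assumes failures are independent, which is only approximately true and
    gets worse the closer together the places are.
    """
    if not 0.0 <= per_place <= 1.0 or places < 1:
        raise ValueError("per_place must be in [0, 1] and places >= 1")
    return 1.0 - (1.0 - per_place) ** places
```

With 99% per place, one place gives 0.99, while three independent places give about 0.999999.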
Antifragile API Patterns
Functional Reactive with Circuit Breakers and Bulkheads
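A circuit breaker stops calling a dependency that keeps failing and returns a fallback instead. A bulkhead caps how much concurrency one dependency can consume, so a slow dependency cannot exhaust every caller thread. Netflix's open-source Hystrix library implements these patterns. The minimal sketch below is not Hystrix's API: the class names, the thresholds, and the injectable `clock` are assumptions made for illustration.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, answers with the fallback
    while open, and lets one trial call through after `reset_timeout` seconds."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            return fallback()  # open: fail fast without touching the dependency
        # closed, or half-open after the timeout: try the real call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result


class Bulkhead:
    """Caps concurrent calls into one dependency; calls beyond the cap get the
    fallback immediately instead of queueing and tying up caller threads."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return fn()
        finally:
            self._slots.release()
```

The breaker's counters are not protected by a lock here. A version shared across threads would need one, or per-thread accounting.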
Outages
• Running very fast with scissors
– Mostly self-inflicted: bugs, mistakes
– Some caused by AWS bugs and mistakes
DNS: Denominator across AWS Route53, DynECT, and UltraDNS
Cassandra Replicas (six replica sets in the diagram)
Eureka: services metadata
AWS: instances, ASGs, etc.
AppDynamics: request flow
Edda → Monkeys
Edda Query Examples
Find any instances that have ever had a specific public IP address
$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
["i-0123456789","i-012345678a","i-012345678b"]
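The same query can be issued from a script. The sketch below only builds URLs of the same shape as the curl example, with `;key=value` matrix arguments appended to `/api/v2/view/instances`, and parses the JSON list of instance IDs. The host `edda` and `_since=0` are copied from the example. The function names and the `fetch` hook (a seam for testing without a network) are hypothetical, and this is not a client library shipped with Edda.

```python
import json
from typing import Callable, List, Optional
from urllib.parse import quote
from urllib.request import urlopen


def edda_view_url(base: str, collection: str, **matrix) -> str:
    """Append ';key=value' matrix arguments to an Edda /api/v2/view/ path."""
    args = "".join(f";{quote(str(k))}={quote(str(v))}" for k, v in matrix.items())
    return f"{base.rstrip('/')}/api/v2/view/{collection}{args}"


def _http_get(url: str) -> str:
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def instances_ever_at_ip(base: str, ip: str,
                         fetch: Optional[Callable[[str], str]] = None) -> List[str]:
    """Return IDs of instances that have ever had `ip` as their public address."""
    url = edda_view_url(base, "instances", publicIpAddress=ip, _since=0)
    ids = json.loads((fetch or _http_get)(url))
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError(f"unexpected response from {url!r}")
    return ids
```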
Goals
Diagram labels:
• Cloudbees Jenkins; Dynaslave AWS Build Slaves; Aminator AWS Bakery; Baked AMIs
• AWS Account; Asgard Console; Archaius Config Service; Multiple AWS Regions; Eureka Registry; Cross region Priam C*; Explorers; Dashboards; Exhibitor ZK; Atlas Monitoring; Edda History; Simian Army; Genie Hadoop Services
• 3 AWS Zones; Application Clusters; Autoscale Groups; Instances; Priam Cassandra Persistent Storage; Evcache Memcached Ephemeral Storage
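The Eureka Registry in this architecture is where services find each other. A common design for such a registry is lease-based: instances register, send periodic heartbeats, and are evicted when heartbeats stop, so terminated ephemeral instances disappear without explicit deregistration. The toy sketch below illustrates that idea only. It is not Eureka's API or wire protocol; `LeaseRegistry`, its methods, and the 90-second lease are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple


class LeaseRegistry:
    """Toy service registry: instances register, renew with heartbeats, and are
    evicted once a lease goes `lease_seconds` without renewal."""

    def __init__(self, lease_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        # (app, instance_id) -> (address, time of last heartbeat)
        self._leases: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def register(self, app: str, instance_id: str, address: str) -> None:
        self._leases[(app, instance_id)] = (address, self.clock())

    def renew(self, app: str, instance_id: str) -> bool:
        key = (app, instance_id)
        if key not in self._leases:
            return False  # unknown or already evicted: caller should re-register
        address, _ = self._leases[key]
        self._leases[key] = (address, self.clock())
        return True

    def evict_expired(self) -> List[str]:
        now = self.clock()
        expired = [key for key, (_, last) in self._leases.items()
                   if now - last > self.lease_seconds]
        for key in expired:
            del self._leases[key]
        return [instance_id for _, instance_id in expired]

    def lookup(self, app: str) -> List[str]:
        return sorted(address for (name, _iid), (address, _last) in self._leases.items()
                      if name == app)
```

Clients would typically cache `lookup` results and tolerate briefly stale entries, since an instance can die between its last heartbeat and eviction.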
NetflixOSS Instance Libraries
Security
• Security Monkey
• Conformity Monkey
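A conformity check walks the fleet and flags instances that break house rules. The sketch below shows that shape: a table of named predicates applied to each instance, with a report of which rules each non-conforming instance breaks. The `Instance` fields and all three rules are hypothetical examples, not Conformity Monkey's actual rule set or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Instance:
    instance_id: str
    asg: Optional[str]                      # autoscale group name, None if standalone
    security_groups: List[str] = field(default_factory=list)
    has_healthcheck: bool = True


Rule = Callable[[Instance], bool]

# Hypothetical example rules; each returns True when the instance conforms.
RULES: Dict[str, Rule] = {
    "in-autoscale-group": lambda i: i.asg is not None,
    "has-healthcheck-url": lambda i: i.has_healthcheck,
    "not-in-default-security-group": lambda i: "default" not in i.security_groups,
}


def conformity_report(instances: List[Instance],
                      rules: Dict[str, Rule] = RULES) -> Dict[str, List[str]]:
    """Map each non-conforming instance ID to the names of the rules it breaks."""
    report: Dict[str, List[str]] = {}
    for inst in instances:
        failed = [name for name, conforms in rules.items() if not conforms(inst)]
        if failed:
            report[inst.instance_id] = failed
    return report
```

Keeping rules as data makes it easy to add a check without touching the scanner.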
Example Application – RSS Reader
What’s Coming Next?
Better portability
Higher availability
More features
Easier to deploy
Eucalyptus 3.3
Netflix Cloud Prize
Nominating Committee
Panel of Judges
Judges
• Aino Corry, Program Chair for Qcon/GOTO
• Martin Fowler, Chief Scientist, Thoughtworks
• Simon Wardley, Strategist
Twitter #cloudprize
Timeline
• March 13: registration opened on Github
• Apache-licensed contributions on Github
• September 15: entries close on Github
• November: award ceremony dinner at AWS Re:Invent
Process
• Entrants → Netflix Engineering nominations → ten prize categories → judges → winners
• Nomination criteria: conforms to rules, working code, community traction
Prizes
• $10K cash
• $5K AWS
• AWS Re:Invent tickets
• Trophy
Functionality and scale now, portability coming
http://netflix.github.com
http://techblog.netflix.com
http://slideshare.net/Netflix