AWS Playbook-V2
Version:02
March 2021
AWS Fundamentals
Hands on Setup
AWS Certifications
Assignments
Case Studies
Cloud vs On Premise
As your resources move from on-premises to off-premises, your costs are reduced, and your administration requirements decrease (as depicted in the picture on the right).
PaaS
This cloud service model is a managed hosting environment. The cloud provider manages the virtual machines and networking resources, and the cloud tenant deploys their applications into the managed hosting environment. For example, Azure App Services provides a managed hosting environment where developers can upload their web applications without having to deal with the physical hardware and software requirements.
SaaS
In this cloud service model, the cloud provider manages all aspects of the application environment, such as virtual machines, networking resources, data storage, and applications. The cloud tenant only needs to provide their data to the application managed by the cloud provider. For example, Office 365 provides a fully working version of Office that runs in the cloud. All that you need to do is create your content, and Office 365 takes care of everything else.
Private cloud
Private cloud solutions are dedicated to one organization or business, and often have much more specific security controls than a public cloud. Many medical offices, banking institutions and similar organizations use a private cloud. Using private cloud storage allows them to control highly sensitive data like medical records, trade secrets, or other classified information. Private cloud solutions run on infrastructure that is either owned and controlled by the organization, or managed by a vendor who is contractually required to meet the organization's specific criteria.
Hybrid cloud
This computing environment combines a public cloud and a private cloud by allowing data and applications to be shared between them. An example of a hybrid cloud solution is an organization that wants to keep confidential information secured on its private cloud, but host more general, customer-facing content on a public cloud.
Examples
Public Cloud
• AWS
• Azure
• GCP
Private Cloud
• HP Data Centres
• Telstra Cloud
• Ubuntu
Compute
• AWS: AWS provides a wide range of machine types, CPUs, serverless, containers and event driven compute options. AWS is the only provider to offer SKUs across a select set of global regions and native support for VMWare workloads.
• Azure: Azure provides a wide range of machine types, CPUs, serverless, containers and event driven compute options. Azure offers a bare metal instance only for use with SAP HANA, and the capability to run VMWare workloads using a 3rd party solution by CloudSimple.
• GCP: There is a narrow range of standard machine types, CPUs and memory amounts supported, with on-demand, reserved and transient instance pricing. GCP doesn't offer a bare metal instance or capabilities to natively run VMWare workloads.
Storage
• AWS: There is a wide range of data storage options for object, block, file and blob storage as well as a hybrid storage gateway. The network storage option is available in NFS and SMB formats.
• Azure: There is a wide range of data storage options for object, block, file and blob storage as well as a hybrid storage gateway. There is an offering of SMB based network storage.
• GCP: There is a similar range of offerings for storage types compared to other providers. However, network storage is only available as NFS and there are no hybrid storage and cold storage options.
Data
• AWS: AWS is a good option for data processing such as batch and streaming, data migration services and in-memory computing. However, AWS data stores are relatively expensive and do not scale well when compared to Azure.
• Azure: Azure's data platforms score better than AWS and GCP and fit well with user requirements. Azure data stores can also scale better and can handle more concurrent queries.
• GCP: GCP data platforms do not score well when compared to AWS and Azure, as they don't scale well and are not optimal for indexing, scanning and handling concurrent queries.
Analytics & Data Science
• AWS: AWS provides a wide variety of native advanced analytics services such as ML Workbench, image processing, a healthcare focused NLP toolset, speech and chatbots. However, AWS services are not optimal for standard reporting and BI solutions.
• Azure: Azure provides an optimal mix of standard advanced analytics capabilities that are tightly integrated across the whole ecosystem. However, Azure currently doesn't provide capabilities in healthcare centric AI tools.
• GCP: GCP provides a rich set of capabilities for predictive analytics and deep learning, especially using readily available open-source frameworks. Alphabet's subsidiaries continuously build and evolve strategic healthcare features, which makes GCP a robust provider for specific AI/ML use cases.
Infrastructure & Ops
• AWS: AWS is a strong contender in the networking and Platform Ops categories but is not optimal for IAM, Logging and Billing Management.
• Azure: Azure has the most optimal combination of networking, security, dev and data platform Ops capabilities, but has no native Disaster Recovery capability.
• GCP: GCP services score lower than Azure and AWS, as they do not offer a native dev environment management capability and have no current solutions for backups and disaster recovery.
AWS strengths
• Extensive Mature Offerings: AWS has a huge and growing array of available services, as well as the most comprehensive network of worldwide data centers
• Support for large Firms: It has the deepest capabilities for governing a large number of users and resources
AWS weaknesses
• Cost Management: many enterprises find it difficult to manage costs effectively when running a high volume of workloads on the service
• Overwhelming options
Azure strengths
• Second Largest Provider
• Integration with Microsoft Tools & Software: enterprises that use a lot of Microsoft software often find that it also makes sense for them to use Azure
• Broad Feature Set: Rich set of API and developer tools
• Hybrid Cloud: uses a mix of on-premises, private cloud and third-party, public cloud services with orchestration between the two platforms
• Support for open source
Azure weaknesses
• Less Enterprise Ready: clients report that the service experience feels less enterprise-ready than they expected, given Microsoft's long history as an enterprise vendor
• Incomplete Management tooling: Azure doesn't offer as much support for DevOps approaches as some of the other cloud platforms
GCP strengths
• Commitment to open source and portability: GCP specializes in high compute offerings like Big Data, analytics and machine learning
• DevOps Expertise
GCP weaknesses
• Fewer features and services: it doesn't offer as many different services and features as AWS and Azure
What is AWS
Amazon Web Services (AWS) is one of the world's most comprehensive and widely adopted cloud computing platforms, offered by Amazon. The platform combines infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service (SaaS) offerings.
AWS offers over 200 fully featured services from data centers globally. AWS products include services for security, analytics, development tools, databases, storage, networking, migration, and enterprise applications.
Reference Tutorials
• https://www.youtube.com/watch?v=a9__D53WsUs
• https://www.youtube.com/watch?v=wWeyzYzd17o
• https://aws.amazon.com/what-is-aws/
[Figure: AWS reference architecture for data and analytics. Recoverable labels by layer:
• Data sources: other RDBMS, other data sources, IoT data, unstructured data, geospatial data, external data, live streams
• Ingestion: AWS CLI, AWS S3 Transfer Acceleration, data streams/sync, messaging with AWS Kinesis (Confluent Kafka/NiFi)
• Storage: S3
• Analytical processing: stream ingestion, streaming analytics, batch/micro-batch processing (AWS EMR, Databricks), stream processing (AWS IoT, Kinesis), processing (AWS IoT Analytics), model repository
• Querying and search: interactive querying with AWS Athena, real-time search with AWS Elasticsearch, interactive (Kibana)
• Consumers: portals, real-time dashboards and transactional applications, mobile, enterprise search, analytics/ML, enterprise applications
• Developer and Management Tools: AWS Identity & Access Management, AWS Key Management Service, AWS CloudTrail, AWS CloudWatch (Datadog), AWS Management Console, AWS CloudFormation (Ansible), AWS Directory Service, code repository (Git, Bitbucket), AWS CodeDeploy (Jenkins/CircleCI)
• Data management: enterprise content management, data quality management, metadata management, data security & privacy, master and reference data management, business rules management, audit, balance and control, data catalog and discovery]
[Figure: AWS data services grouped by category. Recoverable category labels: Data Ingestion & Integration, Data Storage, Data Movement, Databases, Interactive Querying & analytics, Data Processing & Compute, Data Management]
Data Movement
AWS Storage Gateway
• Connects an on-premises software appliance with cloud-based storage to provide seamless integration, with data security features, between your on-premises IT environment and the AWS storage infrastructure.
• Stores data in the AWS Cloud for scalable and cost-effective storage that helps maintain data security.
AWS Database Migration Service
• Helps to migrate databases to AWS quickly and securely.
• The source database remains fully operational during the migration, minimizing downtime to applications
that rely on the database.
• Supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between
different database platforms.
AWS S3 Transfer Acceleration
• Reduces the variability in Internet routing, congestion and speeds that can affect transfers, and logically shortens the distance to S3 for remote applications.
Data Ingestion & Integration
AWS Kinesis
• A massively scalable and durable real-time data streaming service that can capture data from sources such as website clickstreams, database event streams, financial transactions and social media feeds.
• Gigabytes of data can be captured per second, and the collected data is available in milliseconds to enable real-time analytics use cases.
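As a rough illustration of what a streamed record looks like, the sketch below packages a clickstream event into the `Data` and `PartitionKey` fields taken by the `put_record` call of the boto3 Kinesis client. The event fields, the helper names `build_record` and `send_record`, and the stream name are hypothetical. The send function is defined but not called, because it needs boto3 and configured AWS credentials.

```python
import json


def build_record(event: dict, partition_field: str = "user_id") -> dict:
    """Serialize an event into the Data/PartitionKey pair used by Kinesis."""
    return {
        # Kinesis treats the payload as opaque bytes
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard
        "PartitionKey": str(event[partition_field]),
    }


def send_record(record: dict, stream_name: str) -> dict:
    """Send one record. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    client = boto3.client("kinesis")
    return client.put_record(StreamName=stream_name, **record)


# Hypothetical clickstream event
click = {"user_id": 42, "page": "/pricing", "ts": "2021-03-01T10:15:00Z"}
record = build_record(click)
print(record["PartitionKey"])
```

Choosing the partition key is the main design decision here: keying by user keeps each user's events in order on one shard, at the cost of uneven load if a few users dominate the traffic.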
Storage
AWS S3
• Provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements.
• Designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.
AWS Glacier
• A secure, durable, and extremely low-cost Amazon S3 cloud storage class for data archiving
and long-term backup.
• Users create archives and vaults for storage
• An archive can be any data such as a photo, video, or document and is a base unit of storage in
S3 Glacier.
• Designed to deliver 99.999999999% durability and provide comprehensive security and
compliance capabilities that can help meet even the most stringent regulatory requirements.
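To make the 99.999999999% durability figure more concrete, the short calculation below treats it as the probability that a single object survives one year and works out the expected loss for a given number of stored objects. The object count of 10 million is an illustrative assumption, not a figure from this playbook.

```python
from fractions import Fraction

# 11 nines of durability, read as the annual survival probability of one object
DURABILITY = Fraction("0.99999999999")


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the assumption above."""
    return object_count * (1 - DURABILITY)


stored_objects = 10_000_000  # hypothetical number of stored objects
losses_per_year = expected_annual_losses(stored_objects)
years_per_lost_object = 1 / losses_per_year

print(f"Expected losses per year: {float(losses_per_year):.4f}")
print(f"Roughly one object lost every {int(years_per_lost_object):,} years")
```

Exact fractions are used instead of floats so that the eleven nines are not blurred by rounding.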
Database Management
AWS Redshift
• A fully managed, petabyte-scale data warehouse service that lets you analyze structured data using standard SQL and existing BI tools.
AWS DynamoDB
• A fully managed, serverless key-value and document (NoSQL) database designed for single-digit millisecond performance at any scale.
AWS ElastiCache
• A fully managed in-memory data store and caching service, compatible with Redis and Memcached.
Deloitte Touche Tohmatsu India LLP L&D | AWS PLAYBOOK
Data Processing and Compute
AWS EMR (Elastic MapReduce)
• A managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
• Using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, we can process data for analytics purposes and business intelligence workloads.
• Used to transform and move large amounts of data in and out of other AWS data stores and databases.
AWS Lambda
• Serverless compute service that lets you run code without provisioning or managing
servers, creating workload-aware cluster scaling logic, maintaining event integrations, or
managing runtimes.
• Can run code for virtually any type of application or backend service - all with zero
administration.
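A minimal sketch of what "running code without managing servers" looks like in practice: a Python Lambda function is an ordinary function that receives an event and a context object. The handler name `lambda_handler` follows the common default, and the event shape used here (a `name` field) is a hypothetical example.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes with the event payload and a context object."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test; in Lambda the service calls the handler for you
response = lambda_handler({"name": "AWS"}, None)
print(response["body"])
```

Because the handler is a plain function, it can be exercised locally like this before it is deployed.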
Interactive analytics
Amazon Elasticsearch Service
• Lets you build, monitor, and troubleshoot your applications using the tools you love, at the scale you need.
• Provides support for open source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying.
AWS Athena
• An interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL, with no need for complex ETL jobs to prepare your data for analysis.
• Serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
• Point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds.
AWS QuickSight
• Helps in creating and publishing interactive BI dashboards that include machine learning powered insights.
Presto on EMR
• An open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes
ranging from gigabytes to petabytes.
Advanced analytics, ML
AWS Comprehend
A continuously trained Natural Language Processing (NLP) service that uses machine learning to find insights and relationships across unstructured text like customer reviews and news articles.
AWS Lex
A service for building conversational interfaces into any application using voice and text, enabling developers to bring sophisticated, natural language chatbots to applications.
AWS Machine Learning
A managed service for building ML models and generating predictions that enable the development of robust, scalable smart applications.
AWS Greengrass
Amazon's IoT service that lets devices process the data they generate locally, while still taking
advantage of AWS services when an internet connection is available.
TensorFlow on AWS
A deep learning framework available on AWS. It is a popular choice for deep learning research and application development,
particularly in areas such as computer vision, natural language understanding and speech translation.
Developer and Management Tools
AWS Key Management Service
A managed service to create and control the encryption keys used to encrypt data. It uses FIPS 140-2 validated hardware security modules to protect the security of the keys.
AWS CloudFormation
A service that lets you quickly and easily model and provision infrastructure resources and
applications in an automated and secure manner on AWS.
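To illustrate what "modelling infrastructure" means, the sketch below builds a very small CloudFormation template as a Python dictionary and serializes it to JSON. The logical ID `LabDataBucket` and the description text are hypothetical; the template declares a single S3 bucket with versioning enabled and is meant as a starting point, not a recommended production setup.

```python
import json

# A template is a declarative document: it lists resources, not steps
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Playbook sketch: one S3 bucket for lab data",
    "Resources": {
        "LabDataBucket": {  # hypothetical logical ID
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

The resulting JSON could be saved to a file and deployed with a command such as `aws cloudformation deploy --template-file template.json --stack-name <your-stack-name>`, assuming the AWS CLI is installed and configured.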
AWS CodeDeploy
A service that automates software deployments to a variety of compute services including Amazon
EC2, AWS Lambda, and instances running on-premises.
AWS CloudTrail
A service that enables governance, compliance, and operational auditing. It lets you log, continuously monitor,
and retain account activity related to your AWS infrastructure.
AWS CloudWatch
A monitoring service for AWS cloud resources and the applications that run on AWS. CloudWatch can be
used to collect and track metrics, collect and monitor log files, set alarms, and automatically react to
changes in your AWS resources.
AWS Directory Service
AWS Directory Service for Microsoft Active Directory enables your directory-aware workloads and AWS resources to use managed Active Directory in the AWS Cloud.
Hands On Setup
Set up using AWS Free Tier
Setting up of Free-Tier Account on Amazon Web Services (AWS)
• Open the aws-sign-up link to navigate to the AWS Free Tier home page.
• Navigate to the section "Create a Free Account" displayed on the home page.
• Page 1: Enter the organizational email and choose a password as per the requirement displayed. Enter the account name of your choice.
• Page 2: Fill in the required details in the account creation fields.
• Pending/Skeptical: both personal and business accounts require credit/debit card details.
AWS Free Tier Offerings
AWS provides three types of free offers depending upon the product used. Below are the details.
• Always Free: Offers do not expire and are available to all AWS customers. Please follow the link to get the list of all products which are available in this offering.
• 12 months Free: This offering includes products for 12 months from the sign-up date. See the list here.
• Trials: Short-term free trial offers start from the date of activating a particular service. More details are available here.
Udemy
Visit Udemy for a learning content repository (that can be accessed through Cura) that provides technical courses on topics like Cloud, AI, Analytics, and Big Data.
LinkedIn
Visit LinkedIn for a learning content repository. Explore a variety of courses available via LinkedIn Learning.
Microsoft Learner Experience Portal (LxP)
Your one-stop access to a variety of learning choices: instructor-led training, guided self-paced learning through MS Learn and access to Microsoft Certification exams.
Set up Microsoft LxP:
Cura
Cura is Deloitte's new personalized learning platform. It uses machine learning to bring you the most relevant content based on your interests, skills, and development needs.
Use Cura to:
• Reskill or upskill quickly through continuous learning opportunities
• Find just-in-time information on topics you need to learn more about
• Access Udemy, LinkedIn Learning and other training materials
BEGINNER
1. What is Big Data?
2. What is a Data Lake?
3. Cloud Computing on AWS
4. Data Lake in AWS
5. Learning the AWS Well-Architected Framework
6. Diving into AWS Web Services AWS S3
7. AWS S3 Glacier Developers Guide
8. AWS EC2
9. Introduction to Python
10. SQL for Beginners
11. Diving into AWS Web Services
12. Basics of Machine Learning
INTERMEDIATE
1. ETL in AWS
2. Serverless ETL & BI on AWS
3. AWS Redshift
4. DynamoDB
5. AWS EMR
6. AWS Athena
7. AWS Aurora
8. Monitoring
9. AWS CloudWatch
10. Docker in AWS
11. Introduction to Kubernetes
12. Exploring Networking in AWS
Day21 Mastering AWS Glue, Quicksight, Athena & Redshift Spectrum 1200 Udemy
Day22 Mastering AWS Glue, Quicksight, Athena & Redshift Spectrum 1200 Udemy
Day23 Mastering AWS Glue, Quicksight, Athena & Redshift Spectrum 1200 Udemy
Day24 Mastering AWS Glue, Quicksight, Athena & Redshift Spectrum 1200 Udemy
Day25 Mastering AWS Glue, Quicksight, Athena & Redshift Spectrum 1200 Udemy
Note: The above certification costs are as of Feb 2021. Prices are subject to change.
** Links are embedded for each topic; click to explore.
Assignments
Introduction
• This section contains 4 assignments.
• Each assignment consists of one or more labs, and each lab has a specific result which can be used in the subsequent labs/assignments.
• Please do not skip any lab, as each lab is equally important.
• An assignment will be marked complete if all the labs under it are complete.
• Please attempt the labs and assignments in the sequential order in which they are defined.
• The assignments use two datasets:
  • Flight_Weather Dataset: a CSV file which contains flight data for the years 2011 and 2012
  • Ecommerce Sales Dataset: a CSV file which contains the ecommerce sales data
Location:
• Flight Weather dataset: Flight_weather.csv
• Ecommerce sales dataset: Ecommerce_sales.csv
▪ Select No of Instances as 1
Assignment 1:
Lab2: Set up the AWS CLI and execute the same helloworld.py script from your local system on AWS using the AWS CLI
Lab4: Set up and connect to AWS RDS, access the data file (flights data) from the above step, and calculate the following:
Links:
• Bring Your Own Data Labs (BYOD) (workshop.aws)
• GitHub - aws-samples/bring-your-own-data-labs: Build a serverless data pipeline based on your own data
Pre-requisites
4. Data with multiple related tables via foreign keys is supported in the context of this workshop.
5. Data with nested fields like JSON structures is NOT supported in the context of this workshop.
6. Structure your data in Amazon S3 so that each table is in a separate folder, with the whole data set in a separate bucket.
7. Before uploading your data files to Amazon S3, make sure the files are in UTF-8 encoding format.
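The last two pre-requisites can be checked locally before any upload. The sketch below is one possible way to do that, using only the Python standard library: `is_utf8` reports whether a file decodes cleanly as UTF-8, and `s3_key_for` builds an object key that puts each table in its own folder. Both helper names, the `raw` prefix and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def is_utf8(path: Path) -> bool:
    """Return True if the whole file decodes as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def s3_key_for(table_name: str, file_name: str, prefix: str = "raw") -> str:
    """Build an object key that places each table in its own folder."""
    return f"{prefix}/{table_name}/{file_name}"


# Demo with temporary files standing in for real data files
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "flights.csv"
    good.write_text("origin,dest\nDEL,BLR\n", encoding="utf-8")
    bad = Path(tmp) / "legacy.csv"
    bad.write_bytes(b"caf\xe9")  # a Latin-1 byte sequence that is not valid UTF-8
    results = {p.name: is_utf8(p) for p in (good, bad)}

print(results)
print(s3_key_for("flights", "flights.csv"))
```

Reading the whole file into memory keeps the check simple; for very large files a chunked or streaming decoder would be the safer choice.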
Data Preparation with AWS Glue DataBrew
Visualization with Amazon QuickSight
• Insert the data from the above data files into AWS DynamoDB
• Run queries on the AWS portal for the below:
  • Total Trips taken per Month
  • Total Trips taken per Hour
  • Average Speed of Yellow Taxis per Hour of trips
  • Average Distance travelled by Yellow Taxis per Hour
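The aggregations listed above can be prototyped locally before running them on AWS. The sketch below uses Python's built-in sqlite3 module with a tiny made-up table; the table name `trips`, its columns and the sample rows are all hypothetical, and date functions such as `strftime` and `julianday` are SQLite-specific, so the SQL would need adapting for the engine actually used in the lab.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup TEXT, dropoff TEXT, distance_miles REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("2021-01-05 08:00:00", "2021-01-05 08:30:00", 6.0),
        ("2021-01-05 08:10:00", "2021-01-05 08:40:00", 4.0),
        ("2021-02-10 17:00:00", "2021-02-10 18:00:00", 12.0),
    ],
)

# Total trips per month
trips_per_month = conn.execute(
    "SELECT strftime('%m', pickup) AS month, COUNT(*) "
    "FROM trips GROUP BY month ORDER BY month"
).fetchall()

# Trips, average distance and average speed per pickup hour
per_hour = conn.execute(
    """
    SELECT strftime('%H', pickup) AS hour,
           COUNT(*) AS trips,
           AVG(distance_miles) AS avg_distance,
           AVG(distance_miles /
               ((julianday(dropoff) - julianday(pickup)) * 24.0)) AS avg_speed_mph
    FROM trips
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

print(trips_per_month)
for hour, trips, avg_distance, avg_speed in per_hour:
    print(hour, trips, round(avg_distance, 2), round(avg_speed, 2))
```

Speed is computed per trip as distance divided by trip duration in hours and then averaged, which is one reasonable reading of "average speed per hour of trips"; averaging total distance over total time would give a different figure.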
Step 3: Access DynamoDB through Redshift