Hands-On Machine Learning on Google Cloud Platform: Implementing smart and efficient analytics using Cloud ML Engine
About this ebook
Unleash Google's Cloud Platform to build, train and optimize machine learning models
Key Features
- Get well versed in GCP's existing services to build your own smart models
- A comprehensive guide covering everything from processing and analyzing data to building and training ML models
- A practical approach to deploying your trained ML models and porting them to mobile devices for easy access
Google Cloud Machine Learning Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn to build and train machine learning models of varying complexity at scale, but also to host them in the cloud to make predictions.
This book is focused on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn from scratch how to create powerful machine-learning-based applications for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and also make extensive use of data-related services in the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications.
By the end of this book, you will be aware of the main difficulties you may encounter and know appropriate strategies to overcome them and build efficient systems.
What you will learn
- Use Google Cloud Platform to build data-based applications for dashboards, web, and mobile
- Create, train and optimize deep learning models for various data science problems on big data
- Learn how to leverage BigQuery to explore big datasets
- Use Google’s pre-trained TensorFlow models for NLP, image, video and much more
- Create models and architectures for Time series, Reinforcement Learning, and generative models
- Create, evaluate, and optimize TensorFlow and Keras models for a wide range of applications
This book is for data scientists, machine learning developers, and AI developers who want to learn Google Cloud Platform services to build machine learning applications. Since interaction with the Google ML platform is mostly done via the command line, the reader should have some familiarity with the bash shell and Python scripting. Some understanding of machine learning and data science concepts will also be handy.
Hands-On Machine Learning on Google Cloud Platform
Implementing smart and efficient analytics using Cloud ML Engine
Giuseppe Ciaburro
V Kishore Ayyadevara
Alexis Perrier
BIRMINGHAM - MUMBAI
Hands-On Machine Learning on Google Cloud Platform
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Tushar Gupta
Content Development Editor: Cheryl Dsa
Technical Editor: Dinesh Pawar
Copy Editor: Vikrant Phadkay
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta
First published: April 2018
Production reference: 1260418
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78839-348-5
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
PacktPub.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the authors
Giuseppe Ciaburro holds a PhD in environmental technical physics and two master's degrees. His research is on machine learning applications in the study of urban sound environments. He works at Built Environment Control Laboratory, Università degli Studi della Campania Luigi Vanvitelli (Italy). He has over 15 years' experience in programming in Python, R, and MATLAB, first in the field of combustion, and then in acoustics and noise control. He has several publications to his credit.
V Kishore Ayyadevara has over 9 years' experience of using analytics to solve business problems and setting up analytical work streams through his work at American Express, Amazon, and, more recently, a retail analytics consulting startup. He has an MBA from IIM Calcutta and is also an electronics and communications engineer. He has worked in credit risk analytics, supply chain analytics, and consulting for multiple FMCG companies to identify ways to improve their profitability.
Alexis Perrier is a data science consultant with experience in signal processing and stochastic algorithms. He holds a master's in mathematics from Université Pierre et Marie Curie Paris VI and a PhD in signal processing from Télécom ParisTech. He is actively involved in the DC data science community. He is also an avid book lover and proud owner of a real chalk blackboard, where he regularly shares his fascination with mathematical equations with his kids.
About the reviewers
Mikhail Berlyant is a data warehousing veteran. He has been a data developer since the late 1970s. Since 2000, he has led data systems, data mining, and data warehouse teams at Yahoo! and Myspace.
He is a Google Cloud expert and senior VP of Technology at Viant Inc., a people-based advertising tech company that enables marketers to plan, execute, and measure their digital media investments through a cloud-based platform. At Viant, he led the migration of a petabyte-sized data warehouse to Google Cloud. He is currently focusing on self-serve/productivity tools for BigQuery/GCP.
I'd like to say thanks to my beautiful wife, Svetlana, for supporting me in all my endeavors.
Sanket Thodge is an entrepreneur by profession in Pune, India. He is an author of Cloud Analytics with Google Cloud Platform. He founded Pi R Square Digital Solutions. With expertise as a Hadoop developer, he has explored the cloud, IoT, machine learning, and blockchain. He has also applied for a patent in IoT and has worked with numerous startups and MNCs, providing consultancy, architecture building, development, and corporate training across the globe.
Antonio Gulli is a transformational software executive and business leader with a passion for establishing and managing global technological talent for innovation and execution. He is an expert in search engines, online services, machine learning, and so on. Currently, he is a site lead and director of cloud at Google Warsaw, driving European efforts for serverless, Kubernetes, and Google Cloud UX. Antonio has filed for 20+ patents, published multiple academic papers, and served as a senior PC member in multiple international conferences.
Chirag Nayyar is helping organizations to migrate their workloads from on-premises infrastructure to the public cloud. He has experience in web app migration, SAP workloads on the cloud, and EDW. He is currently working at Cloud Kinetics Technology Solutions. He holds a wide range of certifications from all major public cloud platforms. He also runs meetups and is a regular speaker at various cloud events.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents
Title Page
Copyright and Credits
Hands-On Machine Learning on Google Cloud Platform
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introducing the Google Cloud Platform
ML and the cloud
The nature of the cloud
Public cloud
Managed cloud versus unmanaged cloud
IaaS versus PaaS versus SaaS
Costs and pricing
ML
Introducing the GCP
Mapping the GCP
Getting started with GCP
Project-based organization
Creating your first project
Roles and permissions
Further reading
Summary
Google Compute Engine
Google Compute Engine
VMs, disks, images, and snapshots
Creating a VM
Google Shell
Google Cloud Platform SDK
Gcloud
Gcloud config
Accessing your instance with gcloud
Transferring files with gcloud
Managing the VM
IPs
Setting up a data science stack on the VM
BOX the ipython console
Troubleshooting
Adding GPUs to instances
Startup scripts and stop scripts
Resources and further reading
Summary
Google Cloud Storage
Google Cloud Storage
Box–storage versus drive
Accessing control lists
Access and management through the web console
gsutil
gsutil cheatsheet
Advanced gsutil
Signed URLs
Creating a bucket in Google Cloud Storage
Google Storage namespace
Naming a bucket
Naming an object
Creating a bucket
Google Cloud Storage console
Google Cloud Storage gsutil
Life cycle management
Google Cloud SQL
Databases supported
Google Cloud SQL performance and scalability
Google Cloud SQL security and architecture
Creating Google Cloud SQL instances
Summary
Querying Your Data with BigQuery
Approaching big data
Data structuring
Querying the database
SQL basics
Google BigQuery
BigQuery basics
Using a graphical web UI
Visualizing data with Google Data Studio
Creating reports in Data Studio
Summary
Transforming Your Data
How to clean and prepare the data
Google Cloud Dataprep
Exploring Dataprep console
Removing empty cells
Replacing incorrect values
Mismatched values
Finding outliers in the data
Visual functionality
Statistical information
Removing outliers
Run Job
Scale of features
Min–max normalization
z score standardization
Google Cloud Dataflow
Summary
Essential Machine Learning
Applications of machine learning
Financial services
Retail industry
Telecom industry
Supervised and unsupervised machine learning
Overview of machine learning techniques
Objective function in regression
Linear regression
Decision tree
Random forest
Gradient boosting
Neural network
Logistic regression
Objective function in classification
Data splitting
Measuring the accuracy of a model
Absolute error
Root mean square error
The difference between machine learning and deep learning
Applications of deep learning
Summary
Google Machine Learning APIs
Vision API
Enabling the API
Opening an instance
Creating an instance using Cloud Shell
Label detection
Text detection
Logo detection
Landmark detection
Cloud Translation API
Enabling the API
Natural Language API
Speech-to-text API
Video Intelligence API
Summary
Creating ML Applications with Firebase
Features of Firebase
Building a web application
Building a mobile application
Summary
Neural Networks with TensorFlow and Keras
Overview of a neural network
Setting up Google Cloud Datalab
Installing and importing the required packages
Working details of a simple neural network
Backpropagation
Implementing a simple neural network in Keras
Understanding the various loss functions
Softmax activation
Building a more complex network in Keras
Activation functions
Optimizers
Increasing the depth of network
Impact on change in batch size
Implementing neural networks in TensorFlow
Using premade estimators
Creating custom estimators
Summary
Evaluating Results with TensorBoard
Setting up TensorBoard
Overview of summary operations
Ways to debug the code
Setting up TensorBoard from TensorFlow
Summaries from custom estimator
Summary
Optimizing the Model through Hyperparameter Tuning
The intuition of hyperparameter tuning
Overview of hyperparameter tuning
Hyperparameter tuning in Google Cloud
The model file
Configuration file
Setup file
The __init__ file
Summary
Preventing Overfitting with Regularization
Intuition of over/under fitting
Reducing overfitting
Implementing L2 regularization
Implementing L1 regularization
Implementing dropout
Reducing underfitting
Summary
Beyond Feedforward Networks – CNN and RNN
Convolutional neural networks
Convolution layer
Rectified Linear Units
Pooling layers
Fully connected layer
Structure of a CNN
TensorFlow overview
Handwriting Recognition using CNN and TensorFlow
Run Python code on Google Cloud Shell
Recurrent neural network
Fully recurrent neural networks
Recursive neural networks
Hopfield recurrent neural networks
Elman neural networks
Long short-term memory networks
Handwriting Recognition using RNN and TensorFlow
LSTM on Google Cloud Shell
Summary
Time Series with LSTMs
Introducing time series
Classical approach to time series
Estimation of the trend component
Estimating the seasonality component
Time series models
Autoregressive models
Moving average models
Autoregressive moving average model
Autoregressive integrated moving average models
Removing seasonality from a time series
Analyzing a time series dataset
Identifying a trend in a time series
Time series decomposition
Additive method
Multiplicative method
LSTM for time series analysis
Overview of the time series dataset
Data scaling
Data splitting
Building the model
Making predictions
Summary
Reinforcement Learning
Reinforcement learning introduction
Agent-Environment interface
Markov Decision Process
Discounted cumulative reward
Exploration versus exploitation
Reinforcement learning techniques
Q-learning
Temporal difference learning
Dynamic Programming
Monte Carlo methods
Deep Q-Network
OpenAI Gym
Cart-Pole system
Learning phase
Testing phase
Summary
Generative Neural Networks
Unsupervised learning
Generative models
Restricted Boltzmann machine
Boltzmann machine architecture
Boltzmann machine disadvantages
Deep Boltzmann machines
Autoencoder
Variational autoencoder
Generative adversarial network
Adversarial autoencoder
Feature extraction using RBM
Breast cancer dataset
Data preparation
Model fitting
Autoencoder with Keras
Load data
Keras model overview
Sequential model
Keras functional API
Define model architecture
Magenta
The NSynth dataset
Summary
Chatbots
Chatbots fundamentals
Chatbot history
The imitation game
Eliza
Parry
Jabberwacky
Dr. Sbaitso
ALICE
SmarterChild
IBM Watson
Building a bot
Intents
Entities
Context
Chatbots
Essential requirements
The importance of the text
Word transposition
Checking a value against a pattern
Maintaining context
Chatbots architecture
Natural language processing
Natural language understanding
Google Cloud Dialogflow
Dialogflow overview
Basic Dialogflow elements
Agents
Intent
Entity
Action
Context
Building a chatbot with Dialogflow
Agent creation
Intent definition
Summary
Preface
Google Cloud ML Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn how to build and train machine learning models of varying complexity at scale, but also how to host them in the cloud to make predictions.
This book is focused on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn how to create powerful machine-learning-based applications from scratch for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and will also make extensive use of data-related services comprising the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications.
By the end of this book, you will be aware of the main difficulties that you may encounter, and be familiar with appropriate strategies to overcome these difficulties and build efficient systems.
Who this book is for
This book is for data scientists, machine learning developers, and AI developers who want to learn Google Cloud Platform services to build machine learning applications. Since interaction with the Google ML platform is mostly done via the command line, the reader should have some familiarity with the bash shell and Python scripting. Some understanding of machine learning and data science concepts will also be handy.
What this book covers
Chapter 1, Introducing the Google Cloud Platform, explores different services that may be useful for building a machine learning pipeline based on GCP.
Chapter 2, Google Compute Engine, shows you how to create and fully manage your VM via both the online console and command-line tools, as well as how to implement a data science workflow and a Jupyter Notebook workspace.
Chapter 3, Google Cloud Storage, shows how to upload data and manage it using the services provided by the Google Cloud Platform.
Chapter 4, Querying Your Data with BigQuery, shows you how to query data from Google Storage and visualize it with Google Data Studio.
Chapter 5, Transforming Your Data, presents Dataprep, a service useful for preprocessing data, extracting features, and cleaning up records. We also look at Dataflow, a service used to implement streaming and batch processing.
Chapter 6, Essential Machine Learning, starts our journey into machine learning and deep learning; we learn when to apply each one.
Chapter 7, Google Machine Learning APIs, teaches us how to use Google Cloud machine learning APIs for image analysis, text and speech processing, translation, and video inference.
Chapter 8, Creating ML Applications with Firebase, shows how to integrate different GCP services to build a seamless machine-learning-based application, mobile or web-based.
Chapter 9, Neural Networks with TensorFlow and Keras, gives a good understanding of the structure and key elements of a feedforward network, how to architect one, and how to tinker and experiment with different parameters.
Chapter 10, Evaluating Results with TensorBoard, shows how the choice of different parameters and functions impacts the performance of the model.
Chapter 11, Optimizing the Model through Hyperparameter Tuning, teaches us how to use hyperparameter tuning in TensorFlow application code and interpret the results to select the best-performing model.
Chapter 12, Preventing Overfitting with Regularization, shows how to identify overfitting and make our models more robust to previously unseen data by setting the right parameters and defining the proper architectures.
Chapter 13, Beyond Feedforward Networks – CNN and RNNs, teaches which type of neural network to apply to different problems, and how to define and implement them on GCP.
Chapter 14, Time Series with LSTMs, shows how to create LSTMs and apply them to time series predictions. We will also see when LSTMs outperform more standard approaches.
Chapter 15, Reinforcement Learning, introduces the power of reinforcement learning and shows how to implement a simple use case on GCP.
Chapter 16, Generative Neural Networks, teaches us how to extract content generated by neural networks, covering different types of content: text, images, and sounds.
Chapter 17, Chatbots, shows how to train a contextual chatbot while implementing it in a real mobile application.
To get the most out of this book
In this book, machine learning algorithms are implemented on the Google Cloud Platform. To reproduce the examples in this book, you need a working GCP account. We have used Python 2.7 and above to build the various applications, and we have tried to keep all of the code as friendly and readable as possible. We feel that this will enable our readers to easily understand the code and readily use it in different scenarios.
Download the example code files
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Machine-Learning-on-Google-Cloud-Platform. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/HandsOnMachineLearningonGoogleCloudPlatform_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Where GROUP is a service or an account element and COMMAND is the command to send to the GROUP.
A block of code is set as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
# Create a figure containing a single set of axes
fig, ax = plt.subplots(1)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
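text = 'this is a good text'
from google.cloud.language_v1 import types
# Wrap the raw string in a Document for the Natural Language API
# (1.x client style; in the 2.x client the field is named type_)
document = types.Document(
    content=text,
    type='PLAIN_TEXT')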
Any command-line input or output is written as follows:
$ gcloud compute instances list
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Click on Create a new project.
Warnings or important notes appear like this.
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Introducing the Google Cloud Platform
The goal of this first introductory chapter is to give you an overview of the Google Cloud Platform (GCP). We start by explaining why machine learning (ML) and cloud computing go hand in hand, as today's ML applications demand ever more computing resources. We then proceed with a 360° presentation of the platform's data-related services. Account and project creation, as well as role allocation, close the chapter.
A data science project follows a regular set of steps: extracting the data, exploring and cleaning it, extracting information, training and assessing models, and finally building machine-learning-enabled applications. For each step of the data science flow, the GCP offers one or more suitable services.
However, before we present the overall mapping of the GCP data-related services, it is important to understand why ML and cloud computing are truly made for each other.
In this chapter, we will cover the following topics:
ML and the cloud
Introducing the GCP
Data services of the Google platform
ML and the cloud
In short, artificial intelligence (AI) requires a lot of computing resources, and cloud computing provides them.
ML is a new type of microscope and telescope, allowing each of us to push the boundaries of human knowledge and human activity. With ever more powerful ML platforms and open tools, we are able to conquer new realms of knowledge and grow new types of businesses. From the comfort of our laptops, at home, or at the office, we can better understand and predict human behavior in a wide range of domains. Think health care, transportation, energy, financial markets, human communication, human-machine interaction, social network dynamics, economic behavior, and nature (astronomy, global warming, or seismic activity). The list of domains affected by the explosion of AI is truly unlimited. The impact on society? Astounding.
With so many resources available to anyone with an online connection, the barrier to joining the AI revolution has never been lower than it is now. Books, tutorials, MOOCs, and meet-ups, as well as open source libraries in a myriad of languages, are freely available to both the seasoned and the beginner data scientist.
As veteran data scientists know well, data science is always hungry for more computational resources. Classification on the Iris dataset or the MNIST image dataset, or predictive modeling on Titanic passengers, does not reflect real-world data. Real-world data is by nature dirty, incomplete, noisy, multi-sourced, and, more often than not, large in volume. Exploiting these large datasets requires computational power, storage, CPUs, GPUs, and fast I/O.
However, more powerful machines are not sufficient to build meaningful ML applications. Grounded in science, data science requires a scientific mindset built on concepts such as reproducibility and peer review. Both are made easier by working with resources accessible online. Sharing datasets and models and exposing results is always more difficult when the data lives on one person's computer. Reproducing results and maintaining models with new data also require easy access to assets. And as we work on ever more personal and critical data (for instance, in healthcare), privacy and security become all the more important to the project stakeholders.
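To give a rough sense of scale, the following Python sketch estimates the raw in-memory size of a dense numeric table. It assumes every value is stored as an 8-byte float64 and ignores compression and library overhead, so actual usage would be higher. The helper name `dataset_size_gib` and the 500-million-row table are hypothetical, chosen only for illustration.

```python
# Rough in-memory footprint of a dense numeric dataset.
# Assumption: every value is an 8-byte float64, with no compression
# and no library overhead, so real-world usage will be higher.

BYTES_PER_FLOAT64 = 8


def dataset_size_gib(n_rows: int, n_cols: int,
                     bytes_per_value: int = BYTES_PER_FLOAT64) -> float:
    """Return the raw size of an n_rows x n_cols dense table in GiB."""
    return n_rows * n_cols * bytes_per_value / 2**30


# Iris: 150 samples x 4 numeric features.
iris_gib = dataset_size_gib(150, 4)

# Hypothetical real-world table: 500 million events x 100 features.
events_gib = dataset_size_gib(500_000_000, 100)

print(f"Iris:   {iris_gib * 2**20:.1f} KiB")  # -> Iris:   4.7 KiB
print(f"Events: {events_gib:.0f} GiB")        # -> Events: 373 GiB
```

The toy dataset fits in a few kilobytes, while the hypothetical event table needs hundreds of gigabytes before any model is even trained.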
This is where the cloud comes in, by offering scalability and accessibility while providing an adequate level of security.
Before diving into GCP, let's learn a bit more about the cloud.
The nature of the cloud
ML projects are resource intensive. From storage to computational power, training models sometimes require resources that cannot be found on a simple standalone computer. Physical limitations in terms of storage have shrunk in recent years. As we now enjoy reliable terabyte storage accessible at reduced prices, storage is no longer an issue for most data projects that are not in the realm of big data. Computing power has also increased so much that what required expensive workstations a few years ago can now run on laptops.
However, despite all this amazingly rapid evolution, the power of the standalone PC is finite. There is an upper limit to the volume of data you can store on your machine and to the time you're willing to wait for your model to train. New frontiers in AI, such as speech-to-text, real-time video captioning, self-driving cars, music generation, or chatbots that can fool a human being and pass the Turing test, require ever larger resources. This is especially true of deep learning models, which are too slow on standard CPUs and require GPU-based machines to train in a reasonable amount of time.
ML in the cloud does not face these limitations. What you get with cloud computing is direct access to high-performance computing (HPC). Before the cloud (roughly before AWS launched its Elastic Compute Cloud (EC2) service in 2006), HPC was only available via supercomputers, such as the Cray computers. Cray is a US company that has built some of the most powerful supercomputers since the 1970s. China's Tianhe-2, one of the most powerful supercomputers in the world, has a capacity on the order of 100 petaflops (that's 10² × 10¹⁵, or 10 to the power of 17 floating-point operations per second!).
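A rate of 10¹⁷ floating-point operations per second is easier to grasp when compared with a single machine. The Python sketch below assumes, purely for illustration, a laptop sustaining about 100 gigaflops (10¹¹ operations per second); that figure is a placeholder, not a benchmark of any real hardware.

```python
# Back-of-the-envelope comparison between a ~10**17 FLOP/s supercomputer
# and a single machine. The laptop throughput is an illustrative assumption.

PETA = 10**15

supercomputer_flops = 100 * PETA   # 10**2 * 10**15 = 10**17 FLOP/s
laptop_flops = 100 * 10**9         # assumed: ~100 gigaflops sustained

# Work the supercomputer does in one second, expressed in laptop time.
seconds_on_laptop = supercomputer_flops / laptop_flops
days_on_laptop = seconds_on_laptop / 86_400  # seconds per day

print(f"Supercomputer: {supercomputer_flops:.0e} FLOP/s")
print(f"One supercomputer-second = {seconds_on_laptop:,.0f} laptop-seconds "
      f"(~{days_on_laptop:.1f} days)")
```

Under that assumption, one second of supercomputer work corresponds to about a million laptop-seconds, or roughly 11.6 days of continuous computation.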
A supercomputer not only costs millions of US dollars but also requires its own physical infrastructure and has huge maintenance costs. It is also out of reach for individuals and for most companies. Engineers and researchers, hungry for HPC, now turn to on-demand cloud infrastructures. Cloud service offers are democratizing access to HPC.
Computing in the cloud is built on a distributed architecture. The processors are distributed across different servers instead of being aggregated in one single machine. With a few clicks or command lines, anyone can spin up massively complex banks of servers in a matter of minutes. The amount of power at your command can be mind-blowing.
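The idea of spreading work across many processors can be illustrated locally with a split, process, and aggregate pattern. In this hedged Python sketch, a thread pool stands in for the servers of a cloud cluster; the function names are hypothetical, and a real distributed job would ship each shard to a separate machine rather than to a thread. Because of CPython's global interpreter lock, these threads do not speed up CPU-bound work; the point is the structure of the computation, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Local analogy of a distributed job: split the data into shards, let
# independent workers process each shard, then aggregate the partial results.
# Hypothetical setup: threads stand in for the servers of a real cluster.


def shard(data, n_shards):
    """Split a list into n_shards contiguous chunks of nearly equal size."""
    size, rest = divmod(len(data), n_shards)
    chunks, start = [], 0
    for i in range(n_shards):
        end = start + size + (1 if i < rest else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one worker on its own shard."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, n_workers=4):
    shards = shard(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, shards))
    return sum(partials)  # aggregation step


values = list(range(1_000))
print(distributed_sum_of_squares(values))  # -> 332833500
```

Each worker only sees its own shard, so adding workers (or, in the cloud, servers) lets the same job scale to larger inputs.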
Cloud computing can not only handle the most demanding optimization tasks but also carry out a simple regression on a tiny dataset. Cloud computing is extremely flexible.
To recap, cloud computing offers:
Instantaneity: Resources can be made available in a matter of minutes.
On-demand: Instances can be put on standby or decommissioned when no longer needed.
Diversity: The wide range of operating systems, storage, and database solutions allows the architect to create project-focused architectures, from simple mobile applications to ML APIs.
Unlimited resources: If not yet infinite, the volume of storage, computing, and networking resources you can assemble is mind-blowing.
GPUs: Most PCs are based on CPUs (with the exception of machines optimized for gaming). Deep learning requires GPUs to train models at practical speeds. Cloud computing makes GPUs available at a fraction of the cost of buying GPU machines.
Controlled accessibility and security: With granular role definitions, service compartmentalization, encrypted connections, and user-based access control, cloud platforms greatly reduce the risk of intrusion and data loss.
Beyond these common benefits, the cloud platforms and offers on the market come in several types.
Public cloud
Depending on customer needs, cloud models can be classified along two main axes: public versus private, and multi-tenant versus single-tenant. These cloud types offer different levels of management, security, and pricing.
A public cloud consists of resources that are located off-site and accessed over the internet. In a public cloud, the infrastructure is typically multi-tenant: multiple customers can share the same underlying hardware or server. Resources such as networking, storage, power, cooling, and computing are all shared. The customer usually has no visibility into where this infrastructure is hosted, beyond choosing a geographic region. The pricing model of a public cloud service is based on the volume of data, the computing power used, and other infrastructure-management-related services or, more precisely, a mix of RAM, vCPUs, disk, and bandwidth.
In a private cloud, the resources are dedicated to a single customer; the architecture is single-tenant instead of multi-tenant. The servers are located on premises or in a remote data center. Customers own (or rent) the infrastructure and are responsible for maintaining it. Private cloud infrastructures are more expensive to operate, as they require dedicated hardware to be secured for a single tenant. Customers of a private cloud have more control over their infrastructure and can therefore more easily meet their compliance and security requirements.
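The usage-based pricing of a public cloud can be sketched as a simple linear cost model over vCPU-hours, RAM, disk, and outbound bandwidth. All unit prices in this Python sketch are hypothetical placeholders, not actual GCP rates; real provider pricing varies by region and machine type and includes details this model ignores. The names `UnitPrices`, `Usage`, and `estimate_cost` are illustrative only.

```python
from dataclasses import dataclass

# Illustrative pay-as-you-go cost model: the bill is a mix of vCPU, RAM,
# disk, and network usage. All unit prices are HYPOTHETICAL placeholders.


@dataclass(frozen=True)
class UnitPrices:
    vcpu_hour: float = 0.03       # hypothetical $ per vCPU per hour
    gib_ram_hour: float = 0.004   # hypothetical $ per GiB of RAM per hour
    gib_disk_month: float = 0.04  # hypothetical $ per GiB of disk per month
    gib_egress: float = 0.12      # hypothetical $ per GiB of outbound traffic


@dataclass(frozen=True)
class Usage:
    vcpus: int
    ram_gib: float
    hours: float
    disk_gib: float
    egress_gib: float


def estimate_cost(u: Usage, p: UnitPrices = UnitPrices(),
                  hours_per_month: float = 730) -> float:
    """Return the estimated cost in dollars for one usage period."""
    compute = u.vcpus * u.hours * p.vcpu_hour
    memory = u.ram_gib * u.hours * p.gib_ram_hour
    # Disk is priced per month, prorated by the hours it exists.
    disk = u.disk_gib * p.gib_disk_month * (u.hours / hours_per_month)
    network = u.egress_gib * p.gib_egress
    return compute + memory + disk + network


# A hypothetical 10-hour training job on an 8-vCPU, 32 GiB machine.
job = Usage(vcpus=8, ram_gib=32, hours=10, disk_gib=100, egress_gib=5)
print(f"${estimate_cost(job):.2f}")  # -> $4.33
```

Because every term scales with what is actually consumed, decommissioning the instance when the job ends stops the compute and memory charges in this model, which is the on-demand property of the public cloud.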
Hybrid clouds are composed of a mix of public clouds and private ones.
The GCP is a public multi-tenant cloud platform. You share the servers you use with other customers and let Google handle the support, the data centers, and the infrastructure.
Managed cloud versus unmanaged cloud
The cloud market has also diversified into two large segments—managed cloud versus unmanaged cloud.
In an unmanaged cloud platform, the infrastructure is self-served. In case of failure, the customer is responsible for having mechanisms in place to restore operations. Unmanaged cloud requires the customer to have the expertise and resources to build, manage, and maintain cloud instances and infrastructure. Focused on self-service, unmanaged cloud offers do not include support in their basic tiers.
In a managed cloud platform, the provider will support the underlying infrastructure by offering monitoring, troubleshooting, and around-the-clock customer service. Managed cloud brings along qualified expertise and resources to the team right away. For many companies, having a service provider to handle their public cloud can be easier and more cost-effective than hiring their own staff to operate their clouds.
The GCP is a public, multi-tenant, and unmanaged cloud service, as are AWS and Azure. Rackspace, on the other hand, is a managed cloud service company; it began offering managed services for GCP in March 2017.