Hands-On Machine Learning on Google Cloud Platform: Implementing smart and efficient analytics using Cloud ML Engine
Ebook, 779 pages, 5 hours

About this ebook

Unleash Google's Cloud Platform to build, train and optimize machine learning models

Key Features
  • Get well versed in GCP's existing services to build your own smart models
  • A comprehensive guide that covers everything from processing and analyzing data to building and training ML models
  • A practical approach to productionizing your trained ML models and porting them to mobile for easy access
Book Description

Google Cloud Machine Learning Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn to build and train machine learning models of varying complexity at scale, but also host them in the cloud to make predictions.

This book focuses on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn from scratch how to create powerful machine-learning-based applications for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and also make extensive use of data-related services in the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications.

By the end of this book, you will know the main difficulties that you may encounter and the appropriate strategies to overcome them and build efficient systems.

What you will learn
  • Use Google Cloud Platform to build data-based applications for dashboards, web, and mobile
  • Create, train and optimize deep learning models for various data science problems on big data
  • Learn how to leverage BigQuery to explore big datasets
  • Use Google’s pre-trained TensorFlow models for NLP, image, video and much more
  • Create models and architectures for Time series, Reinforcement Learning, and generative models
  • Create, evaluate, and optimize TensorFlow and Keras models for a wide range of applications
Who this book is for

This book is for data scientists, machine learning developers, and AI developers who want to learn Google Cloud Platform services to build machine learning applications. Since interaction with the Google ML platform is mostly done via the command line, the reader is expected to have some familiarity with the bash shell and Python scripting. Some understanding of machine learning and data science concepts will also be handy.

Giuseppe Ciaburro holds a PhD in environmental technical physics and two master's degrees. His research is on machine learning applications in the study of urban sound environments. He works at Built Environment Control Laboratory, Università degli Studi della Campania Luigi Vanvitelli (Italy). He has over 15 years' experience in programming Python, R, and MATLAB, first in the field of combustion, and then in acoustics and noise control. He has several publications to his credit.

V Kishore Ayyadevara has over 9 years' experience of using analytics to solve business problems and setting up analytical work streams through his work at American Express, Amazon, and, more recently, a retail analytics consulting startup. He has an MBA from IIM Calcutta and is also an electronics and communications engineer. He has worked in credit risk analytics, supply chain analytics, and consulting for multiple FMCG companies to identify ways to improve their profitability.

Alexis Perrier is a data science consultant with experience in signal processing and stochastic algorithms. He holds a master's in mathematics from Université Pierre et Marie Curie Paris VI and a PhD in signal processing
Language: English
Release date: Apr 30, 2018
ISBN: 9781788398879
    Book preview

    Hands-On Machine Learning on Google Cloud Platform

    Implementing smart and efficient analytics using Cloud ML Engine

    Giuseppe Ciaburro

    V Kishore Ayyadevara

    Alexis Perrier

    BIRMINGHAM - MUMBAI

    Hands-On Machine Learning on Google Cloud Platform

    Copyright © 2018 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Commissioning Editor: Sunith Shetty

    Acquisition Editor: Tushar Gupta

    Content Development Editor: Cheryl Dsa

    Technical Editor: Dinesh Pawar

    Copy Editor: Vikrant Phadkay

    Project Coordinator: Nidhi Joshi

    Proofreader: Safis Editing

    Indexer: Mariammal Chettiyar

    Graphics: Tania Dutta

    Production Coordinator: Arvindkumar Gupta

    First published: April 2018

    Production reference: 1260418

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-78839-348-5

    www.packtpub.com

    mapt.io

    Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

    Why subscribe?

    Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

    Improve your learning with Skill Plans built especially for you

    Get a free eBook or video every month

    Mapt is fully searchable

    Copy and paste, print, and bookmark content

    PacktPub.com

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

    Contributors

    About the authors

    Giuseppe Ciaburro holds a PhD in environmental technical physics and two master's degrees. His research is on machine learning applications in the study of urban sound environments. He works at Built Environment Control Laboratory, Università degli Studi della Campania Luigi Vanvitelli (Italy). He has over 15 years' experience in programming Python, R, and MATLAB, first in the field of combustion, and then in acoustics and noise control. He has several publications to his credit.

    V Kishore Ayyadevara has over 9 years' experience of using analytics to solve business problems and setting up analytical work streams through his work at American Express, Amazon, and, more recently, a retail analytics consulting startup. He has an MBA from IIM Calcutta and is also an electronics and communications engineer. He has worked in credit risk analytics, supply chain analytics, and consulting for multiple FMCG companies to identify ways to improve their profitability.

    Alexis Perrier is a data science consultant with experience in signal processing and stochastic algorithms. He holds a master's in mathematics from Université Pierre et Marie Curie Paris VI and a PhD in signal processing from Télécom ParisTech. He is actively involved in the DC data science community. He is also an avid book lover and proud owner of a real chalk blackboard, where he regularly shares his fascination of mathematical equations with his kids.

    About the reviewers

    Mikhail Berlyant is a data warehousing veteran. He has been a data developer since the late 1970s. Since 2000, he has led data systems, data mining, and data warehouse teams at Yahoo! and Myspace.

    He is a Google Cloud expert and senior VP of Technology, at Viant Inc., a people-based advertising tech company that enables marketers to plan, execute, and measure their digital media investments through a cloud-based platform. At Viant, he led the migration of a petabyte-sized data warehouse to Google Cloud. He is currently focusing on self-serve/productivity tools for BigQuery/GCP.

    I'd like to say thanks to my beautiful wife, Svetlana, for supporting me in all my endeavors.

    Sanket Thodge is an entrepreneur by profession in Pune, India. He is an author of Cloud Analytics with Google Cloud Platform. He founded Pi R Square Digital Solutions. With expertise as a Hadoop developer, he has explored the cloud, IoT, machine learning, and blockchain. He has also applied for a patent in IoT and has worked with numerous startups and MNCs, providing consultancy, architecture building, development, and corporate training across the globe.

    Antonio Gulli is a transformational software executive and business leader with a passion for establishing and managing global technological talent for innovation and execution. He is an expert in search engines, online services, machine learning, and so on. Currently, he is a site lead and director of cloud at Google Warsaw, driving European efforts for serverless, Kubernetes, and Google Cloud UX. Antonio has filed for 20+ patents, published multiple academic papers, and served as a senior PC member in multiple international conferences.

    Chirag Nayyar is helping organizations to migrate their workload from on-premise to the public cloud. He has experience in web app migration, SAP workload on the cloud, and EDW. He is currently working at Cloud Kinetics Technology Solutions. He holds a wide range of certifications from all major public cloud platforms. He also runs meetups and is a regular speaker at various cloud events.

    Packt is searching for authors like you

    If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

    Table of Contents

    Title Page

    Copyright and Credits

    Hands-On Machine Learning on Google Cloud Platform

    Packt Upsell

    Why subscribe?

    PacktPub.com

    Contributors

    About the authors

    About the reviewers

    Packt is searching for authors like you

    Preface

    Who this book is for

    What this book covers

    To get the most out of this book

    Download the example code files

    Download the color images

    Conventions used

    Get in touch

    Reviews

    Introducing the Google Cloud Platform

    ML and the cloud

    The nature of the cloud

    Public cloud

    Managed cloud versus unmanaged cloud

    IaaS versus PaaS versus SaaS

    Costs and pricing

    ML

    Introducing the GCP

    Mapping the GCP

    Getting started with GCP

    Project-based organization

    Creating your first project

    Roles and permissions

    Further reading

    Summary

    Google Compute Engine

    Google Compute Engine

    VMs, disks, images, and snapshots

    Creating a VM

    Google Shell

    Google Cloud Platform SDK

    Gcloud

    Gcloud config

    Accessing your instance with gcloud

    Transferring files with gcloud

    Managing the VM

    IPs

    Setting up a data science stack on the VM

    The IPython console

    Troubleshooting

    Adding GPUs to instances

    Startup scripts and stop scripts

    Resources and further reading

    Summary

    Google Cloud Storage

    Google Cloud Storage

    Storage versus drive

    Accessing control lists

    Access and management through the web console

    gsutil

    gsutil cheatsheet

    Advanced gsutil

    Signed URLs

    Creating a bucket in Google Cloud Storage

    Google Storage namespace

    Naming a bucket

    Naming an object

    Creating a bucket

    Google Cloud Storage console

    Google Cloud Storage gsutil

    Life cycle management

    Google Cloud SQL

    Databases supported

    Google Cloud SQL performance and scalability

    Google Cloud SQL security and architecture

    Creating Google Cloud SQL instances

    Summary

    Querying Your Data with BigQuery

    Approaching big data

    Data structuring

    Querying the database

    SQL basics

    Google BigQuery

    BigQuery basics

    Using a graphical web UI

    Visualizing data with Google Data Studio

    Creating reports in Data Studio

    Summary

    Transforming Your Data

    How to clean and prepare the data

    Google Cloud Dataprep

    Exploring Dataprep console

    Removing empty cells

    Replacing incorrect values

    Mismatched values

    Finding outliers in the data

    Visual functionality

    Statistical information

    Removing outliers

    Run Job

    Scale of features

    Min–max normalization

    z score standardization

    Google Cloud Dataflow

    Summary

    Essential Machine Learning

    Applications of machine learning

    Financial services

    Retail industry

    Telecom industry

    Supervised and unsupervised machine learning

    Overview of machine learning techniques

    Objective function in regression

    Linear regression

    Decision tree

    Random forest

    Gradient boosting

    Neural network

    Logistic regression

    Objective function in classification

    Data splitting

    Measuring the accuracy of a model

    Absolute error

    Root mean square error

    The difference between machine learning and deep learning

    Applications of deep learning

    Summary

    Google Machine Learning APIs

    Vision API

    Enabling the API

    Opening an instance

    Creating an instance using Cloud Shell

    Label detection

    Text detection

    Logo detection

    Landmark detection

    Cloud Translation API

    Enabling the API

    Natural Language API

    Speech-to-text API

    Video Intelligence API

    Summary

    Creating ML Applications with Firebase

    Features of Firebase

    Building a web application

    Building a mobile application

    Summary

    Neural Networks with TensorFlow and Keras

    Overview of a neural network

    Setting up Google Cloud Datalab

    Installing and importing the required packages

    Working details of a simple neural network

    Backpropagation

    Implementing a simple neural network in Keras

    Understanding the various loss functions

    Softmax activation

    Building a more complex network in Keras

    Activation functions

    Optimizers

    Increasing the depth of network

    Impact on change in batch size

    Implementing neural networks in TensorFlow

    Using premade estimators

    Creating custom estimators

    Summary

    Evaluating Results with TensorBoard

    Setting up TensorBoard

    Overview of summary operations

    Ways to debug the code

    Setting up TensorBoard from TensorFlow

    Summaries from custom estimator

    Summary

    Optimizing the Model through Hyperparameter Tuning

    The intuition of hyperparameter tuning

    Overview of hyperparameter tuning

    Hyperparameter tuning in Google Cloud

    The model file

    Configuration file

    Setup file

    The __init__ file

    Summary

    Preventing Overfitting with Regularization

    Intuition of over/under fitting

    Reducing overfitting

    Implementing L2 regularization

    Implementing L1 regularization

    Implementing dropout

    Reducing underfitting

    Summary

    Beyond Feedforward Networks – CNN and RNN

    Convolutional neural networks

    Convolution layer

    Rectified Linear Units

    Pooling layers

    Fully connected layer

    Structure of a CNN

    TensorFlow overview

    Handwriting Recognition using CNN and TensorFlow

    Run Python code on Google Cloud Shell

    Recurrent neural network

    Fully recurrent neural networks

    Recursive neural networks

    Hopfield recurrent neural networks

    Elman neural networks

    Long short-term memory networks

    Handwriting Recognition using RNN and TensorFlow

    LSTM on Google Cloud Shell

    Summary

    Time Series with LSTMs

    Introducing time series 

    Classical approach to time series

    Estimation of the trend component

    Estimating the seasonality component

    Time series models

    Autoregressive models

    Moving average models

    Autoregressive moving average model 

    Autoregressive integrated moving average models

    Removing seasonality from a time series

    Analyzing a time series dataset

    Identifying a trend in a time series

    Time series decomposition

    Additive method

    Multiplicative method

    LSTM for time series analysis

    Overview of the time series dataset

    Data scaling

    Data splitting

    Building the model

    Making predictions

    Summary

    Reinforcement Learning

    Reinforcement learning introduction

    Agent-Environment interface

    Markov Decision Process

    Discounted cumulative reward

    Exploration versus exploitation

    Reinforcement learning techniques

    Q-learning

    Temporal difference learning

    Dynamic Programming

    Monte Carlo methods

    Deep Q-Network

    OpenAI Gym

    Cart-Pole system

    Learning phase

    Testing phase

    Summary

    Generative Neural Networks

    Unsupervised learning

    Generative models

    Restricted Boltzmann machine

    Boltzmann machine architecture

    Boltzmann machine disadvantages

    Deep Boltzmann machines

    Autoencoder

    Variational autoencoder

    Generative adversarial network

    Adversarial autoencoder 

    Feature extraction using RBM

    Breast cancer dataset

    Data preparation

    Model fitting

    Autoencoder with Keras

    Load data

    Keras model overview

    Sequential model

    Keras functional API

    Define model architecture

    Magenta

    The NSynth dataset

    Summary

    Chatbots

    Chatbots fundamentals

    Chatbot history

    The imitation game

    Eliza

    Parry

    Jabberwacky

    Dr. Sbaitso

    ALICE

    SmarterChild

    IBM Watson

    Building a bot

    Intents

    Entities

    Context

    Chatbots

    Essential requirements

    The importance of the text

    Word transposition

    Checking a value against a pattern

    Maintaining context

    Chatbots architecture

    Natural language processing

    Natural language understanding

    Google Cloud Dialogflow

    Dialogflow overview

    Basic Dialogflow elements

    Agents

    Intent

    Entity

    Action

    Context

    Building a chatbot with Dialogflow

    Agent creation

    Intent definition

    Summary

    Preface

    Google Cloud ML Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn how to build and train machine learning models of varying complexity at scale, but also how to host them in the cloud to make predictions.

    This book is focused on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn how to create powerful machine-learning-based applications from scratch for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and will also make extensive use of data-related services comprising the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications.

    By the end of this book, you will be aware of the main difficulties that you may encounter, and be familiar with appropriate strategies to overcome these difficulties and build efficient systems.

    Who this book is for

    This book is for data scientists, machine learning developers, and AI developers who want to learn Google Cloud Platform services to build machine learning applications. Since interaction with the Google ML platform is mostly done via the command line, the reader should have some familiarity with the bash shell and Python scripting. Some understanding of machine learning and data science concepts will also be handy.

    What this book covers

    Chapter 1, Introducing the Google Cloud Platform, explores different services that may be useful to build a machine learning pipeline based on GCP.

    Chapter 2, Google Compute Engine, helps you to create and fully manage your VM via both the online console and command-line tools, as well as how to implement a data science workflow and a Jupyter Notebook workspace.

    Chapter 3, Google Cloud Storage, shows how to upload data and manage it using the services provided by the Google Cloud Platform.

    Chapter 4, Querying Your Data with BigQuery, shows you how to query data from Google Storage and visualize it with Google Data Studio.

    Chapter 5, Transforming Your Data, presents Dataprep, a service useful for preprocessing data, extracting features, and cleaning up records. We also look at Dataflow, a service used to implement streaming and batch processing.

    Chapter 6, Essential Machine Learning, starts our journey into machine learning and deep learning; we learn when to apply each one.

    Chapter 7, Google Machine Learning APIs, teaches us how to use Google Cloud machine learning APIs for image analysis, text and speech processing, translation, and video inference.

    Chapter 8, Creating ML Applications with Firebase, shows how to integrate different GCP services to build a seamless machine-learning-based application, mobile or web-based.

    Chapter 9, Neural Networks with TensorFlow and Keras, gives a good understanding of the structure and key elements of a feedforward network, how to architect one, and how to tinker and experiment with different parameters.

    Chapter 10, Evaluating Results with TensorBoard, shows how the choice of different parameters and functions impacts the performance of the model.

    Chapter 11, Optimizing the Model through Hyperparameter Tuning, teaches us how to use hypertuning in TensorFlow application code and interpret the results to select the best performing model.

    Chapter 12, Preventing Overfitting with Regularization, shows how to identify overfitting and make our models more robust to previously unseen data by setting the right parameters and defining the proper architectures.

    Chapter 13, Beyond Feedforward Networks – CNN and RNN, teaches which type of neural network to apply to different problems, and how to define and implement them on GCP.

    Chapter 14, Time Series with LSTMs, shows how to create LSTMs and apply them to time series predictions. We will also understand when LSTMs outperform more standard approaches.

    Chapter 15, Reinforcement Learning, introduces the power of reinforcement learning and shows how to implement a simple use case on GCP.

    Chapter 16, Generative Neural Networks, teaches us how to extract the content generated within the neural net with different types of content—text, images, and sounds.

    Chapter 17, Chatbots, shows how to train a contextual chatbot while implementing it in a real mobile application.

    To get the most out of this book

    In this book, machine learning algorithms are implemented on the Google Cloud Platform. To reproduce the many examples in this book, you need a working GCP account. We have used Python 2.7 and above to build the various applications, and we have tried to keep all of the code as friendly and readable as possible. We feel that this will enable our readers to easily understand the code and readily use it in different scenarios.

    Download the example code files

    You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

    You can download the code files by following these steps:

    1. Log in or register at www.packtpub.com.

    2. Select the SUPPORT tab.

    3. Click on Code Downloads & Errata.

    4. Enter the name of the book in the Search box and follow the onscreen instructions.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    • WinRAR/7-Zip for Windows
    • Zipeg/iZip/UnRarX for Mac
    • 7-Zip/PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Machine-Learning-on-Google-Cloud-Platform. In case there's an update to the code, it will be updated on the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/HandsOnMachineLearningonGoogleCloudPlatform_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Where GROUP is a service or an account element and COMMAND is the command to send to the GROUP.

    A block of code is set as follows:

    import matplotlib.patches as patches
    import matplotlib.pyplot as plt  # provides plt.subplots used below
    import numpy as np

    fig, ax = plt.subplots(1)

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    text = 'this is a good text'
    from google.cloud.language_v1 import types
    document = types.Document(
            content=text,
            type='PLAIN_TEXT')

    Any command-line input or output is written as follows:

    $ gcloud compute instances list

    Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Click on Create a new project.

    Warnings or important notes appear like this.

    Tips and tricks appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

    Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Reviews

    Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

    For more information about Packt, please visit packtpub.com.

    Introducing the Google Cloud Platform

    The goal of this first introductory chapter is to give you an overview of the Google Cloud Platform (GCP). We start by explaining why machine learning (ML) and cloud computing go hand in hand as today's ML applications demand ever more computing resources. We then proceed with a 360° presentation of the platform's data-related services. Account and project creation, as well as role allocation, close the chapter.

    A data science project follows a regular set of steps: extracting the data, exploring and cleaning it, extracting information, training and assessing models, and finally building machine-learning-enabled applications. For each step of the data science flow, the GCP offers one or more suitable services.
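
    The sequence of steps above can be sketched as a chain of plain Python functions. This is only a toy illustration under stated assumptions: the function names, the records, and the one-parameter model are all hypothetical and are not GCP services or code from this book.

```python
from statistics import mean


def extract():
    # Hypothetical raw records; None marks a missing value.
    return [
        {"x": 1.0, "y": 2.1},
        {"x": 2.0, "y": None},
        {"x": 3.0, "y": 6.2},
        {"x": 4.0, "y": 7.9},
    ]


def clean(rows):
    # Drop any record that has a missing field.
    return [r for r in rows if r["x"] is not None and r["y"] is not None]


def train(rows):
    # Least-squares fit of y = slope * x (a line through the origin).
    num = sum(r["x"] * r["y"] for r in rows)
    den = sum(r["x"] ** 2 for r in rows)
    return num / den


def assess(slope, rows):
    # Mean absolute error of the fitted line on the given rows.
    return mean(abs(r["y"] - slope * r["x"]) for r in rows)


rows = clean(extract())
slope = train(rows)
error = assess(slope, rows)
print(f"kept {len(rows)} rows, slope={slope:.3f}, mean abs error={error:.3f}")
```

    In a real project each function would be backed by a dedicated service or library rather than a few lines of Python, but the order of the stages stays the same.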

    But, before we present the overall mapping of the GCP data-related services, it is important to understand why ML and cloud computing are truly made for each other.

    In this chapter, we will cover the following topics:

    ML and the cloud

    Introducing the GCP

    Data services of the Google platform

    ML and the cloud

    In short, artificial intelligence (AI) requires a lot of computing resources. Cloud computing addresses that need.

    ML is a new type of microscope and telescope, allowing each of us to push the boundaries of human knowledge and human activities. With ever more powerful ML platforms and open tools, we are able to conquer new realms of knowledge and grow new types of businesses. From the comfort of our laptops, at home, or at the office, we can better understand and predict human behavior in a wide range of domains. Think health care, transportation, energy, financial markets, human communication, human-machine interaction, social network dynamics, economic behavior, and nature (astronomy, global warming, or seismic activity). The list of domains affected by the explosion of AI is truly unlimited. The impact on society? Astounding.

    With so many resources available to anyone with an online connection, the barrier to joining the AI revolution has never been lower than it is now. Books, tutorials, MOOCs, and meet-ups, as well as open source libraries in a myriad of languages, are freely available to both the seasoned and the beginner data scientist.

    As veteran data scientists know well, data science is always hungry for more computational resources. Classification on the Iris or the MNIST image datasets or predictive modeling on Titanic passengers does not reflect real-world data. Real-world data is by nature dirty, incomplete, noisy, multi-sourced, and more often than not, large in volume. Exploiting these large datasets requires computational power, storage, CPUs, GPUs, and fast I/O.

    However, more powerful machines are not sufficient to build meaningful ML applications. Grounded in science, data science requires a scientific mindset with concepts such as reproducibility and reviewing. Both aspects are made easier by working with online accessible resources. Sharing datasets and models and exposing results is always more difficult when the data lives on one person's computer. Reproducing results and maintaining models with new data also requires easy accessibility to assets. And as we work on ever more personalized and critical data (for instance in healthcare), privacy and security concerns become all the more important to the project stakeholders.

    This is where the cloud comes in, by offering scalability and accessibility while providing an adequate level of security.

    Before diving into GCP, let's learn a bit more about the cloud.

    The nature of the cloud

    ML projects are resource intensive. From storage to computational power, training models sometimes require resources that cannot be found on a simple standalone computer. Physical limitations in terms of storage have shrunk in recent years. As we now enjoy reliable terabyte storage accessible at reduced prices, storage is no longer an issue for most data projects that are not in the realm of big data. Computing power has also increased so much that what required expensive workstations a few years ago can now run on laptops.

    However, despite all this amazingly rapid evolution, the power of the standalone PC is finite. There is an upper limit to the volume of data you can store on your machine and to the time you're willing to wait to get your model trained. New frontiers in AI, with speech-to-text, video captioning in real time, self-driving cars, music generation, or chatbots that can fool a human being and pass the Turing test, require ever larger resources. This is especially true of deep learning models, which are too slow on standard CPUs and require GPU-based machines to train in a reasonable amount of time.

    ML in the cloud does not face these limitations. What you get with cloud computing is direct access to high-performance computing (HPC). Before the cloud (roughly before AWS launched its Elastic Compute Cloud (EC2) service in 2006), HPC was only available via supercomputers, such as the Cray computers. Cray is a US company that has built some of the most powerful supercomputers since the 1970s. China's Tianhe-2, which topped the TOP500 ranking from 2013 to 2015, sustains roughly 34 petaflops on the LINPACK benchmark (that's 34 x 10¹⁵, or about 3.4 x 10¹⁶ floating-point operations per second!).
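
    To make the petaflops unit concrete, here is a minimal Python sketch of the conversion: one petaflops is 10¹⁵ floating-point operations per second. The function names are hypothetical, the ratings passed in are illustrative values rather than measurements of any machine, and the run-time estimate assumes an idealized workload that sustains the rated throughput.

```python
PETA = 10**15  # SI prefix "peta": one quadrillion


def petaflops_to_flops(petaflops):
    """Convert a rating in petaflops to floating-point operations per second."""
    return petaflops * PETA


def seconds_for_workload(total_operations, petaflops):
    """Idealized run time: total operations divided by sustained throughput."""
    return total_operations / petaflops_to_flops(petaflops)


if __name__ == "__main__":
    # Illustrative ratings only (assumption), chosen to show the powers of ten.
    for rating in (1, 100):
        flops = petaflops_to_flops(rating)
        print(f"{rating} petaflops = {flops:.0e} operations per second")
```

    Keeping the prefix as an integer power of ten avoids rounding surprises when the ratings themselves are whole numbers.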

    A supercomputer not only costs millions of US dollars but also requires its own physical infrastructure and has huge maintenance costs. It is also out of reach for individuals and for most companies. Engineers and researchers, hungry for HPC, now turn to on-demand cloud infrastructures. Cloud service offers are democratizing access to HPC.

    Computing in the cloud is built on a distributed architecture. The processors are distributed across different servers instead of being aggregated in one single machine. With a few clicks or command lines, anyone can spin up massively complex banks of servers in a matter of minutes. The amount of power at your command can be mind-blowing.

    Cloud computing can not only handle the most demanding optimization tasks but also carry out a simple regression on a tiny dataset. Cloud computing is extremely flexible.

    To recap, cloud computing offers:

    Instantaneity: Resources can be made available in a matter of minutes.

    On-demand: Instances can be put on standby or decommissioned when no longer needed.

    Diversity: The wide range of operating systems, storage, and database solutions allows the architect to create project-focused architectures, from simple mobile applications to ML APIs.

    Unlimited resources: If not infinite yet, the volume of storage, computing, and networking resources you can assemble is mind-blowing.

    GPUs: Most PCs rely on CPUs alone (with the exception of machines optimized for gaming). Deep learning requires GPUs to train models at practical speeds. Cloud computing makes GPUs available at a fraction of the cost needed to buy GPU machines.

    Controlled accessibility and security: With granular role definitions, service compartmentalization, encrypted connections, and user-based access control, cloud platforms greatly reduce the risk of intrusion and data loss.

    Apart from these, there are several other types of cloud platforms and offers on the market.

    Public cloud

    Cloud models differ along two main axes depending on the needs of the customers: public versus private and multi-tenant versus single-tenant. These different cloud types offer different levels of management, security, and pricing.

    A public cloud consists of resources that are located off-site and accessed over the internet. In a public cloud, the infrastructure is typically multi-tenant. Multiple customers can share the same underlying hardware or server. Resources such as networking, storage, power, cooling, and computing are all shared. The customer usually has no visibility of where this infrastructure is hosted except for choosing a geographic region. The pricing model of a public cloud service is based on the volume of data, the computing power that is used, and other infrastructure-management-related services, or more precisely, a mix of RAM, vCPUs, disk, and bandwidth.
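
    The pricing mix described above can be sketched as a simple usage-based estimate. This is only an illustration under stated assumptions: the class, the function, and every rate below are hypothetical and are not actual GCP prices or a GCP API; real bills also include discounts, tiers, and other line items that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitRates:
    """Hypothetical unit prices in US dollars (NOT real GCP pricing)."""
    per_vcpu_hour: float
    per_gb_ram_hour: float
    per_gb_disk_month: float
    per_gb_bandwidth: float


def estimate_monthly_cost(rates, vcpus, ram_gb, disk_gb, bandwidth_gb, hours):
    """Sum the compute, storage, and network components of a monthly bill."""
    # Compute is billed per hour of use for both vCPUs and RAM.
    compute = hours * (vcpus * rates.per_vcpu_hour + ram_gb * rates.per_gb_ram_hour)
    # Disk is billed per provisioned GB for the month.
    storage = disk_gb * rates.per_gb_disk_month
    # Bandwidth is billed per GB transferred.
    network = bandwidth_gb * rates.per_gb_bandwidth
    return compute + storage + network


if __name__ == "__main__":
    # Made-up rates and usage, for illustration only.
    example_rates = UnitRates(0.03, 0.004, 0.04, 0.10)
    cost = estimate_monthly_cost(
        example_rates, vcpus=4, ram_gb=16, disk_gb=100, bandwidth_gb=50, hours=100
    )
    print(f"Estimated monthly cost: ${cost:.2f}")
```

    Separating the rates from the usage figures makes it easy to compare the same workload under different hypothetical price sheets.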

    In a private cloud, the resources are dedicated to a single customer; the architecture is single-tenant instead of multi-tenant. The servers are located on premises or in a remote data center. Customers own (or rent) the infrastructure and are responsible for maintaining it. Private cloud infrastructures are more expensive to operate as they require dedicated hardware to be secured for a single tenant. Customers of the private cloud have more control over their infrastructure, and therefore they can more easily meet their compliance and security requirements.

    Hybrid clouds are composed of a mix of public clouds and private ones.

    The GCP is a public multi-tenant cloud platform. You share the servers you use with other customers and let Google handle the support, the data centers, and the infrastructure.

    Managed cloud versus unmanaged cloud

    The cloud market has also diversified into two large segments—managed cloud versus unmanaged cloud.

    In an unmanaged cloud platform, the infrastructure is self-served. In case of failure, it is the responsibility of the customer to have some mechanisms in place to restore the operations. Unmanaged cloud requires the customer to have the qualified expertise and resources to build, manage, and maintain cloud instances and infrastructures. Focused on self-serving applications, unmanaged cloud offers do not include support with their basic tiers.

    In a managed cloud platform, the provider will support the underlying infrastructure by offering monitoring, troubleshooting, and around-the-clock customer service. Managed cloud brings along qualified expertise and resources to the team right away. For many companies, having a service provider to handle their public cloud can be easier and more cost-effective than hiring their own staff to operate their clouds.

    The GCP is a public, multi-tenant, and unmanaged cloud service. So are AWS and Azure. Rackspace, on the other hand, is an example of a managed cloud service company. For instance, Rackspace started offering managed services for GCP in March 2017.

    IaaS versus
