Intro To Deep Learning
March 2016
AGENDA
What is Deep Learning?
GPUs and DL
DL in practice
Scaling up DL
What is Deep Learning?
DEEP LEARNING EVERYWHERE
INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation
MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery
MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real Time Translation
SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery
AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Recognize Traffic Sign
Traditional machine perception
Hand-crafted feature extractors
Raw data → Feature extraction → Classifier / detector → Result
Typical classifiers and tasks:
SVM, shallow neural net
HMM, shallow neural net → Speaker ID, speech transcription
Clustering, HMM, LDA, LSA → Topic classification, machine translation, sentiment analysis
Deep learning approach
Train:
Labeled raw data (dog, cat, raccoon, honey badger, ...) → MODEL → predicted labels
Errors between the predictions and the true labels are fed back to update the MODEL
Deploy:
New data → MODEL → Dog
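A minimal sketch of this train/deploy cycle, using a toy NumPy logistic-regression stand-in for the MODEL (the dog/cat task and all values here are illustrative, not from the slide):

# Minimal sketch of the train/deploy cycle: predictions are compared with
# labels, and the errors are fed back to update the model's weights.
# Toy logistic-regression "model" on random features; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))               # raw data (e.g. image features)
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # labels: 1 = "dog", 0 = "cat"

w = np.zeros(16)                             # MODEL parameters

# Train: forward pass, measure errors, feed them back as gradient updates
for step in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted probability of "dog"
    errors = p - y                           # difference from the true labels
    w -= 0.1 * X.T @ errors / len(y)         # update weights to reduce the errors

# Deploy: run new data through the trained MODEL
x_new = rng.normal(size=16)
print("dog" if 1.0 / (1.0 + np.exp(-x_new @ w)) > 0.5 else "cat")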
Artificial neural network
A collection of simple, trainable mathematical units that collectively learn complex functions
Hidden layers
Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions
Artificial neurons
Inputs x1, x2, x3 with weights w1, w2, w3
y = F(w1*x1 + w2*x2 + w3*x3)
F(x) = max(0, x)   (ReLU activation)
From Stanford CS231n lecture notes
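A minimal NumPy sketch of this neuron with the ReLU activation F(x) = max(0, x); the input and weight values are made up for illustration:

# A single artificial neuron: weighted sum of the inputs followed by the
# ReLU nonlinearity F(x) = max(0, x), matching y = F(w1*x1 + w2*x2 + w3*x3).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def neuron(x, w):
    # x: inputs (x1, x2, x3), w: trainable weights (w1, w2, w3)
    return relu(np.dot(w, x))

x = np.array([0.5, -1.2, 3.0])   # example inputs (illustrative values)
w = np.array([0.8, 0.1, -0.4])   # example weights (illustrative values)
print(neuron(x, w))              # 0.0 here, since the weighted sum is negative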
Deep neural network (dnn)
Raw data → Low-level features → Mid-level features → High-level features → Result
Application components:
Task objective, e.g. identify face
Training data: 10-100M images
Network architecture: ~10 layers, ~1B parameters
Learning algorithm: ~30 Exaflops, ~30 GPU days
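A small NumPy sketch of such a network as a stack of fully connected ReLU layers; the layer sizes are made up and far smaller than the ~10-layer, ~1B-parameter networks quoted above:

# Sketch of a deep network as a stack of fully connected ReLU layers.
# Layer sizes are illustrative; real networks are far larger (~1B parameters).
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [256, 128, 64, 32, 10]          # input -> hidden layers -> output

# One weight matrix and bias vector per layer
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    # Each layer turns lower-level features into higher-level ones
    for W, b in zip(weights, biases):
        x = np.maximum(0.0, W @ x + b)        # dense layer + ReLU
    return x

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(forward(rng.normal(size=256)).shape, n_params)   # (10,) outputs, ~44k parameters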
Deep learning benefits
Robust
No need to design the features ahead of time; features are automatically learned to be optimal for the task at hand
Generalizable
The same neural net approach can be used for many different applications and data types
Scalable
Performance improves with more data; the method is massively parallelizable
Baidu Deep Speech 2
End-to-end Deep Learning for English and Mandarin Speech Recognition
http://svail.github.io/mandarin/
http://arxiv.org/abs/1512.02595
AlphaGo
First Computer Program to Beat a Human Go Professional
Deep Learning Synthesis
Texture synthesis and transfer using CNNs. Timo Aila et al., NVIDIA Research
THE AI RACE IS ON
IMAGENET
[Chart: ImageNet classification accuracy rate, 2009-2016, Traditional CV vs. Deep Learning]
Milestones: Google launches TensorFlow; Toyota invests $1B in AI labs; Microsoft & U. Science & Tech, China beat humans on IQ questions
The Big Bang in Machine Learning
Google's AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs. And it depends on these chips more than the larger tech universe realizes.
GPUs and DL
Deep learning development cycle
Three Kinds of Networks
DNN
Key operation is a dense matrix × vector product (M x V)
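A minimal NumPy sketch of that key operation, one dense layer applied to a single input vector (sizes are illustrative):

# The key DNN operation: a dense matrix x vector product (M x V),
# i.e. one fully connected layer applied to a single input vector.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(1024, 1024))   # weight matrix of one dense layer (illustrative size)
v = rng.normal(size=1024)           # input activations for one sample

y = M @ v                           # dense M x V product
print(y.shape)                      # (1024,)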
CNN
Requires convolution as well as M x V
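A minimal NumPy sketch of the convolution part, written as an explicit sliding window (strictly speaking cross-correlation, as deep learning frameworks implement it); the image size and filter values are illustrative:

# A CNN's core operation: slide a small filter over the input and take a
# dense dot product at every position (direct 2D convolution, no padding).
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(28, 28))   # e.g. a grayscale image
edge_filter = np.array([[1.0, 0.0, -1.0],                # illustrative 3x3 filter
                        [2.0, 0.0, -2.0],
                        [1.0, 0.0, -1.0]])
print(conv2d(image, edge_filter).shape)                  # (26, 26)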
13x Faster Training with Caffe
TESLA M40: World's Fastest Accelerator for Deep Learning Training
GPU server with 4x TESLA M40 reduces training time from 13 days to just 1 day
GDDR5 memory: 12 GB
[Chart: training time in days, 0-13]
NVIDIA whitepaper: GPU-Based Deep Learning Inference: A Performance and Power Analysis
DL in practice
The Engine of Modern AI
EDUCATION, BIG SUR, TENSORFLOW, WATSON, CNTK, TORCH, CAFFE, THEANO, MATCONVNET, MINERVA, MXNET*, SCHULTS LABORATORIES, VITRUVIAN
GPU-accelerated Deep Learning subroutines (cuDNN)
Tiled FFT up to 2x faster than FFT
[Chart: cuDNN performance, versions 1 through 4, up to 2.5x speedup]
developer.nvidia.com/cudnn
Caffe Performance
CUDA BOOSTS DEEP LEARNING PERFORMANCE 5X IN 2 YEARS
[Chart: Caffe performance from K40 (11/2013) and K40+cuDNN1 (9/2014) to M40+cuDNN3 (7/2015) and M40+cuDNN4 (12/2015)]
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Process Data → Configure DNN → Monitor Progress → Visualize Layers → Test Image
developer.nvidia.com/digits
ONE ARCHITECTURE END-TO-END AI
PC GAMING
Scaling DL
Scaling Neural Networks
Data Parallelism
[Diagram: each machine holds a full copy of the weights W, synchronized across machines; Machine 1 processes Image 1 while Machine 2 processes Image 2]
Notes:
Need to sync the model across machines.
Largest models do not fit on one GPU.
Requires a P-fold larger batch size for P workers.
Works across many nodes; the parameter server approach gives linear speedup.
A gradient-averaging sketch follows the notes.
Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Ng and Bryan Catanzaro
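A minimal NumPy simulation of the idea, assuming a toy linear model and squared-error loss; real systems synchronize gradients over NCCL/MPI or a parameter server rather than in-process arrays:

# Sketch of data parallelism: every worker holds a full copy of the weights W,
# processes its own slice of the batch, and the gradients are averaged (synced)
# before a shared update.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 64))        # model, replicated on each worker
X = rng.normal(size=(32, 64))                   # one global batch
Y = rng.normal(size=(32, 10))                   # regression targets (toy task)

def local_gradient(W, x, y):
    # Gradient of a squared-error loss for a linear "model" (illustrative)
    pred = x @ W.T
    return (pred - y).T @ x / len(x)

# Each of P workers sees 1/P of the batch (so the global batch is P-fold larger)
P = 2
shards = np.array_split(np.arange(len(X)), P)
grads = [local_gradient(W, X[idx], Y[idx]) for idx in shards]

W -= 0.1 * np.mean(grads, axis=0)               # "all-reduce": average, then update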
Multiple GPUs
Near-linear scaling with data parallelism.
Ren Wu et al., Baidu, "Deep Image: Scaling up Image Recognition." arXiv 2015
Scaling Neural Networks
Model Parallelism
[Diagram: the weights W are split across machines; Machine 1 and Machine 2 jointly process Image 1]
Notes:
Allows for larger models than fit on one GPU.
Requires much more frequent communication between GPUs.
Most commonly used within a node (GPU peer-to-peer).
Effective for the fully connected layers.
A sketch of splitting one fully connected layer follows.
Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Ng and Bryan Catanzaro
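A minimal NumPy simulation of splitting one fully connected layer across two devices; the arrays stand in for per-GPU memory and the sizes are illustrative:

# Simulated model parallelism: the weight matrix of one fully connected layer
# is split across two "GPUs"; each computes its slice of the output, and the
# slices are gathered afterwards (the step needing fast GPU P2P communication).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2048)                       # input activations, broadcast to both GPUs
W = rng.normal(scale=0.01, size=(4096, 2048))   # assumed too large for a single device

W_gpu0, W_gpu1 = np.split(W, 2, axis=0)         # each "GPU" holds half of the rows

y_gpu0 = W_gpu0 @ x                             # partial output on GPU 0
y_gpu1 = W_gpu1 @ x                             # partial output on GPU 1

y = np.concatenate([y_gpu0, y_gpu1])            # gather step: requires communication
assert np.allclose(y, W @ x)                    # matches the unsplit layer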
Scaling Neural Networks
Hyper Parameter Parallelism
Try many alternative neural networks in parallel on different CPUs / GPUs / machines.
Probably the most obvious and effective way to parallelize! A sketch using a process pool follows.
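A minimal sketch using Python's multiprocessing pool; train_and_score is a made-up placeholder for a full training job, and the candidate learning rates are illustrative:

# Sketch of hyperparameter parallelism: independent training runs with
# different settings launched side by side in a process pool.
import numpy as np
from multiprocessing import Pool

def train_and_score(learning_rate):
    # Placeholder "training" job: fit y = 2x by gradient descent, return final loss
    x = np.random.default_rng(0).normal(size=256)
    y = 2.0 * x
    w = 0.0
    for _ in range(100):
        w -= learning_rate * np.mean((w * x - y) * x)   # gradient step
    return learning_rate, float(np.mean((w * x - y) ** 2))

if __name__ == "__main__":
    candidates = [0.001, 0.01, 0.1, 0.5]          # alternative settings to try
    with Pool(processes=len(candidates)) as pool:
        results = pool.map(train_and_score, candidates)
    print(min(results, key=lambda r: r[1]))       # keep the best configuration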
Deep Learning Everywhere
NVIDIA DRIVE PX
NVIDIA Tesla
NVIDIA Jetson
NVIDIA Titan X
Contact: jbarker@nvidia.com