Chapter 6: Neural Networks


Neural Networks: Representation
Non-linear hypotheses
Machine Learning

Non-linear Classification

[Figure: positive and negative examples in the (x1, x2) plane, requiring a non-linear decision boundary]

Housing example features: size, # bedrooms, # floors, age
What is this?
You see this: [image of a car]
But the camera sees this: [grid of pixel intensity values]
Computer Vision: Car detection

[Training images labeled "Cars" and "Not a car"]

Testing: What is this?
[Figure: each raw image plotted by its pixel 1 intensity vs. pixel 2 intensity; the "Cars" and "Non"-Cars points occupy regions that need a non-linear decision boundary]

50 x 50 pixel images → 2500 pixels (7500 if RGB)

x = [ pixel 1 intensity, pixel 2 intensity, ..., pixel 2500 intensity ]

Quadratic features (x_i × x_j): ≈3 million features
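As a quick check on that count: the quadratic terms are the products x_i · x_j with i ≤ j, so for n = 2500 pixels there are n(n + 1)/2 = 2500 · 2501 / 2 = 3,126,250 ≈ 3 × 10⁶ features.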
Neural Networks: Representation
Neurons and the brain
Machine Learning
Neural Networks
Origins: algorithms that try to mimic the brain.
Widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: state-of-the-art technique for many applications.
The “one learning algorithm” hypothesis

Auditory cortex learns to see: when visual input is rerouted to the auditory cortex, it learns to process it [Roe et al., 1992].


The “one learning algorithm” hypothesis

Somatosensory cortex learns to see: when visual input is rerouted to the somatosensory cortex, it learns to process it [Metin & Frost, 1989].


Neural Networks: Representation
Model representation I
Machine Learning
Neurons in the brain

[Figure: diagram of a neuron and micrograph of neurons. Credit: US National Institutes of Health, National Institute on Aging]


Neuron model: Logistic unit

A single unit computes h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) is the sigmoid (logistic) activation function, x includes the bias input x_0 = 1, and θ are the unit's weights ("parameters").
Neural Network

[Figure: 3-layer network: Layer 1 (input), Layer 2 (hidden), Layer 3 (output)]
Neural Network

a_i^(j) = “activation” of unit i in layer j
Θ^(j) = matrix of weights controlling the function mapping from layer j to layer j + 1

If the network has s_j units in layer j and s_{j+1} units in layer j + 1, then Θ^(j) will be of dimension s_{j+1} × (s_j + 1). E.g. with s_1 = 2 input units and s_2 = 4 hidden units, Θ^(1) is 4 × 3.
Neural Networks: Representation
Model representation II
Machine Learning
Forward propagation: Vectorized implementation

With a^(1) = x (plus bias a_0^(1) = 1):
z^(2) = Θ^(1) a^(1),  a^(2) = g(z^(2)).  Add a_0^(2) = 1.
z^(3) = Θ^(2) a^(2),  h_Θ(x) = a^(3) = g(z^(3)).
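A minimal NumPy sketch of this vectorized forward pass (the weight shapes and names here are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Vectorized forward propagation for a 3-layer network."""
    a1 = np.concatenate(([1.0], x))             # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # add bias unit a0^(2) = 1
    z3 = Theta2 @ a2
    return sigmoid(z3)                          # h_Theta(x) = a^(3)

# Example with hypothetical dimensions: 2 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((4, 3))   # s2 x (s1 + 1)
Theta2 = rng.standard_normal((1, 5))   # s3 x (s2 + 1)
print(forward_propagate(np.array([0.5, -1.2]), Theta1, Theta2))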
Neural Network learning its own features

[Figure: Layer 1 (input), Layer 2 (hidden units acting as learned features), Layer 3 (output)]
Other network architectures

[Figure: 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4]
Neural Networks: Representation
Examples and intuitions I
Machine Learning
Non-linear classification example: XOR/XNOR

x1, x2 are binary (0 or 1); the target is y = x1 XOR x2 (or its complement, x1 XNOR x2).

[Figure: the four points in the (x1, x2) plane with alternating labels: not linearly separable]
Simple example: AND

h_Θ(x) = g(−30 + 20·x1 + 20·x2)

x1  x2  h_Θ(x)
0   0   g(−30) ≈ 0
0   1   g(−10) ≈ 0
1   0   g(−10) ≈ 0
1   1   g(10) ≈ 1
Example: OR function

h_Θ(x) = g(−10 + 20·x1 + 20·x2)

x1  x2  h_Θ(x)
0   0   g(−10) ≈ 0
0   1   g(10) ≈ 1
1   0   g(10) ≈ 1
1   1   g(30) ≈ 1
Neural Networks: Representation
Examples and intuitions II
Machine Learning
Negation (NOT):

h_Θ(x) = g(10 − 20·x1)

x1  h_Θ(x)
0   g(10) ≈ 1
1   g(−10) ≈ 0
Putting it together: x1 XNOR x2

Hidden unit a1^(2) computes x1 AND x2 (weights −30, 20, 20).
Hidden unit a2^(2) computes (NOT x1) AND (NOT x2) (weights 10, −20, −20).
The output unit computes a1^(2) OR a2^(2) (weights −10, 20, 20).

x1  x2  a1^(2)  a2^(2)  h_Θ(x)
0   0   0       1       1
0   1   0       0       0
1   0   0       0       0
1   1   1       0       1
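A small NumPy check that this network really computes XNOR with the weights above (a sketch, not part of the original slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1 -> 2: rows are the AND unit and the (NOT x1) AND (NOT x2) unit
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2 -> 3: the OR unit
Theta2 = np.array([[-10.0, 20.0, 20.0]])

for x1 in (0, 1):
    for x2 in (0, 1):
        a1 = np.array([1.0, x1, x2])                     # input with bias
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        h = sigmoid(Theta2 @ a2)[0]
        print(x1, x2, round(h))                          # prints the XNOR truth table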
Neural Network intuition

[Figure: 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4, each layer computing more complex features of the previous one]
Handwritten digit classification

[Demo of handwritten digit recognition. Courtesy of Yann LeCun]
Neural Networks: Representation
Multi-class classification
Machine Learning
Multiple output units: One-vs-all

Pedestrian  Car  Motorcycle  Truck

Want h_Θ(x) ≈ [1;0;0;0] when pedestrian, [0;1;0;0] when car, [0;0;1;0] when motorcycle, etc.

Training set: (x^(1), y^(1)), ..., (x^(m), y^(m)), where y^(i) is one of [1;0;0;0], [0;1;0;0], [0;0;1;0], [0;0;0;1] (pedestrian, car, motorcycle, truck).
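For illustration, one way to build such one-hot target vectors from integer class labels (a hypothetical snippet, not from the slides):

import numpy as np

labels = np.array([0, 2, 1, 3, 0])   # e.g. 0=pedestrian, 1=car, 2=motorcycle, 3=truck
K = 4
Y = np.eye(K)[labels]                # each row is the one-hot vector y^(i)
print(Y)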
Neural Networks: Learning
Cost function
Machine Learning
Neural Network (Classification)

Training set: {(x^(1), y^(1)), ..., (x^(m), y^(m))}
L = total no. of layers in the network
s_l = no. of units (not counting the bias unit) in layer l

[Figure: 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4]

Binary classification: y ∈ {0, 1}; 1 output unit.
Multi-class classification (K classes): y ∈ ℝ^K, e.g. [1;0;0;0], [0;1;0;0], [0;0;1;0], [0;0;0;1] for pedestrian, car, motorcycle, truck; K output units.
Cost function

Logistic regression:

J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1..n} θ_j²

Neural network (K output units):

J(Θ) = −(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log(h_Θ(x^(i)))_k + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1..L−1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} (Θ_ji^(l))²
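A direct NumPy transcription of the network cost, as a sketch (the helper and its argument layout are illustrative; it expects one-hot targets and the matrices' first column to be the bias weights, which are not regularized):

import numpy as np

def nn_cost(Y, H, Thetas, lam):
    """Regularized cross-entropy cost J(Theta).
    Y: m x K one-hot targets; H: m x K outputs h_Theta(x^(i));
    Thetas: list of weight matrices, bias weights in column 0."""
    m = Y.shape[0]
    cross_entropy = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return cross_entropy + reg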
Neural Networks: Learning
Backpropagation algorithm
Machine Learning
Gradient computation

To minimize J(Θ), need code to compute:
- J(Θ)
- ∂J(Θ)/∂Θ_ij^(l)
Gradient computation

Given one training example (x, y), forward propagation:
a^(1) = x
z^(2) = Θ^(1) a^(1),  a^(2) = g(z^(2))  (add a_0^(2))
z^(3) = Θ^(2) a^(2),  a^(3) = g(z^(3))  (add a_0^(3))
z^(4) = Θ^(3) a^(3),  a^(4) = h_Θ(x) = g(z^(4))

[Figure: Layer 1, Layer 2, Layer 3, Layer 4]
Gradient computation: Error

• Add a little change Δz_j^l to the weighted input of unit j in layer l.
• Instead of outputting g(z_j^l), the neuron outputs g(z_j^l + Δz_j^l).
• This change causes the overall cost to change by (∂C/∂z_j^l) · Δz_j^l.
• Define the error: δ_j^l = ∂C/∂z_j^l = (∂C/∂a_j^l) · g′(z_j^l).
Gradient computation: Backpropagation algorithm

Intuition: δ_j^(l) = “error” of node j in layer l.

For each output unit (layer L = 4): δ^(4) = a^(4) − y
Then propagate the errors backwards:
δ^(3) = (Θ^(3))ᵀ δ^(4) .* g′(z^(3))
δ^(2) = (Θ^(2))ᵀ δ^(3) .* g′(z^(2))
(There is no δ^(1): the inputs are observed, not computed.)

[Figure: Layer 1, Layer 2, Layer 3, Layer 4]
Backpropagation algorithm

Training set {(x^(1), y^(1)), ..., (x^(m), y^(m))}
Set Δ_ij^(l) = 0 (for all l, i, j).
For i = 1 to m
    Set a^(1) = x^(i)
    Perform forward propagation to compute a^(l) for l = 2, 3, ..., L
    Using y^(i), compute δ^(L) = a^(L) − y^(i)
    Compute δ^(L−1), δ^(L−2), ..., δ^(2)
    Δ_ij^(l) := Δ_ij^(l) + a_j^(l) δ_i^(l+1)

D_ij^(l) := (1/m) Δ_ij^(l) + (λ/m) Θ_ij^(l)   if j ≠ 0
D_ij^(l) := (1/m) Δ_ij^(l)                    if j = 0
Then ∂J(Θ)/∂Θ_ij^(l) = D_ij^(l).
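A compact NumPy sketch of this loop for a 3-layer sigmoid network (variable names and shapes are illustrative; for the sigmoid, g′(z) = a(1 − a)):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Theta1, Theta2, lam):
    """Gradients of the regularized cost for a 3-layer sigmoid network.
    X: m x s1 inputs; Y: m x K one-hot targets; bias weights in column 0."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation
        a1 = np.concatenate(([1.0], X[i]))
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        a3 = sigmoid(Theta2 @ a2)
        # Backward pass: output error, then hidden-layer error
        d3 = a3 - Y[i]                                     # delta^(3) = a^(3) - y
        d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1 - a2[1:])   # drop bias; g'(z) = a(1-a)
        # Accumulate Delta_ij := Delta_ij + a_j * delta_i
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # D terms: regularize every column except the bias column (j = 0)
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2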
Neural Networks: Learning
Backpropagation intuition
Machine Learning
Forward Propagation

[Figure: forward pass through the network, each unit's z computed as a weighted sum of the previous layer's activations]
What is backpropagation doing?

Focusing on a single example x^(i), y^(i), the case of 1 output unit, and ignoring regularization (λ = 0):

cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))

(Think of cost(i) ≈ (h_Θ(x^(i)) − y^(i))².)
I.e. how well is the network doing on example i?
Backpropagation intuition (continued)

δ_j^(l) = “error” of cost for a_j^(l) (unit j in layer l).

Formally, δ_j^(l) = (∂/∂z_j^(l)) cost(i)  (for j ≥ 0), where
cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))
Neural Networks: Learning
Random initialization
Machine Learning
Initial value of Θ

For gradient descent and advanced optimization methods, we need an initial value for Θ:

optTheta = fminunc(@costFunction, initialTheta, options)

Consider gradient descent. Can we set initialTheta = zeros(n,1)?
Zero initialization

[Figure: network whose hidden units start with identical weights]

After each update, the parameters corresponding to the inputs going into each of the two hidden units are identical, so both hidden units compute the same function of the input and the network cannot learn distinct features.
Random initialization: Symmetry breaking

Initialize each Θ_ij^(l) to a random value in [−ε, ε]
(i.e. −ε ≤ Θ_ij^(l) ≤ ε)

E.g.
Theta1 = rand(10,11)*(2*INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(1,11)*(2*INIT_EPSILON) - INIT_EPSILON;
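A NumPy equivalent of the Octave snippet above (INIT_EPSILON is whatever small constant you choose; 0.12 here is just an example value):

import numpy as np

INIT_EPSILON = 0.12  # example value, not mandated by the slides

# Uniformly random weights in [-INIT_EPSILON, INIT_EPSILON]
Theta1 = np.random.rand(10, 11) * 2 * INIT_EPSILON - INIT_EPSILON
Theta2 = np.random.rand(1, 11) * 2 * INIT_EPSILON - INIT_EPSILON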
Neural Networks: Learning
Putting it together
Machine Learning
Training a neural network

Pick a network architecture (connectivity pattern between neurons).
- No. of input units: dimension of the features x^(i)
- No. of output units: number of classes
- Reasonable default: 1 hidden layer; or, if >1 hidden layer, the same no. of hidden units in every layer (usually the more the better)
Training a neural network

1. Randomly initialize the weights
2. Implement forward propagation to get h_Θ(x^(i)) for any x^(i)
3. Implement code to compute the cost function J(Θ)
4. Implement backprop to compute the partial derivatives ∂J(Θ)/∂Θ_jk^(l):
   for i = 1:m
       Perform forward propagation and backpropagation using example (x^(i), y^(i))
       (Get activations a^(l) and delta terms δ^(l) for l = 2, ..., L).
Training a neural network

5. Use gradient checking to compare ∂J(Θ)/∂Θ_jk^(l) computed using backpropagation vs. a numerical estimate of the gradient of J(Θ). Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize J(Θ) as a function of the parameters Θ.
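A minimal sketch of the numerical estimate used in step 5, via a two-sided difference (costFunc and theta are placeholders for your own cost function and unrolled parameter vector):

import numpy as np

def numerical_gradient(costFunc, theta, eps=1e-4):
    """Two-sided finite-difference estimate of dJ/dtheta, one component at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        bump = np.zeros_like(theta)
        bump[j] = eps
        grad[j] = (costFunc(theta + bump) - costFunc(theta - bump)) / (2 * eps)
    return grad

# Usage: compare against the backprop gradient, then disable the check.
# assert np.allclose(numerical_gradient(costFunc, theta), backprop_gradient, atol=1e-7)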
Neural Network Libraries

Scikit-learn
• https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

Keras
• https://keras.io/guides/sequential_model/
• https://keras.io/api/models/model_training_apis/

PyTorch
• https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

Colab
• https://colab.research.google.com/
Exercise: scikit-learn

The file ex7data.mat contains data stored as a dict with:
X: 5000x400, i.e. 5000 binary 20x20 images of handwritten digits
y: 5000x1, the labels of the corresponding images

Do the following (a sketch of one possible solution is given after this list):
- Read the data, reshape a few images to 20x20, and display them (plt.imshow)
- Split the data into 70% train / 30% test (train_test_split), ensuring the split is random and stratified by label
- Train a neural network with the following settings:
  + 2 hidden layers: 100, 50
  + Activation function: sigmoid
  + Initial learning rate: 0.1
  + Shuffle the data at each iteration
  + Optimization method: stochastic gradient descent
  + Regularization coefficient: 0.1
- Plot the loss curve during training
- Report the accuracy on the test set
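A sketch of one possible solution (assumes ex7data.mat is in the working directory; MLPClassifier's alpha parameter serves as the L2 regularization coefficient here):

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Read the data: X is 5000x400 (20x20 images), y is 5000x1 labels
data = loadmat('ex7data.mat')
X, y = data['X'], data['y'].ravel()

# Show a few images reshaped to 20x20
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, i in zip(axes, np.random.choice(len(X), 5, replace=False)):
    ax.imshow(X[i].reshape(20, 20).T, cmap='gray')  # .T if images appear rotated
    ax.set_title(str(int(y[i])))
    ax.axis('off')
plt.show()

# 70/30 split, shuffled and stratified by label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=0)

# Two hidden layers (100, 50), sigmoid activation, SGD, lr 0.1, shuffling, alpha 0.1
clf = MLPClassifier(hidden_layer_sizes=(100, 50),
                    activation='logistic',
                    solver='sgd',
                    learning_rate_init=0.1,
                    shuffle=True,
                    alpha=0.1,
                    max_iter=200,
                    random_state=0)
clf.fit(X_train, y_train)

# Loss curve during training
plt.plot(clf.loss_curve_)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.show()

# Accuracy on the test set
print('Test accuracy:', clf.score(X_test, y_test))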
