Chapter 6: Neural Networks


Neural Networks: Representation
Non-linear hypotheses
Machine Learning

Non-linear Classification

[Figure: positive and negative examples in the (x1, x2) plane, requiring a non-linear decision boundary]

Housing example features: size, # bedrooms, # floors, age
What is this?
You see this: [image of a car]
But the camera sees this: [grid of pixel intensity values]
Computer Vision: Car detection

[Training images labeled "Cars" and "Not a car"]

Testing: What is this?
[Figure: each raw image plotted by its pixel 1 intensity vs. pixel 2 intensity; the "Cars" and "Non"-Cars points occupy regions that need a non-linear decision boundary]

50 x 50 pixel images → 2500 pixels (7500 if RGB)

x = [ pixel 1 intensity, pixel 2 intensity, ..., pixel 2500 intensity ]

Quadratic features (x_i × x_j): ≈3 million features
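As a quick check on that count: the quadratic terms are the products x_i · x_j with i ≤ j, so for n = 2500 pixels there are n(n + 1)/2 = 2500 · 2501 / 2 = 3,126,250 ≈ 3 × 10⁶ features.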
Neural Networks: Representation
Neurons and the brain
Machine Learning
Neural Networks
Origins: algorithms that try to mimic the brain.
Widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: state-of-the-art technique for many applications.
The “one learning algorithm” hypothesis

Auditory cortex learns to see: when visual input is rerouted to the auditory cortex, it learns to process it [Roe et al., 1992].


The “one learning algorithm” hypothesis

Somatosensory cortex learns to see: when visual input is rerouted to the somatosensory cortex, it learns to process it [Metin & Frost, 1989].


Neural Networks: Representation
Model representation I
Machine Learning
Neurons in the brain

[Figure: diagram of a neuron and micrograph of neurons. Credit: US National Institutes of Health, National Institute on Aging]


Neuron model: Logistic unit

A single unit computes h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) is the sigmoid (logistic) activation function, x includes the bias input x_0 = 1, and θ are the unit's weights ("parameters").
Neural Network

[Figure: 3-layer network: Layer 1 (input), Layer 2 (hidden), Layer 3 (output)]
Neural Network

a_i^(j) = “activation” of unit i in layer j
Θ^(j) = matrix of weights controlling the function mapping from layer j to layer j + 1

If the network has s_j units in layer j and s_{j+1} units in layer j + 1, then Θ^(j) will be of dimension s_{j+1} × (s_j + 1). E.g. with s_1 = 2 input units and s_2 = 4 hidden units, Θ^(1) is 4 × 3.
Neural Networks: Representation
Model representation II
Machine Learning
Forward propagation: Vectorized implementation

With a^(1) = x (plus bias a_0^(1) = 1):
z^(2) = Θ^(1) a^(1),  a^(2) = g(z^(2)).  Add a_0^(2) = 1.
z^(3) = Θ^(2) a^(2),  h_Θ(x) = a^(3) = g(z^(3)).
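A minimal NumPy sketch of this vectorized forward pass (the weight shapes and names here are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Vectorized forward propagation for a 3-layer network."""
    a1 = np.concatenate(([1.0], x))             # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # add bias unit a0^(2) = 1
    z3 = Theta2 @ a2
    return sigmoid(z3)                          # h_Theta(x) = a^(3)

# Example with hypothetical dimensions: 2 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((4, 3))   # s2 x (s1 + 1)
Theta2 = rng.standard_normal((1, 5))   # s3 x (s2 + 1)
print(forward_propagate(np.array([0.5, -1.2]), Theta1, Theta2))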
Neural Network learning its own features

[Figure: Layer 1 (input), Layer 2 (hidden units acting as learned features), Layer 3 (output)]
Other network architectures

[Figure: 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4]
Neural Networks: Representation
Examples and intuitions I
Machine Learning
Non-linear classification example: XOR/XNOR

x1, x2 are binary (0 or 1); the target is y = x1 XOR x2 (or its complement, x1 XNOR x2).

[Figure: the four points in the (x1, x2) plane with alternating labels: not linearly separable]
Simple example: AND

h_Θ(x) = g(−30 + 20·x1 + 20·x2)

x1  x2  h_Θ(x)
0   0   g(−30) ≈ 0
0   1   g(−10) ≈ 0
1   0   g(−10) ≈ 0
1   1   g(10) ≈ 1
Example: OR function

h_Θ(x) = g(−10 + 20·x1 + 20·x2)

x1  x2  h_Θ(x)
0   0   g(−10) ≈ 0
0   1   g(10) ≈ 1
1   0   g(10) ≈ 1
1   1   g(30) ≈ 1
Neural Networks: Representation
Examples and intuitions II
Machine Learning
Negation (NOT):

h_Θ(x) = g(10 − 20·x1)

x1  h_Θ(x)
0   g(10) ≈ 1
1   g(−10) ≈ 0
Putting it together: x1 XNOR x2

Hidden unit a1^(2) computes x1 AND x2 (weights −30, 20, 20).
Hidden unit a2^(2) computes (NOT x1) AND (NOT x2) (weights 10, −20, −20).
The output unit computes a1^(2) OR a2^(2) (weights −10, 20, 20).

x1  x2  a1^(2)  a2^(2)  h_Θ(x)
0   0   0       1       1
0   1   0       0       0
1   0   0       0       0
1   1   1       0       1
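A small NumPy check that this network really computes XNOR with the weights above (a sketch, not part of the original slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1 -> 2: rows are the AND unit and the (NOT x1) AND (NOT x2) unit
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2 -> 3: the OR unit
Theta2 = np.array([[-10.0, 20.0, 20.0]])

for x1 in (0, 1):
    for x2 in (0, 1):
        a1 = np.array([1.0, x1, x2])                     # input with bias
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        h = sigmoid(Theta2 @ a2)[0]
        print(x1, x2, round(h))                          # prints the XNOR truth table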
Neural Network intuition

[Figure: 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4, each layer computing more complex features of the previous one]
Handwritten digit classification

[Demo of handwritten digit recognition. Courtesy of Yann LeCun]
Neural Networks: Representation
Multi-class classification
Machine Learning
Multiple output units: One-vs-all

Pedestrian  Car  Motorcycle  Truck

Want h_Θ(x) ≈ [1;0;0;0] when pedestrian, [0;1;0;0] when car, [0;0;1;0] when motorcycle, etc.

Training set: (x^(1), y^(1)), ..., (x^(m), y^(m)), where y^(i) is one of [1;0;0;0], [0;1;0;0], [0;0;1;0], [0;0;0;1] (pedestrian, car, motorcycle, truck).
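For illustration, one way to build such one-hot target vectors from integer class labels (a hypothetical snippet, not from the slides):

import numpy as np

labels = np.array([0, 2, 1, 3, 0])   # e.g. 0=pedestrian, 1=car, 2=motorcycle, 3=truck
K = 4
Y = np.eye(K)[labels]                # each row is the one-hot vector y^(i)
print(Y)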
Neural Networks: Learning
Cost function
Machine Learning
Neural Network (Classification)

Training set: {(x^(1), y^(1)), ..., (x^(m), y^(m))}
L = total no. of layers in the network
s_l = no. of units (not counting the bias unit) in layer l

[Figure: 4-layer network: Layer 1, Layer 2, Layer 3, Layer 4]

Binary classification: y ∈ {0, 1}; 1 output unit.
Multi-class classification (K classes): y ∈ ℝ^K, e.g. [1;0;0;0], [0;1;0;0], [0;0;1;0], [0;0;0;1] for pedestrian, car, motorcycle, truck; K output units.
Cost function

Logistic regression:

J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1..n} θ_j²

Neural network (K output units):

J(Θ) = −(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log(h_Θ(x^(i)))_k + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1..L−1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} (Θ_ji^(l))²
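A direct NumPy transcription of the network cost, as a sketch (the helper and its argument layout are illustrative; it expects one-hot targets and the matrices' first column to be the bias weights, which are not regularized):

import numpy as np

def nn_cost(Y, H, Thetas, lam):
    """Regularized cross-entropy cost J(Theta).
    Y: m x K one-hot targets; H: m x K outputs h_Theta(x^(i));
    Thetas: list of weight matrices, bias weights in column 0."""
    m = Y.shape[0]
    cross_entropy = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return cross_entropy + reg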
Neural Networks: Learning
Backpropagation algorithm
Machine Learning
Gradient computation

To minimize J(Θ), need code to compute:
- J(Θ)
- ∂J(Θ)/∂Θ_ij^(l)
Gradient computation

Given one training example (x, y), forward propagation:
a^(1) = x
z^(2) = Θ^(1) a^(1),  a^(2) = g(z^(2))  (add a_0^(2))
z^(3) = Θ^(2) a^(2),  a^(3) = g(z^(3))  (add a_0^(3))
z^(4) = Θ^(3) a^(3),  a^(4) = h_Θ(x) = g(z^(4))

[Figure: Layer 1, Layer 2, Layer 3, Layer 4]
Gradient computation: Error

• Add a little change Δz_j^l to the weighted input of unit j in layer l.
• Instead of outputting g(z_j^l), the neuron outputs g(z_j^l + Δz_j^l).
• This change causes the overall cost to change by (∂C/∂z_j^l) · Δz_j^l.
• Define the error: δ_j^l = ∂C/∂z_j^l = (∂C/∂a_j^l) · g′(z_j^l).
Gradient computation: Backpropagation algorithm

Intuition: δ_j^(l) = “error” of node j in layer l.

For each output unit (layer L = 4): δ^(4) = a^(4) − y
Then propagate the errors backwards:
δ^(3) = (Θ^(3))ᵀ δ^(4) .* g′(z^(3))
δ^(2) = (Θ^(2))ᵀ δ^(3) .* g′(z^(2))
(There is no δ^(1): the inputs are observed, not computed.)

[Figure: Layer 1, Layer 2, Layer 3, Layer 4]
Backpropagation algorithm

Training set {(x^(1), y^(1)), ..., (x^(m), y^(m))}
Set Δ_ij^(l) = 0 (for all l, i, j).
For i = 1 to m
    Set a^(1) = x^(i)
    Perform forward propagation to compute a^(l) for l = 2, 3, ..., L
    Using y^(i), compute δ^(L) = a^(L) − y^(i)
    Compute δ^(L−1), δ^(L−2), ..., δ^(2)
    Δ_ij^(l) := Δ_ij^(l) + a_j^(l) δ_i^(l+1)

D_ij^(l) := (1/m) Δ_ij^(l) + (λ/m) Θ_ij^(l)   if j ≠ 0
D_ij^(l) := (1/m) Δ_ij^(l)                    if j = 0
Then ∂J(Θ)/∂Θ_ij^(l) = D_ij^(l).
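A compact NumPy sketch of this loop for a 3-layer sigmoid network (variable names and shapes are illustrative; for the sigmoid, g′(z) = a(1 − a)):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Theta1, Theta2, lam):
    """Gradients of the regularized cost for a 3-layer sigmoid network.
    X: m x s1 inputs; Y: m x K one-hot targets; bias weights in column 0."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation
        a1 = np.concatenate(([1.0], X[i]))
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        a3 = sigmoid(Theta2 @ a2)
        # Backward pass: output error, then hidden-layer error
        d3 = a3 - Y[i]                                     # delta^(3) = a^(3) - y
        d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1 - a2[1:])   # drop bias; g'(z) = a(1-a)
        # Accumulate Delta_ij := Delta_ij + a_j * delta_i
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # D terms: regularize every column except the bias column (j = 0)
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2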
Neural Networks: Learning
Backpropagation intuition
Machine Learning
Forward Propagation

[Figure: forward pass through the network, each unit's z computed as a weighted sum of the previous layer's activations]
What is backpropagation doing?

Focusing on a single example x^(i), y^(i), the case of 1 output unit, and ignoring regularization (λ = 0):

cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))

(Think of cost(i) ≈ (h_Θ(x^(i)) − y^(i))².)
I.e. how well is the network doing on example i?
Backpropagation intuition (continued)

δ_j^(l) = “error” of cost for a_j^(l) (unit j in layer l).

Formally, δ_j^(l) = (∂/∂z_j^(l)) cost(i)  (for j ≥ 0), where
cost(i) = y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i)))
Neural Networks: Learning
Random initialization
Machine Learning
Initial value of Θ

For gradient descent and advanced optimization methods, we need an initial value for Θ:

optTheta = fminunc(@costFunction, initialTheta, options)

Consider gradient descent. Can we set initialTheta = zeros(n,1)?
Zero initialization

[Figure: network whose hidden units start with identical weights]

After each update, the parameters corresponding to the inputs going into each of the two hidden units are identical, so both hidden units compute the same function of the input and the network cannot learn distinct features.
Random initialization: Symmetry breaking

Initialize each Θ_ij^(l) to a random value in [−ε, ε]
(i.e. −ε ≤ Θ_ij^(l) ≤ ε)

E.g.
Theta1 = rand(10,11)*(2*INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(1,11)*(2*INIT_EPSILON) - INIT_EPSILON;
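A NumPy equivalent of the Octave snippet above (INIT_EPSILON is whatever small constant you choose; 0.12 here is just an example value):

import numpy as np

INIT_EPSILON = 0.12  # example value, not mandated by the slides

# Uniformly random weights in [-INIT_EPSILON, INIT_EPSILON]
Theta1 = np.random.rand(10, 11) * 2 * INIT_EPSILON - INIT_EPSILON
Theta2 = np.random.rand(1, 11) * 2 * INIT_EPSILON - INIT_EPSILON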
Neural Networks: Learning
Putting it together
Machine Learning
Training a neural network

Pick a network architecture (connectivity pattern between neurons).
- No. of input units: dimension of the features x^(i)
- No. of output units: number of classes
- Reasonable default: 1 hidden layer; or, if >1 hidden layer, the same no. of hidden units in every layer (usually the more the better)
Training a neural network

1. Randomly initialize the weights
2. Implement forward propagation to get h_Θ(x^(i)) for any x^(i)
3. Implement code to compute the cost function J(Θ)
4. Implement backprop to compute the partial derivatives ∂J(Θ)/∂Θ_jk^(l):
   for i = 1:m
       Perform forward propagation and backpropagation using example (x^(i), y^(i))
       (Get activations a^(l) and delta terms δ^(l) for l = 2, ..., L).
Training a neural network

5. Use gradient checking to compare ∂J(Θ)/∂Θ_jk^(l) computed using backpropagation vs. a numerical estimate of the gradient of J(Θ). Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize J(Θ) as a function of the parameters Θ.
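A minimal sketch of the numerical estimate used in step 5, via a two-sided difference (costFunc and theta are placeholders for your own cost function and unrolled parameter vector):

import numpy as np

def numerical_gradient(costFunc, theta, eps=1e-4):
    """Two-sided finite-difference estimate of dJ/dtheta, one component at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        bump = np.zeros_like(theta)
        bump[j] = eps
        grad[j] = (costFunc(theta + bump) - costFunc(theta - bump)) / (2 * eps)
    return grad

# Usage: compare against the backprop gradient, then disable the check.
# assert np.allclose(numerical_gradient(costFunc, theta), backprop_gradient, atol=1e-7)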
Neural Network Libraries

Scikit-learn
• https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

Keras
• https://keras.io/guides/sequential_model/
• https://keras.io/api/models/model_training_apis/

PyTorch
• https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

Colab
• https://colab.research.google.com/
Exercise: scikit-learn

The file ex7data.mat contains data stored as a dict with:
X: 5000x400, i.e. 5000 binary 20x20 images of handwritten digits
y: 5000x1, the labels of the corresponding images

Do the following (a sketch of one possible solution is given after this list):
- Read the data, reshape a few images to 20x20, and display them (plt.imshow)
- Split the data into 70% train / 30% test (train_test_split), ensuring the split is random and stratified by label
- Train a neural network with the following settings:
  + 2 hidden layers: 100, 50
  + Activation function: sigmoid
  + Initial learning rate: 0.1
  + Shuffle the data at each iteration
  + Optimization method: stochastic gradient descent
  + Regularization coefficient: 0.1
- Plot the loss curve during training
- Report the accuracy on the test set
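A sketch of one possible solution (assumes ex7data.mat is in the working directory; MLPClassifier's alpha parameter serves as the L2 regularization coefficient here):

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Read the data: X is 5000x400 (20x20 images), y is 5000x1 labels
data = loadmat('ex7data.mat')
X, y = data['X'], data['y'].ravel()

# Show a few images reshaped to 20x20
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, i in zip(axes, np.random.choice(len(X), 5, replace=False)):
    ax.imshow(X[i].reshape(20, 20).T, cmap='gray')  # .T if images appear rotated
    ax.set_title(str(int(y[i])))
    ax.axis('off')
plt.show()

# 70/30 split, shuffled and stratified by label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, stratify=y, random_state=0)

# Two hidden layers (100, 50), sigmoid activation, SGD, lr 0.1, shuffling, alpha 0.1
clf = MLPClassifier(hidden_layer_sizes=(100, 50),
                    activation='logistic',
                    solver='sgd',
                    learning_rate_init=0.1,
                    shuffle=True,
                    alpha=0.1,
                    max_iter=200,
                    random_state=0)
clf.fit(X_train, y_train)

# Loss curve during training
plt.plot(clf.loss_curve_)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.show()

# Accuracy on the test set
print('Test accuracy:', clf.score(X_test, y_test))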
