

Neural Networks (Representation)

1. Motivations
I would like to give full credit to the respective authors, as these are my personal Python notebooks taken from deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages, from my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. The notes are meant for my personal review, but I have open-sourced the repository because a lot of people found it useful.
1a. Non-linear Hypothesis
 You can add more features (e.g. quadratic terms of the existing features)
o But it will be slow to process
 If you have an image with 50 x 50 pixels (greyscale, not RGB)
o n = 50 x 50 = 2500 pixel-intensity features
o quadratic features ≈ (2500 x 2500) / 2 ≈ 3 million (see the quick check after this list)
 Neural networks are much better for a complex nonlinear hypothesis
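A quick check of those counts (a tiny Python sketch; only the 50 x 50 greyscale setup from the bullets above is assumed):

    # Feature counts for a 50 x 50 greyscale image (one intensity value per pixel)
    n = 50 * 50                 # 2500 raw features
    quadratic = n * n // 2      # roughly n^2 / 2 pairwise quadratic terms
    print(n, quadratic)         # 2500 3125000 -> about 3 million features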
1b. Neurons and the Brain
 Origins
o Algorithms that try to mimic the brain
 Was very widely used in the 80s and early 90’s
o Popularity diminished in the late 90’s
 Recent resurgence
o State-of-the-art techniques for many applications
 The “one learning algorithm” hypothesis
o Auditory cortex handles hearing
 Re-wire to learn to see
o Somatosensory cortex handles feeling
 Re-wire to learn to see
o Plug in data and the brain will learn accordingly
 Examples of learning

2. Neural Networks
2a. Model Representation I
 Neuron in the brain
o Many neurons in our brain
o Dendrite: receive input
o Axon: produce output
 When it sends a message through the Axon to another neuron
 It sends it to the other neuron’s Dendrite
 Neuron model: logistic unit
o Yellow circle: body of neuron
o Input wires: dendrites
o Output wire: axon 
 Neural Network
o 3 Layers
 Layer 1: input layer
 Layer 2: hidden layer
 Unable to observe values
 Anything other than input or output layer
 Layer 3: output layer
 We calculate each of the layer-2 activations based on the input values together with the bias term (which is equal to 1)
 i.e. x0 to x3
 We then calculate the final hypothesis (i.e. the single node in layer 3) using exactly the same logic, except the input is not the x values but the activation values from the preceding layer
 The activation value of each hidden unit (e.g. a1(2)) is equal to the sigmoid function applied to the linear combination of the inputs (see the sketch after this list)
 Three input units
 Ɵ(1) is the matrix of parameters governing the mapping of the input units
to hidden units
 Ɵ(1) here is a [3 x 4] dimensional matrix
 Three hidden units
 Then Ɵ(2) is the matrix of parameters governing the mapping of the
hidden layer to the output layer
 Ɵ(2) here is a [1 x 4] dimensional matrix (i.e. a row vector)
 Every input/activation goes to every node in following layer
 Which means each “layer transition” uses a matrix of parameters Ɵ(l), whose entry Ɵji(l) has the following significance
 j (first of the two subscript numbers) ranges from 1 to the number of units in layer l+1
 i (second of the two subscript numbers) ranges from 0 to the number of units in layer l
 l is the layer you’re moving FROM
 Notation 
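To make these dimensions concrete, here is a minimal NumPy sketch (my own illustration, not code from the notes). The three-input, three-hidden-unit, single-output architecture matches the description above, but the parameter values are arbitrary placeholders:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 0.2, 0.7, -0.4])   # [x0 = 1 (bias), x1, x2, x3]

    # Theta(1) maps the 4 input values (bias included) to the 3 hidden units,
    # so it is 3 x 4; Theta(2) maps the 4 hidden values (bias included) to the
    # single output unit, so it is 1 x 4. The values below are placeholders.
    Theta1 = np.array([[ 0.1,  0.3, -0.5,  0.2],
                       [-0.2,  0.4,  0.1, -0.3],
                       [ 0.5, -0.1,  0.2,  0.4]])
    Theta2 = np.array([[ 0.3, -0.6,  0.2,  0.1]])

    # One hidden unit written out term by term:
    # a1(2) = g(Theta_10*x0 + Theta_11*x1 + Theta_12*x2 + Theta_13*x3)
    a1_2 = sigmoid(Theta1[0, 0]*x[0] + Theta1[0, 1]*x[1]
                   + Theta1[0, 2]*x[2] + Theta1[0, 3]*x[3])

    # All three hidden activations at once, then the final hypothesis:
    a2 = sigmoid(Theta1 @ x)                 # a1(2), a2(2), a3(2)
    a2 = np.concatenate(([1.0], a2))         # add the bias unit a0(2) = 1
    h = sigmoid(Theta2 @ a2)                 # the single layer-3 node
    print(a1_2, a2, h)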

2b. Model Representation II


 Here we’ll look at how to carry out the computation efficiently through a vectorized implementation.
We’ll also consider why neural networks are good and how we can use them to learn complex non-
linear things
 Forward propagation: vectorized implementation
o g applies the sigmoid function element-wise to z
o This process of calculating h(x) is called forward propagation
 Worked out from the first layer
 Starts off with activations of input unit
 Propagate forward and calculate the activation of each layer
sequentially 

 Similar to logistic regression if you leave out the first layer


o Only second and third layer
o Third layer resembles a logistic regression node
o The features in layer 2 are calculated/learned, not the original features
o A neural network learns its own features
 The features a’s are learned from x’s
 It learns its own features to feed into logistic regression
 Better hypothesis than if we were constrained with just x1, x2, x3
 We can have whatever features we want to feed to the final logistic regression
function
 Implementation in Octave for a2 (a fuller forward-propagation sketch follows this list)
 a2 = sigmoid(Theta1 * x);

 Other network architectures


o Layers 2 and 3 are hidden layers
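Below is a hedged NumPy sketch of the vectorized forward propagation just described, written generically so it also covers architectures with more hidden layers; the layer sizes and weight values are made up purely for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_propagate(x, thetas):
        # Propagate forward through the layers sequentially:
        # z(l+1) = Theta(l) * a(l), a(l+1) = g(z(l+1))
        a = x
        for theta in thetas:
            a = np.concatenate(([1.0], a))   # add the bias unit to this layer
            z = theta @ a
            a = sigmoid(z)
        return a                             # activations of the final layer = h(x)

    # Example: 3 inputs -> 5 hidden -> 5 hidden -> 4 outputs (placeholder weights)
    rng = np.random.default_rng(0)
    thetas = [rng.standard_normal((5, 4)),   # layer 1 -> 2
              rng.standard_normal((5, 6)),   # layer 2 -> 3
              rng.standard_normal((4, 6))]   # layer 3 -> 4
    print(forward_propagate(np.array([0.2, 0.7, -0.4]), thetas))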

3. Neural Network Application


3a. Examples and Intuitions I
 XOR/XNOR
o XOR: exclusive or
o XNOR: NOT XOR

 AND function
o Outputs 1 only if x1 and x2 are 1
o Draw a truth table to determine whether it behaves as OR or AND (see the sketch after this list)
 NAND function
o NOT AND 
 OR function
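Here is a small Python sketch of these logical functions as single logistic units. The ±20/±30 weight values are the ones commonly used in the course's examples; treat them as one possible choice rather than the only one:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def unit(theta, x1, x2):
        # Single logistic unit with a bias term: g(theta0 + theta1*x1 + theta2*x2)
        return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

    AND_w  = [-30, 20, 20]    # ~1 only when x1 = x2 = 1
    OR_w   = [-10, 20, 20]    # ~1 when either input is 1
    NAND_w = [30, -20, -20]   # NOT (x1 AND x2)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2,
                  round(unit(AND_w, x1, x2)),    # AND
                  round(unit(OR_w, x1, x2)),     # OR
                  round(unit(NAND_w, x1, x2)))   # NAND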

3b. Examples and Intuitions II


 NOT function
o Outputs 1 only when the input is 0
 XNOR function
o NOT XOR
o NOT an exclusive or
 Hence we would want the output to be 1 when both inputs are 1 (AND) or when neither input is 1, as sketched below
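A minimal sketch of the XNOR network assembled from those gates, again using the same style of placeholder weights: the hidden layer computes (x1 AND x2) and ((NOT x1) AND (NOT x2)), and the output unit ORs them:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hidden layer: first unit computes x1 AND x2,
    # second unit computes (NOT x1) AND (NOT x2)
    Theta1 = np.array([[-30.0,  20.0,  20.0],
                       [ 10.0, -20.0, -20.0]])
    # Output layer: OR of the two hidden units
    Theta2 = np.array([-10.0, 20.0, 20.0])

    def xnor(x1, x2):
        a1 = np.array([1.0, x1, x2])        # input layer with bias unit
        a2 = sigmoid(Theta1 @ a1)           # hidden activations
        a2 = np.concatenate(([1.0], a2))    # add the bias unit a0(2) = 1
        return sigmoid(Theta2 @ a2)         # output is approximately XNOR(x1, x2)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, round(float(xnor(x1, x2))))   # prints 1, 0, 0, 1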

3c. Multi-class Classification


 Example: identify 4 classes
o You would want a 4 x 1 vector for h_theta(x)
o 4 logistic regression classifiers in the output layer
o There will be 4 outputs
o y would be a 4 x 1 vector instead of an integer (see the sketch below)
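A small sketch of that representation (NumPy assumed; the class numbering and example values are just illustrative):

    import numpy as np

    num_classes = 4

    def to_vector(label):
        # Turn an integer class label (0..3) into the 4 x 1 target vector y
        y = np.zeros((num_classes, 1))
        y[label] = 1.0
        return y

    print(to_vector(2).ravel())                    # [0. 0. 1. 0.]

    # h_theta(x) is likewise a 4 x 1 vector (one logistic output per class);
    # the predicted class is the index of its largest entry.
    h = np.array([[0.1], [0.7], [0.05], [0.15]])   # example output values
    print(int(np.argmax(h)))                       # predicted class: 1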
