Neural Networks (Representation) : 1a. Non-Linear Hypothesis
1. Motivations
I would like to give full credit to the respective authors, as these are my personal Python notebooks taken from deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages on my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. They are meant for my personal review, but I have open-sourced the repository of personal notes as a lot of people found it useful.
1a. Non-linear Hypothesis
You can add more features
o But it will be slow to process
If you have an image with 50 x 50 pixels (greyscale, not RGB)
o n = 50 x 50 = 2500
o quadratic features ≈ (2500 x 2500) / 2 ≈ 3 million (see the quick check below)
Neural networks are much better for a complex nonlinear hypothesis
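A quick sanity check of that count, as a minimal Python sketch (the exact number of pairwise terms x_i * x_j is n(n + 1) / 2, which the lecture rounds to roughly n^2 / 2):

```python
# Count the quadratic (pairwise-product) features for a 50 x 50 greyscale image
n = 50 * 50                      # 2,500 raw pixel intensities
quadratic = n * (n + 1) // 2     # all products x_i * x_j with i <= j
print(n, quadratic)              # 2500 3126250 -> roughly 3 million features
```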
1b. Neurons and the Brain
Origins
o Algorithms that try to mimic the brain
Was very widely used in the 80s and early 90s
o Popularity diminished in the late 90s
Recent resurgence
o State-of-the-art techniques for many applications
The “one learning algorithm” hypothesis
o Auditory cortex handles hearing
Re-wire to learn to see
o Somatosensory cortex handles feeling
Re-wire to learn to see
o Plug in data and the brain will learn accordingly
Examples of learning
2. Neural Networks
2a. Model Representation I
Neuron in the brain
o Many neurons in our brain
o Dendrite: receive input
o Axon: produce output
o When a neuron sends a message through its axon to another neuron, the message is received by the other neuron's dendrite
Neuron model: logistic unit
o Yellow circle: body of neuron
o Input wires: dendrites
o Output wire: axon
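A single unit of this kind is just logistic regression: form a weighted sum of the inputs (including the bias x0 = 1) and pass it through the sigmoid. A minimal Python/NumPy sketch, with placeholder weights chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# One logistic unit: h(x) = g(theta^T x), with x0 = 1 as the bias input
theta = np.array([-1.5, 1.0, 1.0])   # placeholder weights [bias, x1, x2]
x = np.array([1.0, 0.0, 1.0])        # input vector with bias unit x0 = 1
h = sigmoid(theta @ x)               # activation (output) of the unit
print(h)                             # ~0.38 for these placeholder weights
```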
Neural Network
o 3 layers
Layer 1: input layer
Layer 2: hidden layer
Unable to observe its values
Anything other than an input or output layer is a hidden layer
Layer 3: output layer
We calculate each of the layer-2 activations based on the input values together with the bias term (which is equal to 1), i.e. x0 to x3
We then calculate the final hypothesis (i.e. the single node in layer 3) using exactly the same logic, except the input is not the x values but the activation values from the preceding layer
The activation value of each hidden unit (e.g. a1(2)) is equal to the sigmoid function applied to the linear combination of its inputs
Three input units (plus the bias x0)
Ɵ(1) is the matrix of parameters governing the mapping of the input units to the hidden units
Ɵ(1) here is a [3 x 4] dimensional matrix (3 hidden units x (3 inputs + 1 bias))
Three hidden units (plus the bias a0(2))
Then Ɵ(2) is the matrix of parameters governing the mapping of the hidden layer to the output layer
Ɵ(2) here is a [1 x 4] dimensional matrix, i.e. a row vector (1 output unit x (3 hidden units + 1 bias))
Every input/activation goes to every node in the following layer
Which means each "layer transition" uses a matrix of parameters Ɵ(j) of dimension s(j+1) x (s(j) + 1), where s(j) is the number of units in layer j and the +1 accounts for the bias unit (see the sketch below)
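Putting this together, here is a minimal NumPy sketch of one forward pass through the example network (3 inputs, 3 hidden units, 1 output). The Ɵ values below are random placeholders; only the shapes, [3 x 4] and [1 x 4], follow the description above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))   # layer 1 (3 units + bias) -> layer 2 (3 units)
Theta2 = rng.standard_normal((1, 4))   # layer 2 (3 units + bias) -> layer 3 (1 unit)

x = np.array([0.5, -1.0, 2.0])         # raw inputs x1..x3

a1 = np.concatenate(([1.0], x))        # add bias unit x0 = 1     -> shape (4,)
a2 = sigmoid(Theta1 @ a1)              # layer-2 activations      -> shape (3,)
a2 = np.concatenate(([1.0], a2))       # add bias unit a0(2) = 1  -> shape (4,)
h = sigmoid(Theta2 @ a2)               # hypothesis h_theta(x)    -> shape (1,)
print(a2[1:], h)
```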
Notation
o ai(j): activation of unit i in layer j
o Ɵ(j): matrix of weights controlling the function mapping from layer j to layer j + 1
Features
o A neural network learns its own features
The features a’s are learned from x’s
It learns its own features to feed into logistic regression
Better hypothesis than if we were constrained to just x1, x2, x3
We can have whatever features we want to feed into the final logistic regression function
Implementation in Octave for a2:
a2 = sigmoid(Theta1 * x);  % x here includes the bias unit x0 = 1
A neural network can also compute logical functions of its inputs
AND function
o Outputs 1 only if x1 and x2 are both 1
o Draw a truth table to determine whether a given unit computes OR or AND
NAND function
o NOT AND
OR function
o Outputs 1 if x1 or x2 (or both) is 1
XNOR function
o NOT XOR, i.e. not an exclusive or: outputs 1 when x1 and x2 are both 1 or both 0
o Hence we would want one hidden unit for x1 AND x2 (both are 1) and one for (NOT x1) AND (NOT x2) (neither is 1), then OR the two together (see the sketch after this list)
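A minimal NumPy sketch of these gates, using the weight values quoted in the lecture (AND: [-30, 20, 20], OR: [-10, 20, 20], and [10, -20, -20] for the "neither" unit (NOT x1) AND (NOT x2)); XNOR is then built from one hidden layer plus an OR output unit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, x1, x2):
    """One sigmoid unit over [1, x1, x2] (bias included)."""
    return sigmoid(theta @ np.array([1.0, x1, x2]))

# Weight vectors [bias, x1, x2]
AND     = np.array([-30.0, 20.0, 20.0])   # ~1 only when x1 = x2 = 1
OR      = np.array([-10.0, 20.0, 20.0])   # ~1 when x1 = 1 or x2 = 1
NEITHER = np.array([ 10.0, -20.0, -20.0]) # (NOT x1) AND (NOT x2)

def xnor(x1, x2):
    a1 = unit(AND, x1, x2)       # fires when both inputs are 1
    a2 = unit(NEITHER, x1, x2)   # fires when both inputs are 0
    return unit(OR, a1, a2)      # fires in either case -> XNOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))   # prints 1 when x1 == x2, else 0
```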