CNN and Autoencoder
• A fundamental goal of using CNNs with images is to remove the cumbersome and ultimately
limiting manual feature selection process.
• A convolutional neural network is a feed-forward neural network that is generally used to
analyze visual images by processing data with a grid-like topology. It is also known as
a ConvNet. A convolutional neural network is used to detect and classify objects in an
image.
• They can identify faces, individuals, street signs, and many other aspects of visual data.
They are also good at text and sound analysis.
• CNNs work well in self-driving cars, drones, robotics, and assistive technology for the visually impaired.
• They are good at building position- and rotation-invariant features from raw image data.
• They help in building a more robust feature space based on the raw signal.
In deep learning, a convolutional neural network (CNN/ConvNet) is a
class of deep neural networks, most commonly applied to analyze
visual imagery.
Consider an example: classifying between the letters X and O.
REPRESENTATION OF AN IMAGE IN CNN
In a CNN, every image is represented as an array of pixel values.
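As a hedged illustration (the array values below are invented for this sketch, not taken from these notes), a tiny 5x5 grayscale "X" can be stored as a NumPy array of pixel intensities:

# Illustrative only: a 5x5 "X" image as an array of pixel values,
# where 1 = bright stroke and 0 = dark background.
import numpy as np

x_image = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=np.float32)

print(x_image.shape)  # (5, 5) -- height x width; a color image would be (H, W, 3)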
Layers in a Convolutional Neural Network
A convolution neural network has multiple hidden layers that help in extracting information from an
image. The four important layers in CNN are:
1.Convolution layer
2.ReLU layer
3.Pooling layer
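The following is a minimal Keras sketch, not the exact model from these notes, showing the four layers stacked for the earlier X-vs-O example; the input size (28x28 grayscale), filter count, and optimizer are assumptions made for illustration.

# A minimal sketch of the four CNN layers for a binary X-vs-O classifier.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),          # assumed 28x28 grayscale input
    layers.Conv2D(16, kernel_size=(3, 3)),    # 1. Convolution layer
    layers.ReLU(),                            # 2. ReLU layer
    layers.MaxPooling2D(pool_size=(2, 2)),    # 3. Pooling layer
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),    # 4. Fully connected layer (X vs O)
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()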
Filter size: It can be 3x3, 5x5, or 7x7. It is advisable to keep the filter size small.
There are several pooling functions such as the average of the rectangular neighborhood, L2 norm of the
rectangular neighborhood, and a weighted average based on the distance from the central pixel. However, the
most popular process is max pooling, which reports the maximum output from the neighborhood.
If we have an activation map of size W x W x D, a pooling kernel of spatial size F, and stride S, then the size of
the output volume can be determined by the following formula:

W_out = (W - F) / S + 1

so the output volume has size W_out x W_out x D (pooling does not change the depth D).
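As a quick sanity check of this formula, the sketch below plugs in assumed values (W = 32, D = 64, F = 2, S = 2); these numbers are illustrative, not taken from the notes.

# Compute the pooling output volume from the formula above.
def pool_output_size(W, D, F, S):
    out = (W - F) // S + 1
    return out, out, D

print(pool_output_size(W=32, D=64, F=2, S=2))  # (16, 16, 64)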
•Several companies, such as Tesla and Uber, are using convolutional neural networks as the computer vision component of a self-
driving car.
•A self-driving car’s computer vision system must be capable of localization, obstacle avoidance, and path planning.
•Let us consider the case of pedestrian detection. A pedestrian is a kind of obstacle that moves. A convolutional neural network must
be able to identify the location of the pedestrian and extrapolate their current motion in order to determine whether a collision is imminent.
•A convolutional neural network for object detection is slightly more complex than a classification model, in that it must not only
classify an object, but also return the four coordinates of its bounding box.
•Furthermore, the convolutional neural network designer must avoid unnecessary false alarms for irrelevant objects, such as litter, but
also take into account the high cost of miscategorizing a true pedestrian and causing a fatal accident.
•A major challenge for this kind of use is collecting labeled training data. Google’s CAPTCHA system, which is used for authentication on
websites, asks users to categorize images as fire hydrants, traffic lights, cars, etc. This is actually a useful way to collect
labeled training images for purposes such as self-driving cars and Google Street View.
AUTOENCODERS
Parts: the encoder, the code (bottleneck layer), and the decoder.
Traditionally, autoencoders were used for dimensionality reduction or feature learning.
Code size: It represents the number of nodes in the middle layer. Smaller size results in more compression.
Number of nodes per layer: The number of nodes per layer decreases with each subsequent layer of the encoder,
and increases back in the decoder. The decoder is symmetric to the encoder in terms of the layer structure.
Loss function: We use either mean squared error or binary cross-entropy. If the input values are in the range [0, 1],
we typically use cross-entropy; otherwise, we use mean squared error.
This is done by balancing two criteria: the autoencoder must be sensitive enough to the inputs to reconstruct them accurately, yet constrained enough that it does not simply memorize and copy the input to the output.
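A minimal Keras sketch of such an autoencoder is given below; the layer sizes and code size are assumptions for illustration, not values from these notes. It shows a symmetric encoder/decoder and binary cross-entropy for inputs in [0, 1].

# Illustrative fully connected autoencoder for flattened 28x28 images in [0, 1].
from tensorflow.keras import layers, models

code_size = 32  # number of nodes in the middle layer (smaller = more compression)

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),        # encoder: nodes decrease per layer
    layers.Dense(code_size, activation="relu"),  # code / bottleneck
    layers.Dense(128, activation="relu"),        # decoder: symmetric to the encoder
    layers.Dense(784, activation="sigmoid"),     # reconstruction in [0, 1]
])

# Inputs are in [0, 1], so binary cross-entropy is used; otherwise MSE would be typical.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")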
Autoencoder types
Undercomplete autoencoders
Regularized autoencoders
Sparse autoencoders
Denoising autoencoder
Contractive autoencoders
UNDERCOMPLETE AUTOENCODER
The learning process is described simply as minimizing a loss function L(x, g(f(x))) where:
f is the encoder function, g is the decoder function, and L is a loss function penalizing g(f(x)) for being
dissimilar from x, such as the mean squared error.
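As a hedged sketch of this objective, the toy code below uses linear stand-ins for the encoder f and decoder g (the data and weight shapes are invented for illustration) and evaluates L(x, g(f(x))) as mean squared error.

import numpy as np

def f(x, W_enc):
    # Encoder stand-in: linear projection to a smaller code.
    return x @ W_enc

def g(h, W_dec):
    # Decoder stand-in: linear map from the code back to input space.
    return h @ W_dec

rng = np.random.default_rng(0)
x = rng.random((10, 784))                  # a small batch of flattened "images"
W_enc = 0.01 * rng.normal(size=(784, 32))  # 784 -> 32 (undercomplete code)
W_dec = 0.01 * rng.normal(size=(32, 784))  # 32 -> 784

reconstruction = g(f(x, W_enc), W_dec)
loss = np.mean((x - reconstruction) ** 2)  # L(x, g(f(x))) as mean squared error
print(loss)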
Advantages-
Undercomplete autoencoders do not need any regularization, as they maximize the probability of the data
rather than copying the input to the output.
Drawbacks-
Using an overparameterized model due to a lack of sufficient training data can lead to overfitting.
Applications
Denoising: the input is a clean image plus noise, and the network is trained to reproduce the clean image (see the sketch below).
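A hedged sketch of this setup follows; random arrays stand in for real images, and the architecture and noise level are assumptions. The key point is that the noisy image is the input while the clean image is the training target.

# Illustrative denoising setup: noisy input, clean target.
import numpy as np
from tensorflow.keras import layers, models

x_clean = np.random.rand(256, 784).astype("float32")  # placeholder "clean" images in [0, 1]
x_noisy = np.clip(x_clean + 0.2 * np.random.randn(256, 784), 0.0, 1.0).astype("float32")

denoiser = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])
denoiser.compile(optimizer="adam", loss="binary_crossentropy")
denoiser.fit(x_noisy, x_clean, epochs=1, batch_size=32)  # noisy in, clean out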
Image colorization: the input is a black-and-white image, and the network is trained to produce a color image.