Unit 3 Full Notes
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence.
Just as neural networks imitate the human brain, so does deep learning. In deep learning, nothing
is programmed explicitly. Basically, it is a class of machine learning that makes use of numerous
layers of nonlinear processing units to perform feature extraction as well as transformation, where
the output of each layer is taken as the input of the next.
Deep learning models are capable of finding the relevant features themselves with only a
little guidance from the programmer, and they are very helpful in tackling the problem of
dimensionality. Deep learning algorithms are used especially when we have a huge number of inputs
and outputs.
Since deep learning evolved from machine learning, which is itself a subset of artificial
intelligence, and since the idea behind artificial intelligence is to mimic human behavior, the
idea of deep learning is likewise to build algorithms that can mimic the brain.
Deep learning is implemented with the help of neural networks, and the motivation behind
neural networks is the biological neuron, which is nothing but a brain cell.
Deep learning is a collection of statistical machine learning techniques for learning feature
hierarchies, and it is based on artificial neural networks.
So basically, deep learning is implemented with the help of deep networks, which are nothing but
neural networks with multiple hidden layers.
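To make the idea of stacked layers concrete, here is a minimal sketch (not from the notes, written in plain NumPy) of a forward pass through a small deep network; the layer sizes and the random data are illustrative assumptions.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))                  # one input example with 8 features

layer_sizes = [8, 16, 16, 4]                 # input -> two hidden layers -> output
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((1, n)) for n in layer_sizes[1:]]

activation = x
for W, b in zip(weights, biases):
    activation = relu(activation @ W + b)    # each layer's output feeds the next layer

print(activation.shape)                      # (1, 4): the network's output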
Applications:
o Data Compression
o Pattern Recognition
o Computer Vision
o Speech Recognition
2. Recurrent Neural Networks
Recurrent neural networks are yet another variation of feed-forward networks. Here each of the
neurons present in the hidden layers receives an input with a specific time delay. A recurrent
neural network mainly draws on information from preceding iterations. For example, to guess the
next word in a sentence, one must know the words that were used before it. A recurrent network
not only processes the inputs but also shares weights across time steps, so the size of the model
does not grow with the size of the input. Its main drawbacks are slow computation, the inability
to take any future input into account for the current state, and difficulty remembering
information from far back in the sequence.
Applications:
o Machine Translation
o Robot Control
o Speech Synthesis
o Rhythm Learning
o Music Composition
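As a rough illustration of the next-word example above, here is a hedged Keras sketch of a simple recurrent network; the vocabulary size, sequence length, layer sizes, and the random dummy data are all illustrative assumptions rather than a recipe from these notes.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 5000, 10               # assumed vocabulary and context length

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 64),
    layers.SimpleRNN(128),                   # hidden state carries information from earlier words
    layers.Dense(vocab_size, activation="softmax"),   # probability of each possible next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data only to show the expected shapes: sequences of word indices -> index of the next word
X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(X, y, epochs=1, verbose=0)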
3. Convolutional Neural Networks
Convolutional neural networks are a special kind of neural network mainly used for image
classification, clustering of images, and object recognition. These deep networks enable the
unsupervised construction of hierarchical image representations. To achieve the best accuracy on
such tasks, deep convolutional neural networks are preferred over any other neural network.
Applications:
o Image Recognition.
o Video Analysis.
o NLP.
o Anomaly Detection.
o Drug Discovery.
o Checkers Game.
4. Restricted Boltzmann Machines
RBMs are yet another variant of Boltzmann machines. Here the neurons in the input (visible) layer
and the hidden layer have symmetric connections between them, but there are no connections
within a layer. In contrast to RBMs, ordinary Boltzmann machines do have internal connections
inside the hidden layer. It is these restrictions in RBMs that allow the model to be trained
efficiently.
Applications:
o Filtering.
o Feature Learning.
o Classification.
o Risk Detection.
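As an illustration, here is a minimal sketch of unsupervised feature learning with an RBM using scikit-learn's BernoulliRBM; the digits dataset and the hyperparameters are illustrative choices, not part of the original notes.

from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X = load_digits().data / 16.0            # scale pixel values into [0, 1]

rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)
features = rbm.fit_transform(X)          # hidden-unit activations serve as learned features

print(features.shape)                    # (n_samples, 64)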
5. Autoencoders
An autoencoder neural network is another kind of unsupervised machine learning algorithm. Here the
number of hidden cells is smaller than the number of input cells, while the number of input cells is
equal to the number of output cells. An autoencoder network is trained to reproduce its input at the
output, which forces it to find common patterns and generalize the data. Autoencoders are mainly
used to build a smaller representation of the input, and they help reconstruct the original data from
the compressed data. The algorithm is comparatively simple, as it only requires the output to be
identical to the input.
Applications:
o Classification.
o Clustering.
o Feature Compression.
Applications of deep learning:
o Self-Driving Cars
In self-driving cars, the system captures images of its surroundings, processes a huge amount of
data, and then decides which actions to take: turn left, turn right, or stop. Acting on these
decisions correctly helps reduce the accidents that happen every year.
o Voice Controlled Assistance
When we talk about voice-controlled assistance, Siri is the first thing that comes to mind. You
can ask Siri to do whatever you want it to do for you, and it will search for it and display the
result for you.
o Automatic Image Caption Generation
Whatever image you upload, the algorithm works in such a way that it generates a caption
accordingly. If you say "blue colored eye", it will display a blue-colored eye with a caption at
the bottom of the image.
o Automatic Machine Translation
With the help of automatic machine translation and deep learning, we are able to convert text in
one language into another.
Deep learning is based on the concept of artificial neural networks, or computational systems that
mimic the way the human brain functions. And so, our brief history of deep learning must start with
those neural networks.
1943: Warren McCulloch and Walter Pitts create a computational model for neural networks based on
mathematics and algorithms called threshold logic.
1958: Frank Rosenblatt creates the perceptron, an algorithm for pattern recognition based on a two-
layer computer neural network using simple addition and subtraction. He also proposed additional
layers with mathematical notations, but these wouldn’t be realised until 1975.
1980: Kunihiko Fukushima proposes the Neocognitron, a hierarchical, multilayered artificial neural
network that has been used for handwriting recognition and other pattern recognition problems.
1989: Scientists were able to create algorithms that used deep neural networks, but training times for
the systems were measured in days, making them impractical for real-world use.
1992: Juyang Weng publishes Cresceptron, a method for performing 3-D object recognition
automatically from cluttered scenes.
Mid-2000s: The term “deep learning” begins to gain popularity after a paper by Geoffrey Hinton and
Ruslan Salakhutdinov showed how a many-layered neural network could be pre-trained one layer at a
time.
2009: NIPS Workshop on Deep Learning for Speech Recognition discovers that with a large enough
data set, the neural networks don’t need pre-training, and the error rates drop significantly.
2015: Facebook puts deep learning technology – called DeepFace – into operation to automatically
tag and identify Facebook users in photographs. The algorithms perform face recognition using deep
networks that take into account 120 million parameters.
2016: Google DeepMind’s algorithm AlphaGo masters the complex board game Go and
beats the professional Go player Lee Sedol at a highly publicised tournament in Seoul.
Trojan horses go neural. Every new technology comes with new security threats, and
machine learning is no exception. Sidney Hough's excellent debut TDS post introduces an
emerging concern around neural Trojan attacks, where "malicious functionality is embedded
into the weights of a neural network." It is a thorough overview of an important topic, and it
also invites practitioners to imagine new defense strategies.
Another recent post covers both the mathematical underpinnings and practical aspects of using
backpropagation in the context of neural networks. If you'd like to gain a deeper understanding
of this supervised-learning algorithm, here's your chance.
Michael Bronstein (with coauthors Joan Bruna, Taco Cohen, and Petar Veličković) traces, in a
new series, the origins of deep learning all the way back to Greek geometry and 19th-century
mathematicians.
Cassie Kozyrkov shares her insights on the risks of building a data team on the wrong
foundation: "Research for research's sake is a high-risk investment and very few companies can
afford it, because getting nothing of value out of it is a very real possibility."
As Gabriele Orlandi points out in his first TDS article, time-series forecasting has been around
for a long time, and data scientists usually leverage a limited set of established methods to work
on it. The recent emergence of new architectures, however, provides practitioners with an
opportunity to shake things up, and there is growing interest in exploring deep learning's
predictive capabilities.
Deep learning is making inroads in computer graphics, too: using computers to create
physically correct and realistic 3D images is a complex task.
Limitations and Challenges
The primary drawback of deep learning models is that they only learn from observations. They
therefore only know the information included in the training data. The models won’t learn in a way
that can be generalized if a user just has a limited amount of data or if it originates from a single
source that is not necessarily representative of the larger functional area.
Biases are another significant concern with deep learning algorithms. When a model is trained on
biased data, it will replicate similar biases in its predictions. Deep learning programmers have
struggled with this issue since models learn to distinguish based on minute differences in data pieces.
The factors the model treats as decisive are frequently not apparent to the programmer. Thus, without
the programmer's knowledge, a facial recognition model may make judgments about a person's features
based on attributes such as ethnicity or gender.
The learning rate can also cause significant difficulties for deep learning models. If the rate is too
high, the model converges too quickly, producing a less-than-ideal solution; if it is too low, the
process may get stuck and it becomes much harder to reach a solution.
Limitations may also result from the hardware that deep learning models require. To ensure greater
effectiveness and lower time consumption, multicore high-performance graphics processing units
(GPUs) and other processing units are needed. However, these devices are expensive and consume a lot
of energy. Additional hardware requirements include large amounts of random access memory (RAM)
and either a hard disk drive (HDD) or a RAM-based solid-state drive (SSD).
Large volumes of data are necessary for deep learning. Additionally, the more accurate and powerful
models will need more parameters, which calls for more data.
Deep learning models become rigid and incapable of multitasking once they have been trained: they
can solve only one specific problem effectively and precisely, and even solving a comparable problem
requires retraining the system.
Even with vast amounts of data, existing deep learning approaches cannot handle applications that
require reasoning, such as programming or applying the scientific method. They are also utterly
incapable of long-term planning and algorithm-like data manipulation.
A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output
layers. Similar to shallow ANNs, DNNs can model complex non-linear relationships.
The main purpose of a neural network is to receive a set of inputs, perform progressively complex
calculations on them, and give output to solve real world problems like classification. We restrict
ourselves to feed forward neural networks.
In deep learning, the number of hidden layers, mostly non-linear, can be large; say about 1000 layers.
We mostly use the gradient descent method for optimizing the network and minimising the loss
function.
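As a bare-bones illustration of gradient descent minimising a loss function, here is a sketch that fits a single linear unit by mean squared error; the synthetic data and the learning rate are made-up assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)      # synthetic targets

w = np.zeros(3)                                   # start from zero weights
learning_rate = 0.1
for step in range(200):
    predictions = X @ w
    grad = 2 * X.T @ (predictions - y) / len(y)   # gradient of the mean squared error
    w -= learning_rate * grad                     # move against the gradient to reduce the loss

print(np.round(w, 2))                             # ends up close to the true weights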
We can use ImageNet, a repository of millions of digital images, to classify a dataset into
categories such as cats and dogs. DL nets are increasingly applied to dynamic images as well as static
ones, and to time series and text analysis.
Training on data sets forms an important part of deep learning models, and backpropagation
is the main algorithm used to train them.
DL deals with training large neural networks that learn complex input-output transformations.
One example of DL is mapping a photo to the names of the person(s) in the photo, as done on
social networks; describing a picture with a phrase is another recent application of DL.
Neural networks are functions that take inputs such as x1, x2, x3, ... and transform them into outputs
such as z1, z2, z3, and so on through two (shallow networks) or several (deep networks) intermediate
operations, also called layers.
The weights and biases change from layer to layer; 'w' and 'v' denote the weights, or synapses, of the
layers of the neural network.
The best use case for deep learning is the supervised learning problem. Here, we have a large set of
data inputs with a desired set of outputs.
The most basic data set for deep learning is MNIST, a dataset of handwritten digits.
We can train a deep convolutional neural network with Keras to classify images of handwritten
digits from this dataset.
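A hedged sketch of that MNIST example follows: a small convolutional network trained with Keras on the handwritten digits. The exact architecture and hyperparameters here are illustrative choices, not the only way to do it.

from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0     # shape (60000, 28, 28, 1)
x_test = x_test[..., None].astype("float32") / 255.0

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),      # one score per digit class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))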
The firing or activation of a neural net classifier produces a score. For example, to classify patients as
sick or healthy, we consider parameters such as height, weight, body temperature, blood pressure,
and so on.
A high score means the patient is sick and a low score means the patient is healthy.
Each node in output and hidden layers has its own classifiers. The input layer takes inputs and passes
on its scores to the next hidden layer for further activation and this goes on till the output is reached.
This progression from input to output, moving from left to right in the forward direction, is called
forward propagation.
The credit assignment path (CAP) in a neural network is the series of transformations starting from the
input to the output. CAPs describe the probable causal connections between the input and the output.
For a given feed-forward neural network, the CAP depth is the number of hidden layers plus one, since
the output layer is also counted; a network with three hidden layers, for example, has a CAP depth of
four. For recurrent neural networks, where a signal may propagate through a layer several times, the
CAP depth is potentially unlimited.
Introduction
In one of my early projects, I was working with the Marketing Department of a bank. The Marketing
Director called me in for a meeting. The subject line read "Data Science Project". I was excited,
completely charged, and raring to go. I was hoping to get a specific problem, where I could apply my
data science wizardry and benefit my customer.
The meeting started on time. The Director said “Please use all the data we have about our customers
and tell us the insights about our customers, which we don’t know. We really want to use data science
to improve our business.”
Data scientists use a variety of machine learning algorithms to extract actionable insights from the
data they’re provided. The majority of them are supervised learning problems, because you already
know what you are required to predict. The data you are given comes with a lot of details to help you
reach your end goal.
On the other hand, unsupervised learning is a complex challenge, but its advantages are numerous. It
has the potential to unlock previously unsolvable problems and has gained a lot of traction in the
machine learning and deep learning community.
I am planning to write a series of articles focused on Unsupervised Deep Learning applications. This
article specifically aims to give you an intuitive introduction to what the topic entails, along with an
application of a real life problem. In the next few articles, I will focus more on the internal workings
of the techniques involved in deep learning.
A typical workflow in a machine learning project is designed in a supervised manner. We tell the
algorithm what to do and what not to do. This generally gives a structure for solving a problem, but it
limits the potential of that algorithm in two ways:
It is bound by the biases under which it is supervised. Of course it learns how to
perform that task on its own, but it is prevented from considering other corner cases that could
occur when solving the problem.
As the learning is supervised, there is a huge manual effort involved in creating the
labels for our algorithm. The fewer manual labels you create, the less training you
can perform for your algorithm.
To solve this issue in an intelligent way, we can use unsupervised learning algorithms. These
algorithms derive insights directly from the data itself, and work by summarizing the data or grouping
it, so that we can use these insights to make data-driven decisions.
Let’s take an example to better understand this concept. Let’s say a bank wants to divide its customers
so that they can recommend the right products to them. They can do this in a data-driven way – by
segmenting the customers on the basis of their ages and then deriving insights from these segments.
This would help the bank give better product recommendations to their customers, thus increasing
customer satisfaction.
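As a rough sketch of the age-based segmentation described above, one could cluster customers with k-means from scikit-learn; the ages and the number of segments below are invented for illustration.

import numpy as np
from sklearn.cluster import KMeans

ages = np.array([[22], [25], [27], [34], [38], [41], [55], [60], [63], [70]])   # made-up customer ages

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ages)
for age, segment in zip(ages.ravel(), kmeans.labels_):
    print(age, "-> segment", segment)     # each segment can receive its own product recommendations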
Case Study of Unsupervised Deep Learning
In this article, we will take a look at a case study of unsupervised learning on unstructured data. As
you might be aware, Deep Learning techniques are usually most impactful where a lot of unstructured
data is present. So we will take an example of Deep Learning being applied to the Image Processing
domain to understand this concept.
I have 2000+ photos in my smartphone right now. If I had been a selfie freak, the photo count would
easily be 10 times more. Sifting through these photos is a nightmare, because every third photo turns
out to be unnecessary and useless for me. I’m sure most of you will be able to relate to my plight!
Ideally, what I would want is an app that organizes the photos in such a manner that I can browse
through most of them and take a peek at any group I want. This would actually give me context about
the different kinds of photos I have right now.
To get a clearer perspective of the problem, I went through my mobile and tried to identify the
categories of the images by myself. Here are the insights I gathered:
First and foremost, I found that one-third of my photo gallery is filled with memes (Thanks to
my lovely friends on WhatsApp).
I personally collect interesting quotes / shares I come across on Reddit. These are mostly
motivational or funny, depending on which subreddit I downloaded them from.
There are at least 200 images I captured, or my colleagues shared, of the famous DataHack
Summit and the subsequent AV outing we had in Kerala.
There are a few photos of whiteboard discussions that happen frequently during meetings.
Then there are a few images/screenshots of code tracebacks/bugs that require internal team
discussions. They are a necessary evil that has to be purged after use.
I also found dispersed “private & personal” images, such as selfies, group photos and a few
objects/sceneries. They are few, but they are my prized possessions.
Last but not the least – there were numerous “good morning”, “happy birthday” and
“happy diwali” posts that I desperately want to delete from my gallery. No matter how much
I exterminate them, they just keep coming back!
Now that you know the scenario, can you think of different ways to better organize my photos
through an automated algorithm? You can discuss your thoughts on this discussion thread.
In the below sections, we will discuss a few approaches I have come up with to solve this problem.
Approach 1 – Arrange the photos on the basis of the date they were taken
The upside of this will be that all the events that happened on a given day will be stored together. The
downside of this approach is that it is too generic. On any day, I could have photos that are from an
outing as well as a motivational quote, and both of them would be mixed together, which defeats the
purpose altogether.
Approach 2 – Arrange the photos on the basis of location
A comparatively better approach would be to arrange the photos based on where they were taken. With
each camera click we would capture where the image was taken, and then make folders on the basis of
these locations, either country/city wise or locality wise, depending on the granularity we want. This
approach is also used by most photo apps.
The downside of this approach is the simplistic idea on which it is based. How can we define the
location of a meme or a cartoon, which take up a fair share of my image gallery? So this approach
lacks ingenuity as well.
Approach 3 – Extract Semantic meaning from the image and use it to define my collection
The approaches we have seen so far were mostly dependent on the metadata that is captured along
with the image. A better way to organize the photos would be to extract semantic information from
the image itself and use that information intelligently.
Let’s break this idea down into parts. Suppose we have a similar variety of photos (as mentioned
above). What trends should our algorithm capture?
This approach is what we call an “unsupervised way” to solve problems. We did not directly define
the outcome that we want. Instead, we trained an algorithm to find those outcomes for us! Our
algorithm summarizes the data in an intelligent manner, and then tries to solve the problem on the
basis of these inferences. Pretty cool, right?
Now you may be wondering – how can we leverage Deep Learning for unsupervised learning
problems?
As we saw in the case study above, by extracting semantic information from the image, we can get a
better view of the similarity of images. Thus, our problem can be formulated as – how can we reduce
the dimensions of an image so that we can reconstruct the image back from these encoded
representations?
Let me give you a high-level overview of auto encoders. The idea behind this algorithm is that
you train it to recreate what it has just seen, but the catch is that it has to use a much smaller
representation to do so.
For example, an auto encoder with an encoding size of 10 is trained on images of cats, each of size
100×100. The input dimension is therefore 10,000, and the auto encoder has to represent all this
information in a vector of size 10.
An auto encoder can be logically divided into two parts: an encoder and a decoder. The task of the
encoder is to convert the input to a lower dimensional representation, while the task of the decoder is
to recreate the input from this lower dimensional representation.
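A minimal Keras sketch of such an encoder/decoder pair is given below, matching the 100×100 input (10,000 values) and the size-10 code mentioned above; the layer choices, activations, and the hypothetical "images" array are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim = 100 * 100, 10                 # 10,000 inputs squeezed into a 10-value code

inputs = keras.Input(shape=(input_dim,))
code = layers.Dense(code_dim, activation="relu")(inputs)        # encoder
outputs = layers.Dense(input_dim, activation="sigmoid")(code)   # decoder
autoencoder = keras.Model(inputs, outputs)

# The target is the input itself, so the loss measures how well the input is reconstructed.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(images, images, epochs=10)   # images: hypothetical array of shape (n, 10000), values in [0, 1]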
This was a very high level overview of auto encoders. In the next article – we will look at them in
more detail.
Note – This is more of a forewarning; but the current state-of-the-art methods still aren’t mature
enough to handle industry level problems with ease. Although research in this field is booming, it
would take a few more years for our algorithms to become “industrially accepted”.
Reinforcement learning
Before we move on, let’s have a look at some of the definitions that you’ll encounter when learning about
Reinforcement Learning.
Agent - The agent (A) takes actions that affect the environment. For example, a machine learning to play
chess is the agent.
Action - It is the set of all possible operations/moves the agent can make. The agent makes a decision on
which action to take from a set of discrete actions (a).
Environment - All actions that the reinforcement learning agent makes directly affect the environment.
Here, the board of chess is the environment. The environment takes the agent's present state and action as
information and returns the reward to the agent with a new state.
For example, the move made by the bot will either have a negative/positive effect on the whole game and the
arrangement of the board. This will decide the next action and state of the board.
State - A state (S) is a particular situation in which the agent finds itself.
This can be the state of the agent at any intermediate time (t).
Reward (R) - The environment gives feedback by which we determine the validity of the agent’s actions in
each state. It is crucial in the scenario of Reinforcement Learning where we want the machine to learn all by
itself and the only critic that would help it in learning is the feedback/reward it receives.
For example, in a chess game scenario it happens when the bot takes the place of an opponent's piece and
later captures it.
Discount factor - Over time, the discount factor modifies the importance of incentives. Given the
uncertainty of the future it’s better to add variance to the value estimates. Discount factor helps in reducing
the degree to which future rewards affect our value function estimates.
Policy (π) - It decides what action to take in a certain state to maximize the reward.
Value (V) - It measures the optimality of a specific state. It is the expected discounted reward that the agent
collects while following the specific policy.
Q-value or action-value - Q Value is a measure of the overall expected reward if the agent (A) is in state (s)
and takes action (a), and then plays until the end of the episode according to some policy (π).
Reinforcement learning algorithms are mainly of two types:
1. Model-based algorithms
2. Model-free algorithms
Model-based algorithms
Model-based algorithms use the transition and reward functions to estimate the optimal policy.
They are used in scenarios where we have complete knowledge of the environment and how it reacts
to different actions.
In Model-based Reinforcement Learning the agent has access to the model of the environment i.e.,
action required to be performed to go from one state to another, probabilities attached, and
corresponding rewards attached.
They allow the reinforcement learning agent to plan ahead by thinking through possible future states and actions.
Model-free algorithms
Model-free algorithms find the optimal policy with very limited knowledge of the dynamics of the
environment. They do not have any transition/reward function to judge the best policy.
They estimate the optimal policy directly from experience i.e., interaction between agent and
environment without having any hint of the reward function.
In the real world, we don't have a fixed environment. Self-driving cars have a dynamic environment
with changing traffic conditions, route diversions, etc. In such scenarios, model-free algorithms
outperform other techniques.
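To make the model-free idea concrete, here is an illustrative sketch of tabular Q-learning on a tiny made-up chain environment; the environment, reward scheme, and hyperparameters are invented for demonstration and are not part of these notes.

import numpy as np

n_states, n_actions = 5, 2                  # actions: 0 = move left, 1 = move right
alpha, gamma, epsilon = 0.1, 0.9, 0.1       # learning rate, discount factor, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1          # next state, reward, episode finished?

for _ in range(500):                                 # learn purely from experience, no model of the env
    state, done = 0, False
    while not done:
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max over a' of Q(s', a')
        Q[state, action] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state, action])
        state = nxt

print(np.argmax(Q, axis=1))                          # learned policy mostly chooses "move right"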