DL Que
A list of the most frequently asked Deep Learning interview questions and answers is given below.
Deep learning is a part of machine learning that uses algorithms inspired by the structure and
function of the brain, called artificial neural networks. In the mid-1960s, Alexey Grigorevich
Ivakhnenko published the first general, working learning algorithm for deep multilayer networks.
Deep learning is applied across a range of fields such as computer vision, speech recognition, and
natural language processing.
2) What are the main differences between AI, Machine Learning, and Deep Learning?
AI stands for Artificial Intelligence. It is a technique that enables machines to mimic human
behavior.
Machine Learning is a subset of AI that uses statistical methods to enable machines to improve
with experience.
Deep learning is a part of Machine learning that makes the computation of multi-layer
neural networks feasible. It takes advantage of neural networks to simulate human-like
decision making.
https://www.javatpoint.com/deep-learning-interview-questions 2/19
7/19/23, 10:20 PM Top 50 Deep Learning Interview Questions (2023) - javatpoint
Supervised learning is a system in which both input and desired output data are provided.
Input and output data are labeled to provide a learning basis for future data processing.
Unsupervised learning does not need explicit labels, and the operations can be carried out without
them. The most common unsupervised learning method is cluster analysis. It is used for exploratory
data analysis to find hidden patterns or groupings in data.
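Cluster analysis can be illustrated with k-means, one common clustering algorithm (the text above does not name a specific one). The sketch below is a minimal 1-D version in plain Python; the function name `kmeans_1d`, the sample data, and the initial centroids are illustrative assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given initial centroids; no labels are needed."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]   # unlabeled data with two obvious groups
centers, groups = kmeans_1d(data, centroids=[0.0, 5.0])
print(centers)   # approximately [1.0, 9.0]
```

The algorithm only ever sees the raw values, so the two groups it finds are "hidden patterns" discovered without any labeling information.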
Computer vision
Machine translation
Sentiment analysis
Both shallow and deep networks are capable of approximating any function. But
for the same level of accuracy, deeper networks can be much more efficient in terms of
computation and number of parameters. Deeper networks can create deep representations. At
every layer, the network learns a new, more abstract representation of the input.
Overfitting is the most common issue in deep learning. It usually occurs when a deep learning
algorithm captures the noise of specific training data instead of the underlying pattern. It also
appears when the algorithm fits the training data too closely, and it shows up as a model with high
variance and low bias.
7) What is Backpropagation?
Backpropagation is a training algorithm used for multilayer neural networks. It transfers the
error information from the end of the network to all the weights inside the network, which allows
efficient computation of the gradient. It consists of the following steps:
Forward-propagate the training data through the network to generate the output.
Use the target value and the output value to compute the error derivative with respect to the
output activations.
Backpropagate to compute the derivative of the error with respect to the output activations in
the previous layer, and continue for all hidden layers.
Use the previously calculated derivatives for the output and all hidden layers to calculate the
error derivative with respect to the weights.
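The four backpropagation steps can be sketched on a tiny network. This is a minimal illustration, assuming a hypothetical 2-input network with two sigmoid hidden units, one linear output, and the squared error E = 0.5 * (y - target)**2; the function names and weight values are made up. A finite-difference check at the end confirms that the analytic gradient is correct.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, w2, b2):
    """Step 1: forward-propagate one input through the network."""
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j]) for j in range(len(W1))]
    y = sum(w2[j] * h[j] for j in range(len(h))) + b2
    return h, y


def backprop(x, target, W1, b1, w2, b2):
    """Return the gradients of E = 0.5 * (y - target)**2 for every weight and bias."""
    h, y = forward(x, W1, b1, w2, b2)
    # Step 2: error derivative with respect to the (linear) output activation.
    dy = y - target
    # Step 3: propagate back to the hidden layer, through the sigmoid (sigma' = h * (1 - h)).
    dz = [dy * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Step 4: turn those derivatives into error derivatives for each weight and bias.
    grad_w2 = [dy * h[j] for j in range(len(h))]
    grad_b2 = dy
    grad_W1 = [[dz[j] * x[i] for i in range(len(x))] for j in range(len(h))]
    grad_b1 = dz
    return grad_W1, grad_b1, grad_w2, grad_b2


x, target = [0.5, -1.0], 0.3
W1, b1 = [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1]
w2, b2 = [0.7, -0.5], 0.05


def loss(weights1):
    return 0.5 * (forward(x, weights1, b1, w2, b2)[1] - target) ** 2


grad_W1, _, _, _ = backprop(x, target, W1, b1, w2, b2)
eps = 1e-6
numeric = (loss([[0.1, 0.4 + eps], [-0.3, 0.2]]) -
           loss([[0.1, 0.4 - eps], [-0.3, 0.2]])) / (2 * eps)
print(grad_W1[0][1], numeric)   # the two values should agree closely
```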
The Fourier transform package is highly efficient for analyzing, maintaining, and managing large
databases. The software is built around a high-quality feature known as the spectral
representation. One can use it effectively to generate real-time array data, which is extremely
helpful for processing all kinds of signals.
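What a spectral representation reveals can be shown with a minimal, naive discrete Fourier transform. This is only an illustration under stated assumptions: `dft` is a hypothetical O(n^2) helper, and in practice an FFT routine (for example `numpy.fft.fft`) computes the same result much faster. The test signal is a sine wave with 3 cycles per 16-sample window.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform; an FFT gives the same result in O(n log n)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


n = 16
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]   # 3 cycles per window
magnitudes = [abs(c) for c in dft(signal)]
peak = max(range(n // 2), key=lambda k: magnitudes[k])
print(peak)   # 3: the frequency of the sine wave
```

The time-domain samples give no direct hint of the frequency, while the spectrum has a single clear peak at bin 3 (with magnitude n / 2 = 8).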
There are several forms and categories of deep learning, but the autonomous pattern represents
independent or unspecified mathematical bases that are free from any specific categorizer or
formula.
10) What is the use of Deep learning in today's age, and how is it aiding data scientists?
Deep learning has brought significant changes, even a revolution, to the field of machine learning
and data science. The concept of a convolutional neural network (CNN) is the main center of
attention for data scientists. It is widely adopted because of its advantages in performing
next-level machine learning operations. The advantages of deep learning also include clarifying and
simplifying algorithm-based problems, thanks to its highly flexible and adaptable nature. It is one
of the rare procedures that allow data to move along independent pathways. Most data scientists
view deep learning as an advanced extension of the existing machine learning process and use it to
solve complex day-to-day problems.
TensorFlow, Keras, Chainer, PyTorch, Theano and its ecosystem, Caffe2, CNTK, DyNet, Gensim, DSSTNE,
Gluon, Paddle, MXNet, BigDL
Deep learning models take a long time to train. In some cases, training a single model takes
several days, depending on its complexity.
Deep learning models are not well suited to small data sets and tend to perform poorly on them.
In neural networks, weight initialization is one of the essential factors. A bad weight
initialization prevents a network from learning. On the other hand, a good weight initialization
gives quicker convergence and a lower overall error. Biases can be initialized to zero. The standard
rule for setting the weights is to keep them close to zero without being too small.
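One common way to follow the "close to zero, but not too small" rule is Glorot (Xavier) uniform initialization, which scales the range of the random weights by the layer's fan-in and fan-out. The text above does not name this scheme; it is swapped in as an example. The helper name `init_layer` and the layer sizes (784 inputs, 128 units) are illustrative assumptions.

```python
import math
import random


def init_layer(fan_in, fan_out, rng):
    """Small random weights (Glorot/Xavier uniform) and zero biases for one dense layer."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))   # about 0.081 for a 784 -> 128 layer
    weights = [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]
    biases = [0.0] * fan_out                      # biases can start at zero
    return weights, biases


rng = random.Random(0)
W, b = init_layer(fan_in=784, fan_out=128, rng=rng)
print(len(W), len(W[0]), max(abs(w) for row in W for w in row))
```

The weights are random (so neurons differ from each other), small (so activations do not saturate), and shrink automatically for wider layers.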
Data normalization is an essential preprocessing step used to rescale values to fit in a specific
range. It ensures better convergence during backpropagation. In general, data normalization boils
down to subtracting the mean from each data point and dividing by the standard deviation.
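The "subtract the mean, divide by the standard deviation" recipe (z-score normalization) is short enough to write out. A minimal sketch using only the standard library; `standardize` is a hypothetical helper and the sample values are made up.

```python
import statistics


def standardize(values):
    """Z-score normalization: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


z = standardize([2.0, 4.0, 6.0, 8.0])
print(z)   # mean 0 and standard deviation 1 after rescaling
```

In a real pipeline this is usually applied per feature, with the mean and standard deviation computed on the training set and reused for validation and test data.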
If all the weights in the network are set to zero, then all the neurons at each layer will produce
the same output and the same gradients during backpropagation.
As a result, the network cannot learn at all because there is no source of asymmetry between
neurons. That is why we need to add randomness to the weight initialization process.
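The symmetry problem can be observed directly. The sketch below trains a hypothetical 2-input network (three sigmoid hidden units, one linear output, no biases, squared error) starting from all-zero weights; the architecture, learning rate, and toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, W1, w2, lr=0.5):
    """One gradient-descent step on 0.5 * (y - target)**2 for the tiny network."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    y = sum(a * b for a, b in zip(w2, h))
    dy = y - target
    grad_w2 = [dy * hj for hj in h]
    grad_W1 = [[dy * w2[j] * h[j] * (1 - h[j]) * xi for xi in x] for j in range(len(h))]
    w2 = [w - lr * g for w, g in zip(w2, grad_w2)]
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, grad_W1)]
    return W1, w2


# Zero initialization: every hidden unit starts out identical.
W1 = [[0.0, 0.0] for _ in range(3)]
w2 = [0.0, 0.0, 0.0]
for x, t in [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)] * 20:
    W1, w2 = train_step(x, t, W1, w2)

print(W1)   # the weights changed, but all three rows are still identical
print(w2)   # all three output weights are still identical
```

Because every hidden unit receives exactly the same gradient, the three units behave as a single unit no matter how long training runs.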
There are some basic requirements for starting with Deep Learning, which are:
Machine Learning
Mathematics
Python Programming
Auto Encoders
Input Layer
The input layer contains input neurons which send information to the hidden layer.
Hidden Layer
The hidden layer is used to send data to the output layer.
Output Layer
The data is made available at the output layer.
The activation function is used to introduce nonlinearity into the neural network so that it can
learn more complex functions. Without an activation function, the neural network would only be able
to learn functions that are linear combinations of its input data.
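Why stacking layers without an activation function gains nothing can be checked with a 1-D example: two linear layers compose into a single linear layer. The weights and biases below are arbitrary illustrative values.

```python
def linear(x, w, b):
    """A 1-D 'layer' with no activation function: y = w * x + b."""
    return w * x + b


def two_layers(x):
    # Two stacked linear layers with no activation function in between.
    return linear(linear(x, w=2.0, b=1.0), w=3.0, b=-4.0)


def one_layer(x):
    # The same mapping collapsed into one linear layer: 3 * (2x + 1) - 4 = 6x - 1.
    return linear(x, w=6.0, b=-1.0)


print([two_layers(x) for x in (-1.0, 0.0, 2.5)])   # [-7.0, -1.0, 14.0]
print([one_layer(x) for x in (-1.0, 0.0, 2.5)])    # [-7.0, -1.0, 14.0]
```

However many such layers are stacked, the result is still a linear function of the input; a nonlinear activation between the layers is what breaks this collapse.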
The activation function translates the inputs into outputs. It is responsible for deciding whether
a neuron should be activated or not. It makes this decision by calculating the weighted sum of the
inputs and adding a bias to it. The basic purpose of the activation function is to introduce
non-linearity into the output of a neuron.
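A single neuron's computation (weighted sum, plus bias, passed through an activation function) fits in a few lines. A minimal sketch; the helper name `neuron`, the choice of sigmoid, and the numbers are illustrative assumptions.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


out = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0, activation=sigmoid)
print(out)   # 0.5, because the weighted sum is 0.5 - 0.5 + 0.0 = 0.0 and sigmoid(0.0) = 0.5
```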
Binary Step
Sigmoid
Tanh
ReLU
Leaky ReLU
Softmax
Swish
The binary step function is a threshold-based activation function. If the input value is above a
particular threshold, the neuron is activated and sends the same signal to the next layer;
otherwise, it is not activated. This function does not allow multi-value outputs.
The sigmoid activation function is also called the logistic function. It has traditionally been a
very popular activation function for neural networks. The input to the function is transformed into
a value between 0.0 and 1.0. Inputs that are much larger than 1.0 are transformed to values close to
1.0; similarly, inputs much smaller than 0.0 are transformed to values close to 0.0. The shape of the
function for all possible inputs is an S-shape from zero up through 0.5 to 1.0. It was the default
activation function used in neural networks in the early 1990s.
The hyperbolic tangent function, also known as tanh for short, is a similarly shaped nonlinear
activation function. It provides output values between -1.0 and 1.0. Later in the 1990s and through
the 2000s, this function was preferred over the sigmoid activation function, as models that used it
were easier to train and often had better predictive performance.
A node or unit that implements the rectifier activation function, f(x) = max(0, x), is referred to
as a rectified linear activation unit, or ReLU for short. Generally, networks that use the rectifier
function for the hidden layers are referred to as rectified networks.
The adoption of ReLU may easily be considered one of the few milestones in the deep learning
revolution.
Leaky ReLU (LReLU or LReL) modifies the function to allow small negative values when the input is
less than zero.
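The element-wise activation functions described so far can be written directly from their definitions. A minimal sketch; the threshold of 0 for the binary step and the Leaky ReLU slope `alpha=0.01` are assumed values (0.01 is a common default, not one given in the text).

```python
import math


def binary_step(x, threshold=0.0):
    # Activated (1) only when the input is above the threshold.
    return 1.0 if x > threshold else 0.0


def sigmoid(x):
    # Squashes any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Squashes any input into the range (-1, 1).
    return math.tanh(x)


def relu(x):
    # Rectifier: max(0, x).
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # Like ReLU, but lets a small negative value through for negative inputs.
    return x if x > 0 else alpha * x


for f in (binary_step, sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, [round(f(x), 4) for x in (-2.0, 0.0, 2.0)])
```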
The softmax function is used to calculate the probability distribution of an event over 'n'
different events. One of the main advantages of using softmax is the range of its output
probabilities: each probability lies between 0 and 1, and the sum of all the probabilities is equal
to one. When the softmax function is used in a multi-class classification model, it returns the
probability of each class, and the target class will have a high probability.
Swish is a relatively new, self-gated activation function, f(x) = x · sigmoid(x), discovered by
researchers at Google. According to their paper, it performs better than ReLU with a similar level
of computational efficiency.
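Softmax and Swish can be sketched in the same style. This assumes the basic form of Swish, x * sigmoid(x); the input scores are arbitrary example values. Subtracting the largest score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities in (0, 1) that sum to one."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def swish(x):
    """Self-gated activation: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])          # [0.659, 0.242, 0.099]
print(swish(-1.0), swish(0.0), swish(2.0))
```

The class with the largest score receives the highest probability, and the probabilities always sum to one.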
The ReLU function is the most widely used activation function. It helps to mitigate the vanishing
gradient problem.
An autoencoder is an artificial neural network that can learn a representation for a set of data
without any supervision. The network learns automatically by copying its input to its output;
typically, the internal representation has smaller dimensions than the input vector. As a result,
autoencoders can learn efficient ways of representing the data. An autoencoder consists of two
parts: an encoder that maps the inputs to the internal representation, and a decoder that converts
the internal state to the outputs.
Dropout is a cheap regularization technique used to reduce overfitting in neural networks. We
randomly drop out a set of nodes at each training step. As a result, we create a different model for
each training case, and all of these models share weights. It's a form of model averaging.
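A minimal dropout sketch follows. It uses the "inverted dropout" variant, which rescales the surviving activations by 1 / (1 - p) during training so nothing needs to change at inference time; the text does not specify this scaling, so treat it as an assumption. The function name and values are illustrative.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p and rescale the survivors.

    Dividing by (1 - p) keeps the expected activation unchanged, so the layer
    is used as-is (no units dropped) at inference time.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dropout(h, p=0.5, rng=rng))                   # a random subset of units is zeroed
print(dropout(h, p=0.5, rng=rng, training=False))   # unchanged at inference
```

Each call during training zeroes a different random subset, which is what creates a different "thinned" model for each training case while all of them share the same weights.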
Tensors are the de facto standard for representing data in deep learning. They are just
multidimensional arrays, which allow us to represent data with higher dimensions. In general, we
deal with high-dimensional data sets, where the dimensions refer to the different features present
in the data set.
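Tensors of increasing rank can be illustrated with plain nested lists. Real frameworks provide dedicated tensor types (for example `torch.Tensor` or `tf.Tensor`); here the hypothetical `shape` helper simply reads the dimensions of a regular nested list, and the example tensors are made up.

```python
def shape(tensor):
    """Return the dimensions of a regular nested-list 'tensor'."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


scalar = 3.0                                     # rank 0
vector = [1.0, 2.0, 3.0]                         # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # rank 2
# Rank 3, e.g. a batch of 2 grayscale images of 3 x 4 pixels.
batch = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]

for t in (scalar, vector, matrix, batch):
    print(shape(t))   # (), (3,), (3, 2), (2, 3, 4)
```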
A Boltzmann machine (also known as a stochastic Hopfield network with hidden units) is a type of
recurrent neural network. In a Boltzmann machine, nodes make binary decisions with some bias.
Boltzmann machines can be strung together to create more sophisticated systems such as deep
belief networks. Boltzmann machines can be used to optimize the solution to a problem.
It consists of stochastic neurons, each of which can be in one of two possible states, either 1 or 0.
The neurons are either in an adaptive state (free state) or a clamped state (frozen state).
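The stochastic binary neurons can be sketched with a Gibbs-style update: a free unit switches on with probability sigmoid(bias + weighted sum of the other units' states), while a clamped unit keeps its state. This is a minimal sketch assuming a small, hypothetical 3-unit machine with symmetric weights and a zero diagonal; the names and numbers are illustrative, and no training is shown.

```python
import math
import random


def gibbs_update(states, weights, biases, i, rng, clamped=()):
    """Resample binary unit i; it turns on (1) with probability sigmoid(net input).

    Units listed in `clamped` are frozen and keep their current state.
    """
    if i in clamped:
        return states
    net = biases[i] + sum(weights[i][j] * states[j] for j in range(len(states)) if j != i)
    p_on = 1.0 / (1.0 + math.exp(-net))
    new_states = list(states)
    new_states[i] = 1 if rng.random() < p_on else 0
    return new_states


rng = random.Random(0)
W = [[0.0, 2.0, -1.0],
     [2.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]   # symmetric weights, zero diagonal
b = [0.0, -0.5, 0.2]
s = [1, 0, 1]
for sweep in range(5):
    for i in range(3):
        s = gibbs_update(s, W, b, i, rng, clamped=(0,))   # unit 0 is clamped
print(s)   # unit 0 stays 1; the free units are each 0 or 1
```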
The capacity of a deep learning neural network controls the scope of the types of mapping
functions that it can learn. A model with sufficient capacity can approximate any given function.
Higher model capacity means that a larger amount of information can be stored in the network.
A cost function tells us how well the neural network is performing with respect to the given
training samples and the expected output. It may depend on variables such as weights and biases. It
measures the performance of the neural network as a whole. In deep learning, our priority is to
minimize the cost function, which is why we use gradient descent.
Gradient descent is an optimization algorithm used to minimize a function by repeatedly moving in
the direction of steepest descent, as specified by the negative of the gradient. It is an iterative
algorithm: in every iteration, we compute the gradient of the cost function with respect to each
parameter and update the parameter via the following formula:
$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$
Where,
θ_j - the j-th parameter,
J(θ) - cost function,
α - learning rate.
In machine learning, it is used to update the parameters of our model. Parameters represent the
coefficients in linear regression and weights in neural networks.
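Gradient descent on a cost function can be illustrated with the linear regression case mentioned above. This sketch assumes mean squared error as the cost, a learning rate `alpha=0.05`, and a fixed number of steps; the data is generated from y = 2x + 1, so the fitted coefficients should approach w = 2 and b = 1.

```python
def mse(w, b, xs, ys):
    """Cost function: mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to each parameter.
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Update rule: parameter := parameter - alpha * gradient.
        w -= alpha * dw
        b -= alpha * db
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated by y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

If `alpha` is too large (here, above roughly 0.15), the updates overshoot and the cost grows instead of shrinking, which is why the learning rate matters.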
Element-wise matrix multiplication takes two matrices of the same dimensions and produces another
matrix of the same dimensions, whose elements are the products of the corresponding elements of
matrices a and b.
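Element-wise (Hadamard) multiplication is easy to contrast with ordinary matrix multiplication by writing it out. A minimal sketch with a hypothetical `hadamard` helper and example matrices.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two matrices with the same dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(hadamard(a, b))   # [[5, 12], [21, 32]]
```

For comparison, the ordinary matrix product of the same a and b would be [[19, 22], [43, 50]].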
A convolutional neural network, often called a CNN, is a feedforward neural network that uses
convolution in at least one of its layers. The convolutional layer contains a set of filters
(kernels). Each filter slides across the entire input image, computing the dot product between the
weights of the filter and the input image. As a result of training, the network automatically learns
filters that can detect specific features.
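The sliding dot product can be sketched as a "valid" 2-D convolution with stride 1. Like most deep learning libraries, this is technically cross-correlation (the kernel is not flipped). The image and the hand-written vertical-edge filter are illustrative; in a CNN the filter values would be learned rather than chosen.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Dot product between the filter weights and the image patch under them.
            out[r][c] = sum(kernel[i][j] * image[r + i][c + j]
                            for i in range(kh) for j in range(kw))
    return out


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[-1, 1],
                 [-1, 1]]   # responds where intensity increases to the right
print(conv2d(image, vertical_edge))   # [[0, 2, 0], [0, 2, 0]]
```

The output is large only in the column where the dark-to-bright edge sits, which is what "a filter that detects a specific feature" means.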
There are four layer concepts that we should understand in a CNN (Convolutional Neural Network):
Convolution
This layer comprises a set of independent filters. All these filters are initialized randomly.
These filters then become parameters that the network learns subsequently.
ReLU
The ReLU layer is used with the convolutional layer and applies the ReLU activation function to
its output.
Pooling
It reduces the spatial size of the representation to lower the number of parameters and the
computation in the network. This layer operates on each feature map independently.
Full Connectedness
Neurons in a fully connected layer have full connections to all activations in the previous layer,
as in regular neural networks. Their activations can be easily computed with a matrix
multiplication followed by a bias offset.
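The ReLU, pooling, and fully connected layers can be chained on a small feature map. This is a minimal sketch under stated assumptions: 2 x 2 non-overlapping max pooling is a common choice but not the only one, and the feature map, weights, and bias are made-up values.

```python
def relu_map(fmap):
    """ReLU layer: apply max(0, x) to every value of a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Pooling layer: keep the maximum of each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def fully_connected(inputs, weights, biases):
    """Fully connected layer: a matrix multiplication followed by a bias offset."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


fmap = [[1.0, -2.0, 3.0, 0.5],
        [-1.0, 4.0, -3.0, 2.0],
        [0.0, 1.0, 2.0, -4.0],
        [5.0, -1.0, 1.0, 1.0]]
pooled = max_pool2d(relu_map(fmap))          # [[4.0, 3.0], [5.0, 2.0]]
flat = [v for row in pooled for v in row]    # flatten before the dense layer
out = fully_connected(flat, weights=[[0.25, 0.25, 0.25, 0.25]], biases=[-1.0])
print(out)                                   # [2.5]
```

Pooling shrinks the 4 x 4 map to 2 x 2, so the fully connected layer needs only 4 weights per output instead of 16.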
RNN stands for Recurrent Neural Network. RNNs are artificial neural networks designed to recognize
patterns in sequences of data such as handwriting, text, the spoken word, genomes, and numerical
time series data. RNNs use the backpropagation algorithm for training. Because of their internal
memory, RNNs can remember important things about the input they received, which enables them to be
very precise in predicting what's coming next.
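The "internal memory" of an RNN is its hidden state, which is fed back in at every step. A minimal sketch, assuming a scalar Elman-style recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) with hand-picked weights; real RNNs use vectors and learned weight matrices.

```python
import math


def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    The hidden state h carries information from earlier steps to later ones.
    """
    h = h0
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)   # the first input still influences the later states
```

Even though the second and third inputs are zero, the later hidden states are non-zero because they remember the first input through the recurrent weight `w_h`.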
45) What are the issues faced while training in Recurrent Networks?
A Recurrent Neural Network uses the backpropagation algorithm for training, but it is applied at
every timestep. This is usually known as Backpropagation Through Time (BPTT). It leads to two main
issues:
Vanishing Gradient
When we perform backpropagation, the gradients tend to get smaller and smaller as we keep
moving backward in the network. As a result, the neurons in the earlier layers learn very
slowly compared with the neurons in the later layers. Earlier layers are more valuable because
they are responsible for learning and detecting simple patterns; they are the building blocks
of the network.
If they provide improper or inaccurate results, we cannot expect the next layers and the
complete network to perform well and provide accurate results. The training procedure
takes long, and the prediction accuracy of the model decreases.
Exploding Gradient
Exploding gradients are a problem in which large error gradients accumulate and result in
very large updates to the neural network model weights during training.
The gradient descent process works best when updates are small and controlled. When the
magnitudes of the gradients accumulate, an unstable network is likely to occur, which can cause
poor predictions or even a model that reports nothing useful.
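Both problems come from the same mechanism: backpropagation through time multiplies the gradient by one factor per time step. The sketch below makes a simplifying assumption that this per-step factor (recurrent weight times activation derivative) is the same constant at every step, which is enough to show the geometric shrinking or growth.

```python
def gradient_through_time(factor, steps):
    """Multiply a unit gradient by the same per-step factor, once per time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad


print(gradient_through_time(0.5, 50))   # about 8.9e-16: the gradient vanishes
print(gradient_through_time(1.0, 50))   # 1.0: the gradient is preserved
print(gradient_through_time(1.5, 50))   # about 6.4e8: the gradient explodes
```

A factor only slightly below or above 1 is enough: over 50 steps, 0.5 shrinks the gradient by about 15 orders of magnitude, while 1.5 grows it by about 8.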
LSTM stands for Long Short-Term Memory. It is an artificial RNN (Recurrent Neural Network)
architecture used in the field of deep learning. LSTM has feedback connections, which make it a
"general purpose computer." It can process not only single data points but also entire sequences
of data.
LSTMs are a special kind of RNN capable of learning long-term dependencies.
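How an LSTM keeps long-term information can be seen in one cell step. This sketch uses the standard forget, input, and output gate equations in scalar form; the parameter dictionary and its hand-picked values are hypothetical (a real LSTM learns them). The forget gate is set close to 1, so a single early input is still remembered 20 steps later.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    p maps each gate name ('f', 'i', 'o', 'c') to hypothetical weights (w_x, w_h, b).
    """
    def pre(g):
        w_x, w_h, b = p[g]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))            # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))            # input gate: how much new content to write
    o = sigmoid(pre("o"))            # output gate: how much memory to expose
    c_tilde = math.tanh(pre("c"))    # candidate memory content
    c = f * c_prev + i * c_tilde     # cell state: the long-term memory
    h = o * math.tanh(c)             # hidden state / output
    return h, c


params = {"f": (0.0, 0.0, 5.0),    # forget gate near 1: keep the cell state
          "i": (5.0, 0.0, -2.5),   # write mainly when the input is large
          "o": (0.0, 0.0, 5.0),
          "c": (1.0, 0.0, 0.0)}
h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 20:       # one pulse, then 20 empty steps
    h, c = lstm_step(x, h, c, params)
print(c)   # about 0.6: the cell state still remembers the early pulse
```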
Encoder
The encoder is used to compress the input into a latent space representation. It encodes the
input image as a compressed representation in a reduced dimension. The compressed
image is a distorted version of the original image.
Code
The code layer is used to represent the compressed input which is fed to the decoder.
Decoder
The decoder layer decodes the encoded image back to its original dimension. The decoded
image is a lossy reconstruction of the original image, and it is automatically reconstructed
from the latent space representation.
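The encoder, code, and decoder can be sketched as a tiny linear autoencoder trained with gradient descent. This is a deliberate simplification: real autoencoders use several nonlinear layers, whereas here the encoder maps a 2-D input to a 1-D code and the decoder maps it back. The toy data lies on a single line, so a 1-D code is enough; all names, the learning rate, and the starting weights are illustrative assumptions.

```python
def encode(x, enc):
    """Encoder: compress a 2-D input into a 1-D code (the latent representation)."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(z, dec):
    """Decoder: reconstruct a 2-D output from the 1-D code."""
    return [dec[0] * z, dec[1] * z]


def reconstruction_loss(data, enc, dec):
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, enc), dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(data, enc, dec, lr=0.05, steps=2000):
    """Plain gradient descent on the mean squared reconstruction error."""
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(x, enc)
            r = [xh - xi for xh, xi in zip(decode(z, dec), x)]   # reconstruction residual
            dz = 2 * (r[0] * dec[0] + r[1] * dec[1])             # d(loss) / d(code)
            for k in range(2):
                g_dec[k] += 2 * r[k] * z / n
                g_enc[k] += dz * x[k] / n
        enc = [e - lr * g for e, g in zip(enc, g_enc)]
        dec = [d - lr * g for d, g in zip(dec, g_dec)]
    return enc, dec


# Toy 2-D data lying on one line, so a 1-D code can describe it fully.
data = [[0.6 * t, 0.8 * t] for t in (-2.0, -1.0, 1.0, 2.0)]
enc, dec = [0.5, 0.5], [0.5, 0.5]
print(reconstruction_loss(data, enc, dec))   # before training (about 0.66)
enc, dec = train(data, enc, dec)
print(reconstruction_loss(data, enc, dec))   # after training: close to 0
```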
A Deep Autoencoder is an extension of the simple autoencoder. The first layer in a Deep
Autoencoder is responsible for first-order features in the raw input. The second layer is
responsible for second-order features corresponding to patterns in the appearance of first-order
features. Deeper layers in the Deep Autoencoder tend to learn even higher-order features.
A deep autoencoder typically has two symmetrical halves: one set of four or five layers makes up
the encoding half, and the other set of four or five layers makes up the decoding half.
49) What are the three steps to developing the necessary assumption structure in Deep learning?
The first step is algorithm development, which is a lengthy process.
The second step is analyzing the algorithm, which represents the in-process methodology.
The third step is implementing the general algorithm in the final procedure. The entire
framework is interlinked and required throughout the process.
A perceptron is a neural network unit (an artificial neuron) that performs certain computations to
detect features. It is an algorithm for supervised learning of binary classifiers. It enables
neurons to learn, and it processes elements in the training set one at a time.
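The perceptron's one-example-at-a-time learning can be sketched with the classic perceptron update rule on the logical AND function, a linearly separable binary classification task. The learning rate, epoch count, and helper names are illustrative assumptions.

```python
def predict(x, w, b):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, lr=1.0, epochs=10):
    """Visit one training example at a time; adjust the weights only on mistakes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(x, w, b)   # -1, 0 or +1
            w = [w[0] + lr * error * x[0], w[1] + lr * error * x[1]]
            b += lr * error
    return w, b


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print([predict(x, w, b) for x, _ in AND])   # [0, 0, 0, 1]
```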
Single-Layer Perceptron
Single-layer perceptrons can learn only linearly separable patterns.
Multilayer Perceptrons
Multilayer perceptrons, or feedforward neural networks with two or more layers, have greater
processing power.
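XOR is the classic pattern that is not linearly separable, so no single-layer perceptron can represent it, while a two-layer perceptron can. In the sketch below, the weights are hand-picked for illustration rather than learned: one hidden unit acts like OR, the other like AND, and the output fires for "OR and not AND".

```python
def step(z):
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """Two-layer perceptron with hand-picked weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)          # hidden unit 1 behaves like OR
    h_and = step(x1 + x2 - 1.5)         # hidden unit 2 behaves like AND
    return step(h_or - h_and - 0.5)     # output: OR and not AND


print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 1, 0]
```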