
2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN)

Diving deep in Deep Convolutional Neural Network



Divya Arora, Mehak Garg, Megha Gupta
Computer Science Department, Guru Gobind Singh Indraprastha University, Delhi, India

Abstract—Artificial Neural Networks have proved most efficient in Deep Learning, mainly because of the large number of datasets they can handle. The most widely used is the Convolutional Neural Network (CNN). It has proved useful for computer vision, pattern recognition and Natural Language Processing (NLP). CNN is so widely used because, unlike traditional Neural Nets, it reduces the number of parameters and focuses more on domain-specific features. Various CNN architectures have been proposed, such as LeNet, AlexNet and GoogleNet. In this paper, we discuss the structure of CNN and the CNN models proposed to date.
Keywords—Artificial Neural Networks, Convolutional Neural Network, CNN models, deep learning
I. INTRODUCTION

The Deep Convolutional Neural Network has achieved state-of-the-art results in the areas of computer vision and image recognition. CNN is so successful because its hidden layers are not fully connected to the previous layers [1] and perform multiple successive computations alternating between convolution and pooling (subsampling) layers. Unlike other deep learning neural networks, CNN is easy to train through backpropagation because it has very sparse connectivity at each layer. A linear filter is used for the convolution [2]. The Convolutional Neural Network gets its name from the mathematical operation convolution, which means rolling together (combining) two or more functions; the result is then passed through a nonlinearity such as sigmoid, leaky ReLU or tanh.
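To make the linear filtering concrete, the following sketch (ours, not from the paper) slides a small kernel over a toy 2D input the way a convolution layer does; the input values and the edge-detecting kernel are made up for illustration, and, as in most deep learning libraries, the kernel is applied without flipping.

    import numpy as np

    def conv2d(image, kernel):
        """Slide a linear filter over a 2D input (valid mode, stride 1)."""
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                # Each output value is a weighted sum over one local patch.
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
    kernel = np.array([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])                 # vertical-edge filter
    print(conv2d(image, kernel))                       # 3x3 feature map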
The first such architecture was proposed by LeCun et al. in 1990. It was designed to recognize handwritten digits, but it was not very successful because of the limited training data and computing power available at the time. In 2012, AlexNet was proposed by Krizhevsky et al.; it substantially reduced the error rate and won ILSVRC 2012. Since then CNN has become the most used neural network, applied not only to object recognition but also to object tracking [3], pose estimation, text recognition and many more tasks.
II. STRUCTURE OF CONVOLUTIONAL NEURAL NETWORK

The Convolutional Neural Network has a multi-level hierarchical structure and involves both feed-forward and back-propagation passes. Feed-forward propagation means that the input moves forward through multiple layers, is operated on by the respective activation functions, and finally yields a result at the output layer. Reverse transmission means back propagation and is used to calculate the error between the result of the forward pass and the actual sample [4].
Because of the performance of CNN in image processing, its usage has increased exponentially. Earlier image retrieval systems were based on text: the images were manually analyzed and then indexed accordingly. As the image database grew per user, the task became burdensome and hard.

CNN is a feed-forward neural network; that is why it can extract topological properties from an image and, as a result, recognize extremely variable patterns. That is the reason why many researchers have used Deep CNNs for image processing [5].

A CNN is a collection of neurons arranged in an acyclic graph, with each hidden-layer neuron connected to only a subset of the neurons (a few, not all) of the previous layer. This arrangement encourages the network to learn implicitly.

The Convolutional Neural Net, or ConvNet, is a class of feed-forward deep artificial neural networks applied to the analysis of visual imagery. CNN was inspired by the biological neurons found in the animal visual cortex.

Just like other Artificial Neural Networks, it has neurons with learnable weights and biases. Although not every neuron in a layer is connected to every neuron in the previous layer, each neuron receives many inputs, takes their weighted sum, passes it through an activation function, and produces an output.

CNN has shown the most efficient responses in image and video processing and recognition, image classification, medical image analysis and natural language processing.

A Convolutional Neural Net has four basic layers, namely the Convolution Layer, Pooling Layer, Fully Connected Layer and Loss Layer [6]. A detailed description of each layer follows.

A. Convolution Layer

This is the basic layer of a ConvNet and involves all the computational work. Its learnable parameters are the filters, or kernels. The layer gets its name from the mathematical term convolve, which means combining more than one function; here, too, more than one function is combined. The need for convolution is as follows: suppose an image is 32X32 pixels and we need to pass it to the next layer of 3 neurons; then we have to make 32 X 32 X 3 connections, and if we add 2 more neurons the connections become even more complex. The input can be even larger than this. So rather than making so many connections, if we focus each neuron on only a local region of the image, the number of connections becomes much smaller [7].
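The connection counts in this example can be checked directly; the 5x5 receptive-field size below is an assumption added for illustration, since the paper does not fix a patch size.

    pixels = 32 * 32
    neurons = 3
    full_connections = pixels * neurons        # 32 x 32 x 3 = 3072
    # With local connectivity each neuron sees only a 5x5 patch (assumed size).
    local_connections = 5 * 5 * neurons        # 75
    print(full_connections, local_connections) # 3072 vs 75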
Fig. 1. Obtaining a local region (a 32X32 input connected to a hidden layer through local connections).


B. Pooling Layer

This layer provides the pooling functions, such as max pooling, stochastic pooling, spatial pyramid pooling and spectral pooling.

The basic architecture alternates convolution layers and pooling layers, followed by the later layers, so as to “reduce the spatial dimensions of the activation map”. How many convolution and pooling layers the network has depends on the architecture; for example, AlexNet has all its convolution layers stacked together instead of alternating convolution and pooling layers.
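As an illustration of this dimension reduction, here is 2x2 max pooling (the first variant named above) applied to a toy activation map; the values are made up.

    import numpy as np

    def max_pool_2x2(x):
        """Halve each spatial dimension, keeping the max of every 2x2 block."""
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    act = np.array([[1., 3., 2., 0.],
                    [4., 2., 1., 1.],
                    [0., 1., 5., 6.],
                    [2., 2., 7., 8.]])
    print(max_pool_2x2(act))   # [[4. 2.] [2. 8.]]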

C. Fully Connected Layer

In this layer the neurons are fully connected to the neurons of the previous layer. Since the neurons are not arranged spatially and every neuron is connected to all the previous neurons, a convolution layer cannot come after an FC layer.
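A minimal sketch of this layer: the spatial feature map is flattened away and every input feeds every neuron (all shapes and values below are illustrative).

    import numpy as np

    rng = np.random.default_rng(1)
    feature_map = rng.normal(size=(2, 2, 8))   # toy output of a pooling layer
    x = feature_map.reshape(-1)                # flatten: no spatial arrangement
    W = rng.normal(size=(10, x.size))          # full connectivity
    b = np.zeros(10)
    y = W @ x + b                              # fully connected forward pass
    print(y.shape)                             # (10,)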

D. Loss Layer

It calculates the overall loss, the error between the actual and the desired output.
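The paper does not fix a particular loss, so as an assumed example the sketch below computes two common choices, mean squared error and cross-entropy, on made-up outputs.

    import numpy as np

    y = np.array([0.7, 0.2, 0.1])   # network output (class probabilities)
    t = np.array([1.0, 0.0, 0.0])   # desired output (one-hot target)

    mse = np.mean((y - t) ** 2)             # mean squared error
    cross_entropy = -np.sum(t * np.log(y))  # cross-entropy for one-hot target
    print(mse, cross_entropy)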

Fig. 2. Layers of CNN [6]

E. Activation Functions

Sigmoid: a logistic function which squashes its input to a number between 0 and 1 and thus returns the probability of the output being positive.

f(x) = 1 / (1 + e^(-x))

Tanh: tanh squashes the output between -1 and +1; in terms of the sigmoid f above,

tanh(x) = 2f(2x) - 1

ReLU: the ReLU function is the most used basic activation for neural nets. It returns 0 if its input is negative, and returns the input itself otherwise.

rect(x) = max(0, x)
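All three functions are one-liners in NumPy; the last line checks the tanh identity quoted above on a few sample points (this verification is ours, not the paper's).

    import numpy as np

    def f(x):        # sigmoid
        return 1.0 / (1.0 + np.exp(-x))

    def rect(x):     # ReLU
        return np.maximum(0.0, x)

    xs = np.linspace(-3.0, 3.0, 7)
    assert np.allclose(np.tanh(xs), 2 * f(2 * xs) - 1)   # tanh(x) = 2f(2x) - 1
    print(f(xs), rect(xs))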

III. DIFFERENT CNN ARCHITECTURES

After the failure of LeNet there was no work on CNN for a long time. In 2012, Alex Krizhevsky et al. proposed a new CNN model and won ILSVRC-2012. Later, in 2013, an improved model of AlexNet won ILSVRC-2013. After that, Google proposed GoogleNet.
A. LeNet

LeNet, introduced by LeCun et al., was the first deep convolutional neural network architecture. It was used primarily to recognize handwritten numbers and printed numbers on cheques, digitized as 32X32 grayscale images.

The architecture has received many new versions, in all of which the original LeNet design is retained; the most used is LeNet-5. It includes 7 layers in total: an input layer, convolution layers, pooling layers, a fully connected layer and an output layer [8].

Fig. 3. LeNet5 Architecture [9]

Dataset used: the MNIST handwritten image dataset [10].

Fig. 4. MNIST Dataset
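As a concrete rendering of the layer count above, here is a rough Keras sketch of the LeNet-5 layout following the classic description in [9]; treat it as illustrative rather than a faithful reproduction (tanh units and average-pooling-style subsampling are assumed here, as in the original).

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(32, 32, 1)),                       # 32X32 grayscale
        layers.Conv2D(6, kernel_size=5, activation="tanh"),    # C1: 6 maps
        layers.AveragePooling2D(pool_size=2),                  # S2: subsampling
        layers.Conv2D(16, kernel_size=5, activation="tanh"),   # C3: 16 maps
        layers.AveragePooling2D(pool_size=2),                  # S4
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),                  # C5
        layers.Dense(84, activation="tanh"),                   # F6
        layers.Dense(10, activation="softmax"),                # 10 digit classes
    ])
    model.summary()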
B. AlexNet

AlexNet was built by Alex Krizhevsky, Geoffrey Hinton and Ilya Sutskever. It was originally written in CUDA and was created to perform better than the previous ConvNets and to handle even larger datasets. It is deeper, containing 8 layers, of which 5 are convolution layers and the remaining 3 are fully connected layers [11]. The architecture was inspired by LeNet, but with the difference that all the convolution layers are stacked together [12].
Fig. 5. AlexNet layers (block diagram: conv1 with kernel size 11 producing 55*55*96 data through a ReLU, conv2 through conv5 with pooling, then fully connected layers 1 to 3).
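The 5-convolution, 3-fully-connected stack of Fig. 5 can be sketched in Keras as below; only conv1's numbers (11X11 kernels, 96 maps, 55X55 output) come from the figure, while the remaining filter counts follow common reproductions of [12] and should be read as assumptions.

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(227, 227, 3)),
        # conv1: 96 kernels of size 11, stride 4 -> 55x55x96 (as in Fig. 5)
        layers.Conv2D(96, kernel_size=11, strides=4, activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        # conv2-conv5 stacked together; filter counts assumed from [12]
        layers.Conv2D(256, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.Conv2D(384, kernel_size=3, padding="same", activation="relu"),
        layers.Conv2D(384, kernel_size=3, padding="same", activation="relu"),
        layers.Conv2D(256, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.Flatten(),
        # the three fully connected layers
        layers.Dense(4096, activation="relu"),
        layers.Dense(4096, activation="relu"),
        layers.Dense(1000, activation="softmax"),
    ])
    model.summary()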
C. GoogleNet

It was proposed by Google for ILSVRC 2014 and won the competition. The model uses far fewer parameters than AlexNet; its later versions adopted batch normalization, and it makes better utilization of computing resources [13].
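The parameter savings come from GoogleNet's inception modules, which run small parallel convolutions and concatenate the results instead of using one wide layer. Below is a minimal Keras sketch of one such module with a batch-normalization step as in [13]; the filter counts follow common descriptions of the architecture and are illustrative.

    from tensorflow.keras import Input, Model, layers

    x = Input(shape=(28, 28, 192))                  # illustrative input tensor
    # Parallel branches with small kernels keep the parameter count low.
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(x)  # bottleneck
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)
    out = layers.Concatenate()([b1, b2, b3, b4])    # 28x28x256 output
    out = layers.BatchNormalization()(out)          # normalization as in [13]
    Model(x, out).summary()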
IV. CONCLUSION

The conclusion of this paper is that the Convolutional Neural Network has proved to be an important tool in the field of Machine Learning. Open issues still remain, such as reducing the parameters of the fully connected layers and reducing the error rate when the network is exposed to dense data. It has been shown that adding more layers improves performance, but also increases the number of parameters with each layer.

V. ACKNOWLEDGEMENT

We would like to thank the staff of UIRC, GGSIPU, Delhi for providing the resources and support pivotal to the research work of this document. We would also like to extend our thanks to the authors of the research papers used as references for this work, for providing valuable insights into the field of Neural Nets and its implementations.

VI. REFERENCES

[1] Saad Albawi, Tareq Abed Mohammed, Saad Al-Zawi, “Understanding a Convolutional Neural Network”, ICET, 2017.
[2] Yanwei Pang, Manli Sun, Xiaoheng Jiang, Xuelong Li, “Convolution in Convolution for Network in Network”, IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, May 2018.
[3] J. Fan, W. Xu, Y. Wu, and Y. Gong, “Human tracking using convolutional neural networks”, IEEE Transactions on Neural Networks, vol. 21, 2010.
[4] Liu Hui, Song Yu-Jie, “Research on face recognition algorithm based on improved Convolution Neural Network”, IEEE Xplore, 2 June 2018.
[5] D. D. Pukale, S. G. Bhirud, V. D. Katkar, “Content based image retrieval using Deep Convolution Neural Network”, IEEE Xplore, 2017.
[6] Neena Aloysius, Geetha M, “A Review on Deep Convolution Neural Networks”, International Conference on Communication and Signal Processing, April 6-8, 2017.
[7] O. Russakovsky, J. Deng, H. Su, J. Krause et al., “ImageNet Large Scale Visual Recognition Challenge”, 11 April 2015.
[8] Shuai Tan, Zhi Tan, “Improved LeNet 5 Model Based on Handwritten Numerical Model”, The 31st Chinese Control and Decision Conference, 2019.
[9] Yann Lecun, Leon Bottou, Yoshua Bengio, Patrick Haffner, “Gradient Based Learning Applied to Document Recognition”, Proceedings of the IEEE, vol. 86, no. 11, 1998.
[10] Yann Lecun, Corinna Cortes, Christopher J. C. Burges, MNIST Handwritten Digits Dataset, [Accessed 1st April 2020], available from: http://yann.lecun.com/exdb/mnist/
[11] Xu Zhang, Wei Pan, Perry Xiao, “In-Vivo Skin Capacitive Image Using AlexNet Convolution Neural Network”, 3rd IEEE Conference on Image, Vision and Computing, 2018.
[12] Alex Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, in NIPS, 2012.
[13] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, in Proceedings of the 32nd International Conference on Machine Learning, 2015.
