Image Recognition With Deep Learning

ICIIBMS 2018, Track 2: Artificial Intelligent, Robotics, and Human-Computer Interaction, Bangkok, Thailand

Md Tohidul Islam, B.M. Nafiz Karim Siddique, Sagidur Rahman, Taskeed Jabid

Department of Computer Science and Engineering
East West University
Dhaka, Bangladesh

Abstract—Image recognition is one of the most important fields of image processing and computer vision. Food image classification is a unique branch of the image recognition problem. Nowadays people are more conscious about their health, and a system that can classify food from images is necessary for a dietary assessment system. Classification of food images is very challenging since the dataset of food images is highly non-linear. In this paper we propose a method that classifies food categories from images. We used a convolutional neural network (CNN) to classify food images. CNNs are a class of neural networks that is highly effective at image classification, object detection and other computer vision problems. We classified a food dataset consisting of different food categories with 16643 images. We obtained an accuracy of 92.86% in our experiment.

Keywords—Deep Learning, CNN, Computer Vision, Image Processing

I. INTRODUCTION
In the present day obesity and other health problems are rising day by day. According to [1], obesity has doubled since 1980 in more than 70 countries. Obesity may lead to other types of chronic disease such as heart disease, diabetes, arthritis, etc. More importance should be given to the nutritional value of food to prevent these diseases. Dietary management is the key to regularizing the food habits of people. It will help people with dietary management if they know the information about the food they are eating. To get information about the food, a system is required that detects the food from an image and then analyzes the dietary and calorie information. In this paper the researchers are mainly concerned with the detection of food from images.

Image processing and computer vision techniques are now being applied in many domains. One of the domains is food recognition from images. Food recognition is a challenging task since there is large similarity between food classes. Besides, image recognition needs more computation power than most text-based data classification. But to benefit from a food recognition model, people should be able to use it on a less expensive device. In the present day there are affordable smartphones with high computation power that can process high quality image data. So the model described in this paper can be implemented in such devices.

Food classification from images is a challenging task because foods of the same class may differ considerably in their images. In [2] the authors implemented a k-nearest neighbor algorithm and a vocabulary tree algorithm to classify 42 food categories with 1453 images. To measure the distance they chose the L1-norm for the Stochastic Coordinate Descent (SCD), EFD and GFD features and the Euclidean distance (L2-norm) for the Dual Coordinate Descent (DCD) feature. They got 84.2% top-4 accuracy and 64.5% top-1 accuracy with combinations of DCD, MDSIFT, SCD and SIFT features.

In [3] the authors proposed a method that applies SIFT and LBP (Local Binary Pattern) features in an SVM classifier with the PFI dataset. The SIFT feature is used to detect and describe local features in images. LBP is a type of visual descriptor and was used because it has several advantages: it is simple to compute and immune to illumination changes.

In [4] the authors proposed a method that classifies food images with a sphere-shaped support vector machine. The support vector machine can efficiently perform non-linear classification with the kernel trick. They applied this method to the FoodLog dataset, which consists of 6512 images. They used the FCM algorithm to segment the food images. FCM is similar to the k-means clustering algorithm. In FCM, each data point is first assigned random coefficients for its membership in the clusters; then the centroid of each cluster is computed and the membership coefficients of each data point are recomputed, and until convergence these steps are
repeated. After applying FCM to segment the food images, they used a sphere-shaped support vector machine (SVM) to classify the segmented images. In this experiment they got an accuracy of 95%.

The authors of [5] used a random forests classifier to classify the Food-101 dataset. Food-101 is a large dataset with 101 categories. The researchers got 50.76% accuracy with the RFDC approach.

In the recent literature there are also several approaches that use deep convolutional neural networks to classify food images. Since convolutional neural networks scale to large datasets, CNNs are more suitable for food image classification. Deep learning was used in [6] to classify UEC-256 food images for a dietary assessment system using a digital device. They obtained a top-1 accuracy of 54.7% and a top-5 accuracy of 81.5%.

A pre-trained DNN was applied in [7]. The deep CNN was pre-trained on ImageNet with 1000 food-related categories and then fine-tuned to classify the UEC-FOOD100 dataset. They achieved 78.77% top-1 accuracy.

In [8] the authors used GoogLeNet to classify Thai fast food images in the TFF food dataset. They achieved 88.33% accuracy for 11 classes. In [9] the authors implemented and compared several convolutional neural network models with the Food-11 dataset. They got 70.12% with their proposed approach, 80.51% with Caffenet and 82.07% accuracy with Alexnet.

In our paper we implemented deep learning with a convolutional neural network to classify the Food-11 dataset and we got an accuracy of 92.86%.

We used a convolutional neural network in our approach to classify food images. The convolutional neural network is a category of neural network that has proven very efficient in image classification. It learns the filters that in traditional algorithms were hand-engineered. In our method we used an Inception V3 [10] model pre-trained with ImageNet [11]. Our method of classifying food images consists of four steps:

• Select a food image dataset
• Image pre-processing
• Train on the dataset using a deep learning algorithm
• Classification of food images

Figure 1 depicts the methodology of our objective.

Fig. 1. Methodology of food image classification

A. Input Image
We used the Food-11 dataset for our research. The dataset was created by the authors who proposed [7]. The dataset consists of 16643 images grouped into 11 major food categories. The 11 food categories are Bread, Dairy products, Dessert, Egg, Fried food, Meat, Pasta, Rice, Seafood, Soup, and Vegetables. The food images are divided into three parts: a training set with 9866 images, a validation set with 3430 images and an evaluation set with 3347 images.

B. Image Pre-processing
We applied some image pre-processing techniques to increase the efficiency of our system by speeding up training. Pre-processing such as random rotation and horizontal flips helps convolutional neural network models to be insensitive to the exact position of the object in the images. ZCA whitening reduces the redundancy in the matrix of pixel images and highlights the structures and features of the images to the convolutional neural network. First we resized all our images to 299 x 299 x 3 to reduce processing time and also to fit Inception V3. After that we applied the following image pre-processing techniques.

• We set the input mean to 0 over the dataset.
• We also set each sample mean to 0.
• Then we divided inputs by the standard deviation of the dataset.
• We also ensured that each input is divided by its standard deviation.
• We applied ZCA whitening.
• We randomly rotated images in the range from 0 to 180 degrees to make our learning model invariant to the object location in the image.
• Randomly shift images horizontally.
• Randomly shift images vertically.
• Randomly flip images.

C. Deep Learning
Our research utilizes the Inception V3 [10] model. The architecture of the model is given in figure 3.

Fig. 2. Typical CNN Architecture.

Fig. 3. Inception V3 Architecture [10].

The layers of the Inception V3 model are:
• Convolution Layer: At the beginning there is a convolution layer with input size 299 x 299 x 3 that creates feature maps by convolving the input images.
• Max Pooling Layer: Max-pooling is a sample-based discretization process. Max pooling is done by applying a max filter to non-overlapping sub-regions of the input matrices. Max-pooling extracts the most important features like vertical edges and horizontal edges [12].
• Average Pooling Layer: The average pooling layer reduces the variance and complexity in the data. It divides the input into rectangular pooling regions and computes the average value of each region to downsample the input features [10].
• Concat Layer: The concat layer concatenates its multiple input blobs into one single output blob.
• Dropout Layer: The dropout layer randomly drops elements from a layer in the neural network. Dropout is a technique used to reduce overfitting in neural networks [10].
• Fully Connected Layer: The fully connected (FC) layer in the CNN represents the feature vector for the input. This feature vector holds information that is vital to the input [10].
• Softmax Layer: The softmax assigns decimal probabilities to each class in a multi-class recognition problem. Those decimal probabilities must add up to 1.0. This additional constraint helps training converge more quickly [10].

D. Classification
We trained the model on our dataset using the SGD [13] optimizer with an initial learning rate of 0.01 and a momentum of 0.9. We used a learning rate scheduler to set the learning rate to 0.002 after 15 epochs and 0.0004 after 28 epochs.


This section describes the results found in our experiments.

A. Evaluation of model
Our dataset was divided into three parts: training, validation and evaluation. We used the training and validation parts of the dataset while training the model and the evaluation part during the evaluation of our model. We resized the images of the evaluation part to 299 x 299 x 3. We evaluated the accuracy of the model from the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) after classification.

B. Obtained Results
The results obtained by running different models with the Food-11 dataset are given in Table 1.

We applied the models described in [9]. Fine-tuned Alexnet [9] reaches 82.23% and Caffenet [9] reaches 80.12% accuracy with our dataset. The proposed approach, which uses the transfer learning technique with the Inception V3 convolutional neural network, has an accuracy of 92.86%. Transfer learning reuses the knowledge gained from previous learning on a new training dataset to classify images, which is why our proposed approach has a better accuracy.

Fig. 4. Plot of model accuracy on training and validation datasets.

Fig. 5. Plot of model loss on training and validation datasets.

Though the accuracy is good, from the accuracy curve in figure 4 we see that there is some gap between training and validation accuracy. This indicates that there was some overfitting, but the gap gradually got smaller after some epochs. From the loss curve in figure 5 we see that the loss decreased from 1.2 to 0.3.
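
Two of the pre-processing steps listed in Section B (setting each sample mean to 0 and dividing each input by its standard deviation, plus a horizontal flip) can be sketched in plain Python. This is only an illustration under stated assumptions, not the pipeline used in the experiments: the function names are hypothetical, a small grayscale grid stands in for a 299 x 299 x 3 image, and the population standard deviation is assumed.

```python
import math
from typing import List

# Toy stand-in for an image: rows of grayscale pixel values (assumption).
Image = List[List[float]]


def standardize_sample(image: Image) -> Image:
    """Shift one image to mean 0 and scale it to unit standard deviation."""
    pixels = [value for row in image for value in row]
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    mean = sum(pixels) / len(pixels)
    variance = sum((value - mean) ** 2 for value in pixels) / len(pixels)
    # A constant image has zero deviation; fall back to 1.0 to avoid dividing by zero.
    std = math.sqrt(variance) or 1.0
    return [[(value - mean) / std for value in row] for row in image]


def flip_horizontal(image: Image) -> Image:
    """Mirror every row, which flips the image left to right."""
    return [list(reversed(row)) for row in image]


if __name__ == "__main__":
    toy_image = [[0.0, 2.0], [4.0, 6.0]]  # made-up pixel values
    print(standardize_sample(toy_image))
    print(flip_horizontal(toy_image))
```

In a real training loop the flip would be applied at random to each batch, and the dataset-level statistics from the first and third bullets would be computed once over the training set.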

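The learning rate schedule from Section D can be written as a small step function. The sketch below is a minimal illustration rather than the training code behind the reported results: the function name is hypothetical, and it assumes epochs are counted from 0, so that "after 15 epochs" means epoch index 15 onward.

```python
def learning_rate_for_epoch(epoch: int) -> float:
    """Return the SGD learning rate for a 0-based epoch index.

    Values follow Section D: 0.01 initially, 0.002 after 15 epochs and
    0.0004 after 28 epochs. The 0-based boundaries are an assumption.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < 15:
        return 0.01
    if epoch < 28:
        return 0.002
    return 0.0004


if __name__ == "__main__":
    for epoch in (0, 14, 15, 27, 28):
        print(epoch, learning_rate_for_epoch(epoch))
```

Each step divides the rate by 5. A function of this shape could be handed to whatever scheduler hook the training framework provides.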

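The evaluation in Section A is described in terms of TP, TN, FP and FN. One common way to obtain these counts in a multi-class setting is to treat a single class as positive and all others as negative; accuracy is then (TP + TN) / (TP + TN + FP + FN). The sketch below assumes that one-vs-rest reading, which the paper does not spell out. The function names and the toy labels are hypothetical and are not results from the experiments.

```python
from typing import Dict, Sequence


def one_vs_rest_counts(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, int]:
    """Count TP, TN, FP and FN with one class treated as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (counts["TP"] + counts["TN"]) / total


if __name__ == "__main__":
    # Made-up labels drawn from the Food-11 category names.
    y_true = ["Rice", "Soup", "Rice", "Egg", "Soup", "Rice"]
    y_pred = ["Rice", "Soup", "Egg", "Egg", "Rice", "Rice"]
    counts = one_vs_rest_counts(y_true, y_pred, positive="Rice")
    print(counts, accuracy(counts))
```

Repeating the count for each of the 11 categories gives a per-class view of where the classifier confuses similar foods.
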
Table 1.
Model               Accuracy
CNN1 [9]            82.23%
CNN2 [9]            80.12%
Proposed Approach   92.86%

In this paper we used a convolutional neural network to classify food images. Convolutional neural networks use a variation of multilayer perceptrons and require minimal pre-processing. From our experiment we see that the convolutional neural network works very well in the food classification task. However, there can still be some improvement to reduce the gap between training and test accuracy in our model. Other neural network models such as recurrent neural networks, dilated convolutional neural networks, etc. can be applied to classify food images. Convolutional neural network models take time for computation, but once the model is trained it can be easily used for classification. We can also improve our research by applying feature-based models, which will take less computational time. We can also apply different machine learning algorithms with larger datasets with more categories.

Vision – ECCV 2014 (Cham) (David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, eds.), Springer International Publishing, 2014, pp.
[6] Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, and Yunsheng Ma, Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment, CoRR abs/1606.05675 (2016).
[7] K. Yanai and Y. Kawano, "Food image recognition using deep convolutional network with pre-training and fine-tuning," 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Turin, 2015, pp. 1-6.
[8] N. Hnoohom and S. Yuenyong, "Thai fast food image classification using deep learning," 2018 International ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI-NCON), Chiang Rai, 2018, pp.
[9] G. Özsert Yiğit and B. M. Özyildirim, "Comparison of convolutional
