10 1109@FSKD 2018 8687165


2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

Automatic Classification of Chinese Herbal Based on Deep Learning Method
Shupeng Liu 1, Weiyang Chen 1,*, Xiangjun Dong 1,*
1 College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China

Abstract: In today's society, people's living standards keep improving. At the same time, many dietary problems have appeared, which has led to an increase in the incidence of diseases. Chinese herbal medicine has been widely used in the treatment of many diseases, but collecting and classifying Chinese herbal medicines remains a problem. There is a wide variety of Chinese herbal medicine plants, and some of them look very similar. Even a taxonomist can hardly distinguish every herbal medicine, let alone a beginner. We therefore design a method that automatically identifies and classifies Chinese herbal medicines using image processing and deep learning, which can greatly reduce the workload and improve efficiency. Recognition based on image processing and deep learning can effectively overcome the main shortcoming of manual recognition, namely that it requires rich experience. Deep learning is increasingly popular, especially for image classification, so we use GoogLeNet to classify 50 kinds of Chinese herbal medicine from images taken under natural conditions with complex backgrounds. The method achieved good performance: TOP-1 accuracy of 62.8% and TOP-5 accuracy of 89.4%.

Key words: deep learning, image processing, Chinese herbal medicine

I. INTRODUCTION

Chinese herbal medicine is a gem of the 5000-year traditional culture of the Chinese nation. It is the crystallization of thousands of years of medical practice and part of the essence of the world's cultural heritage. With the continuous improvement of people's living standards and the emergence of new health concepts, the demand for Chinese medicine in the domestic market has grown rapidly and its competitiveness has strengthened. The development of Chinese medicine faces tremendous challenges and opportunities. At present, the collection and classification of Chinese herbal medicine still has many problems. Classification is a very important task, because a wrong classification may directly reduce efficacy. Therefore, the accuracy of the classification should be improved as much as possible to avoid errors in subsequent use.

A lot of work has already been done on plant classification, with quite good results. Pushpa BR et al. [1] proposed a computer-aided plant species identification (CAPSI) method based on leaf image matching, which extracts different biological characteristics such as diameter, long axis, short axis, area, perimeter, and leaf image aspect ratio. Sandeep Kumar E et al. [2] proposed a system that recognizes medicinal plants based on their edge characteristics. Color images are converted to grayscale, and an edge histogram is calculated from the grayscale image using the Canny edge detection algorithm. However, this method is limited to mature leaves, because tender leaves change as they mature. Subsequently, in 2013 Charles Mallah et al. [3] described a method that improves the recognition rate when the training set is small by extracting the shape, texture, and edge features of the leaf and then combining the features, using a K-nearest-neighbor classifier on each feature vector and density estimation to increase the recognition rate. Kue-Bum Lee et al. [4] proposed a method to extract leaf vein features from leaf contours. First, they transformed the color image to gray level and then converted it into a binary image to extract the contour of the leaf. To extract the veins, a morphological opening is performed on the gray image, and the difference between the opened image and the gray image gives the vein feature. Abdolvahab Ehsanirad et al. [5] proposed a method that extracts leaf texture features for classification in two different ways, PCA and the gray level co-occurrence matrix (GLCM), with an accuracy rate of 78%. Abdul Kadir et al. [6] proposed a plant recognition method that extracts the texture, shape, and color features of the leaves using morphological opening and classifies plants with a probabilistic neural network. Many others [7-11] use low-level features such as color, shape, and texture to classify plants. Zhang et al. [12] applied supervised locality projection analysis to classify plant leaves and obtained good classification results. Unger et al. [13] used a support vector machine (SVM) with Fourier descriptors and morphometric measures to identify species in two test sets, one with 26 species and the other with 17 species; with 10 images per species in each case, the accuracies were 73.21% and 84%, respectively. Luo Dehan et al. [14] also compared two algorithms, PCA and SVM, and found that the SVM can get

978-1-5386-8097-1/18/$31.00 ©2018 IEEE


better results. In 2016, Maolin Wang et al. [15] used the SOM algorithm to classify Chinese herbal medicine images. Although quite good results were achieved, both the number of images and the number of categories were relatively small.

In the above studies, the pictures have very clean backgrounds, and good results are easier to obtain when the number of categories and the total number of pictures are relatively small. This does not match the practical situation. Plants photographed in the field are usually affected by many factors, such as noise, lighting, or other plants in the background, all of which make classification more difficult. Moreover, classifying only a few Chinese herbal medicines no longer meets our needs: because there are many species, we need to classify more categories and larger quantities of images. Recently, deep learning has developed rapidly and achieved good results in various fields; in image classification in particular it has achieved state-of-the-art performance.

The concept of the neural network was proposed by W. S. McCulloch and W. Pitts, which started research on artificial neural networks. In 1958, F. Rosenblatt proposed the perceptron, the first use of the McCulloch-Pitts (MCP) model for machine learning (classification). In 1969, Minsky proved that the perceptron could only handle linear classification problems, after which research on neural networks stagnated. In 1986, Geoffrey Hinton and colleagues introduced the BP (backpropagation) algorithm for the multi-layer perceptron (MLP) and adopted the sigmoid for nonlinear mapping, effectively solving the problem of nonlinear classification and learning. It was later found that the BP algorithm suffers from the vanishing gradient problem, which directly hindered the further development of deep learning.

In 2011, the ReLU activation function was proposed, which can effectively suppress the vanishing gradient. In 2012, Hinton's research team demonstrated the potential of deep learning. Hinton and his student Alex Krizhevsky [16] participated in the ImageNet image recognition competition for the first time, and their CNN, AlexNet, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012. On the ImageNet data set, which contains on the order of millions of images, its results far exceeded traditional methods, raising accuracy from a little over 70% to more than 80%. Since then, deep learning has set off an upsurge, and ILSVRC records have repeatedly been broken by deep learning methods, with network models growing deeper and wider and results improving year after year. The winners of ILSVRC 2013 were Matthew Zeiler and Rob Fergus [17]. Their model, ZF Net, uses a deconvolutional network (Deconvnet) to visualize the convolutional network in order to understand and adjust it, and it reduced the error rate to 11.2%. In ILSVRC 2014, the VGG (Visual Geometry Group) model designed by K. Simonyan et al. [18] ranked second; it increased the depth of the network and reduced the error rate to 7.3%. At the same time, C. Szegedy et al. [19] made extensive use of the NIN (Network in Network) structure of M. Lin et al. [20] to build GoogLeNet. It uses a 22-layer model that increases both the depth and the width of the network, with multiple operations running in parallel, and it eventually reached a 6.7% error rate and won first place in the ImageNet Challenge. In 2015, Kaiming He and his team [21] proposed ResNet (Residual Network), a 152-layer architecture of unprecedented depth. ResNet also broke the record at ILSVRC 2015, reaching an error rate of 3.6% (human annotators usually reach an error rate of 5 to 10%).

Due to the excellent performance of deep learning in image processing, people began to use it to process images in different fields. In 2016, Xin Sun et al. [22] used the VGG-16 model to process a large number of Chinese herbal medicine images. Their data set has 95 categories and more than 6,000 pictures, far more data than previous research used, and the overall accuracy reached 71%. In 2017, Jose Carranza-Rojas et al. [23] used GoogLeNet on plant specimens and fine-tuned the original model; the number of pictures reached unprecedented levels, and multiple databases were used to build different data sets for the experiments.

II. ALGORITHM

Previous studies all have some problems. When the accuracy is high, the number of categories and the total number of pictures are relatively small, and the pictures all have clean backgrounds. Clean backgrounds rarely occur in practice, so such data do not meet our requirements. In addition, when the number of categories and the total number of pictures increase, the accuracy decreases. In this article, we design a method that classifies real-world pictures with GoogLeNet, chosen because it has fewer parameters.

First, we collect real-world images. The Plant Photo Bank of China (PPBC) includes 475 families, 4729 genera, and 3729195 images of various types. We selected a total of 50 species and 8500 Chinese herbal plant images. Most of them were taken in the natural state and have complicated backgrounds; only a few pictures have a relatively clean background. Figure 1 shows some of the pictures.

Fig. 1. Example of herbal medicine images. We list 5 medicine categories, whose corresponding names are given below.

In this paper, we use the GoogLeNet implementation in Caffe, which is built mainly from the Inception module [19]. The module applies three 1×1 convolutions and one pooling layer in parallel to the output of the previous layer. The outputs of two of the 1×1 convolutions are then passed to a 3×3 convolution and a 5×5 convolution respectively, and the output of the pooling layer is passed to a further 1×1 convolution. This widens the network. The four branch outputs are finally concatenated and passed to the next step. The Inception module improves performance without a large increase in the amount of computation, and it works very well in practice. In addition, to mitigate the vanishing gradient problem, the network adds 2 auxiliary softmax classifiers at intermediate layers, which strengthen the gradient signal that reaches the earlier layers.
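The effect of the 1×1 reduction convolutions can be made concrete with a little arithmetic. The sketch below is illustrative only: the branch widths are those listed for the inception (3a) block in the GoogLeNet paper [19], not values stated in this article, and the names `InceptionSpec`, `conv_weights`, and the helper functions are hypothetical. It counts convolution weights (biases ignored) with and without the 1×1 reductions in front of the 3×3 and 5×5 branches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InceptionSpec:
    """Channel widths of one Inception module (hypothetical helper type)."""
    in_channels: int
    n1x1: int         # 1x1 branch
    n3x3_reduce: int  # 1x1 reduction in front of the 3x3 branch
    n3x3: int
    n5x5_reduce: int  # 1x1 reduction in front of the 5x5 branch
    n5x5: int
    pool_proj: int    # 1x1 convolution after the pooling branch


def conv_weights(c_in: int, c_out: int, kernel: int) -> int:
    """Number of weights in a kernel x kernel convolution, biases ignored."""
    return c_in * c_out * kernel * kernel


def output_channels(s: InceptionSpec) -> int:
    """The four branch outputs are concatenated along the channel axis."""
    return s.n1x1 + s.n3x3 + s.n5x5 + s.pool_proj


def weights_with_reduction(s: InceptionSpec) -> int:
    return (conv_weights(s.in_channels, s.n1x1, 1)
            + conv_weights(s.in_channels, s.n3x3_reduce, 1)
            + conv_weights(s.n3x3_reduce, s.n3x3, 3)
            + conv_weights(s.in_channels, s.n5x5_reduce, 1)
            + conv_weights(s.n5x5_reduce, s.n5x5, 5)
            + conv_weights(s.in_channels, s.pool_proj, 1))


def weights_without_reduction(s: InceptionSpec) -> int:
    """Same 1x1, 3x3 and 5x5 widths, applied directly to the input."""
    return (conv_weights(s.in_channels, s.n1x1, 1)
            + conv_weights(s.in_channels, s.n3x3, 3)
            + conv_weights(s.in_channels, s.n5x5, 5))


# Widths of inception (3a) as listed in [19]; an assumption of this sketch.
INCEPTION_3A = InceptionSpec(in_channels=192, n1x1=64, n3x3_reduce=96,
                             n3x3=128, n5x5_reduce=16, n5x5=32, pool_proj=32)

if __name__ == "__main__":
    print("output channels:", output_channels(INCEPTION_3A))
    print("weights with 1x1 reductions:", weights_with_reduction(INCEPTION_3A))
    print("weights without reductions:", weights_without_reduction(INCEPTION_3A))
```

Under these assumed widths, the module with reductions needs well under half the weights of the direct version, which is the sense in which the module widens the network at modest computational cost.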
When converting the images to LMDB format, we resize each picture to 256×256 pixels.
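One possible way to prepare this conversion is sketched below. It is a sketch under stated assumptions, not the procedure used in this work: Caffe's `convert_imageset` tool reads a listing file with one `relative/path label` pair per line and accepts `--resize_height`, `--resize_width`, and `--backend` flags. The category names, file paths, and the helpers `build_listing` and `convert_command` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_listing(samples: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Map each category name to an integer label and emit 'path label' lines.

    `samples` maps a category name to the relative paths of its images.
    Labels are assigned in alphabetical order of the category names.
    """
    label_of = {name: idx for idx, name in enumerate(sorted(samples))}
    lines = [f"{path} {label_of[name]}"
             for name in sorted(samples)
             for path in samples[name]]
    return lines, label_of


def convert_command(image_root: str, listing_file: str, db_path: str,
                    size: int = 256) -> List[str]:
    """Argument list for Caffe's convert_imageset with a square resize."""
    return ["convert_imageset",
            f"--resize_height={size}",
            f"--resize_width={size}",
            "--backend=lmdb",
            f"{image_root.rstrip('/')}/",
            listing_file,
            db_path]


if __name__ == "__main__":
    # Hypothetical two-category example; the real data set has 50 categories.
    demo = {"herb_b": ["herb_b/001.jpg"],
            "herb_a": ["herb_a/001.jpg", "herb_a/002.jpg"]}
    lines, labels = build_listing(demo)
    print("\n".join(lines))
    print(" ".join(convert_command("images", "train.txt", "train_lmdb")))
```

The argument list can be passed to `subprocess.run` on a machine where Caffe's tools are on the path; the training and test sets would each get their own listing file and LMDB.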

Fig. 2. Flow chart from obtaining pictures to getting results.

III. EXPERIMENTS

Database: Our database has 50 categories of Chinese herbal medicine and contains a total of 8500 images. We divided it into a training set of 6400 pictures and a test set of 2100 pictures. The number of pictures differs between categories, from only a few dozen to several hundred. The split ratios also differ between categories, from 15%/85% to 40%/60%, but all are within this range.

Computer configuration: RAM: 8 GB; processor: Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz; GPU: GT 730.

Network parameter settings: The batch_size of the TEST phase is set to 10 (the test does not run on this machine if the value is set too high). Because our data has only 50 categories of herbal images, the num_out of all softmax classifiers is set to 50. Test_iter is set to 210, so that 210 batches of 10 images cover the 2100 test pictures exactly once. The learning rate is 0.0035, momentum is 0.9, and weight decay is 0.0005. Stepsize is set to 7000 (the value that gave the best results in our tests) and Max_iter is set to 35000, that is, five learning-rate steps (the experiments show that the accuracy is basically unchanged after that). Training is run on the GPU.

Experimental results: We measured TOP-1 and TOP-5 accuracy, and Fig. 3 shows how the accuracy changes during training. TOP-1 is the fraction of test images for which the single highest-scoring prediction is the correct category. TOP-5 is the fraction for which the correct category is among the five highest-scoring predictions.

Fig. 3. The change in accuracy with Stepsize set to 7000.

It can be seen from Fig. 3 that the accuracy stays basically the same after a certain number of iterations, with no further large changes. We then modified Stepsize so that the learning rate was updated 5-10 times, and similar conclusions were obtained. Fig. 4 shows the final accuracy obtained with different Stepsize values. It shows that the accuracy does not change much once Stepsize exceeds 5000, and that both TOP-1 and TOP-5 reach their highest accuracy when Stepsize is 7000: the TOP-1 accuracy is 62.8% and the TOP-5 accuracy is 89.4%. This is a state-of-the-art result for classifying Chinese herbal plant images with this many categories and images.

Fig. 4. Accuracy obtained with different Stepsize values.

IV. DISCUSSION

This article mainly addresses two problems in previous studies. The first is that the pictures we used contain more complicated backgrounds, and the second is the number of species and the total number of pictures. As a result, our model can be applied to practical problems, can overcome the shortcoming of manual classification that it requires rich experience, and can save a lot of time.
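The TOP-1 and TOP-5 figures reported in Section III follow the usual top-k definition: a test image counts as correct if its true category is among the k highest-scoring predictions. A minimal sketch of that metric is given below. The function name `top_k_accuracy` and the toy scores are hypothetical and not taken from the experiments; only `TEST_ITER` and `TEST_BATCH_SIZE` repeat values stated in the text.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k best-scored classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Settings stated in Section III: 210 test iterations with 10 images each.
TEST_ITER = 210
TEST_BATCH_SIZE = 10

if __name__ == "__main__":
    # Toy example with 3 images and 4 classes (made-up scores).
    toy_scores = [[0.1, 0.6, 0.2, 0.1],
                  [0.5, 0.3, 0.1, 0.1],
                  [0.2, 0.3, 0.4, 0.1]]
    toy_labels = [1, 1, 3]
    print("top-1:", top_k_accuracy(toy_scores, toy_labels, 1))
    print("top-2:", top_k_accuracy(toy_scores, toy_labels, 2))
    print("images evaluated per test pass:", TEST_ITER * TEST_BATCH_SIZE)
```

With 50 categories, TOP-5 is by construction at least as high as TOP-1, which is consistent with the reported 89.4% and 62.8%.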

V. ACKNOWLEDGEMENTS

This work was supported by Natural Science Foundation of Shandong Province, China (ZR2017BF041, ZR2018MF011) and National Natural Science Foundation of China (71271125, 61502260).

VI. COMPETING INTERESTS

The authors declare that they have no competing interests.

REFERENCE

[1] Pushpa BR, Anand C and Mithun Nambiar P. Ayurvedic Plant Species Recognition using Statistical Parameters on Leaf Images. Volume 11, Number 7 (2016) pp 5142-5147.
[2] Kumar, S. E. Leaf Color, Area and Edge features based approach for Identification of Indian Medicinal Plants. International Journal of Computer Science and Engineering, 2012. 3(3), pp.436-442.
[3] Mallah C, Cope J, Orwell J. Plant leaf classification using probabilistic integration of shape, texture and margin features. 2014-01-06.
[4] Lee, K.B. and Hong, K.S. An implementation of leaf recognition system using leaf vein and shape. International Journal of Bio-Science and Bio-Technology, 2013. 5(2), pp.57-66.
[5] Ehsanirad, A. and Sharath Kumar, Y.H. Leaf recognition for plant classification using GLCM and PCA methods. Oriental Journal of Computer Science and Technology, 2010. 3(1), pp.31-36.
[6] Kadir, A., Nugroho, L.E., Susanto, A. and Santosa, P.I., 2013. Leaf classification using shape, color, and texture features. arXiv preprint arXiv:1401.4447.
[7] Li Z. Chinese herbal medicine feature extraction and identification system. Master Thesis, 2013.
[8] Perez, A.J., Lopez, F., Benlloch, J.V. and Christensen, S. Colour and shape analysis techniques for weed detection in cereal fields. Computers and Electronics in Agriculture, 2000. 25(3), pp.197-212.
[9] Vijayashree, T. and Gopal, A. Classification of Tulsi Leaves Based on Texture Analysis, 2015.
[10] Zhang J, Huang K, Yu Y, and Tan T. Boosted local structured hog-lbp for object localization. In Computer Vision and Pattern Recognition, 2011.
[11] Abdel-Hakim A. E. and Farag A. A. Csift: A sift descriptor with color invariant characteristics. In Computer Vision and Pattern Recognition, 2006.
[12] Zhang Shan-wen, Lei Ying-ke, Dong Tian-bao, et al. Label propagation based supervised locality projection analysis for plant leaf classification[J]. Pattern Recognition, 2013, 46(7):1891-1897.
[13] Unger J, Merhof D, Renner S. Computer vision applied to herbarium specimens of German trees: testing the future utility of the millions of herbarium specimen images for automated identification. BMC Evol Biol. 2016;16(1):248.
[14] Luo Dehan, Wang Jian, Chen Yimin. Classification of Chinese Herbal Medicines Based on SVM[J]. DOI: 10.1109/InfoSEEE.2014.6948152
[15] Maolin Wang, Li Li, Changyuan Yu, Aixia Yan, Zhongzhen Zhao, Ge Zhang, Miao Jiang, Aiping Lu, and Johann Gasteiger. Classification of Mixtures of Chinese Herbal Medicines Based on a Self-organizing Map (SOM)[J]. DOI: 10.1002/minf.201500115
[16] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.
[17] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, ECCV, volume 8689 of Lecture Notes in Computer Science, pages 818–833. Springer, 2014.
[18] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[19] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[20] M. Lin, Q. Chen, and S. Yan. Network in network. CoRR, abs/1312.4400, 2013.
[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[22] Xin Sun, Huinan Qian. Chinese Herbal Medicine Image Recognition and Retrieval by Convolutional Neural Network[J]. PLOS ONE, June 3, 2016. DOI: 10.1371/journal.pone.0156327
[23] Jose Carranza-Rojas, Herve Goeau, Pierre Bonnet, Erick Mata-Montero and Alexis Joly. Going deeper in the automated identification of Herbarium specimens[J]. BMC Evolutionary Biology (2017) 17:181. DOI: 10.1186/s12862-017-1014-z
