A Survey of Relation Extraction of Knowledge Graphs
Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA
Editors
Jingkuan Song
University of Electronic Science and Technology of China, Chengdu, China
Xiaofeng Zhu
Massey University, Auckland, New Zealand
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The Asia Pacific Web (APWeb) and Web-Age Information Management (WAIM)
Joint Conference on Web and Big Data is a leading international conference for
researchers, practitioners, developers, and users to share and exchange their
cutting-edge ideas, results, experiences, techniques, and tools in connection with all
aspects of Web data management. This third joint event of APWeb and WAIM
(APWeb-WAIM 2019) was held in Chengdu, China, during August 1–3, 2019, and it
attracted participants from all over the world.
Along with the main conference, APWeb-WAIM 2019 workshops provided an
international forum for researchers to discuss and share research results. After
reviewing the workshop proposals, we were able to accept two workshops, which
focused on topics in knowledge graph and data science. The covered topics in these
workshops contributed to the main themes of the APWeb-WAIM conference. For these
workshops, we accepted 8 full papers, carefully selected after peer review from 18
submissions. The two workshops were as follows:
– The Second International Workshop on Knowledge Graph Management and
Analysis (KGMA 2019)
– The First International Workshop on Data Science for Emerging Applications
(DSEA 2019)
The workshop program would not have been possible without the authors who
chose APWeb-WAIM for disseminating their findings. We would like to thank our
authors who improved and extended their papers based on the reviewers’ feedback and
the discussions held during APWeb-WAIM 2019. We would also like to express our
thanks to all the workshop organizers for their great effort in making the
APWeb-WAIM 2019 workshops a success, and the conference general co-chairs Heng
Tao Shen, Kotagiri Ramamohanarao, and Jiliu Zhou, and Program Committee
co-chairs Jie Shao, Man Lung Yiu, and Masashi Toyoda for their great support.
Volunteers helped with local arrangements and on-site setups, and many other
important tasks. While it is difficult to list all their names here, we would like to take
this opportunity to sincerely thank them all.
KGMA 2019
Workshop Co-chairs
Xin Wang Tianjin University, China
Yuan-Fang Li Monash University, Australia
DSEA 2019
Program Co-chairs
Lianli Gao University of Electronic Science and Technology of China, China
Han Su University of Electronic Science and Technology of China, China
Distributed Query Evaluation over Large RDF Graphs

Peng Peng(B)
Abstract. RDF is increasingly being used to encode data for the seman-
tic web and data exchange. There have been a large number of stud-
ies that address RDF data management over different distributed plat-
forms. In this paper, we provide an overview of these studies and divide
existing distributed RDF systems into two categories: partitioning-based
approaches and cloud-based approaches. We also introduce a partition-tolerant
distributed RDF system, gStoreD.
1 Background
Since Google launched the Knowledge Graph project in 2012, an increasing
number of institutes and companies have followed suit and proposed their
own knowledge graphs. Essentially, a knowledge graph is a semantic network
that models entities (including their properties) and the relations between
them.
Right now, the Resource Description Framework (RDF) is the de facto standard
for representing knowledge graphs. RDF is a family of specifications originally
designed as a metadata data model, and it has also been used in knowledge
management applications. Based on the RDF model, machines on the Web can
understand the information it encodes and the interrelationships among resources.
In general, RDF represents data as a collection of triples of the form <subject,
property, object>. A triple can be naturally seen as a pair of entities connected
by a named relationship or an entity associated with a named attribute value.
Thus, an RDF dataset can be represented as a graph where subjects and objects
are vertices, and triples are edges with property names as edge labels. On the
other hand, to retrieve the RDF dataset, a query language SPARQL is designed.
A SPARQL query is a set of triple patterns with variables and can also be seen as
a query graph with variables. Essentially, answering a SPARQL query requires
finding subgraph matches of the query graph over an RDF graph.
Figure 1(a) shows an example RDF graph, which describes some facts about
the philosopher Boethius and is a part of a well-known RDF graph, DBpedia [7].
An example query with four edges that retrieves all people who influence people
born in Rome is given in Fig. 1(b). After evaluating the query over the RDF
graph, we find that Cicero and Proclus influence Boethius, who was born in Rome.

[Fig. 1. (a) An example RDF graph about dbr:Boethius, who has dbo:birthPlace dbr:Rome; (b) an example SPARQL query graph]
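To make the example concrete, here is a minimal Python sketch using rdflib that evaluates the Fig. 1(b) query over a toy version of the Fig. 1(a) graph. The tiny in-memory graph and the property name dbo:influencedBy are illustrative assumptions, not the exact DBpedia fragment.

```python
from rdflib import Graph

# A tiny in-memory RDF graph approximating the Fig. 1(a) fragment (Turtle syntax).
data = """
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

dbr:Boethius dbo:birthPlace dbr:Rome ;
             dbo:influencedBy dbr:Cicero , dbr:Proclus .
"""

g = Graph()
g.parse(data=data, format="turtle")

# The query graph of Fig. 1(b): people who influence people born in Rome.
query = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?influencer ?person WHERE {
    ?person dbo:birthPlace dbr:Rome .
    ?person dbo:influencedBy ?influencer .
}
"""

for influencer, person in g.query(query):
    print(influencer, "influences", person)  # e.g. dbr:Cicero influences dbr:Boethius
```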
As more and more people publish their datasets in the RDF model, the sizes
of RDF graphs grow beyond the capacity of a single machine. For example,
YAGO [9], extracted from Wikipedia by the Max Planck Institute, contains about
284 million triples; Freebase [2], a collaboratively created knowledge graph for
structuring human knowledge, contains more than 2 billion triples; and DBpedia
[7], extracted from Wikipedia by multiple institutes, contains more than 9 billion
triples. Thus, designing a distributed RDF database system is essential.
In this paper, we provide an overview of some distributed RDF systems. We
first present a brief survey of distributed RDF systems and categorize them
into two categories in Sect. 2. We further discuss a partition-tolerant
distributed RDF system implemented by our labs, named gStoreD, in Sect. 3.
Finally, Sect. 4 concludes our findings.
2 Distributed RDF Systems

There have been many distributed RDF systems for distributed SPARQL query
evaluation, and two very good surveys are [1,5]. In general, these distributed
RDF systems can be divided into two categories: partitioning-based approaches
and cloud-based approaches.
[Fig. 2. The partial evaluation and assembly framework: given a SPARQL query, each site Si computes local partial matches over its fragment, and all local partial matches are assembled into complete matches]
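The figure sketches the partial-evaluation-and-assembly strategy used by partitioning-based systems such as gStoreD [13]: each site Si computes local partial matches over its fragment, and a final stage assembles them into complete matches. The following Python sketch is a drastically simplified, pattern-at-a-time rendition of this dataflow (real systems compute maximal partial subgraph matches per site); all helper names are illustrative.

```python
from itertools import product

def local_partial_matches(fragment, patterns):
    """Evaluate each triple pattern against one site's local fragment.
    A pattern term starting with '?' is a variable; returns per-pattern bindings."""
    matches = []
    for s, p, o in patterns:
        rows = []
        for ts, tp, to in fragment:
            binding, ok = {}, True
            for q, t in ((s, ts), (p, tp), (o, to)):
                if q.startswith("?"):
                    if binding.get(q, t) != t:   # repeated variable must agree
                        ok = False
                        break
                    binding[q] = t
                elif q != t:                      # constant must match exactly
                    ok = False
                    break
            if ok:
                rows.append(binding)
        matches.append(rows)
    return matches

def assemble(all_sites_matches, num_patterns):
    """Join compatible bindings across patterns, drawing each pattern's
    candidates from the union of all sites' local partial matches."""
    per_pattern = [sum((site[i] for site in all_sites_matches), [])
                   for i in range(num_patterns)]
    results = []
    for combo in product(*per_pattern):
        merged = {}
        # compatible iff no variable is bound to two different values
        if all(merged.setdefault(k, v) == v for b in combo for k, v in b.items()):
            results.append(merged)
    return results

# Demo: two sites each hold part of the graph.
s1 = {("Boethius", "birthPlace", "Rome")}
s2 = {("Boethius", "influencedBy", "Cicero")}
q = [("?p", "birthPlace", "Rome"), ("?p", "influencedBy", "?x")]
sites = [local_partial_matches(f, q) for f in (s1, s2)]
print(assemble(sites, len(q)))  # [{'?p': 'Boethius', '?x': 'Cicero'}]
```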
4 Conclusions
In this paper, we have classified existing distributed RDF systems and presented
a brief overview of the systems in each category. There are many additional works
on distributed RDF management that are omitted in this paper; most notably,
federated RDF systems are not covered.
Acknowledgment. This work was supported by NSFC under grant 61702171, Hunan
Provincial Natural Science Foundation of China under grant 2018JJ3065, and the Fun-
damental Research Funds for the Central Universities.
References
1. Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental com-
parison of distributed SPARQL engines for very large RDF data. PVLDB 10(13),
2049–2060 (2017)
2. Google: Freebase data dumps (2017)
3. He, L., et al.: Stylus: a strongly-typed store for serving massive RDF data. PVLDB
11(2), 203–216 (2017)
4. Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs.
PVLDB 4(11), 1123–1134 (2011)
5. Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91
(2015)
6. Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: ICPP, pp. 113–
122 (1995)
7. Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted
from Wikipedia. Semant. Web 6(2), 167–195 (2015)
8. Madkour, A., Aly, A.M., Aref, W.G.: WORQ: workload-driven RDF query pro-
cessing. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11136, pp. 583–599.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00671-6_34
9. Mahdisoltani, F., Biega, J., Suchanek, F.M.: YAGO3: a knowledge base from mul-
tilingual Wikipedias (2015)
10. Peng, P., Zou, L., Chen, L., Zhao, D.: Query workload-based RDF graph fragmen-
tation and allocation. In: EDBT, pp. 377–388 (2016)
11. Peng, P., Zou, L., Chen, L., Zhao, D.: Adaptive distributed RDF graph fragmen-
tation and allocation based on query workload. IEEE Trans. Knowl. Data Eng.
31(4), 670–685 (2019)
12. Peng, P., Zou, L., Guan, R.: Accelerating partial evaluation in distributed SPARQL
query evaluation. In: ICDE, pp. 112–123 (2019)
13. Peng, P., Zou, L., Özsu, M.T., Chen, L., Zhao, D.: Processing SPARQL queries
over distributed RDF graphs. VLDB J. 25(2), 243–268 (2016)
14. Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF query-
ing with SPARQL on spark. PVLDB 9(10), 804–815 (2016)
15. Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud.
In: SIGMOD, pp. 505–516 (2013)
16. Wu, B., Zhou, Y., Yuan, P., Liu, L., Jin, H.: Scalable SPARQL querying using
path partitioning. In: ICDE, pp. 795–806 (2015)
17. Wylot, M., Cudré-Mauroux, P.: DiploCloud: efficient and scalable management of RDF
data in the cloud. TKDE, PP(99) (2015)
Classification-Based Emoji Recommendation
for User Social Networks
1 Introduction
Multiple forms of visual expression, including emoticons, emojis, stickers, and memes,
have become prevalent with the development of the Internet, gradually changing
people's narrative structure. In particular, emojis are being adopted at a faster rate than
any other "language", and most of us now use these colorful symbols to communicate.
However, only a limited number of emojis can be displayed on the small screen of a
cellphone. As the variety and quantity of visual expressions increase, users often have
to open the visual expression folder and scroll up and down to choose the most suitable
one from hundreds of different emoji icons, which seriously degrades the user experience
and increases search time. Therefore, how to automatically recommend suitable emojis
to users according to the context becomes an interesting research issue.
To solve this problem, researchers have put forward emoji prediction methods that
explore the relationship between texts and emojis. However, several problems still
cause inaccurate recommendation results. The deeper cause is a failure to focus on
users' motivations for using emojis in real social media. First of all, people tend
to use emojis on social media together with text in order to express their emotions or
reveal meanings hidden by figurative language. Secondly, an emoji cannot be used as the
sole marker of the context, and the same context may be suitable for different emojis.
For example, suppose you send a short message to a friend: "I fell and hit my head on the
cupboard". Your friend probably does not know whether to sympathize with you or
laugh at you. If you add a "sad" emoji at the end of this sentence, it is equivalent to
providing a non-verbal clue, "I feel very painful". If you add a "tears of laughter" emoji
at the end, it conveys a little self-mockery. What is more, no existing work focuses
on the interplay between the use of emojis and their cultural background.
For example, there are some hot words (like " ") which can only be correctly
explained in a Chinese context. Another key reason is the incomplete feature extraction
of word vectors, owing to the limitations of the corpus.
This problem poses several challenges. First, understanding the semantic
relationship between texts and emojis is difficult because emojis often have
different interpretations that depend on the reader. Second, the emoji data set
used in social contexts is imbalanced: some emojis are used often, while
others are rarely used.
This paper studies the problem and proposes a solution. The main contributions
can be summarized as follows:
(1) A method of constructing emoji-related features is proposed. Specifically, it
includes emoji-related context document representation, constructing an emoji-
related keyword dictionary, and constructing relevance features of emojis.
(2) A method of emoji recommendation that integrates emoji-related features is
proposed, and a method of classifying emojis based on emotions is also
defined.
(3) An emoji usage data set is established and the proposed method is evaluated
on it. Using a specialized micro-blog crawler, 5 million Weibo posts are
collected. Based on this data set, the validity of the proposed method is verified.
2 Related Work
Emoji prediction is an emerging problem [1], which combines the nuances of emo-
tional analysis and the noisy data features of social media. The current research work
mainly focuses on single text feature based emoji prediction and multi-feature fusion
based emoji prediction.
The research based on single text feature mainly focuses on context representation
and semantic analysis of context. Effrosynidis et al. [2] used TF-IDF vector context
combined with linear SVC classification algorithm, using word tri-grams to train
prediction model. Chen et al. [3] proposed a method based on vector similarity to
generate a vector for tweet, and then used cosine similarity method to find the most
appropriate emoji symbols. Çöltekin and Rama [4] proposed using n-gram to represent
text features and using linear classification model SVM to capture local dependencies
to achieve emoji prediction, and the result is better than an RNN. Wang and Pedersen [5]
implemented a multi-channel CNN model to predict emojis by improving the word
embedding method. With the continuous development and improvement of natural language
processing, more attention has been paid to the use of semantic analysis
techniques to analyze text. Barbieri et al. [1] proposed a semantic model of emojis
based on LSTM neural network, and explored the relationship between words and
emoji. Xie et al. [6] encoded the context information in the dialogue using a hierar-
chical LSTM network, and then predicted it according to the dialog representation they
learned. Wu et al. [7] proposed a method to represent tweets by combining
CNN and LSTM layers.
The research based on multi-feature fusion mainly considers fusion text features,
emotional features, external environment features and so on. Guibon et al. [8] proposed
a multi-label stochastic forest algorithm model which combines text features and
emotion-related features to predict emoji. Choudhary et al. [9] constructed an emotional
classification tool, using emoji as a tag for emotional analysis, to further predict emoji
without emoji data. Baziotis et al. [10] used Bi-LSTM with attention and pre-trained
word2vec vectors to predict emoji by using external resources to link each tweet with
emotional, specific and familiarity information. Liu [11] put forward two models to
improve the classification of multiple emoji by utilizing general text features and
adding some external resources such as various artificial dictionaries of affective words.
Emoji can smooth online communication, particularly when tones such as sarcasm
and dry wit are difficult to display in text-based communication [12]. In fact, indi-
viduals can incorporate playful elements into the more mundane message by adding
different emoji [13, 14]. For instance, Zhou et al. [15] reported that their participants
would add an emoji if they believed that their text might cause negative feelings on the
receiving end.
The above research work focuses on how to effectively extract emoji-related text
features to represent the relationship between emojis and texts from the perspective of
text. In fact, every emoji has its corresponding semantic representation. In this paper,
each emoji is labeled in the form "[ ]"; for example, the sad-face emoji corresponds to "[sad]" in the text.
The word “sad” and its related words are very important for understanding the rela-
tionship between texts and emojis. Our main contribution in this paper is to effectively
represent the relationship between emojis and texts from the perspective of emoji. At
the same time, most emojis have obvious emotional tendencies, and people also tend to
use emojis' emotional characteristics to express their emotions or attitudes toward a
person or a thing. Our work focuses on recommending several emojis for a sentence,
because users may express different emotions for one sentence with different
emojis. For the above reasons, we first propose extracting emoji-related features,
including an emoji-related keyword feature, a context document feature representation
for each emoji, and relevance features between emojis. Next, we propose a method to compute a
similarity score for every sentence with different emojis. Finally, several emojis with
high similarity are used as recommended results.
The term frequency calculation is shown in formula (1):

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \qquad (1)$$

where $tf_{i,j}$ represents the frequency of word $w_i$ in the $j$-th emoji-related document, $n_{i,j}$ represents the number of occurrences of $w_i$ in document $j$, and $\sum_k n_{k,j}$ represents the total number of occurrences of all $k$ words in document $j$. The inverse document frequency calculation is shown in formula (2):

$$idf_i = \log \frac{|D|}{1 + |\{j : t_i \in d_j\}|} \qquad (2)$$

IDF is a measure of the universal importance of a word, where $|D|$ represents the total number of documents and $|\{j : t_i \in d_j\}|$ denotes the number of documents containing the word $t_i$. Finally, the normalized TF-IDF weight is shown in formula (3):

$$tfidf_{i,j} = tf_{i,j} \times idf_i \qquad (3)$$
Finally, we obtain the TF-IDF value for each word in each target document, and
the feature vector of each emoji-related target document is also generated.
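As an illustration, here is a minimal Python sketch of formulas (1)-(3), assuming the emoji-related target documents have already been word-segmented into lists of tokens:

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of tokenized emoji-related target documents (lists of words).
    Returns one {word: tf-idf weight} vector per document, per formulas (1)-(3)."""
    D = len(docs)
    # document frequency: number of documents containing each word
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())            # sum_k n_{k,j}
        vec = {}
        for w, n in counts.items():
            tf = n / total                      # formula (1)
            idf = math.log(D / (1 + df[w]))     # formula (2)
            vec[w] = tf * idf                   # formula (3)
        vectors.append(vec)
    return vectors
```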
The Skip-Gram model predicts the context S_{w(t)} = (w(t−k), ..., w(t−1), w(t+1),
..., w(t+k)) from the input word w(t), where k is the size of the context window of
w(t), that is, the number of words selected on the left and on the right. The CBOW
model is the opposite of the Skip-Gram model: it predicts w(t) from the context S_{w(t)}.
We use the Python library gensim to implement Word2Vec. To train the model, the
input is a preprocessed emoji-related corpus of 250,000 texts. The semantic words
corresponding to emojis are used as target features, and the model outputs the words
related to each target feature by calculating the similarity between the feature words
and the other words of the corpus.
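A minimal sketch of this step with gensim (4.x API); the toy corpus and parameter values are illustrative, not the settings used in the paper:

```python
from gensim.models import Word2Vec

# sentences: the preprocessed, word-segmented emoji-related corpus,
# e.g. one tokenized Weibo post per list (illustrative toy data here).
sentences = [["today", "so", "sad", "lost", "wallet"],
             ["sad", "movie", "cried"],
             ["happy", "birthday", "party"]]

# sg=1 selects Skip-Gram; sg=0 would select CBOW. window is the context size k.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# Words most similar to an emoji's semantic target word, e.g. "sad" for [sad].
print(model.wv.most_similar("sad", topn=10))
```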
The second method is described as follows. We first merge the existing sentiment
dictionaries, then delete the repeated words, and finally obtain a new sentiment
dictionary. We traverse the word segmentation results of each emoji-related target
document to match the new sentiment dictionary, and finally gain each emoji-related
sentiment lexicon. A new representative emoji keyword dictionary is obtained by
merging the feature words obtained in the two ways. The final construction results
for "[Disappointment]" and "[be shocked]" are shown in Table 2.
[Table 2. Feature words for "[Disappointment]" and "[be shocked]"]
[Figure: Construction of the custom dictionary]
We can obtain each emoji's document context representation with the methods of
Sect. 3.2. In this section, the cosine similarity between each emoji context set and
all other emoji context sets is calculated by iterating over each emoji context set,
and n multi-dimensional feature vectors are obtained. The larger a value in the
vector, the more similar the contexts in which the two emojis are used, which means
the two emojis are closer. Formula (4) shows the cosine similarity calculation
between documents.
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} = \frac{A \cdot B}{|A|\,|B|} \qquad (4)$$
$A$ represents the feature vector of one emoji document and $a_i$ its $i$-th element;
$B$ represents the feature vector of another emoji document and $b_i$ its $i$-th
element. $A \cdot B$ represents the dot product of the two vectors, and $|A|\,|B|$
represents the product of the norms of the two vectors. The specific algorithm is
shown in Algorithm 1.
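Algorithm 1 itself is not reproduced in this excerpt; the following Python sketch shows the computation it performs, i.e., formula (4) applied pairwise to the emoji document vectors (assumed here to be dense NumPy arrays):

```python
import numpy as np

def cosine(a, b):
    """Formula (4): cos(theta) = A.B / (|A||B|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_vectors(doc_vectors):
    """For each emoji, compute its similar-weight vector against all emojis."""
    n = len(doc_vectors)
    return np.array([[cosine(doc_vectors[i], doc_vectors[j]) for j in range(n)]
                     for i in range(n)])
```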
Through this calculation, the similar-weight vectors between each emoji and the
other emojis are obtained. The similar-weight vector of each emoji can be mapped to
a point in a high-dimensional space, whose N dimensions measure the emoji's
context-based usage from all directions. K-Means is an unsupervised clustering
algorithm; it is relatively simple to implement and has a good clustering effect,
so we choose it to cluster the emojis. Finally, the emojis are divided into
several categories.
Three emoji-related features have been obtained in Sect. 3: the emoji-related
context document representation, the emoji-related keyword dictionary, and the
correlation features between emojis. Next, we propose a recommendation method that
integrates them. The details of our method are as follows.
First, the word segmentation results of the input text are matched against each
emoji-related dictionary to count the number of feature words in each emoji-related
dictionary; each feature number represents the correlation between the input text and
one emoji. To calculate the similarity between the input text and different emojis,
we sum the emoji-related feature numbers as the total feature number, and then use
the quotient between the feature number of each emoji and the total feature number
to represent the similarity. On the other hand, the results of word segmentation are
used for text representation according to Sect. 3.2, and then the cosine similarity
of the vectors is used to calculate the similarity between the test document and
each target document; each value in the vector represents the similarity between the
test document and one emoji. Finally, we add up the two similarity values to get a
new value, which is used as the final recommendation score.
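A minimal sketch of this score fusion, assuming the per-emoji keyword-match counts and document similarities have already been computed; all names are illustrative:

```python
def recommend(keyword_hits, doc_similarities, top_n=3):
    """keyword_hits[i]: number of input-text words found in emoji i's dictionary.
    doc_similarities[i]: cosine similarity between the input text and emoji i's
    context document. Returns the indices of the top_n recommended emojis."""
    total = sum(keyword_hits) or 1          # avoid division by zero
    scores = [hits / total + sim            # dictionary score + similarity score
              for hits, sim in zip(keyword_hits, doc_similarities)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]
```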
All emojis are divided into categories by calculating the correlation features
between emojis, where each category represents similar emoji usage. We find that
the classification result tends to group emojis by emotion, which is highly relevant
to users' motivations for using them. We ask five experts to fine-tune the
classification results to ensure their accuracy. Finally, the classification results
are used to expand the emoji verification set, with the aim of classifying the
emotions of emojis.
This section mainly introduces how the experiments are designed and demonstrates the effectiveness
of the proposed method. Section 5.1 introduces the work of data acquisition and
cleaning. Section 5.2 introduces the processing of extracting emojis and dividing the
experimental dataset. Section 5.3 defines the evaluation indicators of experiments
result. And the last section analyses the experimental results.
In order to balance the test corpus, we cycle each emoji label and randomly select
20 contexts of each emoji to form a test set containing 940 contexts. The statistics of
dataset are listed in Table 4.
N represents the total size of the test set and n represents the number of correct
recommendations. For the definition of n, considering that the emoji classification
is based on the contextual similarity of emojis, emojis belonging to the same
category are very similar in context usage. Therefore, we use the emoji
classification results to expand the emoji verification set: as long as one of the
recommended results is included in the emoji verification set, we define the
recommendation as correct.
Based on the above analysis, we find that D is smallest when K = 6, 10, or 13; but
considering that there are 47 emoji categories, K = 13 accounts for more than
1/3 of the number of categories, so we remove this point. Next, we test K = 6 and
K = 10 and compare which classification is better. The following tables present the
results of emoji classification when K = 6 and K = 10.
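The text does not define D explicitly; assuming it is the K-Means distortion (within-cluster sum of squares), the selection of K can be sketched with scikit-learn as follows (the input matrix here is a random placeholder):

```python
import numpy as np
from sklearn.cluster import KMeans

# sim: the 47 x 47 similar-weight matrix from Sect. 3; each row is one emoji's
# point in high-dimensional space (random placeholder data here).
sim = np.random.rand(47, 47)

for k in (3, 6, 10, 13):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sim)
    print(k, km.inertia_)  # pick the K where the distortion D is smallest
```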
Secondly, we set the range of N according to the actual needs of users. For the
selection of the recommendation threshold, we surveyed 30 emoji users. More than
half of the users mentioned that they use emojis many times every day and spend more
than 10 s on average finding an emoji. They stressed that if the emoji choices could
be narrowed down, they might save a few minutes or even longer every day for
something more valuable. At the same time, ninety percent of them believed that the
number of recommended emojis should be between 3 and 5: too many recommendation
results would increase their operation burden, and too few might be less efficient
than searching manually. Therefore, this paper selects 3, 4, and 5 recommendations as
experimental thresholds, and finally determines the most effective recommendation
number according to the recommendation accuracy (Tables 5 and 6).
For each emoji in the experiment, we randomly select 20 contexts to compose our
test set. Verifying the recommendation method on the test sets, we come to the
conclusions shown in the figure below. Figure 4 shows the trend of correct
recommendations for each emoji under different combinations of emoji classification
and recommendation thresholds. The abscissa represents each emoji and the ordinate
represents the number of correct recommendations in 20 contexts. We can see in
Fig. 4 that some emojis have low accuracy. We analyze the possible reasons as
follows. First, the size of the corpus may limit the extraction of emoji usage
features, resulting in inaccurate recommendations. Second, some emojis are used
flexibly in real contexts; these emojis (like ) often appear in various contexts,
which makes it difficult to extract their usage features.
At the same time, we also set up four comparative experiments to prove the
effectiveness of the proposed method. The difference between the four groups of
experiments lies in the added features; the calculation of these features is the
same as in the method proposed in this paper. The last group of experiments
represents the proposed method, and the bold numbers in the table represent its
accuracy. We use the combination of TF-IDF and Doc2bow (mainly used to implement
the bag-of-words model over documents) to obtain emoji-related context features.
The specific experimental accuracy is shown in Table 7 below. We can see that the
accuracy is higher when the number of classification categories is K = 6, and that
the recommendation accuracy is higher when emoji-related feature words are added
separately. The method proposed in this paper combines the characteristics of the
above two baselines, and its accuracy is the highest among the four experiments.
[Fig. 4. Number of correct recommendations (out of 20) per emoji under K = 6 (left) and K = 10 (right), for recommendation thresholds N = 5, 4, and 3]
6 Conclusion
References
1. Barbieri, F., Ballesteros, M., Saggion, H.: Are emojis predictable? In: Proceedings of the
15th Conference of the European Chapter of the Association for Computational Linguistics:
Volume 2, Short Papers, pp. 105–111 (2017)
2. Effrosynidis, D.: DUTH at SemEval-2018 Task 2: emoji prediction in tweets. In:
Proceedings of the 12th International Workshop on Semantic Evaluation, SemE-
val@NAACL-HLT, New Orleans, Louisiana, USA, 5–6 June 2018, pp. 466–469 (2018)
3. Chen, J., Yang, D., Li, X., Chen, W., Wang, T.: Peperomia at SemEval-2018 Task 2 : vector
similarity based approach for emoji prediction. In: Proceedings of the 12th International
Workshop on Semantic Evaluation, SemEval@NAACL-HLT, New Orleans, Louisiana,
USA, 5–6 June 2018, pp. 428–432 (2018)
4. Çöltekin, Ç., Rama, T.: Tübingen-Oslo at SemEval-2018 Task 2: SVMs perform better than RNNs at
emoji prediction. In: Proceedings of the 12th International Workshop on Semantic
Evaluation, SemEval@NAACL-HLT, New Orleans, Louisiana, USA, 5–6 June 2018,
pp. 34–38 (2018)
5. Wang, Z., Pedersen, T.: UMDSub at SemEval-2018 Task 2 : multilingual emoji prediction
multi-channel convolutional neural network on subword embedding. In: Proceedings of the
12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT, New
Orleans, Louisiana, USA, 5–6 June 2018, pp. 395–399 (2018)
6. Xie, R., Liu, Z., Yan, R., Sun, M.: Neural emoji recommendation in dialogue systems. arXiv
e-prints (2016). http://arxiv.org/abs/1612.04609
7. Wu, C., Wu, F., Wu, S., Yuan, Z.: THU NGN at SemEval-2018 Task 2 : residual CNN-
LSTM network with attention for English emoji prediction. In: Proceedings of the 12th
International Workshop on Semantic Evaluation, SemEval@NAACL-HLT, New Orleans,
Louisiana, USA, 5–6 June 2018, pp. 410–414 (2018)
8. Guibon, G., Ochs, M., Bellot, P.: Prédiction automatique d’ emojis sentimentaux. In: The
14th French Information Retrieval Conference, Marseille, France, 29–31 March 2017,
pp. 59–74 (2017)
9. Choudhary, N., Singh, R., Rao, V.A., Shrivastava, M.: Twitter corpus of resource-scarce
languages for sentiment analysis and multilingual emoji prediction. In: Proceedings of the
27th International Conference on Computational Linguistics, COLING, Santa Fe, New
Mexico, USA, 20–26 August 2018, pp. 1570–1577 (2018). http://www.aclweb.org/
anthology/C18-1133
10. Baziotis, C., Athanasiou, N., Paraskevopoulos, G., Ellinas, N., Kolovou, A., Potamianos, A.:
NTUA-SLP at SemEval-2018 Task 2 : predicting emojis using RNNs with context-aware
attention. In: Proceedings of the 12th International Workshop on Semantic Evaluation,
SemEval@NAACL-HLT, New Orleans, Louisiana, USA, 5–6 June 2018, pp. 438–444
(2018)
11. Liu, M.: EmoNLP at SemEval-2018 Task 2: English emoji prediction with gradient
boosting regression tree method and bidirectional LSTM. In: Proceedings of the 12th
International Workshop on Semantic Evaluation, SemEval@NAACL-HLT, New Orleans,
Louisiana, USA, 5–6 June 2018, pp. 390–394 (2018)
12. Kaye, L.K., Wall, H.J., Malone, S.A.: “Turn that frown upside-down”: a contextual account
of emoticon usage on different virtual platforms. Comput. Hum. Behav. 60, 463–467 (2016).
https://doi.org/10.1016/j.chb.2016.02.088
13. Hu, T., Guo, H., Sun, H., Nguyen, T.V.T., Luo, J.: Spice up your chat: the intentions and
sentiment effects of using emojis. In: Proceedings of the Eleventh International Conference
on Web and Social Media, ICWSM, Canada, 15–18 May 2017, pp. 102–111 (2017). http://
arxiv.org/abs/1703.02860
14. Ma, X.: From internet memes to emoticon engineering: insights from the baozou comic
phenomenon in China. In: Human-Computer Interaction. Novel User Experiences - 18th
International Conference, HCI International, Toronto, ON, Canada, 17–22 July 2016,
pp. 15–27 (2016). https://doi.org/10.1007/978-3-319-39513-5
15. Zhou, R., Hentschel, J., Kumar, N.: Goodbye text, hello emoji : mobile communication on
WeChat in China. In: Proceedings of the 2017 CHI Conference on Human Factors in
Computing Systems, Denver, CO, USA, 06–11 May 2017, pp. 748–759 (2017)
16. Salton, G., Yu, C.T.: On the construction of effective vocabularies for information retrieval.
In: Proceedings of the 1973 meeting on Programming Languages and Information Retrieval,
Gaithersburg, Maryland, USA, 4–6 November 1973, pp. 48–60 (1973)
17. Wu, Y., Li, Y., Hao, G.: A web-based theme-related word set construction algorithm. In:
Web and Big Data - APWeb-WAIM 2018 International Workshops: MWDA, BAH,
KGMA, DMMOOC, DS, Macau, China, 23–25 July 2018, pp. 188-200 (2018)
Leveraging Context Information for Joint
Entity and Relation Linking
Keywords: Entity linking · Relation linking · Joint entity and relation linking · Knowledge base question answering · Context information
1 Introduction
In knowledge base question answering (KBQA), linking the entities and relations
mentioned in a question to the correct ones in the KB is especially important. In
most entity linking systems like [2], entity/relation disambiguation is typically
performed by calculating the co-occurrence probability with other entities and
relations in an input question. However, when the input question is short, there is
often not enough information available from the mentions in the context. Therefore,
it is more effective to combine the input entities and relations to optimize the
disambiguation.
To this end, Dubey et al. [3] proposed a method for joint disambiguation of entities
and relations, called EARL (Entity and Relation Linker). Specifically, EARL uses
statistical ideas to calculate the number of relations around each entity, and this number
is then used as a feature to train the model used to generate matching scores for
candidate elements. The candidate element with the highest score is considered as the
best candidate. Meanwhile, the method formalizes the joint entity and relation linking
task into the Generalized Traveling Salesman Problem (GTSP) [4], and thus this NP-
hard problem can be approximately solved in polynomial time.
EARL regards entities and relations as nodes and edges in a graph, and effectively
utilizes the graph structure information. However, this method also has a few limita-
tions. For example, the non-entity/non-relation vocabularies in the context are not
exploited. Let us consider the following example: “Where was the father of Barack
Obama born?” According to EARL, all non-entity/non-relation words would be dis-
carded. Thus, it only recognizes “Barack Obama” as an entity and “born” as a relation,
and generates candidate elements for them separately. Similarly, for the question
“When was the father of Barack Obama born?”, it would produce the same result as the
previous question. However, the two questions are about location (“where”) and time
(“when”), respectively, so the answers to them are obviously different. EARL is
therefore ineffective in terms of relation disambiguation.
In view of the shortcomings of EARL, we propose an improved entity and relation
joint linking method, called EEARL (Extended Entity and Relation Linker). The basic
idea of EEARL is that, when dealing with relation linking, we consider not only the
impact of entities and relations in the context, but also the impact of other non-
entity/non-relation vocabularies (e.g., wh-words). More specifically, when a candidate
element is generated for a relation, we extract the domain, range, type, and the local
name of the relation URI, and merge them into the feature vector. Meanwhile, we use
the Long Short-Term Memory (LSTM) [5] network to fuse the important non-
entity/non-relation vocabularies in the context into the context vector, and calculate the
similarity between the two vectors as the final score.
We conducted performance evaluation experiments on two typical benchmark
datasets, LC-QuAD [6] and QALD-7 [7], to compare EEARL with EARL and several
baseline methods. The experimental results show that EEARL outperforms EARL and
the baseline methods in terms of both entity linking and relation linking accuracy.
In summary, the main contributions of this paper are listed as follows:
• We propose to leverage full context information for entity and relation linking.
• We use a word-level LSTM to capture non-entity/non-relation information in the
context, a character-level LSTM to complete the domain/range vector and a word-
level LSTM to represent the relation URI’s local name for feature modeling.
• We integrate our extensions into EARL, thereby forming EEARL and achieving
performance improvement in terms of entity/relation linking accuracy.
The remainder of this paper is organized as follows. Section 2 briefly reviews the
work related to entity and relation linking. On the basis of the idea of EARL, we
describe our improved method EEARL in Sect. 3. Section 4 reports the experimental
results of performance evaluation. Finally, we conclude this paper and discuss future
work in Sect. 5.
2 Related Work
Entity and relation linking has attracted many researchers’ attention. Ratinov et al. [8]
first used a local optimization scheme to link each entity phrase for obtaining a sub-
optimal solution with certain quality, and then calculated the association characteristics
between the candidate entity and other entities according to the suboptimal solution,
which was finally used to train the entire model. The S-MART model was proposed by
Yang and Chang [9], which resolves the limitation that multiple entity phrases cannot
overlap. In this method, all phrases are linked by the Forward-Backward algorithm to
ensure that, when the phrases overlap, at most one phrase points to a specific entity, and
the rest point to an empty entity. AGDISTIS [10] is a graph-based disambiguation
system based on the hop count distance between candidate objects for multiple entities
in a given text.
Due to its particularity, relation mapping generally needs to be analyzed for specific
situations. The word embedding model tackles the semantic heterogeneity in relation
mapping by learning a plausible vector representation for each word from a large
amount of corpus. Many models, e.g., ReMatch [11] and RelMatch [12], use WordNet
[13] similarity for relation mapping, and have achieved promising results.
Many existing KBQA systems employ a generic entity linking tool for entity
linking. Most of them are based on the context or other entities in the same sentence to
disambiguate. However, in practice, a question may contain few entities, which makes
it ineffective to disambiguate based on other entities. To overcome this limitation,
many KBQA systems have recently been developed. For example, Dubey et al. [3]
first link entities, then generate candidate relation linkings based on the result
of entity linking, and finally select the best candidate relations through, for
example, semantic similarity techniques. In this method, the relation mapping
depends on the
entity linking. If the entity linking goes wrong, the error would be transmitted and
amplified. In other systems like XSER [14], entity linking and relation mapping are
executed independently.
Compared with the above existing methods, our proposed method combines entity
linking and relation linking together, which not only solves the problem of lack of
information caused by considering entity linking and relation linking separately, but
also alleviates the problem of error amplification caused by backward propagation.
Additionally, it considers non-entity/non-relation vocabulary information in the context
to improve linking accuracy.
Given a KB, existing entity and relation linking methods usually identify the bound-
aries of words first, and then use the information of other entities/relations to generate a
list of candidate elements and disambiguate them. For the task of joint disambiguation,
the entities and relations are collectively considered when generating candidate ele-
ments, and more reliable candidate elements are generated based on the information of
entities and relations. The EEARL method proposed in this paper uses (1) the SENNA
tool [15] to extract keyword phrases from the input natural language questions, (2) the
character-level LSTM [5] to predict the entity/relation types of keyword phrases,
(3) ElasticSearch [16] to generate candidate element lists, and calculates scores based
on the connection density and context information to obtain the best entity/relation
mappings in the KB. The detailed processing flow chart for the EEARL method is
illustrated in Fig. 1, where the context information score computation is our main
contribution.
where f_BCE denotes the binary cross-entropy loss for learning whether the target
phrase is an entity or a relation, and f_ED denotes the square of the Euclidean
distance between the predicted embedding and the correct embedding of the tag. We
empirically select α as 0.25.
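The loss formula itself is missing from this excerpt; a plausible NumPy sketch, assuming the combined objective is f_BCE + α · f_ED with α = 0.25 (the exact weighting is an assumption):

```python
import numpy as np

ALPHA = 0.25  # empirically selected weight (assumed here to scale the distance term)

def combined_loss(p_pred, y_true, emb_pred, emb_true):
    """p_pred: predicted probability that the phrase is an entity (vs. relation).
    emb_pred/emb_true: predicted and correct tag embeddings."""
    eps = 1e-12
    f_bce = -np.mean(y_true * np.log(p_pred + eps)
                     + (1 - y_true) * np.log(1 - p_pred + eps))
    f_ed = np.sum((emb_pred - emb_true) ** 2)  # squared Euclidean distance
    return f_bce + ALPHA * f_ed
```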
For the keyword phrases extracted from the previous question, the LSTM network
predicts “father” and “born” as two relations and “Barack Obama” as an entity.
The number of connections, the number of hops, and the initial ranking of items are
further described as follows:
• Assume that for each entity/relation, we generate a candidate element list. Fol-
lowing the relevant practice in EARL [3], the hop distance is defined as the shortest
distance (i.e., the number of hops) between two elements in the subdivision graph
[3]. Furthermore, it is assumed that there is a lack of semantic connections between
two nodes in a KB that are too far apart, so any two nodes with hop count greater
than 2 are considered to be disconnected. Thus, the number of connections of a
candidate element is defined as the number of connections from it to all candidates
in other lists, divided by the number of keyword phrases.
• The hop count of a candidate element is defined as the sum of the distances from the
element to all the candidates in the other list, divided by the number of the keyword
phrases.
• The initial ranking of an element refers to the ranking of the element in the can-
didate list when Elastic Search [16] retrieves it.
Once the above three numbers are obtained, we use machine learning to train a
classification model for calculating the probability that a candidate becomes the best
candidate given a candidate element list. To better tackle the over-fitting problem and
shorten model training time, we use xgboost [18] as the classifier. When a candidate
element is given, its three numbers are input into the model, and a value between 0 and
1 is obtained as the connection density score, denoted by score_connect_density.
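A minimal sketch of this density-based scoring model with the xgboost library; the training arrays are placeholders for the three features described above:

```python
import numpy as np
import xgboost as xgb

# Each row: (number of connections, hop count, initial ranking) of a candidate;
# label 1 iff the candidate was the correct mapping (placeholder data here).
X = np.random.rand(200, 3)
y = np.random.randint(0, 2, size=200)

clf = xgb.XGBClassifier(n_estimators=100, max_depth=4)
clf.fit(X, y)

candidate = np.array([[0.8, 1.5, 3.0]])
score_connect_density = clf.predict_proba(candidate)[0, 1]  # value in (0, 1)
```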
Context Scoring. So far, EEARL only considers the connections between all entities
and relations in natural language questions, without using other non-entity/non-relation
information, which may make the relation linking ambiguity problem difficult to solve
effectively. For general natural language questions, the sentences usually contain wh-
words like “Where”, “When” and “Who”, and these wh-words are effective informa-
tion for solving the ambiguity of the relation. For example, “Where” can restrict the
attributes of certain relations to "place", and "When" can limit them to "time". Therefore,
such information in the context can be leveraged to help improve the accuracy of
relation linking.
Because the word-level model preserves the semantics of words in sentences well,
EEARL employs a word-level LSTM [5] to convert each non-entity/non-relation
vocabulary in the context into a context vector. The training data consist of sequences
of words and label vectors. Specifically, we regard all the non-entity/non-relation
vocabularies in the question as context, and then use GloVe [19] to convert these words
into vectors. Furthermore, a word-level LSTM is used to compress many such vectors
into one vector, that is, the context vector. The model for generating the context vector
is shown in Fig. 3.
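A minimal PyTorch sketch of the context-vector model of Fig. 3, assuming the GloVe vectors of the non-entity/non-relation words are already looked up; all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Compress the GloVe vectors of the non-entity/non-relation words
    (e.g. "Where", "was", "the", ...) into a single context vector."""
    def __init__(self, glove_dim=100, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(glove_dim, hidden_dim, batch_first=True)

    def forward(self, word_vectors):       # (batch, seq_len, glove_dim)
        _, (h_n, _) = self.lstm(word_vectors)
        return h_n[-1]                     # (batch, hidden_dim) context vector

encoder = ContextEncoder()
words = torch.randn(1, 4, 100)             # 4 context words, GloVe-100 (dummy data)
context_vec = encoder(words)
```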
As for feature vectors, we use the embedding models and the LSTM network to
convert the domain, range, type, and the local name of relation URI into four vectors
respectively. Specifically, we use the GloVe embedding model to transfer the domain,
range, and type into three vectors. For the local name of the relation URI, we first split
it into segments, and then use GloVe to generate word vectors, and finally adopt a
LSTM to compress the word vectors into the fourth vector. Finally, we combine the
four vectors to form the feature vector. The feature vector generation model is shown in
Fig. 4.
30 Y. Zhao et al.
Particularly, for a relation element without domain and/or range, we mine the
attributes of the semantically similar words with the relation name to complement the
missing part of the feature vector representing the relation. Specifically, we use the
character embedding based LSTM [20] to implement missing attributes, because this
model can handle the OOV words more effectively. The network used to complete the
domain/range vectors is illustrated in Fig. 5.
Given a natural language question, we use the LSTM network to predict the context
vector (Fig. 3), and adopt the feature vector generation model (Fig. 4) to represent the
feature vector for each candidate element. After that, we calculate the similarity
between the context vector and the feature vector as the context score, denoted by
score_context. This indicates that, in addition to the original connection
density-based score, we also consider the impact of context information on relation
linking. The final score of a candidate is then obtained by combining
score_connect_density with score_context.
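How the two scores are fused is not fully specified in this excerpt; a simple additive combination, with the context score computed as cosine similarity as described above, might look as follows (a sketch, not the authors' exact formula):

```python
import torch
import torch.nn.functional as F

def final_score(context_vec, feature_vec, score_connect_density):
    """Context score = cosine similarity between the question's context vector
    and the candidate relation's feature vector; fused additively (assumed)."""
    score_context = F.cosine_similarity(context_vec, feature_vec, dim=-1)
    return score_connect_density + score_context.item()
```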
Evaluation Tasks and Metric. We evaluated and compared EEARL with EARL [3]
and several baseline methods on entity linking and relation linking. For entity linking,
the chosen baseline methods include AGDISTIS [10], Spotlight [21], and Babelfy [22].
The accuracy results of relation linking are shown in Table 2. Observing the results,
we can find that EEARL is better than EARL (accuracy increased by 3%) and far better
than the two baseline methods in terms of relation linking accuracy.
Additionally, Table 2 indicates that the accuracy results of relation linking on LC-
QuAD and QALD-7 are quite different. This is due to the fact that the questions in LC-
QuAD are more complex than those in QALD-7. Besides, Tables 1 and 2 confirm that
adaptive learning can significantly help improve the performance.
4.3 Discussion
According to the results, the improvement of EEARL was mainly reflected in the
treatment of wh-words. For example, given a question “Where were Justina Machado
and John Englehard born?” in LC-QuAD, EARL actually ignored “Where” and thus
incorrectly linked “born” to dbp:birthDate. Differently, EEARL considered the
effect of “Where” on “born” and correctly linked “born” to dbp:birthPlace.
On the other hand, our method still has a few limitations. For instance, when
handling relation linking, it is difficult to distinguish between elements of class type
and property type, resulting in errors in relation mapping. Furthermore, our method
cannot deal with inference and restrictions. For questions such as “Who is the tallest
man in China?”, EEARL fails to deal with the “tallest” restriction and only links “man”
to dbr:Person. However, this problem can be deferred until the subsequent query
generation phase [28], thereby simplifying the linking phase.
5 Conclusion
In this paper, we have proposed an improved joint entity and relation linking method
called EEARL, which leverages full context information to improve linking accuracy.
When generating relation linking, EEARL considers not only the impact of entities and
relations co-occurred in the natural language questions, but also the impact of other
non-entity/non-relation vocabularies in the questions. Our experimental results on two
benchmark datasets show that EEARL outperforms EARL and several baseline
methods in terms of entity linking and relation linking accuracy. In future work, we
look forward to improving the accuracy of entity/relation prediction and exploring
external information outside KBs.
References
1. Berant, J., Chou, A., Frostig, R., Liang, P.: Semantic parsing on Freebase from question-
answer pairs. In: Proceedings of the 2013 Conference on Empirical Methods in Natural
Language Processing, EMNLP 2013, pp. 1533–1544. Association for Computational
Linguistics (2013). https://www.aclweb.org/anthology/D13-1160
2. Kolitsas, N., Ganea, O.-E., Hofmann, T.: End-to-end neural entity linking. In: Proceedings
of the 22nd Conference on Computational Natural Language Learning, CoNLL 2018,
pp. 519–529. Association for Computational Linguistics (2018). https://aclweb.org/
anthology/papers/K/K18/K18-1050/
3. Dubey, M., Banerjee, D., Chaudhuri, D., Lehmann, J.: EARL: joint entity and relation
linking for question answering over knowledge graphs. In: Vrandečić, D., et al. (eds.) ISWC
2018. LNCS, vol. 11136, pp. 108–126. Springer, Cham (2018). https://doi.org/10.1007/978-
3-030-00671-6_7
4. Pintea, C.-M., Pop, P.C., Chira, C.: The generalized traveling salesman problem solved with
ant algorithms. Complex Adapt. Syst. Model. 5, 8 (2017). https://doi.org/10.1186/s40294-
017-0048-9
5. Lukovnikov, D., Fischer, A., Lehmann, J., Auer, S.: Neural network-based question
answering over knowledge graphs on word and character level. In: Proceedings of the 26th
International Conference on World Wide Web, WWW 2017, pp. 1211–1220. International
World Wide Web Conferences Steering Committee (2017). https://doi.org/10.1145/
3038912.3052675
6. Trivedi, P., Maheshwari, G., Dubey, M., Lehmann, J.: LC-QuAD: a corpus for complex
question answering over knowledge graphs. In: d’Amato, C., et al. (eds.) ISWC 2017.
LNCS, vol. 10588, pp. 210–218. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-
68204-4_22
7. Usbeck, R., Ngomo, A.-C.N., Haarmann, B., Krithara, A., Röder, M., Napolitano, G.: 7th
open challenge on question answering over linked data (QALD-7). In: Dragoni, M., Solanki,
M., Blomqvist, E. (eds.) SemWebEval 2017. CCIS, vol. 769, pp. 59–69. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-69146-6_6
8. Ratinov, L., Roth, D., Downey, D., Anderson, M.: Local and global algorithms for
disambiguation to Wikipedia. In: The 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies, Proceedings of the Conference,
pp. 1375–1384. Association for Computational Linguistics (2011). https://www.aclweb.org/
anthology/P11-1138
9. Yang, Y., Chang, M.-W.: S-MART: novel tree-based structured learning algorithms applied
to tweet entity linking. In: Proceedings of the 53rd Annual Meeting of the Association for
Computational Linguistics, ACL 2015, vol. 1, pp. 504–513. The Association for Computer
Linguistics (2015). https://www.aclweb.org/anthology/P15-1049
10. Usbeck, R., et al.: AGDISTIS - graph-based disambiguation of named entities using linked
data. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 457–471. Springer, Cham
(2014). https://doi.org/10.1007/978-3-319-11964-9_29
11. Mulang, I.O., Singh, K., Orlandi, F.: Matching natural language relations to knowledge
graph properties for question answering. In: Proceedings of the 13th International
Conference on Semantic Systems, SEMANTICS 2017, pp. 89–96. ACM (2017). https://
doi.org/10.1145/3132218.3132229
12. Singh, K., et al.: Capturing knowledge in semantically-typed relational patterns to enhance
relation linking. In: Proceedings of the Knowledge Capture Conference, K-CAP 2017,
Article No. 31, pp. 31:1–31:8. ACM (2017). https://doi.org/10.1145/3148011.3148031
13. Miller, G.A., Fellbaum, C.: WordNet then and now. Lang. Res. Eval. 41(2), 209–214 (2007).
https://doi.org/10.1007/s10579-007-9044-6
14. Xu, K., Zhang, S., Feng, Y., Zhao, D.: Answering natural language questions via phrasal
semantic parsing. In: Zong, C., Nie, J.Y., Zhao, D., Feng, Y. (eds.) Natural Language
Processing and Chinese Computing. CCIS, vol. 496, pp. 333–344. Springer, Heidelberg
(2014). https://doi.org/10.1007/978-3-662-45924-9_30
15. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.P.: Natural
language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).
https://dl.acm.org/citation.cfm?id=2078186
16. Akdal, B., ÇabukKeskin, Z.G., Ekinci, E.E., Kardas, G.: Model-driven query generation for
ElasticSearch. In: Proceedings of the 2018 Federated Conference on Computer Science and
Information Systems, FedCSIS 2018, pp. 853–862. IEEE (2018). https://doi.org/10.15439/
2018F218
17. Pinter, Y., Guthrie, R., Eisenstein, J.: Mimicking word embeddings using subword RNNs.
In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language
Processing, EMNLP 2017, pp. 102–112. Association for Computational Linguistics (2017).
https://www.aclweb.org/anthology/D17-1010
18. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the
22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
KDD 2016, pp. 785–794. Association for Computing Machinery (2016). https://doi.org/10.
1145/2939672.2939785
19. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation.
In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing, EMNLP 2014, pp. 1532–1543. Association for Computational Linguistics
(2014). https://www.aclweb.org/anthology/D14-1162
20. Zou, L., Huang, R., Wang, H., Yu, J.X., He, W., Zhao, D.: Natural language question
answering over RDF - a graph data driven approach. In: Proceedings of the ACM SIGMOD
International Conference on Management of Data, SIGMOD 2014, pp. 313–324. ACM
(2014). https://doi.org/10.1145/2588555.2610525
21. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia spotlight: shedding light on
the web of documents. In: Proceedings the 7th International Conference on Semantic
Systems, I-SEMANTICS 2011, pp. 1–8. ACM (2011). https://doi.org/10.1145/2063518.
2063519
22. Moro, A., Raganato, A., Navigli, R.: Entity linking meets word sense disambiguation: a
unified approach. Trans. Assoc. Comput. Linguist. 2, 231–244 (2014). https://transacl.org/
ojs/index.php/tacl/article/view/291
23. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P.,
et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.
org/10.1007/978-3-319-46523-4_30
24. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International
Conference on Learning Representations, ICLR 2015, Conference Track Proceedings.
https://arxiv.org/abs/1412.6980
25. Boiński, T., Szymański, J., Dudek, B., Zalewski, P., Dompke, S., Czarnecka, M.: DBpedia
and YAGO based system for answering questions in natural language. In: Nguyen, N.T.,
Pimenidis, E., Khan, Z., Trawiński, B. (eds.) ICCCI 2018. LNCS (LNAI), vol. 11055,
pp. 383–392. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98443-8_35
36 Y. Zhao et al.
26. Unger, C., Ngomo, A.-C.N., Cabrio, E.: 6th open challenge on question answering over
linked data (QALD-6). In: Sack, H., Dietze, S., Tordai, A., Lange, C. (eds.) SemWebEval
2016. CCIS, vol. 641, pp. 171–177. Springer, Cham (2016). https://doi.org/10.1007/978-3-
319-46565-4_13
27. Speck, R., Ngomo, A.-C. N.: Ensemble learning of named entity recognition algorithms
using multilayer perceptron for the multilingual web of data. In: Proceedings of the
Knowledge Capture Conference, K-CAP 2017, Article No. 26, pp. 26:1–26:4. ACM (2017).
https://doi.org/10.1145/3148011.3154471
28. Zafar, H., Napolitano, G., Lehmann, J.: Formal query generation for question answering
over knowledge bases. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843,
pp. 714–728. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_46
Community Detection in Knowledge
Graph Network with Matrix
Factorization Learning
1 Introduction
In May 2012, Google released the Knowledge Graph (KG) project and
announced that it would build the next generation of intelligent search engines
based on KG. The main purpose of the Knowledge Graph was to transform search from
string matching to data association, and to improve the relevance of knowledge query
results and the overall search experience by using graph connections and contextual
data. The success of the Knowledge Graph and its application to semantic
technology has led to the reuse of this term to describe similar projects in semantic
research. Key technologies of Knowledge Graphs include extracting entities and
their attribute information from web pages on the Internet, as well as building
relationship graphs or networks between named entities [28].
Knowledge Graphs capture facts related to people, processes, applications,
data, and things, and the relationships among them. They also capture evidence
that can be used to assess the strengths of these relationships and the contexts
from which they are derived. With the extensive development and application of
artificial intelligence technology in scientific research and practice, Knowledge
Graphs have been developing rapidly as an important subject of artificial
intelligence, promoting in-depth integration with machine learning applications
and innovation.
2 Related Works
A knowledge graph describes each concept as an entity with its related
'attribute–value' relations; by connecting the relationships between different
entities, we can constitute a knowledge structure graph or network. In a knowledge
graph network, each node represents an 'entity' of the real world, and each edge
represents a relationship between entities.
In artificial-intelligence-related solutions, we may combine methods of network science with knowledge graph models for further research. Adding knowledge models to network science helps in understanding the semantic features of the network structure and the potential tacit knowledge. In knowledge graph research, we can likewise introduce network-scientific analysis, such as community detection, to obtain more quantitative data on the interaction and reasoning of knowledge, enhancing topological simplification of knowledge graphs, reducing overall computation cost, and improving the effectiveness of visual display.
Yang et al. [41] propose that network representation learning aims at learning a distributed vector representation for each vertex in a network, which is increasingly recognized as an important aspect of network analysis. Most network representation learning methods investigate only network structure. In reality, network vertices contain rich information (such as text) that cannot be well exploited by the algorithmic frameworks of typical representation learning methods.
According to the explanation of Bales et al. [3], networks generated by natural language have topological properties common to other natural phenomena. Therefore, through small-world features and scale-free topologies, as well as various network analysis methods, the development of controlled vocabularies in large-scale knowledge graphs and feature description in given fields have been greatly improved.
Matrix factorization (decomposition) learning is one of the most widely used methods in machine learning. Its main goal is to express the original data matrix as the product of two or more low-rank matrices; after decomposition, the rank of each factor is far less than the rank of the original matrix, and the low-rank, low-dimensional factors can then be applied to all kinds of classification and clustering tasks. In recent years, more and more researchers have paid attention to matrix factorization applications, which can efficiently find hidden latent factors or missing values in a prediction matrix by decomposing the data into compact and effective representations.
In community detection applications, the relations between nodes in a graph-based network are nonnegative due to their physical nature. The adjacency matrix A, as well as the Laplacian matrix, completely represents the structure of the network, and A is naturally non-negative. Based on the consideration that there is no physical meaning in reconstructing a network with a negative adjacency matrix, using Nonnegative Matrix Factorization (NMF) to obtain new representations of the network under non-negativity constraints can be very productive in community analysis [18,42].
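To make this concrete, the following minimal Python sketch (our own illustration, not the authors' BSNMF implementation; the function name and toy graph are ours) applies standard multiplicative-update NMF to a small adjacency matrix and reads community assignments from the largest entries of the factor rows:

import numpy as np

def nmf_communities(A, k, n_iter=500, eps=1e-9, seed=0):
    # Factorize a nonnegative adjacency matrix A (n x n) as A ~ U @ V.T with
    # multiplicative updates for the squared Frobenius objective, then assign
    # each node to the community with the largest entry in its row of U.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        U *= (A @ V) / (U @ (V.T @ V) + eps)
        V *= (A.T @ U) / (V @ (U.T @ U) + eps)
    return U.argmax(axis=1)

# Toy graph: two 3-node cliques joined by a single bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(nmf_communities(A, k=2))  # expected: the two cliques, e.g. [0 0 0 1 1 1]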
Semantic networks are a typical type of network data representation incor-
porating linguistic information that describes concepts or objects and the
when constructing RDF graphs of social networks. The algorithm not only realizes community discovery, but also labels each community using the tags that people apply during social tagging and the semantic relations inferred between those tags.
P(z) ∼ z^{−τ}    (1)
Corominas-Murtra et al. [8] and Thompson et al. [35] respectively carried out quantitative network analysis on semantic web ontology data, semantic database data of language education, and Wikipedia page data.
We select example KG data from Google, part of the Google programming contest web data released in 2002¹, for quantitative network analysis; nodes represent web pages and directed edges represent hyperlinks between them. The total number of nodes in these data is 875,713, and the number of edges is 5,105,039. The computed average clustering coefficient is 0.5143, and the maximum diameter is 21. We use the MATLAB PLFIT toolbox² to fit a power law, obtaining Formula (1). The fitted power-law distribution is marked by a dotted line in Fig. 2, with distribution parameter τ ≈ 2.9; thus we can confirm that general knowledge graph networks exhibit the scale-free property.
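The paper performs this fit in MATLAB with the PLFIT toolbox; as an illustrative alternative, the following Python sketch (assuming the third-party powerlaw package and a tiny stand-in edge list of ours) estimates the exponent τ from a degree sequence:

from collections import Counter
import powerlaw  # third-party package: pip install powerlaw

def degree_sequence(edges):
    # Count how many edges touch each node in an undirected edge list.
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())

# Tiny stand-in sample; in the paper's setting, read the SNAP web-Google
# edge list (footnote 1) here instead.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5)]
fit = powerlaw.Fit(degree_sequence(edges), discrete=True)
print(fit.power_law.alpha, fit.power_law.xmin)  # exponent tau and lower cutoff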
X ≈ UV^T    (2)
¹ https://snap.stanford.edu/data/web-Google.html.
² http://www.santafe.edu/~aaronc/powerlaws.
X = UV^T + E    (3)
5 Experiment Results
5.1 Experiments on the DBLP Network
³ http://snap.stanford.edu/data/com-DBLP.html.
2. Spectral Clustering (SC) makes use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions [22].
3. Louvain (BGLL) can compute high-modularity partitions and hierarchies of large networks quickly [5,24].
4. Greedy modularity optimisation (GMO), an algorithm based on the multi-scale algorithm but optimised for modularity [20].
5. Non-negative Matrix Factorization (NMF), clustering-based, used for community detection [40].
6. Symmetric Non-negative Matrix Factorization (SNMF) for undirected networks [37].
7. Bayesian Non-negative Matrix Factorization (BNMF) with Poisson likelihood [7].
8. Bayesian Symmetric Non-negative Matrix Factorization (BSNMF) with Poisson likelihood.
We use three metrics, accuracy (AC), normalized mutual information (NMI), and modularity, to evaluate community detection performance in each experiment [6,24,34,40]. Experimental results are evaluated by comparing the community label of each sample node with the label provided by the ground-truth network.
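For readers implementing the evaluation, here is a minimal Python sketch of the AC and NMI computations (our own illustration: AC requires a best one-to-one label matching, computed here with the Hungarian algorithm from SciPy; NMI comes from scikit-learn):

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    # AC: find the best one-to-one mapping between predicted and true labels
    # (Hungarian algorithm), then return the fraction of correctly mapped nodes.
    labels_p, labels_t = np.unique(y_pred), np.unique(y_true)
    cost = np.zeros((len(labels_p), len(labels_t)))
    for i, lp in enumerate(labels_p):
        for j, lt in enumerate(labels_t):
            cost[i, j] = -np.sum((y_pred == lp) & (y_true == lt))
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 0, 0, 0, 0])
print(clustering_accuracy(y_true, y_pred))           # 5/6, about 0.833
print(normalized_mutual_info_score(y_true, y_pred))  # NMI in [0, 1]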
For the matrix factorization learning methods NMF, SNMF, BNMF, and BSNMF, we run 10 independent experiments, each iterating 500 times. The initial community count k of BNMF and BSNMF is set to n/10. For SC, NMF, and SNMF, k is set to the ground-truth community number 4,946; the other 5 methods compute the community count spontaneously within the algorithm itself, e.g., our BSNMF method with ARD captures 1,788 communities.
Table 1 shows the detailed AC, NMI, and Q values on the DBLP undirected network. From this result, we can see that BSNMF, with its symmetric transformation and Bayesian inference, improves all three indicators over SNMF and BNMF. Our BSNMF method achieves the best performance, especially on the AC and NMI metrics. The GMO algorithm attains the highest modularity Q, 0.9217.
⁴ http://rss.cnki.net/rss/.
Methods   Modularity
3-Clique  0.3579
GN        0.5530
BGLL      0.8294
GMO       0.9165
NMF       0.4209
SNMF      0.8165
BSNMF     0.9664
between the two points, and the weight of an edge is the number of papers co-authored by the two authors. Accordingly, a large-scale scientific collaboration network with 7,625 nodes and 12,672 edges is formed, in which the average node degree is 1.66; that is, over the five years each author carried out joint publication with 1.66 peers on average.
Some researchers publish all of their papers as single authors and therefore appear in the co-author network as isolated nodes. There are in total 665 isolated nodes in this network, so before applying community detection we delete these isolated nodes first, so that they have no further effect on the result for the whole network. The new cooperative network X has 6,960 nodes, while the number of edges is still 12,672. We further compare the BSNMF method with the other six methods for semantically processing co-author networks of Chinese periodicals⁴. We also compare with the clique percolation method (CPM), a complete-subgraph filtering method that uses 3-cliques to discover complete subgraphs in the network, treating undiscovered nodes as independent nodes when calculating overall modularity.
On the generated cooperative network, we compare the modularity results of the existing community detection methods. The BSNMF program sets the initial community number to one fifth of the total number of nodes.
6 Conclusion
Cross-application with different disciplines can greatly promote the continuous development of knowledge graph technology. Many research methods and results have been produced in the study of quantitative network analysis and community detection.
Whether these techniques can be effectively combined with knowledge graph applications in knowledge engineering, especially in the scientific and information fields, to improve the search, discovery, and collaboration of knowledge needs to be further explored by researchers.
Acknowledgments. This work was supported by NSFC (Grant No. 61772330), China
Next Generation Internet IPv6 project (Grant No. NGII20170609), and the Social
Science Planning of Shanghai (Grant No. 2018BTQ002).
References
1. Knowledge graph development report (2018). http://cips-upload.bj.bcebos.com/
KGDevReport2018.pdf
2. Aggarwal, C.C.: Social Network Data Analytics. Springer, New York (2011).
https://doi.org/10.1007/978-1-4419-8462-3
3. Bales, M.E., Johnson, S.B.: Graph theoretic modeling of large-scale semantic net-
works. J. Biomed. Inform. 39(4), 451–464 (2006)
4. Bhatt, S., et al.: Knowledge graph enhanced community detection and character-
ization. In: Proceedings of the Twelfth ACM International Conference on Web
Search and Data Mining, pp. 51–59. ACM (2019)
5. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of
communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10008
(2008)
6. Cai, D., He, X., Han, J., Huang, T.: Graph regularized nonnegative matrix fac-
torization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33(8),
1548–1560 (2011)
7. Cemgil, A.T.: Bayesian inference for nonnegative matrix factorisation models.
Comput. Intell. Neurosci. 2009, 1–17 (2009). https://doi.org/10.1155/2009/785152
8. Corominas-Murtra, B., Valverde, S., Solé, R.: The ontogeny of scale-free syntax net-
works: phase transitions in early language acquisition. Adv. Complex Syst. 12(03),
371–392 (2009)
9. Council, N.R.: Network Science. The National Academies Press, Washing-
ton, DC (2005). https://doi.org/10.17226/11516, https://www.nap.edu/catalog/
11516/network-science
10. Danon, L., Díaz-Guilera, A., Arenas, A.: The effect of size heterogeneity on com-
munity identification in complex networks. J. Stat. Mech. Theory Exp. 2006(11),
P11010 (2006)
11. Erétéo, G., Gandon, F., Buffa, M.: SemTagP: semantic community detection in
folksonomies. In: Proceedings of the 2011 IEEE/WIC/ACM International Confer-
ences on Web Intelligence and Intelligent Agent Technology, vol. 01, pp. 324–331.
IEEE Computer Society (2011)
12. Fevotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the
beta-divergence. Neural Comput. 23(9), 2421–2456 (2011)
13. Han, X., Chen, D., Yang, H.: A semantic community detection algorithm based on
quantizing progress. Complexity 2019, 13 (2019)
14. Henk, V., Vahdati, S., Nayyeri, M., Ali, M., Yazdi, H.S., Lehmann, J.: Metaresearch
recommendations using knowledge graph embeddings (2019)
15. Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic
mapping matrix. In: Proceedings of the 53rd Annual Meeting of the Association for
Computational Linguistics and the 7th International Joint Conference on Natural
Language Processing (Volume 1: Long Papers), vol. 1, pp. 687–696 (2015)
16. Juanzi, L., Lei, H.: Review of knowledge graph research. J. Shanxi Univ. (Nat.
Sci. Ed.) 40(03), 454–459 (2017)
17. Kianian, S., Khayyambashi, M.R., Movahhedinia, N.: Semantic community detec-
tion using label propagation algorithm. J. Inf. Sci. 42(2), 166–178 (2016)
18. Lai, D., Wu, X., Lu, H., Nardini, C.: Learning overlapping communities in complex
networks via non-negative matrix factorization. Int. J. Mod. Phys. C 22(10), 1173–
1190 (2011)
19. Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative
analysis. Phys. Rev. E 80(5), 056117 (2009)
20. Le Martelot, E., Hankin, C.: Fast multi-scale detection of relevant communities in
large-scale networks. Comput. J. 56(9), 1136–1150 (2013)
21. Li, M., Lee, W.C., Sivasubramaniam, A.: Semantic small world: an overlay network
for peer-to-peer search. In: Proceedings of the 12th IEEE International Conference
on Network Protocols, ICNP 2004, pp. 228–238. IEEE (2004)
22. Lu, H., Fu, Z., Shu, X.: Non-negative and sparse spectral clustering. Pattern Recog-
nit. 47(1), 418–426 (2014)
23. Martinez-Rodriguez, J.L., Lopez-Arevalo, I., Rios-Alvarado, A.B., Li, X.: A brief
comparison of community detection algorithms over semantic web data. In: ISW-
LOD@ IBERAMIA, pp. 34–44 (2016)
24. Newman, M.E.: Modularity and community structure in networks. Proc. Natl.
Acad. Sci. 103(23), 8577–8582 (2006)
25. Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks.
Phys. Rev. Lett. 86(14), 3200 (2001)
26. Paulheim, H.: Machine learning with and for semantic web knowledge graphs. In:
d’Amato, C., Theobald, M. (eds.) Reasoning Web 2018. LNCS, vol. 11078, pp.
110–141. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00338-8_5
27. Qi, W., Fucai, C., Ruiyang, H., Zhengchaos, C.: Community detection in hetero-
geneous network with semantic paths. Acta Electron. Sin. 6, 030 (2016)
28. Qiao, L., Yang, L., Hong, D., Yao, L., et al.: Knowledge graph construction tech-
niques. J. Comput. Res. Dev. 53(3), 582–600 (2016)
29. Rörden, J., Revenko, A., Haslhofer, B., Blumauer, A.: Network-based knowledge
graph assessment. In: Proceedings of the Posters and Demos Track of the 13th
International Conference on Semantic Systems - SEMANTiCS 2017 (2017)
30. Schmidt, M.N., Laurberg, H.: Nonnegative matrix factorization with gaussian pro-
cess priors. Comput. Intell. Neurosci. 2008, 3 (2008)
31. Shi, X., Lu, H., He, Y., He, S.: Community detection in social network with pair-
wisely constrained symmetric non-negative matrix factorization. In: Proceedings
of the 2015 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining 2015, ASONAM 2015, pp. 541–546. ACM, New York (2015)
32. Steyvers, M., Tenenbaum, J.B.: The large-scale structure of semantic networks:
statistical analyses and a model of semantic growth. Cogn. Sci. 29(1), 41–78 (2005)
33. Stokman, F.N., de Vries, P.H.: Structuring knowledge in a graph. In: van der
Veer, G.C., Mulder, G. (eds.) Human-Computer Interaction, pp. 186–206. Springer,
Heidelberg (1988). https://doi.org/10.1007/978-3-642-73402-1_12
34. Tang, L., Liu, H.: Community detection and mining in social media. Synth. Lect.
Data Min. Knowl. Discov. 2(1), 1–137 (2010)
35. Thompson, G.W., Kello, C.: Walking across wikipedia: a scale-free network model
of semantic memory retrieval. Front. Psychol. 5, 86 (2014)
36. Travers, J., Milgram, S.: The small world problem. Psychol. Today 1(1), 61–67
(1967)
37. Wang, F., Li, T., Wang, X., Zhu, S., Ding, C.: Community discovery using non-
negative matrix factorization. Data Min. Knowl. Discov. 22(3), 493–521 (2011)
38. Xia, Z., Bu, Z.: Community detection based on a semantic network. Knowl. Based
Syst. 26, 30–39 (2012)
39. Xiaohua, S., Hongtao, L.: Research of community detection in scientific cooperation
network with Bayesian NMF. Data Anal. Knowl. Disc. 1(09), 49–56 (2017)
40. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix
factorization. In: Proceedings of the 26th Annual International ACM SIGIR Con-
ference on Research and Development in Information Retrieval, pp. 267–273. ACM
(2003)
41. Yang, C., Liu, Z., Zhao, D., Sun, M., Chang, E.: Network representation learning
with rich text information. In: Twenty-Fourth International Joint Conference on
Artificial Intelligence (2015)
42. Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative
matrix factorization approach. In: Proceedings of the Sixth ACM International
Conference on Web Search and Data Mining, pp. 587–596. ACM (2013)
43. Yang, J., Leskovec, J.: Defining and evaluating network communities based on
ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)
44. Yu, X., Jing, Y., Zhiqiang, X.: A semantic overlapping community detecting algo-
rithm in social network based on random walk. J. Comput. Res. Dev. 52(2), 499–
511 (2015)
45. Zhang, H.: The scale-free nature of semantic web ontology. In: Proceedings of the
17th International Conference on World Wide Web, pp. 1047–1048. ACM (2008)
A Survey of Relation Extraction of Knowledge Graphs
Abstract. With the widespread use of big data, the knowledge graph has become a new research hotspot. It is used in intelligent question answering, recommendation systems, map navigation, and so on. Constructing a knowledge graph involves ontology construction, data annotation, relation extraction, and ontology inspection. Relation extraction addresses the problem of semantically linking entities, which is of great significance to many natural language processing applications. Research related to relation extraction has gained momentum in recent years, necessitating a comprehensive survey to offer a bird's-eye view of the current state of relation extraction. In this paper, we discuss the development of relation extraction and classify the relation extraction algorithms of recent years. Furthermore, we discuss deep learning, reinforcement learning, active learning, and transfer learning. By analyzing the basic principles of supervised learning, unsupervised learning, semi-supervised learning, and distant supervision, we elucidate the characteristics of different relation extraction algorithms and give potential research directions for the future.
1 Introduction¹
¹ The Science Foundation for Youth Science and Technology Innovation of Nanjing University of Aeronautics and Astronautics under Grants NJ20160028, NT2018028, and NS2018057.
Entities include countries and cities, and attributes include population and area. An example of an (entity, relationship, entity) triple is (Japan, capital, Tokyo). A concrete (entity, attribute, attribute value) expression is (Japan, population, 100 million).
The relationships in the syntactic tree include "character of" and "written by". Since "article" is connected to the preposition "of" through a 'pobj' edge, it is concluded that the object of "character of" is the entity "article". The closest word to "character of" is "who", so it is the subject of "character of".
Among common relation extraction algorithms, pattern matching and dictionary-driven methods require experts with professional skills to build large-scale knowledge bases and craft rules manually, which is time-consuming, laborious, inefficient, and poorly portable [17]. To overcome these shortcomings, methods based on machine learning were developed. At present, relation extraction methods based on machine learning are widely used in various fields. These methods treat relation extraction as a simple classification problem: basic judgments are made with modest manual effort, and then a classifier is constructed. The introduction of adaptive information extraction, open information extraction, and other techniques has promoted the rapid development of machine learning. Relation extraction methods based on machine learning can be divided into four categories according to whether the training data are labeled, mainly supervised learning, unsupervised learning, semi-supervised learning, and distant supervision. Table 1 analyzes and compares the four kinds of machine-learning-based relation extraction algorithms. Table 2 analyzes and compares classical relation extraction algorithms in machine learning.
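As an illustration of the "relation extraction as classification" view, here is a toy sketch under our own assumptions (the example sentences, entity markers, and labels are invented for illustration and come from none of the surveyed papers):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example is a sentence with the entity pair marked, plus a
# relation label; real systems add lexical and syntactic features, while
# this sketch uses word/bigram counts only.
sentences = [
    "<e1>Tokyo</e1> is the capital of <e2>Japan</e2>.",
    "<e1>Paris</e1> is the capital of <e2>France</e2>.",
    "<e1>Japan</e1> has a population of <e2>100 million</e2>.",
    "<e1>Germany</e1> has a population of <e2>83 million</e2>.",
]
labels = ["capital", "capital", "population", "population"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)
print(clf.predict(["<e1>Rome</e1> is the capital of <e2>Italy</e2>."]))
# expected: ['capital']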
Machine learning has also undergone four important waves: deep learning, reinforcement learning, transfer learning, and active learning. The birth of the corresponding algorithms has contributed greatly to the improvement of machine learning performance and scalability, making the scope of machine learning applications wider and wider. Active learning is a frontier field of machine learning and relation extraction. It is a learning method suited to scenarios with small amounts of labeled data and large amounts of unlabeled data, and it is often applied in semi-supervised or weakly supervised settings, together with transfer learning. For example, Denis Gudovskiy and Alec Hodgkinson proposed an EBA attention mechanism combining deep learning and active learning based on recent DNN methods [18], which improved the accuracy of feature extraction on the MNIST and SVHN datasets and achieved great results. Deep learning is often combined with reinforcement learning to form deep reinforcement learning. For example, in 2019, Wang et al. designed a new model-free neural network architecture [19], which has the advantage of being easy to integrate with reinforcement learning.
3 Unsupervised Learning
Unsupervised learning does not require labeled data and can intelligently extract the entity relations in the data. The simple goal of unsupervised learning is to build a base model from the data and train the algorithm to generate its own data instances. Since the training data have no labels, it can make up for the deficiencies of manual relation extraction and has the characteristics of strong adaptability and high efficiency. An unsupervised relation extraction algorithm was first proposed by Hasegawa in 2004 [20], a clustering-based method for discovering relations among named entities without supervision. Since then, many unsupervised algorithms have improved on this approach.
In 2013, Socher, Chen et al. [21], addressing the lack of reasoning ability over discrete entities and relationships in a knowledge base, proposed an expressive neural tensor network model suitable for reasoning over two-entity relationships, further proving the applicability of unsupervised learning algorithms. They reproduced the methods of Bordes et al. [22] and Jenatton et al. [23], and improved and optimized upon them, showing that when word vectors are initialized with representations learned from an unsupervised corpus, the accuracy of the resulting models is higher. Traditional unsupervised learning is basically clustering; most such methods are based on statistics and have no feedback ability. Therefore, Heck et al. combined statistical methods with the deep semantic methods of the AI community and proposed an unsupervised semantic parsing algorithm based on large-scale semantic knowledge graphs, which needs no semantic pattern design, data collection, or manual annotation [24]. In addition, a graph crawling algorithm for data mining was proposed. In their experiments, they combined the two methods and achieved results similar to those of semantic parsers trained with supervised annotations. In 2019, Luus et al. skillfully combined unsupervised algorithms with interactive transfer learning and active learning tools [25]. They selected images with 1,000 labels as datasets for feature extraction. On top of a CNN, by reducing dimensionality with a semi-supervised t-SNE model, using interactive transfer learning and a low-complexity active learning tool, they significantly improved labeling efficiency.
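A minimal sketch of the clustering idea behind Hasegawa-style unsupervised relation discovery [20] follows (our own simplification: the context strings between co-occurring entity pairs are vectorized and clustered, and each cluster is read as one relation type; the example contexts are invented):

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Context strings observed between co-occurring named-entity pairs; pairs
# whose contexts fall into the same cluster are taken to share a relation.
contexts = [
    "is the capital of",
    "is the capital city of",
    "was born in",
    "was born and raised in",
]
X = TfidfVectorizer().fit_transform(contexts).toarray()
clusters = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(clusters)  # expected: two relation clusters, e.g. [0 0 1 1]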
4 Supervised Learning
suitable for large-scale data. In recent years, as large-scale knowledge bases such as Wikipedia and Freebase keep emerging, new methods of constructing knowledge graphs must be studied in order to further expand their scale. Therefore, Dong et al. [30] proposed a method for building a web-scale knowledge base in 2014. The knowledge base system they constructed achieves high accuracy in fact checking.
5 Semi-supervised Learning
Zhang and Zhou proposed the Co-Trade algorithm [36]. The Co-Trade algorithm makes full use of labeled data in a noise-free environment. However, the Co-Trade algorithm cannot use simple voting to determine the credibility of a label. Table 3 shows the advantages and disadvantages of various co-training algorithms. With the continuous improvement of co-training algorithms, applications based on co-training have gradually penetrated many fields, such as natural language processing, image retrieval, and pattern recognition.
Arora et al. developed an active learning method for filtering natural language information and extracting domain models [46]. The accuracy of their method is calculated to be as high as 96% due to the use of active learning. To further improve the accuracy of identifying natural language elements, they are studying how to improve transfer learning algorithms and apply them to domain model extraction.
6 Distant Supervision
Distant supervision was first proposed by Mintz et al. [47] at ACL 2009. Labeled training data for supervised algorithms are expensive and limited in number; purely unsupervised methods can use large amounts of data to extract relations, but the resulting relations are not easily mapped to the relations required by a particular knowledge base. Therefore, they combined the advantages of supervised and unsupervised learning and proposed distant supervision relation extraction, also demonstrating the role of syntactic features in distantly supervised information extraction. In 2015, Daojian Zeng, Kang Liu, and others identified two types of problems in distant supervision relation extraction: first, heuristic text alignment may cause label errors; second, owing to the limited applicability of traditional statistical models, the feature extraction process may be cumbersome and may introduce unknown errors. In 2012, Surdeanu et al. [48] proposed the "multi-instance learning" hypothesis for distant supervision. In 2014, Zeng, Lai et al. [49] used a convolutional deep neural network to extract lexical and sentence-level features. Inspired by them, Zeng et al. applied the above-mentioned multi-instance learning to noisy data and erroneous labels, and trained the model with a high-confidence training dataset. In response to the limitations of statistical models, they also proposed Piecewise Convolutional Neural Networks (PCNN) [50]. The model is relatively simple; its main difference from a traditional convolutional neural network lies in the modified pooling layer. Finally, they implemented multi-instance learning with PCNN for distant supervision relation extraction. The advantage of this method lies in introducing multi-instance learning into the relation extraction task, and it shows that the PCNN model can capture more useful information in the sentence. The disadvantage is that multi-instance learning wastes some useful sentence features.
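The distinctive part of PCNN [50] is the piecewise pooling layer; the following numpy sketch illustrates the idea (the segment-boundary conventions and function name are our assumptions):

import numpy as np

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    # Split the convolution output (seq_len x n_filters) into three segments
    # delimited by the two entity positions and max-pool each segment
    # separately, preserving coarse structure that plain max pooling discards.
    lo, hi = sorted((e1_pos, e2_pos))
    segments = [conv_out[: lo + 1], conv_out[lo + 1 : hi + 1], conv_out[hi + 1 :]]
    pooled = [seg.max(axis=0) for seg in segments if len(seg) > 0]
    return np.concatenate(pooled)  # up to 3 * n_filters values

conv_out = np.random.rand(10, 4)  # 10 sequence positions, 4 filters
print(piecewise_max_pool(conv_out, e1_pos=2, e2_pos=6).shape)  # (12,)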
With the wide application of distant supervision, noisy data and erroneous labels also increase, seriously affecting relation extraction performance. To further alleviate this problem, in 2016, Yankai Lin and others improved upon the work published by Daojian Zeng et al. in 2015. They argue that multi-instance learning alleviates the problem of noisy data, but since only the sentence with the highest confidence in each bag is used as training data, much useful information is lost while the noise is filtered out. They proposed a convolutional neural network model with a selective attention mechanism [51]. In this model, CNNs are used to embed the semantics of sentences, and a sentence-level attention mechanism assigns weights to the sentences. Their experimental results show that this new model has better predictive performance than advanced feature-based methods and neural network methods. The disadvantage is that the attention mechanism needs to loop over each relation, which is more complicated, and attention is applied only at the sentence level, not at the word level. Guanying Wang put forward a new perspective on the cause of the mislabeling problem in 2018. They suppose that the erroneous labels in distant supervision are mainly caused by incomplete use of knowledge graph information, so a label-free distant supervision method can address noisy labels. Under the assumption that relation labels are not used, the classifier is supervised with prior knowledge from knowledge graph embeddings [52]. Experiments show that this new method works well and outperforms the state-of-the-art results in distant supervision, effectively addressing the noisy label problem. Table 5 shows a general analysis and comparison of the above distant supervision algorithms.
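The following numpy sketch illustrates sentence-level selective attention in the spirit of [51] (simplified to a dot-product score; the original uses a learned bilinear form, and all array shapes here are illustrative):

import numpy as np

def selective_attention(sentence_reprs, relation_query):
    # Score each sentence in the bag against the relation query, softmax the
    # scores into attention weights, and return the weighted bag representation.
    scores = sentence_reprs @ relation_query        # one score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax
    return weights @ sentence_reprs                 # bag vector

bag = np.random.rand(5, 8)   # 5 sentences, 8-dimensional representations
query = np.random.rand(8)    # learned query vector for one relation
print(selective_attention(bag, query).shape)  # (8,)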
7 Conclusion
Recently, with the increasing popularity of artificial intelligence, big data, and blockchain, knowledge graphs have played an important role in different areas. A large number of applications based on deep learning and basic model algorithms have emerged. In this paper, the relation extraction algorithms of existing knowledge graphs are analyzed and compared, and some popular relation extraction algorithms and the current research status are introduced. The advantages and disadvantages of the four types of machine learning algorithms are detailed, and their algorithmic ideas and development history are expounded respectively. Community detection research, embedding models, and intelligent question answering systems are becoming hot topics and important research areas in the field of complex networks. Not only can we analyze communities in knowledge distribution and evolution trajectory detection, but we can also explore knowledge graphs with a unified embedding model. Knowledge graphs will certainly play an even bigger role in the future.
References
1. Xu, B., et al.: CN-DBpedia: a never-ending Chinese knowledge extraction system. In:
Benferhat, S., Tabia, K., Ali, M. (eds.) IEA/AIE 2017. LNCS (LNAI), vol. 10351,
pp. 428–438. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60045-1_44
2. Niu, X., Sun, X.R., Wang, H.F., et al.: Zhishi.me: weaving Chinese linking open data. In:
Proceedings of the 10th International Semantic Web Conference, Bonn, Germany,
pp. 205–220 (2011)
3. Pan, J.Z., Horrocks, I.: RDFS(FA): connecting RDF(S) and OWL DL. IEEE Trans. Knowl.
Data Eng. 19(2), 192–206 (2007). https://doi.org/10.1109/TKDE.2007.37
4. Mcguiness, D.L., Harmelen, F.: OWL Web ontology language overview. W3C Recomm.
63(45), 990–996 (2004)
5. Qiao, L., Yang, L., Hong, D., et al.: Knowledge graph construction techniques. J. Comput.
Res. Dev. 53(3), 582–600 (2016). (in Chinese)
6. Zhang, C., Chang, L., Wang, W., Chen, H., Bin, C.: Question and answer over fine-grained
knowledge graph based on BiLSTM-CRF (2019)
7. Proceedings of the 7th Message Understanding Conference (MUC-7). National Institute of
Standards and Technology (1998)
8. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge unifying
WordNet and Wikipedia. In: Proceedings of WWW (2007)
9. Banko, M., Cafarella, M.J., Soderland, S., et al.: Open information extraction for the web. In:
Proceedings of the 20th Int Joint Conf on Artificial Intelligence, pp. 2670–2676. ACM,
New York (2007)
10. Yang, B., Cai, D.-F., Yang, H.: Progress in open information extraction. J. Chin. Inf.
Process. 4, 1–11 (2014)
11. Etzioni, O., Cafarella, M., Downey, D., et al.: Unsupervised named-entity extraction from
the web: an experimental study. Artif. Intell. 165(1), 91–134 (2005)
12. Banko, M., Cafarella, M.J., Soderland, S., et al.: Open information extraction from the web.
In: Proceedings of IJCAI (2007)
13. Banko, M., Etzioni, O.: The tradeoffs between open and traditional relation extraction. In:
Proceedings of Annual Meeting of the Association for Computational Linguistics (2008)
14. Wu, F., Weld, D.S.: Open information extraction using Wikipedia. In: Proceedings of
Annual Meeting of the Association for Computational Linguistics, pp. 118–127 (2010)
15. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction.
In: Proceedings of Conference on Empirical Methods in Natural Language Processing
(2011)
16. Etzioni, O., Fader, A., Christensen, J., et al.: Open information extraction: the second
generation. In: Proceedings of International Joint Conference on Artificial Intelligence
(2011)
17. Xu, J., Zhang, Z., Wu, Z.: Review on techniques of entity relation extraction. New Technol.
Libr. Inf. Serv. 168(8), 18–23 (2008)
18. Gudovskiy, D., Hodgkinson, A.: Explanation-based attention for semi-supervised deep
active learning (2019)
19. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M.: Dueling network
architectures for deep reinforcement learning (2019)
20. Hasegawa, T., Sekine, S., Grishman, R.: Discovering relations among named entities from
large corpora. In: Proceedings of ACL-2004, pp. 415–422 (2004)
21. Socher, R., Chen, D., Manning, C.D., Ng, A.Y.: Reasoning with neural tensor networks for
knowledge base completion (2013)
22. Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of
knowledge bases. In: AAAI (2011)
23. Jenatton, R., Le Roux, N., Bordes, A., Obozinski, G.: A latent factor model for highly multi-
relational data. In: NIPS (2012)
24. Heck, L., Hakkani-Tür, D., Tur, G.: Leveraging knowledge graphs for web-scale
unsupervised semantic parsing. In: ISCA (2013)
25. Luus, F., Khan, N., Akhalwaya, I.: Active learning with TensorBoard projector (2019)
26. Liu, F., Zhong, Z., Lei, L., Wu, Y.: Entity relation extraction method based on machine
learning (2013)
27. Xia, S., Lehong, D.: Feature-based approach to Chinese term relation extraction. In: 2009
International Conference on Signal Processing Systems, pp. 410–414 (2009)
28. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge
University Press, Cambridge (2000)
29. Zhang, T.: Regularized winnow methods. In: Advances in Neural Information Processing
Systems 13, pp. 703–709 (2001)
30. Dong, X.L., Gabrilovich, E., Heitz, G.: Knowledge vault: a web-scale approach to
probabilistic knowledge fusion. In: KDD (2014)
31. Zhou, Z.-H.: Cooperative Training Style in Semi-Supervised Learning. Machine Learning
and Its Applications, pp. 259–275. Tsinghua University Press, Beijing (2007)
32. Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. The MIT
Press, Cambridge (2006)
33. Pise, N.N., Kulkarni, P.: A survey of semi-supervised learning methods. In: 2008
International Conference on Computational Intelligence and Security (2008)
34. Zhou, Z.-H., Li, M.: Tri-training: exploiting unlabeled data using three classifiers. IEEE
Trans. Knowl. Data Eng. 17(11), 1529–1541 (2005)
35. Li, M., Zhou, Z.-H.: Improve computer-aided diagnosis with machine learning techniques
using undiagnosed samples. IEEE Trans. Syst. 19(11), 1479–1493 (2007)
36. Zhang, M.-L., Zhou, Z.-H.: CoTRADE: confident co-training with data editing. IEEE Trans.
Syst. Man Cybern. Part B Cybern. 41, 1612–1626 (2011)
37. Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S.: Knowledge-
based weak supervision for information extraction of overlapping relations. In: The 49th
Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies, pp. 541–550 (2011)
38. Li, Q., Han, Z., Wu, X.-M.: Deeper insights into graph convolutional networks
for semi-supervised learning. In: The Thirty-Second AAAI Conference on Artificial
Intelligence (AAAI-18) (2018)
39. Luan, Y., Wadden, D., He, L., Shah, A., Ostendorf, M., Hajishirzi, H.: A general
framework for information extraction using dynamic span graphs. In: NAACL (2019)
40. Agrawal, K., Mittal, A., Pudi, V.: Scalable, semi-supervised extraction of
structured information from scientific literature, pp. 11–20. Association for Computational
Linguistics (2019)
41. Kim, S.N., Medelyan, O., Kan, M.-Y., Baldwin, T.: SemEval-2010 task 5: automatic
keyphrase extraction from scientific articles. In: Proceedings of the 5th International
Workshop on Semantic Evaluation, SemEval 2010, Stroudsburg, PA, USA, pp. 21–26.
Association for Computational Linguistics (2010)
42. Gollapalli, S.D., Caragea, C.: Extracting keyphrases from research papers using citation
networks. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial
Intelligence, AAAI 2014, pp. 1629–1635. AAAI Press (2014)
43. Jaidka, K., Chandrasekaran, M.K., Rustagi, S., Kan, M.-Y.: Insights from CL-SciSumm
2016: the faceted scientific document summarization shared task. Int. J. Digit. Libr. 19(2),
163–171 (2016)
44. Agrawal, K., Mittal, A., Pudi, V.: Scalable, semi-supervised extraction of structured
information from scientific literature (2019)
45. Drugman, T., Pylkkonen, J., Kneser, R.: Active and semi-supervised learning in ASR:
benefits on the acoustic and language models (2019)
46. Arora, C., Sabetzadeh, M., Nejati, S., Briand, L.: An active learning approach for improving
the accuracy of automated domain model extraction (2019)
47. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction
without labeled data, ACL 2009 (2009)
48. Surdeanu, M., Tibshirani, J., Nallapati, R., Manning, C.D.: Multi-instance multi-label
learning for relation extraction. In: Proceedings of EMNLP-CoNLL, pp. 455–465 (2012)
49. Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J.: Relation classification via convolutional deep
neural network. In: Proceedings of COLING, pp. 2335–2344 (2014)
50. Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via
piecewise convolutional neural networks (2015)
51. Lin, Y., Shen, S., Liu, Z., Luan, H., Sun, M.: Neural relation extraction with selective
attention over instances (2016)
52. Wang, G., Zhang, W., Wang, R., Zhou, Y.: Label-free distant supervision for relation
extraction via knowledge graph embedding (2018)
DSEA
PEVR: Pose Estimation for Vehicle Re-Identification
Saifullah Tumrani¹(B), Zhiyi Deng¹, Abdullah Aman Khan¹, and Waqar Ali¹,²
¹ School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
{saif.tumrani,abdkhan}@std.uestc.edu.cn, [email protected], [email protected]
² Faculty of Information Technology, The University of Lahore, Lahore, Pakistan
1 Introduction
Vehicles, i.e., cars, buses, trucks, etc., are a necessary part of human life and play a vital role in transportation and mass transit. Moreover, vehicles are an important object class in advanced monitoring systems with many applications, such as recognition in parking lots, automatic toll collection on highways, vehicle tracking, and traffic data collection. Research on person re-id has been conducted for the past decade. Until recently vehicle re-id was ignored, but many researchers in the computer vision community are now focusing on this area. However, this area of research still holds a lot of room for future work and has many significant gaps compared to person re-id. Previous research focused on detection [1], segmentation [2], and classification [3]. Re-id aims to find the same vehicle (query image) in a vehicle surveillance database containing different images captured by different cameras; if the query image is found in the database, it is considered a re-identification. Recent research has focused on detection, classification, categorization, and driver behavior modeling. Vehicle re-id is the task of matching images of the same vehicle across different non-overlapping views captured by surveillance cameras. Vehicle re-id is a very
challenging task, as the difference between vehicles of the same model is negligible and vehicle appearance may change with varying lighting conditions and views. In Fig. 1, each column shows distinct directions of vehicles under different viewing conditions. Thus, vehicle re-id models should be able to precisely capture inter-class and intra-class differences.
Fig. 1. The poses in each column belong to the same vehicle, showing the front, back, left, and right sides of vehicles from various viewpoints.
Fig. 2. PEVR framework: the model first transforms RGB images into tensors of activations through a convolutional backbone, while generating probability maps associated with the different sides. The Inception-V3* module in the lower branch is a modified version of Inception-V3, as described below in the inception architecture section.
2 Related Work
Previous research on vehicle re-id can be divided by its basis: appearance, license plate recognition, and spatiotemporal properties. Vehicle appearances can be highly similar, e.g., the same color and type. Additionally, problems like occlusion, illumination variation, and multiple views make the task more challenging. Nevertheless, most work relies on visual appearance features. Some early researchers relied on various sensors other than cameras, while we rely only on the visual appearance of the vehicle, as in facial identification [11–15] or pedestrian identification [7,16,17]. Feris et al. [18] used a feature pool to extract appearance features of vehicles. Matei et al. [19] exploited kinematic and appearance constraints to track vehicles across non-overlapping cameras. Liu et al. [20] proposed a model that extracts features from local regions. As Fig. 1
3 Proposed Method
In this work, we verify our model using two standard deep learning architectures, namely Inception-V3 [24] and ResNet [25]. We use Inception-V3 [24] as the backbone for the PEVR models. We therefore first explain the Inception-V3 [24] architecture, then detail our PEVR model, and then describe how it is integrated into our proposed framework.
4 Experiments
We implemented the proposed model on VeRi [9]; our experiments show that the approach effectively increases vehicle re-id performance by estimating the pose of vehicles.
4.1 Datasets
The VeRi dataset [9] is a publicly available vehicle re-id dataset containing vehicle ID, camera ID, color, and type information. It was collected with 20 cameras under different viewpoints, occlusions, illuminations, and resolutions in a real-world traffic surveillance environment covering an area of 1 km², and consists of 776 vehicles, of which 576 are used for training and the remaining 200 for testing. There are 37,778 training images and 11,579 testing images. For evaluation, an image of each vehicle captured from each camera is used as a query, yielding 1,678 query images. Additionally, vehicle color and model information is also available (Table 1).
The CompCars dataset [10] is a large publicly available dataset containing data obtained from urban surveillance videos and the web. CompCars [10] contains 136,726 partial images of different vehicles, together with their properties and viewpoints. We further split the dataset and labeled the vehicles with 4 viewpoints, categorized as front, back, left, and right. We trained our model with both of these datasets, tested on the testing set of VeRi [9], and manually split CompCars [10] for training and testing.
Since the VeRi dataset [9] lacks vehicle pose labels, we utilize viewpoint-labeled images from CompCars [10] to train our model. We manually categorize the training images into four subsets: front, back, left, and right. The samples are scaled to 227 × 227 to train and test our network.
Table 1. Detailed information of the datasets
Table 3. Rank-1, Rank-5, and mAP accuracies of the compared methods on the VeRi dataset [9]
Precision = TP / (TP + FP)    (1)
Average precision combines recall and precision for ranked retrieval results. It is the mean of the precision scores after each relevant image is retrieved.
AP = (1 / Gt) Σ_{l=1}^{n} p(l) g(l)    (2)
where n is the total number of test images and Gt is the number of ground-truth images; p(l) is the precision at position l, and g(l) is an indicator function that equals 1 if the image at position l is a correct match and 0 otherwise. The arithmetic mean of the average precision over a set of Q query images is called the mean average precision (mAP) and is formulated as:
mAP = (1 / Q) Σ_{k=1}^{Q} AP(k)    (3)
– Rank: It measures whether a test image is matched to an image of its class within the top-k results. For example, if test image t1 corresponds to class 1 and a class-1 image is found at the top position, it counts toward rank@1; if it is found within the top 5, it counts toward rank@5, and so on.
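For concreteness, here is a minimal numpy sketch that computes rank-k match rates and mAP from a query-gallery similarity matrix, following Eqs. (1)-(3) (function and variable names are ours; it assumes every query has at least one true match in the gallery):

import numpy as np

def rank_k_and_map(similarity, query_ids, gallery_ids, ks=(1, 5)):
    # Compute rank-k match rates (CMC) and mAP from a query x gallery
    # similarity matrix; assumes every query has a true match in the gallery.
    cmc = {k: 0 for k in ks}
    aps = []
    for q in range(similarity.shape[0]):
        order = np.argsort(-similarity[q])               # best match first
        matches = gallery_ids[order] == query_ids[q]
        for k in ks:
            cmc[k] += bool(matches[:k].any())
        hits = np.flatnonzero(matches)                   # ranks of true matches
        aps.append(np.mean((np.arange(len(hits)) + 1) / (hits + 1)))
    n_q = similarity.shape[0]
    return {k: cmc[k] / n_q for k in ks}, float(np.mean(aps))

sim = np.random.rand(3, 10)                              # 3 queries, 10 gallery
q_ids = np.array([0, 1, 2])
g_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
print(rank_k_and_map(sim, q_ids, g_ids))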
Fig. 3. CMC curve (recognition rate vs. rank score) of the proposed method along with other state-of-the-art methods.
To evaluate the effectiveness of our method, we tested it and report the results in Table 3. We compared our approach with different state-of-the-art methods, including VGG+CCL [3], VGG+Triplet Loss [26], and Mixed Diff+CCL [3], and achieved better performance, as can be seen in Table 3.
Our method achieves significantly higher accuracy at Rank-1 than other state-of-the-art methods, and is even better at Rank-5. PEVR can minimize intra-class differences while maximizing inter-class differences, and is thus fit for the task. We repeated the testing phase to evaluate the model's prediction accuracy and obtained the CMC curve shown in Fig. 3, which plots the match rate of the proposed method from rank-1 to rank-5. We used the manually split dataset to train our network; Table 3 shows the Rank-1 and Rank-5 results along with the mAP of our method compared with the other three methods. The results show that our PEVR performs better than previous methods by a significant margin of 20% to 30%.
5 Conclusion
In this paper, we propose PEVR, a pose estimation model for vehicle re-id. Our method is less computationally expensive and achieves significant results compared with other methods. In future work, driving direction could be determined by estimating the pose of the vehicle. Our method also leverages visual features; we demonstrate that a simple network, if trained properly on a large image dataset, can indeed outperform other methods. The results show a significant improvement in efficiency and accuracy.
References
1. Zhang, Y., Bai, Y., Ding, M., Li, Y., Ghanem, B.: W2F: a weakly-supervised to fully-
supervised framework for object detection. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), June 2018
2. Mahendran, S., Vidal, R.: Car segmentation and pose estimation using 3d object
models (2015)
3. Liu, H., Tian, Y., Yang, Y., Pang, L., Huang, T.: Deep relative distance learning:
tell the difference between similar vehicles. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), June 2016
4. Ahmed, M.J., Sarfraz, M., Zidouri, A., Al-Khatib, W.G.: License plate recognition
system. In: Proceedings of the 2003 10th IEEE International Conference on Electronics,
Circuits and Systems (ICECS 2003), vol. 2, pp. 898–901 (2003)
5. Bulan, O., Kozitsky, V., Ramesh, P., Shreve, M.: Segmentation- and annotation-
free license plate recognition with deep localization and failure identification. IEEE
Trans. Intell. Transp. Syst. 18(9), 2351–2363 (2017)
6. Anagnostopoulos, C.N.E., Anagnostopoulos, I.E., Loumos, V., Kayafas, E.: A
license plate-recognition algorithm for intelligent transportation system applica-
tions. IEEE Trans. Intell. Transp. Syst. 7(3), 377–392 (2006)
7. Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for
person re-identification. In: The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), June 2015
8. Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations
with domain guided dropout for person re-identification. In: The IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), June 2016
9. Liu, X., Liu, W., Ma, H., Fu, H.: Large-scale vehicle re-identification in urban
surveillance videos. In: 2016 IEEE International Conference on Multimedia and
Expo (ICME), pp. 1–6, July 2016
10. Yang, L., Luo, P., Loy, C.C., Tang, X.: A large-scale car dataset for fine-grained
categorization and verification. In: 2015 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 3973–3981, June 2015
11. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting
10,000 classes. In: The IEEE Conference on Computer Vision and Pattern Recog-
nition (CVPR), June 2014
12. Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint
identification-verification. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence,
N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems
27, pp. 1988–1996. Curran Associates, Inc. (2014)
13. Sun, Y., Liang, D., Wang, X., Tang, X.: DeepID3: face recognition with very deep
neural networks (2015)
14. Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach
for deep face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.)
ECCV 2016. LNCS, vol. 9911, pp. 499–515. Springer, Cham (2016). https://doi.
org/10.1007/978-3-319-46478-7_31
15. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to
human-level performance in face verification. In: The IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR), June 2014
16. Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occur-
rence representation and metric learning. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), June 2015
17. Liao, S., Hu, Y., Li, S.Z.: Joint dimension reduction and metric learning for person
re-identification. arXiv preprint arXiv:1406.4216 (2014)
18. Feris, R.S., et al.: Large-scale vehicle detection, indexing, and search in urban
surveillance videos. IEEE Trans. Multimedia 14(1), 28–42 (2012)
19. Matei, B.C., Sawhney, H.S., Samarasekera, S.: Vehicle tracking across nonoverlapping
cameras using joint kinematic and appearance features. In: CVPR 2011,
pp. 3465–3472 (2011)
20. Liu, X., Zhang, S., Huang, Q., Gao, W.: RAM: a region-aware deep model for vehicle
re-identification. In: 2018 IEEE International Conference on Multimedia and Expo
(ICME), pp. 1–6, July 2018
21. Liu, X., Liu, W., Mei, T., Ma, H.: PROVID: progressive and multimodal vehicle
reidentification for large-scale urban surveillance. IEEE Trans. Multimedia 20(3),
645–658 (2018)
22. Khare, V., et al.: A novel character segmentation-reconstruction approach for
license plate recognition. Expert Syst. Appl. 131, 219–239 (2019)
23. Hendry, Chen, R.C.: Automatic license plate recognition via sliding-window
darknet-yolo deep learning. Image Vis. Comput. 87, 47–56 (2019)
24. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the incep-
tion architecture for computer vision. In: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), June 2016
25. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
June 2016
26. Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance
comparison for person re-identification. Pattern Recogn. 48(10), 2993–3003 (2015)
The Research of Chinese Ethnical Face Recognition Based on Deep Learning
1 Introduction
Face recognition refers to technology capable of identifying or verifying the identity of subjects in images or videos. The first face recognition algorithms were developed in the early seventies [13]. Since then, related work has continued and accuracy has improved gradually. Nowadays face recognition is widely applied and accepted in various settings, such as access control, fraud detection, monitoring systems, and social media. One pivotal factor is its non-intrusive nature [28]. For example, in a modern face recognition system the user just needs to stand in the field of view of a camera to complete an authentication.
The technique has shifted significantly over the years, accompanied by the rise of excellent deep learning algorithms [20,22–25] and large-scale datasets. Traditional methods rely on hand-crafted features, such as edges and texture descriptors, combined with machine learning techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), or support vector machines (SVM). Hand-crafted features, lacking robustness in most cases, fall far short of the industrial demands encountered in unconstrained environments. In contrast, deep learning methods based on CNNs can be trained on large datasets to learn the best features to represent the clusters of data. Recently, a mass of face-in-the-wild collections have become available on the web [2,17,18,24]. CNN-based face recognition models trained on these resources have achieved higher accuracy because they learn features that are robust to real-world variations, and they outperform traditional methods in practice.
Face recognition systems are usually composed of the following blocks, as shown in Fig. 1:
(1) Face Detection. Face detection finds the face objects in the image, then returns the coordinates of bounding boxes that mark the positions of the target objects. This is illustrated in Fig. 2(a).
(2) Face Alignment. The purpose of face alignment is to scale and crop face images, while locating a set of reference points, such as the left and right eyes, the nose, and the left and right corners of the mouth, as shown in Fig. 2(b). These are also called facial landmarks.
(3) Face Representation. During the face representation stage, the pixel values of the image are transformed into a compact vector that represents the features of the face.
(4) Face Matching. In this block, vectors or templates are compared to each other, and the model ultimately computes a score that indicates their similarity.
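As a concrete illustration of block (4), the following minimal numpy sketch compares two face representation vectors with cosine similarity and applies an acceptance threshold (the random vectors and the threshold value are stand-ins; in a real system the templates come from the face representation stage):

import numpy as np

def match_score(template_a, template_b):
    # Cosine similarity between two face representation vectors;
    # higher means more likely the same person.
    a = template_a / np.linalg.norm(template_a)
    b = template_b / np.linalg.norm(template_b)
    return float(a @ b)

enrolled = np.random.rand(128)                  # stored template
probe = enrolled + 0.05 * np.random.rand(128)   # new capture of the same face
score = match_score(enrolled, probe)
threshold = 0.8                                 # illustrative decision boundary
print(score, "accepted" if score > threshold else "rejected")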
The growth in popularity of deep learning methods has been dramatic. More and more researchers apply these approaches to other computer vision tasks, and even combine them with the humanities and sociology [5,8]. In this paper we likewise make full use of these prevalent methods to solve the CEFR task. In short, the CEFR task is a classification problem: judging which Chinese minority group the subjects in pictures or videos belong to, by analyzing the ethnic characteristics of the face.
Ethnic characteristics are important parts of facial features. In investigations of minority groups in China, facial features are usually summarized into a series of textual descriptions, in the form of surveys and measurements
Fig. 2. The images illustrate two procedures of face recognition. In (a), face detection designates the bounding boxes for faces. In (b), the faces are cropped and reference points are drawn.
of the subjects. For example, the research group [31] describes the facial features of Tibetans: the hair is dark and straight, the eye color is mostly brown, the rate of inner eyelid folds is higher, the outer canthus is higher than the inner, the palpebral fissure is moderately narrow, the bridge of the nose is straight, the lips are slightly convex, the cheekbones are prominent, and the face is wide and flat. This also shows that ethnic characteristics are extractable and quantifiable, and it indicates a direction for designing algorithms to extract ethnic facial features.
It is necessary to establish a clear classification criterion. Researchers generally use racial categories as identification labels [1,4]. The broadest ethnic category criteria are: Africans and African-Americans, Caucasians, East Asians, Native Americans, Pacific Islanders, Indians, and Latinos. Together, these categories cover approximately 95% of the world's population. In some cases, certain groups can be identified by the naked eye, which creates an illusion: ethnic face recognition seems easy to understand and realize, but in fact the underlying algorithms are complex and diverse.
First of all, there is no unified interpretation of the definition of ethnicity, and this ambiguity causes uncertainty. Secondly, with the continuous development of modern society, migration and integration have become the mainstream, which inevitably leads to new changes in ethnic characteristics, even blurring them. On the other hand, due to social factors such as prejudice and fixed stereotypes about some groups, the task faces practical difficulties in both data collection and experimentation. At present, research on CEFR is very rare, and there is no public standard dataset.
2 Proposed Approach
In this section, we describe our approach to building a reliable and usable ethnical face dataset, as well as our joint face detection and recognition pipeline.
Web Crawler. Based on the definition from Wikipedia, a web crawler [29] is an internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing. It is an important means of gathering datasets in many works. Our implementation of the web crawler is based on the open-source crawler framework WebMagic on GitHub, which is written in Java and offers good extensibility. By rewriting functions according to different requirements, users can crawl steadily and efficiently.
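The authors' crawler is built on WebMagic in Java; purely as an illustration of the same fetch-parse-download loop, here is a minimal Python sketch using the requests and BeautifulSoup libraries. The start URL, output directory, and crawl policy are placeholders, not the paper's configuration.

```python
import os
import requests
from bs4 import BeautifulSoup

def crawl_images(start_url, out_dir, max_pages=10):
    """Illustrative fetch-parse-download loop, not the authors' WebMagic code."""
    os.makedirs(out_dir, exist_ok=True)
    queue, seen, count = [start_url], set(), 0
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        page = requests.get(url, timeout=10)
        soup = BeautifulSoup(page.text, "html.parser")
        # Download every absolute image link found on the page.
        for img in soup.find_all("img"):
            src = img.get("src")
            if src and src.startswith("http"):
                data = requests.get(src, timeout=10).content
                with open(os.path.join(out_dir, f"img_{count}.jpg"), "wb") as f:
                    f.write(data)
                count += 1
        # Enqueue in-site links for further crawling.
        for a in soup.find_all("a", href=True):
            if a["href"].startswith(start_url):
                queue.append(a["href"])
```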
The main process includes the following blocks:
(1) Download. This block integrates the interfaces for page download and pre-processing. Crawling web pages requires HTTP requests, and WebMagic encapsulates Apache HttpClient as the default download tool.
the windows through another, more complex CNN. Finally, it uses the most powerful network to select the top result containing the bounding box and facial landmark positions. Three tasks are leveraged to train the detectors: face/non-face classification, bounding box regression, and facial landmark localization.
In the first task, the learning objective is formulated as a binary classification problem. For each sample x_i, we use the cross-entropy loss:

L_i^{det} = -\left( y_i^{det}\log(p_i) + (1 - y_i^{det})\log(1 - p_i) \right)    (1)

The second task, bounding box regression, is formulated as a regression problem:

L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2    (2)

where \hat{y}_i^{box} is the regression result obtained from the network and y_i^{box} is the ground-truth coordinate, including left top, height, and width, and thus y_i^{box} \in \mathbb{R}^4.
Similar to the former, facial landmark detection is also formulated as a regression problem:

L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2    (3)

where \hat{y}_i^{landmark} is the facial landmark coordinate predicted by the network and y_i^{landmark} is the ground-truth coordinate. There are five facial landmarks, including the left eye, right eye, nose, left mouth corner, and right mouth corner, and thus y_i^{landmark} \in \mathbb{R}^{10}.
The overall learning target can be formulated as:

\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j\, \beta_i^j\, L_i^j    (4)

where N is the number of training samples, \alpha_j denotes the importance of task j, and \beta_i^j \in \{0, 1\} indicates whether sample i participates in task j.
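As a concrete reading of Eqs. (1)-(4), the following NumPy sketch combines the three per-sample task losses. The default task weights alpha and the sample dictionary layout are assumptions for illustration, not the paper's training code.

```python
import numpy as np

def det_loss(p, y):
    # Cross-entropy for face/non-face classification, Eq. (1).
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def l2_loss(pred, target):
    # Squared Euclidean regression loss, Eqs. (2) and (3).
    return np.sum((np.asarray(pred) - np.asarray(target)) ** 2)

def joint_loss(samples, alpha={"det": 1.0, "box": 0.5, "landmark": 0.5}):
    """Overall target of Eq. (4): sum of alpha_j * beta_ij * L_ij over samples/tasks."""
    total = 0.0
    for s in samples:
        beta = s["beta"]  # per-task indicators in {0, 1} for this sample
        total += alpha["det"] * beta["det"] * det_loss(s["p"], s["y_det"])
        total += alpha["box"] * beta["box"] * l2_loss(s["box_pred"], s["box_gt"])
        total += alpha["landmark"] * beta["landmark"] * l2_loss(s["lm_pred"], s["lm_gt"])
    return total
```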
After scaling and cropping the face images, the next step is to classify the ethnic groups with deep learning methods.
Recently, residual networks (ResNets) [11] have become the preferred choice for many object recognition tasks, including face recognition [7,16]. The main idea of ResNets is the introduction of a building block that uses a shortcut connection to learn a residual mapping, as shown in Fig. 4. The shortcut connection makes much deeper architectures easier to train than "plain" ones, because the residual network facilitates the flow of information across layers more effectively and converges more quickly, which has been widely demonstrated experimentally.
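To make the shortcut idea concrete, here is a minimal identity residual block in Keras, the framework used later in the paper. The filter count is illustrative, and the block assumes the input already has the same number of channels so the identity addition is valid; it is a sketch, not the paper's exact configuration.

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Identity building block: output = F(x) + x, where F is the residual mapping.
    Assumes x already has `filters` channels so the identity add is valid."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])  # the shortcut connection
    return layers.Activation("relu")(y)
```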
We use ResNets to extract feature maps from face images and then fine-tune the structure in the usual manner. The classification loss is the cross-entropy loss:

L_i^{class} = -\log \frac{e^{p_i}}{\sum_{j=1}^{n} e^{p_j}}    (5)

where p_i denotes the network's score for the i-th sample belonging to the y_i-th class. The number of classes is n, and the corresponding network structure is shown in Fig. 5.
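A minimal Keras sketch of the described setup: ResNet50 pre-trained on ImageNet with a new softmax head trained under the cross-entropy loss of Eq. (5). Here n = 4 reflects the four groups in CCEFI; the input size and optimizer are assumptions for illustration.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

n_classes = 4  # Han, Mongolian, Tibetan, Uygur

base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
head = layers.Dense(n_classes, activation="softmax")(base.output)
model = models.Model(base.input, head)

# Keras's categorical_crossentropy implements the loss of Eq. (5).
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_split=0.2, epochs=40)
```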
3 Experiments
In this section, we first describe the self-built CCEFI dataset, whose face images have been preprocessed by the detection and cropping module; these images are the direct inputs of the classification networks. We then
Fig. 6. The face detector based on MTCNN correctly finds the faces in different pictures and circles the face objects with bounding boxes.
compare four network structures: ResNet50, VGG19 [21], Inception-V3 [27], and Inception-ResNet-V2 [26]. All models are pre-trained on ImageNet [6] and implemented in the Keras deep learning framework.
3.1 Preparation
As shown in Fig. 6, the faces in the ethnic images are effectively detected, whether an image contains a single face or multiple faces. The cropped face images are displayed in Fig. 7.
After filtering and counting, the numbers of face images for each ethnic group are listed in Table 1, and the training and test sets are split randomly at a ratio of about 4:1.
3.2 Results
Fig. 7. Cropped faces of (a) Han, (b) Mongolian, (c) Tibetan, and (d) Uygur subjects. The images contain people with suitable face angles but different ages and genders, and after our filtering most samples avoid obvious occlusion of the face. Besides ordinary people, there are also some singers and actors wearing heavy makeup, which increases the difficulty for the classifier.
Except for VGG19, the test accuracies of the other three networks are similar, but ResNet50 trains fastest. The Inception-ResNet-V2 model adopts the idea of residual networks, which enables the network to grow deeper while remaining easy to train. At the same time, it adopts the idea of Inception blocks to optimize the structure of each layer by widening the single-layer network, which increases its adaptability to scale. Correspondingly, this network costs the most and its training duration is the longest, so its classification performance should in principle be optimal. However, after many experiments, we found that an accuracy of 75% is the bottleneck on the test set. This may be related to the quantity and quality of the dataset; the same situation occurs in [19].
Among the four sets of results, Mongolians are easily misidentified as Tibetans, but Tibetans are not easily misidentified as Mongolians. With VGG19, most Mongolians are even classified as Tibetans. After analyzing the training and test sets, we find:
Fig. 8. Test accuracy (top) and loss (bottom) over 40 training epochs for ResNet50, VGG19, Inception-V3, and Inception-ResNet-V2. Except for VGG19, the three networks start to converge after 20 epochs, which shows that VGG19 is difficult to train compared to the other three networks on the current dataset.
(1) In terms of quantity, the Mongolian collection is the smallest of all the sets, containing only 620 pictures, so its results are likely to fluctuate.
(2) With regard to the collection sources, both the Mongolian and Tibetan images are partly acquired from singers, while the other two groups have no such source. Ethnic singers do not represent ordinary people well because of their makeup. Furthermore, the two sets of singers overlap, so they are easy to confuse. There are more sources of images for Tibetans than for Mongolians, which makes the recognition of Tibetans more robust than that of the others.
(3) This problem also indicates a certain degree of over-fitting.
Finally, on the binary classification of Han and Uygur, the recognition accuracy reaches 90%, which indicates a certain space for practical application (Fig. 8).
4 Conclusion
This paper starts from the current situation and prospects of face recognition, and introduces our task in terms of both its practical significance and its implementation direction.
Lacking an open ethnical face dataset, we build a collection of Chinese ethnical face images (CCEFI) including Han, Uygur, Tibetan, and Mongolian groups through web crawlers. The self-built CCEFI includes multiple variations, e.g., different genders and ages, makeup, and illumination, in unconstrained settings with large differences. The total number of pictures in CCEFI reaches nearly 4,500. We then design an MTCNN-based face detection module and a ResNet-based
face classification module. Experimental results demonstrate that the model can detect the faces in photos and then classify the ethnic groups they belong to. In the future, we will improve the quality and quantity of CCEFI and add other ethnic groups.
References
1. Barbujani, G.: Human races: classifying people vs understanding diversity. Curr.
Genomics 6(4), 215–226 (2005)
2. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for
recognising faces across pose and age. In: 2018 13th IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2018), pp. 67–74. IEEE (2018)
3. Chen, D., Ren, S., Wei, Y., Cao, X., Sun, J.: Joint cascade face detection and
alignment. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014.
LNCS, vol. 8694, pp. 109–122. Springer, Cham (2014). https://doi.org/10.1007/
978-3-319-10599-4 8
4. Coon, C.S.: The origin of races (1962)
5. Delorme, A., Pierce, A., Michel, L., Radin, D.: Prediction of mortality based on
facial characteristics. Front. Hum. Neurosci. 10, 173 (2016). https://doi.org/10.
3389/fnhum.2016.00173
6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale
hierarchical image database. In: 2009 IEEE Conference on Computer Vision and
Pattern Recognition, pp. 248–255. IEEE (2009)
7. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for
deep face recognition. arXiv preprint arXiv:1801.07698 (2018)
8. Fu, S., He, H., Hou, Z.: Learning race from face: a survey. IEEE Trans. Pattern
Anal. Mach. Intell. 36(12), 2483–2509 (2014). https://doi.org/10.1109/TPAMI.
2014.2321570
9. Gao, W., et al.: The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 38(1), 149–161 (2007)
10. Grother, P., Ngan, M.: Face recognition vendor test (FRVT). NIST interagency
report (8009) (2018)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: Computer Vision and Pattern Recognition, pp. 770–778 (2016)
12. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In:
Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp.
630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0 38
13. Kelly, M.D.: Visual identification of people by computer. Technical report, Stanford
Univ Calif Dept of Computer Science (1970)
14. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
15. Liu, H.: Research on facial recognition of China ethnic minorities. MS thesis, Northeastern University (2009)
16. Liu, W., Wen, Y., Yu, Z., Meng, Y.: Large-margin softmax loss for convolu-
tional neural networks. In: International Conference on International Conference
on Machine Learning (2016)
The Research of Chinese Ethnical Face Recognition Based on Deep Learning 91
17. Nech, A., Kemelmacher-Shlizerman, I.: Level playing field for million scale face
recognition. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 7044–7053 (2017)
18. Parkhi, O.M., Vedaldi, A., Zisserman, A., et al.: Deep face recognition. In: BMVC,
vol. 1, p. 6 (2015)
19. Qiu, S.: The research of face ethnicity recognition base on deep learning. MS thesis,
South China University of Technology (2016)
20. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
22. Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint
identification-verification. In: Advances in Neural Information Processing Systems,
pp. 1988–1996 (2014)
23. Sun, Y., Liang, D., Wang, X., Tang, X.: DeepID3: face recognition with very deep
neural networks. arXiv preprint arXiv:1502.00873 (2015)
24. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898 (2014)
25. Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selec-
tive, and robust. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 2892–2900 (2015)
26. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
and the impact of residual connections on learning. In: Thirty-First AAAI Confer-
ence on Artificial Intelligence (2017)
27. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the incep-
tion architecture for computer vision. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
28. Trigueros, D.S., Meng, L., Hartnett, M.: Face recognition: from traditional to deep
learning methods. arXiv preprint arXiv:1811.00116 (2018)
29. Wikipedia Contributors: Web crawler – Wikipedia, the free encyclopedia (2019). https://en.wikipedia.org/w/index.php?title=Web_crawler&oldid=900366065. Accessed 11 June 2019
30. Zhang, C., Zhang, Z.: Improving multiview face detection with multi-task deep
convolutional neural networks. In: IEEE Winter Conference on Applications of
Computer Vision, pp. 1036–1041. IEEE (2014)
31. Zhang, Z.: Institutional characteristics of the Tibetan people. Acta Anthropologica Sinica 12, 250–257 (1985)
Model of Charging Stations
Construction and Electric Vehicles
Development Prediction
Abstract. Electric vehicles are attracting more and more people, and the construction of charging stations is becoming very important. This paper mainly deals with the construction of charging stations and the market penetration of electric vehicles. First, through the quantitative relationship between charging stations and gas stations in America, we estimate that 501,474 charging stations will be built by 2060, among which 167,158 are supercharging stations and 334,316 are destination-charging stations. Second, we study the optimal distribution of charging stations in South Korea, establishing a bi-objective program based on a cooperative covering model with the help of queuing theory. Combining these two models, we find that the optimal number of charging stations is 30,045. Third, we use a logistic growth model to estimate the growth of charging stations. We predict that South Korea will reach 10% electric vehicles in 2030, 30% in 2036, and 50% in 2040. Combining factors such as charging stations, national policies, and international initiatives, we infer that South Korea will achieve all-electric vehicles by 2060 at the latest. Lastly, we use K-means to classify selected countries into three classes.
1 Introduction
In recent decades, with increasing environmental and economic pressure, people have become more and more interested in electric vehicles. A variety of factors need to be considered comprehensively in a mathematical model of charging station construction. Due to the lack of data, we can only acquire five years of specific vehicle counts, but we can find relevant data about American gas stations, gas prices, etc. When vehicles become all-electric, gas stations will be replaced by charging stations, as they play similar roles in daily life.
So, it’s easy to associate these two kinds of stations. Thus, we adopt the idea
of On Comparison to derive the number of charging stations from gas stations.
For the coverage model, there are roughly three options through the study of
scholars: maximum coverage model [2,4], cooperative coverage model [3,5], and
gradual coverage model [16]. We combine the advantages of these three mod-
els and adopt a decentralized cooperative gradual coverage model. The queuing
theory [8] can better simulate the arrival of customers and facilitate the construc-
tion of charging stations. In addition, the Logistic growth model in biology can
also be used to scientifically predict the future development of electric vehicles.
2 Related Work
On Comparison. This is a method commonly used in setting labor quotas: based on the production quota of one type of product or process, the quota of a similar product or process is derived through comparison and analysis. The two products to be compared must be similar or of the same type and series, so that they are clearly comparable. For our problem, gas stations and charging stations have similar effects on daily life and comparable operation modes. Therefore, the number and distribution of charging stations may follow rules similar to those of gas stations.
Growth Model. Studies by Nathaniel et al. have shown that the average electric vehicle owner travels 44.7 miles per day [10], but can travel 220–310 miles after a slow charge. Therefore, when choosing a construction scheme for charging stations, we can consider building more destination-charging stations. Electric vehicle data from [1] show that the number of electric vehicles increased rapidly from 2012 to 2017. The results of [11] also show that electric car ownership increases rapidly in the first few years, but in the later stages the growth rate gradually slows down. This is consistent with the growth model of biological populations.
t_gas is the average time to refuel fully, and t_gas = 2 min according to our assumption; t_sc is the average time of one supercharge, and t_sc = 30 min; D_sc is the average distance allowed per supercharge, and D_sc = 170 miles. Thus, k is the ratio of the distance allowed per unit time of refueling to the distance allowed per unit time of supercharging. N_c is the total number of charging stations. There are two types of stations, supercharging and destination-charging: N_sc is the number of the former, and N_dc is the number of the latter. We consider the ratio of gas efficiency to charging efficiency, namely k, as determining the proportion of supercharging stations. Besides, there are usually two chargers in a destination-charging station and eight in a supercharging station, and the number of superchargers is approximately twice that of destination chargers. Therefore, the number of destination-charging stations is twice the number of supercharging stations, namely formulas (1)–(6), and the result is in Table 1.
Table 1. Result.
Variable   Value
C_gas      2.6
D_gas      15.6
k          1.377
N_sc       167,158
N_dc       334,316
N_c        501,474
Covering Function. The covering function defines the coverage intensity at each demand point:

g(X_i) = \sum_{j} f(d_{ij})\, x_j    (8)

x_j \in \{0, 1\},\quad y_i \in \{0, 1\}
We select an area in Seoul for calculation to test the feasibility of our model; the results are shown in Fig. 1. The generation of demand points depends on the density of the road network (where the road network is dense, the demand points are correspondingly more numerous). The number of candidate building points is set to 8, and they are mainly located near spots with high road-network density. An orange point indicates that no station is built at that point, and a blue one represents the establishment of a fast charging station there.
Fig. 1. Charging station distribution in an area of Seoul. The figure shows that our model achieves good results, which also reduces drivers' queuing time.
f(x) = \frac{6}{\sqrt{2\pi}\, L} \exp\!\left( -\frac{18\,(x - \frac{L}{2})^2}{L^2} \right)    (16)

The probability of building a charging station in x_{j+1} is

P_{j+1} = \frac{\int_0^{x_{j+1}} f(x)\,dx - \int_0^{x_j} f(x)\,dx}{\int_0^{L} f(x)\,dx}    (17)
The number of drivers in x_{j+1} is num_{j+1}, where num_{j+1} = Q_t \times P_{j+1}. We adopt the M/M/n/m/m model (a multi-service-window closed queuing model) [8], and we assume a service time of \mu = 30 min per car, an arrival interval of \lambda = 40 min per car, the capacity of charging station j as m_j, and the number of cars in need as num_j. To simplify the calculation, considering that there are 8 chargers in a charging station, we let

m_j = \frac{num_j}{8}    (18)
According to the regularity condition, we have

p_{0j} = \left[ \sum_{l=0}^{n_j - 1} C_{m_j}^l\, \rho_1^l + \sum_{l=n_j}^{m_j} \frac{C_{m_j}^l\, l!}{n_j!\, n_j^{\,l - n_j}}\, \rho_1^l \right]^{-1}    (19)

where \rho_1 = \lambda/\mu = 2/3, C_m^l = \frac{m!}{l!(m-l)!}, n_j is the number of supercharging stations in the j-th rest area, and p_{0j} is the probability that no car needs service at the j-th supercharging station. The average queue length L_{qj} of the j-th station satisfies

L_{qj} = p_{0j} \sum_{l=n_j+1}^{m_j} \frac{(l - n_j)\, C_{m_j}^l\, l!\, \rho_1^l}{n_j!\, n_j^{\,l - n_j}}, \quad n_j \ge 1    (20)
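Eqs. (19) and (20) can be evaluated numerically. A minimal Python sketch, assuming n servers and m cars with rho_1 = 2/3 as above; the example values n = 2, m = 8 are illustrative, not results from the paper.

```python
from math import comb, factorial

def closed_queue_p0_lq(n, m, rho=2/3):
    """p0 and mean queue length Lq of the M/M/n/m/m model, Eqs. (19)-(20)."""
    # Terms where all l customers are in service without queueing (l < n).
    head = sum(comb(m, l) * rho**l for l in range(n))
    # Terms where l - n customers wait in queue (n <= l <= m).
    tail = sum(comb(m, l) * factorial(l) / (factorial(n) * n**(l - n)) * rho**l
               for l in range(n, m + 1))
    p0 = 1.0 / (head + tail)
    lq = p0 * sum((l - n) * comb(m, l) * factorial(l) * rho**l
                  / (factorial(n) * n**(l - n))
                  for l in range(n + 1, m + 1))
    return p0, lq

print(closed_queue_p0_lq(n=2, m=8))  # e.g. a rest area with 2 stations, 8 cars
```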
where \omega is a weight coefficient, W_{qj} is the average waiting time at the j-th station, c is the building cost (here we assume the cost of each charger is the same), n_j is the number of supercharging stations built in the j-th rest area, j \in M (the set of charging station construction spots), and C is the building budget.
We take the Yeongdong Expressway as an example. We find that there should be two supercharging stations in the first rest area and three in the second rest area from the origin; the result is the same on the other side, as can be seen clearly in Fig. 2.
Electric vehicles have emerged in recent years, and the South Korean government announced its EV incentive program in 2011, so there is a promising market for EVs in South Korea. The EV sales from 2012 to 2017 [16] are shown in Fig. 3. According to Fig. 4, from the proportion of EV sales in total new car sales in each region, we can see that all countries as a whole will vigorously develop electric vehicles to replace fuel vehicles in the 25 years from 2015 to 2040. The increase in EV sales is particularly obvious over the 15 years from 2025 to 2040, and slows
Fig. 3. EV sales from 2012 to 2017. In the past six years, the sales of South Korean electric vehicles have grown rapidly from 455 to 14,234.
Fig. 4. Annual predicted sale percentage of electric vehicles in various countries, from [11].
down after 2040. This is similar to a biological growth pattern in which a species migrates to a new (non-ideal) ecosystem. Therefore, to predict the purchases of electric vehicles over the next 20 years, up to 2040, we establish the following model:

s(x) = \frac{a}{1 + b^{\,c - x}} + d    (23)

where a, b, c, d are constants and x is a variable related to time. Because our data start from 2012, we let x = t - 2011, where t refers to the year. Differentiating formula (23), we get

s'(x) = \frac{a \ln(b)\, b^{\,c - x}}{(b^{\,c - x} + 1)^2}    (24)
s'(x) reaches its maximum when x = c. Around 2025, the sale proportion of EVs starts to grow rapidly, and the growth rate reaches its maximum at about 2032; thus we set c = 20. By around 2040, the sale proportion of EVs generally reaches 50% as a whole. We find the specific car sales in South Korea
b        EV sales (×10^4)
1.25     42
1.275    53
1.3      69
1.325    89
1.35     110

s(x) = \frac{9.375 \times 10^5}{1 + 1.325^{\,20 - x}} - 5828    (25)
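Eq. (25) is straightforward to evaluate year by year; a short sketch, recalling that x = t - 2011 (the chosen years are illustrative):

```python
def ev_sales(year):
    """Fitted logistic curve of Eq. (25), with x = year - 2011."""
    x = year - 2011
    return 9.375e5 / (1 + 1.325 ** (20 - x)) - 5828

for year in (2017, 2025, 2030, 2036, 2040):
    print(year, round(ev_sales(year)))
```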
GINI. The Gini coefficient is usually used to measure the gap between the rich and the poor in a country. Since the price of electric vehicles is generally higher than that of fuel cars, countries with a smaller gap between rich and poor find it easier to popularize electric vehicles.
Gas and Electricity Price. When charging costs less than refueling for the same driving distance, this also contributes to the popularity of electric vehicles.
After normalization, the attributes in each column are mapped to [0, 1], which converts them into dimensionless values and improves the accuracy of the K-means model. We classify the countries into three categories by K-means; the result is in Table 3. We find that the development environments of Australia, Singapore, and South Korea are close: they are generally ahead of the other countries in GDP per capita, R&D expenditure per capita, and energy consumption, while their Gini coefficients are relatively low. Compared with the other classes, the development environments of these three countries are the most likely to realize all-electric vehicles. For China, GDP, R&D expenditure, and the ratio of gas price to electricity price are far ahead of the other countries, but Chinese GDP per capita is not high enough and the Gini coefficient is large. Although full popularization is difficult, China has a development environment in which widespread popularization could be achieved. Besides, the Indonesian Gini coefficient is small, but GDP per capita there is the lowest. At the same time, China and Indonesia also have the least R&D investment per capita, so the countries of classes 2 and 3 may only achieve 50% market coverage of electric vehicles.
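A minimal sketch of this normalize-then-cluster step with scikit-learn; the indicator values below are hypothetical placeholders for illustration only, not the paper's measured data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

countries = ["Australia", "Singapore", "South Korea",
             "China", "Indonesia", "Saudi Arabia"]
# Columns: GDP per capita, R&D per capita, energy consumption,
# Gini, gas/electricity price ratio. Hypothetical values only.
features = np.array([
    [55.9, 1.4, 5.5, 0.34, 2.1],
    [64.6, 1.6, 8.0, 0.45, 2.4],
    [31.6, 1.3, 5.6, 0.31, 1.8],
    [9.8,  0.3, 2.2, 0.47, 1.5],
    [3.9,  0.1, 0.9, 0.38, 1.2],
    [21.0, 0.2, 6.9, 0.46, 0.6],
])

scaled = MinMaxScaler().fit_transform(features)  # map each column to [0, 1]
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(dict(zip(countries, labels)))
```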
Fig. 5. Proportion of the corresponding indices for each country. We can see that the differences between countries are great.
Table 3. Classification result of K-means.
Country        Class
Australia      1
Singapore      1
South Korea    1
China          2
Indonesia      2
Saudi Arabia   3
8 Conclusion
We adopt the idea of On Comparison, based on the similarity between gas stations and supercharging stations, to calculate the number of charging stations under full electrification from the number of gas stations and the distance fuel vehicles travel in one year. This avoids the overfitting of regression prediction when only the existing data are used. Due to geographic differences, we analyze urban and suburban areas (represented by expressways) separately, which makes the model more accurate. We make predictions based on existing research and, combining practical rules, adopt the logistic growth model, which is more in line with the laws of electric vehicle development.
References
1. Wikipedia. https://en.wikipedia.org/wiki/Tesla_Supercharger. Last accessed 4 July 2019
2. Wu, L.: The research of electric vehicle charging station based on GMCLP, pp.
26–30 (2016)
3. Kuby, M.J., Lim, S.: The flow-refueling location problem for alternative-fuel vehicles. Socio-Econ. Plann. Sci. 39(2), 125–145 (2005)
4. Lili, L., Xubo, G., Yibin, Z.: Development of electric vehicles: opportunities and
challenges for power grid companies. In: Proceedings of the Fifth China Interna-
tional Power Supply Conference, pp. 1–7 (2012)
5. Lim, S., Kuby, M.J.: Heuristic algorithms for siting alternative fuel stations using the flow-refueling location model. Eur. J. Oper. Res. 204(1), 51–61 (2010)
6. Zhang, Y.: Chinese Expressway (2015)
7. Shi, D., Zhou, J.: Comparison of service areas on expressway at home and abroad
(2016)
8. Chuanji, L.: Queuing Theory, 2nd edn. Beijing Youdian University, Beijing (2009)
9. South Korea Gasoline Prices. https://tradingeconomics.com/south-korea/
gasoline-prices. Last accessed 4 July 2019
10. Pearre, N.S., Kempton, W., Guensler, R.L., Elango, V.V.: Electric vehicles: how
much range is required for a day’s driving? Transp. Res. Part C: Emerg. Technol.
19(6), 1171–1184 (2011)
11. Figure source. https://about.bnef.com/. Last accessed 2 Feb 2018
12. Icct. https://www.theicct.org/blogs/staff/promoting-electric-vehicles-in-korea.
Last accessed 4 July 2019
13. Aju Business Daily. http://www.ajudaily.com/. Last accessed 4 July 2019
14. Xinhua. http://inf.315che.com/n/2007 01/29650/. Last accessed 4 July 2019
15. South Korea’s ministry of land and oceans. http://www.mofcom.gov.cn/aarticle/
i/dxfw/cj/201104/20110407486496.html. Last accessed 4 July 2019
16. Nie, L.: Multi-objective progressive coverage model and solution for the temporary
medical waste storage site selection. China Population Res. Environ. 2015(S1),
110–112 (2018)
17. Zhou, Y.: Journal 2(5), 99–110 (2016)
18. Li, Z.: Research on urban EV charging network planning and operation based on
traffic behavior, Shandong University (2014)
Boundary Detector Encoder and Decoder
with Soft Attention for Video Captioning
1 Introduction
As we enter the information age, video devices such as cameras are constantly being updated. Video has become a major medium of daily communication, and massive amounts of video information are constantly transmitted. However, the complexity of the information in videos makes it hard to use them fully.
Video captioning, in which a machine automatically generates natural language to describe a video, has attracted keen interest from researchers dedicated to video analysis. The development of deep learning promotes progress in video captioning, which is of great significance to society. Indeed, video captioning has many applications in human-robot interaction, automatic video analysis, and surveillance. It can be leveraged to help visually impaired people understand the content of a video and to generate subtitles automatically.
Video captioning research started with the classical template approach [11], which detects the Subject, Verb, and Object separately and then generates captions through a sentence template. However, this approach produces rigid captions and cannot match the richness of natural language. At the same time,
the advent of deep learning has greatly promoted advances in computer vision and natural language processing. Hence, most recent approaches use an encoder-decoder scheme [16,21] that encodes visual features with a 2D/3D CNN and uses an LSTM/GRU to generate sentences temporally.
Early research on video captioning mostly focused on domain-specific short video clips with limited vocabularies of objects and activities [3,13]. The main differences among the approaches that use the encoder-decoder framework are the types of CNNs in the encoding scheme and the language models in the decoding networks. Later methods progressed by adding modules on top of the standard encoder-decoder framework. [21] proposed a neural structure used in both the video encoding stage and the sentence decoding stage: a stacked LSTM encodes the input video and, interestingly, the same stacked LSTM generates the sentence. The advantages of this approach are that it keeps the sequential nature of the input video and that the network can use the same parameters in the two stages. The framework has been widely followed by other works and had already been applied to machine translation [19].
Recently, researchers have improved the encoder-decoder framework by significantly modifying its components. [2] focused on the encoding scheme and proposed a hierarchical boundary-aware neural encoder that can identify discontinuity points between video frames. [25] proposed a temporal attention mechanism that learns to select the frames relevant to the sentence decoder. Many methods based on various attention mechanisms [7,8,18,25] have been successfully used in video captioning. Encouraged by [2,25], we use a boundary detector [2] to improve the encoding scheme and employ a temporal attention mechanism [25] in the decoding stage to generate sentences.
2 Method
2.1 Boundary Detector
In this paper, we use a video encoding scheme that can detect temporal discontinuities, such as action or appearance changes, to generate descriptions. Figure 1 shows the structure: features extracted from the input frames by ResNet152 are fed into the boundary detector module. When the action or appearance of the input video changes, the boundary detector automatically modifies the connectivity of the LSTM layer. Another LSTM layer is adopted to aggregate the features of each video clip whose last frame is regarded as a boundary, so that the features of the whole video are obtained from this LSTM layer at the end of the video.
Given an input video, the boundary detector takes a sequence of features (x_1, x_2, ..., x_n) as input and outputs a sequence of vectors (s_1, s_2, ..., s_n). In the encoder, the connectivity state of the LSTM layer changes when the input and the hidden state of the layer change. Therefore, the boundary detector is treated as a learnable activation rather than a fixed hyperparameter.
At each time step, the encoder chooses either to transfer the memory cell content and hidden state to the next time step or to reinitialize them, thereby interrupting the seamless update and processing of the input sequence. This choice is made by the boundary detector cell, which allows the encoder to independently handle video chunks of different lengths. The boundaries of each chunk are determined by a learnable function that depends on the input, not by a fixed formula. Formally, the boundary detector S_t is calculated from a linear combination of the current input and hidden state. The function \tau, which is a combination of a sigmoid function and a step function, can be represented by the following expression:

S_t = \tau\!\left( v^T \cdot (W_{si} x_t + W_{sh} h_{t-1} + b_s) \right)    (1)

where x_t is the input frame, W_{si} and W_{sh} are learned weights, b_s is a learned bias, v is a learned vector, and h_{t-1} is the hidden state of the last time step.
Given the current result of the boundary detector, the following substitutions update the hidden state and memory cell that are transferred to the next time step:

h_{t-1} \leftarrow h_{t-1} \cdot (1 - S_t)    (2)
c_{t-1} \leftarrow c_{t-1} \cdot (1 - S_t)    (3)
Many LSTM architectures have been proposed [9,10,12,17], each slightly different in structure. In this paper, we use the scheme of [10], whose equations are as follows:

i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)    (4)
f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)    (5)
g_t = \phi(W_{gx} x_t + W_{gh} h_{t-1} + b_g)    (6)
o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)    (7)
c_t = f_t \odot c_{t-1} + i_t \odot g_t    (8)
h_t = o_t \odot \phi(c_t)    (9)
where \odot is the element-wise Hadamard product, \phi denotes the hyperbolic tangent tanh, \sigma is the sigmoid function, the W matrices are learned, b_i, b_f, b_g, b_o are learned bias vectors, and x_t is the input data. The hidden state h_t and memory cell c_t are initialized to zero. The input gate i_t controls how much of the current input is passed to the current memory cell c_t; the forget gate f_t controls what the cell forgets from the last memory cell c_{t-1}; and the output gate o_t decides whether the current memory cell is output. Figure 2 shows a schema of the boundary detector.
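To make the update concrete, here is a minimal NumPy sketch of one encoder time step, combining the boundary detector of Eq. (1), the reset rule of Eqs. (2)-(3), and the LSTM update of Eqs. (4)-(9). The weight dictionary W and the deterministic 0.5 threshold (the test-time behavior of Eq. (14)) are assumptions for illustration; this is not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encoder_step(x, h, c, W, boundary_threshold=0.5):
    """One boundary-aware LSTM step; W is a dict of learned weights/biases."""
    # Boundary detector, Eq. (1), with the deterministic step of Eq. (14).
    s_in = W["v"] @ (W["Wsi"] @ x + W["Wsh"] @ h + W["bs"])
    s = 1.0 if sigmoid(s_in) >= boundary_threshold else 0.0
    # Eqs. (2)-(3): reinitialize the state when a boundary is detected.
    h, c = h * (1.0 - s), c * (1.0 - s)
    # Standard LSTM update, Eqs. (4)-(9).
    i = sigmoid(W["Wix"] @ x + W["Wih"] @ h + W["bi"])
    f = sigmoid(W["Wfx"] @ x + W["Wfh"] @ h + W["bf"])
    g = np.tanh(W["Wgx"] @ x + W["Wgh"] @ h + W["bg"])
    o = sigmoid(W["Wox"] @ x + W["Woh"] @ h + W["bo"])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c, s
```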
According to the above equations, the boundary detector produces a variable-length series of outputs (s_1, s_2, ..., s_m), where m is the number of video boundaries. Each output represents the content of one video segment. The outputs are passed to another LSTM layer to build a hierarchical representation of the input video. Finally, the last hidden state of the additional LSTM layer is used as the feature vector of the whole video.
2.2 Attention
In this paper, we adopt the soft attention mechanism proposed in [25]. Soft attention allows the decoder to weight the feature vectors V = (v_1, v_2, ..., v_n) of the video frames so that it can pay more attention to the input frames related to the current decoding step. The decoder thus receives more precise input than a decoder without attention, producing more natural captions.
We obtain the feature vector of each input frame from the encoder, and the weight distribution e_t over the input video frames is obtained by the following equation:

e_t = w^T \phi(W_a h_{t-1} + U_a V_i + b_a)    (10)

where w, W_a, and U_a are learned weight matrices, b_a is a learned bias vector, h_{t-1} is the hidden state at the last time step, V_i represents the features of the entire input video obtained by concatenating the features of each frame, and \phi is the tanh function.
Through the above formula, we obtain the correlation scores e_t of all input frames; we then normalize e_t with a softmax function to get the probability distribution p_t over the input video frames. Finally, we multiply the probability distribution p_t with the features of the whole video V_i to get the final decoder input F:

F = V_i\, \theta(e_t)    (11)

where \theta denotes the softmax function, so that \theta(e_t) = p_t.
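The computation of Eqs. (10)-(11) is compact enough to sketch directly. A minimal NumPy version, assuming V stores one encoder feature vector per frame and that all weights are given (shapes are assumptions for illustration):

```python
import numpy as np

def soft_attention(V, h_prev, w, Wa, Ua, ba):
    """Eqs. (10)-(11): score each frame, softmax-normalize, weight the features.
    V: (n_frames, d) frame features; h_prev: decoder hidden state at t-1;
    w: (k,), Wa: (k, dh), Ua: (k, d), ba: (k,) are learned parameters."""
    e = np.tanh(V @ Ua.T + h_prev @ Wa.T + ba) @ w  # Eq. (10): one score per frame
    p = np.exp(e - e.max())
    p /= p.sum()                                    # softmax over frames (p_t)
    return p @ V                                    # Eq. (11): decoder input F
```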
2.3 Training
The function \tau is a combination of a sigmoid function and a step function, so we need a special method to train the encoder. First, the boundary detector S_t is regarded as a stochastic neuron [15]. In particular, we use a stochastic version of the function \tau. Formally, during the forward propagation of the training stage, \tau is computed as follows:

\tau(x) = \begin{cases} 0, & \sigma(x) < Z,\ Z \sim U[0,1] \\ 1, & \text{otherwise} \end{cases}    (12)

where U[0,1] is the uniform distribution on [0,1]. This ensures that the boundary detector S_t is stochastic, with a probability of outputting 1 equal to the value of \sigma(x).
During the backward pass, since the derivative of the step function is zero, we cannot use standard back-propagation. To solve this problem, we apply the approach suggested by Bengio et al. [5]: if the network uses a differentiable approximation in the backward propagation, it can still apply discrete operations in the forward propagation. In our work, the derivative of \tau used in the backward pass is simply the derivative of the sigmoid function:

\frac{\partial \tau}{\partial x}(x) = \sigma(x)(1 - \sigma(x))    (13)
At test time, we employ the deterministic version of the step function (Eq. 14). Therefore, the number of video segments detected by the boundary detector encoder is random during training and deterministic at test time:

\tau(x) = \begin{cases} 0, & \sigma(x) < 0.5 \\ 1, & \text{otherwise} \end{cases}    (14)
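A small sketch of this train/test behavior, with the stochastic step of Eq. (12), the deterministic step of Eq. (14), and the sigmoid surrogate gradient of Eq. (13):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tau_forward(x, training=True, rng=np.random.default_rng()):
    """Eq. (12) during training (random threshold), Eq. (14) at test (0.5)."""
    threshold = rng.uniform(0.0, 1.0) if training else 0.5
    return 1.0 if sigmoid(x) >= threshold else 0.0

def tau_backward(x):
    """Eq. (13): the surrogate gradient used in the backward pass."""
    s = sigmoid(x)
    return s * (1.0 - s)
```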
3 Experimental Setup
3.1 Datasets
3.2 Metrics
We use three popular evaluation metrics: BLEU [14], METEOR [1], and CIDEr [20]. BLEU uses n-gram statistics to calculate the co-occurrence frequency between the ground-truth sentence and the predicted sentence; like most previous work, we use 4-grams to evaluate the sentences produced by our network. METEOR builds on BLEU with some improvements, taking the relationship between the generated sentence and the ground-truth sentence into account. CIDEr, finally, treats each sentence as a document and calculates the cosine similarity of TF-IDF vectors, which gives the similarity between the predicted sentence and the reference sentence.
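As a concrete illustration of the BLEU@4 computation, the nltk library provides a sentence-level implementation; the example sentences below are invented for illustration only.

```python
from nltk.translate.bleu_score import sentence_bleu

reference = [["a", "man", "is", "playing", "a", "guitar"]]  # ground-truth tokens
candidate = ["a", "man", "is", "playing", "guitar"]         # predicted tokens
# 4-gram BLEU with uniform weights, as used in the experiments.
score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(score)
```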
4 Experimental Results
4.1 Ablation Study
To clearly understand the roles of the boundary detector and the attention mechanism, we run three sets of experiments on both MSVD and MSR-VTT: one with the boundary detector and attention mechanism (BD attention), one with the boundary detector but no attention mechanism (BD), and one with neither the boundary detector nor the attention mechanism (BD NO).
Table 1 shows the results on MSVD. First, to understand the role of the boundary detector, we compare BD and BD NO: BD outperforms BD NO on all three metrics, being 5.9%, 3.0%, and 6.6% higher on BLEU@4, METEOR, and CIDEr, respectively. Second, comparing BD attention with BD, we can clearly see that the attention mechanism makes the sentences generated by our network closer to the ground-truth sentences: BD attention achieves further gains of 6.5% in BLEU@4, 3.9% in METEOR, and 6.5% in CIDEr.
On the MSVD dataset, we choose three recent advanced methods to compare with ours. Temporal attention (SA) [25] uses a decoder with an attention mechanism and extracts features from GoogleNet and a 3D CNN. S2VT [21] employs stacked LSTMs for both the encoder and decoder. LSTM-YT [22] applies a CNN encoder that uses mean pooling for downsampling to extract the features of the input video.
The results on this dataset are shown in Table 3. It is obvious that our method performs better than the others. We especially focus on the approach SA, which also uses an attention mechanism: the BLEU@4, METEOR, and CIDEr of BD attention (ours) reach 44.1, 32.1, and 70.1, relative improvements over SA of 5.3%, 8.4%, and 35.7%, respectively. These results demonstrate the advantage of our boundary detector.
On the MSR-VTT dataset, we again consider SA and LSTM-YT, both as used in [18]. We also choose two other approaches: M3 [23], which builds a shared textual and visual memory to model the long-term visual-textual dependency, and hLSTMat [18], which uses temporal attention to select specific frames for generating words, with an adjusted temporal attention deciding whether to rely on visual information or on the language context. As can be seen in Table 4, the BLEU@4 and METEOR results indicate that our BD attention outperforms the other runs.
Figures 3 and 4 present a few examples on MSVD and MSR-VTT, respectively. It is obvious that the sentences generated by BD attention describe the videos well.
5 Conclusion
In this paper, we use a boundary detector encoder that can discover the hierarchical structure of a video, together with a decoder equipped with an attention mechanism. Experiments on MSVD and MSR-VTT show that our method is comparable to state-of-the-art models. In the future, we will further improve the boundary detector.
References
1. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with
improved correlation with human judgments. In: Proceedings of the ACL Workshop
on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or
Summarization, pp. 65–72 (2005)
2. Baraldi, L., Grana, C., Cucchiara, R.: Hierarchical boundary-aware neural encoder
for video captioning. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1657–1666 (2017)
3. Barbu, A., et al.: Video in sentences out. arXiv preprint arXiv:1204.2742 (2012)
4. Bengio, S., Vinyals, O., Jaitly, N., Shazeer, N.: Scheduled sampling for sequence
prediction with recurrent neural networks. In: Advances in Neural Information
Processing Systems, pp. 1171–1179 (2015)
5. Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradi-
ents through stochastic neurons for conditional computation. arXiv preprint
arXiv:1308.3432 (2013)
6. Chen, D.L., Dolan, W.B.: Collecting highly parallel data for paraphrase evaluation.
In: Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, vol. 1, pp. 190–200 (2011)
7. Fang, H., et al.: From captions to visual concepts and back. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 1473–1482
(2015)
8. Gao, L., Guo, Z., Zhang, H., Xu, X., Shen, H.T.: Video captioning with attention-
based LSTM and semantic consistency. IEEE Trans. Multimed. 19(9), 2045–2055
(2017)
9. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction
with LSTM (1999)
10. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent
neural networks. In: 2013 IEEE International Conference on Acoustics, Speech
and Signal Processing, pp. 6645–6649 (2013)
11. Guadarrama, S., et al.: Youtube2text: recognizing and describing arbitrary activ-
ities using semantic hierarchies and zero-shot recognition. In: Proceedings of the
IEEE International Conference on Computer Vision, pp. 2712–2719 (2013)
12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
13. Khan, M.U.G., Gotoh, Y.: Describing video contents in natural language. In: Pro-
ceedings of the Workshop on Innovative Hybrid Approaches to the Processing of
Textual Data, pp. 27–35 (2012)
14. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic
evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics, pp. 311–318 (2002)
15. Raiko, T., Berglund, M., Alain, G., Dinh, L.: Techniques for learning binary
stochastic feedforward neural networks. arXiv preprint arXiv:1406.2989 (2014)
16. Rohrbach, A., et al.: Movie description. Int. J. Comput. Vis. 123(1), 94–120 (2017)
17. Schmidhuber, J., Wierstra, D., Gagliolo, M., Gomez, F.: Training recurrent net-
works by evolino. Neural Comput. 19(3), 757–779 (2007)
18. Song, J., Guo, Z., Gao, L., Liu, W., Zhang, D., Shen, H.T.: Hierarchical LSTM with
adjusted temporal attention for video captioning. arXiv preprint arXiv:1706.01231
(2017)
Boundary Detector Encoder and Decoder with Soft Attention 115
19. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural
networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112
(2014)
20. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus-based image
description evaluation. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 4566–4575 (2015)
21. Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko,
K.: Sequence to sequence-video to text. In: Proceedings of the IEEE International
Conference on Computer Vision, pp. 4534–4542 (2015)
22. Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko, K.:
Translating videos to natural language using deep recurrent neural networks. arXiv
preprint arXiv:1412.4729 (2014)
23. Wang, J., Wang, W., Huang, Y., Wang, L., Tan, T.: M3: multimodal memory mod-
elling for video captioning. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 7512–7520 (2018)
24. Xu, J., Mei, T., Yao, T., Rui, Y.: MSR-VTT: a large video description dataset for
bridging video and language. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 5288–5296 (2016)
25. Yao, L., et al.: Describing videos by exploiting temporal structure. In: Proceedings
of the IEEE International Conference on Computer Vision, pp. 4507–4515 (2015)