Theme-Related Keyword Extraction From Free Text Descriptions of Image Contents For Tagging


International Conference on Advanced Communications Technology(ICACT) 537

Theme-Related Keyword Extraction from Free Text Descriptions of Image Contents for Tagging
Joonmyun Cho*, Yoon-Seop Chang*, Sung-Ho Lee*
*Hyper-connected Communication Research Laboratory, ETRI(Electronics and Telecommunications Research Institute),
218 Gajeong-ro, Yuseong-gu, Daejeon, 305-700, Korea.

Abstract— This paper discusses a method for automatic theme-related keyword extraction from users’ natural language comments on their photographs and videos. ‘Theme’ indicates the concepts circumscribing and describing the content of the photos and videos, such as pets, natural sites, palaces and places. The method employs a deep learning algorithm, RNN (Recurrent Neural Network), which is good at recognizing implicit patterns in sequential data. The method has been applied to the construction of a place-related image content DB, and delivers reasonably good performance even when the measure (i.e. the theme of the image contents) is abstract and vague.

Keywords— Image Content DB, Keyword Tagging, Content Search, Tag Extraction, Recurrent Neural Network (RNN)

I. INTRODUCTION
Let’s imagine the automatic construction of a theme-related image content DB using photographs and videos collected from various social media such as SNS and blogs. Here, ‘theme’ indicates concepts circumscribing and describing the content of the photographs and videos, such as pets, palaces, natural sites and places. Such a theme-related image content DB can be utilized in various applications. For example, imagine the task of choosing places to shoot scenes of a movie or TV drama. The person in charge of the task in a film-making project can refer to the photos or videos retrieved from a place-related image content DB in order to reduce the number of candidate places that he/she must visit to check the real conditions or atmosphere. By checking the real conditions of only a small number of candidate places rather than many possible places, he/she can save a lot of time and money. We can imagine many other tasks or applications that could benefit from such theme-related image content DBs, such as choosing a travel destination, choosing a pet dog breed and so on.
Those image content DBs should provide an easy and quick way of finding the image contents that are closely related to the user’s task. The most familiar way of searching is to use search keywords. In order for the DB systems to provide keyword search, each photo and video has to be tagged with keywords when it is stored in the DB. This paper discusses automatic theme-related keyword extraction from users’ natural language comments on their photos and videos; in most social media, user comment texts are posted together with the image contents. The theme-related keyword extraction system employs a deep learning algorithm, RNN (Recurrent Neural Network), which is good at recognizing implicit patterns in sequential data such as the sequence of words in user comments. In existing methods that rely on pre-set candidate keywords for a certain target domain (i.e. theme), many legitimate keywords closely associated with the theme can be omitted, or many unnecessary keywords weakly associated with the theme can be extracted, so that the quality of the extracted keywords tends to suffer. This is because the concept of the theme of image contents is so vague that it cannot be defined with explicit rules, and because even the same keyword can have different meanings or nuances depending on the context of a user comment.

II. BACKGROUND TECHNOLOGY AND RELATED WORK

A. Social Media
Huge amounts of content are posted and distributed through various social media such as blogs and SNS (Social Network Service). In particular, owing to the increase in Internet bandwidth and the progress of telecommunication technology, the amount of image content such as photographs and videos is growing explosively. Most social media platforms provide open APIs for users and computer systems to use the image contents freely and easily [1], [2]. Looking at typical social media such as flickr [3] and facebook [4], we can see that photos and videos are posted together with user comments written in natural language. Thus, if we analyse the user comments we can find the sort (i.e. theme) of the image contents and, furthermore, we can extract proper keywords related to the image content. Figure 1 shows a snapshot of a flickr web page posting a photo and user comments on it.
There are a number of works, such as Rae [2] and Kim [5], that extract certain information from unstructured data like natural language text. Kim developed a system that extracts from email text the information related to a meeting, such as the name, place, date and time of the meeting. The system

ISBN 979-11-88428-01-4 ICACT2018 February 11 ~ 14, 2018



employs CRF (Conditional Random Field) machine learning algorithms to recognize such information words in the texts. Rae et al. also utilize CRF algorithms to extract from user comment text the information related to the place where the flickr image content was taken.

Figure 1. A snapshot of a flickr web page posting a photo and user comments on it

B. Recognition of Patterns of Theme-Related Keywords
Extraction of keywords relating to a certain concept (i.e. theme) from natural language user comments corresponds to recognition of implicit patterns in unstructured data. The patterns of theme-related keywords are implicit because the boundaries of a theme cannot be precisely defined and thus there are no explicit rules to separate proper keywords from improper ones. Actually, even people differ from one another in determining the right keywords depicting features of, for example, places, pets and buildings in images. Nonetheless, the patterns certainly exist, because humans have no difficulty performing the task and generally arrive at a set of keywords agreeable to everyone.
Various machine learning techniques are now being applied to implicit pattern recognition. However, in the case of natural language text, the patterns of a keyword do not involve only the attributes of the keyword itself. The order and/or sequence of words in the text also constitute the patterns. In other words, the other words around the keyword and the relations between them compose a large portion of the patterns. Machine learning algorithms that recognize such sequential patterns well include HMM (Hidden Markov Model) and CRF [5], [6]. Recently, it has been reported that RNN (Recurrent Neural Network), one of the deep learning algorithms, delivers better performance on sequential pattern recognition than HMM and CRF algorithms [7].

C. Machine Learning
In order to apply machine learning algorithms to pattern recognition over data, we have to feed in the attributes of the data and the desired patterns to be recognized [6], [7]. As the attributes of words in the user comment text, besides the order of the words, we can consider various language features such as POS (Part of Speech), dependency relations and semantic categories [1], [2], [5]. Words assigned to the same POS generally display similar behaviour in terms of syntax, and words in a sentence connect with each other directly or indirectly. Moreover, a word represents some thing or concept, so it can be classified into categories of meaning according to what it represents. Humans, in fact, unconsciously refer to such features when understanding sentences.
Meanwhile, machines do not know which patterns are significant and thus which patterns they should recognize. So, we have to inform the machine of the desired patterns by providing labels. While the language features as the attributes of data are usually provided and tagged automatically by natural language processing toolkits, the labels are provided and tagged manually by human users.

III. THE THEME-RELATED KEYWORD EXTRACTION
The theme-related keyword extraction consists mainly of three systems: the theme-related keyword pattern learning system; the theme-related keyword extraction system; and the language feature tagging system. The theme-related keyword pattern learning system generates a model for pattern recognition of the keywords relating to a specific theme. The system is trained using machine learning data composed of word order, POS, dependency relations, semantic categories and labels. The theme-related keyword extraction system extracts keywords from any user comments based on the recognition model learned. The language feature tagging system automatically tags language features to each word in the user comments. The tagged user comment data is used off-line as the training data, which is the input to the learning system, and on-line as pattern recognition data, which is the input to the extraction system. Figure 2 illustrates the process of the theme-related keyword extraction and the location of each system in the process.

Figure 2. The theme-related keyword extraction process and its constituents
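The tagged user comment data that feeds both the learning system and the extraction system can be pictured as a sequence of per-word records. The following is a minimal sketch under stated assumptions: the names `TaggedToken`, `features_of`, `to_training_pairs` and `to_recognition_input` are hypothetical, the example tag values are made up, and the paper does not prescribe a concrete data format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical per-word record; the field names are illustrative only.
@dataclass(frozen=True)
class TaggedToken:
    word: str
    pos: str                              # part-of-speech tag
    dependency: int                       # dependency relation tag
    semantic_categories: Tuple[str, ...]  # semantic category tags
    label: Optional[str] = None           # human-given label; None when on-line


Features = Tuple[str, int, Tuple[str, ...]]


def features_of(token: TaggedToken) -> Features:
    """Return the automatically tagged language features of one word."""
    return (token.pos, token.dependency, token.semantic_categories)


def to_training_pairs(comment: List[TaggedToken]) -> List[Tuple[Features, str]]:
    """Off-line use: pair each word's features with its human-given label."""
    pairs = []
    for token in comment:
        if token.label is None:
            raise ValueError(f"training data needs a label for {token.word!r}")
        pairs.append((features_of(token), token.label))
    return pairs


def to_recognition_input(comment: List[TaggedToken]) -> List[Features]:
    """On-line use: features only, kept in the original word order."""
    return [features_of(token) for token in comment]


# Made-up example values, for illustration only.
comment = [
    TaggedToken("pine", "NNG", 1, ("plant__01",), "B"),
    TaggedToken("tree", "NNG", 2, ("plant__01", "structure__02"), "I"),
    TaggedToken("surrounds", "VV", 0, ("action__01",), "O"),
]
training_pairs = to_training_pairs(comment)
recognition_input = to_recognition_input(comment)
```

Keeping the label optional lets the same record type serve both paths: the off-line path requires it, while the on-line path ignores it and passes only the features to the recognition model.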


A. Tagging the Language Features
The language feature tagging system decomposes an input user comment text into syntactic units (i.e. words), preserving the original order of the words. It then tags each word with language features such as POS, dependencies and semantic categories. The system has been implemented using UWordMap [8] and UTagger [9], developed by Ulsan University, Korea. UWordMap is a Korean word map constructed based on the Standard Korean Dictionary of the National Institute of Korean Language. UTagger is a toolkit that tags POS, dependency relations and semantic categories of words using UWordMap and various natural language processing algorithms.

Figure 3. An example of machine learning data composed of language feature tags and labels

Figure 3 shows an example of machine learning data for theme-related keyword pattern recognition. The data preserves the order of words, and each word is tagged with language features and labels. The tags such as NNG, JKG and XSA denote POS features of the word, meaning a common noun, a genitive case marker and an adjective-derived suffix, respectively. The tags such as 1, 2 and 3 denote word dependencies, meaning the distance to the other word on which the word depends. The semantic categories are marked with tags such as 작용__01, 운동__02 and 여자__02, meaning action, movement and female, respectively. Such language feature tags are given automatically by UWordMap and UTagger.
Note that the tagged language attributes might include much noise, i.e. wrong tags. In other words, because the natural language processing algorithms themselves are also based on statistical models and/or machine learning techniques, their analysis might be inaccurate. Moreover, a word can have many different meanings (i.e. semantic categories) in different texts (i.e. sentences) according to the context of the sentences. The system, however, does not tag only the right semantic category of the word in a sentence but tags all possible semantic categories of the word. In other words, the semantic category in this paper does not mean the exact meaning of the word in a user comment text. This is because judging the accurate meaning of a word in a sentence is very difficult for state-of-the-art techniques or, at least, the accuracy is so poor that much noise is inevitably included.
The last tag of each word is the label, which is given by human users as the answer of the pattern recognition. The labels are tagged according to the IOB2 tagging model [10]. In this model, the B tag denotes the beginning of a target pattern (i.e. a theme-related keyword), the I tag denotes the inside of the target pattern, and the O tag denotes the outside of the target pattern. So, with these label tags, we can separate the target theme-related keywords from the other words.

B. Learning Patterns of the Theme-Related Keywords
The theme-related keyword pattern learning system is trained to generate a pattern recognition model using a bulk of data (i.e. machine learning data) which incorporates the language feature tags and labels. This paper employs RNN algorithms so that the system can learn a model for sequential data patterns. As a toolkit for RNN algorithms, DeepLearning4J [11] is used. DL4J is developed in Java, so it is easy to integrate DL4J with an application system.
The RNN used in this paper consists of 4 neural network layers. The first layer is the input layer and has 962 sigmoid neurons (i.e. nodes), according to the number of input attributes of the data. The second and third layers are the hidden layers, each having 100 LSTM (Long Short-Term Memory) neurons. Lastly, the fourth layer is the output layer of 4 softmax neurons for the three label tags and an exceptional output tag; the exceptional case is thought to occur through a user’s incorrect label tagging. The SGD (Stochastic Gradient Descent) algorithm is used for the training of the neural network. As the cost function of the training algorithm, LossMCXENT, a sort of negative log-likelihood cost function, is used. This paper applies Dropout regularization and L2 regularization. The RNN described above can be regarded as a quite typical one with no special features. In Figure 4 we can see the network configuration and some hyper-parameters of the network.
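To make the role of the 4 softmax output neurons and the negative log-likelihood cost concrete, the following is a minimal sketch in plain Python. It illustrates generic softmax cross-entropy, not DL4J's implementation of LossMCXENT; the label name `EXCEPTION`, the example scores and the averaging over the words of a comment are assumptions made for illustration.

```python
import math
from typing import List, Sequence

# Three label tags plus one exceptional output; "EXCEPTION" is an assumed name.
LABELS = ["B", "I", "O", "EXCEPTION"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw output-layer scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities: Sequence[float], true_index: int) -> float:
    """Negative log-likelihood of the correct label for one word."""
    return -math.log(probabilities[true_index])


def sequence_loss(score_rows: Sequence[Sequence[float]],
                  true_indices: Sequence[int]) -> float:
    """Average per-word loss over one comment (one row of scores per word)."""
    losses = [cross_entropy(softmax(row), idx)
              for row, idx in zip(score_rows, true_indices)]
    return sum(losses) / len(losses)


# Made-up scores for a two-word comment whose true labels are B and O.
scores = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]
truth = [LABELS.index("B"), LABELS.index("O")]
loss = sequence_loss(scores, truth)
```

The loss falls as the network assigns more probability to the human-given label of each word, which is the quantity that gradient descent reduces during training.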

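The IOB2 labels introduced in Section III.A can be turned back into keywords by collecting each B tag together with the I tags that directly follow it. The sketch below assumes a hypothetical function name `extract_keywords`, made-up example tags, and two simplifications that the paper does not specify: the words of a keyword are joined with spaces, and an I tag with no preceding B tag is treated as outside.

```python
from typing import List, Sequence


def extract_keywords(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Collect the word spans marked by B and I tags as keywords."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    keywords: List[str] = []
    current: List[str] = []  # words of the keyword currently being built
    for word, tag in zip(words, tags):
        if tag == "B":
            # A new keyword starts; close the previous one if it is still open.
            if current:
                keywords.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            # "O", or a stray "I" with no open keyword: close any open span.
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords


# Made-up tags for part of the example comment, for illustration only.
words = ["tall", "pine", "trees", "surround", "the", "observation", "platform"]
tags = ["O", "B", "I", "O", "O", "B", "I"]
keywords = extract_keywords(words, tags)
```

With these example tags the function returns the two multi-word keywords "pine trees" and "observation platform".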

Figure 4. The RNN configuration with some hyper-parameters and evaluation results based on test data

C. Extracting Theme-Related Keywords
The theme-related keyword extraction system performs its task based on the pattern recognition model learned. In fact, the system applies in parallel 7 models trained with the same training data. The pattern recognition results (i.e. resulting label tags) from the 7 models are combined into a final one. Specifically, the priorities of the label tags are set as B tag > I tag > O tag, and the tag of the highest priority is chosen as the final tag of the word even if that tag has the lowest incidence among the 7 tags returned. This priority-based voting scheme expands the set of potential keywords that will be extracted, by taking account of any probability of a word being a theme-related keyword. The reason for adopting such a scheme is that the extracted keywords are mainly used as search keywords, so it is preferable to increase the recall rate of the search rather than the precision.

IV. EXPERIMENT AND EVALUATION
The method discussed in Section III has been implemented in the construction of a place-related image content DB. The primary purpose of the DB is to support the task of choosing places to shoot scenes of movies or TV dramas. In order to prepare the machine learning data (i.e. the training data) for the place-related keyword pattern learning system, 663 sentences or passages were collected from various TV drama scripts that describe the conditions or features of the filming places. Among them, for example, is the passage: “It is Hyunwook’s house secluded in quiet countryside paths. There is a humble but modern space of clear view with no walls, no gates. In a large garden are carelessly grown grass and wild flowers….” The passages collected were tagged by the language feature tagging system into preliminary training data, and then human users marked the IOB2 labels on the preliminary training data to produce the final training data.

A. Place-Related Keyword Pattern Recognition Model
Using the machine learning data prepared, 7 RNNs were trained. 70% of the data, corresponding to 464 of the 663 passages, was used as the training data and the rest of the data was used as the test data. Finally, 7 place-related keyword pattern recognition models were generated with the same training data, and they deliver an average accuracy of 89% over the test data for the labels. In Figure 4, we can see the evaluation results of the models such as accuracy, precision and F1 score.

B. Place-Related Keywords for Image Contents
Various user comments on photos and videos are gathered from various media and processed by the place-related keyword extraction system equipped with the 7 pattern recognition models. Figure 5 shows one example of the processing. The user comment in the example is: “There is an observation platform on top of seaside cliff and in the platform is a small monument saying the village is the end of the earth. As tall pine trees surround the platform like a folding screen, feeling of being in forests occurs.” The extracted keywords are: observation platform; pine tree; seaside; monument; forest; village; cliff. After processing over 130 user comments, the average performance has been estimated by human users at about 76% precision and about 94% recall. In the example of Figure 5, the precision is 5/7 = 71.43% and the recall is 5/5 = 100%, because human users judged that the following 5 keywords are the right place-related keywords and they are all included in the extracted keywords: observation platform; pine tree; seaside; monument; cliff.

Figure 5. An example of the place-related keyword extraction

V. CONCLUSIONS
This paper discusses a method to extract theme-related keywords automatically from user comments on photos and videos that describe the content of the image contents. The method employs an RNN of 4 layers with 100 nodes per hidden layer to generate a theme-related keyword pattern recognition model. One of the language attributes composing the training data is the semantic categories of words. The method tags all possible semantic categories of a word, not the exact one of the word in a user comment text. With the training data, 7 models are trained and they vote for the final label. The label of higher priority is selected among the 7 label tags returned by the 7 models regardless of its incidence. This priority-based


voting scheme increases the recall rate of search by expanding the potential keywords that would be extracted. The method has been implemented in the construction of a place-related image content DB. The place-related keyword extraction system renders an average precision of about 76% and an average recall of about 94% according to human user estimation for the keywords extracted.
This paper shows that it is possible to deliver reasonable performance in the automatic extraction of theme-related keywords when using machine learning techniques, especially RNN, even when the measure (in this paper, the theme of image contents) is abstract and vague. Moreover, even when the attributes used for the training data have some noise (in this paper, inaccurate semantic categories of a word with respect to the word’s exact meaning in text), the accuracy of the extracted keywords is reliable. The method in this paper can also be used to determine the sort of photos and videos for classification by analysing user comments on the contents instead of analysing the image contents themselves.

ACKNOWLEDGMENT
This research is supported by Ministry of Culture, Sports and Tourism (MCST) and Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program 2017.

REFERENCES
[1] S. Kumar, F. Morstatter, and H. Liu, “Twitter Data Analytics,” Database Management & Information Retrieval, 2013.
[2] A. Rae, A. Popescu, V. Murdock, and H. Bouchard, “Mining the Web for Points of Interest,” in Proc. of International SIGIR Conference on Research and Development in Information Retrieval, 2012.
[3] (2017) flickr [Online]. Available: https://www.flickr.com/
[4] (2017) facebook [Online]. Available: https://www.facebook.com/
[5] K. R. Kim, “Location Extraction from Meeting Announcements,” KAIST, Master’s Thesis, 2012.
[6] B. T. Jang, “Next-Generation Machine Learning Technologies,” Communications of the Korean Institute of Information Scientists and Engineers, 2007.
[7] A. Graves, “Supervised Sequence Labelling with Recurrent Neural Networks,” Studies in Computational Intelligence, Springer, 2012.
[8] Y. Bae, and C. Ock, “Introduction to the Korean Word Map(UWordMap) and API,” in Proc. of the 26th Annual Conference on Human and Cognitive Language Technology, 2014.
[9] J. Shin, and C. Ock, “Optional features for speeding up UTagger,” in Proc. of the 24th Annual Conference on Human and Cognitive Language Technology, 2012.
[10] T. Ek, C. Kirkegaard, H. Jonsson, and P. Nugues, “Named Entity Recognition for Short Text Messages,” in Proc. of International Conference of the Pacific Association for Computational Linguistics, 2011.
[11] (2017) Deeplearning4J [Online]. Available: https://deeplearning4j.org/

Joonmyun Cho received his B.S., M.S. and Ph.D. degrees in mechanical engineering from KAIST (Korea Advanced Institute of Science and Technology), South Korea, in 1993, 1995 and 2006, respectively. He joined ETRI (Electronics and Telecommunications Research Institute), South Korea, in 2007 and was involved with the URC (Ubiquitous Robotic Companion) project until 2011 and the Beyond Smart TV project until 2015. Dr. Cho is currently working in the Intelligent IoT SW Platform project as a senior researcher. His research interests include knowledge based systems, intelligent agent systems and machine learning.

Yoon-Seop Chang received his B.S., M.S., and Ph.D. degrees in geographic information system from Seoul National University, South Korea, in 1999, 2001 and 2005, respectively. He joined ETRI (Electronics and Telecommunications Research Institute), South Korea, in 2005 and is currently working as a principal researcher. Since 2008, Dr. Chang has also been a faculty member of University of Science and Technology, South Korea, as an associate professor. His research interests include geographic information system, web mashup, augmented reality and virtual reality.

Seong-Ho Lee received his B.S. and M.S. degrees in computer science from Chungbuk National University, South Korea, in 1997 and 2000, respectively. Since 2000, he has been a senior member of research staff with ETRI, South Korea, and he is also working toward the Ph.D. degree in computer science at Chungbuk National University. Mr. Lee is currently working in the Location-based Smart Content Platform project as a senior researcher. His research interests are spatio-temporal database systems, geographic information systems, and location-based services.
