Keyword Extraction in German: Information-theory vs. Deep Learning.

M Kölbl, Y Kyogoku, JN Philipp, M Richter, C Rietdorf, T Yousef
ICAART (1), 2020 - scitepress.org
Abstract
This paper reports the results of a study on automatic keyword extraction in German. We employed two general types of methods: (A) an unsupervised method based on information theory (Shannon, 1948), for which we used (i) a bigram model, (ii) a probabilistic parser model (Hale, 2001), and (iii) an innovative model which utilises topics as extra-sentential contexts for the calculation of the information content of the words; and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we employed TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all remaining models, including the baselines TextRank and TF-IDF. In contrast, the RNN performed poorly. We take the results as first evidence that (i) information content can be employed for keyword extraction tasks and thus has a clear correspondence to the semantics of natural language, and (ii) that, as a cognitive principle, the information content of words is determined from extra-sentential contexts, that is to say, from the discourse of words.
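To make the idea behind method (A)(i) concrete, the following is a minimal sketch of ranking words by their information content (surprisal, -log2 P(word | previous word)) under a bigram model. It is not the authors' implementation: the add-one smoothing, the averaging of surprisal per word type, the toy corpus, and the function names `bigram_surprisal` and `top_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_surprisal(sentences):
    """Average surprisal in bits per word type, using an add-one-smoothed
    bigram model estimated from the same sentences (an illustrative choice)."""
    context_counts = Counter()  # how often each token occurs as a left context
    bigram_counts = Counter()   # how often each (previous, word) pair occurs
    for sent in sentences:
        tokens = ["<s>"] + sent  # "<s>" marks the sentence start
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    vocab_size = len({w for sent in sentences for w in sent})

    totals = Counter()       # summed surprisal per word type
    occurrences = Counter()  # number of tokens per word type
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, word in zip(tokens, tokens[1:]):
            prob = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
            totals[word] += -math.log2(prob)
            occurrences[word] += 1
    return {w: totals[w] / occurrences[w] for w in totals}


def top_keywords(sentences, k=3):
    """Return the k word types with the highest average surprisal."""
    scores = bigram_surprisal(sentences)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus of tokenised German sentences.
corpus = [
    ["der", "hund", "bellt"],
    ["der", "hund", "schläft"],
    ["der", "kater", "schläft"],
]
scores = bigram_surprisal(corpus)
keywords = top_keywords(corpus, k=2)
```

In this toy corpus the highly predictable article "der" receives the lowest score, while rarer continuations score higher, which is the intuition behind treating information content as a keyword signal. The topic-based model (A)(iii) conditions on extra-sentential context instead of the preceding word, which this sketch does not attempt to reproduce.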