Model-induced term-weighting schemes for text classification

HK Kim, M Kim - Applied Intelligence, 2016 - Springer
Abstract
The bag-of-words representation of text data is widely used for document classification. Recent work has shown that properly weighting the term feature vector can improve classification performance significantly beyond the original term-frequency-based features. In this paper we demystify the success of recent term-weighting strategies and suggest possibly more reasonable modifications. We then propose novel term-weighting schemes that can be induced from well-known probabilistic document models such as Naive Bayes and the multinomial term model. Interestingly, some of the intuition-based term-weighting schemes coincide exactly with the proposed derivations. Our term-weighting schemes are tested on large-scale text classification problems, where we demonstrate improved prediction performance over existing approaches.
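As a rough illustration of how a term weighting can be "induced" from a probabilistic model, the sketch below relies on a standard property of multinomial Naive Bayes. For a binary task, the log-posterior odds are linear in the term counts, and the coefficient of term w is r_w = log(p_w/‖p‖₁) − log(q_w/‖q‖₁), where p and q are the smoothed per-class term counts. Scaling each term feature by r_w then yields a weighted representation for a downstream linear classifier. This is a sketch under stated assumptions, not the authors' exact scheme: the function names `nb_log_count_ratio` and `weight_features`, the Laplace smoothing parameter `alpha`, and the optional binarization step are hypothetical, illustrative choices.

```python
import numpy as np


def nb_log_count_ratio(X, y, alpha=1.0):
    """Hypothetical helper: per-term weights induced from multinomial Naive Bayes.

    X: (n_docs, n_terms) term-count matrix; y: binary labels in {0, 1}.
    Returns r with r[w] = log(p[w]/sum(p)) - log(q[w]/sum(q)), i.e. the
    per-term coefficient of the NB log-posterior odds (assumed Laplace smoothing).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = alpha + X[y == 1].sum(axis=0)  # smoothed term counts, positive class
    q = alpha + X[y == 0].sum(axis=0)  # smoothed term counts, negative class
    return np.log(p / p.sum()) - np.log(q / q.sum())


def weight_features(X, r, binarize=True):
    """Hypothetical helper: scale each term column of X by its weight r."""
    X = np.asarray(X, dtype=float)
    if binarize:
        X = (X > 0).astype(float)  # assumption: presence/absence features
    return X * r  # broadcasting multiplies column w by r[w]


if __name__ == "__main__":
    # Toy corpus: 4 documents over a 3-term vocabulary.
    X = [[2, 0, 1], [1, 0, 0], [0, 3, 1], [0, 1, 0]]
    y = [1, 1, 0, 0]
    r = nb_log_count_ratio(X, y)
    print(np.round(r, 3))
    print(weight_features(X, r))
```

On this toy data the first term, which occurs only in positive documents, gets a weight of about 1.52; the second, which occurs only in negative documents, gets about -1.48; and the third, which appears in both classes, stays near zero at about 0.13. Binarizing before scaling is a common design choice for short texts, but it is an assumption here, and raw counts can be kept by passing `binarize=False`.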