Semi-Supervised Multinomial Naive Bayes for Text Classification by Leveraging Word-Level Statistical Constraint

Li Zhao; Minlie Huang; Ziyu Yao; Rongwei Su; Yingying Jiang; Xiaoyan Zhu

doi:10.1609/aaai.v30i1.10345

Authors

Li Zhao, Tsinghua University
Minlie Huang, Tsinghua University
Ziyu Yao, Beijing University of Posts and Telecommunications
Rongwei Su, Samsung Research and Development Institute China - Beijing
Yingying Jiang, Samsung Research and Development Institute China - Beijing
Xiaoyan Zhu, Tsinghua University

Keywords:

text classification, semi-supervised learning

Abstract

Multinomial Naive Bayes with Expectation Maximization (MNB-EM) is a standard semi-supervised learning method for augmenting Multinomial Naive Bayes (MNB) in text classification. Despite its success, MNB-EM is unstable: it sometimes improves on MNB and sometimes fails to. We believe this is because MNB-EM lacks the ability to preserve the class distribution on words. In this paper, we propose a novel method that augments MNB-EM by leveraging word-level statistical constraints to preserve the class distribution on words. These word-level constraints are then converted into constraints on the document posteriors generated by MNB-EM. Experiments demonstrate that our method consistently improves MNB-EM and outperforms state-of-the-art baselines by a notable margin.
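For readers unfamiliar with the baseline, the following is a minimal sketch of plain MNB-EM as it is commonly described: train MNB on the labeled documents, then alternate between computing class posteriors for the unlabeled documents (E-step) and re-estimating the parameters from labeled plus softly labeled documents (M-step). It does not implement the word-level constraint proposed in the paper. The function names (`train_mnb`, `posterior`, `mnb_em`), the Laplace smoothing value, the fixed iteration count, and the toy data are all assumptions made for illustration.

```python
import math

def train_mnb(docs, weights, n_classes, vocab_size, alpha=1.0):
    """M-step: estimate log priors and log word probabilities.

    docs    : list of dicts mapping word id -> count
    weights : one list of per-class weights per document
              (one-hot for labeled docs, posteriors for unlabeled docs)
    alpha   : Laplace smoothing, applied to class and word counts (assumed)
    """
    class_mass = [alpha] * n_classes
    word_mass = [[alpha] * vocab_size for _ in range(n_classes)]
    for doc, w in zip(docs, weights):
        for c in range(n_classes):
            class_mass[c] += w[c]
            for word, count in doc.items():
                word_mass[c][word] += w[c] * count
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_word = []
    for c in range(n_classes):
        z = sum(word_mass[c])
        log_word.append([math.log(m / z) for m in word_mass[c]])
    return log_prior, log_word

def posterior(doc, log_prior, log_word):
    """E-step for one document: P(class | doc) under the current model."""
    scores = [lp + sum(cnt * lw[w] for w, cnt in doc.items())
              for lp, lw in zip(log_prior, log_word)]
    top = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mnb_em(labeled_docs, labels, unlabeled_docs, n_classes, vocab_size, n_iter=10):
    """Plain MNB-EM with a fixed number of iterations (no convergence test)."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    model = train_mnb(labeled_docs, hard, n_classes, vocab_size)
    for _ in range(n_iter):
        soft = [posterior(d, *model) for d in unlabeled_docs]
        model = train_mnb(labeled_docs + unlabeled_docs, hard + soft,
                          n_classes, vocab_size)
    return model

# Toy data (hypothetical): words 0 and 1 go with class 0, words 2 and 3 with class 1.
labeled = [{0: 2}, {2: 2}]
labels = [0, 1]
unlabeled = [{0: 1, 1: 2}, {2: 1, 3: 2}]

supervised = train_mnb(labeled, [[1.0, 0.0], [0.0, 1.0]], 2, 4)
semi = mnb_em(labeled, labels, unlabeled, 2, 4)

# Word 1 never occurs in the labeled data, so only the EM model can use it.
p_supervised = posterior({1: 1}, *supervised)
p_semi = posterior({1: 1}, *semi)
```

In this toy setting the supervised model is indifferent about a document containing only word 1, while the EM model leans toward class 0 because word 1 co-occurs with word 0 in the unlabeled data. Nothing in this loop prevents the per-word class distribution from drifting as the soft labels change, which is the instability the abstract attributes to MNB-EM.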

Published

2016-03-05

How to Cite

Zhao, L., Huang, M., Yao, Z., Su, R., Jiang, Y., & Zhu, X. (2016). Semi-Supervised Multinomial Naive Bayes for Text Classification by Leveraging Word-Level Statistical Constraint. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10345

Issue

Vol. 30 No. 1 (2016): Thirtieth AAAI Conference on Artificial Intelligence

Section

Technical Papers: NLP and Machine Learning
