A New Effective Neural Variational Model with Mixture-of-Gaussians Prior for Text Clustering
2019 IEEE 31st International Conference on Tools with Artificial …, 2019•ieeexplore.ieee.org
Text clustering is one of the fundamental tasks in natural language processing and text data mining. It remains challenging because texts have complex internal structure in addition to the sparsity of their high-dimensional representations. In this paper, we propose a new Neural Variational model with mixture-of-Gaussians prior for Text Clustering (abbr. NVTC) to reveal the underlying textual manifold structure and cluster documents effectively. NVTC is a deep latent variable model built on neural variational inference. In NVTC, the stochastic latent variable, which is modeled as following a Gaussian mixture distribution, plays a central role in establishing the association between documents and document labels. Moreover, through joint learning, NVTC simultaneously learns encoded text representations and cluster assignments. Experimental results demonstrate that NVTC learns clustering-friendly representations of texts. It significantly outperforms several baselines, including VAE+GMM, VaDE, LCK-NFC, GSDPMM, and LDA, on four benchmark text datasets in terms of clustering accuracy (ACC), normalized mutual information (NMI), and adjusted mutual information (AMI). Furthermore, NVTC learns effective latent embeddings of texts that are interpretable in terms of topics, where each dimension of a latent embedding corresponds to a specific topic.
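The abstract does not give the model's equations, but in mixture-of-Gaussians-prior latent variable models of this family (e.g. VaDE), soft cluster assignments are typically obtained as the posterior over mixture components given a latent code z, i.e. q(c|z) ∝ π_c · N(z; μ_c, σ_c²). The following is a minimal illustrative sketch of that computation in NumPy; the function names and the diagonal-covariance parameterization are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def log_diag_gaussian(z, mu, logvar):
    # Log-density of z under each diagonal-covariance Gaussian component,
    # summed over latent dimensions. Shapes: z (K, D), mu (K, D), logvar (K, D).
    return -0.5 * np.sum(
        logvar + np.log(2.0 * np.pi) + (z - mu) ** 2 / np.exp(logvar),
        axis=-1,
    )

def cluster_posterior(z, pi, mu, logvar):
    # Soft cluster assignment q(c|z) proportional to pi_c * N(z; mu_c, sigma_c^2),
    # computed in log space with a max-shift for numerical stability.
    log_p = np.log(pi) + log_diag_gaussian(z[None, :], mu, logvar)  # shape (K,)
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()

# Toy usage: a 2-D latent code near the first of two mixture components.
z = np.array([0.1, -0.2])
pi = np.array([0.5, 0.5])                       # uniform mixture weights
mu = np.array([[0.0, 0.0], [5.0, 5.0]])         # component means
logvar = np.zeros((2, 2))                       # unit variances
q = cluster_posterior(z, pi, mu, logvar)        # responsibilities over clusters
```

In a trained model, `argmax(q)` would give the hard cluster label for a document, while the soft responsibilities are what enter the joint (ELBO-style) training objective.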