A New Effective Neural Variational Model with Mixture-of-Gaussians Prior for Text Clustering
2019 IEEE 31st International Conference on Tools with Artificial …, 2019•ieeexplore.ieee.org
Text clustering is one of the fundamental tasks in natural language processing and text data mining. It remains challenging because texts have complex internal structure in addition to the sparsity of their high-dimensional representations. In this paper, we propose a new Neural Variational model with mixture-of-Gaussians prior for Text Clustering (abbr. NVTC) to reveal the underlying textual manifold structure and cluster documents effectively. NVTC is a deep latent variable model built on neural variational inference. In NVTC, the stochastic latent variable, which is modeled as following a Gaussian mixture distribution, plays a central role in establishing the association between documents and document labels. Moreover, through joint learning, NVTC simultaneously learns encoded text representations and cluster assignments. Experimental results demonstrate that NVTC learns clustering-friendly representations of texts. It significantly outperforms several baselines, including VAE+GMM, VaDE, LCK-NFC, GSDPMM, and LDA, on four benchmark text datasets in terms of clustering accuracy (ACC), normalized mutual information (NMI), and adjusted mutual information (AMI). Furthermore, NVTC learns effective latent embeddings of texts that are interpretable in terms of topics, where each dimension of a latent embedding corresponds to a specific topic.
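The abstract does not give the model's equations, but in mixture-of-Gaussians-prior latent variable models of this family (e.g. VaDE), soft cluster assignments are typically obtained as the posterior over mixture components given a latent code z, i.e. q(c|z) ∝ π_c · N(z; μ_c, σ_c²). The following is a minimal illustrative sketch of that computation in NumPy; the function names and the diagonal-covariance parameterization are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def log_diag_gaussian(z, mu, logvar):
    # Log-density of z under each diagonal-covariance Gaussian component,
    # summed over latent dimensions. Shapes: z (K, D), mu (K, D), logvar (K, D).
    return -0.5 * np.sum(
        logvar + np.log(2.0 * np.pi) + (z - mu) ** 2 / np.exp(logvar),
        axis=-1,
    )

def cluster_posterior(z, pi, mu, logvar):
    # Soft cluster assignment q(c|z) proportional to pi_c * N(z; mu_c, sigma_c^2),
    # computed in log space with a max-shift for numerical stability.
    log_p = np.log(pi) + log_diag_gaussian(z[None, :], mu, logvar)  # shape (K,)
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()

# Toy usage: a 2-D latent code near the first of two mixture components.
z = np.array([0.1, -0.2])
pi = np.array([0.5, 0.5])                       # uniform mixture weights
mu = np.array([[0.0, 0.0], [5.0, 5.0]])         # component means
logvar = np.zeros((2, 2))                       # unit variances
q = cluster_posterior(z, pi, mu, logvar)        # responsibilities over clusters
```

In a trained model, `argmax(q)` would give the hard cluster label for a document, while the soft responsibilities are what enter the joint (ELBO-style) training objective.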