Unsupervised Learning with Term Clustering for Thematic Segmentation of Texts.

M Caillet, JF Pessiot, MR Amini, P Gallinari - RIAO, 2004 - Citeseer
Abstract
This paper introduces a machine learning approach to automatic text segmentation. The segmenter groups text segments that contain similar concepts. It first discovers the concepts present in a text, each defined as a set of representative terms. The text is then partitioned into coherent paragraphs with a clustering technique based on the Classification Maximum Likelihood approach. We evaluate the technique on sets of concatenated paragraphs from two collections, the 7sectors and 20 Newsgroups corpora, and compare it with a baseline text segmentation technique proposed by Salton et al.
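The clustering step can be illustrated with a minimal sketch of hard-assignment (classification) EM, a standard way to optimize a Classification Maximum Likelihood criterion. This is not the authors' implementation: the multinomial mixture model, the Laplace smoothing, the round-robin initialization, the function name cem_cluster, and the toy counts are all assumptions made for illustration. Each row is taken to be a paragraph described by counts over already-discovered concepts; the concept-discovery step itself is not shown.

```python
import math

def cem_cluster(counts, k, n_iter=20, alpha=1.0):
    """Hard-assignment EM for a multinomial mixture (illustrative sketch).

    counts: equal-length count vectors, one per paragraph (assumed input).
    Returns one cluster label per paragraph.
    """
    n, d = len(counts), len(counts[0])
    # Assumption: deterministic round-robin initial partition.
    labels = [i % k for i in range(n)]
    for _ in range(n_iter):
        # M-step: estimate priors and smoothed multinomials from the partition.
        sizes = [0] * k
        totals = [[alpha] * d for _ in range(k)]
        for x, z in zip(counts, labels):
            sizes[z] += 1
            for j, c in enumerate(x):
                totals[z][j] += c
        log_prior = [math.log((sizes[z] + 1) / (n + k)) for z in range(k)]
        log_theta = [[math.log(t / sum(row)) for t in row] for row in totals]
        # E- and C-steps: give each paragraph to its highest-scoring component.
        new_labels = []
        for x in counts:
            scores = [
                log_prior[z] + sum(c * log_theta[z][j] for j, c in enumerate(x))
                for z in range(k)
            ]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return labels

# Hypothetical toy data: six paragraphs, counts over two concepts.
toy_counts = [[5, 0], [4, 1], [6, 0], [0, 5], [1, 4], [0, 6]]
labels = cem_cluster(toy_counts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy run the first three paragraphs, dominated by the first concept, end up in one cluster and the last three in the other. A real system would need a less naive initialization and a rule for turning cluster labels into segment boundaries, neither of which the abstract specifies.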