Efficient approach for incremental Vietnamese document clustering

TA Nguyen Hoang, K Hoang - … of the eleventh international workshop on …, 2009 - dl.acm.org
Proceedings of the eleventh international workshop on Web information and …, 2009 - dl.acm.org
In this paper, we present how to use a graph model to cluster Vietnamese documents incrementally. A graph-based model lets us capture the complete structure of not only each document but also the whole collection of documents. The graph structure is easily updated when a new document arrives. While building the graph incrementally, we can identify representative subgraph features, which are later used to calculate a hybrid pairwise document similarity. These subgraph features make the clustering process less sensitive to the Vietnamese word segmentation step. Based on the hybrid similarity measure, the documents are grouped into clusters on the fly, without any assumption about the number of clusters and without retrieving previous documents.
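
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a document graph can be approximated by directed edges between adjacent whitespace-separated syllables, that the "hybrid" similarity is a weighted mix of edge overlap and token overlap (Jaccard on both), and that a fixed threshold decides when a new cluster is opened. The names `IncrementalClusterer`, `doc_edges`, `threshold`, and `alpha` are all hypothetical.

```python
from collections import Counter


def doc_edges(words):
    """Directed edges between adjacent tokens: a tiny stand-in for a document graph."""
    return set(zip(words, words[1:]))


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class IncrementalClusterer:
    """Single-pass clustering sketch (assumed design, not taken from the paper)."""

    def __init__(self, threshold=0.3, alpha=0.5):
        self.threshold = threshold          # assumed cut-off for joining a cluster
        self.alpha = alpha                  # assumed weight of the structural (edge) part
        self.collection_graph = Counter()   # edge -> number of documents containing it
        self.clusters = []                  # per-cluster summaries only, no raw documents

    def similarity(self, tokens, edges, cluster):
        """Hybrid score: weighted mix of edge overlap and token overlap."""
        structural = jaccard(edges, cluster["edges"])
        lexical = jaccard(tokens, cluster["tokens"])
        return self.alpha * structural + (1 - self.alpha) * lexical

    def add(self, text):
        """Assign one new document to a cluster and return the cluster index."""
        words = text.lower().split()
        tokens, edges = set(words), doc_edges(words)
        self.collection_graph.update(edges)  # update the collection graph in place

        best, best_sim = None, 0.0
        for idx, cluster in enumerate(self.clusters):
            sim = self.similarity(tokens, edges, cluster)
            if sim > best_sim:
                best, best_sim = idx, sim

        if best is not None and best_sim >= self.threshold:
            cluster = self.clusters[best]
            cluster["tokens"] |= tokens
            cluster["edges"] |= edges
            cluster["size"] += 1
            return best

        # No cluster is similar enough: open a new one.
        self.clusters.append({"tokens": tokens, "edges": edges, "size": 1})
        return len(self.clusters) - 1


clusterer = IncrementalClusterer(threshold=0.3)
first = clusterer.add("bóng đá việt nam thắng trận")
second = clusterer.add("đội bóng đá việt nam thắng")
third = clusterer.add("giá vàng hôm nay tăng mạnh")
print(first, second, third)
```

In this sketch each cluster keeps only merged token and edge sets, so earlier documents never need to be re-read, and the number of clusters grows as needed instead of being fixed in advance. Edges between adjacent syllables keep multi-syllable words together as graph structure, which is one plausible way a graph representation could depend less on an explicit word segmentation step.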
ACM Digital Library