Hierarchical information clustering by means of topologically embedded graphs

Song, Won-Min; Di Matteo, T.; Aste, Tomaso

doi:10.1371/journal.pone.0031929

arXiv:1110.4477 (physics)

[Submitted on 20 Oct 2011]

Authors:Won-Min Song, T. Di Matteo, Tomaso Aste

Abstract: We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss the performance, robustness and reliability of this method by first investigating several artificial data-sets, finding that it can significantly outperform other established approaches. We then show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data-sets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in the diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies.
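The abstract does not spell out how the filtered network is built. As a rough illustration of the general idea of keeping only the most significant links subject to a topological constraint, the sketch below greedily adds the strongest links that do not close a cycle. This is a maximum spanning tree (trivially planar), swapped in as the simplest such constraint; it is not the planar embedding the paper analyzes. The function name `filtered_tree` and the toy similarity matrix are hypothetical.

```python
from itertools import combinations


def filtered_tree(sim):
    """Greedy link filtering under a no-cycle constraint (assumed stand-in).

    sim is a symmetric similarity matrix (list of lists). Links are visited
    from strongest to weakest and kept only if they join two components.
    """
    n = len(sim)
    parent = list(range(n))  # union-find forest over the n nodes

    def find(i):
        # Follow parent pointers to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All unordered pairs, strongest similarity first.
    edges = sorted(
        combinations(range(n), 2),
        key=lambda e: sim[e[0]][e[1]],
        reverse=True,
    )

    kept = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the link does not close a cycle
            parent[ri] = rj
            kept.append((i, j))
    return kept


# Toy data (invented for illustration): nodes 0, 1, 2 are mutually similar,
# node 3 is only weakly related to the rest.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
links = filtered_tree(sim)
```

A planar variant would follow the same greedy loop but replace the cycle test with a planarity test on the graph after each tentative insertion, which allows up to 3(n - 2) links instead of n - 1.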

Comments:	33 Pages, 18 Figures, 5 Tables
Subjects:	Data Analysis, Statistics and Probability (physics.data-an); Data Structures and Algorithms (cs.DS); Biological Physics (physics.bio-ph); Quantitative Methods (q-bio.QM); Computational Finance (q-fin.CP)
Cite as:	arXiv:1110.4477 [physics.data-an]
	(or arXiv:1110.4477v1 [physics.data-an] for this version)
	https://doi.org/10.48550/arXiv.1110.4477
Journal reference:	PLoS ONE 7 (2012) e31929
Related DOI:	https://doi.org/10.1371/journal.pone.0031929

Submission history

From: Tomaso Aste
[v1] Thu, 20 Oct 2011 09:43:02 UTC (1,941 KB)
