Abstract
We introduce a new approach to Exceptional Model Mining. Our algorithm, called EMDM, is an iterative method that alternates between Exception Maximisation and Description Minimisation. As a result, it finds maximally exceptional models with minimal descriptions. Exceptional Model Mining was recently introduced by Leman et al. (2008) as a generalisation of Subgroup Discovery. Instead of considering a single target attribute, it allows for multiple ‘model’ attributes on which models are fitted. If the model for a subgroup is substantially different from the model for the complete database, it is regarded as an exceptional model. To measure exceptionality, we propose two information-theoretic measures. One is based on the Kullback–Leibler divergence, the other on Krimp. We show how compression can be used for exception maximisation with these measures, and how classification can be used for description minimisation. Experiments show that our approach efficiently identifies subgroups that are both exceptional and interesting.
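To make the idea of a divergence-based exceptionality score concrete, the following is a minimal sketch, not the measure defined in the paper. It assumes categorical model attributes, estimates a joint distribution for the subgroup and for the complete database with additive smoothing, and scores the subgroup by the Kullback–Leibler divergence between the two. The function names (`distribution`, `kl_divergence`, `exceptionality`), the smoothing constant, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def distribution(rows, support, smoothing=1.0):
    """Estimate a smoothed distribution over `support` from a list of tuples.

    Additive smoothing (an assumption of this sketch) keeps every
    probability strictly positive, so the divergence below is finite.
    """
    counts = Counter(rows)
    total = len(rows) + smoothing * len(support)
    return {v: (counts.get(v, 0) + smoothing) / total for v in support}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    `p` and `q` are dicts over the same keys; `q` must be positive
    wherever `p` is positive.
    """
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items() if pv > 0)


def exceptionality(subgroup_rows, all_rows):
    """Score a subgroup by how far its model-attribute distribution
    lies from that of the complete database (illustrative only)."""
    support = set(all_rows)
    p = distribution(subgroup_rows, support)
    q = distribution(all_rows, support)
    return kl_divergence(p, q)


# Toy data: two binary model attributes, uniformly distributed overall.
all_rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
# A hypothetical subgroup in which both attributes are always 1.
subgroup = [(1, 1)] * 5

score_same = exceptionality(all_rows, all_rows)
score_skewed = exceptionality(subgroup, all_rows)
```

A subgroup that is distributed exactly like the whole database scores zero, while a skewed subgroup scores higher. This covers only the scoring step; the alternation between exception maximisation and description minimisation described in the abstract is not sketched here.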
References
Andritsos P, Tsaparas P, Miller RJ, Sevcik KC (2004) LIMBO: scalable clustering of categorical data. In: Proceedings of the EDBT, pp 124–146
Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
Cohen WW (1995) Fast effective rule induction. In: Proceedings of the ICML’95, pp 115–123
Garriga GC, Heikinheimo H, Seppänen JK (2007) Cross-mining binary and numerical attributes. In: Proceedings of the ICDM’07, pp 481–486
Heikinheimo H, Fortelius M, Eronen J, Mannila H (2007) Biogeography of European land mammals shows environmentally distinct and spatially coherent clusters. J Biogeogr 34(6): 1053–1064
Klösgen W (2002) Subgroup discovery chapter 16.3. Oxford University Press, Oxford
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1): 79–86
van Leeuwen M, Vreeken J, Siebes A (2006) Compression picks the item sets that matter. In: Proceedings of the ECML PKDD’06, pp 585–592
van Leeuwen M, Bonchi F, Sigurbjörnsson B, Siebes A (2009) Compressing tags to find interesting media groups. In: Proceedings of the CIKM’09, pp 1147–1156
Leman D, Feelders A, Knobbe A (2008) Exceptional model mining. In: Proceedings of the ECML/PKDD’08, 2:1–16
Mitchell-Jones AJ, Amori G, Bogdanowicz W, Krystufek B, Reijnders PJH, Spitzenberger F, Stubbe M, Thissen JBM, Vohralik V, Zima J (1999) The atlas of European mammals. Academic Press, London
Rissanen J (1978) Modeling by shortest data description. Automatica 14(5): 465–471
Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the SDM’06, pp 393–404
Slonim N, Tishby N (1999) Agglomerative information bottleneck. In: Proceedings of the NIPS’99, pp 617–623
Tsoumakas G, Vilcek J, Spyromitros L (2010) MULAN: a java library for multi-label learning. http://mulan.sourceforge.net/
Umek L, Zupan B, Toplak M, Morin A, Chauchat J-H, Makovec G, Smrke D (2009) Subgroup discovery in data sets with multi-dimensional responses: A method and a case study in traumatology. In: Proceedings of AIME’09, pp 265–274
Warner HR, Toronto AF, Veasey LR, Stephenson R (1961) A mathematical model for medical diagnosis, application to congenital heart disease. J Am Med Assoc 177: 177–184
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Additional information
Responsible editors: José L Balcázar, Francesco Bonchi, Aristides Gionis, Michèle Sebag.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
van Leeuwen, M. Maximal exceptions with minimal descriptions. Data Min Knowl Disc 21, 259–276 (2010). https://doi.org/10.1007/s10618-010-0187-5
DOI: https://doi.org/10.1007/s10618-010-0187-5