Mining and using sets of patterns through compression

M Van Leeuwen, J Vreeken - Frequent Pattern Mining, 2014 - Springer
Abstract
In this chapter we describe how to successfully apply the MDL principle to pattern mining. In particular, we discuss how pattern-based models can be designed and induced by means of compression, resulting in succinct and characteristic descriptions of the data.
As motivation, we argue that traditional pattern mining asks the wrong question: instead of asking for all patterns satisfying some interestingness measure, one should ask for a small, non-redundant, and interesting set of patterns, which allows us to avoid the pattern explosion. Firmly rooted in algorithmic information theory, the approach we discuss in this chapter states that the best set of patterns is the set that compresses the data best. We formalize this problem using the Minimum Description Length (MDL) principle, describe useful model classes, and briefly discuss algorithmic approaches to inducing good models from data. Last but not least, we describe how the obtained models, in addition to showing the key patterns of the data, can be used for a wide range of data mining tasks, thereby showing that MDL selects useful patterns.
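To make the "best set of patterns compresses the data best" idea concrete, here is a minimal sketch of a two-part MDL score, L(M) + L(D | M), for itemset patterns over a small transaction database. The abstract does not specify an encoding, so everything here is an assumption: the greedy cover order, the usage-based code lengths, and the names `cover` and `total_length` are hypothetical, loosely in the style of code-table methods, and not the chapter's own definitions.

```python
from collections import Counter
from math import log2


def cover(transaction, patterns):
    """Greedily cover a transaction with non-overlapping patterns, in list order."""
    remaining = set(transaction)
    used = []
    for p in patterns:
        if p <= remaining:
            used.append(p)
            remaining -= p
    # Singletons are always in the list, so nothing is left uncovered.
    return used


def total_length(db, candidate_patterns):
    """Return L(M) + L(D | M) in bits for a candidate pattern set (assumed encoding)."""
    items = sorted({i for t in db for i in t})
    singletons = [frozenset([i]) for i in items]
    # Assumed cover order: longer patterns first, singletons last.
    ordered = sorted(
        (frozenset(p) for p in candidate_patterns),
        key=lambda p: (-len(p), sorted(p)),
    )
    patterns = ordered + singletons

    # Count how often each pattern is used when covering the data.
    usage = Counter()
    for t in db:
        usage.update(cover(t, patterns))
    total_usage = sum(usage.values())

    # Frequently used patterns get short codes (Shannon-optimal lengths).
    code_len = {p: -log2(u / total_usage) for p, u in usage.items()}

    # L(D | M): every use of a pattern costs its code length.
    data_bits = sum(u * code_len[p] for p, u in usage.items())

    # L(M): each used pattern costs its own code plus its items,
    # spelled out with codes based on plain item frequencies.
    item_freq = Counter(i for t in db for i in t)
    n_items = sum(item_freq.values())
    item_len = {i: -log2(c / n_items) for i, c in item_freq.items()}
    model_bits = sum(code_len[p] + sum(item_len[i] for i in p) for p in usage)

    return model_bits + data_bits


db = [{"a", "b", "c"}] * 8 + [{"d"}] * 2
baseline = total_length(db, [])                   # singletons only
with_abc = total_length(db, [{"a", "b", "c"}])    # add one pattern
```

On this toy database, adding the pattern {a, b, c} shortens the total description, so under this assumed encoding it would be preferred over the singleton-only model. A pattern that never occurs in the data leaves the score unchanged, because unused patterns are not charged for.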