Using Background Knowledge to Rank Itemsets

Tatti, Nikolaj; Mampaey, Michael

doi:10.1007/s10618-010-0188-4

arXiv:1902.03102 (cs)

[Submitted on 8 Feb 2019]

Abstract: Assessing the quality of discovered results is an important open problem in data mining. Such assessment is particularly vital when mining itemsets, since many of the discovered patterns can be easily explained by background knowledge. The simplest approach to screening out uninteresting patterns is to compare the observed frequency against the independence model. Since the parameters of the independence model are the column margins, we can view such screening as a way of using the column margins as background knowledge.
In this paper we study more flexible techniques for infusing background knowledge. Namely, we show that we can efficiently use additional knowledge such as row margins, lazarus counts, and bounds of ones. We demonstrate that these statistics describe forms of data that occur in practice and have been studied in data mining.
To infuse the information efficiently, we use a maximum entropy approach. In its general setting, solving a maximum entropy model is infeasible, but we demonstrate that for our setting it can be solved in polynomial time. Experiments show that more sophisticated models fit the data better and that using more information improves the frequency prediction of itemsets.
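
The baseline screening described in the abstract, comparing an itemset's observed frequency against the independence model, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the representation of transactions as Python sets of item indices, and the use of the absolute difference as a ranking score are all assumptions made for illustration; the paper's richer models (row margins, lazarus counts, bounds of ones) are not covered here.

```python
from itertools import combinations
from math import prod


def column_margins(rows, n_items):
    """Fraction of transactions containing each item (the column margins)."""
    n = len(rows)
    return [sum(1 for r in rows if i in r) / n for i in range(n_items)]


def observed_frequency(rows, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for r in rows if itemset <= r) / len(rows)


def independence_frequency(margins, itemset):
    """Frequency predicted when items are assumed to occur independently."""
    return prod(margins[i] for i in itemset)


def rank_itemsets(rows, n_items, max_size=2):
    """Rank itemsets of size 2..max_size by |observed - expected|.

    The absolute difference is a hypothetical score chosen for simplicity.
    Returns (score, items, observed, expected) tuples, largest score first.
    """
    margins = column_margins(rows, n_items)
    scored = []
    for k in range(2, max_size + 1):
        for items in combinations(range(n_items), k):
            itemset = frozenset(items)
            obs = observed_frequency(rows, itemset)
            exp = independence_frequency(margins, itemset)
            scored.append((abs(obs - exp), items, obs, exp))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy data: items 0 and 1 always co-occur, item 2 is unrelated to them.
    toy_rows = [{0, 1, 2}, {0, 1}, {2}, set()]
    for score, items, obs, exp in rank_itemsets(toy_rows, n_items=3):
        print(items, "observed:", obs, "expected:", exp, "score:", score)
```

In the toy data, the pair of items 0 and 1 deviates from what the column margins predict, while pairs involving item 2 do not. Itemsets whose frequency is already explained by the margins receive a low score and can be screened out.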

Subjects:	Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Cite as:	arXiv:1902.03102 [cs.LG]
	(or arXiv:1902.03102v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1902.03102
Related DOI:	https://doi.org/10.1007/s10618-010-0188-4

Submission history

From: Nikolaj Tatti
[v1] Fri, 8 Feb 2019 14:36:13 UTC (29 KB)
