Abstract
A serious problem encountered during the mining of association rules is the exponential growth of their cardinality. Unfortunately, the known algorithms for mining association rules typically generate large numbers of redundant and duplicate rules. Thus, we not only waste CPU time but also encounter difficulties in saving, managing and using the results of these algorithms. The present paper focuses on the discovery of association rules whose left-hand and right-hand sides are contained in two user-supplied constraint itemsets, respectively (maximum single constraints). Whenever the constraints appear or change, we quickly extract the corresponding frequent sublattice from a lattice of closed itemsets (together with their generators, which are typically few and small, and their supports) that has been mined and saved once. Using an equivalence relation based on the closures of the left-hand side and of the union of the two rule sides, the association rule set with maximum single constraints is partitioned into disjoint equivalence classes. Without loss of generality, each class can therefore be mined independently. This helps avoid the wasteful generation of numerous candidates, reduces the burden of storing the support and confidence of rules in the same class and establishes a foundation for mining algorithms in parallel and distributed environments. In each class, the rules are represented uniquely and explicitly via the corresponding closed itemsets and generators. Due to the low cardinality and size of the generators, mining based on these representations, which does not generate duplicates, is very efficient. All these theoretical results are proven mathematically and used to construct the \(MAR\_MaxSC\) algorithm. The efficiency of \(MAR\_MaxSC\) compared with post-processing methods for mining association rules with maximum single constraints is then verified on several characteristic databases.
1 Introduction
Mining association rules is useful for applications that examine how often two or more items of interest co-occur. For example, in market basket analysis, we might find that milk, bread and eggs occur together in 80% of all transactions. This customer behavior gives us clues regarding the placement of milk, bread and eggs in the store. Further, we can discover rules such as “90% of the customers who buy milk and bread also buy eggs”. Such a rule can inform marketing strategies, for example, promoting milk and bread to increase the sales of eggs. Formally, the problem of mining association rules [2] is stated as follows: given a database \({\mathcal T}=({\mathcal O},{\mathcal A},{\mathcal R})\) with \(\mathrm{m}=|{\mathcal A}|\) items (where \(\left| {\mathcal A} \right| \) is the cardinality of \({\mathcal A}\)) and \(\mathrm{n}=|{\mathcal O}|\) objects, and minimum support and confidence thresholds \(\mathrm{s}_{0} ,\mathrm{c}_{0} ~\in ~(0;~1]\), the task is to mine the association rules satisfying \(\mathrm{s}_{0} \) and \(\mathrm{c}_{0} \). This problem can be solved in two steps: (1) extracting the frequent itemsets with respect to \(\mathrm{s}_{0} \) and (2) generating association rules from these itemsets with respect to \(\mathrm{c}_{0} \).
The method used to solve the second step is simple. For each frequent itemset Z, we first enumerate all the nonempty, proper subsets X of Z: \(\emptyset \subset \mathrm{X}\subset \mathrm{Z}\). Then, we form the rules \(\mathrm{X}\rightarrow \mathrm{Z}\backslash \mathrm{X}\), compute their confidences and retain those satisfying \(\mathrm{c}_{0} \). Hence, most researchers concentrate on mining frequent itemsets (step 1). The Apriori method proposed in [1] and a similar, independently developed approach [27] were the first algorithms proposed for mining frequent itemsets. Apriori and its variants (Apriori-Hybrid [1], DHP [30]) perform reliably on sparse databases with simple itemsets, such as market basket databases; however, on complex databases, such as those consisting of bio-sequences or telecommunication network data, they typically generate numerous candidates or require many database passes. More recently, algorithms based on frequent pattern trees (FP-trees) have been developed [15, 20], wherein the original database is compressed into an FP-tree or a similar tree structure. Once the tree is built, divide-and-conquer and depth-first search are used to mine all the frequent itemsets from the frequent 1-itemsets without further database passes. However, in interactive or incremental mining systems, where users often change the minimum support or insert new transactions into the original database, FP-tree-inspired structures are unsuitable because the trees must be rebuilt. All these algorithms work with horizontally formatted databases. In contrast, the Eclat algorithm proposed in [39] uses a vertical data format and computes supports by intersecting transaction identifier sets (tidsets). A modification of Eclat based on “diffsets”, called dEclat [40], is often applied to frequent itemset mining tasks. For an experimental comparison of several frequent itemset mining algorithms, see [18].
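The second step can be illustrated by a short Python sketch. It is a brute-force illustration under stated assumptions, not an algorithm from the cited works: the transaction list and the function names `support` and `rules_from_itemset` are hypothetical, and supports are recomputed by scanning the toy data instead of being reused from step 1. Exact fractions avoid rounding problems when confidences are compared with \(\mathrm{c}_{0}\).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy transactions (not taken from the paper).
TRANSACTIONS = [
    {"milk", "bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of the transactions that contain every item of `itemset`."""
    hits = sum(1 for t in TRANSACTIONS if itemset <= t)
    return Fraction(hits, len(TRANSACTIONS))

def rules_from_itemset(Z, c0):
    """Return (X, Z \\ X, confidence) for every nonempty proper subset X of Z
    whose rule X -> Z \\ X has confidence at least c0."""
    Z = frozenset(Z)
    rules = []
    for k in range(1, len(Z)):  # subset sizes 1..|Z|-1: nonempty and proper
        for X in map(frozenset, combinations(sorted(Z), k)):
            conf = support(Z) / support(X)
            if conf >= c0:
                rules.append((X, Z - X, conf))
    return rules

for X, Y, conf in rules_from_itemset({"milk", "bread", "eggs"}, Fraction(3, 5)):
    print(sorted(X), "->", sorted(Y), conf)
```

With \(\mathrm{c}_{0}=3/5\), only the three rules with two-item antecedents survive (each has confidence 2/3); the three rules with single-item antecedents have confidence 1/2.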
The search space of frequent itemsets is often vast and grows exponentially with the number of items. In addition, the generation of frequent itemsets produces significant duplication. In particular, a low minimum support threshold can generate a huge number of frequent itemsets: a frequent itemset with m items has \(2^{\mathrm{m}}-1\) nonempty subsets, all of which are frequent. Hence, mining databases that contain many long frequent itemsets is infeasible because of the associated computational and storage requirements. An alternative approach is to use condensed representations of frequent itemsets. These representations both reduce CPU and memory requirements and enable the efficient management and storage of the results. Two types of condensed representations are maximal itemsets and closed itemsets (together with their generators). The GenMax algorithm described in [19] mines maximal itemsets using a tidset intersection approach. An Apriori-based alternative called MaxMiner [10] uses highly effective itemset pruning based on a support lower bound. Other examples can be found in [11, 17]. Maximal itemsets have low cardinality and can be used to reproduce all the frequent itemsets; unfortunately, they only yield lower bounds on the supports of those itemsets. In addition, generating the frequent itemsets from the maximal itemsets may produce an intractable number of duplicates. Thus, maximal itemsets are unsuitable for deriving association rules, and a lossless representation is needed. Indeed, the mining of closed itemsets (which is based on the lattice-theoretic framework of formal concept analysis [16, 23, 38]) has received great attention for two reasons. First, the number of closed frequent itemsets, although at least as large as the number of maximal frequent itemsets (every maximal frequent itemset is closed), is typically orders of magnitude smaller than the total number of frequent itemsets. Thus, their discovery helps purge redundant itemsets.
Second, the set of all closed frequent itemsets is a condensed, lossless representation: from it, we can determine whether any itemset X is frequent and obtain the exact support of X. In other words, we can generate all the frequent itemsets from the closed frequent itemsets. This generation is very effective if we also use the generators of the closed frequent itemsets. Charm_L [42], MinimalGenerator [41], Touch [34] and GenClose [8] are some typical algorithms for mining them.
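The difference between maximal and closed itemsets can be checked on a small example. The following Python sketch is only illustrative: the six-transaction database is hypothetical (it is not the database used later in the paper), all helper names are hypothetical, and itemsets are enumerated by brute force, which practical algorithms such as GenMax or Charm_L are designed to avoid. It lists the frequent itemsets for an absolute minimum support of 3, extracts the closed and maximal ones, and then recovers the exact support of every frequent itemset from the closed itemsets alone, as the largest support among its closed supersets (this maximum is attained by its closure).

```python
from itertools import combinations

# Hypothetical toy database: six transactions over the items a, c, d, t, w.
DB = [frozenset(t) for t in ("actw", "cdw", "actw", "acdw", "acdtw", "cdt")]

def count(X):
    """Absolute support: number of transactions containing X."""
    return sum(1 for t in DB if X <= t)

def frequent_itemsets(min_count):
    """All nonempty itemsets with count >= min_count, mapped to their counts (brute force)."""
    items = sorted(frozenset().union(*DB))
    fs = {}
    for k in range(1, len(items) + 1):
        for c in map(frozenset, combinations(items, k)):
            n = count(c)
            if n >= min_count:
                fs[c] = n
    return fs

def closed_and_maximal(fs):
    # Closed: no proper superset with the same count (such a superset is frequent too).
    closed = {X: n for X, n in fs.items() if not any(X < Y and fs[Y] == n for Y in fs)}
    # Maximal: no proper superset is frequent.
    maximal = {X for X in fs if not any(X < Y for Y in fs)}
    return closed, maximal

def support_from_closed(X, closed):
    """Exact count of X from the closed itemsets alone: the largest count among its
    closed supersets (attained by its closure); None means X is not frequent."""
    counts = [n for C, n in closed.items() if frozenset(X) <= C]
    return max(counts) if counts else None

fs = frequent_itemsets(3)
closed, maximal = closed_and_maximal(fs)
print(len(fs), len(closed), len(maximal))  # 19 7 2
print(support_from_closed("aw", closed))   # 4
```

On this toy database, 19 frequent itemsets reduce to 7 closed and 2 maximal ones; the closed itemsets reproduce every support exactly, whereas the two maximal itemsets alone only indicate which itemsets are frequent.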
Constraint-based association rule mining A serious problem encountered during the mining of frequent itemsets and association rules is that, in the worst case \(\left( {0<\mathrm{s}_{0} =\mathrm{c}_{0} ~\le 1/\mathrm{n}} \right) \), the cardinalities of the frequent itemset class \({\mathcal F}{\mathcal S}(\mathrm{s}_{0} )\) and the association rule set \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) can become unwieldy (e.g., \(\mathrm{Max}(\# {\mathcal F}{\mathcal S}(\mathrm{s}_{0} ))=2^{\mathrm{m}}-1={\mathcal O}(2^{\mathrm{m}})\) and \(\mathrm{Max}(\# {\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} ))=3^{\mathrm{m}}-2^{\mathrm{m}+1}+1={\mathcal O}(3^{\mathrm{m}})\)). In addition, their generation typically produces an intractable number of duplicates (among both candidates and solutions) that must then be eliminated. Thus, we not only waste computational and storage resources but also find it difficult to save, manage and use the results. Hence, for practical purposes, it is preferable to mine a manageable number of association rules subject to user constraints.
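The two worst-case counts can be checked by brute force. In the worst case every nonempty itemset is frequent, and a rule \(\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\) corresponds to assigning each of the m items to the left side, the right side or neither, with both sides nonempty; by inclusion-exclusion, this gives \(3^{\mathrm{m}}-2\cdot 2^{\mathrm{m}}+1\) rules. The Python sketch below (the function name is hypothetical) enumerates these assignments for small m and compares the results with the closed forms.

```python
from itertools import product

def worst_case_counts(m):
    """Count nonempty itemsets and rules L -> R (L, R nonempty, disjoint) over m items
    by brute force, i.e. assuming every itemset is frequent and every rule qualifies."""
    itemsets = sum(1 for bits in product((0, 1), repeat=m) if any(bits))
    # Each item goes to the left side (0), the right side (1) or neither (2).
    rules = sum(1 for a in product((0, 1, 2), repeat=m) if 0 in a and 1 in a)
    return itemsets, rules

for m in range(1, 9):
    itemsets, rules = worst_case_counts(m)
    assert itemsets == 2 ** m - 1
    assert rules == 3 ** m - 2 ** (m + 1) + 1
print(worst_case_counts(4))  # (15, 50)
```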
One common approach is to filter the generated rule set using constraints on ‘interestingness’ measures until its size becomes manageable [12, 24, 35]. An alternative approach is to use inference methods to prune association rules that can be derived from other rules [14, 25]. Zaki [41] proposes an algorithm for mining the “most general” rules, with minimal antecedents and consequents (in terms of the subset relation), in each collection of rules with identical support and confidence. A method for listing all the remaining rules is given in [37]. Pasquier et al. [32] and Tin et al. [36] concentrate on methods to discover rules with minimal antecedents and maximal consequents in rule classes with identical closures.
Several recent studies have focused on the discovery of frequent itemsets and association rules based on constraints (the reader is referred to [29] for details). For instance, [28] added monotone, anti-monotone and other constraints to the mining process. The problem of integrating Boolean constraints, which refer to the presence or absence of items in rules, was considered by [35]. In contrast, [10] proposed mining with a minimum “improvement” threshold and also considered association rules with constraints on their right-hand sides. The concept of tree boundaries has been proposed to reduce the running times of the aforementioned mining methods. In addition, algorithms for mining multidimensional association rules are given in [26].
1.1 Problem statement
We have recently concentrated on frequent itemset and association rule mining with frequently modified constraints, which involve support and confidence thresholds in addition to items. For example, online users who know the frequent keyword sets contained in a class of keyword sets on a given subject might be interested in association rules between two given subjects. In [4, 5], we solved the problem of finding frequent itemsets that are contained in a given constraint set \({\mathcal C}\) or contain at least one of its items. Hai et al. [21] applied double constraints to the problem. The discovery of association rules with various constraint types (the union of the two sides is contained in a constraint; the intersection of a rule side with a constraint is nonempty; the left-hand and right-hand sides contain two given constraints, respectively) is considered in [6, 9, 22].
The present paper focuses on the mining of association rules based on maximum single constraints on both rule sides, which is stated as follows. Given four thresholds, minimum support \(\mathrm{s}_{0} \), maximum support \(\mathrm{s}_{1} \), minimum confidence \(\mathrm{c}_{0} \) and maximum confidence \(\mathrm{c}_{1} \), such that \(0<\mathrm{s}_{0}\le \mathrm{s}_{1} \le 1,~0<\mathrm{c}_{0}\le \mathrm{c}_{1} \le 1\), and two nonempty constraint itemsets corresponding to the two rule sides, \(\varnothing \subset \mathrm{L}_{1}, \mathrm{R}_{1} \subseteq {\mathcal A}\), the task is to discover the association rules \(\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\) whose support and confidence lie in the intervals \([\mathrm{s}_{0} , \mathrm{s}_{1}]\) and \([\mathrm{c}_{0} , \mathrm{c}_{1}]\), respectively, and whose left-hand and right-hand sides are contained in \(\mathrm{L}_{1} \) and \(\mathrm{R}_{1} \), respectively. More formally, we need to determine the set
$$\begin{aligned} {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq \mathrm{L}_{1} ,\subseteq \mathrm{R}_{1}} (\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\equiv \{\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\in {\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\,|\,\mathrm{supp}(\mathrm{r})\le \mathrm{s}_{1} ,\mathrm{conf}(\mathrm{r})\le \mathrm{c}_{1} ,\mathrm{L}^{\prime }\subseteq \mathrm{L}_{1} ,\mathrm{R}^{\prime }\subseteq \mathrm{R}_{1} \}, \end{aligned}$$
where \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )=\{\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\,|~\varnothing \ne \mathrm{L}^{\prime },\mathrm{R}^{\prime }\subseteq {\mathcal A},\mathrm{L}^{\prime }\cap \mathrm{R}^{\prime }=\varnothing ,\mathrm{S}^{\prime }\equiv \mathrm{L}^{\prime }+\mathrm{R}^{\prime },\mathrm{s}_{0} \le \mathrm{supp}(\mathrm{r}),\mathrm{c}_{0}\le \mathrm{conf}(\mathrm{r})\}\) contains the association rules r in the standard sense (the support and confidence of r are written as supp(r) and conf(r)). For \(\mathrm{s}_{1} =\mathrm{c}_{1} =1\) and \(\mathrm{L}_{1} =\mathrm{R}_{1} ={\mathcal A}\), we recover the traditional mining problem. For smaller values of \(\mathrm{s}_{1} \) and larger values of \(\mathrm{c}_{0} \), we obtain robust rules from unusual itemsets, which are valuable in special cases.
1.2 Related work and approach
The traditional approach to generating association rules with constraints solves the problem in two phases: (1) discovering frequent itemsets with constraints and (2) generating association rules with constraints from them. Srikant et al. [35] proposed a three-phase algorithm to mine association rules with item constraints. Apriori-based generation creates candidates containing the given constraint items. A database pass then computes the supports of all the subsets of the frequent itemsets with constraints. These frequent itemsets, together with their subsets and supports, are used to enumerate all the constrained association rules. However, like the other variants of the Apriori algorithm, it produces a large number of frequent itemset candidates as well as duplicates (D1, D2). Han et al. [20] introduced the idea of integrating constraints into the initialization of FP-trees. Pei et al. [33] proposed the concept of convertible constraints and combined them with FPGrowth for mining constrained frequent itemsets. Unfortunately, if the constraints change, these algorithms must be re-executed. Hence, they are unsuitable for user-interactive systems.
Post-processing approach In this approach, the constrained rule set \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq \mathrm{L}_{1} ,~\subseteq \mathrm{R}_{1}} (\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\) is discovered in two phases: (1) determining the set of association rules without constraints, \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\), and (2) checking each of its rules \(\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\) and retaining those that satisfy the constraints, i.e., \(\mathrm{supp}(\mathrm{r})\le \mathrm{s}_{1} ,\mathrm{conf}(\mathrm{r})\le \mathrm{c}_{1} \) and \(\mathrm{L}^{\prime }\subseteq \mathrm{L}_{1} ,\mathrm{R}^{\prime } \subseteq \mathrm{R}_{1} \).
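A minimal Python sketch of the post-processing approach follows, assuming a hypothetical six-transaction database and hypothetical function names; phase (1) is performed here by exhaustive enumeration rather than by Apriori, dEclat or FPGrowth. The thresholds and constraint itemsets in the usage line are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: six transactions over the items a, c, d, t, w.
DB = [frozenset(t) for t in ("actw", "cdw", "actw", "acdw", "acdtw", "cdt")]
ITEMS = sorted(frozenset().union(*DB))

def supp(X):
    return Fraction(sum(1 for t in DB if X <= t), len(DB))

def all_rules(s0, c0):
    """Phase 1: every rule L' -> R' of ARS(s0, c0), with its support and confidence."""
    for k in range(2, len(ITEMS) + 1):
        for S in map(frozenset, combinations(ITEMS, k)):
            s = supp(S)
            if s < s0:
                continue
            for j in range(1, k):
                for L in map(frozenset, combinations(sorted(S), j)):
                    c = s / supp(L)
                    if c >= c0:
                        yield L, S - L, s, c

def post_process(s0, s1, c0, c1, L1, R1):
    """Phase 2: keep the rules that also satisfy supp <= s1, conf <= c1, L' ⊆ L1, R' ⊆ R1."""
    return [(L, R) for L, R, s, c in all_rules(s0, c0)
            if s <= s1 and c <= c1 and L <= L1 and R <= R1]

rules = post_process(Fraction(1, 2), Fraction(5, 6), Fraction(1, 3), Fraction(9, 10),
                     frozenset("ac"), frozenset("tw"))
print(len(rules))  # 7
```

On this toy database, phase (1) yields 60 rules for \(\mathrm{s}_{0}=1/2\) and \(\mathrm{c}_{0}=1/3\), of which only 7 survive phase (2); this is the waste discussed below.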
As discussed in the Introduction, we can obtain \({\mathcal A}{\mathcal R}{\mathcal S} (\mathrm{s}_{0} ,\mathrm{c}_{0} )\) by (1) finding the frequent itemset class \({\mathcal F}{\mathcal S}(\mathrm{s}_{0} )\) using algorithms such as Apriori, dEclat or FPGrowth, and then (2) for each \(\mathrm{S}^{\prime }\in {\mathcal F}{\mathcal S}(\mathrm{s}_{0} )\), listing all the rules \(\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime } \in {\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) with \(\varnothing \ne \mathrm{L}^{\prime } \subset \mathrm{S}^{\prime },\mathrm{R}^{\prime } \equiv \mathrm{S}^{\prime }\backslash \mathrm{L}^{\prime }\) (using the algorithms proposed in [3] or [31]). However, this method suffers from the same difficulties mentioned above.
A more efficient method for mining \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) is based on a lattice \({\mathcal L}{\mathcal C}\) of frequent closed itemsets [7, 32, 36, 37, 41]. Rather than extracting all the frequent itemsets, we only extract the closed itemsets and determine the resulting lattice structure. Based on this lattice, the set \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) is split into disjoint equivalence classes of rules whose left-hand sides have identical closures and whose two-side unions have identical closures. The rules in each class have identical support and confidence, which are therefore computed only once. Using frequent closed itemset generators (lattice \({\mathcal L}{\mathcal C}{\mathcal G}\)), [32] proposed algorithms for mining rule classes. However, these algorithms generate redundancies and duplicates. In [7, 36, 37], we completely pruned the generation of duplicates using unique rule representations (based on efficient set techniques) in each class. We also discovered rules in which the union of the two sides of each rule is contained in a given constraint (see [6]). This approach is very efficient because (1) we compute \({\mathcal L}{\mathcal C}{\mathcal G}\) once (using well-known algorithms such as CHARM_L and MinimalGenerators), and (2) it is suitable for frequently modified support and confidence thresholds.
Because the cardinality of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq \mathrm{L}_{1}, \subseteq \mathrm{R}_{1} } (\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\) is typically small compared with that of \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\), these post-processing algorithms consume significant computing resources both to discover the rules of \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) and to filter out (using set operations) those in \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq \mathrm{L}_{1} ,~\subseteq \mathrm{R}_{1} } (\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\). In special cases, this solution set can even be empty. In addition, if online users modify the support and confidence constraints, \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) must be recomputed, which slows down mining. Because the size and cardinality of \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{c}_{0} )\) with \(\mathrm{s}_{0} =\mathrm{c}_{0} =1/\mathrm{n}\) are prohibitive, this set becomes too complicated to mine and maintain.
1.3 Our approach
We use the lattice \({\mathcal L}{\mathcal C}{\mathcal G}\) of closed itemsets and generators because the number of closed itemsets is typically orders of magnitude smaller than the number of all frequent itemsets \({\mathcal F}{\mathcal S}(\mathrm{s}_{0} )\) (approximately 100 times smaller, as shown in [42]), and the ratio of the number of generators to the number of closed itemsets is approximately 1:2 (see [34]); thus, we only need to mine and save the lattice once. The frequent sublattice \({\mathcal F}{\mathcal L}{\mathcal C}{\mathcal G}\), consisting of the frequent closed itemsets that satisfy the specified constraints, can be quickly extracted from \({\mathcal L}{\mathcal C}{\mathcal G}\) whenever constraints emerge or change. To considerably decrease the number of duplicated candidates, we partition the association rule set into disjoint classes. Using an equivalence relation on the closures of the two rule sides, \((\mathrm{L}\equiv \mathrm{h}(\mathrm{L}^{\prime })~\subseteq \mathrm{S}\equiv \mathrm{h}(\mathrm{L}^{\prime }+\mathrm{R}^{\prime }))\), the constrained association rule set \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq \mathrm{L}_{1} ,~\subseteq \mathrm{R}_{1} } (\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\) is partitioned into disjoint equivalence rule classes \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^+ (L,S)\), one for each (L, S) in \({\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1}} \left( {s_{0} ,s_{1} ,c_{0} ,c_{1} } \right) \), which contains all the closed itemset pairs satisfying the constraints. This prunes most of the duplicates produced during the generation of candidate rules and reduces the storage required for the support and confidence of rules in the same class. In addition, it lays a foundation for designing efficient algorithms in parallel and distributed environments.
The rules in each rule class, \({\mathcal A}{\mathcal R}_{\subseteq L_{1} , \subseteq R_{1} }^+ (L, S)\), are uniquely represented via the two closed itemsets L and S and their corresponding sets of generators \({\mathcal G}(\mathrm{L})\) and \({\mathcal G}(\mathrm{S})\). These representations help us understand the rule structure and prevent duplicates. We propose an algorithm called \(MAR\_MaxSC\), which mines the complete set of constrained rules in a negligible amount of time compared with post-processing methods.
1.4 Organization
The remainder of the paper is organized as follows. Section 2 covers the basic concepts of frequent itemset mining, association rule mining and closed itemset lattices. In Sect. 3, we first describe the partitioning of the constrained association rule set, followed by the structure and unique representation of the rules in each class via closed itemsets and their generators. Based on these theoretical results, we propose the efficient MAR_MaxSC algorithm, which mines the complete set of association rules satisfying the given constraints without duplicates. Section 4 compares the performance of the proposed algorithm with that of two post-processing algorithms. Finally, the conclusions of the study and future work are given in Sect. 5.
2 Preliminaries
Let \({\mathcal T}=({\mathcal O},{\mathcal A},{\mathcal R})\) be a binary database, where \({\mathcal O}\) is a nonempty set of objects (transactions), \({\mathcal A}\) is the set of attributes (items) appearing in the objects and \({\mathcal R}\) is a binary relation on \({\mathcal O}\times {\mathcal A}\). A subset \(\mathrm{A}\) of \({\mathcal A}\) is called an itemset. We consider the operator \({\uplambda }:\,2^{{\mathcal O}}{\rightarrow }2^{{\mathcal A}}\), from the class of all object sets to the class of all itemsets, and the operator \(\uprho :\,2^{{\mathcal A}}{\rightarrow }2^{{\mathcal O}}\), from the class of all itemsets to the class of all object sets, defined as follows:
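$$\begin{aligned} \uplambda (\mathrm{O})&\equiv \{\mathrm{a}\in {\mathcal A}\,|\,(\mathrm{o},\mathrm{a})\in {\mathcal R},\ \forall \mathrm{o}\in \mathrm{O}\},\\ \uprho (\mathrm{A})&\equiv \{\mathrm{o}\in {\mathcal O}\,|\,(\mathrm{o},\mathrm{a})\in {\mathcal R},\ \forall \mathrm{a}\in \mathrm{A}\} \end{aligned}$$
(by convention, \(\uplambda (\varnothing )={\mathcal A}\) and \(\uprho (\varnothing )={\mathcal O}\)). Itemset \(\uplambda (\mathrm{O})\) is the set of items common to all the objects in \(\mathrm{O}\), and \(\uprho (\mathrm{A})\) is the set of objects containing all the items of A. We define the closure operator h on \(2^{{\mathcal A}}\) as the composition of \(\uplambda \) and \(\uprho \): \(\mathrm{h}=\uplambda \circ \uprho \). Then, \(\mathrm{h}(\mathrm{A})=\uplambda (\uprho (\mathrm{A}))\) is called the closure of A. Itemset A is a closed itemset if and only if h(A) = A [31].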
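The operators \(\uplambda \), \(\uprho \) and h translate directly into code. In the Python sketch below, the object-to-items mapping is a hypothetical toy database, and `rho`, `lam`, `h` and `is_closed` are illustrative names; the empty-set conventions fall out of the definitions (\(\uprho (\varnothing )\) is every object and \(\uplambda (\varnothing )\) is every item).

```python
# Hypothetical toy database: object id -> set of items (not the database of Fig. 1a).
DB = {1: frozenset("actw"), 2: frozenset("cdw"), 3: frozenset("actw"),
      4: frozenset("acdw"), 5: frozenset("acdtw"), 6: frozenset("cdt")}
ITEMS = frozenset().union(*DB.values())

def rho(A):
    """rho(A): objects containing every item of A (rho of the empty set is all objects)."""
    return frozenset(o for o, items in DB.items() if A <= items)

def lam(O):
    """lambda(O): items shared by every object in O (lambda of the empty set is all items)."""
    common = ITEMS
    for o in O:
        common = common & DB[o]
    return common

def h(A):
    """Closure operator h = lambda o rho."""
    return lam(rho(frozenset(A)))

def is_closed(A):
    return h(A) == frozenset(A)

print(sorted(h("a")), is_closed("cw"), is_closed("w"))  # ['a', 'c', 'w'] True False
```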
The support of an itemset A is defined as the fraction of the objects that contain A, \(\mathrm{supp}(\mathrm{A})\,\equiv |\uprho (\mathrm{A})|/|{\mathcal O}|\). The minimum and maximum support thresholds are designated \(\mathrm{s}_{0} \) and \(\mathrm{s}_{1} \), respectively, with \(0\,<\,1/n\,\le \mathrm{s}_{0} \,\le \mathrm{s}_{1} \,\le \,1\) and \(\mathrm{n}=|{\mathcal O}|\). We only consider the non-trivial items in \({\mathcal A}\), \({\mathcal A}^{\mathrm{F}}\,\equiv \,\{\mathrm{a}\in {\mathcal A}:\mathrm{supp}(\{\mathrm{a}\})\ge \mathrm{s}_{0} \}\). The class of all closed itemsets is denoted by \({\mathcal C}{\mathcal S}\). As the normal subset containment relation “\(\supseteq \)” on the subsets of \({\mathcal A}\) induces an order \(\le \) on \({\mathcal C}{\mathcal S}\), \({\mathcal L}{\mathcal C}\,\equiv (\{(\mathrm{S},\mathrm{supp}(\mathrm{S}))\,|\,\mathrm{S}\in {\mathcal C}{\mathcal S}\},~\le )\) is the lattice of closed itemsets together with their supports, which is represented by a Hasse diagram. If the support of a nonempty itemset \(\mathrm{A}\) (with \(\mathrm{A}\subseteq {\mathcal A}^{\mathrm{F}}\)) is greater than or equal to \(\mathrm{s}_{0}\) and less than or equal to \(\mathrm{s}_{1} \), i.e., \(\mathrm{s}_{0} \,\le \mathrm{supp}(\mathrm{A})\le \mathrm{s}_{1} \), then \(\mathrm{A}\) is called a frequent itemset. By default, \(\mathrm{s}_{1}=1\). Let \({\mathcal F}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\equiv \{\mathrm{L}^{\prime }:\,\varnothing \ne \mathrm{L}^{\prime }\subseteq \,{\mathcal A},\,\mathrm{s}_{0} ~\le \mathrm{supp}(\mathrm{L}^{\prime })\,\le \mathrm{s}_{1} \}\) and \({\mathcal F}{\mathcal C}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )~\equiv {\mathcal F}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\cap {\mathcal C}{\mathcal S}\) be the classes of all the frequent itemsets and all the frequent closed itemsets, respectively.
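The following Python sketch computes supports, the non-trivial items \({\mathcal A}^{\mathrm{F}}\) and the class \({\mathcal F}{\mathcal C}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\) on a hypothetical six-transaction database, with illustrative function names. Closed itemsets are obtained by closing every nonempty subset of \({\mathcal A}^{\mathrm{F}}\), which suffices because every item of a frequent closed itemset has support at least \(\mathrm{s}_{0}\); this exhaustive approach is for exposition only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: six transactions over the items a, c, d, t, w.
DB = [frozenset(t) for t in ("actw", "cdw", "actw", "acdw", "acdtw", "cdt")]
ITEMS = frozenset().union(*DB)

def supp(A):
    """supp(A) = |rho(A)| / |O|."""
    return Fraction(sum(1 for t in DB if A <= t), len(DB))

def h(A):
    """Closure: items common to all transactions containing A (all items if none do)."""
    common = ITEMS
    for t in DB:
        if A <= t:
            common = common & t
    return common

def nontrivial_items(s0):
    """A^F: the items whose singleton support reaches s0."""
    return frozenset(a for a in ITEMS if supp(frozenset(a)) >= s0)

def fcs(s0, s1):
    """FCS(s0, s1): closed itemsets S with s0 <= supp(S) <= s1, mapped to their supports."""
    items = sorted(nontrivial_items(s0))
    closed = {h(frozenset(c)) for k in range(1, len(items) + 1)
              for c in combinations(items, k)}
    return {S: supp(S) for S in closed if s0 <= supp(S) <= s1}

for S, s in sorted(fcs(Fraction(1, 2), Fraction(5, 6)).items(), key=lambda p: -p[1]):
    print("".join(sorted(S)), s)
```

Here, for \(\mathrm{s}_{0}=1/2\) and \(\mathrm{s}_{1}=5/6\), the closed itemset {c}, which occurs in every transaction, is excluded by the maximum support threshold, leaving six frequent closed itemsets.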
For two nonempty itemsets \(\mathrm{G},\mathrm{A}:\,\varnothing \ne \mathrm{G}\subseteq \mathrm{A}\subseteq {\mathcal A}\), \(\mathrm{G}\) is called a generator [32] of A if and only if h(G) = h(A) and \((\mathrm{h}(\mathrm{G}^{\prime })~\subset \mathrm{h}(\mathrm{G}),\,\forall \mathrm{G^{\prime }}:\,\varnothing \ne \mathrm{G}^{\prime }\,\subset \mathrm{G})\). We denote by \({\mathcal G}(\mathrm{A})\) the class of all generators of A. Because \({\mathcal G}(\mathrm{A})\) is nonempty and finite [8], letting \(\mathrm{k}=|{\mathcal G}(\mathrm{A})|\), we can number its elements as follows: \({\mathcal G}(\mathrm{A})=\{\mathrm{G}_{1} ,\mathrm{G}_{2} ,\ldots ,\mathrm{G}_\mathrm{k} \}\). Let \({\mathcal L}{\mathcal C}{\mathcal G}\,\equiv (\{<S,supp(S),\,{\mathcal G}(\mathrm{S})>|\,S\,\in {\mathcal C}{\mathcal S}\},~\le )\) be the lattice of all the closed itemsets together with their generators and supports, and let \({\mathcal F}{\mathcal L}{\mathcal C}{\mathcal G}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\equiv \,(\{<S,\,supp(S),\,{\mathcal G}(\mathrm{S})>|\,S\,\in {\mathcal F}{\mathcal C}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\},~\le )\) be the sublattice of the frequent closed itemsets.
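Generators can be enumerated directly from this definition. The Python sketch below uses a hypothetical database and illustrative names; because h is monotone, every nonempty proper subset of G lies inside some G minus one item, so only those subsets need to be tested.

```python
from itertools import combinations

# Hypothetical toy database: six transactions over the items a, c, d, t, w.
DB = [frozenset(t) for t in ("actw", "cdw", "actw", "acdw", "acdtw", "cdt")]
ITEMS = frozenset().union(*DB)

def h(A):
    """Closure: items common to all transactions containing A (all items if none do)."""
    common = ITEMS
    for t in DB:
        if A <= t:
            common = common & t
    return common

def generators(A):
    """G(A): nonempty G ⊆ A with h(G) = h(A) whose nonempty proper subsets all have
    strictly smaller closures (only the subsets G minus one item are tested)."""
    A = frozenset(A)
    target = h(A)
    gens = []
    for k in range(1, len(A) + 1):
        for G in map(frozenset, combinations(sorted(A), k)):
            if h(G) != target:
                continue
            proper = [G - {x} for x in G] if len(G) > 1 else []
            if all(h(P) < target for P in proper):  # "<" is proper subset here
                gens.append(G)
    return gens

print([sorted(G) for G in generators("actw")])  # [['a', 't'], ['t', 'w']]
```

For the closed itemset {a, c, t, w} of the toy database, the sketch returns the two generators {a, t} and {t, w}, each with only two items.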
For a frequent itemset \(\mathrm{S}^{\prime }\in {\mathcal F}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\), we take a proper, nonempty subset L’ of S’ \((\varnothing \ne \mathrm{L}^{\prime }\subset \mathrm{S}^{\prime })\) and set \(\mathrm{R}^{\prime }\equiv \mathrm{S}^{\prime }\backslash \mathrm{L}^{\prime }\). The implication \(\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\) is called a rule created by \(\mathrm{L}^{\prime },\mathrm{R}^{\prime }\) (or \(\mathrm{L}^{\prime },\mathrm{S}^{\prime }\)). The support and confidence of r are defined as \(\mathrm{supp}(\mathrm{r})\equiv \mathrm{supp}(\mathrm{S^{\prime }})\) and \(\mathrm{conf}(\mathrm{r})\equiv \mathrm{supp}(\mathrm{S}^{\prime })/\mathrm{supp}(\mathrm{L}^{\prime })\), respectively. For the given minimum and maximum confidences \(\mathrm{c}_{0} ,\mathrm{c}_{1}\ (0<\mathrm{c}_{0} \le \mathrm{c}_{1} \le 1)\), we consider r to be an association rule iff \(\mathrm{s}_{0} \le \mathrm{supp}(\mathrm{r})\le \mathrm{s}_{1}\) and \(\mathrm{c}_{0} \le \mathrm{conf}(\mathrm{r})\le \mathrm{c}_{1} \). When \(\mathrm{s}_{1} =1\) and \(\mathrm{c}_{1} =1\), we recover the traditional concept of an association rule. The set of all association rules satisfying the thresholds \(\mathrm{s}_{0} , \mathrm{s}_{1} , \mathrm{c}_{0} , \mathrm{c}_{1} \) is written as
$$\begin{aligned} {\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\equiv \{\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\,|\,\varnothing \ne \mathrm{L}^{\prime },\mathrm{R}^{\prime }\subseteq {\mathcal A},\mathrm{L}^{\prime }\cap \mathrm{R}^{\prime }=\varnothing ,\mathrm{S}^{\prime }\equiv \mathrm{L}^{\prime }+\mathrm{R}^{\prime },\mathrm{s}_{0} \le \mathrm{supp}(\mathrm{r})\le \mathrm{s}_{1} ,\mathrm{c}_{0} \le \mathrm{conf}(\mathrm{r})\le \mathrm{c}_{1} \}. \end{aligned}$$
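The set of all association rules that satisfy the two constraint itemsets \(\mathrm{L}_{1} , \mathrm{R}_{1} \subseteq {\mathcal A}\) is denoted by
$$\begin{aligned} {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq \mathrm{L}_{1} ,\subseteq \mathrm{R}_{1}} (\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\equiv \{\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\in {\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\,|\,\mathrm{L}^{\prime }\subseteq \mathrm{L}_{1} ,\mathrm{R}^{\prime }\subseteq \mathrm{R}_{1} \}. \end{aligned}$$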
This set is also called the class of association rules with maximum single constraints, the association rule set with constraints or the set of constrained association rules.
3 Mining association rules based on maximum single constraints
3.1 Partitioning an association rule set with maximum single constraints
3.1.1 Rough partitioning
To considerably decrease the number of duplicated candidates, we partition the association rule set into disjoint classes using a suitable equivalence relation. Based on the properties of operator h on lattice \({\mathcal L}{\mathcal C}{\mathcal G}\), we propose the following two equivalence relations on \({\mathcal F}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\) and \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} )\):
Definition 1
(Two equivalence relations on \({\mathcal F}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\) and \({\mathcal A}{\mathcal R}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} ,\mathrm{c}_{0} ,\mathrm{c}_{1} ))\).
(a) \(\forall A,\,B\in {\mathcal F}{\mathcal S}(s_{0} ,s_{1} ),\ A \sim _{\mathcal A} B\Leftrightarrow h(A)=h(B).\)
(b) \(\forall r_k :L_k \rightarrow R_k \in {\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} ),\,k=1,2,\ r_1 \sim _\mathrm{r} r_2 \Leftrightarrow h(L_1 )=h(L_2 )\ \text{and}\ h(L_1 +R_1 )=h(L_2 +R_2 ).\)
It follows from Definition 1 that \(\sim _{\mathcal A}\) and \(\sim _\mathrm{r}\) are equivalence relations. Let \([\mathrm{L}]_{\mathcal A}\,\equiv \,\{\mathrm{L}^{\prime }\subseteq \mathrm{L}:\mathrm{L}^{\prime }\ne \varnothing ,\mathrm{h}(\mathrm{L}^{\prime })=\mathrm{L}\}\) be the equivalence class of the frequent itemsets having the same closure \(\mathrm{L}\), where \(\mathrm{L}\in {\mathcal F}{\mathcal C}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\). For \(\mathrm{L},\mathrm{S}\in {\mathcal F}{\mathcal C}{\mathcal S}(\mathrm{s}_{0} ,\mathrm{s}_{1} )\) with \(\varnothing \ne \mathrm{L}\subseteq \mathrm{S}\) and \(\mathrm{supp}(\mathrm{S})/\mathrm{supp}(\mathrm{L})\in [\mathrm{c}_{0} ,\mathrm{c}_{1} ]\), the class \({\mathcal A}{\mathcal R}(\mathrm{L},\mathrm{S})\,\equiv \{\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\,|\,\mathrm{L}^{\prime }\in [\mathrm{L}]_{\mathcal A} ,\mathrm{R}^{\prime }\ne \varnothing ,\mathrm{L}^{\prime } \cap \mathrm{R}^{\prime }=\varnothing ,\mathrm{S}^{\prime }\equiv \mathrm{L}^{\prime }+\mathrm{R}^{\prime }\in [\mathrm{S}]_{\mathcal A} \}\) contains the rules \(\mathrm{r}:\mathrm{L}^{\prime }\rightarrow \mathrm{R}^{\prime }\) such that \(\mathrm{h}(\mathrm{L}^{\prime })=\mathrm{L}\) and \(\mathrm{h}(\mathrm{L}^{\prime }+\mathrm{R}^{\prime })=\mathrm{S}\).
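The class \([\mathrm{L}]_{\mathcal A}\) and the rule class \({\mathcal A}{\mathcal R}(\mathrm{L},\mathrm{S})\) can be listed by brute force, as in the following Python sketch. The database and function names are hypothetical, and the enumeration of all subsets is for illustration only; it is not the generator-based representation on which \(MAR\_MaxSC\) relies.

```python
from itertools import combinations

# Hypothetical toy database: six transactions over the items a, c, d, t, w.
DB = [frozenset(t) for t in ("actw", "cdw", "actw", "acdw", "acdtw", "cdt")]
ITEMS = frozenset().union(*DB)

def h(A):
    """Closure: items common to all transactions containing A (all items if none do)."""
    common = ITEMS
    for t in DB:
        if A <= t:
            common = common & t
    return common

def nonempty_subsets(X):
    X = sorted(X)
    return [frozenset(c) for k in range(1, len(X) + 1) for c in combinations(X, k)]

def eq_class(L):
    """[L]_A: the nonempty subsets of the closed itemset L whose closure is L."""
    L = frozenset(L)
    return [Lp for Lp in nonempty_subsets(L) if h(Lp) == L]

def rule_class(L, S):
    """AR(L, S): rules L' -> R' with h(L') = L and h(L' + R') = S, for closed L ⊆ S.
    R' ranges over nonempty subsets of S minus L', since L' + R' must lie inside S."""
    S = frozenset(S)
    return [(Lp, Rp) for Lp in eq_class(L)
            for Rp in nonempty_subsets(S - Lp) if h(Lp | Rp) == S]

print(len(eq_class("acw")), len(rule_class("acw", "actw")))  # 4 9
```

For L = {a, c, w} and S = {a, c, t, w} on the toy database, \([\mathrm{L}]_{\mathcal A}\) has four members and \({\mathcal A}{\mathcal R}(\mathrm{L},\mathrm{S})\) contains nine rules, all with confidence supp(S)/supp(L) = 3/4.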
Remark 1
(a) Using the properties of h, it is simple to show that \(\forall L\,\,\in {\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ),\,supp(L^{\prime })\,=\,supp(L),\,\forall L^{\prime }\in \,[L]_{\mathcal A} \). In other words, all the frequent itemsets in the same class \([L]_{\mathcal A} \) have the same support supp(L).
(b) For every \(r:L^{\prime }\rightarrow R^{\prime }\in {\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\), we denote \(L\equiv h(L^{\prime }),\,S^{\prime }\equiv L^{\prime }+R^{\prime },\,S\equiv h(S^{\prime })\). We then have \(\varnothing \ne L\subseteq S\), \(supp(S^{\prime })=supp(S)\in [s_{0} ,s_{1} ]\), \(conf(r)=supp(S^{\prime })/supp(L^{\prime })=supp(S)/supp(L)\in [c_{0} ,c_{1} ]\) and \((L,\,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}\,(s_{0} ,s_{1} ,c_{0} ,c_{1} )\), where
$$\begin{aligned}&{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}\,(s_{0} ,s_{1} ,c_{0} ,c_{1} )\equiv \,\{(L,\,S)~\in {\mathcal C}{\mathcal S}^{2}\,|\\&\quad \,S\,\,\in {\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ),\\&\quad \varnothing \ne L\,\subseteq S,\,supp(S)/supp(L)\in [c_{0} ,\,c_{1} ]\}. \end{aligned}$$
Moreover, for every \((L,\,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\), all the rules in the equivalence class \({\mathcal A}{\mathcal R}\,(L,\,S)\) have the same support, supp(S), and the same confidence, supp(S)/supp(L). This fact considerably reduces the storage required for the support and confidence of frequent itemsets and association rules.
(c) The following is a partition of the set \({\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\):
$$\begin{aligned} {\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} ) {=} {\Sigma }_{(L,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )} {\mathcal A}{\mathcal R}(L,S). \end{aligned}$$
As \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\subseteq {\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\), it is straightforward to construct the following rough partition of the constrained rule set \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\).
Proposition 1
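(Roughly partitioning the constrained association rule set). We have
$$\begin{aligned} {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ) = {\Sigma }_{(L,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )} {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,S), \end{aligned}$$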
where \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S)\,\equiv \{r:L^{\prime }\rightarrow R^{\prime } \in {\mathcal A}{\mathcal R}(L, S)\,|\, L^{\prime }\,\subseteq L_{1} ,R^{\prime }\,\subseteq R_{1} \}\).
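Statements (a) to (c) and Proposition 1 can be checked by brute force on a small example. The Python sketch below is only an illustration under assumptions: the transaction database `DB` is hypothetical (it is not the database of Fig. 1a), and `supp`, `h` and `rule_classes` are helper names introduced here, not part of the paper. It enumerates every rule of \({\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\), files it under the pair \((h(L^{\prime }), h(L^{\prime }+R^{\prime }))\), and checks that each class has a single support and a single confidence.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database (an assumption; NOT the database T of Fig. 1a).
DB = [frozenset(t) for t in ("acfhi", "acegi", "bcegi", "aci", "acfhi", "ci", "afhi")]
ITEMS = frozenset().union(*DB)


def supp(x):
    """Relative support of itemset x, as an exact fraction."""
    return Fraction(sum(x <= t for t in DB), len(DB))


def h(x):
    """Closure h(x): intersection of the transactions containing x (ITEMS if none do)."""
    cover = [t for t in DB if x <= t]
    return frozenset.intersection(*cover) if cover else ITEMS


def nonempty_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def rule_classes(s0, s1, c0, c1):
    """Brute force: file every rule L' -> R' of ARS(s0, s1, c0, c1) under the class
    AR(L, S) with L = h(L') and S = h(L' + R'); every key is a pair of NFCS(s0, s1, c0, c1)."""
    classes = defaultdict(list)
    for s_prime in nonempty_subsets(ITEMS):
        sup = supp(s_prime)
        if sup == 0 or not (s0 <= sup <= s1):
            continue
        for l_prime in nonempty_subsets(s_prime):
            r_prime = s_prime - l_prime
            if r_prime and c0 <= sup / supp(l_prime) <= c1:
                classes[(h(l_prime), h(s_prime))].append((l_prime, r_prime))
    return classes


classes = rule_classes(Fraction(1, 7), Fraction(5, 7), Fraction(1, 3), Fraction(9, 10))
for (L, S), rules in classes.items():
    # (b): one support and one confidence per class, read off the closed pair (L, S).
    assert all(supp(l | r) == supp(S) and supp(l | r) / supp(l) == supp(S) / supp(L)
               for l, r in rules)
```

Since every rule is filed under exactly one key, the classes are disjoint; restricting each class to \(L^{\prime }\subseteq L_{1}\) and \(R^{\prime }\subseteq R_{1}\) then yields the rough partition of Proposition 1.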
Based on this rough partition (obtained from the lattice \({\mathcal F}{\mathcal L}{\mathcal C}{\mathcal G}\) of frequent closed itemsets), association rules with constraints can be discovered in two steps. First, for each pair \((L,\,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}\,(s_{0} ,s_{1} ,c_{0} ,c_{1} )\), the rules \(r:L^{\prime }\rightarrow R^{\prime }\) in \({\mathcal A}{\mathcal R}(L,\,S)\) are listed without repetition using the two derivation functions \({\mathcal F}{\mathcal S}(L)\) and \({\mathcal F}{\mathcal S}(S\backslash L^{\prime })_{L^{\prime }}\) proposed by [9]. Second, each rule is checked against the two constraints \(L^{\prime }\subseteq L_{1} \) and \(R ^{\prime }\subseteq R_{1} \), and only the rules that pass the check are retained. We call the corresponding algorithm \(PP\_MAR\_MaxSC\_{2}\). This approach is wasteful: for many constraint instances, the constrained rule set \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\) is empty, and many closed itemset pairs \((L,\,S)~\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\) have empty constrained classes \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1}} (L,\,S)\). Moreover, even when \(\varnothing \ne {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\,\,\subseteq {\mathcal A}{\mathcal R}(L,\,S)\), the class \({\mathcal A}{\mathcal R}(L,\,S)\) may still be prohibitively large and contain numerous redundant rules that violate the constraints.
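A brute-force stand-in for this post-processing strategy is sketched below. It does not implement the derivation functions of [9]; instead, the hypothetical helper `rule_class` lists \({\mathcal A}{\mathcal R}(L,\,S)\) directly from the closure on a hypothetical toy database (not Fig. 1a), and `post_filter` then keeps only the rules with \(L^{\prime }\subseteq L_{1}\) and \(R^{\prime }\subseteq R_{1}\), reporting how many listed candidates were wasted.

```python
from itertools import combinations

# Hypothetical toy database (an assumption; NOT the database T of Fig. 1a).
DB = [frozenset(t) for t in ("acfhi", "acegi", "bcegi", "aci", "acfhi", "ci", "afhi")]
ITEMS = frozenset().union(*DB)


def h(x):
    """Closure h(x): intersection of the transactions containing x (ITEMS if none do)."""
    cover = [t for t in DB if x <= t]
    return frozenset.intersection(*cover) if cover else ITEMS


def nonempty_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def rule_class(L, S):
    """Brute-force listing of AR(L, S): rules L' -> R' with h(L') = L and h(L' + R') = S."""
    return [(lp, rp) for lp in nonempty_subsets(L) if h(lp) == L
            for rp in nonempty_subsets(S - lp) if h(lp | rp) == S]


def post_filter(L, S, L1, R1):
    """Post-processing: list the whole class first, then keep rules with L' ⊆ L1 and R' ⊆ R1."""
    candidates = rule_class(L, S)
    kept = [(lp, rp) for lp, rp in candidates if lp <= L1 and rp <= R1]
    return kept, len(candidates) - len(kept)


kept, wasted = post_filter(frozenset("ai"), frozenset("acfhi"), frozenset("a"), frozenset("cfhi"))
```

On this toy database, the class \({\mathcal A}{\mathcal R}(ai,\,acfhi)\) has 9 rules; the constraints \(L_{1}=a\), \(R_{1}=cfhi\) keep 6 of them and discard the 3 rules whose left-hand side is \(ai\), which were generated for nothing.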
Example 1
(Illustrating the weakness of \(PP\_MAR\_MaxSC\_{2}\)). In the remainder of the paper, we always consider the database \({\mathcal T}\) in Fig. 1a. Let \(s_{1} =5/7\), \(c_{0} =1/3\) and \(c_{1} =0.9\). For \(s_{0} =1/7\), \(Charm\_L\) and MinimalGenerators produce the lattice of frequent closed itemsets, together with their generators and supports, shown in Fig. 1b. \(PP\_MAR\_MaxSC\_{2}\) explores \(|{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )|=19\) rule classes and generates the 302 rules of \({\mathcal A}{\mathcal R}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\); however, these rules are only guaranteed to satisfy the support and confidence constraints, not the maximum single constraints.
-
(a)
For the maximum single constraints \(L_{1} =c\), \(R_{1} =i\), we have \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )=\varnothing \).
-
(b)
For \(L_{1} =ceg\), \(R_{1} =ai\), only 2 of the 19 rule classes, corresponding to the pairs \(\left( {cegi,\,acegi} \right) \) and \(\left( {ci,\,aci} \right) \), contain association rules satisfying the constraints, and together they contain only 12+2=14 of the 302 rules. Moreover, \({\mathcal A}{\mathcal R}\left( {cegi,\,acegi} \right) \) includes 45 rules, of which only 12 satisfy the constraints: \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } \left( {cegi,\,acegi} \right) = \{ e\rightarrow a,e\rightarrow ai,ce\rightarrow a,ce\rightarrow ai,eg\rightarrow a,\,eg\rightarrow ai,ceg\rightarrow a,ceg\rightarrow ai,g\rightarrow a,\,g\rightarrow ai,cg\rightarrow a,cg\rightarrow ai \}\).
-
(c)
For \(L_{1} =a\), \(R_{1} =cfhi\), only two rule classes contain rules satisfying the constraints: \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (ai,acfhi)=\{a\rightarrow cf,a\rightarrow cfi,a\rightarrow cfh,a\rightarrow cfhi,a\rightarrow ch,a\rightarrow chi\}\) and \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (ai,afhi)=\{a\rightarrow f,a\rightarrow fi,a\rightarrow h,a\rightarrow hi,a\rightarrow fh,a\rightarrow fhi\}\).
To overcome these shortcomings, we propose two groups of necessary conditions. The first group concerns the nonemptiness of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\). The second group eliminates the pairs \((L,\,S)~\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\) whose constrained rule class \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\) is empty. Next, for the pairs \(\left( {L,\,S} \right) \,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}\left( {s_{0} ,s_{1} ,c_{0} ,c_{1} } \right) \) that pass these conditions, we describe the rules of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S)\) via \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\). Based on this description, we propose a smoother partition of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0}, c_{1} )\).
3.1.2 Necessary conditions for the nonemptiness of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1}, \subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\) and \({\mathcal A}{\mathcal R}_{\subseteq L_{1}, \subseteq R_{1} } (L, S)\)
We denote the following:
Proposition 2
(Necessary conditions for the nonemptiness of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\) and \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,S)\), and a different representation of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,S)\)).
-
(a)
If \(r:L^{\prime }\rightarrow R^{\prime }\,\in \,{\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\,\ne \varnothing \):
-
\((L, S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} ), r\,\in \,{\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S)\,\ne \varnothing \) with \(L = h(L^{\prime }), S=h(L^{\prime }+R^{\prime })\) and
-
the following necessary conditions are satisfied:
$$\begin{aligned} s_{0}^{*} \le s_{1}^{*} ,\quad supp(S_{1}^{*})\le s_{1}^{*}. \qquad (H_{1}) \end{aligned}$$
Henceforth, we always assume that (H\(_{1}\)) is satisfied.
-
(b)
For each \((L, S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\),
-
For any \(r:L^{\prime }\rightarrow R^{\prime }\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S)\,\ne \varnothing \):
$$\begin{aligned}&S_{S_{1}^{*}} \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_{0}^{*} ,s_{1}^{*} ),\ L_{C_{1} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_{0}^{\prime } ,s_{1}^{\prime } ),\\&L^{\prime }\,\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1}} } ,\ R^{\prime }\,\in \,{\mathcal F}{\mathcal S}(S_{S_{1}^{*}} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*}} . \end{aligned}$$
Then, \((L, S)\,\in \,{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\,\ne \varnothing \,\) and \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S)\subseteq {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1}}^{+} (L, S)\).
-
\({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S)\subseteq \,{\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ).\)
-
(c)
For each \((L, S)\,\in \,{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\,\ne \varnothing , \exists L^{\prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }\) and
$$\begin{aligned} {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^{+} (L, S) = {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L, S). \end{aligned}$$
Proof
-
(a)
If \(r:\,L^{\prime }\rightarrow R^{\prime }\in {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\), we have \(L^{\prime },\,R^{\prime }\ne \varnothing \,,\,L^{\prime }\cap R^{\prime }=\varnothing ,\,S^{\prime }\equiv L^{\prime }+R^{\prime },\,L^{\prime }\subseteq L_{1} ,\,R^{\prime }\subseteq R_{1} \).
Denote \(L\,\equiv h(L^{\prime })\) and \(S\,\equiv h(L^{\prime }+R^{\prime })\). Because \(L^{\prime }\ne \varnothing \), we have \(\varnothing \,\ne L\subseteq S\) (if \(L=\varnothing \), then \(\varnothing \subset L^{\prime }\subseteq h(L^{\prime })=L=\varnothing \), a contradiction), \(supp(S)\,=\,supp(S^{\prime }) \in [s_{0} ,\,s_{1} ]\) and \(supp(S)/supp(L)\,=\,supp(S^{\prime })/supp(L^{\prime })\,=\,conf(r)\in [c_{0} ,\,c_{1} ]\). Thus, \((L,\,S)~\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\) and \(r\,\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\).
Moreover, \(S^{\prime }\subseteq S_{1}^{*} \), \(L^{\prime }\subseteq \,L\cap C_{1} \,=\,L_{C_{1} } \,\subseteq C_{1}\), \(supp(C_{1} )\,\le \,supp(L^{\prime })\), \(supp(C_{1} )\cdot c_{0} \,\le \,supp(L^{\prime })\cdot c_{0} \le supp(S^{\prime })\) and \(s_{0}^{*} \,\le supp(S^{\prime })=\,supp(S)\,\le s_{1}^{*} \). Hence, \(s_{0}^{*} \le s_{1}^{*} \) and \(supp(S_{1}^{*} )\,\le supp(S^{\prime })\le s_{1}^{*} \).
-
(b)
For every \((L,\,S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} ),\,\forall r:L^{\prime }\rightarrow R^{\prime }\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\), we have \(\varnothing \ne L\subseteq S,~L^{\prime },\,R^{\prime }\ne \varnothing \,,~L^{\prime }\cap R^{\prime }=\varnothing ,~h(L^{\prime })=L,~h(S^{\prime })=S,~s_{0} \,\le supp(S^{\prime })=supp(S)\,\le \,s_{1} ,~c_{0} \,\le \,supp(S)/supp(L)\le c_{1} \,\) and \(R^{\prime }\subseteq R_{1} ,\,L^{\prime }\subseteq L_{1} \).
It is easy to see that \(L^{\prime } \in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1}} } \), as \(L^{\prime }\subseteq L_{C_{1} } \subseteq L\) and \(L\,=\,h(L^{\prime })\,=\,h(L_{C_{1} } )\). In addition, \(supp(S)/c_{1} \,\le supp(L^{\prime })=supp(L)\,\le supp(S)/c_{0}\), \(s_{0}^{\prime } \,\le supp(L^{\prime })\le s_{1}^{\prime }\), \(S\,\supseteq S_{S_{1}^{*} } \supseteq S^{\prime }\equiv L^{\prime }+R^{\prime }\) and \(supp(C_{1} )\cdot c_{0} \,\le supp(L^{\prime })\cdot c_{0} \,\le supp(S^{\prime })=supp(S)\). Then, \(s_{0}^{*} \le \,supp(S)\,\le s_{1}^{*} \) and \(h(S^{\prime })\,=\,h(S_{S_{1}^{*} } )\,=\,S\).
Take \(L_i \in \,{\mathcal G}(L^{\prime })\subseteq {\mathcal G}(L)\) and \(S_k \in {\mathcal G}(S^{\prime })\,\subseteq {\mathcal G}(S)\) (both generator sets are nonempty [8]). We have \(L_i \subseteq L^{\prime }\subseteq C_{1}\) and \(S_k \subseteq S^{\prime }\subseteq S_{1}^{*} \), i.e., \({\mathcal G}_{C_{1} } (L)\,\ne \varnothing \) and \({\mathcal G}_{S_{1}^{*} } (S)\,\ne \varnothing \). Hence, \(L_{C_{1} } \,\in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_{0}^{\prime } ,s_{1}^{\prime } )\) and \(S_{S_{1}^{*} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_{0}^{*} ,s_{1}^{*} )\), so \((L,\,S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\).
Moreover, since \(R^{\prime }=\,S^{\prime }\backslash L^{\prime }\subseteq \,(S\backslash L^{\prime })\cap R_{1} \,=\,R_{1}^{*} \), we have \(R^{\prime }\in {\mathcal F}{\mathcal S}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} }\) and \(r\,\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\). Therefore, \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S) \subseteq {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\).
As \(c_{0} \le conf(r)=supp(S^{\prime })/supp(L^{\prime })=supp(S)/supp(L)\,\le c_{1} \), we have \(r\,\in {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\). Therefore, \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\subseteq {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_{0} ,s_{1},c_{0} ,c_{1} )\).
-
(c)
Indeed, for \((L,\,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\), we have \(L_{C_{1} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_{0}^{\prime } ,s_{1}^{\prime } )\), so there exists \(L_i \in {\mathcal G}(L)\) with \(L_i \subseteq C_{1}\). Set \(L^{\prime }\equiv L_i\); then \(\varnothing \subset L^{\prime }=L_i \subseteq L_{C_{1} } \,\subseteq L\) and \(L\,=\,h(L_i )\,=\,h(L^{\prime })\,=\,h(L_{C_{1} } )\). Hence, \(L^{\prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } } \ne \varnothing \).
As \((L,\,S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ) \subseteq {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0}, s_{1} ,c_{0} ,c_{1} )\), statement (b) shows that \({\mathcal A}{\mathcal R}_{\subseteq L_{1},\,\subseteq R_{1} } (L,\,S)\subseteq {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\). It remains to prove the reverse inclusion, \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\subseteq {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\).
Indeed, for every \(r:L^{\prime }\rightarrow R^{\prime }\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\), we have \(L^{\prime },\,R^{\prime }\ne \varnothing \), \(L^{\prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }\) and \(R^{\prime }\in {\mathcal F}{\mathcal S}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } \); hence \(L^{\prime }\subseteq L_{C_{1} } \subseteq C_{1}\), \(h(L^{\prime })\,=\,h(L_{C_{1}} )\), \(R^{\prime }\subseteq R_{1}^{*} \,=\,(S\cap R_{1} )\backslash L^{\prime }\), \(L^{\prime }\cap R^{\prime }=\varnothing \) and \(h(L^{\prime }+R^{\prime })\,=\,h(S_{S_{1}^{*} } )\). Because \(S_{S_{1}^{*} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_{0}^{*} ,s_{1}^{*} )\) and \(L_{C_{1} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_{0}^{\prime } ,s_{1}^{\prime } )\), there exist \(L_{i}\in {\mathcal G}(L)\) with \(L_{i}\subseteq C_{1}\) and \(S_k \in {\mathcal G}(S)\) with \(S_k \subseteq S_{1}^{*} \); thus \(L_i \subseteq L_{C_{1} } \subseteq L\), \(L=h(L_i )=h(L_{C_{1} } )=h(L^{\prime })\), and \(S_k \subseteq S_{S_{1}^{*} } \subseteq S\), \(S=h(S_k )=h(S_{S_{1}^{*} } )=h(S^{\prime })\). Then, \(r\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\), i.e., \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\subseteq {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\ne {\varnothing }\). \(\square \)
Consequence 1 (The necessary and sufficient condition for the nonemptiness of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ))\).
-
(a)
If at least one condition of \((H_{1} )\) is violated, then \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )=\varnothing \).
-
(b)
\(r{:}L^{\prime }{\rightarrow } R^{\prime }\, \in {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\ne \varnothing \) if and only if there exist \((L, S) \in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\), \(L^{\prime }\,\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }\) and \(R^{\prime }\,\in {\mathcal F}{\mathcal S}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} }\) such that \(r:L^{\prime }\rightarrow R^{\prime }\in {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L, S)\,\ne \varnothing \).
Based on Proposition 2 and Consequence 1, we thus have a partition of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\) that is smoother than that in Proposition 1.
3.1.3 Smoothly partitioning the association rule set with maximum single constraints
Theorem 1
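(Smoothly partitioning the constrained association rule set.) Assuming that (H\(_{1}\)) is satisfied, we have
$$\begin{aligned} {\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ) = {\Sigma }_{(L,S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )} {\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^{+} (L,S). \end{aligned}$$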
Because the classes \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\) are pairwise disjoint, each of them can be mined independently. This partition therefore provides a basis for using equivalence relations to design mining algorithms for parallel and distributed environments.
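Because the classes are disjoint, a miner that handles one pair \((L,\,S)\) at a time can be mapped over the pairs concurrently, and the per-class results can simply be concatenated without any duplicate check. The sketch below assumes a hypothetical toy database (not Fig. 1a) and a brute-force per-class miner `mine_class`, which is an illustrative stand-in and not the paper's \(MAR\_MaxSC\_OneClass\). It uses Python's standard `concurrent.futures.ThreadPoolExecutor`; for CPU-bound work, a `ProcessPoolExecutor` or a distributed framework would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical toy database (an assumption; NOT the database T of Fig. 1a).
DB = [frozenset(t) for t in ("acfhi", "acegi", "bcegi", "aci", "acfhi", "ci", "afhi")]
ITEMS = frozenset().union(*DB)
L1, R1 = frozenset("a"), frozenset("cfhi")   # hypothetical maximum single constraints


def h(x):
    """Closure h(x): intersection of the transactions containing x (ITEMS if none do)."""
    cover = [t for t in DB if x <= t]
    return frozenset.intersection(*cover) if cover else ITEMS


def nonempty_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def mine_class(pair):
    """Hypothetical per-class miner: brute-force AR_{⊆L1,⊆R1}(L, S) for one pair (L, S)."""
    L, S = pair
    return [(lp, rp) for lp in nonempty_subsets(L & L1) if h(lp) == L
            for rp in nonempty_subsets((S & R1) - lp) if h(lp | rp) == S]


# Pairs assumed to have already passed the necessary-condition tests.
pairs = [(frozenset("ai"), frozenset("acfhi")), (frozenset("ai"), frozenset("afhi"))]
with ThreadPoolExecutor() as pool:
    per_class = list(pool.map(mine_class, pairs))
rules = [rule for chunk in per_class for rule in chunk]   # classes are disjoint: no de-duplication
```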
Example 2
(Illustrating the emptiness of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0}, s_{1} ,c_{0} ,c_{1} )\) when at least one necessary condition given in (H\(_{1}\)) is not satisfied, and that of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (L,\,S)\,\) when \((L, S)\notin \,{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ))\).
-
(a)
In Example 1(a), \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )= \varnothing \). Indeed, \(S_{1}^{*} =ci\), \(C_{1} =c\), \(s_{1}^{*} =5/7\) and \(supp(S_{1}^{*} )=6/7\). As the condition \(supp(S_{1}^{*} )\,\le s_{1}^{*} \) is violated, we immediately conclude that \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} ) =\varnothing \) without generating the \(|{\mathcal A}{\mathcal R}{\mathcal S}(1/7,~5/7,~1/3,~0.9)| =302\) rules and then eliminating all of them.
-
(b)
For the constraints given in Example 1(b), we first find that most of the frequent closed itemset pairs (L, S) of \({\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\) are redundant, i.e., they are not in \({\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )=\{\left( {cegi,\,acegi} \right) ,( ci,\,aci )\}\): 17 of the 19 pairs contain no association rules satisfying the constraints. Consider, for example, the pair \((L,\,S)=(cegi,\,bcegi)\) in \({\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\). Clearly, \(S\,\in {\mathcal F}{\mathcal C}{\mathcal S}(s_{0}^{*} ,s_{1}^{*} )\), as \(s_{0}^{*} =1.33/7\le supp(S)=2/7\le \,s_{1}^{*} =5/7\). However, because \({\mathcal G}(S)=\{b\}\) and \(b\nsubseteq S_{S_{1}^{*} } =S\cap S_{1}^{*} =cegi\), the necessary condition \(S_{S_{1}^{*} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_{0}^{*} ,s_{1}^{*} )\) is not satisfied, i.e., \((L,\,S)\,\notin {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\). Hence, we immediately conclude that \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} } (cegi,\,bcegi)=\varnothing \) without listing the rules of \({\mathcal A}{\mathcal R}(cegi,\,bcegi)\). Running the necessary-condition tests on all pairs of \({\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_{0} ,s_{1} ,c_{0} ,c_{1} )\) prunes 17 rule classes, i.e., we avoid generating \(302-|{\mathcal A}{\mathcal R}(cegi,\,acegi)|-|{\mathcal A}{\mathcal R}(ci,\,aci)|=302-45-3=254\) redundant rule candidates.
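Both tests used in this example reduce to cheap comparisons. In the sketch below, `violates_h1` and `has_generator_within` are hypothetical helper names. The values \(s_{0}^{*}\), \(s_{1}^{*}\) and \(supp(S_{1}^{*})\) are passed in directly rather than derived, and the \(s_{0}^{*}\) used for Example 2(a) is only a placeholder: the verdict does not depend on it, because the second condition of (H\(_{1}\)) already fails.

```python
from fractions import Fraction


def violates_h1(s0_star, s1_star, supp_s1_star):
    """True if a condition of (H1) fails; by Consequence 1(a) the constrained rule set is then empty."""
    return s0_star > s1_star or supp_s1_star > s1_star


def has_generator_within(generators, bound):
    """True if some generator (a frozenset) of a closed itemset is contained in `bound`."""
    return any(g <= bound for g in generators)


# Example 2(a): s1* = 5/7 and supp(S1*) = 6/7; s0* = 1/7 is only a placeholder.
empty_ruleset = violates_h1(Fraction(1, 7), Fraction(5, 7), Fraction(6, 7))
# Example 2(b): G(bcegi) = {b} and S_{S1*} = cegi, so the pair (cegi, bcegi) is pruned.
keep_pair = has_generator_within([frozenset("b")], frozenset("cegi"))
```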
Even the rule classes that pass these tests still contain many redundant candidates, namely duplicates and rules that violate the constraints. Example 3 shows that all such redundant candidates can be pruned completely.
The \(MFCS\_FromLattice({\mathcal L}{\mathcal C}{\mathcal G}^\mathrm{s}, C_{1} , s_{0}^{\prime } , s_{1}^{\prime } )\) procedure shown in Fig. 2 finds the class \({\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_{0}^{\prime } ,s_{1}^{\prime } )\) of frequent closed itemsets satisfying the constraints from \({\mathcal L}{\mathcal C}{\mathcal G}^\mathrm{s}\), the sub-lattice of \({\mathcal L}{\mathcal C}{\mathcal G}\) with root S. To determine \({\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_{0}^{*} ,s_{1}^{*} )\), we call the procedure with the input parameters \({\mathcal L}{\mathcal C}{\mathcal G}\), \(S_{1}^{*}\), \(s_{0}^{*} \) and \(s_{1}^{*} \), that is, \({\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_{0}^{*} ,s_{1}^{*} )=MFCS\_FromLattice({\mathcal L}{\mathcal C}{\mathcal G}, S_{1}^{*} , s_{0}^{*} , s_{1}^{*} )\). For example, for \(S=acfhi\), the sub-lattice \({\mathcal L}{\mathcal C}{\mathcal G}^\mathrm{s}\) is drawn in red in Fig. 1b.
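Fig. 2 is not reproduced here, so the following is only a guess at the shape of such a procedure, under stated assumptions: the lattice is stored as a dictionary mapping each closed itemset to its absolute support, its generators and its immediate closed subsets (so the sub-lattice below the root S is reached by following these links); a node is kept when its support lies in \([lo, hi]\) and at least one of its generators lies inside C, and it is reported restricted to C, in the way \(L_{C_{1}}=L\cap C_{1}\) appears in the proof of Proposition 2. The lattice fragment and all names are hypothetical.

```python
from fractions import Fraction

N_TRANSACTIONS = 7
# Hypothetical lattice fragment (an assumption, not the lattice of Fig. 1b):
# closed itemset -> (absolute support, generators, immediate closed subsets).
LATTICE = {
    "acfhi": (2, ["cf", "ch"], ["aci", "afhi"]),
    "afhi": (3, ["f", "h"], ["ai"]),
    "aci": (4, ["ac"], ["ai", "ci"]),
    "ai": (5, ["a"], ["i"]),
    "ci": (6, ["c"], ["i"]),
    "i": (7, [""], []),
}


def mfcs_from_lattice(lattice, root, C, lo, hi):
    """Hypothetical sketch: walk the sub-lattice below `root` depth-first and keep each closed
    node N with lo <= supp(N) <= hi that has a generator inside C, reported as N ∩ C."""
    C = frozenset(C)
    found, stack, seen = {}, [root], set()
    while stack:
        node = stack.pop()
        if node in seen:          # a node can be reached along several paths
            continue
        seen.add(node)
        count, gens, children = lattice[node]
        if lo <= Fraction(count, N_TRANSACTIONS) <= hi and any(frozenset(g) <= C for g in gens):
            found[node] = frozenset(node) & C
        stack.extend(children)
    return found


result = mfcs_from_lattice(LATTICE, "acfhi", "ac", Fraction(2, 7), Fraction(5, 7))
```

Here `acfhi` and `afhi` are rejected because none of their generators lies inside \(C=ac\), while `ci` and `i` fall outside the support window.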
Note, however, that for \((L,\,S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\subseteq R_{1} } (s_{0}, s_{1} ,c_{0} ,c_{1} )\), the two sides of each rule \(r:L^{\prime }\rightarrow R^{\prime }\) of the rule class \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L,\,S)\,=\,\{r:L^{\prime }\rightarrow R^{\prime }|\,L^{\prime }\in \,{\mathcal F}{\mathcal S}_{\subseteq L_{C_{1}} } ,\,R^{\prime }\in \,{\mathcal F}{\mathcal S}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } \}\) are not yet represented explicitly, so generating them directly may produce numerous duplicates and redundant candidates.
3.2 Non-repeatedly producing all association rules satisfying the constraints in each class \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^{+}(L, S)\)
For \((L,\,S)\,\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,\,\subseteq R_{1} } (s_{0} ,s_{1} ,c_{0} ,c_{1} )\), we use the two generator sets \({\mathcal G}(L)\) and \({\mathcal G}(S)\) to build a unique, explicit representation of the constrained rules in \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\,\subseteq R_{1} }^+ (L, S)\). This representation allows all the constrained rules in each rule class to be produced completely and without duplicates, as described in the algorithm \(MAR\_MaxSC\_OneClass\).
3.2.1 The unique structure and representation of the equivalence class of frequent sub itemsets restricted on X with upper bound \(Z_{1} \)
The unique structure and representation of the equivalence class of frequent sub itemsets restricted on \(X\) with upper bound \(Z_{1} \), proposed in this section, are used to build unique structures and representations for the right-hand sides \(R^{\prime }\in {\mathcal F}{\mathcal S}(S\backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*} \) and the left-hand sides \(L^{\prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } } \) of the rules \(r:L^{\prime }{\rightarrow }R^{\prime }\) in each class \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\subseteq R_{1} }^+ (L,S)\).
For any \(X,\,Y,\,Z_{1} \subseteq {\mathcal A}\),
(where \(Z_{1} \) is an upper bound, X is a restriction). Let us call,
(Note that if \(X\cap Y=\varnothing \) and \(Z_{1} \subseteq Y\), then \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1}} \ne \varnothing \) requires \(Y \ne \varnothing \), \({Z_1} \ne \varnothing \) and \(X \cap {Z_1} = \varnothing \); if \(Y=\varnothing \), \(Z_{1} =\varnothing \) or \(X\cap Z_{1} \ne \varnothing \), then \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } \equiv \varnothing \).)
\(R_{\min } \equiv Minimal\{R_{k} \equiv S_{k} \backslash X \mid S_{k} \in \mathcal {G}(X+Y),\, R_{k} \subseteq Z_{1} \}\), \(R_{U}^{k} \equiv \bigcup _{R_{j}\in R_{\min },\, j\le k} R_j\), \(R_{U,k} \equiv \left\{ \begin{array}{ll} R_U^{k-1} \backslash R_{k}, &{}\quad if\,k\ge 2, \\ \varnothing , &{}\quad if\,k=1, \\ \end{array}\right. \) and \(R_{-,k}\equiv Z_{1} \backslash R_U^k \), with \(R_{k} \in R_{\mathrm{min}}\). Then, we denote
Proposition 3
(Uniquely representing the frequent itemsets in \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } \ne \varnothing \) by \({\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} }\)). \(\forall X,Y,Z_{1} \subseteq {\mathcal A}:X\cap Y=\varnothing ,\varnothing \ne Z_{1} \subseteq Y:\)
-
(a)
The frequent itemsets in \({\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1}}\) are enumerated without duplicates.
-
(b)
\({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1}} = {\mathcal F}{\mathcal S}^{*}(Y)_{X,\subseteq Z_{1}} .\)
-
(c)
\({\mathcal F}{\mathcal S}^{*}(Y)_{X,\subseteq Z_{1}} \ne {\varnothing } \Leftrightarrow {\mathcal G}_{X+Z_{1}} (X+Y)\ne {\varnothing }\). (H\(_{3}\))
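Proposition 3 can be checked mechanically on small data. The sketch below (hypothetical toy database; brute-force closure and generators; helper names introduced here) builds \(R_{\mathrm{min}}\), \(R_U^k\), \(R_{U,k}\) and \(R_{-,k}\) as defined above, enumerates \({\mathcal F}{\mathcal S}^{*}(Y)_{X,\subseteq Z_{1}}\) with condition (*), and compares the result with a brute-force computation of \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1}}\), taken here as the nonempty \(R^{\prime }\subseteq Z_{1}\) with \(h(X+R^{\prime })=h(X+Y)\), the characterization used in the proof of (b). \(R_{\mathrm{min}}\) is ordered by size and then lexicographically; this is an arbitrary choice, since the construction only needs some fixed order.

```python
from itertools import combinations

# Hypothetical toy database (an assumption; NOT the database T of Fig. 1a).
DB = [frozenset(t) for t in ("acfhi", "acegi", "bcegi", "aci", "acfhi", "ci", "afhi")]
ITEMS = frozenset().union(*DB)


def h(x):
    """Closure h(x): intersection of the transactions containing x (ITEMS if none do)."""
    cover = [t for t in DB if x <= t]
    return frozenset.intersection(*cover) if cover else ITEMS


def subsets(s):
    """All subsets of s, including the empty set."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def generators(closed):
    """Brute force: the minimal itemsets whose closure is `closed`."""
    same = [g for g in subsets(closed) if h(g) == closed]
    return [g for g in same if not any(o < g for o in same)]


def fs_star(X, Y, Z1):
    """Enumerate FS*(Y)_{X,⊆Z1} (assumes X ∩ Y = ∅ and ∅ ≠ Z1 ⊆ Y)."""
    family = {g - X for g in generators(h(X | Y)) if g - X <= Z1}
    r_min = sorted((R for R in family if not any(o < R for o in family)),
                   key=lambda R: (len(R), sorted(R)))          # any fixed order works
    out, union_prev = [], frozenset()                          # union_prev = R_U^{k-1}
    for k, R_k in enumerate(r_min):
        union_k = union_prev | R_k                             # R_U^k
        for R_p in subsets(union_prev - R_k):                  # R'_k ⊆ R_{U,k}
            core = R_k | R_p
            if any(R_j <= core for R_j in r_min[:k]):          # (*) fails: skip all R~_k
                continue
            for R_t in subsets(Z1 - union_k):                  # R~_k ⊆ R_{-,k}
                if core | R_t:                                 # ∅ only when R_min = {∅}
                    out.append(core | R_t)
        union_prev = union_k
    return out


def fs_brute(X, Y, Z1):
    """Reference: all nonempty R' ⊆ Z1 with h(X + R') = h(X + Y)."""
    return [R for R in subsets(Z1) if R and h(X | R) == h(X | Y)]


CASES = [
    (frozenset("a"), frozenset("cfhi"), frozenset("cfhi")),
    (frozenset("c"), frozenset("aegi"), frozenset("ai")),      # FS is empty here
    (frozenset("g"), frozenset("cei"), frozenset("cei")),      # R_min = {∅}: Remark 2(c)
    (frozenset(), frozenset("acegi"), frozenset("acegi")),
]
for X, Y, Z1 in CASES:
    star = fs_star(X, Y, Z1)
    assert len(star) == len(set(star))             # (a): no duplicates
    assert set(star) == set(fs_brute(X, Y, Z1))    # (b): same family
```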
Proof
-
(a)
We argue by contradiction. For a fixed index k, the sets \(R_{k}\), \(R_{U,k}\) and \(R_{-,k}\) are pairwise disjoint, so different choices of \(R_{k}^{\prime }\subseteq R_{U,k}\) and \(R_{k}^\sim \subseteq R_{-,k}\) yield different sets. Assume, then, that the same set is generated twice from two different indices, \(R^{{\prime }1}=R^{{\prime }{2}}\), i.e., \(\exists k_{2} >k_{1} \ge 1\) such that \(R^{{\prime }j}\equiv R_{k_j } +R_{k_j }^{\prime }+R_{k_j}^\sim \), \(R_{k_j } \in R_{\mathrm{min}}\), \(R_{k_j }^{\prime }\subseteq R_{U,k_j}\), \(R_{k_j }^\sim \subseteq R_{-,k_j }\), for \(j=1,2\). Then, \(R_{k_{1} } \subseteq R_{k_{1} } +R_{k_{1} }^{\prime } +R_{k_{1} }^\sim =R_{k_{2} } +R_{k_{2} }^{\prime } +R_{k_{2} }^\sim \). Because \(R_{k_{1} } \cap R_{k_{2} }^\sim \subseteq R_{k_{1} } \cap R_{-,k_{2} } \subseteq R_{k_{1} } \cap R_{-,k_{1} } =\varnothing \) and \(R_{k_{1} }\), \(R_{k_{2}}\) are two different minimal elements of \(R_{\mathrm{min}} \), we have \(R_{k_{1}} \subset R_{k_{2} } +R_{k_{2}}^{\prime } \), which contradicts condition \(^{(*)}\) in the choice of \(R_{k_{2}}^{\prime }\).
-
(b)
“\(\subseteq \)”: First, we consider the case \(R_{\mathrm{min}} \ne {\varnothing }\).
For \(R^{\prime }\in {\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } \ne \varnothing \), we have \(R^{\prime }\subseteq Z_{1}\) and \(R^{\prime }\ne \varnothing \). As \(R^{\prime }\subseteq Z_{1} \subseteq Y\) and \(Y\cap X = \varnothing \), we have \(R^{\prime }\cap X =\varnothing \), \(S^{\prime }\equiv X+R^{\prime }\), \(h(S^{\prime })=h(X+Y)\) and \(X\subseteq S^{\prime } \subseteq X+Y\).
Since \(S^{\prime }\ne {\varnothing }\), take \(S_{k} \in {\mathcal G}(S^{\prime })\subseteq {\mathcal G}(X+Y)\) (see [8]); as \(S_{k} \subseteq S^{\prime }\), we have \(R_{k} \equiv S_{k} \backslash X\subseteq S^{\prime }\backslash X=R^{\prime }\subseteq Z_{1}\).
Let \(B\equiv \{R_{i} \equiv S_{i} \backslash X:S_{i} \in {\mathcal G}(S^{\prime }),R_{i} \subseteq Z_{1} \}\) and \(C\equiv \{R_{i} \equiv S_{i} \backslash X:S_{i} \in {\mathcal G}(X+Y),R_{i} \subseteq Z_{1} \}\). Thus, \(R_{k} \in B\). Since B and C are finite and \(\varnothing \ne B \subseteq C\), the sets of minimal elements \(R_{\mathrm{min},S^{\prime }}\equiv Minimal(B)\) and \(R_{\mathrm{min}} \equiv Minimal(C)\) are nonempty. From now on, let k be the minimum index of the sets \(R_{i} \) in \(R_{\mathrm{min},S^{\prime }}\).
Suppose, to the contrary, that \(R_{k} \notin R_{\mathrm{min}} \). Then, \(\exists R_j \in R_{\mathrm{min}} :R_j \subset R_{k} \), with \(R_j \equiv S_j \backslash X\), \(S_j \in {\mathcal G}(X+Y)\) and \(h(S_j )=h(X+Y)\); moreover, \(S_j \subseteq X\cup S_j =X+R_j \subseteq X+R_{k} \subseteq X+R^{\prime }=S^{\prime }\subseteq X+Y\), so \(h(X+Y)=h(S_j )=h(S^{\prime })\). Hence, \(S_j \in {\mathcal G}(S^{\prime })\), \(R_j \subseteq S^{\prime }\backslash X=R^{\prime }\subseteq Z_{1}\) and \(R_j \in B\cap R_{\mathrm{min}} \). We then have \(R_j \in R_{\mathrm{min},S^{\prime }} \) and \(R_j \subset R_{k} \in R_{\mathrm{min},S^{\prime }} \), which is impossible because \(R_{k} \) is a minimal set of B. Therefore, \(R_{k} \in R_{\mathrm{min}} \ne \varnothing \).
This also implies that if \(R_{\mathrm{min}} =\varnothing \), then \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } = \varnothing = {\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \).
We also have \(S^{\prime }=S_{k} +S_{k}^{\prime \prime } \), where \(S_{k}^{\prime \prime } \equiv S^{\prime }\backslash S_{k} \). Since \(S^{\prime }\supseteq X\), it follows that \(S^{\prime }=X+R_{k} +R_{k}^{\prime } +R_{k}^\sim =X+R^{\prime }\), with \(R^{\prime }\equiv R_{k} +R_{k}^{\prime } +R_{k}^\sim \), \(R_{k}\equiv S_{k} \backslash X\in R_{\mathrm{min}}\), \(R_{k}^{\prime } \equiv (S_{k}^{\prime \prime } \backslash X)\cap R_U^k =[(S^{\prime }\backslash X)\backslash S_{k} ]\cap R_U^{k-1} \subseteq R_U^{k-1} \backslash S_{k} \subseteq R_U^{k-1} \backslash R_{k} \equiv R_{U,k} \) (as \(R_{k} \cap [(S^{\prime }\backslash X)\backslash S_{k} ]\subseteq R_{k} \backslash S_{k} =\varnothing \)) and \(R_{k}^\sim \equiv (S_{k}^{\prime \prime } \backslash X)\backslash R_U^k \subseteq (S^{\prime }\backslash X)\backslash R_U^k \subseteq Z_{1} \backslash R_U^k \equiv R_{-,k} \).
We now suppose that \(\exists R_j \equiv S_j \backslash X\in R_{\mathrm{min}}\) with \(1\le j<k\) and \(R_j \subset R_{k} +R_{k}^{\prime } \). Then, \(h(S_j )=h(X+Y)\), \(R_j \subseteq Z_{1}\) and \(S_j \subseteq X\cup S_j =X+R_j \subseteq X+R_{k} +R_{k}^{\prime } \subseteq X+R^{\prime }\equiv S^{\prime }\subseteq X+Y\), so \(h(X+Y)=h(S_j )=h(S^{\prime })\). Thus, \(S_j \in {\mathcal G}(S^{\prime })\) and \(R_j \in B\cap R_{\mathrm{min}} \); hence \(R_j \in R_{\mathrm{min},S^{\prime }} \) with \(j<k\), which contradicts the choice of k as the minimum index. We conclude that \(R^{\prime }\in {\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \).
“\(\supseteq \)”: For any \(R^{\prime }\in {\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \), \(R^{\prime }=R_{k} +R_{k}^{\prime } +R_{k}^\sim \), where \(R_{k} \equiv S_{k} \backslash X\in R_{\mathrm{min}} \), \(S_{k} \in {\mathcal G}\left( {X+Y} \right) \), \(h(S_{k} )=h(X+Y)\), \(R_{k} \subseteq Z_{1}\) and \(R^{\prime }\ne \varnothing \). Furthermore, \(R_{k}^{\prime } \subseteq R_{U,k} \subseteq Z_{1} \) and \(R_{k}^\sim \subseteq R_{-,k} \subseteq Z_{1} \). Then, \(R^{\prime }\subseteq Z_{1} \subseteq Y\) and \(R^{\prime }\cap X=\varnothing \). Moreover, because \(X+Y\supseteq X+R^{\prime }\supseteq X+R_{k} =X\cup S_{k} \supseteq S_{k} \), we have \(h(S_{k} )=h(X+R^{\prime })=h(X+Y)\). Hence, \(R^{\prime }\in {\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } \).
For (c), first note that \(S_{k} \backslash X\subseteq Z_{1}\) if and only if \(S_{k} \subseteq X+Z_{1}\): if \(S_{k} \backslash X\subseteq Z_{1} \), then \(S_{k} \subseteq X\cup S_{k} =X+R_{k} \subseteq X+Z_{1} \) (as \(X\cap Z_{1} =\varnothing )\); conversely, if \(S_{k} \subseteq X+Z_{1} \), then \(S_{k} \backslash X\subseteq Z_{1} \).
+ If \({\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \ne \varnothing \), then \(R_{\mathrm{min}} \ne \varnothing \), which immediately gives (c.2).
+ Suppose that (c.2) is true. Thus, \(R_{\mathrm{min}} \ne \varnothing \). For \(R^{{*}}~\equiv Z_{1} \), we have \(\varnothing \ne R^{{*}}~\subseteq ~Y\). Take an arbitrary \(R_{k} ~\equiv S_{k} \backslash X~\in ~R_{\mathrm{min}} :~S_{k} ~\in ~{\mathcal G}(X+Y),~S_{k} ~\subseteq ~X+Y,~R_{k} ~\subseteq Z_{1} =R^{{*}}\). Then, \(S_{k} ~\subseteq ~S_{k} \cup X=X+R_{k} ~\subseteq X+R^{{*}}~\subseteq ~X+Y.\) This implies that \(h(X+Y)~=~h(S_{k} )~=~h(X+R^{{*}})\). Hence, \(~\exists R^{{*}}~\in {\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } ={\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } ~\ne \varnothing \). \(\square \)
This concludes the proof of Proposition 3, which implies the following remark.
Remark 2
-
(a)
If \(R_{\mathrm{min}} \ne \varnothing \), then \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } = {\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \ne \varnothing \).
-
(b)
\({\mathcal G}_{X+Z_{1} } (X+Y)~\ne \varnothing ~\Leftrightarrow \exists S_{k} ~\in ~{\mathcal G}(X+Y):~S_{k} ~\subseteq X+Z_{1} \Leftrightarrow R_{\mathrm{min}} \ne \varnothing \).
-
(c)
Consider any \(R^{\prime }\equiv R_{k} +R_{k}^{\prime } +R_{k}^\sim \) formed as in the definition of \({\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \). If \(R^{\prime }=\varnothing \), then \(R_{k} \equiv S_{k} \backslash X=\varnothing \) for some \(S_{k} \in {\mathcal G}(X+Y)\). Therefore, \(S_{k} \subseteq X\subseteq X+Y\) and \(h(X+Y)=h(X)=h(S_{k} )\). Furthermore, \(R_{\mathrm{min}} \equiv \{R_{1} \equiv \varnothing \}\), \(R_{U,1} =R_U^1 =\varnothing \), \(R_{-,1} \equiv Z_{1} \ne \varnothing \) and \(R^{\prime }=R_{1}^\sim \subseteq R_{-,1} \). Hence, an empty \(R^{\prime }\) can arise only when \(h(X+Y)=h(X)\) and \(R_{\mathrm{min}} =\{\varnothing \}\); in that case, \(R^{\prime }\in {\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \Leftrightarrow \varnothing \subset R^{\prime }\subseteq Z_{1} \ne \varnothing \).
For practical purposes, when computing \(R_{\mathrm{min}} \), we distinguish the following two cases:
- If \(R_{\mathrm{min}} =\{\varnothing \}\), then \(R_{U,1} =R_U^1 =\varnothing \), \(R_{-,1} \equiv Z_{1} \ne \varnothing \) and
$$\begin{aligned}&{\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \equiv \{R^{\prime }\mid \varnothing \ne R^{\prime }\equiv R_{1}^\sim ,~R_{1}^\sim \subseteq Z_{1} \}\\&\quad =\{R^{\prime }\mid \varnothing \subset R^{\prime }\subseteq Z_{1} \}. \end{aligned}$$
- If \(R_{\mathrm{min}} \ne \{\varnothing \}\):
$$\begin{aligned}&{\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \equiv \{R^{\prime }\equiv R_{k} +R_{k}^{\prime } +R_{k}^\sim \mid R_{k} \in R_{\mathrm{min}} ,\\&\quad R_{k}^{\prime } \subseteq R_{U,k} ,~R_{k}^\sim \subseteq R_{-,k} ,~(R_j \not \subset R_{k} +R_{k}^{\prime },\\&\quad \forall R_j \in R_{\mathrm{min}} :~1\le j<k)^{(*)}\}. \end{aligned}$$
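It is worth noting that, in this case, we do not need to check whether \(R^{\prime }\ne \varnothing \) when generating \(R^{\prime }\). Because \(R_{\mathrm{min}} \) consists of minimal sets, it contains \(\varnothing \) only when \(R_{\mathrm{min}} =\{\varnothing \}\); hence every \(R_{k} \in R_{\mathrm{min}} \) is nonempty here, and \(R^{\prime }\supseteq R_{k} \ne \varnothing \).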
(d) (The advantage of \(^{(*)}\) for exponentially reducing redundancy.) Assume that we are forming the sets \(R^{\prime }\). Starting with \(R_{k} \), we grow subsets \(R_{k}^{\prime } \subseteq R_{U,k} \) and then add subsets \(R_{k}^\sim \subseteq R_{-,k} \) to complete \(R^{\prime }\). If \(^{(*)}\) fails for \(R_{k}^{\prime }\), that is, \(R_j \subseteq R_{k} +R_{k}^{\prime }\) for some \(j<k\), then it also fails for every superset \(R^{\prime \prime }\) of \(R_{k}^{\prime }\) in \(R_{U,k} \), because \(R_j \subseteq R_{k} +R^{\prime \prime }\). Hence, we need not consider the approximately \((2^{|R_{U,k} \backslash R_{k}^{\prime } |}-1)\) supersets \(R^{\prime \prime }\) of \(R_{k}^{\prime }\) (\(R_{k}^{\prime } \subset R^{\prime \prime }\subseteq R_{U,k} \)), nor add the \(2^{|R_{-,k} |}\) subsets \(R_{k}^\sim \) of \(R_{-,k} \) to \(R^{\prime }\). In effect, we eliminate approximately \((2^{|R_{U,k} \backslash R_{k}^{\prime } |}-1)\cdot 2^{|R_{-,k} |}\) redundant candidates for \(R^{\prime }\). Next, we consider the remaining sets \(R_{k}^{\prime \prime } \subseteq R_{U,k} \) (such that \(R_{k}^{\prime }\not \subset R_{k}^{\prime \prime }\)) or the subsequent sets \(R_{k} \) in \(R_{\mathrm{min}} \). Using the necessary and sufficient condition \(^{(*)}\), we can completely eliminate duplicates when generating the rules \(r:L^{\prime }\rightarrow R^{\prime }\) in each class \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} } (L,~S)\), based solely on minimal sets or generators. Because generators are few and small, the algorithms applied during this generation are fast and efficient.
(e) (Modifying the computation of the upper bound sets \(R_{U,k} \) and \(R_{-,k} \) of \(R_{k}^{\prime } \) and \(R_{k}^\sim \), respectively.) For each \(k>1\), the operations \(R_U^{k-1} =R_U^{k-2} \cup R_{k-1} \), \(R_{U,k} \equiv R_U^{k-1} \backslash R_{k} \) and \(R_{-,k} \equiv Z_{1} \backslash R_U^k \) must be executed on sets that are potentially non-disjoint. To save computation time, observe that
$$\begin{aligned} R_{U,k}&= [(R_U^{k-2} \backslash R_{k-1} )+R_{k-1} ]\backslash R_{k} =(R_{U,k-1} +R_{k-1} )\backslash R_{k} ,\\ R_{-,k}&= R_{-,k-1} \backslash R_{k} ,\quad \forall k\ge 2,\\ R_{U,1}&\equiv \varnothing ,\quad R_{-,1} \equiv Z_{1} \backslash R_{1}. \end{aligned}$$
In other words,
$$R_{U,k} =\left\{ \begin{array}{ll} (R_{U,k-1} +R_{k-1} )\backslash R_{k} ,&{}\mathrm{if}~k\ge 2, \\ \varnothing , &{}\mathrm{if}~k=1, \end{array}\right. \qquad R_{-,k} =\left\{ \begin{array}{ll} R_{-,k-1} \backslash R_{k}, &{}\mathrm{if}~k\ge 2, \\ Z_{1} \backslash R_{1} ,&{}\mathrm{if}~k=1. \end{array}\right. $$
Thus, for each \(k\ge 2\), we form the disjoint union \(R_{U,k-1} +R_{k-1} \) (disjoint because \(R_{U,k-1} =R_U^{k-2} \backslash R_{k-1} \)) and remove \(R_{k} \) from it, and we compute the difference \(R_{-,k} =R_{-,k-1} \backslash R_{k} \), where \(R_{-,k-1} \subseteq Z_{1} \). Because the union involves disjoint operands and the difference is taken from the shrinking set \(R_{-,k-1} \) rather than from \(Z_{1} \), this calculation is cheaper than the original one.
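Remark 2(c)–(e) can be combined into a small enumerator. The Python sketch below is our own illustration, not the \(MFS\_RestrictMaxSC\) procedure of Fig. 3: itemsets are modelled as frozensets of single-character items, \(R_{\mathrm{min}} \) is assumed to be precomputed as an ordered list of minimal sets contained in \(Z_{1} \), and the names `restrict_max_sc` and `subsets` are hypothetical. It treats the case \(R_{\mathrm{min}} =\{\varnothing \}\) separately, updates \(R_{U,k} \) and \(R_{-,k} \) with the recurrences of (e), and grows \(R_{k}^{\prime }\) depth first, cutting the subtree below any \(R_{k}^{\prime }\) that violates \(^{(*)}\); this cut is safe because, as argued in (d), a violation persists in every superset.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterator, List

Itemset = FrozenSet[str]


def subsets(items: Itemset) -> Iterator[Itemset]:
    """Yield every subset of `items`, including the empty set."""
    ordered = sorted(items)
    for combo in chain.from_iterable(
            combinations(ordered, r) for r in range(len(ordered) + 1)):
        yield frozenset(combo)


def restrict_max_sc(r_min: List[Itemset], z1: Itemset) -> Iterator[Itemset]:
    """Yield each R' = R_k + R'_k + R~_k of FS*(Y)_{X, subset of Z_1} exactly once.

    r_min: the minimal sets R_1, ..., R_m in a fixed order, each a subset of z1.
    z1:    the upper bound Z_1.
    """
    if r_min == [frozenset()]:
        # Case R_min = {emptyset}: the result is every nonempty subset of Z_1.
        yield from (r for r in subsets(z1) if r)
        return
    r_u_k: Itemset = frozenset()
    r_minus_k: Itemset = frozenset()
    for k, r_k in enumerate(r_min):
        # Remark 2(e): update R_{U,k} and R_{-,k} from R_{U,k-1} and R_{-,k-1}.
        if k == 0:
            r_u_k, r_minus_k = frozenset(), z1 - r_k
        else:
            r_u_k = (r_u_k | r_min[k - 1]) - r_k
            r_minus_k = r_minus_k - r_k
        earlier = r_min[:k]
        items = sorted(r_u_k)

        def grow(r_prime: Itemset, start: int) -> Iterator[Itemset]:
            core = r_k | r_prime
            # Condition (*): no earlier R_j may lie inside R_k + R'_k. A violation
            # persists in every superset of R'_k (Remark 2(d)), so the whole
            # depth-first subtree below r_prime is skipped.
            if any(r_j <= core for r_j in earlier):
                return
            for r_tilde in subsets(r_minus_k):      # complete with R~_k
                yield core | r_tilde
            for i in range(start, len(items)):      # extend R'_k by later items
                yield from grow(r_prime | {items[i]}, i + 1)

        # grow() is fully consumed here, before the loop variables change.
        yield from grow(frozenset(), 0)


rhs = restrict_max_sc([frozenset("cf"), frozenset("ch")], frozenset("cfhi"))
print(sorted("".join(sorted(r)) for r in rhs))
# ['cf', 'cfh', 'cfhi', 'cfi', 'ch', 'chi']
```

With the data of Example 3(b) (\(R_{\mathrm{min}} =\{cf,~ch\}\), \(Z_{1} =cfhi\)), the sketch returns the six right-hand sides listed there, and the candidate \(ch+f\) is pruned because \(cf\subset ch+f\).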
For special values of \(Y,~X~\mathrm{and}~Z_{1} \) in \({\mathcal F}{\mathcal S}(Y)_{X,\subseteq Z_{1} } \), we have two structures, \({\mathcal F}{\mathcal S}_{\subseteq L_{C1} } \) and \({\mathcal F}{\mathcal S}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } \).
3.2.2 Structure and unique representation of the \({\mathcal F}{\mathcal S}_{\subseteq L_{C1} } \) and \({\mathcal F}{\mathcal S}(S_{S_{1}^*} \backslash L^{\prime })_{L^{^{\prime }},\subseteq R_{1}^*} \) itemsets
Suppose that \((L,~S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} ):S_{S_{1}^{*} }\in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_0^{*} ,s_{1}^{*} )\), \(\varnothing \ne L\subseteq S\), \(L_{C_{1} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_0^{\prime } ,s_{1}^{\prime } )\) and \(L^{\prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C1} } \). As \(L_{C_{1} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_0^{\prime } ,s_{1}^{\prime } )\), we have \({\mathcal G}_{C_{1} } (L)\ne \varnothing \), that is, \(\exists L_{i} \in {\mathcal G}(L):\varnothing \subseteq L_{i} \subseteq L_{C_{1} } \), and \(L_{C_{1} } \ne \varnothing \).
Corollary 1
\(\forall ~L,~C_{1} ~\subseteq {\mathcal A}\), if \(L_{C_{1} } \equiv ~L\cap C_{1} ~\ne \varnothing \) and \({\mathcal G}_{C_{1} } (L)\ne \varnothing \), then \({\mathcal G}(L_{C_{1} } )={\mathcal G}_{C_{1} } (L)\).
Structure and unique representation of the itemsets of \({\mathcal F}{\mathcal S}_{\subseteq L_{C1} }\). Let \(Y\equiv L_{C_{1} } \), \(X\equiv \varnothing \) and \(Z_{1} =L_{C_{1} } \). As \(L_{C_{1} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_0^{\prime } ,s_{1}^{\prime } )\), we know from Corollary 1 that \({\mathcal G}_{C_{1} } (L)={\mathcal G}(L_{C_{1} } )\ne \varnothing \) and \(\forall L_{i} \in {\mathcal G}_{C_{1} } (L):\varnothing \subset L_{i} \subseteq L_{C_{1} } \). Thus,
Based on the representation of \(R_{\mathrm{min}} \) in \({\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} }\), let \(K_{\mathrm{min}} \equiv Minimal\{L_{i} :L_{i} \in {\mathcal G}_{C_{1} } (L)\}={\mathcal G}_{C_{1} } (L)\), \(L_U^i \equiv \cup _{L_{k} \in {\mathcal G}_{C_{1} } (L),k\le i} L_{k} \), \(L_{U,i} \equiv \left\{ \begin{array}{ll} L_U^{i-1} \backslash L_{i},&{} \mathrm{if}~i\ge 2, \\ \varnothing ,&{} \mathrm{if}~i=1, \end{array}\right. \) \(L_{-,i} \equiv L_{C_{1}} \backslash L_U^i \) and
Because \({\mathcal G}_{C_{1} } (L)~\ne \varnothing \) and \(L_{C_{1} } \ne {\varnothing }\), it follows from Proposition 3(c) that \({\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^{*} \ne {\varnothing }\).
Structure and unique representation of the itemsets of \({\mathcal F}{\mathcal S}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*} \). Let \(Y\equiv S_{S_{1}^{*} } \backslash L^{\prime }\), \(X=L^{\prime }\) and \(Z_{1} =R_{1}^{*} =(S\cap R_{1} )\backslash L^{\prime }\), so that \(Z_{1} \subseteq Y\). Because \(S_{S_{1}^{*} } \in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_0^{*} ,s_{1}^{*} )\), Corollary 1 gives \({\mathcal G}( S_{S_{1}^{*} } ) ={\mathcal G}_{S_{1}^{*} } (S)\ne \varnothing \). Then,
We denote \(R_{\mathrm{min}} \equiv Minimal\{R_{k} \equiv S_{k} \backslash L^{\prime }:S_{k} \in {\mathcal G}_{S_{1}^{*} } (S),~R_{k} \subseteq R_{1}^{*} \}\), \(R_U^k \equiv \cup _{R_j\in R_{\mathrm{min}} ,j\le k} R_j \), \(R_{U,k} \equiv \left\{ \begin{array}{ll} R_U^{k-1} \backslash R_{k} , &{} \mathrm{if}~k\ge 2, \\ \varnothing ,&{} \mathrm{if}~k=1, \end{array} \right. \) \(R_{-,k} \equiv R_{1}^{*} \backslash (L^{\prime }+R_U^k )=(S\cap R_{1} )\backslash (L^{\prime }+R_U^k )\) and
By Proposition 3(c), \({\mathcal F}{\mathcal S}^{{*}}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } \ne \varnothing \Leftrightarrow [{\mathcal G}_{L^{\prime }\cup R_{1} } (S)\ne \varnothing ~\mathrm{and}~(S\cap R_{1} )\backslash L^{\prime }\ne \varnothing ]\). Indeed, for every \(S_{k} \in {\mathcal G}(S)\), the condition \(S_{k} \subseteq L^{\prime }+(S\cap R_{1} )\backslash L^{\prime }=L^{\prime }\cup (S\cap R_{1} )=S\cap (L^{\prime }\cup R_{1} )\) is equivalent to \(S_{k} \subseteq L^{\prime }\cup R_{1} \), since \(S_{k} \subseteq S\).
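The parameter choices of the two special structures can be sketched in Python as follows. The helper names (`minimal`, `left_params`, `right_params`, `rhs_nonempty`) are hypothetical, itemsets are frozensets of single-character items, and the generator lists \({\mathcal G}(L)\) and \({\mathcal G}(S)\) are assumed to be read from the lattice \({\mathcal L}{\mathcal C}{\mathcal G}\). Generators of S are selected only through the condition \(R_{k} \subseteq R_{1}^{*}\), which, as shown above, amounts to \(S_{k} \subseteq L^{\prime }\cup R_{1} \); we assume this suffices to place \(S_{k} \) in \({\mathcal G}_{S_{1}^{*} } (S)\). The order of the minimal sets (by size, then lexicographically) is an arbitrary fixed choice.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def minimal(sets: Iterable[Itemset]) -> List[Itemset]:
    """Inclusion-minimal members of a family, deduplicated, ordered by (size, items)."""
    family = sorted(set(sets), key=lambda s: (len(s), sorted(s)))
    return [s for s in family if not any(t < s for t in family)]


def left_params(L: Itemset, gens_L: List[Itemset], C1: Itemset
                ) -> Tuple[List[Itemset], Itemset]:
    """Case Y = Z_1 = L_C1 (= L & C_1), X = empty: return (K_min, L_C1).

    K_min = G_C1(L), the generators of L contained in C_1.
    """
    return minimal(g for g in gens_L if g <= C1), L & C1


def right_params(S: Itemset, gens_S: List[Itemset], L_prime: Itemset,
                 R1: Itemset) -> Tuple[List[Itemset], Itemset]:
    """Case X = L', Z_1 = R_1* = (S & R_1) - L': return (R_min, Z_1)."""
    z1 = (S & R1) - L_prime
    return minimal(g - L_prime for g in gens_S if (g - L_prime) <= z1), z1


def rhs_nonempty(S: Itemset, gens_S: List[Itemset], L_prime: Itemset,
                 R1: Itemset) -> bool:
    """Nonemptiness test: some generator of S lies in L' | R_1, and (S & R_1) - L' is nonempty."""
    return (any(g <= (L_prime | R1) for g in gens_S)
            and bool((S & R1) - L_prime))


# Example 3(a): class (cegi, acegi) with C_1 = ceg, R_1 = ai, left-hand side e.
print(left_params(frozenset("cegi"), [frozenset("e"), frozenset("g")], frozenset("ceg")))
print(right_params(frozenset("acegi"), [frozenset("ae"), frozenset("ag")],
                   frozenset("e"), frozenset("ai")))
```

For Example 3(a) this yields \(K_{\mathrm{min}} =\{e,~g\}\), \(L_{C_{1} } =ceg\), \(R_{\mathrm{min}} =\{a\}\) and \(Z_{1} =ai\); the generator ag is discarded because \(ag\backslash e=ag\not \subseteq ai\).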
The following is a consequence of Proposition 3.
Consequence 2 (Unique representation and distinct generation of the two sides of the rules in \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^+ (L,~S))\). \(\forall (L,~S)~\in ~{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} ):\)
(a) The itemsets in \({\mathcal F}{\mathcal S}^{*}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*}\) and \({\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^*\) are generated without duplicates.
(b) \({\mathcal F}{\mathcal S}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*} ={\mathcal F}{\mathcal S}^{*}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*}\) and \({\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } } ={\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^*\).
(c) \({\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^*\ne \varnothing \).
(d) For all \(L^{\prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^*\): \({\mathcal F}{\mathcal S}^{*}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*} \ne \varnothing \Leftrightarrow [{\mathcal G}_{L^{\prime }\cup R_{1} } (S)\ne \varnothing \,\,\mathrm{and}\,\,(S\cap R_{1} )\backslash L^{\prime }\ne \varnothing ]\).
The general procedure \(MFS\_RestrictMaxSC(Y,~X,~Z_{1}, ~{\mathcal G}(X+Y))\), shown in Fig. 3, generates the itemsets of \({\mathcal F}{\mathcal S}^{{*}}(Y)_{X,\subseteq Z_{1} } \) completely and without duplicates.
Based on Remark 2, Lines 4–7 are added to the procedure. Furthermore, at Line 21, we do not check whether \(R_{k} +R_{k}^{\prime } +R_{k}^\sim \ne \varnothing \), because this always holds (Remark 2(c)). The special cases of this procedure produce the results shown in Table 1.
In each rule class, cases 1 and 2 are used to enumerate, without duplicates, the left-hand sides \(L^{\prime }\) of \({\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^*\) and the right-hand sides \(R^{\prime }\) of \({\mathcal F}{\mathcal S}^{*}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*}\), respectively. Moreover, cases 3 and 4 give two efficient procedures for generating \(L^{\prime }\in {\mathcal F}{\mathcal S}(L)\) and \(R^{\prime }\in {\mathcal F}{\mathcal S}(S\backslash L^{\prime })_{L^{\prime }} \) in each class \({\mathcal A}{\mathcal R}(L,S)\) (as used in [9]). They are also used to generate the rules in the \(PP\_MAR\_MaxSC\_2\) post-processing approach discussed in Sect. 3.1.1.
3.2.3 Structure and unique representation of rule class \({\mathcal A}{\mathcal R}_{\subseteq L_{1},\subseteq R_{1} }^+ (L,S)\)
For \(\forall (L,~S)~\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\), let us denote
The following consequence is deduced from Consequence 2.
Consequence 3 (Necessary and sufficient conditions for the nonemptiness of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^+ (L,~S)\) and its representation) \(\forall (L,~S)\in ~{\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\):
(a) The rules in \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\subseteq R_{1} }^*(L,S)\) are enumerated without repetition.
(b) \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\subseteq R_{1} }^+ (L,S)={\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\subseteq R_{1} }^*(L,S)\).
(c) \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\subseteq R_{1} }^*(L,S)\ne \varnothing \Leftrightarrow Suff\_FS_{\subseteq L_{C_{1} } }^*(S,R_{1} )\ne \varnothing \).
(d) \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,\subseteq R_{1}}^*(L,S)=\sum _{L^{\prime }\in Suff\_FS_{\subseteq L_{C_{1} } }^*(S,R_{1} )} \{r:L^{\prime }\rightarrow R^{\prime }:R^{\prime }\in {\mathcal F}{\mathcal S}^{*}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*} \}\).
Remark 3
For \({\mathcal F}{\mathcal S}^{{*}}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } \), if \(S\equiv L\) and \(L\in {\mathcal G}(L)\), then \(L^{\prime }\equiv L\) is the unique element of \([L]\). Then \(Z_{1} =(L\cap R_{1} )\backslash L^{\prime }=\varnothing \), and hence \({\mathcal F}{\mathcal S}^{{*}}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } =\varnothing \). Therefore, when \(L\equiv S\), we always assume that \(L\notin {\mathcal G}(L)\).
The \(MAR\_MaxSC\_OneClass\) algorithm given in Fig. 4 generates, without duplicates, all the association rules of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^{*} (L,~S)\) for each pair \((L,~S)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\).
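For testing an implementation of \(MAR\_MaxSC\_OneClass\), a brute-force oracle is convenient. The Python sketch below is not the algorithm of Fig. 4; it is a filter in the spirit of the post-processing baselines. It tries every pair \((L^{\prime },R^{\prime })\) with \(L^{\prime }\subseteq L\cap C_{1} \) and \(R^{\prime }\subseteq (S\cap R_{1} )\backslash L^{\prime }\), and keeps a pair when \(L^{\prime }\) contains a generator of L and \(L^{\prime }+R^{\prime }\) contains a generator of S, that is, \(h(L^{\prime })=L\) and \(h(L^{\prime }+R^{\prime })=S\). The function name and its inputs (the generator lists \({\mathcal G}(L)\) and \({\mathcal G}(S)\)) are our own assumptions.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]


def subsets(items: Itemset) -> Iterator[Itemset]:
    """Yield every subset of `items`, including the empty set."""
    ordered = sorted(items)
    for combo in chain.from_iterable(
            combinations(ordered, r) for r in range(len(ordered) + 1)):
        yield frozenset(combo)


def rule_class_bruteforce(L: Itemset, S: Itemset, gens_L: List[Itemset],
                          gens_S: List[Itemset], C1: Itemset, R1: Itemset
                          ) -> Tuple[Set[Rule], int]:
    """Return (rules L' -> R' of the class (L, S), number of R' candidates tested).

    L' ranges over nonempty subsets of L & C_1 that contain a generator of L
    (so h(L') = L); R' ranges over nonempty subsets of (S & R_1) - L' such that
    L' + R' contains a generator of S (so h(L' + R') = S).
    """
    rules: Set[Rule] = set()
    tested = 0
    for lhs in subsets(L & C1):
        if not lhs or not any(g <= lhs for g in gens_L):
            continue
        for rhs in subsets((S & R1) - lhs):
            tested += 1
            if rhs and any(g <= (lhs | rhs) for g in gens_S):
                rules.add((lhs, rhs))
    return rules, tested


# Example 3(b): (L, S) = (ai, acfhi), C_1 = a, R_1 = cfhi.
rules, tested = rule_class_bruteforce(
    frozenset("ai"), frozenset("acfhi"), [frozenset("a")],
    [frozenset("cf"), frozenset("ch")], frozenset("a"), frozenset("cfhi"))
print(len(rules), tested)  # 6 16
```

The oracle reproduces the six rules of Example 3(b) but inspects all 16 subsets of cfhi, whereas the representation based on \(R_{\mathrm{min}} \) builds only the accepted right-hand sides.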
Example 3
(Illustrating the advantage of generating all the rules in \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^{*} (L,~S)\) without duplicates by \(MAR\_MaxSC\_OneClass\).) Let us consider the mining of association rules with constraints on database \({\mathcal T}\) with \(s_0 =1/7,s_{1} =5/7,c_0 =1/3,c_{1} =0.9\).
(a) For the constraints \(L_{1} =ceg,~R_{1} =ai\), we have \(S_{1}^{*} =ceagi\), \(C_{1} =ceg\), \(s_0^{*} =1.33/7\), \(s_{1}^{*} =5/7\) and \(supp(S_{1}^{*} )=2/7\). Then, \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\ne \varnothing \). Now, consider the rule class with respect to \((L,~S)=(cegi,~acegi)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_{1} ,c_0 ,c_{1} )\) in Example 1(b): \({\mathcal G}(L)=\{e,~g\}\), \({\mathcal G}(S)=\{ae,~ag\}\), \(supp(L)=4/7\) and \(supp(S)=2/7\). We have \(S_{S_{1}^{*} } =ceagi\in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_0^{*} ,s_{1}^{*} )\) as \(S\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0^{*} ,s_{1}^{*} )\) and \({\mathcal G}_{S_{1}^{*} } (S)\equiv \{ae,~ag\}\ne \varnothing \). Furthermore, because \(s_0^{\prime } =supp(S)/c_{1} =2.22/7\) and \(s_{1}^{\prime } =\min (1;~supp(S)/c_0 )=6/7\), we have \(L\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0^{\prime } ,s_{1}^{\prime } )\). In addition, \({\mathcal G}_{C_{1} } (L)=\{e,g\}\ne \varnothing \). Then, \(L_{C_{1} } =ceg\in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_0^{\prime } ,s_{1}^{\prime } )\). First, we consider the formation of \({\mathcal F}{\mathcal S}_{\subseteq L_{C_{1} } }^{*} \) (Line 4). For the first generator \(L_{1} =e\) (here \(L_{i}\) denotes the ith generator in \(K_{\mathrm{min}}\), not the constraint \(L_{1}\)), because \(L_U^1 =e\), \(L_{U,1} =\varnothing \) and \(L_{-,1} =cg\), we obtain the left-hand sides \(e+\varnothing +\varnothing ,~e+\varnothing +c,~e+\varnothing +g,~e+\varnothing +cg\). For the second generator \(L_{2} =g\), we have \(L_U^2 =eg\), \(L_{U,2} =e\) and \(L_{-,2} =c\). Hence, the \(MFS\_RestrictMaxSC\) procedure generates the new left-hand sides \(g+\varnothing +\varnothing ,~g+\varnothing +c\). Note that \(g+e+\varnothing \) and \(g+e+c\) are not generated again, as \(L_{1} \subset g+e\). Next, we generate the right-hand sides corresponding to the left-hand side e (Line 5).
We have \(R_{1}^{*} =ai\) and \({\mathcal G}_{S_{1}^{*} } (S)=\{ae,~ag\}\). Since \(ae\backslash e=a\subseteq R_{1}^{*}\), whereas \(ag\backslash e=ag\not \subseteq R_{1}^{*}\), we get \(R_{\mathrm{min}} =\{a\}\). For a, we have \(R_U^1 =a\), \(R_{U,1} =\varnothing \) and \(R_{-,1} =ai\backslash (e+a)=i\). \(MFS\_RestrictMaxSC\) generates, without duplicates, all the right-hand sides in \({\mathcal F}{\mathcal S}^{{*}}(S_{S_{1}^{*} } \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^{*} } \): a and ai. Thus, we obtain two rules: \(e\rightarrow a,~e\rightarrow ai\). Continuing with the left-hand sides ce, eg, ceg, g and gc, we obtain ten additional rules of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^{*} (L,~S)\) without any duplicates (see Example 1(b) for the full results).
(b) For the constraints \(L_{1} =a,~R_{1} =cfhi\), we have \(S_{1}^{*} =acfhi\), \(C_{1} =a\), \(s_0^{*} =1.66/7\), \(s_{1}^{*} =5/7\) and \(supp(S_{1}^{*} )=2/7\). Thus, \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\ne \varnothing \). Let \((L,~S)=(ai,~acfhi)\in {\mathcal N}{\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_{1} ,c_0 ,c_{1} )\) with \({\mathcal G}(L)=\{a\}\), \({\mathcal G}(S)=\{cf,~ch\}\), \(supp(L)=5/7\) and \(supp(S)=2/7\). We have \(S_{S_{1}^{*} } =acfhi\in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq S_{1}^{*} } (s_0^{*} ,s_{1}^{*} )\) because \(S\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0^{*} ,s_{1}^{*} )\) and \({\mathcal G}_{S_{1}^{*} } (S)=\{cf,ch\}\ne \varnothing \). As \(s_0^{\prime } =supp(S)/c_{1} =2.22/7\) and \(s_{1}^{\prime }\equiv \min (1;supp(S)/{c_{0}}) = 6/7\), we have \(L\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0^{\prime } ,s_{1}^{\prime } )\). Furthermore, \({\mathcal G}_{C_{1} } (L)=\{a\}\ne \varnothing \). Then, \(L_{C_{1} } =a\in {\mathcal F}{\mathcal C}{\mathcal S}_{\subseteq C_{1} } (s_0^{\prime } ,s_{1}^{\prime } )\). For \(G_{1} =a\), because \(L_U^1 =a\), \(L_{U,1} =\varnothing \) and \(L_{-,1} =\varnothing \), we have \({\mathcal F}{\mathcal S}_{\subseteq L_{C1} }^{*} =\{a\}\). Next, we generate all the rules with the left-hand side a (Line 5). We have \(R_{1}^{*} =cfhi\) and \({\mathcal G}_{S_{1}^{*} } (S)=\{cf,~ch\}\). Thus, \(R_{\mathrm{min}} =\{cf,~ch\}\). For cf, we have \(R_U^1 =cf\), \(R_{U,1} =\varnothing \) and \(R_{-,1} =cfhi\backslash (a+cf)=hi\). Next, \(MFS\_RestrictMaxSC\) generates the right-hand sides \(cf+\varnothing +\varnothing ,~cf+\varnothing +i,~cf+\varnothing +h\) and \(cf+\varnothing +hi\) (in \({\mathcal F}{\mathcal S}^{*}(S_{S_{1}^*} \backslash L^{\prime })_{L^{\prime },\subseteq R_{1}^*} \)) without any duplicates. Therefore, we have four rules: \(a\rightarrow cf,~a\rightarrow cfi,~a\rightarrow cfh,~a\rightarrow cfhi\).
For ch, we have \(R_U^2 =cfh\), \(R_{U,2} =f\) and \(R_{-,2} =cfhi\backslash (a+cfh)=i\). Then, \(MFS\_RestrictMaxSC\) derives \(ch+\varnothing +\varnothing ,~ch+\varnothing +i\). The right-hand sides \(ch+f+\varnothing \) and \(ch+f+i\) are not generated again, as \(cf\subset ch+f\). Finally, we have \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^{*} (L,~S) =\{a\rightarrow cf,~a\rightarrow cfi,~a\rightarrow cfh,~a\rightarrow cfhi,~a\rightarrow ch,~a\rightarrow chi\}\).
Example 3 shows that our algorithm derives only the association rules satisfying the constraints of \({\mathcal A}{\mathcal R}_{\subseteq L_{1} ,~\subseteq R_{1} }^{*} (L,~S)\), without generating duplicates or redundant candidates in \({\mathcal A}{\mathcal R}(L,~S)\).
3.3 Completely and distinctly deriving all the association rules with the constraints of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\)
Theorem 2 below follows from Theorem 1 and Proposition 3. Based on it, we propose the \(MAR\_MaxSC\) algorithm (see Fig. 5) to efficiently mine the set of all association rules with maximum single constraints \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} )\) from the lattice \({\mathcal L}{\mathcal C}{\mathcal G}\).
Theorem 2
(Completely and distinctly deriving all the constrained association rules of \({\mathcal A}{\mathcal R}{\mathcal S}_{\subseteq L_{1} ,~\subseteq R_{1} } (s_0 ,s_{1} ,c_0 ,c_{1} ))\). Suppose that (H\(_{1})\) is satisfied. We have
4 Experimental results
We compare the performance of three methods for mining association rules with maximum single constraints. The first method, \(PP\_MAR\_MaxSC\_1\), consists of three phases: (a) using dEclat to mine frequent itemsets, (b) integrating the constraints into the \(Gen\_Rules\) [31] algorithm to generate rule candidates, and (c) post-processing to select the rules satisfying the constraints. The other two methods first mine the lattice \({\mathcal L}{\mathcal C}{\mathcal G}\) of closed itemsets together with their generators using \(Charm\_L\) and MinimalGenerators, and then execute the \(PP\_MAR\_MaxSC\_2\) and \(MAR\_MaxSC\) algorithms, respectively. The C++ source code of dEclat, \(Charm\_L\) and MinimalGenerators, available from http://www.cs.rpi.edu/~zaki/wwwnew/pmwiki.php/Software/Software#patutils, was converted to C#. The \(PP\_MAR\_MaxSC\_2\) and \(MAR\_MaxSC\) algorithms are also coded in C#. The experiments were carried out on a PC with an i5-2400 CPU (3.10 GHz @ 3.09 GHz) and 3.16 GB of main memory. Four benchmark databases from the FIMDR (Frequent Itemset Mining Dataset Repository, http://fimi.cs.helsinki.fi/data/) were used in the experiments (Table 2).
We fixed the maximum support and confidence thresholds at 1, as is customary. For each database and given minimum support, we chose the set \({\mathcal A}^{F}\) of all frequent items. Ten pairs of maximum constraints \((L_1 , R_1 )\) were randomly drawn from \({\mathcal A}^{F}\), with sizes \(\left| {L_1 } \right| =p_1 \% *\left| {{\mathcal A}^{F}} \right| \) and \(\left| {R_1 } \right| =p_2 \% *\left| {{\mathcal A}^{F}} \right| \). We set \(p_1 =30\% \) and \(p_2 =70\% \) for Connect, Pumsb and Chess, and \(p_1 =8\% \), \(p_2 =58\% \) for Mushroom (similar results were obtained for other values of \(p_1 ,p_2 \)). We ran the three methods on each database (DB) with two minimum supports MS (%) and several minimum confidences MC (%) and recorded the average running times over the ten constraint pairs, \(T\_MaxSC\_1(DB,MS,MC)\), \(T\_MaxSC\_2(DB,MS,MC)\) and \(T\_MaxSC(DB,MS,MC)\), abbreviated as \(T\_MaxSC\_1\), \(T\_MaxSC\_2\) and \(T\_MaxSC\). All three methods finished on Mushroom, Chess and Pumsb; however, \(PP\_MAR\_MaxSC\_1\) had not halted after 12 h of running on Connect.
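The constraint sampling described above can be reproduced along the following lines. This Python sketch is our own reading of the setup: the function name, rounding \(p\cdot |{\mathcal A}^{F}|\) to the nearest integer (at least 1), and drawing \(L_1\) and \(R_1\) independently (so that they may overlap) are assumptions, since the text does not specify them. The 20-item list in the usage line is hypothetical.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple


def sample_constraint_pairs(frequent_items: Sequence[str], p1: float, p2: float,
                            n_pairs: int = 10, seed: int = 0
                            ) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Draw n_pairs pairs (L_1, R_1) of maximum constraints from A^F.

    Assumptions (not stated in the text): |L_1| = round(p1 * |A^F|) and
    |R_1| = round(p2 * |A^F|), each at least 1, and L_1, R_1 are drawn
    independently, so they may overlap.
    """
    rng = random.Random(seed)                  # fixed seed for reproducibility
    pool = list(frequent_items)
    size_l = max(1, round(p1 * len(pool)))
    size_r = max(1, round(p2 * len(pool)))
    return [(frozenset(rng.sample(pool, size_l)), frozenset(rng.sample(pool, size_r)))
            for _ in range(n_pairs)]


# Connect/Pumsb/Chess setting (p1 = 30%, p2 = 70%) on a hypothetical 20-item A^F.
pairs = sample_constraint_pairs([f"i{n}" for n in range(20)], 0.30, 0.70)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))  # 10 6 14
```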
Figures 6, 7, 8 and 9 show the average running times of the three methods on several characteristic experiments.
Table 3 shows the running time ratios \(RT_1 \) and \(RT_2 \) (averaged over the minimum confidences) of the two post-processing methods relative to our method for each (DB, MS) pair. More concretely, for (Chess, 78), we have \(RT_1 =\frac{1}{7}\sum \nolimits _{MC\in \{70,65,60,55,50,45,40\}} \frac{T\_MaxSC\_1(Chess,78,MC)}{T\_MaxSC(Chess,78,MC)}=245.2\) and \(RT_2 =\frac{1}{7}\sum \nolimits _{MC\in \{70,65,60,55,50,45,40\}} \frac{T\_MaxSC\_2(Chess,78,MC)}{T\_MaxSC(Chess,78,MC)}=10.2\). Thus, on (Chess, 78), our \(MAR\_MaxSC\) method is on average about 245 and 10 times faster than the post-processing methods \(PP\_MAR\_MaxSC\_1\) and \(PP\_MAR\_MaxSC\_2\), respectively.
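The ratio \(RT_1 \) (and likewise \(RT_2 \)) is a mean of per-confidence ratios, not a ratio of mean running times. A minimal Python sketch of this computation follows; the function name is hypothetical, and the timings in the usage line are placeholder values, not measurements from Table 3.

```python
from typing import Mapping


def running_time_ratio(t_baseline: Mapping[int, float],
                       t_ours: Mapping[int, float]) -> float:
    """Mean over the minimum confidences MC of T_baseline(MC) / T_ours(MC)."""
    if not t_ours or set(t_baseline) != set(t_ours):
        raise ValueError("timings must cover the same nonempty set of MC values")
    return sum(t_baseline[mc] / t_ours[mc] for mc in t_ours) / len(t_ours)


# Placeholder timings in seconds for two MC values (not measured data).
print(running_time_ratio({70: 20.0, 50: 40.0}, {70: 2.0, 50: 5.0}))  # 9.0
```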
The reason is that the two post-processing methods (\(PP\_MAR\_MaxSC\_1\) and \(PP\_MAR\_MaxSC\_2\)) spend considerable time generating large numbers of candidate rules, most of which do not satisfy the maximum single constraints. Indeed, in all the above experiments, redundant candidates account for approximately 99% of all the rules generated by each of \(PP\_MAR\_MaxSC\_1\) and \(PP\_MAR\_MaxSC\_2\).
5 Conclusions and future work
Two serious problems encountered during the mining of association rules with maximum single constraints are that (1) their cardinality grows exponentially, and the known algorithms for mining them typically generate numerous redundant and duplicate rules, and (2) their constraints are frequently modified. We address these problems with a mathematical approach. Starting with a lattice \({\mathcal L}{\mathcal C}{\mathcal G}\) of closed itemsets and their generators, which is suitable for use with frequently modified constraints, we efficiently extract the corresponding frequent sub-itemsets. Based on this lattice, an equivalence relation partitions the set of association rules with maximum single constraints into disjoint equivalence classes. We then use the closed itemsets and their generators (which generally have low cardinality) to represent the rules in each class uniquely. Building on these results, we propose the \(MAR\_MaxSC\) algorithm, which produces all the constrained rules in each rule class completely and without duplicates.
Because the rule classes can be mined independently, our approach can be adapted to big data in parallel and distributed environments. In future work, we will test the approach on larger data sets and plan to adapt it to the Hadoop MapReduce framework. We also intend to apply our approach to mining problems with other types of constraints.
Notes
We write “if and only if” as simply “iff”.
References
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499 (1994)
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD, pp. 207–216 (1993)
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining, AAAI Press, pp. 307–328 (1996)
Anh, T., Hai, D., Tin, T., Bac, L.: Efficient algorithms for mining frequent itemsets with constraint. In: Proceedings of the 3rd International Conference on Knowledge and Systems Engineering, pp. 19–25 (2011)
Anh, T., Hai, D., Tin, T., Bac, L.: Mining frequent itemsets with dualistic constraints. In: PRICAI 2012, LNAI, vol. 7458, pp. 807–813. Springer (2012)
Anh, T., Tin, T., Bac, L., Hai, D.: Mining association rules restricted on constraint. In: Proceedings of the IEEE-RIVF International Conference on Computing and Communication Technologies 2012, pp. 51–56 (2012)
Anh, T., Tin, T., Bac, L.: Structures of association rule set. In: LNAI, vol. 7197, Part II, pp. 361–370. Springer (2012)
Anh, T., Tin, T., Bac, L.: An approach for mining concurrently closed itemsets and generators. In: Advanced Computational Methods for Knowledge Engineering, SCI, vol. 479, pp. 355–366. Springer (2013)
Anh, T., Tin, T., Bac, L.: An approach for mining association rules intersected with constraint itemsets. Adv. Intell. Syst. Comput. 245, 351–363 (2013b)
Bayardo, R.J. Jr.: Efficiently mining long patterns from databases. In: Proceedings of the ACM-SIGMOD 1998 International Conference on Management of Data, pp. 85–93 (1998)
Burdick, D., Calimlim, M., Gehrke, J.: MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proceedings of 2001 ICDE, pp. 443–452 (2001)
Bayardo, R.J., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense databases. In: Data Mining and Knowledge Discovery, vol. 4, no. (2/3), pp. 217–240. Kluwer Academic Pub. (2000)
Cong, G., Liu, B.: Speed-up iterative frequent itemset mining with constraint changes. In: Proceedings of ICDM 2002, pp. 107–114 (2002)
Cristofor, L., Simovici, D.: Generating an informative cover for association rules. In: Proceedings of the IEEE International Conference on Data Mining 2002, pp. 597–600 (2002)
Das, A., Ng, W.-K., Woon, Y.-K.: Rapid association rule mining. In: Proceedings of 10th International conference on Information and knowledge management, pp. 474–481. ACM Press (2001)
Ganter, B., Wille, R., Franzke, C.: Formal concept analysis: mathematical foundations. Springer, New York (1997)
Grahne, G., Zhu, J.: High performance mining of maximal frequent itemsets. In: Proceedings of SIAM 2003 Workshop on High Performance Data Mining: Pervasive and Data Stream Mining (2003)
Goethals, B., Zaki, M.J.: Advances in frequent itemset mining implementations. In: Report on FIMI 2003, ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 109–117 (2004)
Gouda, K., Zaki, M.J.: GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Min. Knowl. Discov. 11(3), 223–242 (2005)
Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Discov. 8(1), 53–87 (2004)
Hai, D., Tin, T., Bay, V.: An efficient method for mining frequent itemsets with double constraints. Int. J. Eng. Appl. Artif. Intell. 27, 148–154 (2013)
Hai, D., Tin, T.: An efficient method for mining association rules based on minimum single constraints. Vietnam J. Comput. Sci. 2(2), 67–83 (2015)
Ho, B.: An approach to concept formation based on formal concept analysis. IEICE Trans. Inf. Syst. E78–D(5), 553–579 (1995)
Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding interesting rules from large sets of discovered association rules. In: Proceedings of the 3rd CIKM Conference, pp. 401–407 (1994)
Li, G., Hamilton, H.J.: Basic association rules. In: Proceedings of the 4th SIAM International Conference on Data Mining, pp. 166–177 (2004)
Lee, A.J., Lin, W.C., Wang, C.S.: Mining association rule with multi-dimensional constraints. J. Syst. Softw. 79(1), 79–92 (2006)
Mannila, H., Toivonen, H., Verkamo, I.A.: Efficient algorithms for discovering association rules. In: Workshop on Knowledge Discovery in Databases 1994, pp. 181–192 (1994)
Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proceedings of the 1998 ACM SIGMOD International Conference on the Management of Data, pp. 13–24 (1998)
Maimon, O., Rokach, L.: Data Mining and Knowledge Discovery Handbook. Springer, New York (2010)
Park, J.S., Chen, M.S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 175–186 (1995)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattice. Inf. Syst. 24(1), 25–46 (1999)
Pasquier, N., Taouil, R., Bastide, Y., Stumme, G., Lakhal, L.: Generating a condensed representation for association rules. J. Intell. Inf. Syst. 24(1), 29–60 (2005)
Pei, J., Han, J., Lakshmanan, L.V.S.: Pushing convertible constraints in frequent itemset mining. Data Min. Knowl. Discov. 8(3), 227–252 (2004)
Szathmary, L., Valtchev, P., Napoli, A.: Efficient vertical mining of frequent closed itemsets and generators. In: Proceedings of IDA 2009, pp. 393–404 (2009)
Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proceedings of KDD 1997, pp. 67–73 (1997)
Tin, T., Anh, T.: Structure of set of association rules based on concept lattice. In: Advances in Intelligent Information and Database Systems, SCI, vol. 283, pp. 217–227. Springer (2010)
Tin, T., Anh, T., Thong, T.: Structure of association rule set based on min-min basic rules. In: Proceedings of the International Conference on Computing and Communication Technologies 2010, pp. 83–88 (2010)
Wille, R.: Concept lattices and conceptual knowledge systems. Comput. Math. Appl. 23(6–9), 493–515 (1992)
Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283–286 (1997)
Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326–335. ACM (2003)
Zaki, M.J.: Mining non-redundant association rules. Data Min. Knowl. Discov. 9(3), 223–248 (2004)
Zaki, M.J., Hsiao, C.J.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans. Knowl. Data Eng. 17(4), 462–478 (2005)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Tran, A., Truong, T. & Le, B. Efficiently mining association rules based on maximum single constraints. Vietnam J Comput Sci 4, 261–277 (2017). https://doi.org/10.1007/s40595-017-0096-2