3.3.2. The Construction of the Algorithm
Time series association rules mining with up-to-date patterns:
Input: A log database D with n transactions stored in order of transaction time with equal time intervals; each transaction includes the transaction ID, transaction time, and items. The time T, the minimum support threshold, the minimum UDP threshold min_UDP, and the minimum confidence threshold are also given.
Output: Rules mined from the time-series.
Step 1: Scan the database D to generate the candidate set of single items, and record the count and the list of transaction IDs of each item i in the log database.
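Step 2: Complete the following substeps for the items in the candidate set:
Substep 2.1: Calculate the Support of each item in the candidate set.
Substep 2.2: If the Support of the item is more than the minimum support threshold, then put the item in the set of frequent items. Otherwise, put the item in the set of non-frequent items.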
Step 3: For each item i in the set of non-frequent items, complete the following substeps.
Substep 3.1: Set the as the first transaction ID in the of the item i, and verify if the item i satisfies Formula (10). If the item i satisfies Formula (10), then it will be retained in and then will be put in .
Substep 3.2: Set the as the next transaction ID in the of the item i; decrease the count of item i by one; and repeat this substep until is equal to zero. If is equal to zero and the item or itemset still cannot satisfy Formula (10), then it will be deleted from .
Step 4: Calculate the item or itemset as greater than or equal to or not. If so, save the item or itemset; else, delete it.
Step 5: Combine the set and the set to form . Set r = 1, where r is used to keep the current number of items in the itemset to be processed.
Step 6: Generate the candidate set from in a similar manner to the Apriori algorithm; moreover, the order of items should be considered, as mentioned above.
Step 7: Generate the frequent -patterns from in a similar manner to STEPS 2 and 3.
Step 8: If the is null, proceed to the next step. Otherwise, jump to STEPS 5 and 6.
Step 9: Calculate the Confidence and Lift of the itemsets in the with Formulas (9) and (4). If the Confidence of an itemset is greater than the minimum confidence threshold, then generate the rules in a manner similar to the Apriori algorithm. Otherwise, delete the itemsets that cannot meet the requirement in .
Step 10: Output the association rules mined from the log database.
Note that in the above algorithm, the transactions in the log database must form a time series with equal intervals.
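Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the toy database, and the definition of support as count divided by the number of transactions are all assumptions, as is the `>=` comparison against the threshold.

```python
from collections import defaultdict


def scan_database(transactions):
    """Step 1: record each item's count and the IDs of the transactions containing it."""
    tid_lists = defaultdict(list)
    for tid, items in transactions:  # transactions are already ordered by time
        for item in sorted(set(items)):
            tid_lists[item].append(tid)
    counts = {item: len(tids) for item, tids in tid_lists.items()}
    return counts, dict(tid_lists)


def split_by_support(counts, n_transactions, min_support):
    """Step 2: split items into frequent and non-frequent lists by support."""
    frequent, non_frequent = [], []
    for item, count in sorted(counts.items()):
        support = count / n_transactions  # assumed definition of support
        if support >= min_support:        # assumed comparison
            frequent.append(item)
        else:
            non_frequent.append(item)
    return frequent, non_frequent


# Hypothetical log database: (transaction ID, items), ordered by time.
toy_db = [
    (1, {"b", "c"}),
    (2, {"b"}),
    (3, {"c"}),
    (4, {"a", "b"}),
    (5, {"a"}),
    (6, {"b", "c"}),
]

counts, tid_lists = scan_database(toy_db)
frequent, non_frequent = split_by_support(counts, len(toy_db), min_support=0.5)
print(counts)
print(tid_lists)
print(frequent, non_frequent)
```

The non-frequent items are kept rather than discarded because Step 3 re-examines them for up-to-date patterns.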
3.3.4. An Example
In this section, an example is given to illustrate the proposed TSARM-UDP algorithm.
Table 1 shows the log database used in the example. The database contains 10 transactions and six items, denoted a to f.
Input: T = 3, minimum support threshold = 0.5, min_UDP = 0.1, minimum confidence threshold = 0.4, log database D.
Output: Rules mined from D.
Step 1: Scan the database, and find the count and the transaction ID list of each item i in D. Take item a as an example. It appears in Transactions 4, 5, and 8. Thus, its count is three, and its transaction ID list is {4, 5, 8}. The result of STEP 1 is shown in Table 2.
Step 2: Calculate the TSupport of each item in Table 2 using Formula (8). Using item b as an example, the count of b is five. Thus, according to Formula (8), the TSupport of b is 0.5. The minimum support threshold given above is 0.5, so b will be placed in the set of frequent items. The TSupport of item c is 0.3. This value is less than the threshold, so c will be placed in the set of non-frequent items. The TSupport calculation results, namely the two resulting sets, are shown in Table 3.
Step 3: For the items in , the following steps are performed. Items a and c are used as examples. For item a, , so . In addition, , , and . Substitute the above parameters into Formula (10). On the left side of the inequation is . On the right side of the inequation is . The results do not satisfy the inequation, so the algorithm jumps to Substep3.2. , and . Thus, the updated parameters are substituted for the inequation, and recalculate. The result still cannot satisfy the inequation. Repeat SUBSTEP 3.2. , and . Then, substitute the updated parameters into the inequation, and recalculate. The result still cannot satisfy the inequation. Repeat SUBSTEP 3.2. . Thus, delete item a from .
For item c, , so , , , and . The method of calculating item a above is used to calculate item c. The left side of the inequation is six, and the right side of the inequation is also six. Thus, the result satisfies the inequation, and c will remain in .
After calculating each item in , then delete the items that do not satisfy the inequation. The items that remain in are .
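The control flow of Substeps 3.1 and 3.2 can be sketched as a loop over an item's transaction ID list. Formula (10) is not reproduced here, so it is passed in as a placeholder predicate: the name `satisfies_formula_10` and its two arguments are hypothetical, and the stub predicates below exist only to exercise the loop.

```python
def find_up_to_date_start(tid_list, satisfies_formula_10):
    """Return the first starting transaction ID that passes the check, or None.

    tid_list: transaction IDs containing the item, in time order.
    satisfies_formula_10: placeholder predicate (start_tid, remaining_count) -> bool.
    """
    remaining = len(tid_list)      # current count of the item
    for start_tid in tid_list:     # Substep 3.1 starts at the first ID
        if satisfies_formula_10(start_tid, remaining):
            return start_tid       # the item is retained
        remaining -= 1             # Substep 3.2: move to the next ID, count - 1
    return None                    # count reached zero: the item is deleted


# Item a's transaction ID list from the example; the predicates are stubs.
tids_a = [4, 5, 8]
never_passes = lambda start_tid, remaining: False
passes_late = lambda start_tid, remaining: remaining <= 2

print(find_up_to_date_start(tids_a, never_passes))  # None
print(find_up_to_date_start(tids_a, passes_late))   # 5
```

With a predicate that never passes, the loop exhausts the list and returns `None`, which corresponds to deleting the item, as happens to item a in the example.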
Step 4: Calculate the count of each item in , and delete the items that do not satisfy being equal to or greater than . Update .
Step 5: Combine set and set to form . Set .
Step 6: Generate the candidate set of two-itemsets from the set formed in STEP 5 through the method mentioned above, taking the order of items into account. The candidate set is shown in Table 4.
Step 7: Generate the frequent two-patterns in a way similar to STEPS 2 and 3. are null, and . Thus, .
Step 8: We can generate from , according to the method we mentioned in the previous article. We can get = , but each itemset in cannot satisfy the threshold and Formula (9). Thus, are null. The algorithm proceeds to STEP 9.
Step 9: In this step, we calculate the TConfidence of itemsets in by Formula (9). Taking itemsets as an example: , and . According to Formula (9), the TConfidence of itemsets is equal to . Then, we calculate the Lift of itemsets, , which is greater than one. Thus, is valid. The TConfidence and Lift of each itemset are given in Table 5.
As shown in Table 5, two itemsets satisfy the minimum confidence threshold and the Lift requirement. The rule generation method is similar to the Apriori algorithm, but needs to consider the order of items and the other steps. The generated rules are given below:
, with TConfidence = 3/7, Lift = 10/7
, with TConfidence = 4/7, Lift = 10/7
Step 10: Output the rules.
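The Lift values reported for the two rules can be checked with exact fractions. This sketch assumes that Formula (4) is the usual lift, that is, the confidence of the rule divided by the support of its consequent. The consequent supports 3/10 and 4/10 are not stated in the text; they are assumed values that reproduce the reported figures under that definition.

```python
from fractions import Fraction


def lift(confidence, consequent_support):
    """Standard lift: rule confidence divided by the consequent's support (assumed)."""
    return confidence / consequent_support


# TConfidence values are taken from the rules above; the supports are assumptions.
first_rule = lift(Fraction(3, 7), Fraction(3, 10))
second_rule = lift(Fraction(4, 7), Fraction(4, 10))

print(first_rule)   # 10/7
print(second_rule)  # 10/7
```

Both values exceed one, which matches the statement that the two rules are valid.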