How to use Association Mining algorithms Apriori and FP-Tree

2025-02-25 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report --

This article mainly explains how to use the association mining algorithms Apriori and FP-Tree. The approach introduced here is simple, fast, and practical; interested readers may wish to follow along and learn how to use both algorithms.

Both the Apriori algorithm and the FP-Tree algorithm are association rule mining algorithms in data mining; both handle the simplest case: single-level, single-dimensional Boolean association rules.

Apriori algorithm

The Apriori algorithm is the most influential algorithm for mining frequent itemsets for Boolean association rules. It is based on prior knowledge of the properties of frequent itemsets: Apriori uses an iterative, level-wise search in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found and denoted L1. L1 is used to find L2, the set of frequent 2-itemsets; L2 is used to find L3; and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database.

The idea of the algorithm is simple: if an itemset I is not a frequent itemset, then no larger itemset containing I can be frequent either.

The original data of the algorithm is as follows:

TID     List of item IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3
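As a quick sketch of the first database scan (in Python, with the transaction table above hard-coded), the support count of every 1-itemset can be obtained in a single pass:

```python
from collections import Counter

# Transaction table from the text (one set of items per transaction).
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

# One scan of the database yields the support count of every 1-itemset.
support = Counter(item for t in transactions for item in t)
print(support["I2"], support["I1"], support["I3"],
      support["I4"], support["I5"])  # 7 6 6 2 2
```

With a minimum support count of 2, all five items are frequent here, so L1 = {I1, I2, I3, I4, I5}.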

The basic process of the algorithm is as follows:

First, all transactions are scanned to obtain the candidate 1-itemsets C1; itemsets that do not meet the minimum support are filtered out, giving the frequent 1-itemsets L1.

The following recursive operation is performed:

Given the frequent k-itemsets Lk (the frequent 1-itemsets are known), the itemsets in Lk are joined to generate all possible candidate (k+1)-itemsets, which are then pruned: if any k-item subset of a candidate (k+1)-itemset fails the support condition, that candidate is removed. The remaining candidates are counted against the database, and those that do not meet the minimum support are filtered out, giving the frequent (k+1)-itemsets L(k+1).

If the resulting itemset is empty, the algorithm ends.

Join method: assume the items within every itemset are kept in the same sorted order. Two k-itemsets Lk[i] and Lk[j] are joinable if their first k-1 items are identical and their k-th items differ. For example, {I1, I2} and {I1, I3} in L2 are joinable, and joining them gives {I1, I2, I3}; but {I1, I2} and {I2, I3} are not joinable, since joining such pairs would produce duplicate candidates.

Here is another example, this time about pruning: when C3 is generated, the joined 3-itemsets include {I2, I3, I4} and {I2, I4, I5}; but because {I3, I4} and {I4, I5} do not appear in L2, both {I2, I3, I4} and {I2, I4, I5} are pruned.
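The join and prune steps can be sketched as follows (a minimal illustration in Python; the sorted-tuple representation and the function name `apriori_gen` are my own). L2 below is the set of frequent 2-itemsets for the nine transactions above with minimum support 2:

```python
from itertools import combinations

def apriori_gen(frequent_k, k):
    """Generate candidate (k+1)-itemsets from frequent k-itemsets
    (sorted tuples): join on the first k-1 items, then prune any
    candidate with an infrequent k-subset."""
    frequent_set = set(frequent_k)
    candidates = []
    for a, b in combinations(sorted(frequent_k), 2):
        if a[:k - 1] == b[:k - 1]:                      # join step
            cand = a + (b[-1],)
            if all(s in frequent_set                    # prune step
                   for s in combinations(cand, k)):
                candidates.append(cand)
    return candidates

# L2 from the running example (min support = 2):
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
print(apriori_gen(L2, 2))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Note how {I2, I3, I4} and {I2, I4, I5} never survive the prune step, exactly as described above, because {I3, I4} and {I4, I5} are missing from L2.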

On large datasets, the time and space complexity of the Apriori algorithm cannot be ignored.

Space complexity: the number of candidate 2-itemsets grows roughly quadratically with the number of frequent 1-itemsets; with on the order of 10^4 frequent items, C2 can reach the order of 10^7 candidates.

Time complexity: computing each Lk requires a full scan of the database.
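Putting the scan, join, and prune steps together, a minimal self-contained Apriori loop over the nine transactions might look like this (a sketch with illustrative names, minimum support = 2):

```python
from collections import Counter
from itertools import combinations

# Transaction table from the text.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP = 2

def count_and_filter(candidates):
    """One database scan: keep the candidates that meet MIN_SUP."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if set(c) <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= MIN_SUP}

# L1: frequent 1-itemsets, as sorted tuples
L = count_and_filter([(i,) for i in sorted({i for t in transactions for i in t})])
frequent, k = dict(L), 1
while L:
    prev = sorted(L)
    # join k-itemsets sharing their first k-1 items, then prune any
    # candidate that has an infrequent k-subset
    cands = [a + (b[-1],) for a, b in combinations(prev, 2)
             if a[:k - 1] == b[:k - 1]
             and all(s in L for s in combinations(a + (b[-1],), k))]
    L = count_and_filter(cands)
    frequent.update(L)
    k += 1

print(frequent[("I1", "I2", "I5")], frequent[("I2", "I3")])  # 2 4
```

Each pass of the `while` loop performs one full scan of the database, which is precisely the cost noted above.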

FP-Tree algorithm

FP-Tree algorithm: accomplishes what the Apriori algorithm does, but without generating candidate itemsets.

The basic data structure of the FP-Tree algorithm consists of an FP-tree plus an item header table; each item in the header table points, through a chain of node links, to every place it appears in the tree. Note that the header table must be sorted by decreasing support, and in the FP-tree a node with higher support can only be an ancestor of a node with lower support.

In addition, let me explain several basic concepts in the FP-Tree algorithm:

FP-Tree: the tree itself. The data items of each transaction in the transaction table are sorted by support and inserted, in descending order, into a tree whose root is NULL; each node records the support count of its item.

Conditional pattern base: the set of prefix paths in the FP-tree that co-occur with a given suffix pattern, i.e. the set of ancestor paths of all nodes of the same frequent item in the FP-tree. For example, I3 appears three times in the FP-tree, and its ancestor paths are {I2, I1 : 2} (frequency 2), {I2 : 2}, and {I1 : 2}. The set of these three ancestor paths is the conditional pattern base of the frequent item I3.

Conditional FP-tree: a new FP-tree built from a conditional pattern base according to the FP-tree construction rules; for example, the conditional tree of I3 is built from the three prefix paths above.

1. Construct the item header table: scan the database to obtain the set F of frequent items and the support of each frequent item. Sort F in descending order of support; the sorted list is denoted L.

2. Construct the initial FP-tree: reorder the frequent items of each transaction in the database according to the order in L, and insert them, in that order, into an FP-tree whose root is NULL. If a node for the frequent item already exists on the insertion path, increase its support count by 1; if it does not exist, create a node with support 1 and link it into the node chain of the item header table.

3. Call FP-growth (Tree,null) to start mining. The pseudo code is as follows:

Procedure FP_growth(Tree, a)
If Tree contains a single path P then {
    For each combination of the nodes in path P (denote it b)
        Generate pattern b ∪ a with support = the minimum support of the nodes in b
} else {
    For each item ai in the header table of Tree (scanned from lowest to highest support) {
        Generate pattern b = ai ∪ a with support = ai.support
        Construct the conditional pattern base of b, then construct b's conditional FP-tree Tree_b
        If Tree_b is not empty then
            Call FP_growth(Tree_b, b)
    }
}
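The pseudocode above can be sketched as a compact, self-contained Python implementation (class and function names are my own; instead of special-casing single-path trees, this version simply rebuilds a conditional tree from every conditional pattern base, which yields the same patterns):

```python
from collections import Counter, defaultdict

# Transaction table from the text.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP = 2

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_tree(weighted, min_sup):
    """Build an FP-tree from (items, weight) pairs. Returns the header
    table (item -> its nodes) and the support of each frequent item."""
    support = Counter()
    for items, w in weighted:
        for i in items:
            support[i] += w
    freq = {i: s for i, s in support.items() if s >= min_sup}
    root, header = Node(None, None), defaultdict(list)
    for items, w in weighted:
        node = root
        # keep frequent items only, in descending order of support
        for i in sorted((i for i in items if i in freq),
                        key=lambda i: (-freq[i], i)):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += w
    return header, freq

def fp_growth(weighted, suffix, min_sup, out):
    header, freq = build_tree(weighted, min_sup)
    # scan header items from lowest to highest support
    for item in sorted(header, key=lambda i: (freq[i], i)):
        pattern = suffix | {item}
        out[frozenset(pattern)] = freq[item]
        # conditional pattern base: prefix path of every node of `item`
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path[::-1], node.count))
        if base:
            fp_growth(base, pattern, min_sup, out)

patterns = {}
fp_growth([(t, 1) for t in transactions], frozenset(), MIN_SUP, patterns)
print(patterns[frozenset({"I2", "I1", "I5"})])  # 2
```

Skipping the single-path shortcut trades some speed for simplicity: recursing on a single-path conditional tree enumerates the same node combinations the shortcut would.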

FP-growth is the core of the whole algorithm, so it deserves a few more words.

Input of the FP_growth function: Tree is the original FP-tree or a conditional FP-tree; a is the pattern suffix (a = NULL in the first call; in subsequent recursive calls, a is the current pattern suffix).

Output of the FP_growth function: all patterns discovered during the call, together with their supports (for example, {I1, I2, I3} with support 2). Every pattern output by a call to FP_growth contains the pattern suffix that was passed in.

Let's simulate the execution of FP-growth.

1. In the first level of the FP-growth recursion, a = NULL, and the patterns generated are exactly the frequent 1-itemsets.

2. For each frequent 1-item, FP-growth is called recursively, yielding the longer frequent itemsets that end with that item.

Here are two examples to illustrate the execution of FP-growth.

1. The conditional pattern base of I5 is (I2 I1 : 1), (I2 I1 I3 : 1), and the conditional FP-tree constructed for I5 is a single path. FP_growth is then called recursively with pattern suffix I5. Because this conditional FP-tree is the single path {I2 : 2, I1 : 2, I3 : 1}, FP_growth directly enumerates all combinations of its nodes and joins each with the suffix I5, giving all patterns with support ≥ 2: {I2 I5 : 2, I1 I5 : 2, I2 I1 I5 : 2}.

2. The case of I5 is relatively simple because its conditional FP-tree is a single path, so let us look at the slightly more complex case of I3. The conditional pattern base of I3 is (I2 I1 : 2), (I2 : 2), (I1 : 2), and the conditional FP-tree generated from it is still a multi-path tree; FP_growth is called recursively with pattern suffix I3. First, combining the suffix I3 with each item in the header table of the conditional FP-tree gives a set of patterns {I2 I3 : 4, I1 I3 : 4}, but these are not yet all the patterns with suffix I3. FP_growth must be called recursively again with pattern suffix {I1, I3}: the conditional pattern base of {I1, I3} is {I2 : 2}, which generates a single-path conditional FP-tree, and combining I2 with the suffix {I1, I3} yields the pattern {I1 I2 I3 : 2}. In principle the pattern set with suffix {I2, I3} should also be computed, but the conditional pattern base of {I2, I3} is empty, so that branch of the recursion ends. All patterns with suffix I3 and support ≥ 2 are therefore: {I2 I3 : 4, I1 I3 : 4, I1 I2 I3 : 2}.

According to the FP-growth algorithm, the frequent patterns with support ≥ 2 are as follows:

Item    Conditional pattern base             Conditional FP-tree           Frequent patterns generated
I5      {(I2 I1 : 1), (I2 I1 I3 : 1)}        <I2 : 2, I1 : 2>              I2 I5 : 2, I1 I5 : 2, I2 I1 I5 : 2
I4      {(I2 I1 : 1), (I2 : 1)}              <I2 : 2>                      I2 I4 : 2
I3      {(I2 I1 : 2), (I2 : 2), (I1 : 2)}    <I2 : 4, I1 : 2>, <I1 : 2>    I2 I3 : 4, I1 I3 : 4, I2 I1 I3 : 2
I1      {(I2 : 4)}                           <I2 : 4>                      I2 I1 : 4
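These results can be cross-checked by brute-force counting over the nine transactions (a verification sketch, independent of both algorithms):

```python
# Transaction table from the text.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Each row of the table above:
assert support({"I2", "I5"}) == support({"I1", "I5"}) == support({"I2", "I1", "I5"}) == 2
assert support({"I2", "I4"}) == 2
assert support({"I2", "I3"}) == support({"I1", "I3"}) == 4
assert support({"I2", "I1", "I3"}) == 2
assert support({"I2", "I1"}) == 4
print("all supports match the table")
```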

The FP-growth algorithm is roughly an order of magnitude faster than Apriori, and its space consumption is likewise about an order of magnitude lower. For truly massive data, however, the time and space costs of FP-growth remain high; possible improvements include database partitioning, data sampling, and so on.

At this point, I believe you have a deeper understanding of how to use the association mining algorithms Apriori and FP-Tree. You might as well try them out in practice. Follow us and keep learning!

Welcome to subscribe to "Shulou Technology Information" to get the latest news, interesting stories, and hot topics in the IT industry, covering the newest Internet news, technology news, and IT industry trends.
