Sample Analysis of the Apriori Algorithm for Association Rules

This article examines the Apriori algorithm for mining association rules, presenting the analysis and the solution in detail, in the hope of helping readers find a simple and practical way to approach the problem.
01
-
Background and Basic Concepts of Association Rule Mining
Consider a transaction dataset such as the following, where each row represents one purchase record. Note that we only care about whether an item appears in a record, not how many units were bought; buying ten cartons of milk still counts as a single occurrence of milk.

TID | Items purchased
1   | Bread, Milk
2   | Bread, Diapers, Beer, Eggs
3   | Milk, Diapers, Beer, Cola
4   | Bread, Milk, Diapers, Beer
5   | Bread, Milk, Diapers, Cola

The collection of all items in the data records is called the total itemset, which for the table above is:
S = {Milk, Bread, Diapers, Beer, Eggs, Cola}
Association rules
An association rule is an implication of the form X -> Y, where X and Y are two disjoint, non-empty itemsets. For example, {Beer} -> {Diapers} is an association rule.
The strength of an association rule is described by two measures: support and confidence.
Support
support(X -> Y) = (number of records in which the items of X and Y appear together) / (total number of records). For example: support({Beer} -> {Diapers}) = (number of records containing both beer and diapers) / (number of records) = 3/5 = 60%.
Confidence
confidence(X -> Y) = (number of records in which the items of X and Y appear together) / (number of records containing X). For example: confidence({Diapers} -> {Beer}) = (number of records containing both beer and diapers) / (number of records containing diapers) = 3/4 = 75%.
The higher the support and confidence, the stronger the rule; association rule mining is the task of finding all rules that reach given strength thresholds.
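To make the two measures concrete, the following is a minimal sketch, written in the same Java-flavored style as the pseudocode later in this article, that computes support and confidence for {Diapers} -> {Beer} over the five records above. The class and variable names are illustrative only.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SupportConfidenceDemo {
    public static void main(String[] args) {
        // The five purchase records from the table above
        List<Set<String>> records = Arrays.asList(
                new HashSet<>(Arrays.asList("Bread", "Milk")),
                new HashSet<>(Arrays.asList("Bread", "Diapers", "Beer", "Eggs")),
                new HashSet<>(Arrays.asList("Milk", "Diapers", "Beer", "Cola")),
                new HashSet<>(Arrays.asList("Bread", "Milk", "Diapers", "Beer")),
                new HashSet<>(Arrays.asList("Bread", "Milk", "Diapers", "Cola")));

        Set<String> x = Set.of("Diapers"); // antecedent X
        Set<String> y = Set.of("Beer");    // consequent Y

        // Count records containing both X and Y, and records containing X
        long both = records.stream().filter(r -> r.containsAll(x) && r.containsAll(y)).count();
        long xCount = records.stream().filter(r -> r.containsAll(x)).count();

        // support(X -> Y) = |records with X and Y| / |records|
        System.out.printf("support = %d/%d = %.0f%%%n",
                both, records.size(), 100.0 * both / records.size());
        // confidence(X -> Y) = |records with X and Y| / |records with X|
        System.out.printf("confidence = %d/%d = %.0f%%%n",
                both, xCount, 100.0 * both / xCount);
    }
}

Running it prints support = 3/5 = 60% and confidence = 3/4 = 75%, matching the figures above.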
02
-
An Exhaustive Algorithm for Mining Association Rules
Association rule mining
Given a transaction dataset T, find all association rules with support >= min_support and confidence >= min_confidence.
Exhaustive algorithm
The brute-force way to find the required rules is to enumerate every combination of itemsets and test whether each combination satisfies the conditions. For a total itemset with n elements this takes O(2^n) time. The total itemset in the table above, S = {Milk, Bread, Diapers, Beer, Eggs, Cola}, has 6 elements, so there are 2^6 - 1 = 63 combinations.
For simplicity, suppose the total itemset of product numbers is {1, 2, 3}. Then all the possible combinations are:
{1}, {2}, {3}
{1, 2}, {1, 3}, {2, 3}
{1, 2, 3}
There are 7 of them (2^3 - 1). Each combination is checked in turn, and the association rules within it that satisfy the support and confidence requirements are reported.
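The enumeration itself is easy to code; the problem is its cost. Below is a minimal sketch of the brute-force subset enumeration using bitmasks; the names are illustrative.

import java.util.ArrayList;
import java.util.List;

public class SubsetEnumeration {
    // Enumerate all 2^n - 1 non-empty subsets of the given items;
    // every bitmask from 1 to 2^n - 1 selects one distinct subset
    static List<List<Integer>> allNonEmptySubsets(int[] items) {
        List<List<Integer>> subsets = new ArrayList<>();
        for (int mask = 1; mask < (1 << items.length); mask++) {
            List<Integer> subset = new ArrayList<>();
            for (int i = 0; i < items.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(items[i]);
                }
            }
            subsets.add(subset);
        }
        return subsets;
    }

    public static void main(String[] args) {
        // {1, 2, 3} yields the 7 combinations listed above
        System.out.println(allNonEmptySubsets(new int[] {1, 2, 3}));
    }
}

The loop body is trivial, but the number of iterations doubles with every added item, which is exactly why the brute-force approach breaks down at supermarket scale.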
For an ordinary supermarket, the number of distinct products easily exceeds 10,000, so an algorithm of exponential time complexity cannot solve the problem in an acceptable amount of time.
How to quickly mine the association rules that satisfy the thresholds is the main problem association rule mining has to solve.
03
-
The Apriori Algorithm: An Optimized Algorithm for Association Rule Mining
Mining association rules is carried out in two steps:
1) Generate frequent itemsets
In this stage, all the itemsets that meet the minimum support are found; these itemsets are called frequent itemsets.
2) Generate rules
On the basis of the frequent itemsets generated in the previous step, the rules that satisfy the minimum confidence are generated; the resulting rules are called strong rules.
The time spent on association rule mining goes mainly into the first step, generating frequent itemsets. Because relatively few frequent itemsets are found, step 2) is far less time-consuming than step 1).
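To picture step 2, the sketch below generates rules from a single frequent itemset F by splitting it into every non-empty antecedent X and consequent F \ X and checking the confidence support(F) / support(X). The support counts in main are hypothetical values consistent with the earlier dataset, and all names are illustrative.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RuleGeneration {
    // Emit every rule X -> (F \ X) over the frequent itemset f whose
    // confidence = support(f) / support(x) reaches the threshold
    static void generateRules(Set<String> f, Map<Set<String>, Integer> supportCount,
                              double minConfidence) {
        List<String> items = new ArrayList<>(f);
        // Each bitmask picks one non-empty proper subset X of f
        for (int mask = 1; mask < (1 << items.size()) - 1; mask++) {
            Set<String> x = new HashSet<>();
            for (int i = 0; i < items.size(); i++) {
                if ((mask & (1 << i)) != 0) x.add(items.get(i));
            }
            Set<String> y = new HashSet<>(f);
            y.removeAll(x);
            double confidence = (double) supportCount.get(f) / supportCount.get(x);
            if (confidence >= minConfidence) {
                System.out.printf("%s -> %s (confidence %.0f%%)%n", x, y, 100 * confidence);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical support counts matching the five-record dataset above
        Map<Set<String>, Integer> supportCount = new HashMap<>();
        supportCount.put(Set.of("Diapers"), 4);
        supportCount.put(Set.of("Beer"), 3);
        supportCount.put(Set.of("Diapers", "Beer"), 3);
        generateRules(Set.of("Diapers", "Beer"), supportCount, 0.7);
    }
}

Because the frequent itemsets and their support counts are already in hand after step 1, this step needs no further passes over the data records, which is why it is the cheap one.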
To reduce the time needed to generate frequent itemsets, sets that cannot possibly be frequent itemsets should be eliminated as early as possible. The Apriori algorithm prunes the candidate itemsets using two laws.
Two laws
Law 1, from high to low: if a set is a frequent itemset, then all of its subsets are frequent itemsets. For example, if the set {A, B} is a frequent itemset, then its subsets {A} and {B} are frequent itemsets too.
Law 2, from low to high: if a set is not a frequent itemset, then none of its supersets are frequent itemsets. For example, if the set {A} is not a frequent itemset, then none of its supersets, such as {A, B} and {A, B, C}, can be frequent itemsets either.
What has practical value is Law 2, the step from low-order frequent itemsets up to high-order ones. Consider: if the support of the two-item set {A, B} does not reach the threshold, that is, if it is not a frequent itemset, how could the three-item set {A, B, C} or any higher-order superset be frequent? If one of them were, then by Law 1 its subset {A, B} would have to be a frequent itemset, contradicting the assumption.
The algorithm therefore works level by level. First the one-item candidate sets are counted and the candidates that fail the support threshold are discarded, yielding the frequent one-item sets; from these, the two-item candidates are generated and filtered in the same way; and so on. At every level, Law 2 prunes candidates before they are ever counted: for example, if {milk, beer} is not a frequent itemset, then neither {milk, beer, diapers} nor {milk, beer, bread} can be a frequent itemset.
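The following sketch shows how Law 2 becomes a pruning step during candidate generation: frequent (k-1)-itemsets are joined pairwise, and a joined candidate survives only if every one of its (k-1)-subsets is itself frequent. The method name and data layout are assumptions for illustration, not the article's code.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CandidatePruning {
    // Join frequent (k-1)-itemsets into k-item candidates, then discard any
    // candidate that has a non-frequent (k-1)-subset (Law 2)
    static List<Set<String>> nextCandidates(List<Set<String>> frequentPrev) {
        Set<Set<String>> prev = new HashSet<>(frequentPrev);
        List<Set<String>> candidates = new ArrayList<>();
        for (int i = 0; i < frequentPrev.size(); i++) {
            for (int j = i + 1; j < frequentPrev.size(); j++) {
                Set<String> union = new TreeSet<>(frequentPrev.get(i));
                union.addAll(frequentPrev.get(j));
                // A valid join grows the itemset by exactly one item
                if (union.size() != frequentPrev.get(i).size() + 1) continue;
                // Prune: every (k-1)-subset of the candidate must be frequent
                boolean allSubsetsFrequent = true;
                for (String item : union) {
                    Set<String> subset = new TreeSet<>(union);
                    subset.remove(item);
                    if (!prev.contains(subset)) {
                        allSubsetsFrequent = false;
                        break;
                    }
                }
                if (allSubsetsFrequent && !candidates.contains(union)) {
                    candidates.add(union);
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // {milk, beer} is absent from the frequent two-item sets, so every
        // candidate containing it is pruned and the result is empty
        List<Set<String>> frequent2 = List.of(
                Set.of("milk", "diapers"), Set.of("beer", "diapers"), Set.of("milk", "bread"));
        System.out.println(nextCandidates(frequent2)); // prints []
    }
}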
Apriori algorithm
Apriori is a candidate generate-and-eliminate algorithm: it generates a set of candidates according to Law 2, eliminates the candidates that fail the preset minimum support, and loops until no new candidates are generated.
Pseudocode of the algorithm:

public void apriori() {
    // Obtain the original data records
    Record record = getRecord();
    // Generate the first (one-item) candidate itemsets
    List candidateItemset = findFirstCandidate();
    // Keep only the candidates in candidateItemset that meet the minimum support
    List lItemset = getSupportedItemset(candidateItemset);
    boolean endTag = false;
    // Loop as long as mining can continue
    while (!endTag) {
        // Generate the next-level candidate itemsets
        List ckItemset = getNextCandidate(lItemset);
        // Keep only the candidates in ckItemset that meet the minimum support
        List lkItemset = getSupportedItemset(ckItemset);
        if (lkItemset.isEmpty()) {
            // No frequent itemsets at this level: stop mining
            endTag = true;
        } else {
            // Save the frequent itemsets of this level
            save(lkItemset);
            // Carry the result over to prepare for the next iteration
            lItemset = lkItemset;
        }
    }
}
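The pseudocode leaves its helper methods undefined. As one possible reading, getSupportedItemset simply counts how many records contain each candidate and keeps those that meet the minimum support; here is a sketch under that assumption.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SupportFilter {
    // Keep only the candidates that are contained in at least
    // minSupportCount of the data records
    static List<Set<String>> getSupportedItemset(List<Set<String>> candidates,
                                                 List<Set<String>> records,
                                                 int minSupportCount) {
        List<Set<String>> frequent = new ArrayList<>();
        for (Set<String> candidate : candidates) {
            // Count the records that contain every item of this candidate
            long count = records.stream().filter(r -> r.containsAll(candidate)).count();
            if (count >= minSupportCount) {
                frequent.add(candidate);
            }
        }
        return frequent;
    }
}

Each call makes one linear pass over the records per candidate, so the savings of Apriori come entirely from how few candidates survive to be counted at each level.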
To summarize: Apriori, the classical algorithm for mining association rules, exploits one law: if a set is not a frequent itemset, then none of its supersets are frequent itemsets. Working from the bottom up, it mines the frequent itemsets of every level that satisfy the support threshold, and from them derives the rules that satisfy the confidence threshold.
This concludes the sample analysis of the Apriori algorithm for association rules. Hopefully the content above is of some help to you.