In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly introduces how to use Python code to achieve Apriori association rules algorithm related knowledge, the content is detailed and easy to understand, the operation is simple and fast, has a certain reference value, I believe that everyone after reading this article on how to use Python code to achieve Apriori association rules algorithm will have a harvest, let's take a look.
I. Overview of Association rules
It has been almost 30 years since Agrawal and others first put forward the concept of association rules in 1993. Today, all kinds of algorithms emerge one after another, this can be regarded as an antique, older than many people, and it is often an entry-level algorithm for data mining, but there is not much in-depth research, especially in the field of risk control. It has extremely important application potential and is an underestimated algorithm, which is rarely mentioned in public articles. I try to analyze them one by one, hoping to give you some enlightenment.
I have thought deeply and comprehensively, and conducted a large number of experiments. I feel that this topic can be talked about for three days and three nights. The world is changing, but the essence has not changed. All kinds of connections exist all the time, intentionally or unintentionally.
For example, if your girlfriend bows her head and plays with fingers + silence, there is a good chance that she will be angry, then this is the rule you summed up. I believe many people have heard of the example of beer and diapers, and the story goes like this: in a supermarket, people found a particularly interesting phenomenon that diapers and beer, two unrelated goods, were put together. But this strange move has greatly increased sales of diapers and beer. Why is there such a strange phenomenon? This interesting thing happened because American women bought diapers before their husbands came home, and then their husbands bought their favorite beer.
Many people only remember beer diapers and seldom think deeply, let's change a little bit, daily things, there are a lot of association rules?
2. Examples of application scenarios 1. Prediction of stock price rise and fall
Volume + high turnover rate-> high probability rise, historical data mining, if you find that volume + high turnover rate stocks are likely to rise, then mine the stocks that meet the conditions on the same day, and then buy the next day, lie down and earn.
2. Recommendations for videos, music, books, etc.
According to historical data, if there is a large-scale existence of some users to watch the play list is: mini Times-> Shanghai Fortress, then a new user will immediately recommend the Shanghai Fortress to the tiny Times, and there is a good chance that the Hulan account will also be watched. It's so dirty.
3. Taxi route prediction (considering time and space)
Mine the following rules based on a large amount of data
Morning: starting point-> destination company
Evening: starting point-> destination high-speed railway station
Weekend: starting Home-> destination Shopping Center
Then when you open the software every morning, the taxi-hailing software will recommend your company as the destination, which will greatly reduce the taxi-hailing time for users. As shown in the picture below, I entered the name of the community and immediately recommended three places to me, Hangzhou East Railway Station, because the usual taxi-hailing combination has the highest degree of support.
4. Automatic mining of risk control strategy
According to the historical title, summed up the law found that the commodity title contains the old driver + Baidu network disk-> porn risk is high, then encounter this title contains these two words, directly refused.
According to the historical behavior data, it is found that the silent user + non-permanent login + change password-> is likely to be stolen. If the new account meets these three conditions, the account will be frozen or verified immediately. The risk of account theft can be avoided.
According to the historical data, if it is found that user A + B logs in 10 seconds apart every day, it can be considered that there is an association between An and B, which may be the same batch of wool accounts controlled by the machine.
The automatic mining of risk control strategy is also the place that we should focus on and explain later.
Third, the three most important concepts
There are three core concepts of association rules that need to be understood: support, confidence and promotion. The following is an example of the most classic beer-diaper case to illustrate these three concepts, if the following is a list of items purchased by several customers:
1. Support
Support (Support): the ratio of the number of times a combination of goods appears to the total number of orders.
In this example, we can see that "milk" appears four times, so the support rating of "milk" in these five orders is 4 bank 5: 0.8.
Similarly, "milk + bread" appears three times, so the support rating of "milk + bread" in these five orders is 3max 50.6.
Is it very simple to understand this? you can use your hands to calculate the support of 'diaper + beer'.
2. Confidence
Confidence (Confidence): refers to the probability that when you buy item A, you will buy item B. in the subset containing A, the support of B, that is, the proportion of orders containing B.
Confidence level (milk → beer) = 3Accord 4beer 0.75, which represents how many more orders have bought beer in the order for which milk has been purchased, as shown in the table below.
Confidence (beer → milk) = 3tick 4 milk 0.75, which means that if you buy beer, what is the probability that you will buy milk?
Confidence level (beer → diaper) = 4 stroke 4 diaper 1.0, which indicates how likely you are to buy a diaper if you buy beer. The table below shows 100%.
From the above example, we can see that confidence is actually a conditional concept, that is, in the case of A, what is the probability of B happening. If you only know these two concepts, in many cases it is not enough, you need to use the concept of degree of promotion. For example, in the case of A, the probability of B is 80%, so whether AB has anything to do with it, not necessarily, the proportion of B in the market is 95%. The appearance of your A reduces the probability of B appearing.
3. Degree of improvement
Degree of improvement (Lift): when we do commodity recommendation or risk control strategy, we focus on the degree of improvement, because the degree of improvement represents the occurrence of A, the degree of increase in the probability of the occurrence of B.
Promotion (A → B) = confidence (A → B) / support (B)
So there are three possibilities for the degree of improvement:
Promotion degree (A → B) > 1: indicates improvement
Degree of improvement (A → B) = 1: indicates that there is no improvement or decrease.
Degree of lifting (A → B)
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.