How to Implement the Apriori Association Rules Algorithm with Python Code

2025-03-04 Update From: SLTechnology News&Howtos

This article introduces how to implement the Apriori association rules algorithm with Python code. The content is detailed and easy to follow, and should serve as a useful reference.

I. Overview of association rules

It has been almost 30 years since Agrawal and others first proposed the concept of association rules in 1993. With new algorithms appearing all the time, this one can be regarded as an antique, older than many of its readers. It is often taught as an entry-level data mining algorithm, yet it has received little in-depth study, especially in the field of risk control, where it has very significant application potential. It is an underestimated algorithm that is rarely discussed in public articles, so I will try to analyze it piece by piece, hoping to offer some inspiration.

I have thought about this topic deeply and run a large number of experiments, and I feel it could be discussed for three days and three nights. The world keeps changing, but its essence does not: associations of all kinds exist all the time, whether intended or not.

For example, if your girlfriend lowers her head, fiddles with her fingers and goes silent, there is a good chance she is angry; that is a rule you have summed up from experience. Many people have heard the beer-and-diapers example. The story goes like this: a supermarket noticed an interesting phenomenon. When diapers and beer, two seemingly unrelated products, were placed together, sales of both rose sharply. Why? As the story goes, American wives asked their husbands to buy diapers on the way home from work, and the husbands picked up their favorite beer on the same trip.

Many people remember only the beer and diapers and seldom think further. Look at it from a slightly different angle: aren't there plenty of association rules in everyday life?

II. Examples of application scenarios

1. Prediction of stock price rise and fall

Rising volume + high turnover rate -> high probability of a rise. If mining historical data shows that stocks with rising volume and a high turnover rate are likely to go up, you can screen for stocks that meet these conditions on a given day, buy them the next day, and sit back and profit.

2. Recommendations for videos, music, books, etc.

According to historical data, if a large number of users have the viewing sequence Tiny Times -> Shanghai Fortress, then a new user who has just watched Tiny Times can immediately be recommended Shanghai Fortress, and there is a good chance they will watch it too.

3. Taxi route prediction (considering time and space)

Mine the following rules based on a large amount of data

Morning: starting point -> destination: company

Evening: starting point -> destination: high-speed railway station

Weekend: starting point: home -> destination: shopping center

Then, when you open the app each morning, the taxi-hailing software recommends your company as the destination, which greatly reduces the time users spend booking a ride. As shown in the picture below, I entered the name of my residential community and the app immediately recommended three places, Hangzhou East Railway Station among them, because that combination has the highest support among my usual trips.
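The recommendation step described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the trip records, the time-slot labels and the function name `recommend_destinations` are all hypothetical and are not taken from any real taxi-hailing system. For a given time slot and starting point, it ranks destinations by their support among the trips in that slot.

```python
from collections import Counter

# Hypothetical trip history: (time slot, starting point, destination).
TRIPS = [
    ("morning", "home", "company"),
    ("morning", "home", "company"),
    ("morning", "home", "railway station"),
    ("evening", "company", "railway station"),
    ("evening", "company", "railway station"),
    ("evening", "company", "home"),
    ("weekend", "home", "shopping center"),
]


def recommend_destinations(trips, slot, start, top_n=3):
    """Rank destinations for (slot, start) by support within that slot's trips."""
    slot_trips = [trip for trip in trips if trip[0] == slot]
    if not slot_trips:
        return []
    # Count how often each destination follows this starting point.
    counts = Counter(dest for _, origin, dest in slot_trips if origin == start)
    total = len(slot_trips)
    # Support = trips with this (start, destination) pair / all trips in the slot.
    return [(dest, n / total) for dest, n in counts.most_common(top_n)]


# In the morning, "company" ranks first for trips starting at "home".
print(recommend_destinations(TRIPS, "morning", "home"))
```

A real system would also have to handle location matching and sparse history; the sketch only shows the "highest support wins" idea.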

4. Automatic mining of risk control strategy

From historical listing titles, a pattern was found: a product title containing "old driver" (internet slang) + "Baidu Netdisk" -> high pornography risk. Any new title that contains both terms can then be rejected directly.

From historical behavior data, it was found that dormant user + login from an unusual location + password change -> the account has probably been stolen. If an account meets all three conditions, it is frozen or sent for verification immediately, which heads off the risk of account theft.

From historical data, if users A and B are found to log in within 10 seconds of each other every day, A and B can be considered associated; they may belong to the same batch of machine-controlled bonus-hunting ("wool-pulling") accounts.
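The login-timing rule can be sketched as follows. This is an illustrative sketch under assumptions, not a production risk-control rule: the log format, the sample data and the name `linked_accounts` are hypothetical. It counts, for each pair of accounts, the share of observed days on which the two logged in within a given window, and keeps the pairs whose share reaches a threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical login log: (day, account, login time in seconds since midnight).
LOGINS = [
    (1, "A", 32400), (1, "B", 32408), (1, "C", 50000),
    (2, "A", 33000), (2, "B", 33005), (2, "C", 61000),
    (3, "A", 31000), (3, "B", 31009), (3, "C", 31500),
]


def linked_accounts(logins, window=10, min_support=1.0):
    """Return account pairs that log in within `window` seconds of each other
    on at least `min_support` of the observed days, with that share of days."""
    by_day = {}
    for day, account, ts in logins:
        by_day.setdefault(day, []).append((account, ts))

    pair_days = Counter()
    for events in by_day.values():
        close_pairs = set()
        for (a1, t1), (a2, t2) in combinations(events, 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                close_pairs.add(tuple(sorted((a1, a2))))
        pair_days.update(close_pairs)  # count each pair at most once per day

    n_days = len(by_day)
    return {
        pair: n / n_days
        for pair, n in pair_days.items()
        if n / n_days >= min_support
    }


# A and B log in within 10 seconds of each other on all three days.
print(linked_accounts(LOGINS))  # {('A', 'B'): 1.0}
```

Lowering `min_support` below 1.0 would tolerate days on which the pair does not co-occur, at the cost of more false positives.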

Automatic mining of risk control strategies is also the area we will focus on and explain later.

III. The three most important concepts

There are three core concepts of association rules that need to be understood: support, confidence and lift. The classic beer-and-diapers case is used below to illustrate them. Suppose the following is a list of items purchased by several customers:

1. Support

Support: the ratio of the number of orders in which a combination of goods appears to the total number of orders.

In this example, "milk" appears in four orders, so the support of "milk" across these five orders is 4/5 = 0.8.

Similarly, "milk + bread" appears in three orders, so the support of "milk + bread" across these five orders is 3/5 = 0.6.

Simple enough, isn't it? Try calculating the support of "diaper + beer" by hand.
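The support calculation can be sketched in Python as follows. The five orders here are hypothetical: they are not the article's original order list, only a stand-in chosen so that the counts match the figures quoted in the text ("milk" in 4 of 5 orders, "milk + bread" in 3 of 5). The function name `support` is likewise an illustrative choice.

```python
# Hypothetical orders, chosen only to match the counts quoted in the text.
ORDERS = [
    {"milk", "bread", "diaper", "beer"},
    {"milk", "diaper", "beer", "cola"},
    {"milk", "bread", "diaper", "beer"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "bread"},
]


def support(itemset, orders):
    """Fraction of orders that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= order for order in orders) / len(orders)


print(support({"milk"}, ORDERS))           # 0.8
print(support({"milk", "bread"}, ORDERS))  # 0.6
```

The same call with `{"diaper", "beer"}` answers the exercise for whichever order list you are working with.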

2. Confidence

Confidence: the probability that a customer who buys item A also buys item B. In other words, it is the support of B within the subset of orders that contain A, i.e. the proportion of those orders that also contain B.

Confidence (milk → beer) = 3/4 = 0.75, which is the share of orders containing milk that also contain beer, as shown in the table below.

Confidence (beer → milk) = 3/4 = 0.75, which is the probability that a customer who buys beer also buys milk.

Confidence (beer → diaper) = 4/4 = 1.0, which is the probability that a customer who buys beer also buys diapers. The table below shows 100%.
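The three confidence values can be reproduced with a short Python sketch. As an assumption, the orders below are hypothetical stand-ins built to match the counts quoted in the text (milk in 4 orders, beer in 4, milk and beer together in 3, beer and diapers together in 4); they are not the original order list, and `confidence` is an illustrative function name.

```python
# Hypothetical orders, chosen only to match the counts quoted in the text.
ORDERS = [
    {"milk", "bread", "diaper", "beer"},
    {"milk", "diaper", "beer", "cola"},
    {"milk", "bread", "diaper", "beer"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "bread"},
]


def confidence(antecedent, consequent, orders):
    """Share of the orders containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Restrict to the subset of orders that contain A.
    with_a = [order for order in orders if antecedent <= order]
    if not with_a:
        raise ValueError("antecedent never occurs in the orders")
    # Within that subset, count the orders that also contain B.
    return sum(consequent <= order for order in with_a) / len(with_a)


print(confidence({"milk"}, {"beer"}, ORDERS))    # 0.75
print(confidence({"beer"}, {"milk"}, ORDERS))    # 0.75
print(confidence({"beer"}, {"diaper"}, ORDERS))  # 1.0
```

Confidence is not symmetric in general; the first two values coincide here only because milk and beer happen to appear in the same number of orders.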

From the examples above, confidence is really a conditional probability: given that A occurs, what is the probability that B occurs? Knowing only support and confidence is often not enough; you also need the concept of lift. For example, suppose that given A, the probability of B is 80%. Does that mean A and B are related? Not necessarily. If B appears in 95% of all orders anyway, then the presence of A actually reduces the probability of B.

3. Lift

Lift: when doing product recommendation or building risk control strategies, lift is what we focus on, because it measures how much the occurrence of A raises the probability of B.

Lift (A → B) = confidence (A → B) / support (B)

So there are three possibilities for lift:

Lift (A → B) > 1: A raises the probability of B

Lift (A → B) = 1: A neither raises nor lowers the probability of B

Lift (A → B) < 1: A lowers the probability of B
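A minimal Python sketch of lift, computed as confidence (A → B) divided by support (B), is given here. The orders are hypothetical stand-ins consistent with the counts quoted earlier, not the article's original data, and the names `support`, `confidence`, `lift` and `describe` are illustrative. The last line applies the same ratio to the 80% versus 95% example from the text.

```python
# Hypothetical orders, chosen only to match the counts quoted in the text.
ORDERS = [
    {"milk", "bread", "diaper", "beer"},
    {"milk", "diaper", "beer", "cola"},
    {"milk", "bread", "diaper", "beer"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "bread"},
]


def support(itemset, orders):
    """Fraction of orders that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= order for order in orders) / len(orders)


def confidence(antecedent, consequent, orders):
    """Support of A and B together divided by the support of A."""
    both = set(antecedent) | set(consequent)
    return support(both, orders) / support(antecedent, orders)


def lift(antecedent, consequent, orders):
    """Lift (A -> B) = confidence (A -> B) / support (B)."""
    return confidence(antecedent, consequent, orders) / support(consequent, orders)


def describe(value, tol=1e-9):
    """Map a lift value onto the three cases: above, equal to, or below 1."""
    if value > 1 + tol:
        return "raises"
    if value < 1 - tol:
        return "lowers"
    return "no effect"


print(describe(lift({"beer"}, {"diaper"}, ORDERS)))  # raises
print(describe(lift({"milk"}, {"beer"}, ORDERS)))    # lowers
print(describe(0.80 / 0.95))                         # lowers (80% vs a 95% baseline)
```

The tolerance in `describe` is a design choice to keep floating-point noise from turning a lift of exactly 1 into a spurious "raises" or "lowers".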
