How to use Python to analyze the Affinity of goods

2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains in detail how to use Python to analyze the affinity of goods. The technique is very practical, so it is shared here as a reference. I hope you take something away from it.

01 Introduction to data mining

Data mining aims to enable computers to make decisions based on existing data. Such a decision might be predicting next year's sales, estimating a population, blocking spam, or detecting the language of a website. Data mining already has many applications, and new application areas keep emerging.

Data mining draws on algorithms, optimization, statistics, engineering and computer science. It also borrows concepts from linguistics, neuroscience, urban planning and other fields. To get the most out of data mining, a solid grounding in algorithms is essential. (Readers are encouraged to practice on LeetCode.)

Generally speaking, data mining has three basic steps:

1. Create a dataset. The dataset should directly reflect real events.
2. Select an algorithm. Choose one that is suited to the data.
3. Optimize the algorithm. Every data mining algorithm has parameters, either built into the algorithm or supplied by the user, and these parameters affect the decisions the algorithm makes.

02 Affinity analysis case

Let's illustrate with an example. Supermarkets generally group goods into areas by category, but there are exceptions: sometimes an item from a different category is shelved right next to another. If you have noticed this and wondered why, the arrangement is deliberate, and the reason is the affinity between the goods: products that tend to be bought together are placed together.

Prerequisite knowledge:

(1) defaultdict(int): missing keys are initialized to 0
(2) defaultdict(float): missing keys are initialized to 0.0
(3) defaultdict(str): missing keys are initialized to ''

defaultdict(default_factory) builds a dictionary-like object that supplies values for missing keys by itself. When a key does not exist, it calls the factory with no arguments and stores the result as that key's value. For example, defaultdict(int) creates a dictionary whose default values are ints: even if a key does not exist, d[key] returns int(), which is 0.
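The behavior described above can be checked with a short sketch. The item names and variable names below are made up for illustration and are not part of the article's dataset.

```python
from collections import defaultdict

# int() returns 0, so a missing key starts at 0 and can be incremented directly.
purchase_counts = defaultdict(int)
for item in ["bread", "milk", "bread"]:  # hypothetical purchases
    purchase_counts[item] += 1

# float() returns 0.0 and str() returns "" in the same way.
ratios = defaultdict(float)
labels = defaultdict(str)

print(purchase_counts["bread"])   # 2
print(purchase_counts["cheese"])  # 0 (the lookup also creates the key)
print(ratios["anything"])         # 0.0
print(repr(labels["anything"]))   # ''
```

Note that merely reading a missing key inserts it, which is convenient for counters but can surprise you if you later iterate over the keys.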

03 Code implementation

Now move on to the code section:

    import numpy as np
    from collections import defaultdict

    dataset_filename = "affinity_dataset.txt"
    # Item names, in the same order as the columns of the dataset
    features = ["bread", "milk", "cheese", "apple", "banana"]
    # Each row is one transaction: 1 means the item was bought, 0 means it was not
    X = np.loadtxt(dataset_filename)
    print(X[:5])  # print the first five transactions
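The listing above expects affinity_dataset.txt to already exist, and the article does not include the file. If you do not have it, one way to follow along is to generate a stand-in. This is only a sketch with invented data: the seed, the 100 rows and the 0.4 purchase probability are assumptions, so the counts you get will not match the original dataset.

```python
import numpy as np

rng = np.random.default_rng(14)  # fixed seed so the file is reproducible
n_samples, n_features = 100, 5   # assumed size: 100 transactions, 5 items

# Hypothetical data: each cell is 1 with probability 0.4, otherwise 0.
X_demo = (rng.random((n_samples, n_features)) < 0.4).astype(int)

# fmt="%d" writes plain 0/1 integers; np.loadtxt reads them back as floats.
np.savetxt("affinity_dataset.txt", X_demo, fmt="%d")

X_loaded = np.loadtxt("affinity_dataset.txt")
print(X_loaded.shape)  # (100, 5)
```

Purely random columns carry no real affinity between items, so treat the resulting rules as a way to exercise the code rather than as findings.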

Count the number of people who buy apples and bananas:

    num_apple_purchases = 0  # counter for apple buyers
    for sample in X:
        if sample[3] == 1:  # column 3 is "apple"
            num_apple_purchases += 1
    print("{0} people bought apples".format(num_apple_purchases))

    num_banana_purchases = 0  # counter for banana buyers
    for sample in X:
        if sample[4] == 1:  # column 4 is "banana"
            num_banana_purchases += 1
    print("{0} people bought bananas".format(num_banana_purchases))
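Because every cell is 0 or 1, the same counts can also be obtained for all items at once by summing each column. The sketch below uses a small hand-written array in place of the real file, so the numbers are illustrative only.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apple", "banana"]

# Hand-written stand-in for the dataset (rows = customers, columns = items).
X = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

# Summing a 0/1 column counts the rows in which that item was bought.
purchases_per_item = X.sum(axis=0)
for name, count in zip(features, purchases_per_item):
    print("{0} people bought {1}".format(count, name))
```

The explicit loops in the article are easier to read when you are new to NumPy; the column sum is simply shorter and faster on large datasets.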

Now, to calculate the support and confidence of every rule, we tally the results in dictionaries: how often each rule holds, how often it fails, and how often each premise occurs.

    valid_rules = defaultdict(int)     # premise and conclusion were both bought
    invalid_rules = defaultdict(int)   # premise was bought, conclusion was not
    num_occurances = defaultdict(int)  # how often each premise was bought

    n_features = len(features)  # cover every item, including the last column
    for sample in X:
        for premise in range(n_features):
            if sample[premise] == 0:
                continue
            num_occurances[premise] += 1  # the customer bought the premise item
            for conclusion in range(n_features):
                if premise == conclusion:  # a rule "if X then X" is meaningless, skip it
                    continue
                if sample[conclusion] == 1:
                    valid_rules[(premise, conclusion)] += 1
                else:
                    invalid_rules[(premise, conclusion)] += 1

With all the necessary statistics in hand, we can calculate the support and confidence of each rule. Support is the number of times a rule holds in the dataset. Confidence is the fraction of the transactions containing the premise in which the conclusion was bought as well:

    support = valid_rules
    # Confidence: for each rule, divide the number of times it held
    # by the number of times its premise occurred
    confidence = defaultdict(float)
    for premise, conclusion in valid_rules.keys():
        rule = (premise, conclusion)
        confidence[rule] = valid_rules[rule] / num_occurances[premise]
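To make the two measures concrete, here is a small worked example on four hand-written transactions. The helper rule_stats is a hypothetical name introduced for this sketch and is not part of the article's code. It computes the same quantities as the dictionaries above, but for a single rule.

```python
import numpy as np

# Columns: bread, milk, cheese, apple, banana (hand-written toy data).
X = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

def rule_stats(X, premise, conclusion):
    """Return (support, confidence) for the rule 'premise -> conclusion'."""
    has_premise = X[:, premise] == 1
    has_both = has_premise & (X[:, conclusion] == 1)
    support = int(has_both.sum())          # times the rule held
    n_premise = int(has_premise.sum())     # times the premise occurred
    confidence = support / n_premise if n_premise else 0.0
    return support, confidence

# apple -> banana: 3 customers bought apples, 1 of them also bought bananas.
print(rule_stats(X, 3, 4))
# milk -> banana: 2 customers bought milk, both also bought bananas.
print(rule_stats(X, 1, 4))
```

In this toy data the apple rule has support 1 and confidence 1/3, while the milk rule has support 2 and confidence 1.0, which shows that a rule can hold rarely yet reliably, or the reverse.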

Next, define a function that prints a rule. It takes the feature indices of the premise and the conclusion, the support dictionary, the confidence dictionary, and the feature list.

    def print_rule(premise, conclusion, support, confidence, features):
        premise_name = features[premise]
        conclusion_name = features[conclusion]
        print("Rule: if a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
        print(" - Support: {0}".format(support[(premise, conclusion)]))
        print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))

    # Try it on one rule: milk (index 1) -> apple (index 3)
    premise = 1
    conclusion = 3
    features = ["bread", "milk", "cheese", "apple", "banana"]
    print_rule(premise, conclusion, support, confidence, features)

    # Sort the rules by support, highest first
    from operator import itemgetter
    sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)

When the sorting is complete, you can output the top five rules with the highest support:

    for index in range(5):
        print("Rule #{0}".format(index + 1))
        premise, conclusion = sorted_support[index][0]
        print_rule(premise, conclusion, support, confidence, features)
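The same ranking idea works for confidence. The article stops at support, so the following is only a sketch of one possible extension: the dictionaries hold hypothetical values rather than results from the real dataset, and the name sorted_confidence is introduced here for illustration.

```python
from operator import itemgetter

features = ["bread", "milk", "cheese", "apple", "banana"]

# Hypothetical statistics keyed by (premise, conclusion) feature indices.
support = {(1, 4): 2, (3, 4): 1, (0, 3): 1}
confidence = {(1, 4): 1.0, (3, 4): 1 / 3, (0, 3): 0.5}

# Sort rules by confidence, highest first.
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

# Slicing with [:5] avoids an IndexError when fewer than five rules exist.
for index, (rule, value) in enumerate(sorted_confidence[:5]):
    premise, conclusion = rule
    print("Rule #{0}: {1} -> {2} (confidence {3:.3f}, support {4})".format(
        index + 1, features[premise], features[conclusion], value, support[rule]))
```

Looking at both rankings side by side helps separate rules that are merely frequent from rules that are actually reliable.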

This concludes the article on how to use Python for affinity analysis of goods. I hope it proves helpful. If you found it useful, please share it so that more people can see it.
