
How to use Python to analyze shopping data

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how to use Python to analyze shopping data. The method introduced here is simple, fast, and practical; let's walk through it together.

1. Analytical thinking

With today's data, what we will mainly do is exploratory analysis. First, sort out the available fields: title (from which the category is extracted), price, sales volume, store name, and delivery place. Then split each dimension in detail and choose a suitable visualization:

Category:

What are the TOP 10 for category sales? (table or horizontal bar chart)

Most popular (most frequently appearing) categories; (word cloud)

Price: the distribution of the price ranges of the New Year goods; (circle chart, to observe proportions)

Sales volume, shop name:

Which stores make up the sales TOP 10? (bar chart)

Linkage with category: for example, selecting nuts shows the corresponding stores' sales ranking; (linkage, using third-party tools)

Place of shipment: which cities have the highest sales? (map)

2. Crawl data

Crawling mainly uses selenium to simulate clicks in the browser, provided that selenium and a browser driver are already installed. Here I use Chrome: check the browser's version number and download the matching driver, which must correspond to the browser version.

pip install selenium

After the installation succeeds, run the following code, enter the keyword "New year goods", scan the QR code to log in, and wait while the program collects the data.

# coding=utf8
import re
import time
import csv
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Search for goods and get the number of result pages
def search_product(key_word):
    # Locate the input box and type the keyword
    browser.find_element_by_id("q").send_keys(key_word)
    # Find the search button and click it
    browser.find_element_by_class_name('btn-search').click()
    # Maximize the window so we can scan the QR code
    browser.maximize_window()
    # Wait 15 seconds to give us enough time to scan
    time.sleep(15)
    # Locate the "page count" element and get its text
    page_info = browser.find_element_by_xpath('//div[@class="total"]').text
    # Note that findall() returns a list, even when it holds only one element
    page = re.findall(r"(\d+)", page_info)[0]
    return page

# Collect the data
def get_data():
    # Page analysis shows all the information sits under the items node
    items = browser.find_elements_by_xpath('//div[@class="items"]/div[@class="item J_MouserOnverReq"]')
    for item in items:
        # Product description
        pro_desc = item.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text
        # Price
        pro_price = item.find_element_by_xpath('.//strong').text
        # Number of payments
        buy_num = item.find_element_by_xpath('.//div[@class="deal-cnt"]').text
        # Shop name
        shop = item.find_element_by_xpath('.//div[@class="shop"]/a').text
        # Place of shipment
        address = item.find_element_by_xpath('.//div[@class="location"]').text
        # print(pro_desc, pro_price, buy_num, shop, address)
        with open('{}.csv'.format(key_word), mode='a', newline='', encoding='utf-8-sig') as f:
            csv_writer = csv.writer(f, delimiter=',')
            csv_writer.writerow([pro_desc, pro_price, buy_num, shop, address])

def main():
    browser.get('https://www.taobao.com/')
    page = search_product(key_word)
    print(page)
    get_data()
    page_num = 1
    while int(page) != page_num:
        print("*" * 100)
        print("crawling page {}".format(page_num + 1))
        browser.get('https://s.taobao.com/search?q={}&s={}'.format(key_word, page_num * 44))
        browser.implicitly_wait(25)
        get_data()
        page_num += 1
    print("data crawling complete!")

if __name__ == '__main__':
    key_word = input("Please enter the item you want to search: ")
    option = Options()
    browser = webdriver.Chrome(chrome_options=option, executable_path=r"C:\Users\cherich\AppData\Local\Google\Chrome\Application\chromedriver.exe")
    main()
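The page count is pulled out of the "total" element's text with a regular expression; a standalone sketch of that step (the sample string here is made up for illustration):

```python
import re

# Hypothetical example of the text the "total" element might return
page_info = "共 100 页,"

# findall() returns a list even when there is only a single match
page = re.findall(r"(\d+)", page_info)[0]
print(page)  # prints "100"
```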

The collection results are as follows:

Data preparation is complete. Extracting categories from the titles is time-consuming, so it is recommended to use the pre-sorted data directly.

The general idea is to segment the title, recognize named entities, tag nouns, and pick out category names such as nuts, tea, and so on.
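As a minimal sketch of that idea, here is a keyword-lookup version (the keyword dictionary and titles are hypothetical; a real pipeline would use a Chinese word segmenter such as jieba plus part-of-speech tagging):

```python
# Hypothetical category keywords; a real run would derive these from noun tagging
CATEGORY_KEYWORDS = {
    'nuts': ['坚果', '瓜子', '核桃'],
    'tea': ['茶叶', '红茶', '绿茶'],
    'pastry': ['糕点', '点心'],
}

def extract_category(title):
    """Return the first category whose keyword appears in the title."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in title for kw in keywords):
            return category
    return 'other'

print(extract_category('三只松鼠 每日坚果 年货大礼包'))  # prints "nuts"
```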

3. Data cleaning

The cleaning here was mostly done in Excel; the data set is small, so Excel is very efficient, for example for building the price-range column. At this point, data cleaning is complete (you could hand the file to a third-party visualization tool), but if you like to tinker, read on to see how to analyze it with Python.
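The same price-range column can also be built in pandas instead of Excel; a sketch with made-up prices (the bin edges are assumptions, not the article's exact ones):

```python
import pandas as pd

# Hypothetical prices from the scraped CSV
df = pd.DataFrame({'price': [39.9, 89.0, 159.0, 259.0, 520.0]})

# Cut prices into labeled ranges, as the Excel step does
bins = [0, 100, 200, 300, 10000]
labels = ['0-100', '100-200', '200-300', '300+']
df['price range'] = pd.cut(df['price'], bins=bins, labels=labels)
print(df['price range'].tolist())  # ['0-100', '0-100', '100-200', '200-300', '300+']
```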

4. Visual analysis

1. Read the file

import pandas as pd
import matplotlib as mpl
mpl.rcParams['font.family'] = 'SimHei'
from wordcloud import WordCloud
from ast import literal_eval
import matplotlib.pyplot as plt

datas = pd.read_csv('./New year.csv', encoding='gbk')
datas

2. Visualization: word cloud map

li = []
for each in datas['keyword'].values:
    new_list = str(each).split(',')
    li.extend(new_list)

def func_pd(words):
    count_result = pd.Series(words).value_counts()
    return count_result.to_dict()

frequencies = func_pd(li)
frequencies.pop('other')

plt.figure(figsize=(10, 4), dpi=80)
wordcloud = WordCloud(font_path="STSONG.TTF", background_color='white',
                      width=700, height=350).fit_words(frequencies)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

The chart shows the word cloud: the most popular categories appear in the largest fonts, with nuts first, followed by tea, pastries, and so on.
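The func_pd helper above just counts word frequencies with pandas; a tiny demonstration on made-up tokens:

```python
import pandas as pd

# Hypothetical tokens split from the keyword column
words = ['nuts', 'tea', 'nuts', 'pastry', 'nuts', 'tea']

# value_counts() tallies occurrences, sorted most-frequent first
freq = pd.Series(words).value_counts().to_dict()
print(freq)  # {'nuts': 3, 'tea': 2, 'pastry': 1}
```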

3. Visualization: drawing a circle chart

# plt.pie
food_type = datas.groupby('price range').size()
plt.figure(figsize=(8, 4), dpi=80)
size = 0.3
# Only the first color survived transcription of the original color list
plt.pie(food_type, radius=1, labels=food_type.index, autopct='%.2f%%',
        colors=['#F4A460'],
        wedgeprops=dict(width=size, edgecolor='w'))
plt.title('percentage of annual price range', fontsize=18)
plt.legend(food_type.index, bbox_to_anchor=(1.5, 1.0))
plt.show()

The chart shows that a circle chart is similar to a pie chart: each wedge represents a part's proportion of the whole. The 0-100 yuan range accounts for about 33%, and the 100-200 yuan range for another 33%, which suggests most New Year goods are priced under 200 yuan.
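The percentages on the circle chart come from each group's size relative to the total; the same numbers can be checked directly (the sample rows here are hypothetical):

```python
import pandas as pd

# Hypothetical rows for the 'price range' column
df = pd.DataFrame({'price range': ['0-100', '0-100', '100-200',
                                   '200-300', '100-200', '0-100']})

# Group size divided by total row count gives each wedge's share
share = df.groupby('price range').size() / len(df)
print(share.round(2).to_dict())  # {'0-100': 0.5, '100-200': 0.33, '200-300': 0.17}
```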

4. Visualization: drawing bar graphs

data = datas.groupby(by='store name')['sales volume'].sum().sort_values(ascending=False).head(10)
plt.figure(figsize=(10, 4), dpi=80)
plt.ylabel('sales volume')
plt.title('Top ten shops by annual sales', fontsize=18)
# Some colors were lost in transcription; only the recoverable ones remain
colors = ['#F4A460', '#FFA07A', '#FFD700']
plt.bar(data.index, data.values, color=colors)
plt.xticks(rotation=45)
plt.show()

The chart shows the ranking of stores by sales volume. First place is the Three Squirrels flagship store; it seems everyone likes to buy snacks for the Spring Festival.

5. Visualization: drawing horizontal bar chart

foods = datas.groupby(by='category')['sales'].sum().sort_values(ascending=False).head(10)
foods.sort_values(ascending=True, inplace=True)
plt.figure(figsize=(10, 4), dpi=80)
plt.xlabel('sales volume')
plt.title("New Year's recommended purchase list", fontsize=18)
# Some colors were lost in transcription; only the recoverable ones remain
colors = ['#F4A460', '#CD96CD', '#EEB4B4', '#FFA07A', '#FFD700']
plt.barh(foods.index, foods.values, color=colors, height=1)
plt.show()

The chart shows the category sales ranking: first place is nuts, which confirms the earlier observation that people like to eat nuts.

At this point, I believe you have a deeper understanding of "how to use Python to analyze shopping data". Why not try it out in practice? For more related content, follow us and keep learning!

Welcome to subscribe to "Shulou Technology Information" to get the latest news, interesting stories, and hot topics in the IT industry, and to stay on top of the newest Internet news, technology news, and IT industry trends.
