Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Python to crawl zongzi data on Taobao and analyze them

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article will explain in detail how to use Python to crawl zongzi data on Taobao and analyze it. The content of the article is of high quality, so the editor will share it for you to do a reference. I hope you will have a certain understanding of the relevant knowledge after reading this article.

Reptile

Crawling Taobao data, the method adopted this time is: Selenium controls the automatic operation of Chrome browser. In fact, we can also use the Ajax interface to construct links, but it is very cumbersome (including encryption keys, etc.). Using Selenium directly to simulate the browser will save a lot of things.

The most common problem is that the chromedriver driver does not match the version of Google browser, which can be easily solved. Next, we start to use selenium to grab Taobao goods, and use Xpath to parse to get the product name, price, number of payers, store name, shipping address information, and finally save the data locally.

The crawler process is shown below:

Selenium automatic crawling (needs to be scanned and logged in once on Taobao)

From selenium import webdriver# searches for items and gets the item page number def search_product (key_word): # location input box browser.find_element_by_id ("Q"). Send_keys (key_word) # defines the click button And click browser.find_element_by_class_name ('btn-search'). Click () # maximize window: in order for us to scan browser.maximize_window () # wait 15 seconds, give us enough time to scan time.sleep (15) # to locate this "page number" Get the text "page_info = browser.find_element_by_xpath ('/ / div [@ class=" total "]') .text # note that findall () returns a list, although there is only one element that is also a list at this time. Page = re.findall ("(\ d +)", page_info) [0] return page

See the end of the article for the detailed crawler code download.

Data collation

The data we crawled at this time:

Pre-collated data

The data is still rough, and there are a few problems we need to deal with:

Add column name

Remove duplicate data (there will be duplicates in the process of page crawling)

The record of the number of buyers is empty, which is replaced by 0 payment.

Convert the number of buyers into sales (note that some units are ten thousand)

Delete goods without shipping address and extract the provinces among them

Part of the code:

# Delete items without shipping address And extract the province df = df [df ['shipping address'] .notna ()] df ['province'] = df ['shipping address'] .str.split (') .apply (lambda xpurx [0]) # delete the redundant column df.drop (['payers', 'shipping address', 'num',' unit'], axis=1, inplace=True) # reset index df = df.reset_index (drop=True) df.head (10)

Collated data

In this way, we have completed the cleaning and finishing of the data, which is convenient for visualization in the next step.

By the way, do a sort to see which zongzi is the most expensive!

Df1 = df.sort_values (by= "price", axis=0, ascending=False) df1.iloc [: 5jie:]

I want to taste it.

Data visualization

In this article, we intend to use pyecharts for visual presentation. Some students may be using the old version (0.5x). The 1.x version of Pyecharts is not compatible with the old version (0.5x). If you can't import it, it may be this problem.

Visualization of all statements are based on v1.7.1, you can query your pyecharts version with the following statements:

Import pyechartsprint (pyecharts.__version__)

Sector chart

The most expensive zongzi, 1780 yuan, seems to be unaffordable, so what price does everyone buy?

First, it is divided according to the interval recommended by Taobao:

Def price_range (x): # divide the price range if x according to the recommendation of Taobao

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report