Analysis of the Process of Collecting Product Review Information from a Certain E-commerce Site with a Python Crawler

2025-01-15 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains the process of collecting product review information from a certain e-commerce site with a Python crawler. The content is straightforward and easy to follow, so work through it step by step with the editor.

Comment interface

1. Interface lookup

Click on any product to jump to its details page, then click on the product reviews.

Scroll down to see how many pages of comments are displayed; at most 100 pages are shown here.

If you simply refresh the page to find the real comment interface, it is buried among many other requests and quite troublesome to locate.

Instead, open the browser's developer tools, clear the captured requests, and then click straight to the second page of comments to see the interface information, as shown below.

Looking at the response, the comments field makes it easy to tell that this is the comment interface we are looking for; it also contains hot-comment information.

2. Parameter search

First, click to the second page and take a screenshot to record the request parameters.

Then click to the third page, search for productP directly in the search box on the left to filter out irrelevant requests, view the request parameters, and compare them with those of the previous page.

From this comparison, the following conclusions can be drawn:

productId is the ID of the current product; changing it lets you collect comments on different products.

page is the number of the page visited, counted from 0: the requested page parameter equals the page actually clicked minus 1.

3. Code testing

In the request code, a User-Agent and a Referer need to be added to the headers; here only 2 pages are crawled.
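Since the original code screenshot is not reproduced here, the following is a minimal sketch of such a crawler. The endpoint URL and response field names (productPageComments.action, referenceId, creationTime) are assumptions that should be checked against the interface you capture in devtools:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical comment endpoint; substitute the one captured in devtools.
COMMENT_URL = "https://club.jd.com/comment/productPageComments.action"

def build_params(product_id, page):
    # page is 0-based: the page actually clicked minus 1
    return {"productId": product_id, "page": page, "pageSize": 10}

def parse_comments(data):
    # keep only the product id, comment content, and comment time
    return [
        {
            "product_id": c.get("referenceId"),
            "content": c.get("content"),
            "time": c.get("creationTime"),
        }
        for c in data.get("comments", [])
    ]

def fetch_comments(product_id, pages=2):
    # only 2 pages are fetched, matching the setting described above
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Referer": f"https://item.jd.com/{product_id}.html",
    }
    results = []
    for page in range(pages):
        url = COMMENT_URL + "?" + urlencode(build_params(product_id, page))
        with urlopen(Request(url, headers=headers)) as resp:
            results.extend(parse_comments(json.loads(resp.read().decode("utf-8", "ignore"))))
    return results
```

Extra fields from the JSON response can be added to the dictionary in parse_comments as needed.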

Running the code extracts only the product ID, comment content, and comment time (the fields marked in the red box in the result screenshot).

If you want to extract additional fields, you can add them yourself in the code.

Search interface

1. Interface lookup

Take searching for food as an example: enter the keyword and click search.

Continue scrolling down to check how many pages of products are returned; here, too, at most 100 pages of information are returned.

2. Parameter search

Similarly, scroll and turn pages to observe how the parameters change.

A page displays a lot of product information, so it may be loaded lazily across separate requests. If you continue scrolling down, you can see that a new request is issued, with the parameters below (note: the newly added parameters can be ignored).

Then click on the third page

If the pattern is not yet obvious, keep clicking through pages and watch how the parameters change.

The construction logic of the interface parameters is as follows:

Each visible page issues two requests, and the initial value of page is 1.

The value of s increases by 25 per request, with an initial value of 1.

The other parameters stay unchanged, and some of the newly added parameters can be ignored.
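The parameter construction logic above can be sketched as follows (the parameter name keyword is an assumption; use whatever name appears in the captured request):

```python
def request_params(keyword, request_no):
    # request_no counts every request from 1; the page parameter equals it,
    # and s starts at 1 and grows by 25 per request
    return {"keyword": keyword, "page": request_no, "s": 25 * (request_no - 1) + 1}

def params_for_visible_page(keyword, visible_page):
    # each page the user sees is backed by two consecutive requests
    first = 2 * visible_page - 1
    return [request_params(keyword, first), request_params(keyword, first + 1)]
```

For visible page 1 this yields page=1, s=1 and page=2, s=26; for visible page 2 it yields page=3, s=51 and page=4, s=76, matching the pattern observed while paging.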

3. HTML page parsing

Directly navigate to the product location on the page, and you can see that all the product information is in the li tag under the ul tag.

Expand an li tag and you can see that the a tag under div/div carries the product title and product link, and the link contains the product_id we need to extract. Right-click the element and choose Copy > Copy XPath to get its location expression directly.
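A sketch of that extraction, using lxml and a simplified stand-in for the search result page (the real page's class names and nesting will differ, so adjust the XPath to the one you copied in devtools):

```python
from lxml import etree

# Simplified stand-in for a search result page; the real markup differs.
SAMPLE_HTML = """
<ul class="gl-warp">
  <li class="gl-item">
    <div><div>
      <a href="//item.jd.com/100012345.html" title="Sample product title"></a>
    </div></div>
  </li>
</ul>
"""

def parse_products(html):
    tree = etree.HTML(html)
    products = []
    for li in tree.xpath("//ul/li"):
        a = li.xpath("./div/div/a")[0]
        link = a.get("href")
        products.append({
            "title": a.get("title"),
            "link": link,
            # product_id sits between the last "/" and ".html" in the link
            "product_id": link.rsplit("/", 1)[-1].split(".")[0],
        })
    return products
```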

4. Code testing

In the request code, note that the keyword in the Referer header needs to be URL-encoded.
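A minimal sketch of the search request, showing the URL-encoded Referer; the search endpoint URL is an assumption and should be replaced with the one captured in devtools:

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical search endpoint; capture the real URL in devtools.
SEARCH_URL = "https://search.jd.com/s_new.php"

def build_referer(keyword):
    # a non-ASCII keyword (e.g. Chinese) must be URL-encoded in the Referer
    return "https://search.jd.com/Search?keyword=" + quote(keyword)

def fetch_search_page(keyword, page, s):
    query = urlencode({"keyword": keyword, "page": page, "s": s})
    req = Request(
        SEARCH_URL + "?" + query,
        headers={"User-Agent": "Mozilla/5.0", "Referer": build_referer(keyword)},
    )
    with urlopen(req) as resp:
        return resp.read().decode("utf-8", "ignore")
```

The returned HTML can then be fed to the XPath parsing step described above.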

Thank you for reading. That concludes the analysis of the process of collecting product review information from a certain e-commerce site with a Python crawler. After studying this article you should have a deeper understanding of the process, though the specifics still need to be verified in practice. The editor will continue to push more articles on related topics, so stay tuned!
