
How to Crawl JD.com's Product Rankings with Python

2025-02-25 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article shows how to crawl JD.com's product rankings with Python. The editor finds it very practical and shares it here; I hope you get something out of it.

Establish the goal

A crawler should be written because there is an actual need for it; otherwise the exercise is meaningless. Once you know what you want to do, let's get to the practical part. I am also a beginner, so please be kind. We will use:

Python 3

The urllib library

The pyquery library

Before crawling, we need to understand how these Python libraries are used.
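As a quick warm-up, here is a minimal sketch of the one urllib feature the crawler depends on: non-ASCII keywords must be percent-encoded before they can appear in a URL, which is exactly what `urllib.parse.quote` does in the code later on.

```python
from urllib.parse import quote, unquote

# Percent-encode a Chinese search keyword for safe use inside a URL
kw = "电脑"  # "computer" in Chinese
encoded = quote(kw)
print(encoded)           # %E7%94%B5%E8%84%91
print(unquote(encoded))  # decodes back to 电脑
```

`quote` encodes the keyword's UTF-8 bytes, and `unquote` reverses the operation, so the round trip is lossless.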

Hands-on practice

Step 1:

First, search for an arbitrary term; here we use "computer" as an example. Looking at the resulting link, we find that our search keywords are embedded in the URL, which is the key point. That completes the first step: obtaining the link.

In the screenshot we can see that the values after the keyword and wq parameters correspond to the term we searched for, so we will certainly use them later. Continuing the analysis, let's capture the traffic; I assume everyone knows how to open the developer tools with F12. Observing the requests, we find that when we click search, the first link visited is the top one, its parameter is the keyword we entered, and the response is the page's HTML source, which matches the code of the current page.
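The URL structure observed above can be sketched with the standard library; the parameter names keyword, wq, and click are taken from the captured request, and the example keyword is hypothetical.

```python
from urllib.parse import urlencode

# Rebuild the search URL seen in the captured request: the search term
# appears in both the keyword and wq parameters
keyword = "computer"
query = urlencode({"keyword": keyword, "wq": keyword, "click": "0"})
url = "https://search.jd.com/Search?" + query
print(url)  # https://search.jd.com/Search?keyword=computer&wq=computer&click=0
```

`urlencode` also percent-encodes values, so a Chinese keyword would be handled correctly here as well.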

Step 2:

From the analysis above we have determined that JD.com's page structure is fairly simple: the server returns the page source directly, so we only need the pyquery library's CSS selectors to extract the content we want. We also found that the psort parameter controls how the search results are sorted, by sales volume, price, and so on. This was discovered by simply clicking the sort options and watching the URL; every site's structure is different, so it is not explained further here. The code below writes a separate rule for each parameter value; you can experiment yourself to see what each value does. The code is simple enough to read on its own.
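The psort values discovered above can be summarized in a small lookup table. This is a sketch of the same mapping the full script implements with an if/elif chain; the helper name sort_fragment is my own.

```python
# Mapping from the user's sort choice to JD's psort query fragment,
# as observed by experiment (0, the comprehensive ranking, adds nothing)
PSORT = {
    0: "",           # comprehensive (default ranking)
    1: "&psort=1",   # price: high to low
    2: "&psort=2",   # price: low to high
    3: "&psort=3",   # sales volume
    4: "&psort=4",   # number of reviews
    5: "&psort=5",   # newest products
}

def sort_fragment(choice):
    """Return the query-string fragment for a sort choice (unknown -> '')."""
    return PSORT.get(choice, "")

print(sort_fragment(3))  # &psort=3
```

A dict lookup with a default is shorter and easier to extend than the if/elif chain, which is why it is shown here as an alternative.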

```python
import urllib.request
import urllib.parse
from pyquery import PyQuery as pq

# Percent-encode the keyword so it can be placed in the URL
shuru = input("This program queries JD.com product rankings!\n"
              "Please enter the keyword to query:\n")
shuru = urllib.parse.quote(shuru)

# Choose how the search results are sorted
guize = input("Please choose the sort order:\n"
              "0 = comprehensive, 1 = price high to low, 2 = price low to high,\n"
              "3 = sales volume, 4 = number of reviews, 5 = newest:\n")
guize = int(guize)
if guize == 0:
    psort = ''
elif guize == 1:
    psort = '&psort=1'
elif guize == 2:
    psort = '&psort=2'
elif guize == 3:
    psort = '&psort=3'
elif guize == 4:
    psort = '&psort=4'
elif guize == 5:
    psort = '&psort=5'
print(psort)

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 '
                  'Core/1.70.3741.400 QQBrowser/10.5.3863.400',
}

# psort controls the ranking: 3 = sales volume, 4 = number of reviews,
# 5 = newest, 2 = price ascending, 1 = price descending;
# the comprehensive ranking has no psort parameter
url = "https://search.jd.com/Search?keyword=" + shuru + "&wq=" + shuru + psort + "&click=0"
print('The query link is: %s' % url)

request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request)

# The steps above fetch the page source; now parse it with pyquery
# and extract the needed fields with CSS selectors
res = pq(response.read().decode('utf-8'))
lis = res('.gl-warp.clearfix .gl-i-wrap').items()
i = 1
for li in lis:
    ming = li('.p-name em').text()
    jia = li('.p-price strong i').text()
    print("No.%d  Name:\n%s" % (i, ming))
    print("Price: ¥%s\n" % jia)
    i = i + 1
    if i > 11:  # only show the top 10 results
        break
input('Press Enter to exit')
```

That is how to crawl JD.com's product rankings with Python. The editor believes some of these techniques may come up in daily work, and hopes you learned something from this article. For more details, please follow the industry information channel.
