
Scrapy: setting up a "request pool"

2025-02-26 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Common status codes in crawler requests and how to handle them

200: the request succeeded. Handling: read the content of the response and process it.

201: the request completed and a new resource was created. The URI of the newly created resource is given in the response entity. Handling: not encountered in crawlers.

202: the request was accepted, but processing has not finished. Handling: block and wait.

204: the server fulfilled the request but returned no new content. A user agent does not need to update its document view. Handling: discard.

300: this status code is not used directly by HTTP/1.0 applications; it serves only as the default interpretation of 3XX responses. Multiple versions of the requested resource are available. Handling: process it further if the program can handle it; otherwise discard.

301: the requested resource has been assigned a permanent URL, and future access should go through that URL. Handling: redirect to the assigned URL.

302: the requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.

304: the requested resource has not been modified. Handling: discard.

400: bad request. Handling: discard.

401: unauthorized. Handling: discard.

403: forbidden. Handling: discard.

404: not found. Handling: discard.

5XX: a status code starting with "5" means the server encountered an error and cannot continue processing the request. Handling: discard.
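The handling rules above can be sketched as a small lookup function. This is only an illustration, not part of Scrapy: the names `Action` and `action_for_status` are hypothetical, and the sketch assumes that 201 and 300 are discarded when the program has no special handling for them.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical set of actions a crawler can take for a response."""
    PROCESS = "process"    # read the body and parse it
    WAIT = "wait"          # block and wait for processing to finish
    REDIRECT = "redirect"  # follow the URL the server points to
    DISCARD = "discard"    # drop the response


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to the handling listed in the article."""
    if status == 200:
        return Action.PROCESS
    if status == 202:
        return Action.WAIT
    if status in (301, 302):
        return Action.REDIRECT
    # 204, 304, 4XX and 5XX are discarded; so are 201 and 300 in this
    # sketch (an assumption, since the article leaves them open).
    return Action.DISCARD
```

For example, `action_for_status(302)` returns `Action.REDIRECT`, while `action_for_status(503)` returns `Action.DISCARD`.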

Without further ado, here is the code:

import logging
import random

from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

logger = logging.getLogger(__name__)


class RotateUserAgentMiddleware(UserAgentMiddleware):
    # For more user-agent strings, see
    # http://www.useragentstring.com/pages/useragentstring.php
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
        "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    ]

    def process_request(self, request, spider):
        # Pick a random user agent for each outgoing request
        ua = random.choice(self.user_agent_list)
        if ua:
            # Log the user agent currently in use
            logger.debug("Current UserAgent: %s", ua)
            # setdefault only sets the header if the request has none yet
            request.headers.setdefault("User-Agent", ua)
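For the middleware to take effect it has to be registered in the project's settings.py. The fragment below is a minimal sketch: the module path `myproject.middlewares` is an assumption (use wherever `RotateUserAgentMiddleware` actually lives), and the priority 400 is simply chosen here to match the slot of the built-in user-agent middleware.

```python
# settings.py (sketch; "myproject.middlewares" is a hypothetical module path)
DOWNLOADER_MIDDLEWARES = {
    # Disable the built-in middleware so it does not set User-Agent first
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Enable the rotating middleware in its place
    "myproject.middlewares.RotateUserAgentMiddleware": 400,
}
```

Disabling the built-in middleware matters because both classes call `request.headers.setdefault`, so whichever runs first decides the header.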
