2025-02-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Common errors in crawler requests
200: The request succeeded. Handling: get the content of the response and process it.
201: The request is complete and a new resource has been created. The URI of the newly created resource is available in the response entity. Handling: not encountered in a crawler.
202: The request has been accepted, but processing has not finished. Handling: block and wait.
204: The server has fulfilled the request, but there is no new information to return. If the client is a user agent, it does not need to update its document view. Handling: discard.
300: This status code is not used directly by HTTP/1.0 applications; it only serves as the default interpretation of 3XX responses. Multiple resources are available for the request. Handling: process further if the program can handle it, otherwise discard.
301: The requested resource has been assigned a permanent URL, so it can be accessed through that URL in the future. Handling: redirect to the assigned URL.
302: The requested resource temporarily resides at a different URL. Handling: redirect to the temporary URL.
304: The requested resource has not been modified. Handling: discard.
400: Bad request. Handling: discard.
401: Unauthorized. Handling: discard.
403: Forbidden. Handling: discard.
404: Not found. Handling: discard.
5XX: A status code starting with "5" means the server encountered an error and cannot continue processing the request. Handling: discard.
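The handling rules above can be sketched as a small lookup function. This is only an illustration, not part of the original article's code: the names `Action` and `classify_status` are hypothetical, and treating every unlisted code (including 201) as "discard" is an assumption.

```python
from enum import Enum


class Action(Enum):
    """What the crawler should do with a response (hypothetical names)."""
    PROCESS = "process"
    WAIT = "wait"
    REDIRECT = "redirect"
    DISCARD = "discard"


# Status codes that have an explicit rule in the list above.
_RULES = {
    200: Action.PROCESS,
    202: Action.WAIT,
    204: Action.DISCARD,
    301: Action.REDIRECT,
    302: Action.REDIRECT,
    304: Action.DISCARD,
    400: Action.DISCARD,
    401: Action.DISCARD,
    403: Action.DISCARD,
    404: Action.DISCARD,
}


def classify_status(status: int, can_handle_choices: bool = False) -> Action:
    """Map an HTTP status code to a crawler action."""
    if status == 300:
        # 300: process further only if the program knows how to choose.
        return Action.PROCESS if can_handle_choices else Action.DISCARD
    if 500 <= status <= 599:
        # Any 5XX server error is discarded.
        return Action.DISCARD
    # Assumption: codes without a rule in the list are discarded.
    return _RULES.get(status, Action.DISCARD)


if __name__ == "__main__":
    print(classify_status(404).value)  # discard
```

In Scrapy itself, much of this is likely already covered by the built-in redirect and HTTP-error middlewares, so a table like this is mainly useful for custom crawlers or for logging decisions.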
Without further ado, here is the code:

    import logging
    import random

    from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

    logger = logging.getLogger(__name__)


    class RotateUserAgentMiddleware(UserAgentMiddleware):
        # For more user-agent strings, see
        # http://www.useragentstring.com/pages/useragentstring.php
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
            "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
            "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
            "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
            "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
            "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
            "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
            "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
            "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
            "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        ]

        def process_request(self, request, spider):
            # Pick a random user agent for each outgoing request.
            ua = random.choice(self.user_agent_list)
            if ua:
                # Display the user agent currently in use.
                print("* Current UserAgent: %s *" % ua)
                # Record it in the log as well.
                logger.debug("Current UserAgent: %s", ua)
                # Set the header only if the request does not already have one.
                request.headers.setdefault("User-Agent", ua)
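For the middleware to take effect, it has to be registered in the project settings. The fragment below is a minimal sketch of that step, assuming the class lives in a module named `myproject.middlewares` (a hypothetical path; replace it with your own) and using 400 as an assumed priority value.

```python
# settings.py (fragment)
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in user-agent middleware so it does not
    # set the header before the rotating middleware runs.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Hypothetical module path; 400 is an assumed priority, adjust as needed.
    "myproject.middlewares.RotateUserAgentMiddleware": 400,
}
```

Setting the built-in entry to `None` matters because `process_request` uses `setdefault`, which leaves an already-present `User-Agent` header untouched.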