An Example Analysis of Network Requests in a Python Crawler

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article covers three topics that come up when a Python crawler sends network requests: IP proxies, cookies, and exception handling. Each section gives a short explanation followed by a urllib example.

1. IP Proxy

Some websites track how many requests an IP address makes within a period of time and block the address if it makes too many. To avoid this, set up several proxy servers and switch between them at regular intervals. IP proxies fall into three categories:

① Transparent proxy: the destination website knows that a proxy is in use and also knows the source IP address, so it does not hide the crawler.

② Anonymous proxy: the destination website knows that a proxy is in use but does not know the source IP address.

③ High-anonymity proxy: the most concealed option; the destination website knows neither that a proxy is in use nor the source IP address.
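Switching proxies at intervals can be sketched with urllib's ProxyHandler. This is a minimal sketch, not the article's own code: the proxy addresses are placeholders from a documentation-only IP range, and the names PROXY_POOL, next_proxy, build_proxy_opener and fetch_via_proxy are made up for illustration.

```python
import itertools
import urllib.request

# Hypothetical proxy addresses (documentation-only IP range).
# Replace them with proxies you actually control or rent.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

# Endless round-robin iterator over the pool.
_proxy_cycle = itertools.cycle(PROXY_POOL)


def next_proxy():
    """Return the next proxy address in round-robin order."""
    return next(_proxy_cycle)


def build_proxy_opener(proxy_url):
    """Build an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, timeout=10):
    """Fetch url through the next proxy in the pool and return the raw body."""
    opener = build_proxy_opener(next_proxy())
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Each call to fetch_via_proxy uses a different proxy from the pool. Rotating per request is only one possible policy; rotating on a timer or after a blocked response would follow the same pattern.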

2. Cookie

HTTP is stateless. To work around this, the server generates a cookie the first time the browser sends it a request and returns the cookie in a response header; the browser stores it and carries it in the request header on every later request to that server.
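The round trip described above can be observed without touching a real website. The following is a minimal sketch that assumes a throwaway server on 127.0.0.1; the handler class, the cookie name session_id and its value are invented for illustration.

```python
import threading
import urllib.request
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: sets a cookie on first contact, echoes it afterwards."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        if not received:
            # First visit: ask the client to store a cookie.
            self.send_header("Set-Cookie", "session_id=abc123; Path=/")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        # The body is whatever Cookie header the client sent (empty at first).
        self.wfile.write(received.encode("utf-8"))

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def cookie_round_trip():
    """Request the same URL twice and return both response bodies."""
    server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        jar = cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore proxy settings from the environment
            urllib.request.HTTPCookieProcessor(jar),
        )
        with opener.open(url, timeout=5) as resp:
            first = resp.read().decode("utf-8")
        with opener.open(url, timeout=5) as resp:
            second = resp.read().decode("utf-8")
        return first, second
    finally:
        server.shutdown()
        server.server_close()


first_body, second_body = cookie_round_trip()
print(repr(first_body))
print(repr(second_body))
```

The first request carries no cookie, so the server sets one; the second request is sent through the same opener, and HTTPCookieProcessor attaches the stored cookie automatically.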

    import urllib.request
    from http import cookiejar

    filename = 'cookie.txt'


    # Get the cookie and save it to a file
    def get_cookie():
        # Instantiate a MozillaCookieJar to store the cookie
        cookie = cookiejar.MozillaCookieJar(filename)
        # Create the handler object
        handler = urllib.request.HTTPCookieProcessor(cookie)
        # Create the opener object
        opener = urllib.request.build_opener(handler)
        # URL to request
        url = 'https://tieba.baidu.com/index.html?traceid=#'
        # Send the request
        resp = opener.open(url)
        # Save the cookie file; the two flags also keep session and expired cookies
        cookie.save(ignore_discard=True, ignore_expires=True)


    # Read the cookie back from the file
    def use_cookie():
        # Instantiate a MozillaCookieJar
        cookie = cookiejar.MozillaCookieJar()
        # Load the cookie file with the same flags used when saving
        cookie.load(filename, ignore_discard=True, ignore_expires=True)
        print(cookie)


    get_cookie()
    use_cookie()

3. Exception handling

① urllib.error.URLError: raised by urllib.request when a request fails; its reason attribute holds the cause of the error.

    import urllib.request
    import urllib.error

    url = 'http://www.google.com'
    try:
        resp = urllib.request.urlopen(url)
    except urllib.error.URLError as e:
        # reason holds the cause of the failure
        print(e.reason)

Output result:

[WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

② urllib.error.HTTPError: a subclass of URLError raised when an HTTP or HTTPS request returns an error status.

It has three attributes:

code: the status code returned by the request

reason: the cause of the error

headers: the response headers returned by the request

    import urllib.request
    import urllib.error

    url = 'https://movie.douban.com/'
    try:
        resp = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        print('cause:', e.reason)
        print('response status code:', str(e.code))
        print('response header data:', e.headers)
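Because HTTPError is a subclass of URLError, a handler that needs to tell the two apart has to catch HTTPError first. The sketch below illustrates that ordering against a throwaway local server rather than a real site; NotFoundHandler, describe_failure and unused_port are hypothetical names introduced here.

```python
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Ignore proxy settings from the environment for these local requests.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def describe_failure(url):
    """Return a short label describing how the request to url ended."""
    try:
        with opener.open(url, timeout=5) as resp:
            return "ok %d" % resp.status
    except urllib.error.HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return "http error %d" % e.code
    except urllib.error.URLError as e:
        # No HTTP response at all (refused connection, timeout, DNS failure).
        return "url error: %s" % type(e.reason).__name__


def unused_port():
    """Return a local port that was free a moment ago (nothing listens on it)."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    http_result = describe_failure("http://127.0.0.1:%d/missing" % server.server_port)
    url_result = describe_failure("http://127.0.0.1:%d/" % unused_port())
finally:
    server.shutdown()
    server.server_close()

print(http_result)
print(url_result)
```

If the two except clauses were swapped, the URLError branch would also catch the 404 response and the status code would never be reported.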
