Python Crawler Preparation, Part 4: Defining an Opener and Setting a Proxy IP

2025-03-26 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Handler processors and custom Openers

An Opener is an instance of urllib2.OpenerDirector. The urlopen() function we have been using all along is itself a special opener, one that the module builds for us.

However, the basic urlopen() method does not support advanced HTTP/HTTPS features such as proxies or cookies. To support these features, you need to:

1. Use the relevant Handler processor to create a processor object for the specific feature;

2. Pass these handler objects to the urllib2.build_opener() method to create a custom opener object;

3. Call the custom opener object's open() method to send the request.

If all requests in the program should use the custom opener, you can call urllib2.install_opener() to register it as the global opener. From then on, every call to urlopen() will also go through this opener (choose according to your own needs).
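As a minimal sketch of this "install globally" step, here is the same idea in Python 3, where urllib2's functionality lives in urllib.request (the handler choice below is just an example):

```python
import urllib.request

# Build a handler and a custom opener from it
http_handler = urllib.request.HTTPHandler()
opener = urllib.request.build_opener(http_handler)

# Register the opener globally: from now on, urllib.request.urlopen()
# routes every request through `opener` as well.
urllib.request.install_opener(opener)
```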

Custom opener

# _*_ coding:utf-8 _*_
import urllib2

# Build an HTTPHandler processor object that supports plain HTTP requests
http_handler = urllib2.HTTPHandler()

# Call build_opener() to build a custom opener object;
# the argument is the processor object built above
opener = urllib2.build_opener(http_handler)

request = urllib2.Request('http://www.139.com')

# Call the custom opener object's open() method to send the request
response = opener.open(request)
print response.read()
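For readers on Python 3, the three-step pattern above translates directly; this is a sketch of a port, assuming only that the target URL from the article is reachable (the network call itself is left commented out):

```python
import urllib.request

# Step 1: build an HTTPHandler processor object
http_handler = urllib.request.HTTPHandler()

# Step 2: build a custom opener from the handler
opener = urllib.request.build_opener(http_handler)

# Step 3: build a Request (URL taken from the article) and send it
request = urllib.request.Request('http://www.139.com')

# Sending the request needs network access, so it is shown but not run here:
# response = opener.open(request)
# print(response.read())
```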

Setting a proxy IP

Many websites track the number of visits from each IP over a period of time (through traffic statistics, system logs, and so on). If the access pattern does not look like a normal human visitor, the site will ban that IP.

So we can set up several proxy servers and switch to a different proxy every so often; even if one IP gets banned, we can continue crawling from another IP.
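The "switch every so often" idea can be sketched as a small proxy pool that a crawler draws from at random. This is an illustration only: both proxy addresses below are placeholders (the first is the one quoted later in the article, the second is made up), not known-working servers.

```python
import random

# A pool of proxy settings in the dict format ProxyHandler expects.
# Both addresses are placeholders for illustration.
proxy_pool = [
    {'http': 'http://118.114.77.47:8080'},
    {'http': 'http://10.0.0.2:3128'},
]

def pick_proxy():
    """Pick a proxy at random for the next batch of requests."""
    return random.choice(proxy_pool)

proxy = pick_proxy()
```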

In urllib2, ProxyHandler is used to configure a proxy server, and a custom opener built from it sends requests through the proxy:

Free proxy IP sites: http://www.xicidaili.com/ ; https://www.kuaidaili.com/free/inha/

# _*_ coding:utf-8 _*_
import urllib2

# Build a ProxyHandler processor object; the parameter is a dict
# containing the proxy type and the proxy server's IP + port
httpproxy_handler = urllib2.ProxyHandler({'http': '118.114.77.47:8080'})

# Build a custom opener that uses the proxy
opener = urllib2.build_opener(httpproxy_handler)

request = urllib2.Request('http://www.baidu.com/s')

# 1. Written this way, only requests sent with opener.open() use the
#    custom proxy; urlopen() does not.
response = opener.open(request)

# 2. Written this way instead, the opener is applied globally, and all
#    requests, whether via opener.open() or urlopen(), use the proxy.
# urllib2.install_opener(opener)
# response = urllib2.urlopen(request)

print response.read()
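The same proxy setup in Python 3 looks like this; a sketch of a port using urllib.request, with the proxy address and URL taken from the article (the proxy is almost certainly no longer live, so the actual network calls are left commented out):

```python
import urllib.request

# Build a ProxyHandler; the dict maps the proxy type to the proxy
# server's address (IP:port from the article, scheme prefix added).
httpproxy_handler = urllib.request.ProxyHandler(
    {'http': 'http://118.114.77.47:8080'})

# Build a custom opener that uses the proxy
opener = urllib.request.build_opener(httpproxy_handler)
request = urllib.request.Request('http://www.baidu.com/s')

# Option 1: only opener.open() goes through the proxy:
# response = opener.open(request)

# Option 2: install the opener globally so urlopen() uses it too:
# urllib.request.install_opener(opener)
# response = urllib.request.urlopen(request)
```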
