
How does a crawler deploy proxy IPs?

Updated 2025-01-16. Source: SLTechnology News & Howtos (Shulou.com).

This article shows how a crawler can deploy proxy IPs. The content is short and straightforward, and I hope it helps answer your questions.

Ways for a crawler to deploy proxy IPs:

1. Use an IP proxy pool: for each crawl, randomly select one proxy IP from the pool and send the request through it.

```python
import urllib.request
import random

# Build the IP proxy pool (only the first port survived extraction;
# append ':port' to the other addresses before use)
ip_pool = ['58.221.55.58:808', '120.198.248.26', '221.229.166.55']

def fetch_via_proxy(ip_pool, url):
    # Randomly select an IP proxy from the pool
    proxy_ip = random.choice(ip_pool)
    print(proxy_ip)
    # Format the IP proxy for both schemes, so https:// URLs use it too
    proxy = urllib.request.ProxyHandler({'http': proxy_ip, 'https': proxy_ip})
    # Load the IP proxy into an opener
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    # Fetch through this opener (a plain urlopen() call would bypass the proxy)
    return opener.open(url).read().decode('utf-8', 'ignore')

data = fetch_via_proxy(ip_pool, 'https://www.baidu.com/?tn=98010089_dg&ch=15')
print(data)
```
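The random-selection step can be isolated into a small helper that is easy to reuse and test. This is a minimal sketch, assuming the standard-library `urllib.request` module only; the function name `build_proxy_opener` and the `192.0.2.x` addresses (a reserved documentation range) are hypothetical placeholders, not working proxies. A proxy only takes effect for requests sent through the opener built from the `ProxyHandler` (or after `install_opener()`).

```python
import random
import urllib.request

def build_proxy_opener(ip_pool, rng=random):
    """Pick one proxy at random; return (proxy, opener) that routes through it."""
    proxy = rng.choice(ip_pool)
    # Map both schemes; with only 'http', https:// URLs would bypass the proxy.
    handler = urllib.request.ProxyHandler({'http': proxy, 'https': proxy})
    opener = urllib.request.build_opener(handler)
    return proxy, opener

# Hypothetical placeholder addresses (documentation range), not real proxies.
pool = ['192.0.2.10:8080', '192.0.2.11:3128', '192.0.2.12:8000']
proxy, opener = build_proxy_opener(pool, rng=random.Random(0))
print(proxy)
# opener.open('http://example.com/', timeout=10) would send the request via `proxy`.
```

Passing a seeded `random.Random` makes the choice reproducible in tests, while the default uses the module-level generator as the original code does.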

2. Combining an IP proxy pool with a User-Agent pool is safer when you need to make a large number of requests. Building your own proxy pool from the dynamic IPs of a proxy IP provider helps keep the IPs high-quality, because the provider's IP resources are independent and valid.
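One way to keep a self-built pool high-quality is to drop proxies that keep failing. The sketch below is an illustrative assumption, not the article's method: the `ProxyPool` class, its `max_failures` threshold, and the rule that a success resets the failure count are all hypothetical choices.

```python
import random

class ProxyPool:
    """Minimal self-maintained proxy pool (hypothetical sketch).

    Assumption: a proxy that fails `max_failures` times in a row is removed;
    a successful request resets its failure count.
    """

    def __init__(self, proxies, max_failures=3, rng=None):
        self._failures = {p: 0 for p in proxies}  # proxy -> consecutive failures
        self.max_failures = max_failures
        self._rng = rng if rng is not None else random.Random()

    def pick(self):
        # Random selection, as in the article; fail loudly when the pool is empty.
        if not self._failures:
            raise LookupError('proxy pool is empty; refill it from your provider')
        return self._rng.choice(list(self._failures))

    def report_success(self, proxy):
        if proxy in self._failures:
            self._failures[proxy] = 0

    def report_failure(self, proxy):
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            del self._failures[proxy]  # evict the unreliable proxy

    def __len__(self):
        return len(self._failures)

pool = ProxyPool(['192.0.2.10:8080', '192.0.2.11:3128'], max_failures=2)
pool.report_failure('192.0.2.10:8080')
pool.report_failure('192.0.2.10:8080')
print(len(pool))   # → 1
print(pool.pick())  # → 192.0.2.11:3128
```

In a crawl loop you would call `report_success()` after a good response and `report_failure()` in the `except` branch, refilling the pool from the provider when it runs low.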

```python
import urllib.request
import urllib.error
import random

# Custom UA_IP function: randomly picks a User-Agent and a proxy IP per call
def UA_IP(thisUrl):
    # Build the User-Agent pool
    ua_pool = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1 QQBrowser/6.9.11079.201',
        'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET 4.0C; .NET 4.0E)',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0',
    ]
    # Build the IP proxy pool (ports were lost in extraction; append ':port' to each)
    ip_pool = ['139.196.196.74', '112.124.47.21', '61.129.70.109', '221.229.166.55']
    thisUA = random.choice(ua_pool)  # randomly select a User-Agent from the UA pool
    thisIP = random.choice(ip_pool)  # randomly select a proxy from the IP pool
    headers = ('User-Agent', thisUA)  # construct the header
    # Format the IP proxy (both schemes, so https:// URLs are proxied too)
    proxy = urllib.request.ProxyHandler({'http': thisIP, 'https': thisIP})
    # Load the IP proxy and the header into an opener
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    opener.addheaders = [headers]
    # Install the opener globally so urlopen() uses it
    urllib.request.install_opener(opener)
    # Crawl the page
    data = urllib.request.urlopen(thisUrl).read().decode('utf-8', 'ignore')
    return data

# Page pool: each iteration picks one page at random to crawl
urls = [
    'https://mp.csdn.net/mdeditor/88323361#',
    'https://mp.csdn.net/mdeditor/88144295#',
    'https://mp.csdn.net/mdeditor/88144295#',
    'https://mp.csdn.net/mdeditor/88081609#',
]

# Crawl 1000 times
for i in range(0, 1000):
    try:
        thisUrl = random.choice(urls)
        data = UA_IP(thisUrl)
        print(len(data))
    # URLError also covers proxy connection failures; HTTPError is a subclass of it
    except urllib.error.URLError as e:
        if hasattr(e, 'code'):
            print(e.code)
        if hasattr(e, 'reason'):
            print(e.reason)
```
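The `UA_IP` function calls `install_opener()`, which changes the proxy for every later `urlopen()` call in the program. A minimal alternative sketch, assuming only the standard library, keeps the random User-Agent and proxy local to one request instead. The name `build_request`, the `192.0.2.x` proxies, and `https://example.com/` are hypothetical placeholders.

```python
import random
import urllib.request

def build_request(url, ua_pool, ip_pool, rng=random):
    """Return (opener, request) using a randomly chosen User-Agent and proxy.

    Unlike install_opener(), the proxy stays local to this opener, so other
    urllib calls in the program are not silently routed through it.
    """
    ua = rng.choice(ua_pool)
    proxy = rng.choice(ip_pool)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({'http': proxy, 'https': proxy}))
    # A header set on the Request takes precedence over the opener's default UA.
    request = urllib.request.Request(url, headers={'User-Agent': ua})
    return opener, request

ua_pool = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0',
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1',
]
ip_pool = ['192.0.2.10:8080', '192.0.2.11:3128']  # hypothetical placeholders
opener, req = build_request('https://example.com/', ua_pool, ip_pool, rng=random.Random(1))
print(req.get_header('User-agent'))
# To fetch: opener.open(req, timeout=10).read().decode('utf-8', 'ignore')
```

`urllib.request.Request` stores header names via `str.capitalize()`, which is why the lookup uses `'User-agent'`.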