In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article introduces the relevant knowledge of "examples of reptiles using ip proxy pool". In the operation process of actual cases, many people will encounter such difficulties. Next, let Xiaobian lead you to learn how to deal with these situations! I hope you can read carefully and learn something!
description
1. In the proxy IP collection module, collect proxy IP and detect proxy IP. If not, filter it out. If available, specify a default score, stored in the database.
2. In the proxy IP detection module, obtain all proxy IPs from the database and detect the proxy. If the proxy IP cannot be used, score-1. If the score is 0, delete it from the database. Otherwise, update the database. If the proxy IP can be used, restore the default score and update the database.
3. In the proxy API module, provide the available proxy IP for the crawler from the database.
examples
data model
class Proxy(object): def __init__(self, ip, port, protocol=-1, nick_type=-1, speed=-1, area=None, score=MAX_SCORE, disable_domains=[]): # ip: IP address of proxy self.ip = ip # port: port number of proxy IP self.port = port # protocol: Protocol type supported by proxy IP, http is 0, https is 1, https and http are both supported is 2,-1 is unavailable self.protocol = protocol # nick_type: proxy IP anonymity, high anonymity: 0, anonymous: 1, transparent: 2 self.nick_type = nick_type # speed: Response speed of proxy IP, in s self.speed = speed # area: proxy IP area self.area = area # score: score of the agent IP, used to measure the availability of the agent; self.score = score #Default scores can be configured via configuration files. When checking agent availability, subtract 1 for each failed request and delete from pool when reduced to 0. If inspection agent is available, restore default score #disabled_domains: List of unavailable domains, some proxy IPs are unavailable under some domains, but available under others self.disable_domains = disable_domains # 3. Provide the__str__method, which returns a data string def __str__(self): #Return data string return str(self.__ dict__)"Examples of crawler using ip proxy pool" is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.