2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains what to do when a website limits your web crawler's collection. The approaches are simple and easy to follow, so let's go through them one by one.
1. Vary the waiting time dynamically: after each page, wait for the minimum time interval minus the time spent reading the page. As long as the page loads within the interval, this keeps the average crawl time per page at the minimum interval whether the network is fast or slow.
This method can work for a single-threaded crawler visiting a small site. But when a multithreaded, distributed crawler visits a large site, the overall crawl time is determined by many parallel crawl tasks, and anomalies such as invalid pages or connection timeouts make the crawl time even harder to calculate.
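The dynamic waiting time in point 1 can be sketched as follows. This is a minimal single-threaded illustration, not a complete crawler: the names `remaining_wait` and `crawl_with_min_interval` are hypothetical, and the `fetch`, `sleep`, and `clock` parameters are assumptions added so the page download and the timer can be swapped out.

```python
import time
from typing import Callable, Iterable, List


def remaining_wait(min_interval: float, fetch_seconds: float) -> float:
    """Seconds still to wait so that fetch time plus wait equals min_interval.

    If the fetch already took longer than the interval, no wait is needed.
    """
    return max(0.0, min_interval - fetch_seconds)


def crawl_with_min_interval(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    min_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[object]:
    """Fetch each URL in turn, padding each fetch up to min_interval seconds.

    `fetch` is whatever function downloads a page (an assumption here);
    `sleep` and `clock` are injectable so the loop can be run without
    real waiting.
    """
    results = []
    for url in urls:
        started = clock()
        results.append(fetch(url))
        elapsed = clock() - started
        sleep(remaining_wait(min_interval, elapsed))
    return results
```

With a 2-second interval, a page that takes 0.5 seconds to read is followed by a 1.5-second wait, while a page that takes 3 seconds is followed by no wait at all.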
2. With so many factors involved, an exact calculation is impractical. What is needed is an approximate (fuzzy) method that controls the crawler's speed, with the speed expressed intuitively as a frequency (pages per minute). The PID control algorithm is one such method. The principle of using a PID controller to control crawler speed is simple: when the speed is too high, the delay time increases; when the speed is too low, the delay time decreases.
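As a rough sketch of the PID idea in point 2, the controller below turns a measured crawl rate (pages per minute) into a delay in seconds. The class name `PidDelayController`, the gain values, and the `base_delay` and clamp limits are all assumptions chosen for illustration; real gains would need tuning against the target site.

```python
from dataclasses import dataclass, field


@dataclass
class PidDelayController:
    """Positional PID controller whose output is the delay between requests."""

    target_rate: float        # desired speed in pages per minute
    base_delay: float = 2.0   # delay in seconds when the error is zero
    kp: float = 0.05          # proportional gain (illustrative value)
    ki: float = 0.01          # integral gain (illustrative value)
    kd: float = 0.0           # derivative gain (illustrative value)
    min_delay: float = 0.0
    max_delay: float = 60.0
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: float = field(default=0.0, init=False, repr=False)

    def update(self, measured_rate: float, dt: float = 1.0) -> float:
        """Return the new delay given the rate measured over the last dt."""
        # Positive error means the crawler is too fast, so the delay grows;
        # negative error means it is too slow, so the delay shrinks.
        error = measured_rate - self.target_rate
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        delay = (
            self.base_delay
            + self.kp * error
            + self.ki * self._integral
            + self.kd * derivative
        )
        # Clamp so the delay never goes negative or grows without bound.
        return min(self.max_delay, max(self.min_delay, delay))
```

A crawler would call `update` once per measurement window and sleep for the returned delay before each request. This sketch does not guard against integral windup, which a production controller would likely need.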
3. Use proxy IPs to get around per-IP limits, but be sure to analyze the anti-crawling mechanism of each website, since these differ from site to site.
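One possible way to rotate proxy IPs, as in point 3, is a simple round-robin pool that skips proxies a site has already blocked. The `ProxyPool` class and `opener_for` helper are hypothetical names, and the proxy addresses in the usage note are placeholders; only the standard-library `urllib.request.ProxyHandler` and `build_opener` calls are real APIs.

```python
import itertools
import urllib.request
from typing import Iterable, Set


class ProxyPool:
    """Round-robin pool of proxy URLs that skips proxies marked as banned."""

    def __init__(self, proxies: Iterable[str]) -> None:
        self._proxies = list(proxies)
        if not self._proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(self._proxies)
        self._banned: Set[str] = set()

    def mark_banned(self, proxy: str) -> None:
        """Record that the target site has blocked this proxy."""
        self._banned.add(proxy)

    def next_proxy(self) -> str:
        """Return the next usable proxy, or raise if every proxy is banned."""
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._banned:
                return proxy
        raise RuntimeError("all proxies are banned")


def opener_for(proxy: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes HTTP and HTTPS through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

For example, `pool = ProxyPool(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])` followed by `opener_for(pool.next_proxy()).open(url)` would send a request through the next available proxy. How a ban is detected (status code, captcha page, and so on) depends on the site's anti-crawling mechanism and is left out here.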
Thank you for reading. That covers what to do when web crawler collection is restricted. How well each approach works in a specific situation still needs to be verified in practice.
© 2024 shulou.com SLNews company. All rights reserved.