2025-02-27 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report
This article explains how to improve the efficiency of a web crawler. The explanation is short and easy to follow, so let's go through the points one by one.
1. Reduce the number of requests as much as possible.
A single crawler task spends most of its time waiting for responses to network requests, so cut requests wherever you can. Fewer requests ease the load on the target site and on the proxy server, and they also reduce your own workload and improve efficiency.
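One simple way to see the effect is to count requests before sending any. The sketch below assumes a hypothetical listing endpoint whose page size can be chosen by the client; `count_requests` and `page_offsets` are illustrative names, not part of any library.

```python
import math


def count_requests(total_items: int, page_size: int) -> int:
    """Return how many listing requests are needed to cover total_items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_items / page_size)


def page_offsets(total_items: int, page_size: int) -> list[int]:
    """Return the starting offset of each listing request."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total_items, page_size))


if __name__ == "__main__":
    # Assumed figures: 1000 items, fetched 20 or 100 per request.
    print(count_requests(1000, 20))
    print(count_requests(1000, 100))
```

If the site really does allow a larger page size, the same data arrives in far fewer round trips; whether it does is something to check against the target site.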
2. Simplify the process and reduce repetition.
Strictly speaking, most sites are not organized as a clean tree. Their pages form a densely cross-linked graph, so a crawl that goes deep from several entry points reaches the same pages many times. In general, use the URL or an ID as the unique key and do not crawl a page that has already been crawled. If the same data is available on several pages, fetch it from only one of them.
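A minimal sketch of URL-based deduplication, assuming that two URLs which differ only in host case, query-parameter order, or fragment point to the same page. That assumption does not hold for every site, and `normalize_url` and `SeenFilter` are hypothetical names used only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Build a canonical key: lowercase scheme and host, sorted query, no fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


class SeenFilter:
    """Remember which pages were already scheduled, keyed by normalized URL."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, url: str) -> bool:
        """Return True the first time a page is offered, False afterwards."""
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    seen = SeenFilter()
    for link in ["https://Example.com/a?b=2&a=1#top", "https://example.com/a?a=1&b=2"]:
        print(link, seen.is_new(link))
```

When the site exposes a stable item ID, that ID can serve as the key instead of the normalized URL.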
3. Use multithreading.
Most crawler tasks are blocked on I/O for most of their run time, so running them concurrently in multiple threads effectively improves overall speed. Multithreading can improve resource utilization, make the program more robust, and speed up its response.
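As a rough sketch of the idea, the standard library's `concurrent.futures.ThreadPoolExecutor` can run many blocking fetches at once. The `fetch` callable is supplied by the caller and is an assumption here (any function that takes a URL and returns its content); `crawl_concurrently` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable


def crawl_concurrently(
    urls: Iterable[str],
    fetch: Callable[[str], Any],
    max_workers: int = 8,
) -> dict[str, Any]:
    """Run fetch(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it, so one failure does not stop the crawl.
    """
    results: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; record the failure
                results[url] = exc
    return results
```

While one thread waits on the network, the others keep working, which is where the speedup comes from. The right `max_workers` value depends on the target site and the proxy, so treat 8 as a placeholder.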
4. Distribute tasks.
If the three points above have been pushed to their limit and the number of pages one machine can crawl per unit of time is still too low to finish on schedule, the only option left is to run the crawl on several machines at once. This is a distributed crawler. For example, with 1,000,000 pages to crawl, five machines can each crawl 200,000 pages with no overlap, which takes roughly one fifth of the time a single machine would need.
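One possible way to split pages across machines without overlap is to hash each URL and take the result modulo the number of machines. This is only a sketch: the article does not say how the split is done, and `assign_worker` and `partition` are hypothetical names. A cryptographic hash from `hashlib` is used because Python's built-in `hash()` for strings can differ between processes.

```python
import hashlib


def assign_worker(url: str, num_workers: int) -> int:
    """Map a URL to a worker index in [0, num_workers), stable across machines."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(urls: list[str], num_workers: int) -> list[list[str]]:
    """Split URLs into one shard per worker; every URL lands in exactly one shard."""
    shards: list[list[str]] = [[] for _ in range(num_workers)]
    for url in urls:
        shards[assign_worker(url, num_workers)].append(url)
    return shards


if __name__ == "__main__":
    # Assumed example URLs; five workers as in the text above.
    pages = [f"https://example.com/page/{i}" for i in range(1000)]
    for index, shard in enumerate(partition(pages, 5)):
        print(index, len(shard))
```

Because every machine computes the same index for the same URL, each one can decide locally whether a page belongs to it. Shard sizes come out roughly, not exactly, equal.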
Thank you for reading. That is all for "how to improve the efficiency of a web crawler". After working through this article you should have a deeper understanding of the topic, though how well each technique works in a specific situation still needs to be verified in practice. The editor will keep publishing articles on related topics, and you are welcome to follow.
© 2024 shulou.com SLNews company. All rights reserved.