2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article covers "Why can't you write distributed crawlers in Nutch?" Many people run into this question in real projects, so the points below walk through the main reasons Nutch is a poor fit for this kind of work.
1. Nutch runs on Hadoop, and Hadoop itself consumes a lot of time.
If the cluster has only a few machines, crawling is actually slower than with a stand-alone crawler.
2. Nutch is a crawler designed for search engines, not for precise extraction.
Most users need a crawler that collects precise data (precise extraction). About two thirds of the Nutch workflow is designed for search engines and contributes little to precise extraction. In other words, using Nutch for data extraction wastes a lot of time on unnecessary computation. And if you try to adapt Nutch to precise extraction through secondary development, you basically have to tear apart the Nutch framework until it is unrecognizable. Anyone capable of modifying Nutch to that degree would be better off writing a distributed crawler framework from scratch.
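For contrast, here is a minimal sketch of what precise extraction looks like in a stand-alone Java program: fetch a page, then pull out only the fields you care about. It does not use Nutch at all. The class name `PreciseExtractionSketch`, the `price` markup, and the sample HTML are assumptions made for illustration, and a real project would likely use an HTML parser rather than a regular expression.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreciseExtractionSketch {

    // Hypothetical markup: we assume prices appear as <span class="price">...</span>.
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">([^<]+)</span>");

    /** Fetch step: download the page body as a string (requires Java 11+). */
    public static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Extraction step: return only the targeted field values, in page order. */
    public static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(html);
        while (matcher.find()) {
            prices.add(matcher.group(1).trim());
        }
        return prices;
    }

    public static void main(String[] args) throws Exception {
        // With no URL argument, fall back to an inline sample so the sketch runs offline.
        String html = args.length > 0
                ? fetch(args[0])
                : "<li><span class=\"price\"> 19.90 </span></li>"
                  + "<li><span class=\"price\">5.00</span></li>";
        System.out.println(extractPrices(html));
    }
}
```

The whole job here is two steps, fetch and extract. Anything a search-engine pipeline does beyond those steps is overhead when precise extraction is the only goal.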
3. Nutch can provide extraction functionality through plug-ins.
But anyone who has developed Nutch plug-ins knows how clumsy the Nutch plug-in system is. Because plug-ins are loaded and invoked through reflection, programs become very difficult to write and debug, let alone building a complex precise-extraction system on top of them.
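The sketch below illustrates, in plain Java, why reflective loading is harder to debug than a direct call. It is not Nutch code and uses no Nutch API. `ReflectivePluginDemo` and `TitleExtractor` are hypothetical names that stand in for a plug-in host and a plug-in class.

```java
import java.lang.reflect.Method;

public class ReflectivePluginDemo {

    /** Hypothetical extractor standing in for a plug-in class. Not a Nutch type. */
    public static class TitleExtractor {
        public String extract(String html) {
            int start = html.indexOf("<title>");
            int end = html.indexOf("</title>");
            if (start < 0 || end < start) {
                return "";
            }
            return html.substring(start + "<title>".length(), end);
        }
    }

    /** Direct call: the compiler checks the class, the method name, and the argument types. */
    public static String extractDirectly(String html) {
        return new TitleExtractor().extract(html);
    }

    /** Reflective call: class and method are plain strings that are resolved only at run time. */
    public static String extractReflectively(String className, String methodName, String html)
            throws ReflectiveOperationException {
        Class<?> pluginClass = Class.forName(className);
        Object plugin = pluginClass.getDeclaredConstructor().newInstance();
        Method method = pluginClass.getMethod(methodName, String.class);
        return (String) method.invoke(plugin, html);
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Example</title></head></html>";
        String className = TitleExtractor.class.getName();

        System.out.println(extractDirectly(html));
        System.out.println(extractReflectively(className, "extract", html));

        try {
            // The misspelled method name compiles without complaint.
            extractReflectively(className, "extrat", html);
        } catch (NoSuchMethodException e) {
            System.out.println("Failed only at run time: " + e);
        }
    }
}
```

With the direct call, a misspelled method name is a compile error. With the reflective call, the same mistake surfaces only when that code path runs, and any exception thrown inside the plug-in arrives wrapped in an `InvocationTargetException`, which adds a layer to every stack trace.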
4. Writing and debugging a crawler with Nutch takes a long time.
It usually takes more than ten times as long as with a stand-alone crawler. The cost of learning the Nutch source code is very high, and debugging turns up all kinds of problems beyond the program itself.