2025-01-25 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 06/02 Report
This article explains how to configure Nutch with Hadoop. The editor finds it very practical and shares it here as a reference; follow along to have a look.
Configuring Nutch + Hadoop
1. Download Nutch. Unless you specifically plan to develop against Hadoop itself, you do not need to download Hadoop separately, because the Hadoop core JAR and its related configuration are bundled with Nutch.
2. Set up a directory layout (adjust to your preferences):
/nutch
/search (Nutch installation goes here, i.e. unzip Nutch into this directory)
/filesystem (mount point for the Hadoop file system)
/local (local copy of the crawl index, used for searching)
/home (the nutch user's home directory; basically unused if you run as an existing system user)
/tomcat (Tomcat installation that serves nutch.war for searching the index)
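The layout above can be created in one go. A minimal sketch, using a throwaway base path (`/tmp/nutch-demo` is an assumption for illustration; the article uses `/nutch`):

```shell
# Create the directory layout described above under a base path.
# BASE is a placeholder; substitute your own root (e.g. /nutch).
BASE="${BASE:-/tmp/nutch-demo}"
mkdir -p "$BASE/search" "$BASE/filesystem" "$BASE/local" "$BASE/home" "$BASE/tomcat"
ls "$BASE"
```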
3. JAVA_HOME must be set, otherwise the system will not run.
4. Configure passwordless ssh between master and slaves, otherwise you will have to enter a password every time:
ssh-keygen -t rsa
(press Enter to accept the defaults)
cp id_rsa.pub authorized_keys
Then copy the key to the other slaves:
scp /nutch/home/.ssh/authorized_keys nutch@devcluster02:/nutch/home/.ssh/authorized_keys
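What the copy step accomplishes can be simulated on local files, since `scp` needs a reachable slave. A sketch with placeholder paths under `/tmp` and a stand-in key string (both are assumptions for illustration):

```shell
# Simulate distributing the master's public key to a slave's authorized_keys.
mkdir -p /tmp/master-ssh /tmp/slave-ssh
echo 'ssh-rsa AAAA...fakekey nutch@master' > /tmp/master-ssh/id_rsa.pub  # stand-in public key
cat /tmp/master-ssh/id_rsa.pub >> /tmp/slave-ssh/authorized_keys         # what the scp/append achieves
chmod 600 /tmp/slave-ssh/authorized_keys                                 # sshd requires tight permissions
```

On a real cluster, the same append is done on each slave's `/nutch/home/.ssh/authorized_keys`.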
5. Run dos2unix on all the .sh, nutch, and hadoop files under bin and conf:
dos2unix /nutch/search/bin/*.sh /nutch/search/bin/hadoop
Then configure hadoop-site.xml.
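If `dos2unix` is not installed, `tr` achieves the same conversion. A sketch demonstrated on a throwaway file (for the real setup the targets would be the files under `/nutch/search/bin`):

```shell
# Strip Windows CRLF line endings without dos2unix.
printf 'echo hello\r\n' > /tmp/demo.sh          # a script saved with CRLF endings
tr -d '\r' < /tmp/demo.sh > /tmp/demo.unix.sh   # delete every carriage return
mv /tmp/demo.unix.sh /tmp/demo.sh
```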
6. Remember to copy the masters file from the Hadoop distribution into this Nutch installation; that it is missing looks like a bug. In other words, this file is required for startup, and its default content is localhost (in a distributed setup it may need to be edited).
7. During the Nutch + Hadoop configuration, the namenode needs to be formatted:
bin/hadoop namenode -format
8. Start everything: bin/start-all.sh
9. Configure the crawl (using the URL lucene.apache.org as an example):
cd /nutch/search
mkdir urls
vi urls/urllist.txt      (add the line: http://lucene.apache.org)
bin/hadoop dfs -put urls urls
vi conf/crawl-urlfilter.txt
Change the line that reads:
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
to read:
+^http://([a-z0-9]*\.)*apache.org/
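Before starting a long crawl, it can be worth sanity-checking the edited filter pattern. `grep -E` approximates the Java regex closely enough for this line; the test URLs are illustrative:

```shell
# Check which URLs the crawl-urlfilter pattern would accept.
pattern='^http://([a-z0-9]*\.)*apache\.org/'
echo 'http://lucene.apache.org/' | grep -E "$pattern"                       # prints the URL: kept
echo 'http://example.com/'       | grep -E "$pattern" || echo 'filtered'    # no match: rejected
```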
10. Start the crawl:
bin/nutch crawl urls -dir crawled -depth 4
11. Query. Copy the index out of DFS into the local directory configured above, because searching cannot be done directly in DFS (per the documentation):
bin/hadoop dfs -copyToLocal crawled /media/do/nutch/local/
12. Deploy nutch.war and test:
vi nutch-site.xml   (under classes in the deployed nutch.war)
Start Tomcat.
Notes:
1. Nutch does not ship with a masters file, so you need to copy one into conf.
2. By default the crawl's log4j configuration has a problem; the following needs to be added:
hadoop.log.dir=.
hadoop.log.file=hadoop.log
3. For querying, nutch-site.xml must be configured: set http.agent.name there (nutch-default.xml already defines the property, but it must be overridden in nutch-site.xml).
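A minimal nutch-site.xml that sets the agent name can be written with a heredoc. This is a sketch: the output path `/tmp/nutch-site.xml` and the agent name `MyNutchSpider` are placeholder assumptions, not values from the article:

```shell
# Write a minimal nutch-site.xml overriding http.agent.name.
cat > /tmp/nutch-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>MyNutchSpider</value>
  </property>
</configuration>
EOF
```

In a real deployment this file would go under the webapp's classes directory and in /nutch/search/conf.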
Problems encountered with the Nutch + Hadoop configuration:
1. I terminated a Hadoop job halfway through, and afterwards adding or deleting files in HDFS produced a "Name node is in safe mode" error:
rmr: org.apache.hadoop.dfs.SafeModeException: Cannot delete /user/hadoop/input. Name node is in safe mode
Command to resolve it:
bin/hadoop dfsadmin -safemode leave   (turns off safe mode)
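An alternative to forcing safe mode off is to check the status first and only intervene if needed. Since running `dfsadmin` requires a live cluster, this sketch parses a captured example of the status line instead of invoking the command:

```shell
# Decide whether to act based on dfsadmin safe-mode output.
in_safe_mode() { echo "$1" | grep -q 'Safe mode is ON'; }

# In real use: status=$(bin/hadoop dfsadmin -safemode get)
status='Safe mode is ON'   # captured example string, not a live query
if in_safe_mode "$status"; then
  echo 'NameNode still in safe mode; wait, or run: bin/hadoop dfsadmin -safemode leave'
fi
```

Waiting is often safer than `-safemode leave`, since the NameNode normally exits safe mode on its own once enough blocks are reported.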
Index command:
bin/nutch index plainindex/paodingindexes plainindex/crawldb plainindex/linkdb plainindex/segments/20090528132511 plainindex/segments/20090528132525 plainindex/segments/20090528132602
Example:
Index:
bin/nutch index crawled/indexes_new crawled/crawldb crawled/linkdb crawled/segments/20100313132517
Merge:
bin/nutch merge crawled/index_new crawled/indexes_new
Deduplicate:
bin/nutch dedup crawled/index_new
Thank you for reading! That concludes this article on how to configure Nutch with Hadoop. I hope the content above is of some help; if you found the article useful, please share it for more people to see.