How to configure Nutch 1.7 in Eclipse

2025-02-26 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article explains in detail how to configure Nutch 1.7 in Eclipse, and should be a useful reference for anyone interested.

Get to the point:

1. Deploy the Nutch project to Eclipse

First, go to the FAQ page on the official Nutch wiki: http://wiki.apache.org/nutch/FAQ.

Then open the second link on that page.

Follow that documentation for the configuration, and if something is unclear, search online for the specific problem. The post at http://blog.csdn.net/witsmakemen/article/details/8866235 is also a useful reference during the integration.

The following prerequisites must be in place before you start:

A. Install and configure Apache Ant on Windows (see http://ant.apache.org/manual/index.html).

B. Install Eclipse.

C. Install svn on Linux; it is used to check out the Nutch 1.7 source code.

D. Check out the Nutch 1.7 code on Linux:

[root@nutch-five branch-1.7] # svn co http://svn.apache.org/repos/asf/nutch/branches/branch-1.7/

E. Install the Ivy plug-in on Linux, so that the jar packages are downloaded automatically according to the Ivy configuration file.

F. Compile branch-1.7:

[root@nutch-five branch-1.7] # ant
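Before running ant, it can save time to confirm that svn and ant from the list above are actually on the PATH of the Linux machine. The Java sketch below is a hypothetical helper (PrereqCheck and findOnPath are illustrative names, not part of Nutch or Ant). It only looks for executable files with exactly those names in each PATH directory, so on Windows, where the launchers are .bat or .exe files, the names would need adjusting.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper (not part of Nutch): checks that the command-line
// tools from the prerequisite list can be found on a PATH-style string.
class PrereqCheck {

    // Returns the first executable regular file named `exe` in any
    // directory listed in `pathEnv`, or empty if there is none.
    static Optional<Path> findOnPath(String exe, String pathEnv) {
        if (pathEnv == null) {
            return Optional.empty();
        }
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as the middle of "a::b"
            }
            Path candidate = Paths.get(dir, exe);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        for (String tool : new String[] {"svn", "ant"}) {
            String where = findOnPath(tool, path).map(Path::toString).orElse("NOT FOUND");
            System.out.println(tool + ": " + where);
        }
    }
}
```

With Java 11 or later it can be launched directly with java PrereqCheck.java; any tool reported as NOT FOUND still needs to be installed or added to PATH.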

2. Run the ant command on Linux to compile the source code.

3. After the build succeeds, copy the compiled source folder to Windows and import it into Eclipse:

A. In Eclipse, choose File > New > Java Project.

B. Click Next.

Locate the conf folder and click Add Folder 'conf' to build path.

Set the Default output folder to apache-nutch-1.7/conf.

(Here I entered it as conf because I could not create the conf folder.)

Click Finish.

The project has now been created.

C. At this point the project shows errors (small red crosses), caused by missing library references.

Take parse-html as an example:

import org.cyberneko.html.parsers.*;

The error is reported here because nekohtml-0.9.5.jar is missing.

How to get nekohtml-0.9.5.jar:

Find runtime/local/plugins/lib-nekohtml/nekohtml-jar, right-click it and choose Add to Build Path. Fix the other errors the same way.

In total, the following jar packages need to be added:
- runtime/local/plugins/lib-nekohtml/nekohtml-jar
- runtime/local/plugins/parse-html/tagsoup.jar
- runtime/local/plugins/feed/rome.jar (fixes the com.sun.syndication.io.SyndFeedInput error)
- runtime/local/plugins/urlfilter-automaton/automaton.jar (fixes the dk.brics.automaton.RunAutomaton error)
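Rather than chasing each red cross one at a time, it can help to list every jar under runtime/local/plugins and pick out the third-party ones (nekohtml, tagsoup, rome, automaton). This is a minimal sketch with a hypothetical class name. Each plugin folder usually also contains the plugin's own jar (for example parse-html.jar), so not every entry in the output has to be added to the build path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists every jar below a Nutch plugins directory,
// relative to that directory and sorted, to make missing build-path
// entries easy to spot.
class PluginJarLister {

    static List<Path> findJars(Path pluginsDir) {
        try (Stream<Path> files = Files.walk(pluginsDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .map(pluginsDir::relativize)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path plugins = Paths.get(args.length > 0 ? args[0] : "runtime/local/plugins");
        findJars(plugins).forEach(System.out::println);
    }
}
```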

At this point the whole project should build without errors.

4. Create a new folder named urls.

In that folder, create a file named url that holds the URLs to crawl, one per line, for example:

http://www.163.com/
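The seed folder can also be created from code, which is handy when the crawl is rerun with different seeds. A minimal sketch, assuming the folder sits in the project root (the default working directory of an Eclipse run configuration); SeedFileWriter and writeSeeds are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: creates the seed list from step 4, a folder named
// "urls" containing a file named "url" with one URL per line.
class SeedFileWriter {

    static Path writeSeeds(Path projectDir, List<String> urls) {
        try {
            Path urlsDir = Files.createDirectories(projectDir.resolve("urls"));
            // Writes one URL per line; an existing url file is overwritten.
            return Files.write(urlsDir.resolve("url"), urls, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path seeds = writeSeeds(Paths.get("."), Arrays.asList("http://www.163.com/"));
        System.out.println("Wrote " + seeds.toAbsolutePath());
    }
}
```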

5. Run the program

6. The following exception is reported:

ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1193)) - PriviledgedActionException as:hadoop cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hadoop\mapred\staging\hadoop1071373990\.staging to 0700

Solution:

Download the Hadoop source code, recompile hadoop-core.jar from it, and use it to replace hadoop-core.jar in the Nutch project. (Note: before downloading, check which Hadoop version Nutch references and download the matching version. The Hadoop jar that Nutch uses can be found in runtime/local/lib.)
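To find out which Hadoop version to download, look at the hadoop-core jar name in runtime/local/lib. The sketch below reads the version out of the file name; it assumes the usual hadoop-core-<version>.jar naming, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the Hadoop version from the hadoop-core jar
// bundled with Nutch. Assumes the "hadoop-core-<version>.jar" naming.
class HadoopVersionFinder {

    private static final Pattern HADOOP_CORE = Pattern.compile("hadoop-core-(.+)\\.jar");

    // "hadoop-core-1.2.0.jar" -> Optional[1.2.0]; any other name -> empty.
    static Optional<String> versionFromFileName(String fileName) {
        Matcher m = HADOOP_CORE.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the version of the first hadoop-core jar found in libDir.
    static Optional<String> findVersion(Path libDir) {
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "hadoop-core-*.jar")) {
            for (Path jar : jars) {
                Optional<String> version = versionFromFileName(jar.getFileName().toString());
                if (version.isPresent()) {
                    return version;
                }
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path lib = Paths.get(args.length > 0 ? args[0] : "runtime/local/lib");
        System.out.println(findVersion(lib)
            .map(v -> "Nutch bundles Hadoop " + v)
            .orElse("No hadoop-core jar found in " + lib));
    }
}
```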

Download the Hadoop source code from: http://apache.dataguru.cn/hadoop/common/hadoop-1.2.1/

1. Comment out the

2. Remove the create-native-configure dependency from the compile-core-native target.

3. In hadoop-1.1.2/src/core/org/apache/hadoop/fs/FileUtil.java, change the throw new IOException at line 691 to LOG.warn. (The path and line number are from Hadoop 1.1.2; use the directory of the version you downloaded, where the line number may differ.)

4. Compile the project with ant. After a successful build, take the hadoop-core jar out of the build folder, copy it into the Nutch project's build/lib folder to replace the original hadoop-core.jar, and then add the new jar to the build path.
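The change in step 3 can be illustrated with a standalone version of the method involved. In the Hadoop 1.x sources I have looked at, the IOException comes from a private helper in FileUtil named checkReturnValue, which checks the result of a permission change on a local path and builds the same "Failed to set permissions of path: ... to 0700" message seen in step 6. The sketch below is not Hadoop's code: it takes the permission as a plain short instead of Hadoop's FsPermission and uses java.util.logging in place of Hadoop's LOG, so it compiles on its own.

```java
import java.io.File;
import java.util.logging.Logger;

// Standalone sketch of the step-3 edit, not Hadoop's actual source.
// Assumptions: the permission is a plain short instead of FsPermission,
// and java.util.logging stands in for Hadoop's LOG.
class PatchedFileUtil {

    private static final Logger LOG = Logger.getLogger(PatchedFileUtil.class.getName());

    // Builds the same message that appears in the step-6 error log.
    static String permissionMessage(File path, short permission) {
        return "Failed to set permissions of path: " + path
            + " to " + String.format("%04o", permission);
    }

    // `succeeded` is the return value of the preceding permission change.
    static void checkReturnValue(boolean succeeded, File path, short permission) {
        if (!succeeded) {
            // Original: throw new IOException(permissionMessage(path, permission));
            // Patched: log a warning and continue instead of aborting the job.
            LOG.warning(permissionMessage(path, permission));
        }
    }

    public static void main(String[] args) {
        // With the patch, a failed permission change no longer throws.
        checkReturnValue(false, new File("/tmp/hadoop-hadoop/mapred/staging"), (short) 0700);
        System.out.println("continued after warning");
    }
}
```

Logging instead of throwing only hides the failed permission change, which is probably acceptable for a local Windows development setup like this one but is not a change to carry over to a real cluster.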

After adding it, run the program again to test.
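The copy in step 4 is easy to get wrong if an old hadoop-core jar with a different version number stays in build/lib, since both could then end up on the classpath. A hedged sketch of the replacement, using a hypothetical helper that deletes every hadoop-core*.jar in the target folder before copying the rebuilt jar in. It assumes the rebuilt jar lives outside that folder, and the new jar still has to be added to the Eclipse build path by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: swaps the hadoop-core jar in a Nutch lib folder for
// a rebuilt one. Assumes rebuiltJar is not itself inside libDir.
class JarReplacer {

    static Path replaceHadoopCore(Path rebuiltJar, Path libDir) {
        try {
            // Collect the old jars first, then delete them, so the directory
            // is not modified while it is being listed.
            List<Path> oldJars = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "hadoop-core*.jar")) {
                for (Path p : stream) {
                    oldJars.add(p);
                }
            }
            for (Path p : oldJars) {
                Files.delete(p);
            }
            Path target = libDir.resolve(rebuiltJar.getFileName());
            return Files.copy(rebuiltJar, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JarReplacer <rebuilt hadoop-core jar> <nutch build/lib dir>");
            return;
        }
        Path copied = replaceHadoopCore(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied to " + copied);
    }
}
```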

7. Running again reports another error:

java.lang.RuntimeException: Error in configuring object

Solution:

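Edit the plugin.folders property in nutch-default.xml:

<property>
  <name>plugin.folders</name>
  <value>./src/plugin</value>
  <description>Directories where nutch plugins are located. Each
  element may be a relative or absolute path. If absolute, it is used
  as is. If relative, it is searched for on the classpath.</description>
</property>

Only the value needs to be changed, to ./src/plugin.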
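The description explains why the value matters: a relative entry is looked up on the classpath, so it has to point at a folder that can actually be found from where the program runs. The sketch below spells that rule out in Java. It is not Nutch's plugin loader; the class and method names are hypothetical, and the last step, which also accepts a relative directory that exists under the working directory (the project root when launched from Eclipse), is an assumption about why ./src/plugin works here rather than something the description states.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

// Sketch of the lookup rule from the plugin.folders description, not
// Nutch's own code. The final working-directory fallback is an assumption.
class PluginFolderResolver {

    // Returns the plugin directory for `name`, or null if none is found.
    static File resolve(String name, ClassLoader loader) {
        File dir = new File(name);
        if (dir.isAbsolute()) {
            return dir; // "If absolute, it is used as is."
        }
        // "If relative, it is searched for on the classpath."
        URL url = loader.getResource(name);
        if (url != null && "file".equals(url.getProtocol())) {
            try {
                return new File(url.toURI());
            } catch (URISyntaxException e) {
                // Malformed file URL: fall through to the working-directory check.
            }
        }
        // Assumption: a relative directory under the working directory is accepted too.
        return dir.isDirectory() ? dir : null;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "./src/plugin";
        File folder = resolve(name, PluginFolderResolver.class.getClassLoader());
        System.out.println(name + " -> " + (folder == null ? "not found" : folder.getAbsolutePath()));
    }
}
```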

Run the program again the same way as before. This time the following exception appears:

java.net.SocketException: Software caused connection abort: recv failed

Solution:

This only means that a connection failed; it can be ignored and does not affect the program.

At this point, Nutch has been successfully integrated into Eclipse.

Related exception:

When setting up the source environment, I also found that sun.net.util.IPAddressUtil could not be resolved.

Eclipse forbids access to classes in the sun.net package by default. The fix is to add a custom access rule: right-click the project > Properties > Java Build Path > Libraries tab, expand JRE System Library, select Access rules and click Edit, then add sun/** with the resolution Accessible. If such a rule already exists, edit it instead.

That is all for "How to configure Nutch 1.7 in Eclipse". Thank you for reading, and I hope it helps.

© 2024 shulou.com SLNews company. All rights reserved.
