How to integrate Nutch + Solr + mmseg4j

2025-03-31 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 report.

This article walks through integrating Nutch 1.6 with Solr 4.2.0 and the mmseg4j Chinese word segmenter.

Chapter 1 Installing and configuring Solr 4.2

# download the Solr 4.2.0 release

[root@nutch nutch3]# wget http://archive.apache.org/dist/lucene/solr/4.2.0/solr-4.2.0.tgz

# unpack the Solr 4.2.0 archive

[root@nutch nutch3]# tar -xzvf solr-4.2.0.tgz

# copy Nutch's conf/schema-solr4.xml to solr/collection1/conf/schema.xml

For Solr 4.2.0, copy Nutch's schema-solr4.xml file (not schema.xml) into the conf directory under collection1, saving it as schema.xml:

[root@nutch nutch3]# cp /home/nutch3/release-1.6/runtime/local/conf/schema-solr4.xml /home/nutch3/solr-4.2.0/example/solr/collection1/conf/schema.xml

# start the solr server

[root@nutch example]# java -jar start.jar &

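Solr reports an error after startup because the _version_ field does not exist in the schema:

Unable to use updateLog: _version_ field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)

Solution:

Edit solr/collection1/conf/schema.xml and add a _version_ field that is indexed, stored and single-valued (indexed="true" stored="true" multiValued="false"), as the message requires. Solr reads the schema when it starts, so stop and restart the server afterwards.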
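The shell function below is a minimal sketch of this fix. It adds the field just before the closing </fields> tag. The function name ensure_version_field is hypothetical. The field line assumes the schema declares a field type named long, so check the types section of your schema.xml. The function keeps a .bak copy and does nothing if _version_ is already declared.

```shell
# Hypothetical helper: make sure a Solr 4 schema.xml declares _version_.
# Assumes a fieldType named "long" exists and that the field list is
# closed by a </fields> tag.
ensure_version_field() {
  local schema="$1"
  # Already declared: leave the file untouched
  if grep -q 'name="_version_"' "$schema"; then
    echo "_version_ already declared in $schema"
    return 0
  fi
  # Keep a backup, then rebuild the schema with the new field inserted
  # just before the line holding the first </fields> tag
  cp "$schema" "$schema.bak" || return 1
  awk '
    /<\/fields>/ && !done {
      print "  <field name=\"_version_\" type=\"long\" indexed=\"true\" stored=\"true\" multiValued=\"false\"/>"
      done = 1
    }
    { print }
  ' "$schema.bak" > "$schema" || return 1
  # Non-zero status if no </fields> tag was found
  grep -q 'name="_version_"' "$schema"
}
```

For example, run from /home/nutch3: ensure_version_field solr-4.2.0/example/solr/collection1/conf/schema.xml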

# shut down the solr server

[root@nutch example]# jps

4625 jar

4664 Jps

[root@nutch example]# kill -9 4625
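The jps-then-kill routine above can be wrapped in two small shell functions. This is a sketch, and both function names are hypothetical.

* It matches the jar entry that jps shows above. As an assumption for other jps versions, it also matches a name ending in start.jar.
* It tries a normal kill first so Jetty can shut down cleanly.
* It falls back to kill -9 only after five seconds.

Matching jar would also catch any other java -jar process on the machine.

```shell
# Hypothetical helper: print the PIDs of "java -jar start.jar" processes
# from jps-style output ("<pid> <name>" per line) read on standard input.
# Matches "jar" as in the jps output above, or a name ending in start.jar
# (an assumption for other jps versions).
find_start_jar_pids() {
  awk '$2 == "jar" || $2 ~ /start\.jar$/ { print $1 }'
}

# Hypothetical helper: stop Solr. Sends SIGTERM first and only uses
# SIGKILL (kill -9) for processes still alive after 5 seconds.
stop_solr() {
  local pids pid
  pids=$(jps | find_start_jar_pids)
  if [ -z "$pids" ]; then
    echo "no start.jar process found" >&2
    return 1
  fi
  for pid in $pids; do
    kill "$pid"
  done
  sleep 5
  for pid in $pids; do
    # kill -0 only tests whether the process still exists
    if kill -0 "$pid" 2>/dev/null; then
      kill -9 "$pid"
    fi
  done
  return 0
}
```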

For this setup, the biggest difference between Solr 4.2.0 and Solr 3.6.2 is that in Solr 4.2.0 we no longer need to replace every occurrence of text with content in the solr/conf/solrconfig.xml file.
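In Nutch tutorials written for Solr 3.x, this text-to-content replacement usually refers to the default search field: <str name="df">text</str> becomes <str name="df">content</str>. Assuming that is what is meant here, the hypothetical helper below lists the df values a solrconfig.xml currently sets. This lets you check before editing anything.

```shell
# Hypothetical helper: print the value of every <str name="df">...</str>
# entry (default search field) in a solrconfig.xml, one per line.
list_default_fields() {
  sed -n 's|.*<str name="df">\([^<]*\)</str>.*|\1|p' "$1"
}
```

For example: list_default_fields solr-4.2.0/example/solr/collection1/conf/solrconfig.xml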

# restart the solr server

[root@nutch example]# java -jar start.jar &

Open a browser and go to port 8983:

http://192.168.1.49:8983/solr/
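When scripting the restart, it helps to wait until Solr answers before moving on. The sketch below assumes curl is installed, and wait_for_solr is a hypothetical name. It polls the core admin STATUS request under /admin/cores, which does not depend on the fields defined in schema.xml.

```shell
# Hypothetical helper: poll Solr until it answers or give up.
# Usage: wait_for_solr <base-url> [attempts]   (2 seconds between attempts)
wait_for_solr() {
  local base="${1%/}" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; the body is discarded
    if curl -sf -o /dev/null "$base/admin/cores?action=STATUS"; then
      echo "Solr is up at $base"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep 2
    fi
    i=$((i + 1))
  done
  echo "Solr did not answer at $base after $attempts attempts" >&2
  return 1
}
```

For example: wait_for_solr http://192.168.1.49:8983/solr/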

Configuring Solr 4.2 with the mmseg4j 1.9.1 word segmenter

If we submit the index without configuring a word segmenter, Solr uses its default one, and the default segmentation (of Chinese text in particular) is not what we want. So we configure Solr with the mmseg4j word segmenter.

# stop the service

[root@nutch example]# jps

5927 Jps

5853 jar

[root@nutch example]# kill -9 5853

# download mmseg4j 1.9.1

[root@nutch nutch3]# wget http://mmseg4j.googlecode.com/files/mmseg4j-1.9.1.v20130120-SNAPSHOT.zip

# extract the mmseg4j 1.9.1 archive with unzip

[root@nutch nutch3]# unzip mmseg4j-1.9.1.v20130120-SNAPSHOT.zip -d mmseg4j-1.9.1

# create a lib directory for collection1

[root@nutch nutch3]# mkdir solr-4.2.0/example/solr/collection1/lib

Copy the 3 jar packages from the dist directory of mmseg4j 1.9.1 into the solr/collection1/lib directory:

[root@nutch nutch3]# cp mmseg4j-1.9.1/mmseg4j-1.9.1-SNAPSHOT/dist/* solr-4.2.0/example/solr/collection1/lib
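The hypothetical helper below is a quick check that the copy worked. It assumes the jars in mmseg4j's dist directory all match mmseg4j-*.jar and that there are exactly three of them, as stated above.

```shell
# Hypothetical helper: confirm that 3 mmseg4j jars are in a core's lib dir.
check_mmseg4j_jars() {
  local lib="$1" count=0 jar
  for jar in "$lib"/mmseg4j-*.jar; do
    # An unmatched glob stays literal, so check that the file exists
    if [ -e "$jar" ]; then
      count=$((count + 1))
      echo "found: $jar"
    fi
  done
  if [ "$count" -ne 3 ]; then
    echo "expected 3 mmseg4j jars in $lib, found $count" >&2
    return 1
  fi
}
```

For example: check_mmseg4j_jars solr-4.2.0/example/solr/collection1/lib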

Modify schema.xml so that it uses the mmseg4j Tokenizer:

[root@nutch nutch3]# vi solr-4.2.0/example/solr/collection1/conf/schema.xml

Replace the existing tokenizer definitions with the mmseg4j tokenizer.
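The same edit can be made non-interactively with sed. This sketch rests on several assumptions:

* The mmseg4j-solr jar provides the factory class com.chenlb.mmseg4j.solr.MMSegTokenizerFactory.
* The tokenizers in schema.xml are self-closing tags.
* The default mode value complex suits your data.
* No dicPath is set, so mmseg4j's bundled dictionary is used.

Run the function once per tokenizer class you want to replace. use_mmseg4j_tokenizer is a hypothetical name, and the original file is kept as schema.xml.bak.

```shell
# Hypothetical helper: swap one tokenizer class in schema.xml for mmseg4j's.
# Assumes self-closing <tokenizer class="..." .../> tags.
# Usage: use_mmseg4j_tokenizer <schema.xml> <old-tokenizer-class> [mode]
use_mmseg4j_tokenizer() {
  local schema="$1" old="$2" mode="${3:-complex}" old_re
  local new='com.chenlb.mmseg4j.solr.MMSegTokenizerFactory'
  # Escape the dots in the old class name so sed matches them literally
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # Rewrite every matching tag in place, keeping a .bak copy of the original
  sed -i.bak "s|<tokenizer class=\"$old_re\"[^>]*/>|<tokenizer class=\"$new\" mode=\"$mode\"/>|g" "$schema"
}
```

For example, substituting whichever class your schema actually uses: use_mmseg4j_tokenizer solr-4.2.0/example/solr/collection1/conf/schema.xml solr.WhitespaceTokenizerFactory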

There are two main points to pay attention to when configuring a word segmenter in Solr:

Copy the jar packages into the core's lib directory

Replace the word segmenter (tokenizer) in schema.xml

Run Solr and submit the index

# start the solr server

[root@nutch example]# java -jar start.jar &

# submit the index

[root@nutch local]# bin/nutch solrindex http://192.168.1.49:8983/solr/ data/crawldb/ -linkdb data/linkdb/ -dir data/segments/
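To confirm that solrindex actually added documents, you can ask Solr for the document count. This is a sketch with hypothetical function names, assuming curl and the default collection1 core. rows=0 makes Solr return only the count, not the documents.

```shell
# Hypothetical helper: print numFound from a Solr JSON response on stdin.
num_found() {
  sed -n 's/.*"numFound": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: ask collection1 how many documents it holds.
# Usage: count_solr_docs <base-url>
count_solr_docs() {
  local base="${1%/}"
  curl -s "$base/collection1/select?q=*:*&rows=0&wt=json" | num_found
}
```

For example: count_solr_docs http://192.168.1.49:8983/solr/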

Finally, go to the Solr admin interface to view the index information, and view the schema.xml configuration file on the web page.
