What about the bug of mmseg4j-1.9 solr4?

2026-02-11 Update. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com) 05/31 Report

This article explains how to deal with the bug in mmseg4j-1.9 under solr4.

At present, the Chinese word segmenter mmseg4j does not work properly under solr4.

The fix is simple; the only cause is that the solr4 interface has changed a little.

The author of the mmseg4j Chinese word segmentation plug-in did not keep up with the changes in the solr4 interface in time. Although the word segmentation algorithm itself is correct, added documents cannot be indexed.

The source code is about 80 MB, far too much to read through. I had to guess my way through it to find out why a new index could not be created. It was hard to track down, and I found the cause more or less by chance.

Bug description:

(1) java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.lucene.analysis.Tokenizer.reset

Error message:

http://code.google.com/p/mmseg4j/issues/detail?id=31 (I ran into this error while testing word segmentation.)

Solution:

The setReader method used in this file is the one provided by the new version of solr4. The old reset interface is obsolete.
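To illustrate why code built against the old interface fails, here is a minimal sketch. It assumes only what is stated above: the new interface offers setReader, and the old reset that took the input is gone. `Lucene4StyleTokenizer` and `ApiShapeCheck` are hypothetical stand-ins written for this example, not real Lucene or mmseg4j classes. The reflection lookup plays the role of the JVM's link step, which raises NoSuchMethodError when a compiled call site points at a method that no longer exists.

```java
import java.io.Reader;

// Hypothetical stand-in for the shape of the new tokenizer interface:
// it has setReader(Reader) and a no-argument reset(), but no reset(Reader).
abstract class Lucene4StyleTokenizer {
    protected Reader input;

    public final void setReader(Reader input) {
        this.input = input;
    }

    public void reset() {
        // Nothing to do in this stand-in.
    }
}

class ApiShapeCheck {
    // Returns true if the class has a public method with this name
    // that takes exactly one java.io.Reader parameter.
    static boolean hasReaderMethod(Class<?> type, String name) {
        try {
            type.getMethod(name, Reader.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Old-style call target: reset(Reader)
        System.out.println("reset(Reader) present:     "
                + hasReaderMethod(Lucene4StyleTokenizer.class, "reset"));
        // New-style call target: setReader(Reader)
        System.out.println("setReader(Reader) present: "
                + hasReaderMethod(Lucene4StyleTokenizer.class, "setReader"));
    }
}
```

The same kind of lookup could be pointed at the real Tokenizer class on the Solr classpath to check which of the two methods the deployed version actually offers.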

(2) Documents cannot be indexed. Description: http://code.google.com/p/mmseg4j/issues/detail?id=38

Reason: MMSegTokenizer is still written against the previous version of the solr interface.

MMSegTokenizer is cached in solr; both it and the dictionary are cached at startup. Later, when there is new text to segment, the old solr called the MMSegTokenizer.reset method to pass the new text in to MMSegTokenizer. The new solr4 no longer calls that reset method (the one shown in the screenshot of the original article); it calls setReader instead, so mmSeg, the object inside MMSegTokenizer that actually performs the segmentation, never receives the new data. So I added some hack code to make sure mmSeg gets the new data.

Solution:

Find the MMSegTokenizer.java file and open it. The content in the box (in the screenshot of the original article) is what I added. Find the mmSeg object yourself, and use 0 as the default value of the ReaderStatus property.
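The reuse problem can be made concrete with a small sketch. This is not the actual patch, whose screenshot is not reproduced here; it only models the idea described above. `InnerSegmenter` (standing in for the role of mmSeg), `CachedTokenizer` and `ReuseDemo` are hypothetical classes that use no Lucene or mmseg4j types, and the whitespace splitting is a placeholder for real word segmentation. The assumption is that the framework swaps the reader through setReader and then calls a no-argument reset, so the tokenizer has to forward the current reader to its inner segmenter at that point.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the inner segmenter (the role mmSeg plays).
class InnerSegmenter {
    private Reader source;

    void reset(Reader source) {
        this.source = source;
    }

    // Reads everything from the current reader and splits on whitespace.
    List<String> segment() {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = source.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> words = new ArrayList<>();
        for (String w : text.toString().trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }
}

// Hypothetical stand-in for a tokenizer that the framework caches and reuses.
class CachedTokenizer {
    protected Reader input;
    private final InnerSegmenter segmenter = new InnerSegmenter();
    private final boolean patched;

    CachedTokenizer(Reader input, boolean patched) {
        this.input = input;
        this.patched = patched;
        segmenter.reset(input);
    }

    // New-style entry point: only the reader is swapped here.
    final void setReader(Reader input) {
        this.input = input;
    }

    // Called before tokens are consumed.
    void reset() {
        if (patched) {
            // The hack: hand the current reader to the inner segmenter.
            segmenter.reset(input);
        }
    }

    List<String> tokens() {
        return segmenter.segment();
    }
}

class ReuseDemo {
    // Feeds two documents through one cached tokenizer and
    // returns the tokens produced for the second document.
    static List<String> secondDocumentTokens(boolean patched) {
        CachedTokenizer t = new CachedTokenizer(new StringReader("first doc"), patched);
        t.reset();
        t.tokens(); // consume the first document
        t.setReader(new StringReader("second doc"));
        t.reset();
        return t.tokens();
    }

    public static void main(String[] args) {
        System.out.println("unpatched: " + secondDocumentTokens(false));
        System.out.println("patched:   " + secondDocumentTokens(true));
    }
}
```

Without the forwarding step, the inner segmenter keeps reading from the first, already exhausted reader and yields no tokens for the second document, which matches the symptom of documents being added but never indexed.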

Then compile and package it, put the resulting jar into solr, and restart tomcat. After that it will work.
