2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to implement regular-expression queries in Lucene 4.7. The explanation is kept simple and concrete; follow along step by step and study how a regular query in Lucene 4.7 works.
Lucene ships with many built-in query APIs, as well as the more powerful and customizable QueryParser. In most cases the built-in query classes are enough for our needs, but if you want more flexibility, or want to define your own query behavior, you can extend the QueryParser class to do that work.
In some ways a regular-expression query (RegexpQuery) is similar to a wildcard query (WildcardQuery): both can do the same kind of fuzzy term matching. The difference is that a regular query supports far more flexible and precise patterns, because it accepts full regular expressions rather than the coarser wildcard syntax, so it can match one or more terms exactly. As with WildcardQuery, fields used with regular queries should preferably not be analyzed (tokenized), because tokenization introduces term-boundary problems that can make a query return no results.
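To make the wildcard-versus-regex relationship concrete, here is a small plain-Java sketch. It does not use Lucene at all, and the helper name is made up for illustration; it rewrites a Lucene-style wildcard pattern into an equivalent java.util.regex pattern, which shows why a regular query can express everything a wildcard query can, and more:

```java
import java.util.regex.Pattern;

// Illustrative helper (not a Lucene API): rewrites a wildcard pattern,
// where '*' matches any run of characters and '?' matches one character,
// into an equivalent java.util.regex pattern.
public class WildcardVsRegex {

    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");          // '*' -> any run of characters
            } else if (c == '?') {
                regex.append('.');           // '?' -> exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        // The wildcard "f*x" and the derived regex match the same terms.
        System.out.println("fox".matches(wildcardToRegex("f*x"))); // true
        System.out.println("box".matches(wildcardToRegex("f*x"))); // false
    }
}
```

The reverse direction does not hold: a character class such as `[fb]ox` or an alternation has no wildcard equivalent, which is exactly the extra power RegexpQuery adds.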
Let's first look at Sanxian's test data. To see the effect of analyzed versus unanalyzed fields on the query, the same content is indexed both ways; IK is used as the analyzer. The screenshot is as follows:
In the figure above, the same content is stored in two fields, one analyzed and one not. The core code for running a regular query is shown below:
RegexpQuery query = new RegexpQuery(new Term(field, ".*" + searchStr + ".*"));
// System.out.println(query.toString());
TopDocs s = search.search(query, null, 100);
// TopDocs s = search.search(bool, null, 100);
System.out.println("Hits: " + s.totalHits);
for (ScoreDoc ss : s.scoreDocs) {
    Document docs = search.doc(ss.doc);
    System.out.println("id=>" + docs.get("id") + "  name==>" + docs.get("bookName") + "  author==>" + docs.get("author"));
    // System.out.println(docs.get(field));
}
Let's test it first with a fuzzy query against the unanalyzed field. The test code is as follows:
dao.testRegQuery("bookName", "concurrency");
The results are as follows:
Hits: 2
id=>2  name==>concurrent data faces huge challenges  author==>concurrent data faces huge challenges
id=>4  name==>our concurrency volume is not very large  author==>our concurrency volume and Qin Dongliang are not very large
We can see that the fuzzy query works well, and for the same condition it takes less time than a wildcard query. Next, let's query the analyzed field. The test code is as follows:
dao.testRegQuery("author", "concurrency");
The results are as follows:
Hits: 3
id=>2  name==>concurrent data faces huge challenges  author==>concurrent data faces huge challenges
id=>3  name==>the food is perfect!  author==>our concurrency volume is not very large
id=>4  name==>our concurrency volume and Qin Dongliang are not very large  author==>our concurrency volume is not very large
Fuzzy matching within a single term of the analyzed field also works fine. Now let's test a query that crosses a term boundary in the analyzed field. The code is as follows:
dao.testRegQuery("bookName", "EQ");
dao.testRegQuery("bookName", "quantity and");
System.out.println("==== boundary comparison ====");
dao.testRegQuery("author", "EQ");
dao.testRegQuery("author", "quantity and");
The results are as follows:
Hits: 1
id=>1  name==>the quick brown fox jumps over the lazy dog  author==>the quick brown fox jumps over the lazy dog
Hits: 1
id=>4  name==>our concurrency volume and Qin Dongliang are not very large  author==>our concurrency volume is not very large
==== boundary comparison ====
Hits: 0
Hits: 0
From these results we can see that once analysis splits the content into separate terms, a pattern that straddles the boundary between two terms can never match anything, no matter how fuzzy you make it, while the same query against the unanalyzed field still succeeds. This is because an unanalyzed field is indexed internally as one single term, and the term is Lucene's smallest unit of retrieval. We need to pay special attention to this, and when implementing our own business logic we should design the analysis strategy that best fits our scenario.
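The boundary effect can be reproduced without Lucene at all. In the sketch below (plain Java; a whitespace split stands in for a real analyzer such as IK), a pattern that spans two tokens matches the whole unanalyzed value but matches no individual term:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: why a regex that straddles a term boundary finds
// nothing in an analyzed field. An analyzed field is matched term by
// term; an unanalyzed field is indexed as one single term.
public class TokenBoundaryDemo {

    // Whitespace split standing in for a real analyzer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    static boolean anyTermMatches(List<String> terms, String regex) {
        return terms.stream().anyMatch(term -> term.matches(regex));
    }

    public static void main(String[] args) {
        String content = "the quick brown fox";
        String regex = ".*ck bro.*"; // straddles the quick/brown boundary

        // Unanalyzed: the whole value is one term, so the pattern matches.
        System.out.println(content.matches(regex));                   // true
        // Analyzed: no single term contains "ck bro", so nothing matches.
        System.out.println(anyTermMatches(tokenize(content), regex)); // false
    }
}
```

A pattern that stays inside one token, such as ".*ui.*", still matches the analyzed form, which mirrors the successful single-term fuzzy query above.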
Finally, Sanxian tests what regular queries do best: searching with a real regular expression. The code is as follows:
dao.testRegQuery("bookName", "[fb]ox"); // search using a regular expression
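Note that the testRegQuery helper shown earlier wraps its argument in ".*", so the pattern actually evaluated is ".*[fb]ox.*". As a quick illustrative check of what that character class matches, here is the same pattern run through java.util.regex (Lucene's RegexpQuery uses its own automaton-based regex syntax, but character classes and ".*" behave the same way):

```java
// Illustrative only: "[fb]ox" matches either "fox" or "box", and the
// ".*" wrapper lets the match occur anywhere inside the term.
public class CharClassDemo {
    public static void main(String[] args) {
        String regex = ".*[fb]ox.*";
        System.out.println("the quick brown fox jumps over the lazy dog".matches(regex)); // true
        System.out.println("log is small box".matches(regex));                            // true
        System.out.println("the lazy dog".matches(regex));                                // false
    }
}
```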
The results are as follows:
Hits: 2
id=>1  name==>the quick brown fox jumps over the lazy dog  author==>the quick brown fox jumps over the lazy dog
id=>5  name==>log is small box  author==>log is small box

Thank you for reading. That is how regular queries are implemented in Lucene 4.7. After studying this article you should have a deeper understanding of the topic; the specifics still need to be verified in practice.