Lucene 04/27 Update SLTechnology News&Howtos

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Lucene is a subproject of the Apache Software Foundation's Jakarta project. It is an open-source full-text search toolkit, not a complete full-text search engine: it is a search engine architecture that provides a complete query engine and indexing engine, plus part of a text analysis engine. The purpose of Lucene is to give software developers an easy-to-use toolkit.

I quote this passage to make one point: Lucene is just a toolkit, a search engine toolkit.

Some people ask what the difference is between Lucene and Solr. Solr is a search system. An analogy is Servlet and Struts2: Lucene is like Servlet and Solr is like Struts2, because Solr wraps Lucene.

Here is how Lucene works:

When we use Lucene, what we are really using is its inverted index.

What is an inverted index? Here is an example.

We have all used the Xinhua Dictionary. It is divided into two parts: the first is the radical index at the front, and the second is the main text, which explains the characters one by one.

When we use the Xinhua Dictionary, we usually look up a character through the radical index. Nobody searches the dictionary page by page.

Lucene's inverted index works the same way: it takes the content to be searched (text, database records, web pages) and tokenizes it, and the resulting terms serve as the index, just like the radicals.
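To make the dictionary analogy concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy, not Lucene's actual index format: the class name InvertedIndexSketch, its methods, and the rule of splitting on non-letter characters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document IDs containing it.
// Illustration only; this is not how Lucene stores its index.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Simplified tokenizer (an assumption): lowercase, then split on non-letter/digit runs.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index one document: record its ID under every term it contains.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up a term directly instead of scanning every document.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene is a search toolkit");
        index.add(2, "Solr wraps Lucene");
        index.add(3, "Java servlet basics");
        System.out.println(index.lookup("lucene")); // [1, 2]
        System.out.println(index.lookup("java"));   // [3]
    }
}
```

The lookup goes from term to documents, which is the "radical index first, page second" direction described above.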

Once more, for emphasis:

The difference between a search engine (Baidu, Google) and Lucene

A search engine is an application; Lucene is a search toolkit.

In Lucene's query syntax, name:lucene means searching for documents whose Field named name contains "lucene".

desc:lucene AND desc:java means searching for documents whose desc Field contains both the keyword "lucene" and the keyword "java".
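The meaning of those two query strings can be mimicked with a few lines of plain Java. This is only a toy evaluator for queries of the shape field:term AND field:term; the class FieldQuerySketch and the sample documents are made up for illustration, and Lucene's real query parser supports far more than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy evaluator for "field:term AND field:term" queries.
// It only mimics the meaning of the example queries; it is not Lucene's parser.
class FieldQuerySketch {

    // True if the named field of the document contains the term as a whole word.
    static boolean matchesClause(Map<String, String> doc, String clause) {
        String[] parts = clause.trim().split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected field:term, got: " + clause);
        }
        String value = doc.getOrDefault(parts[0].trim(), "").toLowerCase(Locale.ROOT);
        String term = parts[1].trim().toLowerCase(Locale.ROOT);
        return Arrays.asList(value.split("[^\\p{L}\\p{N}]+")).contains(term);
    }

    // Every clause joined by AND must match.
    static boolean matches(Map<String, String> doc, String query) {
        for (String clause : query.split("\\s+AND\\s+")) {
            if (!matchesClause(doc, clause)) {
                return false;
            }
        }
        return true;
    }

    // Return the "id" field of every matching document.
    static List<String> search(List<Map<String, String>> docs, String query) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (matches(doc, query)) {
                ids.add(doc.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("id", "1", "name", "Lucene in practice", "desc", "Lucene with Java examples"),
            Map.of("id", "2", "name", "Java basics", "desc", "An introduction to Java"));
        System.out.println(search(docs, "name:lucene"));               // [1]
        System.out.println(search(docs, "desc:lucene AND desc:java")); // [1]
        System.out.println(search(docs, "desc:java"));                 // [1, 2]
    }
}
```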

Don't worry if this is not clear yet.

Next I'll explain the relationship between Document and Field.

I'll use a row of data from a database to explain.

The row as a whole is a Document.

Each column in the row is a Field.

Put that way, does it suddenly make sense?

Next, let's talk about the tokenizer.

Lucene was created abroad, so, unsurprisingly, its support for Chinese is weak; few of its developers had Chinese in mind. With the default analyzer, a sentence meaning "I am Chinese" is broken into single characters, which is not the result we want. What we want are words such as "China" and "Chinese". I won't keep you guessing: there are many Chinese tokenizers available, and the one I consider unbeatable is IK. It ships as a jar; just import it into the project. I call it unbeatable because you can add your own words, such as "loser" and "rich and handsome", to the tokenizer's dictionary so that the program recognizes them.
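Why adding your own words changes the result can be seen in a small sketch of dictionary-based segmentation. This is a toy forward-maximum-matching segmenter, not IK's implementation; the class DictionarySegmenterSketch is hypothetical, and run-together English letters stand in for Chinese characters so that the example stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy dictionary-based segmenter using forward maximum matching.
// Hypothetical illustration only; this is not IK's implementation.
class DictionarySegmenterSketch {
    private final Set<String> dictionary = new HashSet<>();
    private int maxWordLength = 1;

    // Plays the role of adding a line to a custom dictionary file.
    void addWord(String word) {
        dictionary.add(word);
        maxWordLength = Math.max(maxWordLength, word.length());
    }

    // At each position take the longest dictionary word; fall back to a single character.
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        DictionarySegmenterSketch segmenter = new DictionarySegmenterSketch();
        segmenter.addWord("i");
        segmenter.addWord("am");
        // Unknown text falls apart into single characters.
        System.out.println(segmenter.segment("iamchinese")); // [i, am, c, h, i, n, e, s, e]
        // After the custom word is added, it is recognized as one token.
        segmenter.addWord("chinese");
        System.out.println(segmenter.segment("iamchinese")); // [i, am, chinese]
    }
}
```

The second call shows the effect described above: once a word is in the dictionary, the segmenter stops breaking it into single characters.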

This is the package to use.

After downloading IK, import these three files into the project: ext.dic holds the words you add yourself, and the stop-word dictionary holds the stop words.

Everything above is Lucene theory. Once the theory is understood, the code below is easy to follow.

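// Tokenize and build the index.
// Assumes Lucene 4.10.x (FSDirectory.open(File)) and the IK analyzer jar.
// BookDaoImpl, queryBookList() and the Book getters other than getId() are
// assumed names; the original listing is truncated at those points.
public void testCreateIndex() throws Exception {
    BookDao bookDao = new BookDaoImpl();
    List<Book> listBook = bookDao.queryBookList();
    List<Document> documents = new ArrayList<>();
    for (Book bk : listBook) {
        Document doc = new Document();
        doc.add(new TextField("id", String.valueOf(bk.getId()), Store.YES));
        doc.add(new TextField("name", bk.getName(), Store.YES));
        doc.add(new TextField("price", String.valueOf(bk.getPrice()), Store.YES));
        doc.add(new TextField("pic", bk.getPic(), Store.YES));
        doc.add(new TextField("desc", bk.getDesc(), Store.YES));
        documents.add(doc);
    }
    Analyzer analyzer = new IKAnalyzer(); // assumed: the IK analyzer described above
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    Directory directory = FSDirectory.open(new File("H:\\temp"));
    IndexWriter writer = new IndexWriter(directory, config);
    for (Document doc : documents) {
        writer.addDocument(doc);
    }
    writer.close();
}

// Search the index.
public void searchIndex() throws Exception {
    Analyzer analyzer = new IKAnalyzer(); // same analyzer as at index time
    QueryParser queryParser = new QueryParser("desc", analyzer); // default field: desc
    Query query = queryParser.parse("desc:java AND lucene");
    Directory directory = FSDirectory.open(new File("H:\\temp"));
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    TopDocs topDocs = indexSearcher.search(query, 10); // top 10 hits
    System.out.println("Total number of data items queried is: " + topDocs.totalHits);
    ScoreDoc[] docs = topDocs.scoreDocs;
    for (ScoreDoc scoreDoc : docs) {
        int docID = scoreDoc.doc;
        Document doc = indexSearcher.doc(docID);
        System.out.println("docID: " + docID);
        System.out.println("bookid: " + doc.get("id"));
        System.out.println("pic: " + doc.get("pic"));
        System.out.println("name: " + doc.get("name"));
        System.out.println("desc: " + doc.get("desc"));
        System.out.println("price: " + doc.get("price"));
    }
    indexReader.close();
}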