Common APIs for Lucene 4.7 Indexing and Retrieval

2025-04-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces the common APIs used for indexing and retrieval in Lucene 4.7, and explains what each class is responsible for.

To make comparison easier, the components are listed side by side in the table below.

API components used during indexing | API components used during retrieval
IndexWriter | IndexReader
IndexWriterConfig | IndexSearcher
Directory | Directory
Analyzer | QueryParser or Query subclass
Document | TopDocs
Field | ScoreDoc
-- | Term

The components listed above are described one by one below.

First, the classes used during indexing.

1. IndexWriter is the core class in the indexing process. It is mainly responsible for creating a new index or opening an existing one, and for adding, deleting, and updating documents.
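The add, update, and delete operations can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes the Lucene 4.7 jars (lucene-core and lucene-analyzers-common) are on the classpath, and the class name `IndexWriterCrudSketch`, the field names, and the sample text are made up for the example.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class IndexWriterCrudSketch {

    // Builds a document with an exact-match "id" field and a tokenized "body" field.
    static Document doc(String id, String body) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("body", body, Field.Store.YES));
        return d;
    }

    // Adds three documents, updates one, deletes one, and returns the live document count.
    static int run() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory index, convenient for a demo
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            writer.addDocument(doc("1", "first document"));
            writer.addDocument(doc("2", "second document"));
            writer.addDocument(doc("3", "third document"));
            // An update deletes every document matching the term, then adds the new version.
            writer.updateDocument(new Term("id", "2"), doc("2", "second document, revised"));
            writer.deleteDocuments(new Term("id", "3"));
            writer.commit();
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs(); // deleted documents are not counted
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("live documents: " + run());
    }
}
```

Identifying documents by a `StringField` key, as done here with `id`, is a common way to make updates and deletes target exactly one document.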

2. IndexWriterConfig: older versions of Lucene do not have this configuration class, and it is an important one. Its constructor takes two parameters: the first is the current Lucene version, and the second is the analyzer used for indexing. Beyond this most common use, it provides many utility methods, such as setting the in-memory buffer size, setting how many documents are buffered before they are flushed in bulk, getting thread state, setting the open mode, and choosing whether to use compound index files. With these you can do basic configuration and tuning of the index.
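A few of these settings are sketched below. The values are arbitrary illustrations, not tuning recommendations, and the class name `IndexWriterConfigSketch` is hypothetical. The sketch assumes Lucene 4.7 (lucene-core and lucene-analyzers-common) on the classpath.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class IndexWriterConfigSketch {

    // Builds a config with the two constructor arguments plus a few optional settings.
    static IndexWriterConfig buildConfig() {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);

        // Open an existing index if there is one, otherwise create a new one.
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        // Flush buffered documents once they occupy about 64 MB of RAM (illustrative value).
        config.setRAMBufferSizeMB(64.0);
        // Also flush after 10,000 buffered documents, whichever limit is hit first.
        config.setMaxBufferedDocs(10000);
        // Write new segments as separate files rather than one compound file.
        config.setUseCompoundFile(false);
        return config;
    }

    public static void main(String[] args) {
        IndexWriterConfig config = buildConfig();
        System.out.println("open mode: " + config.getOpenMode());
        System.out.println("RAM buffer (MB): " + config.getRAMBufferSizeMB());
    }
}
```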

3. Directory represents the storage location of a Lucene index and is an abstract class. It has a series of subclasses for working with indexes in different ways, and the choice of subclass can have a large impact on system performance. In essence, though, improving performance comes down to trading space for time or time for space. In practice, we use one of its subclasses to open the path where the index is stored, and then pass it to the IndexWriter constructor.
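As a rough sketch of that last step, the example below passes two different Directory subclasses to the same indexing code: `RAMDirectory` (in memory) and whatever `FSDirectory.open(File)` selects for the local file system. The temporary directory, the class name `DirectorySketch`, and the field name are assumptions made for the example. It assumes Lucene 4.7 on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class DirectorySketch {

    // Indexes one document into the given Directory and returns the document count.
    static int indexOne(Directory dir) throws IOException {
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document d = new Document();
            d.add(new TextField("body", "hello directory", Field.Store.NO));
            writer.addDocument(d);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    // Runs the same indexing code against an in-memory and an on-disk Directory.
    static int[] run() throws IOException {
        File path = Files.createTempDirectory("lucene-sketch").toFile(); // throwaway location
        try (Directory inMemory = new RAMDirectory();
             Directory onDisk = FSDirectory.open(path)) {
            return new int[] { indexOne(inMemory), indexOne(onDisk) };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println("in memory: " + counts[0] + ", on disk: " + counts[1]);
    }
}
```

Because the indexing code depends only on the abstract `Directory` type, the storage choice can be changed without touching the rest of the program.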

4. Analyzer is the base class of all analyzers. Before indexing, text must be processed by an analyzer and turned into tokens with a uniform format. The analyzer extracts the useful terms and filters out stop words. Lucene ships with several analyzers, but most of them target English or other European languages. If you need Chinese word segmentation, you can use the bundled SmartCN analyzer, or open-source alternatives such as IK and mmseg4j. Choosing an analyzer is a very important step in the indexing process, and the right choice depends on your own business requirements.
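To see what an analyzer actually produces, the following sketch runs a sentence through `StandardAnalyzer` and collects the resulting tokens. It assumes Lucene 4.7 (lucene-core and lucene-analyzers-common) on the classpath. The class name `AnalyzerSketch`, the field name `body`, and the sample sentence are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class AnalyzerSketch {

    // Runs the text through the analyzer and returns the emitted tokens in order.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
        try {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // required before the first incrementToken()
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return tokens;
    }

    // StandardAnalyzer lowercases tokens and drops English stop words such as "the".
    static List<String> demo() throws IOException {
        return tokenize(new StandardAnalyzer(Version.LUCENE_47), "The Quick Brown Fox");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints [quick, brown, fox]
    }
}
```

Swapping in a different `Analyzer` subclass (for example a Chinese analyzer) changes only the argument passed to `tokenize`.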

5. Document represents a single document, similar to a row in a database table. We add the fields we need to the document, and then index the documents one by one so that they can be retrieved.

6. Field is a field stored in a document. Each field has a name and a value, similar to a column name and value in a database. Field lets us control precisely how each value is handled. The two most commonly used types are StringField, which is not tokenized, and TextField, which is tokenized. There are other Field types as well, which are not covered here.
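The difference between the two field types shows up at search time. In the sketch below (Lucene 4.7 assumed on the classpath; the class name `FieldSketch`, field names, and values are invented for illustration), a `StringField` matches only on its complete, unmodified value, while a `TextField` matches on the individual lowercased words produced by the analyzer.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class FieldSketch {

    // Counts the documents whose field contains exactly the given indexed term.
    static int hits(IndexSearcher searcher, String field, String value) throws IOException {
        return searcher.search(new TermQuery(new Term(field, value)), 1).totalHits;
    }

    // Returns hit counts for: whole id, partial id, one word of the title.
    static int[] run() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            // Indexed as the single term "DOC-1", with no tokenizing or lowercasing.
            doc.add(new StringField("id", "DOC-1", Field.Store.YES));
            // Analyzed into the terms "lucene", "indexing", "basics".
            doc.add(new TextField("title", "Lucene Indexing Basics", Field.Store.YES));
            writer.addDocument(doc);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            return new int[] {
                hits(searcher, "id", "DOC-1"),    // whole value matches
                hits(searcher, "id", "doc"),      // a fragment does not
                hits(searcher, "title", "lucene") // a single analyzed word matches
            };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints 1 0 1
    }
}
```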

7. DirectoryReader (IndexReader): this class reads the index opened through a Directory subclass, and is then passed to the IndexSearcher constructor to initialize the search component. DirectoryReader does not exist in older versions of Lucene and was added in later versions.

8. IndexSearcher is the core class at search time and is the bridge to the index. It opens the index in read-only mode and provides a large number of retrieval, sorting, filtering, and other functions.
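The reader-to-searcher chain might look like the sketch below: open a `DirectoryReader` on a Directory, wrap it in an `IndexSearcher`, run a query, and load the stored fields of each hit. Lucene 4.7 is assumed on the classpath. The class name `SearchSketch` and the sample titles are invented for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class SearchSketch {

    // Indexes two titles, searches for "lucene", and returns the stored titles of the hits.
    static List<String> run() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String title : new String[] { "Lucene in Action", "Hadoop in Practice" }) {
                Document doc = new Document();
                doc.add(new TextField("title", title, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        List<String> titles = new ArrayList<String>();
        // The reader gives a read-only view of the index; the searcher runs queries on it.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs top = searcher.search(new TermQuery(new Term("title", "lucene")), 10);
            for (ScoreDoc hit : top.scoreDocs) {
                // A hit carries only a doc ID and a score; stored fields are loaded on demand.
                titles.add(searcher.doc(hit.doc).get("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // prints [Lucene in Action]
    }
}
```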

9. QueryParser or Query: both can be used to carry out retrieval. The difference is that QueryParser offers more powerful functions and makes it easier to build custom retrieval schemes, while Query and its subclasses are the query types built into Lucene. With these built-in queries you can handle basic retrieval in most cases. If you need to customize your own retrieval scheme, you need QueryParser. The one we use most often is the TermQuery subclass of Query, and there are also many other Query subclasses for specific purposes.
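The two approaches can be compared side by side. The sketch below builds a `TermQuery` by hand and lets the classic `QueryParser` build queries from text. It assumes Lucene 4.7 with the lucene-queryparser module on the classpath. The class name `QuerySketch`, the default field `body`, and the query strings are illustrative.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class QuerySketch {

    // Builds a query for one exact term. No analysis happens here, so the
    // caller must supply the term in the same form as it was indexed.
    static Query term(String text) {
        return new TermQuery(new Term("body", text));
    }

    // Parses query syntax (AND, OR, field:value, ...) and analyzes the terms.
    // Terms without an explicit field are searched in "body".
    static Query parse(String text) throws ParseException {
        QueryParser parser =
                new QueryParser(Version.LUCENE_47, "body", new StandardAnalyzer(Version.LUCENE_47));
        return parser.parse(text);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(term("lucene"));            // prints body:lucene
        System.out.println(parse("Lucene"));           // prints body:lucene (lowercased by the analyzer)
        System.out.println(parse("lucene AND index")); // a BooleanQuery with two required clauses
    }
}
```

One practical consequence: text passed to `TermQuery` must already match the indexed form (for example lowercased), whereas `QueryParser` applies the analyzer for you.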

10. TopDocs is a simple container that holds the top N search results. For each hit it stores only the document's doc ID and score. By default, the top N results are sorted by score.

11. ScoreDoc is usually used as an array, and each element contains only the doc ID of a document and its score. Unlike TopDocs, this array can be used for database-style paging. You have to make sure there is enough memory, though: when paging through a huge amount of data, this approach can easily cause a memory overflow, and other methods need to be considered.
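Both paging styles are sketched below. The first slices the `ScoreDoc` array, which means collecting every hit up to the end of the requested page. The second uses `IndexSearcher.searchAfter`, which is one possible "other method" (a suggestion here, not something the original text names): it continues from the last hit of the previous page, so only one page of hits is held at a time. Lucene 4.7 is assumed on the classpath. The class name `PagingSketch`, the 25 sample documents, and the page size are invented for illustration.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of Lucene.
class PagingSketch {

    // Slicing: collect the top (page * size) hits, then count those on the requested page.
    static int hitsOnPageBySlicing(IndexSearcher searcher, Query query, int page, int size)
            throws IOException {
        ScoreDoc[] hits = searcher.search(query, page * size).scoreDocs;
        int from = (page - 1) * size;
        int to = Math.min(hits.length, page * size);
        return Math.max(0, to - from);
    }

    // searchAfter: walk the result set page by page, holding only one page of hits at a time.
    static int countPagesWithSearchAfter(IndexSearcher searcher, Query query, int size)
            throws IOException {
        int pages = 0;
        ScoreDoc last = null;
        while (true) {
            TopDocs page = (last == null)
                    ? searcher.search(query, size)
                    : searcher.searchAfter(last, query, size);
            if (page.scoreDocs.length == 0) {
                break; // no more hits
            }
            pages++;
            last = page.scoreDocs[page.scoreDocs.length - 1];
        }
        return pages;
    }

    // Indexes 25 documents; returns {hits on page 3 via slicing, total pages via searchAfter}.
    static int[] run() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (int i = 0; i < 25; i++) {
                Document doc = new Document();
                doc.add(new StringField("id", String.valueOf(i), Field.Store.YES));
                writer.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query all = new MatchAllDocsQuery();
            return new int[] {
                hitsOnPageBySlicing(searcher, all, 3, 10),
                countPagesWithSearchAfter(searcher, all, 10)
            };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = run();
        System.out.println("hits on page 3: " + r[0] + ", pages: " + r[1]); // 5 and 3
    }
}
```

The trade-off is that `searchAfter` needs the last `ScoreDoc` of the previous page, so it suits "next page" navigation better than jumping straight to an arbitrary page.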

12. Term is the most basic unit of search. Similar to Field, it takes a field name and the string to search for. It is a small but indispensable class.
