2025-03-28 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article explains the principle behind Lucene full-text retrieval.
The data we deal with falls into three types:
Structured data: data with a fixed format or limited length, such as the rows in a database.
Unstructured data: data with no fixed format and no fixed length, such as the text content of web pages.
Semi-structured data: such as JSON or XML data.
So how do we deal with these different types of data?
For structured data in a database, we query with SQL statements.
For unstructured data, we can either scan sequentially or use full-text retrieval.
Sequential scanning means scanning from the first piece of data to the last on every query. Obviously, this wastes a great deal of time and performance.
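To make the cost of sequential scanning concrete, here is a minimal Python sketch. The function name and the sample dictionary are illustrative assumptions; the three sample texts are borrowed from the example used later in this article.

```python
def sequential_scan(documents, keyword):
    """Return the IDs of documents containing keyword, checking every document in turn."""
    keyword = keyword.lower()
    hits = []
    for doc_id, text in documents.items():
        # Every document is read in full on every query,
        # so the cost grows with the total amount of text.
        if keyword in text.lower():
            hits.append(doc_id)
    return hits


# Sample data for illustration only.
documents = {
    1: "My blog space",
    2: "HappyBKs's Lucene article",
    3: "HappBKs's Hadoop article",
}

print(sequential_scan(documents, "lucene"))
```

Nothing is prepared in advance here, which is exactly the weakness: the same full pass is repeated for each new keyword.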
So what is full-text search?
This is what Lucene does. Let's first look at a picture that describes its role in the overall system:
In the application layer on top of Lucene, we can see two parts. In the first, the application's structured, semi-structured and unstructured data is indexed by Lucene. The second is retrieval: users search the index by entering keywords as search conditions, and the results are returned to them.
So what is an index?
An index works like the pinyin index and the radical index in the Xinhua Dictionary, which are used to look up characters.
The same is true in Lucene, where full-text search means finding the documents in which a word has appeared. For example:
In the image above, the keyword "lucene" appears in articles 1 and 3. The keyword "Solr" appears in articles 1, 3 and 5. The keyword "hadoop" appears in articles 3, 5, 7, 8 and 9.
We call this whole process inverted indexing (sometimes translated as "reverse indexing"). The list of documents for each keyword on the right is called an inverted list.
What is an inverted index?
Inverted index: the mapping from strings to files is the reverse of the natural mapping from files to the strings they contain. In essence, it describes a mapping relationship.
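The reversal can be sketched in a few lines of Python. The forward mapping below (document to keywords) is reconstructed from the keyword lists in the example above; the function and variable names are assumptions made for illustration and are not part of Lucene.

```python
from collections import defaultdict

# Forward mapping: document ID -> keywords it contains
# (reconstructed from the lucene / solr / hadoop example).
forward = {
    1: ["lucene", "solr"],
    3: ["lucene", "solr", "hadoop"],
    5: ["solr", "hadoop"],
    7: ["hadoop"],
    8: ["hadoop"],
    9: ["hadoop"],
}


def invert(forward_index):
    """Turn a document -> terms mapping into a term -> documents mapping."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward_index):
        for term in forward_index[doc_id]:
            inverted[term].append(doc_id)
    return dict(inverted)


inverted = invert(forward)
for term, doc_ids in inverted.items():
    print(term, doc_ids)
```

Looking up a keyword is now a single dictionary access instead of a pass over every document.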
Create an index
So what are the steps for creating a full-text index?
Here we divide index creation into three steps, or three things you need:
Data to be retrieved (Document)
Word Segmentation Technology (Analyzer)
Index creation (Indexer)
Let's give an example.
The first step is the document data. Here are three sample documents:
My blog space
HappyBKs's Lucene article
HappBKs's Hadoop article
The second step is word segmentation (here we use the standard analyzer).
I | of | blog | customer | empty | interval
happybks | of | lucene | text | chapter
happbks | of | hadoop | text | chapter
Note that with standard word segmentation, Chinese text is split into individual characters (each token above is the English gloss of a single Chinese character), and English uppercase letters are converted to lowercase.
The third step is index creation.
Term       ID    Term       ID    Term       ID
I          1     happybks   2     happbks    3
of         1     of         2     of         3
blog       1     lucene     2     hadoop     3
customer   1     text       2     text       3
empty      1     chapter    2     chapter    3
interval   1
We merge the indexes.
Term       IDs
I          1
of         1, 2, 3
blog       1
customer   1
empty      1
interval   1
happybks   2
lucene     2
happbks    3
hadoop     3
text       2, 3
chapter    2, 3
This table is what we call an index.
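The two tables correspond to two small steps that can be sketched in Python: first emit one (term, document ID) pair per token, then sort the pairs and merge those that share a term. The token lists use the English glosses from the example, with "of" standing for the possessive particle in all three documents; `build_index` is a hypothetical helper, not a Lucene API.

```python
from collections import defaultdict

# Tokenized documents from the segmentation step (English glosses).
tokenized_docs = {
    1: ["i", "of", "blog", "customer", "empty", "interval"],
    2: ["happybks", "of", "lucene", "text", "chapter"],
    3: ["happbks", "of", "hadoop", "text", "chapter"],
}


def build_index(docs):
    """Build a term -> sorted list of document IDs mapping."""
    # Step 1: one (term, doc_id) pair per token, like the unmerged table.
    pairs = [(term, doc_id) for doc_id, terms in docs.items() for term in terms]
    # Step 2: sort the pairs, then merge pairs that share the same term.
    index = defaultdict(list)
    for term, doc_id in sorted(pairs):
        if doc_id not in index[term]:
            index[term].append(doc_id)
    return dict(index)


index = build_index(tokenized_docs)
print(index["of"])
print(index["text"])
```

Sorting before merging keeps every document list in ascending order, which makes the lists easy to intersect at query time.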
Now, let's look at how to use the index for retrieval.
Index retrieval
There are four steps:
Search keywords (keywords)
Word Segmentation Technology (Analyzer)
Retrieval Index (Search)
Return the result
Let's walk through the steps with an example.
The first step is to get the keywords searched by the user.
Lucene article
The second step is to apply word segmentation to the keywords:
lucene | text | chapter
The third step is to look up each token in the index.
From the index table above, we can see that the only document whose entries cover all of the query's tokens is document 2, so in the fourth step document 2 is returned to the user as the result.
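The lookup can be sketched as an intersection of document lists. The index literal below restates the merged table from the index creation step; requiring every token to match follows this article's wording ("containing all"), and relevance scoring is left out. `search` is a hypothetical helper, not a Lucene API.

```python
# Merged index from the index creation step (term -> document IDs).
index = {
    "i": [1], "of": [1, 2, 3], "blog": [1], "customer": [1],
    "empty": [1], "interval": [1],
    "happybks": [2], "lucene": [2], "happbks": [3], "hadoop": [3],
    "text": [2, 3], "chapter": [2, 3],
}


def search(index, query_terms):
    """Return the IDs of documents that contain every query term."""
    result = None
    for term in query_terms:
        doc_ids = set(index.get(term, []))
        # Keep only the documents that also matched all earlier terms.
        result = doc_ids if result is None else result & doc_ids
    return sorted(result or [])


print(search(index, ["lucene", "text", "chapter"]))
```

Only the short lists for the three query tokens are touched, and no document text is read at query time.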