
What are the rules of use based on ElasticSearch Analyzer?

2025-03-31 Update From: SLTechnology News&Howtos shulou


This article explains the rules Elasticsearch follows when choosing and applying an analyzer, both when indexing documents and when searching them.

Rules for using analyzer

A query can only find terms that actually exist in the inverted index, so it is important that the same analysis process is applied to the document at index time and to the query string at search time. Only then can the terms in the query match the terms in the inverted index.

Although we talk about documents, analyzers are determined per field. Each field can have a different analyzer, either configured specifically for that field or inherited from the default of a higher level: the type, the index, or the node. At index time, a field's value is analyzed with the configured analyzer or, failing that, the default one.

For example, add a new field to my_index:

    PUT /my_index/_mapping/my_type
    {
      "my_type": {
        "properties": {
          "english_title": { "type": "string", "analyzer": "english" }
        }
      }
    }

Now we can use the analyze API to analyze the word Foxes and compare how the title field and the english_title field would process it at index time:

    GET /my_index/_analyze
    { "field": "my_type.title", "text": "Foxes" }

    GET /my_index/_analyze
    { "field": "my_type.english_title", "text": "Foxes" }

The title field, which uses the default standard analyzer, returns the term foxes.

The english_title field, which uses the english analyzer, returns the term fox.

This means that a low-level term query for the exact term fox matches the english_title field but not the title field.

High-level queries such as the match query understand field mappings and apply the correct analyzer to each field being queried. You can use the validate-query API to see this behavior:

    GET /my_index/my_type/_validate/query?explain
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "title": "Foxes" }},
            { "match": { "english_title": "Foxes" }}
          ]
        }
      }
    }

This returns the following explanation of the query:

(title:foxes english_title:fox)

The match query uses the appropriate analyzer for each field, so it looks up each term in the form that the field actually stores.
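The difference between a term query and a match query can be sketched with a toy in-memory index. This is only an illustration: `standard_like` and `english_like` are hypothetical stand-ins for the standard and english analyzers (the real english analyzer uses a proper stemmer, not the suffix stripping assumed here), and none of the function names below belong to Elasticsearch.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return re.findall(r"\w+", text.lower())


def english_like(text):
    """Toy stand-in for the english analyzer: lowercase, strip a plural suffix.

    Assumption: real stemming is far more involved than this.
    """
    terms = []
    for term in standard_like(text):
        if term.endswith("es") and len(term) > 4:
            term = term[:-2]
        elif term.endswith("s") and len(term) > 3:
            term = term[:-1]
        terms.append(term)
    return terms


# Per-field analyzers, mirroring the mapping used in the article.
FIELD_ANALYZERS = {"title": standard_like, "english_title": english_like}


def index_document(doc):
    """Build a tiny inverted index: field name -> set of indexed terms."""
    return {field: set(FIELD_ANALYZERS[field](value)) for field, value in doc.items()}


def term_query(index, field, term):
    """Low-level lookup: the term is used exactly as given, with no analysis."""
    return term in index[field]


def match_query(index, field, text):
    """High-level lookup: analyze the text with the field's analyzer first."""
    return any(term in index[field] for term in FIELD_ANALYZERS[field](text))


def explain(text, fields):
    """Show the per-field terms a match query would look up."""
    parts = [f"{f}:{t}" for f in fields for t in FIELD_ANALYZERS[f](text)]
    return "(" + " ".join(parts) + ")"


index = index_document({"title": "Foxes", "english_title": "Foxes"})
print(term_query(index, "title", "fox"))            # False
print(term_query(index, "english_title", "fox"))    # True
print(match_query(index, "title", "Foxes"))         # True
print(explain("Foxes", ["title", "english_title"]))  # (title:foxes english_title:fox)
```

The term query fails on title because the index holds foxes, not fox, while the match query succeeds on both fields because it analyzes the query text the same way each field was indexed.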

Default analyzer

We can specify an analyzer at the field level, but if none is specified there, how is the analyzer for the field determined?

Analyzers can be defined at three levels: per field, per index, or as a global default. Elasticsearch works through the following list until it finds an analyzer it can use. At index time, the order is:

The analyzer defined in the field mapping, otherwise

The analyzer named default in the index settings, which defaults to

The standard analyzer

At search time, the order is slightly different:

The analyzer defined in the query itself, otherwise

The analyzer defined in the field mapping, otherwise

The analyzer named default in the index settings, which defaults to

The standard analyzer

Sometimes it makes sense to use different analyzers at index time and at search time. For example, we might want to index synonyms, so that wherever quick occurs we also index fast, rapid, and speedy. At search time, though, we don't need to search for all the synonyms. We only need to look up the single word the user entered, whether that is quick, fast, rapid, or speedy.

To support this distinction, Elasticsearch also provides an optional search_analyzer mapping parameter, which is used only at search time (the analyzer parameter is still used at index time). There is also an equivalent default_search setting that specifies the default at the index level.
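A minimal sketch of why separate index-time and search-time analysis can be enough for synonyms. The synonym table and both functions are hypothetical illustrations in plain Python, not Elasticsearch configuration.

```python
# Hypothetical synonym rule for the sketch: "quick" is expanded at index time.
SYNONYMS = {"quick": ["fast", "rapid", "speedy"]}


def index_analyzer(text):
    """Index-time analysis: lowercase, split on whitespace, expand synonyms."""
    terms = []
    for word in text.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms


def search_analyzer(text):
    """Search-time analysis: lowercase and split only, no synonym expansion."""
    return text.lower().split()


# Terms stored in the (toy) inverted index for one document.
indexed_terms = set(index_analyzer("Quick brown fox"))


def matches(query):
    """A query matches if any of its analyzed terms is in the index."""
    return any(term in indexed_terms for term in search_analyzer(query))


print(sorted(indexed_terms))
print(matches("speedy"))  # True: the synonym was indexed alongside "quick"
print(matches("slow"))    # False: never indexed
```

Because every synonym is already stored in the index, the search side only has to look up the one word the user typed.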

Taking these extra parameters into account, the complete order at search time is:

The analyzer defined in the query itself, otherwise

The search_analyzer defined in the field mapping, otherwise

The analyzer defined in the field mapping, otherwise

The analyzer named default_search in the index settings, otherwise

The analyzer named default in the index settings, which defaults to

The standard analyzer
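The lookup order listed above can be sketched as a small function. This is an illustration of the fallback chain only, not Elasticsearch's implementation; `resolve_analyzer` and its parameters are hypothetical names, with `field_mapping` standing for one field's mapping and `index_analyzers` for the analyzers named in the index settings.

```python
def resolve_analyzer(field_mapping, index_analyzers, query_analyzer=None, searching=False):
    """Return the name of the analyzer that would be chosen.

    field_mapping:   dict for one field, e.g. {"analyzer": "english"}
    index_analyzers: dict of analyzers named in the index settings
    query_analyzer:  analyzer named in the query itself (search time only)
    searching:       False for index time, True for search time
    """
    candidates = []
    if searching:
        candidates.append(query_analyzer)
        candidates.append(field_mapping.get("search_analyzer"))
    candidates.append(field_mapping.get("analyzer"))
    if searching and "default_search" in index_analyzers:
        candidates.append("default_search")
    if "default" in index_analyzers:
        candidates.append("default")
    candidates.append("standard")  # final fallback
    # The first candidate that is actually set wins.
    return next(name for name in candidates if name)


mapping = {"analyzer": "english", "search_analyzer": "standard"}
print(resolve_analyzer(mapping, {}))                  # english
print(resolve_analyzer(mapping, {}, searching=True))  # standard
print(resolve_analyzer({}, {"default": {"type": "simple"}}))  # default
```

At index time the query-level and search-specific candidates are skipped, which reproduces the shorter index-time list.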

Elasticsearch Analyzer

I. What is Analysis?

Analysis, also called word segmentation, is the process of converting text into a series of terms (tokens).

Analysis is performed by an Analyzer.

You can use Elasticsearch's built-in analyzers, customize an analyzer as needed, or install analyzer plugins.

In general, the same analyzer should be used to produce terms when data is written and when query strings are analyzed.

II. The components and working mechanism of an Analyzer

Character Filter: processes the raw text before tokenization, for example by stripping HTML.

Tokenizer: splits the text into tokens according to its rules.

Token Filter: processes the resulting tokens, for example by lowercasing them, removing stopwords, or adding synonyms.
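The three stages can be sketched as a small pipeline in plain Python. All function names, the stopword set, and the synonym table are hypothetical and chosen for illustration; this shows the order of the stages, not how Elasticsearch implements them.

```python
import html
import re


def strip_html(text):
    """Character filter: drop tags and decode entities in the raw text."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def word_tokenizer(text):
    """Tokenizer: split the filtered text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


STOPWORDS = {"the", "a", "is"}  # assumed, deliberately tiny


def remove_stopwords(tokens):
    """Token filter: drop stopwords."""
    return [t for t in tokens if t not in STOPWORDS]


SYNONYMS = {"quick": ["fast"]}  # assumed synonym rule


def add_synonyms(tokens):
    """Token filter: emit each token followed by its synonyms."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out


def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


result = analyze(
    "<p>The <b>Quick</b> fox</p>",
    [strip_html],
    word_tokenizer,
    [lowercase, remove_stopwords, add_synonyms],
)
print(result)  # ['quick', 'fast', 'fox']
```

The order of token filters matters: lowercasing runs first here so that the stopword and synonym lookups see normalized tokens.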

III. Some built-in analyzers in Elasticsearch

1) Three ways to use the _analyze API

2) Standard Analyzer

Principle

Example

3) Simple Analyzer

Principle

Example

4) Whitespace Analyzer

Principle

Example

5) Stop Analyzer

Principle

Example

6) Keyword Analyzer

Principle

Example

7) Pattern Analyzer

Principle

Example

8) Language Analyzer

Supports language-specific analysis

Example
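As a rough guide to how the built-in analyzers listed above differ, their default behavior can be approximated in plain Python. These are approximations written for this sketch, not the real implementations: the standard analyzer actually follows Unicode text segmentation rules, and the stop list here is an assumed subset of common English stopwords.

```python
import re

# Assumed subset of English stopwords, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
              "if", "in", "is", "it", "of", "on", "or", "the", "to", "with"}


def standard(text):
    """Approximation: split on word boundaries, lowercase, keep stopwords."""
    return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())


def simple(text):
    """Approximation: split at every non-letter character, lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def whitespace(text):
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()


def stop(text):
    """Like simple, but also removes stopwords."""
    return [t for t in simple(text) if t not in STOP_WORDS]


def keyword(text):
    """No tokenization: the whole input is a single term."""
    return [text]


def pattern(text, regex=r"\W+"):
    """Split on a regular expression (non-word characters here), lowercase."""
    return [t for t in re.split(regex, text.lower()) if t]


sentence = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
for analyzer in (standard, simple, whitespace, stop, keyword, pattern):
    print(f"{analyzer.__name__:<10} {analyzer(sentence)}")
```

Running the same sentence through each function makes the differences visible: simple drops the digit 2, whitespace keeps Brown-Foxes and the trailing period intact, and stop removes the.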

IV. Chinese word segmentation

Difficulties in Chinese word segmentation:

A Chinese sentence has to be cut into words, not into individual characters.

In English, words are naturally separated by spaces. Chinese has no such separator.

The same Chinese sentence can be understood, and therefore segmented, differently in different contexts.

1) ICU Analyzer

Principle

Demo (requires the ICU Analysis plugin to be installed in advance)

2) IK

3) THULAC

