What is the process of Springboot2.x integrating ElasticSearch7.x?

2025-04-10 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02

This article discusses the process of integrating Springboot2.x with ElasticSearch7.x. Many people may not be familiar with it, so the key points are summarized below. I hope you find it useful.

Mapping explained in detail

Mapping is one of the most important parts of the whole ES search engine. Learning to build a good index can make our search engine more efficient and save resources.

What is Mapping?

Mapping is an Elasticsearch term. It is similar to a table schema definition in a relational database, and it serves the following purposes:

1. Define the names of the fields in the index.

2. Define the data type of each field, such as string, number, or Boolean.

3. Configure how each field is handled in the inverted index, for example marking a field as not indexed or recording term positions.

In earlier versions of ES, an index could contain multiple Types. Since 7.0, an index has only one Type, so each index has exactly one Mapping definition.

Now that we know what Mapping is, let's look at its settings:

Mapping setting: dynamic (dynamic Mapping)

Official website reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/mapping.html

PUT users
{
  "mappings": {
    "dynamic": false
  }
}

When creating an index, dynamic can be set to false, true, or strict.

For example, suppose a new document contains a field that is not yet in the Mapping. When dynamic is set to true, the document is indexed, the new field is indexed and therefore searchable, and the Mapping is updated at the same time. When dynamic is set to false, the document is still indexed and the new field is kept in _source, but the field itself is not indexed (it cannot be searched) and the Mapping is not updated. When dynamic is set to strict, the write is rejected with an error.
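
The three modes can be illustrated with a small Python model. This is only a sketch of the behavior described above, not how ES is implemented; the function name index_document and the placeholder type "guessed" are made up for the illustration.

```python
import copy

def index_document(mapping, doc, dynamic=True):
    """Toy model of how the dynamic setting treats unmapped fields.

    Returns (mapping after the write, stored _source, searchable fields).
    Raises ValueError when dynamic is "strict" and an unmapped field appears.
    """
    new_mapping = copy.deepcopy(mapping)
    searchable = []
    for field in doc:
        if field in new_mapping:
            searchable.append(field)
        elif dynamic == "strict":
            raise ValueError(f"strict mode: unmapped field [{field}] is not allowed")
        elif dynamic is True:
            # The mapping is extended and the new field becomes searchable.
            new_mapping[field] = {"type": "guessed"}  # placeholder, not a real ES type
            searchable.append(field)
        # dynamic is False: the value stays in _source but is not indexed.
    return new_mapping, dict(doc), searchable

mapping = {"name": {"type": "text"}}
doc = {"name": "wu", "nickname": "w"}

for mode in (True, False):
    new_mapping, source, searchable = index_document(mapping, doc, mode)
    print(mode, sorted(new_mapping), source, searchable)

try:
    index_document(mapping, doc, "strict")
except ValueError as error:
    print(error)
```

With dynamic set to false, nickname is still present in the stored source but is missing from both the mapping and the searchable fields.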

index

There is also the index parameter, which controls whether the current field is indexed. The default is true. If it is set to false (in some business scenarios certain fields should not be searchable), the field cannot be searched.

# The index attribute controls whether a field is indexed
PUT user_test
{
  "mappings": {
    "properties": {
      "firstName": {"type": "text"},
      "lastName": {"type": "text"},
      "mobile": {"type": "text", "index": false}
    }
  }
}

index_options

The index_options parameter controls what is recorded in the inverted index. There are four levels:

docs: record only the doc id

freqs: record the doc id and term frequencies

positions: record the doc id, term frequencies, and term positions

offsets: record the doc id, term frequencies, term positions, and character offsets

In addition, text fields default to positions and other types default to docs. The more that is recorded, the more storage space is used.
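
To make the four levels concrete, here is a toy inverted index in Python. It is a minimal sketch, assuming a naive lowercase word tokenizer; the function build_postings and the layout of its result are invented for illustration and do not mirror Lucene's on-disk format.

```python
import re
from collections import defaultdict

LEVELS = ["docs", "freqs", "positions", "offsets"]

def build_postings(docs, index_options="positions"):
    """Build term -> {doc_id: recorded info} for the given index_options level."""
    level = LEVELS.index(index_options)  # raises ValueError for an unknown option
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings[match.group()].setdefault(doc_id, {})
            if level >= 1:  # freqs and above: count occurrences
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions and above: remember token positions
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: remember character start/end
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return dict(postings)

docs = {1: "quick brown fox", 2: "quick quick dog"}
for option in LEVELS:
    print(option, build_postings(docs, option)["quick"])
```

Each level records everything the previous level does plus one more piece of information, which is why the richer options take more space.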

null_value

null_value defines how a field handles an explicit null value. By default no null_value is set, so ES ignores null values: they are neither indexed nor searchable. Setting null_value substitutes the given value at index time so that such documents can be found. The null_value must have the same data type as the field, and text fields do not support it, so keyword is the usual choice for string data.
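
The substitution can be modeled in a few lines of Python. This is a simplified sketch, not ES code: indexed_value and the MISSING sentinel are hypothetical names, and the point is only to show that an explicit null is replaced while an absent field is left alone.

```python
MISSING = object()  # sentinel that distinguishes "field absent" from "field is null"

def indexed_value(doc, field, null_value=None):
    """Return what would be indexed for `field`, or None if nothing is indexed."""
    value = doc.get(field, MISSING)
    if value is MISSING:
        return None          # field absent: null_value does not apply
    if value is None:
        return null_value    # explicit null: replaced (still None if no null_value is set)
    return value

doc_with_null = {"firstName": "Zhang", "mobile": None}
doc_without_field = {"firstName": "Zhang"}

print(indexed_value(doc_with_null, "mobile", null_value="NULL"))
print(indexed_value(doc_without_field, "mobile", null_value="NULL"))
```

Only the first document would be found by a search for "NULL" on mobile. In both cases the stored document itself is unchanged.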

Example

# Set null_value
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": {"type": "text"},
      "lastName": {"type": "text"},
      "mobile": {"type": "keyword", "null_value": "NULL"}
    }
  }
}

PUT users/_doc/1
{"firstName": "Zhang", "lastName": "Fubing", "mobile": null}

PUT users/_doc/2
{"firstName": "Zhang", "lastName": "Fubing2"}

# View the results: only the record with _id 1 matches, because document 2 has no mobile field at all
GET users/_search
{"query": {"match": {"mobile": "NULL"}}}

_all

This attribute is rarely used now and will not be explained in depth.

Reference official website: https://www.elastic.co/guide/cn/elasticsearch/guide/current/root-object.html

copy_to

This property copies the value of the current field to the specified target field.

_all has been replaced by copy_to in version 7.x

It is useful for specific scenarios, such as searching several fields through one combined field

copy_to copies the field values to the target field to achieve a function similar to _all

The target field of copy_to does not appear in _source

DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": {"type": "text", "copy_to": "fullName"},
      "lastName": {"type": "text", "copy_to": "fullName"}
    }
  }
}

PUT users/_doc/1
{"firstName": "Li", "lastName": "Sunke"}

# The returned document contains no new field
GET users/_doc/1

{
  "_index": "users",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {"firstName": "Li", "lastName": "Sunke"}
}

# The copied values can still be searched through fullName
GET users/_search?q=fullName:(Li Sunke)

The previous usage (with a mapping type) was:

curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {"type": "text", "copy_to": "full_name"},
        "last_name": {"type": "text", "copy_to": "full_name"},
        "full_name": {"type": "text"}
      }
    }
  }
}'

curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -H 'Content-Type: application/json' -d '
{"first_name": "John", "last_name": "Smith"}'

curl -XGET 'localhost:9200/my_index/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "full_name": {"query": "John Smith", "operator": "and"}
    }
  }
}'

1. The values of the first_name (first name) and last_name (last name) fields are copied to the full_name field.

2. The first_name and last_name fields can still be queried separately.

3. The full_name field can be queried for both the first name and the last name at once.

Some key points:

It is the field value that is copied, not the terms produced by analysis.

The original _source field is not modified to show the copied values.

The same value can be copied to multiple fields with "copy_to": ["field_1", "field_2"].
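
These points can be sketched in Python. The helper index_with_copy_to below is hypothetical and greatly simplified (no analysis, no nested paths); it only shows that the copied values land in the indexed data of the target field while the stored source is left untouched.

```python
def index_with_copy_to(properties, source):
    """Return (stored _source, indexed values per field) for one document."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        targets = properties.get(field, {}).get("copy_to", [])
        if isinstance(targets, str):  # copy_to accepts one field name or a list of names
            targets = [targets]
        for target in targets:
            indexed.setdefault(target, []).append(value)
    return dict(source), indexed

properties = {
    "firstName": {"type": "text", "copy_to": "fullName"},
    "lastName": {"type": "text", "copy_to": ["fullName", "allNames"]},
}
stored, indexed = index_with_copy_to(properties, {"firstName": "Li", "lastName": "Sunke"})
print(stored)
print(indexed)
```

Here fullName and allNames exist only on the indexed side, which matches the observation that the target field does not appear in _source.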

Analyzers: analyzer and search_analyzer

PUT /my_index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "english",
            "search_analyzer": "english"
          }
        }
      }
    }
  }
}

# Use _analyze to test the analyzers
GET my_index/_analyze
{"field": "text", "text": "The quick Brown Foxes."}

GET my_index/_analyze
{"field": "text.english", "text": "The quick Brown Foxes."}

Ways to build a Mapping

We know that ES can generate a Mapping automatically from the documents we insert, but this can cause problems. For example, a generated field type may be incorrect, or the additional properties of a field may not meet our needs. An explicit Mapping solves this. There are two ways to write one:

1. Write the Mapping entirely by hand, following the official API reference.

2. Start from a dynamically generated Mapping: create a temporary index; write some sample data; fetch the dynamic Mapping of the temporary index through the Mapping API; adjust that definition and use it to create the real index; delete the temporary index.

The second way is recommended, because it is less error-prone and more efficient.
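
The "adjust that definition" step of the second way can be sketched as a pure Python function. It assumes the 7.x shape of a GET <index>/_mapping response, where the index name maps to an object with a "mappings" key; the function name explicit_index_body, the index name tmp_users, and the sample fields are all made up for illustration.

```python
import copy

def explicit_index_body(get_mapping_response, temp_index, overrides):
    """Turn a dynamic mapping into the request body for creating the real index.

    get_mapping_response: parsed JSON returned by GET <temp_index>/_mapping
    overrides: field name -> corrected field definition
    """
    mappings = copy.deepcopy(get_mapping_response[temp_index]["mappings"])
    for field, definition in overrides.items():
        mappings["properties"][field] = definition
    return {"mappings": mappings}

# Example response for a temporary index that received one sample document.
response = {
    "tmp_users": {
        "mappings": {
            "properties": {
                "mobile": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "age": {"type": "long"},
            }
        }
    }
}

body = explicit_index_body(response, "tmp_users", {"mobile": {"type": "keyword"}})
print(body)
```

The resulting body would then be sent with a PUT request to create the real index, after which the temporary index can be deleted.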

Automatic type recognition

ES recognizes field types automatically from the JSON value. If a JSON string matches a date format, ES maps the field as date. If a string contains a number, ES treats it as a string by default, although it can be mapped to a numeric type through a setting (numeric_detection). If the value is ordinary text, ES maps it as text and automatically adds a keyword sub-field. Some of the recognition rules are illustrated in the demo below.
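
As a rough illustration of these rules, the following Python function guesses the type that dynamic Mapping would assign with default settings. It is a simplified sketch: infer_type is a hypothetical name, and the date check is a regular expression that only approximates ISO 8601 strings rather than ES's real date detection formats.

```python
import re

# Approximation of an ISO 8601 date or date-time string (assumption for this sketch).
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)

def infer_type(value):
    """Guess the mapped type for one JSON value; None means no field is added."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):  # arrays take the type of their first non-null element
        first = next((item for item in value if item is not None), None)
        return infer_type(first) if first is not None else None
    return None

sample = {
    "firstName": "Chan",
    "loginDate": "2018-07-24T10:29:48.103Z",
    "uid": "123",
    "isVip": False,
    "isAdmin": "true",
    "age": 19,
}
for field, value in sample.items():
    print(field, infer_type(value))
```

Note that the strings "123" and "true" are still mapped as text, because the JSON value is a string.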

Demo:

# Write a document and view the Mapping
# Inferred types: firstName -> text, lastName -> text, loginDate -> date
PUT mapping_test/_doc/1
{
  "firstName": "Chan",
  "lastName": "Jackie",
  "loginDate": "2018-07-24T10:29:48.103Z"
}

# Dynamic Mapping infers the field types
# Inferred types: uid -> text, isVip -> boolean, isAdmin -> text, age -> long, heigh -> long
PUT mapping_test/_doc/1
{
  "uid": "123",
  "isVip": false,
  "isAdmin": "true",
  "age": 19,
  "heigh": 180
}

# View the dynamic Mapping
GET mapping_test/_mapping

Mapping parameters

Options available in a field definition within mappings:

"field": {
  "type": "text", // field type, here text
  "index": false, // when set to false, the field is not indexed
  "analyzer": "ik", // analyzer used for the field
  "boost": 1.23, // field-level score weighting
  "doc_values": false, // on by default for fields that are not analyzed (such as keyword) and not available for analyzed fields; doc values greatly improve sorting and aggregation performance and save memory. If you are sure you never need to sort or aggregate on the field, or access its value from a script, you can disable doc values to save disk space
  "fielddata": {"loading": "eager"}, // by default Elasticsearch loads fielddata into memory lazily: the first time a field is queried, the inverted index of every segment of that field is fully loaded into memory so that later queries perform better (this object form is the syntax of older versions; in 7.x fielddata is a boolean on text fields)
  "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}, // indexes the same field value in more than one way, for example once analyzed and once not analyzed
  "ignore_above": 100, // strings longer than 100 characters are ignored and not indexed
  "include_in_all": true, // whether the field is included in the _all field; the default is true unless index is set to no (an option of older versions; the _all field no longer exists in 7.x)
  "index_options": "docs", // four values: docs (document number), freqs (document number + term frequency), positions (document number + term frequency + position, usually used for proximity queries), offsets (document number + term frequency + position + offset, usually used for highlighting); analyzed fields default to positions, other fields default to docs
  "norms": {"enable": true, "loading": "lazy"}, // default for analyzed fields; fields that are not analyzed default to {"enable": false}. Norms store the length factor and index-time boost, are recommended for fields that take part in scoring, and cost extra memory (this object form is the syntax of older versions; in 7.x norms is a boolean)
  "null_value": "NULL", // value indexed in place of an explicit null; it must have the same data type as the field
  "position_increment_gap": 0, // affects phrase and proximity queries; can be set on analyzed fields that hold multiple values, and the slop can be specified at query time. The default is 100
  "store": false, // whether the field is stored separately from the _source field. The default is false: the field can be searched, but its value is not stored on its own (it is still available through _source)
  "search_analyzer": "ik", // analyzer used at search time, the same as analyzer by default. For example, index with standard+ngram and search with standard to implement autocomplete suggestions
  "similarity": "BM25", // scoring strategy for the field, only valid for string and analyzed types; older versions defaulted to TF/IDF, and BM25 has been the default since ES 5.0
  "term_vector": "no" // term vectors are not stored by default; supported values are yes (terms), with_positions (terms + positions), with_offsets (terms + offsets), and with_positions_offsets (terms + positions + offsets). They speed up the fast vector highlighter but increase index size, so they are not suitable for large data volumes
}

To sum up:

Parameters related to field data formats and constraints: normalizer, format, ignore_above, ignore_malformed, coerce

Parameters related to indexing: index, dynamic, enabled

Parameters related to storage policy: store, fielddata, doc_values

Analyzer-related parameters: analyzer, search_analyzer

Other parameters: boost, copy_to, null_value

These descriptions are based mainly on the author's understanding and may be inaccurate. The parameters are closely tied to how ES is implemented (for example its storage and index structures), and are best understood gradually through practical use.

Field data type

ES field types are similar to field types in MySQL. The main categories are core types, complex types, geographical types, and special types. The specific data types are described below:

Core type

Core types can be divided into string types, numeric types, date types, Boolean types, the Base64-based binary type, and range types.

String type

There are two string types in ES 7.x: text and keyword. The older string type was deprecated in ES 5.x and is no longer supported in later versions.

The text type is suitable for fields that need full-text search, such as a news body or email content. A text value is split into terms by an analyzer and stored in the Lucene inverted index. Text fields cannot be used for sorting. To use this type, you only need to set the type of the corresponding field to text when defining the mapping.

The keyword type is suitable for short, structured strings, such as hostnames, personal names, and product names, and can be used for filtering, sorting, aggregations, and exact-match queries.

Numeric type

The numeric types are long, integer, short, byte, double, float, half_float, and scaled_float.

For numeric fields, choose the narrowest type that still covers the required range of values. The shorter the field, the more efficient the search. For floating-point numbers, consider scaled_float first: it stores the value as an integer obtained by multiplying by a scaling factor. For example, with a scaling factor of 100, 12.34 is stored as 1234.
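
The arithmetic behind scaled_float can be sketched in Python. This is only an illustration of the idea: the helper names to_scaled and from_scaled are made up, and the exact rounding ES applies in edge cases is not modeled here.

```python
def to_scaled(value, scaling_factor):
    """Store a floating-point value as an integer: value * scaling_factor, rounded."""
    return round(value * scaling_factor)

def from_scaled(stored, scaling_factor):
    """Recover the (possibly rounded) floating-point value from the stored integer."""
    return stored / scaling_factor

stored = to_scaled(12.34, 100)
print(stored)                    # 1234
print(from_scaled(stored, 100))  # 12.34

# Digits beyond the scaling factor's precision are lost.
print(from_scaled(to_scaled(12.348, 100), 100))  # 12.35
```

A scaling factor of 100 therefore keeps two decimal places, which is usually enough for values such as prices.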

Date type

In ES, a date can take the following forms:

Formatted date strings, such as 2020-03-17 00:00 or 2020-03-17

Timestamps (the difference from 1970-01-01 00:00:00 UTC), in milliseconds or seconds

Even if a date is given as a formatted string, ES still stores it internally as a timestamp.
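
The relationship between the two forms can be shown with Python's standard library. This is a sketch under the assumption that a date string without a time zone is interpreted as UTC; to_epoch_millis and its format strings are illustrative and unrelated to ES's own format setting.

```python
from datetime import datetime, timezone

def to_epoch_millis(date_string, fmt="%Y-%m-%d %H:%M"):
    """Convert a formatted date string (assumed to be UTC) to epoch milliseconds."""
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)

print(to_epoch_millis("2020-03-17 00:00"))        # 1584403200000
print(to_epoch_millis("2020-03-17", "%Y-%m-%d"))  # 1584403200000
```

Both strings refer to the same instant, so they produce the same timestamp.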

Boolean type

Boolean values exist natively in JSON documents, but when a field is mapped as boolean, ES can also convert the JSON strings "true" and "false" to Boolean values. Boolean fields are often used as filter conditions in searches.

Binary type

The binary type accepts a Base64-encoded string. By default the field is not stored (store is false), and it cannot be searched.

Range type

Range types express an interval of values. They include integer_range, float_range, long_range, double_range, and date_range (ES 7.x also provides ip_range).

Complex type

Complex types are mainly the object type (object) and the nested type (nested):

Object type

A JSON document can contain nested objects, and a document may nest objects several levels deep. Such inner objects can be stored with the object type. However, because Lucene has no concept of inner objects, ES flattens the original JSON document. For example, the document:

{"name": {"first": "wu", "last": "px"}}

ES actually converts it to the following format and stores it through Lucene, even though name is of type object:

{"name.first": "wu", "name.last": "px"}

Nested type

The nested type can be thought of as a special object type that allows the objects in an array to be queried independently of each other. Take the following document:

{"group": "users", "username": [{"first": "wu", "last": "px"}, {"first": "hu", "last": "xy"}, {"first": "wu", "last": "mx"}]}

The username field is a JSON array, and each element of the array is a JSON object. If username is mapped as the object type, ES converts the document to:

{"group": "users", "username.first": ["wu", "hu", "wu"], "username.last": ["px", "xy", "mx"]}

You can see that the association between first and last within each object is lost in the converted document. If you search for a document whose first is wu and whose last is xy, the document above is returned, even though wu and xy do not belong to the same JSON object in the original document. They should not match, and the search should return no results.

The nested type solves this problem. It stores each JSON object in the array as an independent hidden document, and each nested object can be searched independently. So although there appears to be only one document in the example above, four documents are actually stored.
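
The difference between the two types can be reproduced with a short Python model. It is a simplified sketch rather than ES's real implementation: flatten, object_match, and nested_match are hypothetical helpers that only handle exact-value conditions.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into dotted field names, as the object type does."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        elif isinstance(value, list) and all(isinstance(item, dict) for item in value):
            for item in value:  # array of objects: values are merged per field
                for sub_key, sub_value in flatten(item, path + ".").items():
                    flat.setdefault(sub_key, []).append(sub_value)
        else:
            flat[path] = value
    return flat

def object_match(doc, field, conditions):
    """Object type: each condition is checked against the merged value lists."""
    flat = flatten(doc)
    for key, expected in conditions.items():
        values = flat.get(f"{field}.{key}", [])
        if not isinstance(values, list):
            values = [values]
        if expected not in values:
            return False
    return True

def nested_match(doc, field, conditions):
    """Nested type: all conditions must hold within the same array element."""
    return any(
        all(item.get(key) == expected for key, expected in conditions.items())
        for item in doc[field]
    )

doc = {
    "group": "users",
    "username": [
        {"first": "wu", "last": "px"},
        {"first": "hu", "last": "xy"},
        {"first": "wu", "last": "mx"},
    ],
}
query = {"first": "wu", "last": "xy"}
print(flatten(doc))
print(object_match(doc, "username", query))  # True: a false positive
print(nested_match(doc, "username", query))  # False: no single object has both values
```

The object-style check matches because wu and xy both occur somewhere in the merged lists, while the nested-style check evaluates each array element on its own.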

Geographical type

Geographical type fields are divided into two types: latitude and longitude type and geographic area type:

Latitude and longitude type

A latitude and longitude field (geo_point) stores a longitude/latitude pair. With this type you can find documents within a given geographic area, sort results by distance, adjust scoring according to geographic location, and so on.

Geographical area type

The geo_point type expresses a single point, while the geo_shape type expresses a geographic area. The shape can be an arbitrary polygon, or a point, line, polygon, multi-point, multi-line, multi-polygon, or other geometry type.

Special type

Special types include the IP type, filter type, Join type, alias type, and so on. Here we briefly introduce the IP type and the Join type. For the other special types, see the official documentation.

IP type

Fields of type IP can store IPv4 or IPv6 addresses. To store such a field, you need to define the mapping manually:

{
  "mappings": {
    "properties": {
      "my_ip": {"type": "ip"}
    }
  }
}

Join type

The Join type was introduced in ES 6.x to replace the obsolete _parent meta-field. It implements one-to-one and one-to-many relationships between documents, mainly for parent-child queries.

The Mapping of type Join is as follows:

PUT my_join_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {"question": "answer"}
      }
    }
  }
}

Here my_join_field is the name of the Join field, and relations specifies the relationship: question is the parent of answer.

For example, define a parent document with an ID of 1:

PUT my_join_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}

Next, define a child document that specifies that the parent document ID is 1:

PUT my_join_index/_doc/2?routing=1&refresh
{
  "text": "This is an answer",
  "my_join_field": {"name": "answer", "parent": "1"}
}

Join reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html

After reading the above, you should have a better understanding of the process of integrating Springboot2.x with ElasticSearch7.x.
