2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces Elasticsearch 5.x data types and mapping: which field types are available, which mapping parameters matter most, and how to update a mapping after an index has been created.
In the previous article, we created an index named bank and loaded 1000 documents into it without declaring a data type for any of their fields. Elasticsearch allows this because it maps data types automatically. In a real project, however, you should define the mapping first and index documents afterwards, because business requirements determine how each field must be handled: whether it needs full-text indexing, whether it should be analyzed, and which analyzer to use.
Let's look at the mapping that Elasticsearch created automatically for the bank index:
{
  "bank": {
    "mappings": {
      "account": {
        "properties": {
          "account_number": {"type": "long"},
          "address": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "age": {"type": "long"},
          "balance": {"type": "long"},
          "city": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "email": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "employer": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "firstname": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "gender": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "lastname": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "state": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
        }
      }
    }
  }
}
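A mapping response like this is easier to scan once it is flattened to a field-to-type list. The following is a minimal sketch, assuming the response has already been parsed into a Python dict; the helper name list_field_types and the shortened sample are illustrative and not part of Elasticsearch.

```python
def list_field_types(mapping):
    """Flatten a 5.x-style mapping response into {field_name: type}."""
    result = {}
    for index_body in mapping.values():
        for type_body in index_body.get("mappings", {}).values():
            for field, spec in type_body.get("properties", {}).items():
                # Object fields carry "properties" and no explicit "type".
                result[field] = spec.get("type", "object")
                # Multi-fields such as city.keyword live under "fields".
                for sub, sub_spec in spec.get("fields", {}).items():
                    result[f"{field}.{sub}"] = sub_spec.get("type")
    return result


# Shortened sample in the same shape as the bank mapping.
sample = {"bank": {"mappings": {"account": {"properties": {
    "age": {"type": "long"},
    "city": {"type": "text",
             "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
}}}}}

for name, kind in list_field_types(sample).items():
    print(name, kind)
```

Run against the full bank mapping, this would show that every string field was mapped twice: once as text and once as a keyword sub-field.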
In the mapping above, the data type of each field is given by type. Next, we will get to know these data types.
I. Data types
Overview of field types
First-level classification | Second-level classification | Specific types
Core types | String | text, keyword
Core types | Integer | integer, long, short, byte
Core types | Floating point | double, float, half_float, scaled_float
Core types | Boolean | boolean
Core types | Date | date
Core types | Range | range
Core types | Binary | binary
Compound types | Array | array
Compound types | Object | object
Compound types | Nested | nested
Geographic types | Geographic coordinates | geo_point
Geographic types | Geographic shape | geo_shape
Special types | IP | ip
Special types | Completion (suggester) | completion
Special types | Token count | token_count
Special types | Attachment | attachment
Special types | Percolator | percolator
1. String types
text: use the text type when a field must be full-text searchable, such as the body of an email or a product description. A text field is analyzed: the analyzer splits the string into individual terms before the inverted index is built. text fields are not used for sorting and are rarely used for aggregations.
keyword: the keyword type suits structured content such as email addresses, hostnames, status codes, and tags. Use it when a field needs to be filtered on (for example, finding blog posts whose status is published), sorted, or aggregated. A keyword field can only be searched by its exact value.
2. Integer types
Type | Value range
byte | -128 to 127
short | -32768 to 32767
integer | -2^31 to 2^31-1
long | -2^63 to 2^63-1
Choose the smallest type whose range meets your requirements. For example, if a field's maximum value will never exceed 100, byte is enough. The oldest verified human age on record is 122, so short is more than sufficient for an age field. The smaller the type, the more efficient indexing and searching become.
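The rule of picking the smallest integer type that fits can be written down directly. This is a minimal sketch in Python; the helper smallest_integer_type is a hypothetical name, and the ranges are the standard signed 8, 16, 32 and 64-bit ranges.

```python
# Signed ranges of the Elasticsearch integer types, smallest first.
INTEGER_RANGES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the first type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_RANGES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit long")


print(smallest_integer_type(0, 100))      # a field capped at 100
print(smallest_integer_type(0, 40000))    # too large for short
```

In practice it is worth leaving some headroom: a field that fits byte today but may grow is safer as short.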
3. Floating-point types
Type | Description
double | 64-bit double-precision IEEE 754 floating point
float | 32-bit single-precision IEEE 754 floating point
half_float | 16-bit half-precision IEEE 754 floating point
scaled_float | floating-point number stored as a long, scaled by a fixed factor
For double, float, and half_float, -0.0 and +0.0 are different values: a term query for -0.0 does not match +0.0, and vice versa. The same holds for range queries: if the upper bound is -0.0, then +0.0 does not match, and if the lower bound is +0.0, then -0.0 does not match.
scaled_float is useful when a fixed precision is enough. A price, for example, only needs to be accurate to the cent: with a scaling factor of 100, a price of 57.34 is stored as 5734.
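The arithmetic behind scaled_float is simply multiply, round, and later divide. This is a minimal sketch of that idea in Python, not Elasticsearch's actual implementation; to_scaled and from_scaled are hypothetical helper names.

```python
def to_scaled(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer."""
    return round(value * scaling_factor)


def from_scaled(stored, scaling_factor):
    """Recover the (possibly rounded) floating-point value."""
    return stored / scaling_factor


stored = to_scaled(57.34, 100)
print(stored, from_scaled(stored, 100))

# Digits beyond the chosen precision are lost: 57.349 is stored as 5735.
print(to_scaled(57.349, 100))
```

The trade-off is visible in the last line: anything finer than 1/scaling_factor is rounded away, which is acceptable for prices but not for values that need full floating-point precision.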
Where a fixed precision is acceptable, prefer scaled_float with a scaling factor over the other floating-point types.
4. Date type
A date value can be represented in any of the following ways:
(1) A formatted date string, such as "2018-01-13" or "2018-01-13 12:10:30"
(2) A long holding milliseconds-since-the-epoch (the epoch is 00:00:00 UTC on January 1, 1970)
(3) An integer holding seconds-since-the-epoch
Which representations a field accepts depends on the format parameter in its mapping.
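The three representations describe the same instant. The sketch below derives each of them in Python and shows a date field mapping, written as a Python dict, whose format parameter lists several accepted formats separated by ||. The variable names are illustrative; accepting seconds-since-the-epoch would additionally require the built-in epoch_second format.

```python
from datetime import datetime, timezone

# One instant, expressed in UTC.
moment = datetime(2018, 1, 13, 12, 10, 30, tzinfo=timezone.utc)

as_string = moment.strftime("%Y-%m-%d %H:%M:%S")   # formatted date string
as_epoch_seconds = int(moment.timestamp())         # seconds-since-the-epoch
as_epoch_millis = as_epoch_seconds * 1000          # milliseconds-since-the-epoch

# Field mapping that accepts the string forms and epoch milliseconds.
date_mapping = {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
}

print(as_string, as_epoch_seconds, as_epoch_millis)
```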
5. Boolean type
Accepts true and false.
6. Binary type
A binary field holds binary data, such as an image, encoded as a Base64 string. By default, fields of this type are not searchable. In 5.x the binary type supports only a few mapping parameters (doc_values and store); the index_name parameter mentioned in older material is not available.
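Preparing a value for a binary field is a plain Base64 encoding step. This is a minimal sketch in Python; the field name thumbnail and the sample bytes are assumptions made for illustration.

```python
import base64

# First bytes of a PNG file, standing in for real image data.
raw = b"\x89PNG\r\n\x1a\n"

# b64encode returns bytes without embedded newlines; decode to str for JSON.
encoded = base64.b64encode(raw).decode("ascii")

# "thumbnail" is a hypothetical binary field name.
document = {"thumbnail": encoded}

# Reading the document back: decode the string to recover the original bytes.
restored = base64.b64decode(document["thumbnail"])
print(restored == raw)
```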
7. Array type
(1) String array: ["one", "two"]
(2) Integer array: "productid": [1, 2]
(3) Array of objects (documents): "user": [{"name": "Mary", "age": 12}, {"name": "John", "age": 10}]
Note: Elasticsearch does not support arrays whose elements have different data types, such as [10, "some string"]
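A client can catch mixed-type arrays before sending a document. The check below is a deliberately strict sketch in Python: has_single_type is a hypothetical helper, and Elasticsearch's own coercion rules are more lenient in some cases (for example, integers and floats in the same numeric field).

```python
def has_single_type(values):
    """True when all non-null elements share one Python type."""
    kinds = {type(v) for v in values if v is not None}
    return len(kinds) <= 1


print(has_single_type(["one", "two"]))
print(has_single_type([1, 2]))
print(has_single_type([10, "some string"]))
```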
8. Object type
A JSON object. Because JSON is hierarchical, a document can contain nested inner objects.
9. IP type
Fields of type ip store IPv4 or IPv6 addresses.
II. Mapping parameters
Some important mapping parameters are explained below.
index
This parameter controls whether a field can be queried, that is, whether the field is put into the inverted index.
If index is true (the default), the field is put into the inverted index. A text field is analyzed first and its terms are indexed; a keyword, integer, or similar field is indexed as one whole value.
If index is false, the field is not put into the inverted index, so it cannot be queried.
A field with index set to false can be thought of as an attached field: it cannot be queried with match or term, but when the document is found through other search criteria, the field is returned along with it because it belongs to the same document.
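The behaviour of index can be illustrated with a toy model. The class below is a conceptual sketch in Python, not how Elasticsearch is implemented: ToyIndex and the field names uid and name are illustrative. Only fields listed as indexed get entries in the inverted index, while the full source document is always kept and returned.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model: an inverted index plus the stored source documents."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.docs = {}                      # doc id -> source document
        self.inverted = defaultdict(set)    # (field, value) -> doc ids

    def add(self, doc_id, source):
        self.docs[doc_id] = source
        for field, value in source.items():
            if field in self.indexed_fields:    # index: true
                self.inverted[(field, value)].add(doc_id)

    def term(self, field, value):
        if field not in self.indexed_fields:    # index: false
            raise ValueError(
                f"Cannot search on field [{field}] since it is not indexed."
            )
        ids = sorted(self.inverted.get((field, value), ()))
        return [self.docs[i] for i in ids]


idx = ToyIndex(indexed_fields=["uid"])
idx.add(1, {"uid": 1, "name": "hugo"})

# The non-indexed field comes back with the document found via uid.
print(idx.term("uid", 1))
```

Calling idx.term("name", "hugo") on this toy index raises an error, mirroring the fact that a field outside the inverted index cannot be searched.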
For example, define the mapping of a user index:
PUT /user
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {"type": "keyword", "index": false},
        "uid": {"type": "integer"},
        "nickname": {"type": "text", "analyzer": "standard"}
      }
    }
  }
}
Searching on uid returns name as part of the matching document:
GET /user/_search
{"query": {"term": {"uid": 1}}}
Searching on the name field itself reports an error:
GET /user/_search
{"query": {"term": {"name": "hugo"}}}
The error message is as follows:
"caused_by": {"type": "illegal_argument_exception", "reason": "Cannot search on field [name] since it is not indexed."}
analyzer
This parameter is mainly used on text fields. It sets which analyzer is used to build the index.
You can use a built-in analyzer or a custom one.
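To give a feel for what an analyzer produces, the sketch below lowercases a string and splits it into word tokens. It is only a rough Python approximation of the built-in standard analyzer, which follows Unicode text segmentation rules and handles punctuation and non-Latin scripts differently; rough_standard_analyze is a hypothetical name.

```python
import re


def rough_standard_analyze(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


print(rough_standard_analyze("Text to analyze"))  # ['text', 'to', 'analyze']
```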
You can use the /_analyze API to see how a sentence is split into terms, which helps in understanding what goes on inside an Elasticsearch index.
GET 127.0.0.1:9200/_analyze
{"analyzer": "standard", "text": "Text to analyze"}
boost
Official advice: index-time boost is deprecated. Instead, the field mapping boost is applied at query time.
In other words, the official recommendation is to specify boost when querying.
A boost value controls the relative weight of a query clause and defaults to 1. A boost greater than 1 increases the clause's relative weight, and a boost between 0 and 1 decreases it. The effect is not linear: setting boost to 2 does not double the final _score.
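A query-time boost is just one more key in the query body. This is a minimal sketch in Python that builds such a body; match_with_boost is a hypothetical helper, and sending the body to a cluster is left out.

```python
import json


def match_with_boost(field, text, boost=1.0):
    """Build a match query body with a query-time boost on one field."""
    return {"query": {"match": {field: {"query": text, "boost": boost}}}}


body = match_with_boost("address", "mill", boost=2)
print(json.dumps(body))
```

Keeping boost in the query rather than in the mapping means it can be tuned per request without reindexing.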
POST /bank/_search?pretty
{"query": {"match": {"address": {"query": "mill", "boost": 2}}}}
Query result
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 4,
    "max_score": 8.620199,
    "hits": [
      {
        "_index": "bank", "_type": "account", "_id": "472", "_score": 8.620199,
        "_source": {"account_number": 472, "balance": 25571, "firstname": "Lee", "lastname": "Long", "age": 32, "gender": "F", "address": "288 Mill Street", "employer": "Comverges", "email": "leelong@comverges.com", "city": "Movico", "state": "MT"}
      },
      {
        "_index": "bank", "_type": "account", "_id": "136", "_score": 8.532413,
        "_source": {"account_number": 136, "balance": 45801, "firstname": "Winnie", "lastname": "Holland", "age": 38, "gender": "M", "address": "198 Mill Lane", "employer": "Neteria", "email": "winnieholland@neteria.com", "city": "Urie", "state": "IL"}
      },
      {
        "_index": "bank", "_type": "account", "_id": "970", "_score": 7.723722,
        "_source": {"account_number": 970, "balance": 19648, "firstname": "Forbes", "lastname": "Wallace", "age": 28, "gender": "M", "address": "990 Mill Road", "employer": "Pheast", "email": "forbeswallace@pheast.com", "city": "Lopezo", "state": "AK"}
      },
      {
        "_index": "bank", "_type": "account", "_id": "345", "_score": 7.723722,
        "_source": {"account_number": 345, "balance": 9812, "firstname": "Parker", "lastname": "Hines", "age": 38, "gender": "M", "address": "715 Mill Avenue", "employer": "Baluba", "email": "parkerhines@baluba.com", "city": "Blackgum", "state": "KY"}
      }
    ]
  }
}
III. Updating the mapping
You can specify a type mapping when you first create an index. If you later want to add a new field, you can use /_mapping to add it to the existing mapping.
New fields can be added, but the mapping of an existing field cannot be modified, because documents may already have been indexed with it. Changing a field's type could make the data already in the index invalid, so fields can only be added, not changed.
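The add-only rule can be expressed as a small merge function. This is a conceptual sketch in Python of the rule as described here, not Elasticsearch's own merge logic; add_fields and the sample properties are illustrative.

```python
def add_fields(existing_properties, new_properties):
    """Merge new fields into a mapping; refuse to change existing ones."""
    merged = dict(existing_properties)
    for field, spec in new_properties.items():
        if field in merged and merged[field] != spec:
            raise ValueError(
                f"cannot change mapping of existing field [{field}]"
            )
        merged[field] = spec
    return merged


existing = {"uid": {"type": "integer"}}

# Adding a new field is allowed.
updated = add_fields(existing, {"tag": {"type": "keyword"}})
print(sorted(updated))

# Changing the type of uid is rejected.
try:
    add_fields(existing, {"uid": {"type": "keyword"}})
except ValueError as exc:
    print(exc)
```

When a field's type really does have to change, the usual route is to create a new index with the desired mapping and reindex the data into it.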
A concrete example
Add a new keyword field named tag to the doc type of the user index:
PUT /user/_mapping/doc
{"properties": {"tag": {"type": "keyword"}}}
That concludes this introduction to Elasticsearch 5.x data types and mapping. Theory is easier to absorb alongside practice, so try these requests yourself.