Programming Essays: Elasticsearch Knowledge Map (3): Mapping

Updated: 2025-03-28, from SLTechnology News&Howtos

1. What is mapping?

A mapping in ES is essentially the definition of a document's object structure, that is, a description of the elements in the document. Defining a mapping in ES is like defining the XML Schema of an XML document.

Mappings in ES define document schemas (just as relational schemas are defined in relational databases), and a document schema determines the format, structure, and field data types of the documents stored in ES. By looking at the mapping of an index you can understand the structure of its documents, and then use the query language (Query DSL) to build queries that better meet your requirements.

2. Start with an example

Let's first look at the following example of a document about a bank account:

{
  "account_number": 1,
  "balance": 39225,
  "firstname": "Amber",
  "lastname": "Duke",
  "age": 32,
  "gender": "M",
  "address": "880 Holmes Lane",
  "employer": "Pyrami",
  "email": "amberduke@pyrami.com",
  "city": "Brogan",
  "state": "IL"
}

The mapping that ES automatically generates for this document looks like this:

{
  "bank": {
    "mappings": {
      "account": {
        "properties": {
          "account_number": {"type": "long"},
          "address": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "age": {"type": "long"},
          "balance": {"type": "long"},
          "city": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "email": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "employer": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "firstname": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "gender": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "lastname": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
          "state": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
        }
      }
    }
  }
}

As this automatically generated mapping shows, ES maps the account_number, balance, and age attributes to the long type, and the other attributes to the text type. Attributes of the text type are analyzed and used for full-text search, but their field data is not loaded into memory by default, so by default a text field cannot be used for aggregation or sorting (the system reports the error: "Fielddata is disabled on text fields by default. Set fielddata=true on [address] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.").
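The inference rules visible in the generated mapping above can be approximated in a few lines of Python. This is only a sketch of the default behaviour: the function names are made up for illustration, and details such as date detection in strings are deliberately left out.

```python
import json

def infer_field_mapping(value):
    """Guess the mapping of one JSON value, imitating the defaults seen above."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become text fields with a keyword sub-field (a multi-field).
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    raise TypeError(f"unsupported value in this sketch: {value!r}")

def infer_properties(document):
    """Infer a 'properties' section for a whole document."""
    return {name: infer_field_mapping(value) for name, value in document.items()}

account = {"account_number": 1, "balance": 39225, "address": "880 Holmes Lane"}
mapping = infer_properties(account)
print(json.dumps(mapping, indent=2))
```

Running it on a few attributes of the bank document yields long for the numbers and text plus a keyword sub-field for the address, matching the mapping shown above.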

ES allows you to define multiple fields (fields) for an object attribute, each of which is a facet of the attribute (I thought about it for a long time, and I still think this word is the most appropriate). For example, the "address" attribute has the type text and also defines a field named keyword of type "keyword", which is not processed by the analyzer but can be used for sorting, aggregation, and exact lookup (note the ignore_above parameter, which limits the number of characters considered valid for the keyword: longer strings are not indexed in that field).

When querying with the Query DSL, a query on "address" is first processed by the analyzer, so "880 Holmes Lane" may be decomposed into the terms "880", "holmes" and "lane" for full-text search. Look at the following two query commands:

curl -iXGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match": {"address.keyword": "880 Holmes Lane"}}}'

This query returns only one result, which exactly matches "880 Holmes Lane".

curl -iXGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match": {"address": "880 Holmes Lane"}}}'

This query returns multiple results: the query condition "880 Holmes Lane" is analyzed before retrieval, so "591 Nolans Lane" is also retrieved (it contains the term "lane" produced by the analyzer).
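The difference between the two queries can be imitated without a running cluster. The sketch below is a rough stand-in, not ES itself: analyze() only lowercases and splits on non-alphanumeric characters, which approximates but does not reproduce the standard analyzer, and the third address is just an illustrative non-matching value.

```python
import re

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def match_text(field_value, query):
    """match on a text field: a hit if any query term is among the field's terms."""
    return bool(set(analyze(field_value)) & set(analyze(query)))

def match_keyword(field_value, query):
    """match on a keyword field: the whole value is one unanalyzed term."""
    return field_value == query

addresses = ["880 Holmes Lane", "591 Nolans Lane", "282 Kings Place"]
query = "880 Holmes Lane"

text_hits = [a for a in addresses if match_text(a, query)]
keyword_hits = [a for a in addresses if match_keyword(a, query)]

print(text_hits)     # ['880 Holmes Lane', '591 Nolans Lane']
print(keyword_hits)  # ['880 Holmes Lane']
```

The shared term "lane" is enough for the text match to return the second address, while the keyword comparison only accepts the exact string.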

The original article summarizes the relevant knowledge points with a picture (the picture is not reproduced here).

3. Field data types

The main field data types in ES are shown in the following table. They can also be extended by plug-ins (not discussed here):

Data types | Classification
text, keyword | string
long, integer, short, byte, double, float, half_float, scaled_float | numeric
date | date
boolean | Boolean
binary | binary
integer_range, float_range, long_range, double_range, date_range | range types
array, object, nested | complex data types
geo_point, geo_shape | geographic data types
ip, completion, token_count, percolator, join, alias | special data types

3.1. Core data types

The core data types are similar to those in the strongly typed languages we commonly use, and can be divided into the following categories:

String types: text and keyword.

Numeric types: long, integer, short, byte, double, float, half_float, and scaled_float. I'm being lazy here: for the details of each numeric type, see the description in reference 1.

Date type: date. A date can be expressed as a formatted string, such as "2015-01-01" or "2015/01/01 12:10:30", or as an integer number of seconds or milliseconds.

Boolean type: boolean. The true value can be expressed as true (the Boolean value in JSON) or as the string "true".

Binary type: binary. The value is represented by a Base64-encoded string.

Range types: integer_range, float_range, long_range, double_range, and date_range.

An example of a numeric range type:

// the type is integer_range
"expected_attendees": {"gte": 10, "lte": 20}

An example of a date range type:

// the type is date_range
"time_frame": {"gte": "2015-10-31 12:00:00", "lte": "2015-11-01"}

3.2. Complex data types

Complex data types can be used to express the semantics between objects, and include array, object, nested, and so on.

Array type: ES has no dedicated "array" type to define explicitly. Any field can contain zero or more values, that is, any field can be an array; but unlike an array in JSON, all elements of an array in ES must be of the same type.

Object type: a JSON object can have a nested structure, which real applications often use to express hierarchical relationships between objects.

Let's look at the following mapping definition:

"manager": {
  "properties": {
    "age": {"type": "integer"},
    "name": {
      "properties": {
        "first": {"type": "text"},
        "last": {"type": "text"}
      }
    }
  }
}

The corresponding object is:

"manager": {"age": 30, "name": {"first": "John", "last": "Smith"}}

Here the field manager is of the object type, and name is its child object. For object types, "type" defaults to "object", so there is no need to define "type" explicitly.

When indexing objects like the one above, ES converts them to flat keys such as "manager.age" and "manager.name.first", so you can also use such flat keys as field names when querying.
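The flattening into dotted keys is easy to picture with a small recursive function. This is a sketch of the idea only; flatten() is an illustrative helper, not part of any ES client.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. manager.name.first."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into child objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

document = {"manager": {"age": 30, "name": {"first": "John", "last": "Smith"}}}
flat = flatten(document)
print(flat)
# {'manager.age': 30, 'manager.name.first': 'John', 'manager.name.last': 'Smith'}
```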

Nested type: an extension of the object type, mainly used for arrays of objects. We again use the example from reference 1.

For the object:

"user": [{"first": "John", "last": "Smith"}, {"first": "Alice", "last": "White"}]

With dynamic mapping, ES indexes it as follows:

"user.first": ["alice", "john"], "user.last": ["smith", "white"]

This indexed form loses the association between "first" and "last" within each object, so a query may match combinations that do not exist in any single object.

If you map user as follows:

"user": {"type": "nested"}

ES preserves the association between the fields of each object when indexing, and finds the correct object when querying.
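The loss of association can be reproduced in plain Python. The following sketch models the two indexing strategies with ordinary data structures; the function names are illustrative, and exact lowercase comparison stands in for the real analysis and matching done by ES.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]

def flatten_object_array(objects):
    """Object-type indexing: pool each field's values across all elements."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, set()).add(value.lower())
    return flat

def matches_flattened(objects, first, last):
    """Both terms only need to exist somewhere in the pooled values."""
    flat = flatten_object_array(objects)
    return first.lower() in flat["first"] and last.lower() in flat["last"]

def matches_nested(objects, first, last):
    """Nested-type indexing: both terms must occur in the same element."""
    return any(
        obj["first"].lower() == first.lower() and obj["last"].lower() == last.lower()
        for obj in objects
    )

print(matches_flattened(users, "Alice", "Smith"))  # True (a false positive)
print(matches_nested(users, "Alice", "Smith"))     # False
print(matches_nested(users, "Alice", "White"))     # True
```

The flattened form reports a hit for "Alice Smith" even though no such user exists, which is exactly the problem the nested type avoids.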

With this mapping, the following query finds no hits (there is no such object as "Alice Smith"):

{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            {"match": {"user.first": "Alice"}},
            {"match": {"user.last": "Smith"}}
          ]
        }
      }
    }
  }
}

3.3. Geographic data types

Geographic data types can be used in LBS (location-based service) applications, and include:

geo_point type: stores the latitude and longitude of a geographic coordinate. For example:

// location is of type geo_point
"location": {"lat": 41.12, "lon": -71.34}

geo_shape type: stores geographic shapes such as polygons; interested readers can refer to reference 1.

3.4. Special data types

Special data types include:

ip type: represents IPv4 and IPv6 addresses.

completion type: provides auto-complete suggestions as the user types, like the familiar Baidu search suggestion box.

token_count type: counts the number of tokens in a string; an "analyzer" must be defined for it.

percolator type: a field defined as the percolator type is parsed by ES as a query and saved, and can later be used to match documents. A percolator can be understood as a preset query.

alias type: defines an alias for an existing field.

join type: defines parent-child relationships between document objects (multiple levels can be defined to form a hierarchical tree). In other words, document objects in the same index can depend on one another, such as blog posts and their replies, or questions and answers, which are common in Internet applications. See the following example from reference 1.

Define the mapping field:

{"my_join_field": {"type": "join", "relations": {"question": "answer"}}}

my_join_field defines the relationship between "question" and "answer" as a parent-child relationship.

Consider a document instance for this mapping, whose path is "my_index/_doc/1":

{"text": "This is a question", "my_join_field": "question"}

An example of a child document of this document is shown below. In my_join_field you need to specify the ID of the parent (here 1, the parent instance above):

{"text": "This is an answer", "my_join_field": {"name": "answer", "parent": "1"}}

Note that a parent document can have multiple child documents, and parent and child documents must be placed on the same shard. Therefore, when submitting parent and child documents to ES, you should use the same routing parameter in the URI.

The join type defines parent-child dependencies between documents, which can be used in query and aggregation operations.
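Why the same routing value keeps parent and child together can be sketched as follows. Everything here is an assumption made for illustration: the shard count, the child document ID "2", the index_request() helper, and above all the hash function (ES uses its own hash; crc32 is only a deterministic stand-in). No request is actually sent.

```python
import zlib

NUMBER_OF_SHARDS = 3  # hypothetical shard count for my_index

def shard_for(routing):
    """Stand-in for shard selection: hash(routing) modulo the shard count.

    ES uses a different hash function; crc32 is used here only because it
    is deterministic and available in the standard library.
    """
    return zlib.crc32(routing.encode("utf-8")) % NUMBER_OF_SHARDS

def index_request(doc_id, body, routing=None):
    """Describe an index request; routing defaults to the document ID."""
    routing = routing if routing is not None else doc_id
    return {
        "path": f"my_index/_doc/{doc_id}",
        "params": {"routing": routing},
        "shard": shard_for(routing),
        "body": body,
    }

parent = index_request("1", {"text": "This is a question", "my_join_field": "question"})
child = index_request(
    "2",  # hypothetical ID of the child document
    {"text": "This is an answer", "my_join_field": {"name": "answer", "parent": "1"}},
    routing="1",  # same routing value as the parent
)

print(parent["shard"] == child["shard"])  # True
```

Because the shard is derived from the routing value alone, passing the parent's ID as the child's routing value places both documents on the same shard regardless of the child's own ID.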

4. Mapping parameters

JSON is a serialized string form of JS objects. ES receives a document object in the form of a JSON string, and what it stores is essentially a JS object. JS defines data types such as object, array, string, number, Boolean, and null.

Field data types in ES can be regarded as extensions of the JS object data types; for example, join and the range types are represented as JS objects.

For defining field mappings, ES provides a set of mapping parameters, which are briefly listed and described here. For more information, see reference 1.

Parameter | Description
analyzer | Defines the analyzer for text data.
normalizer | Normalizes text data.
boost | Boosts the weight of the field in searches.
coerce | When false, input values must strictly conform to the mapped field data type.
copy_to | Copies the value of the current field to another field.
doc_values | When the field does not participate in sorting or aggregation, this can be set to false so that doc values (document values stored in columns) are not written to disk, saving disk space. Default: true.
dynamic | Controls whether new fields detected in an object (not defined in the mapping) are added to the mapping; when false or strict, new fields are not added. Default: true.
enabled | Applies mainly to fields of type object; when false, the field is not indexed. Default: true.
fielddata | For fields of type text; when true, the field's data is loaded into memory the first time it is used. Default: false.
format | Defines the format of the field data, for example for the date type.
ignore_above | Defines the valid length of a string.
ignore_malformed | When true, no error is thrown when a field value is inconsistent with the mapping definition. Default: false.
index | Default: true. When false, the field is not indexed and is not searchable.
null_value | Defines the value used when the field is null, such as the string "NULL".
search_analyzer | Defines the analyzer used at search time, which can differ from the analyzer used at index time.
store | When true, the original value of the field is also stored (outside _source). Default: false.

In summary:

Parameters related to field data formats and constraints: normalizer, format, ignore_above, ignore_malformed, coerce

Index-related parameters: index, dynamic, enabled

Storage-policy parameters: store, fielddata, doc_values

Analyzer-related parameters: analyzer, search_analyzer

Other parameters: boost, copy_to, null_value
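Two of the constraint parameters, coerce and ignore_above, are easy to misread, so here is a small model of their effect. It is a simplified sketch with illustrative function names: it covers only strings and floats sent to an integer field, and strings sent to a keyword field.

```python
def index_integer(value, coerce=True):
    """Return the integer that would be indexed, or raise if it is rejected."""
    if isinstance(value, int):
        return value
    if not coerce:
        raise ValueError(f"{value!r} is not an integer and coerce is false")
    if isinstance(value, float):
        return int(value)         # the fractional part is truncated
    if isinstance(value, str):
        return int(float(value))  # numeric strings are converted
    raise ValueError(f"cannot coerce {value!r} to an integer")

def index_keyword(value, ignore_above=None):
    """Return the term indexed for a keyword field, or None if it is skipped."""
    if ignore_above is not None and len(value) > ignore_above:
        return None  # too long: the value is not indexed in this field
    return value

print(index_integer("5"))       # 5
print(index_integer(5.7))       # 5
print(index_keyword("IL", 2))   # IL
print(index_keyword("ILL", 2))  # None
```

With coerce=False the same call index_integer("5", coerce=False) raises an error instead of converting, which is the behaviour a strict schema wants.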

My description of these parameters is based mainly on my own understanding and may be inaccurate. In fact, these parameters are closely tied to the implementation mechanisms of ES (such as its storage structure and index structure), and can only be appreciated gradually through practical use.

5. A design example

Designing an index mapping in ES is like designing a relational schema or ER model for a relational database, or an XML Schema for XML: it needs to fully capture the domain knowledge and satisfy the constraints among the data.

In this section we explore an example of using ES to build a video image information database.

5.1. Data Model Analysis of GA/T 1400.3

The video image information database (hereinafter referred to as the view library) is defined by the GA/T 1400.3 standard. It stores basic objects such as videos and images (binary data), and the attribute objects derived (possibly automatically) from analyzing these basic objects.

GA/T 1400.4 defines the interfaces for accessing the video image information database. These interfaces are defined in RESTful style over HTTP and transmit data in JSON format. Therefore, using ES as the storage container of the view library can take advantage of ES's JSON document storage and RESTful interface.

GA/T 1400.3 defines the data model of the video image information database. This data model defines more than 30 domain objects, and the objects are related to one another. The object definitions in the view library have the following main characteristics:

Each object in the view library has a unique ID for identification.

There are parent-child (or causal) relationships between objects, expressed by the child object containing the ID of the parent object. For example, a lane object includes the ID of the corresponding tollgate; an alarm object includes the ID of the disposition (surveillance control) object; and person, vehicle, and item objects include the corresponding source image ID and acquisition device ID.

Objects contain nested sub-objects: for example, person, vehicle, and item objects all contain a sub-image list object.

Generally speaking, the relationships between data objects in the view library model are fairly simple, and the objects are relatively independent.

The attributes (fields) of objects in the view library carry one of three constraints: R (required), R/O (conditionally optional; must be present when a condition is met), and O (optional).
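A check of these three constraint levels could be sketched as below. The rule table is purely hypothetical: which fields are R, R/O, or O, and the condition attached to StoragePath, are invented for illustration and are not taken from GA/T 1400.3.

```python
def validate(document, rules):
    """Check presence constraints.

    rules maps a field name to "R", "O", or ("R/O", condition), where
    condition is a function of the document returning True when the
    field becomes required.
    """
    errors = []
    for field, rule in rules.items():
        present = document.get(field) is not None
        if rule == "R" and not present:
            errors.append(f"{field} is required")
        elif isinstance(rule, tuple) and rule[0] == "R/O":
            if rule[1](document) and not present:
                errors.append(f"{field} is required by its condition")
        # "O" fields never produce an error.
    return errors

# Hypothetical rules for a few File object fields.
FILE_RULES = {
    "FileID": "R",
    "FileFormat": "R",
    "Title": "O",
    "StoragePath": ("R/O", lambda doc: doc.get("FileName") is not None),
}

ok = validate({"FileID": "x", "FileFormat": "Jpeg"}, FILE_RULES)
bad = validate({"FileID": "x", "FileName": "a.jpg"}, FILE_RULES)
print(ok)   # []
print(bad)  # ['FileFormat is required', 'StoragePath is required by its condition']
```

Keeping the rules in a table separate from the checking code makes it straightforward to maintain one table per object type.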

Unlike an RDBMS, ES has no option in a mapping to declare that an attribute value must not be empty, so object data needs to be validated before it is submitted to ES. When designing an ES-based view library architecture, such a data validation service (module) can be placed between the API gateway (or load-balancing gateway) and ES, and whether to enable it can be decided according to the efficiency and integrity requirements of the actual scenario.

5.2. Data Type Analysis of GA/T 1400.3

The view library specification defines the data types of object fields as follows:

Basic data types: strings, integers, long integers, floating-point numbers, dates and times, arrays, and objects. These correspond directly to ES data types.

Extended data types: defined from domain knowledge by adding constraints to the basic data types. For example, the road type (SceneType) is defined as an enumeration represented by a string with a maximum length of 2.

5.3. Mapping Analysis of File object

Taking the File object in the view library (Appendix A.7 of GA/T 1400.3) as an example, let's look at how to define its mapping.

GA/T 1400.3 defines an XML Schema for the File object (the schema listing is not reproduced here).

An example of a File object is shown below.

{
  "FileObject": {
    "FileID": "31000000001190000138022019021416121100001",
    "InfoKind": 1,
    "Source": "3",
    "FileName": "tollgate_3_lane_4_20190214161211.jpg",
    "StoragePath": "/tollgate/3/lane/4/images",
    "FileHash": "38b8c2c1093dd0fec383a9d9ac940515",
    "FileFormat": "Jpeg",
    "Title": "tollgate_3_lane_4_20190214161211",
    "SecurityLevel": "3",
    "SubmiterName": "zhangkai",
    "SubmiterOrg": "pudong",
    "EntryTime": "20190214161214",
    "FileSize": 94208
  }
}

Analyzing the attribute fields of the object gives the following table:

Field name | Data type in the standard | ES mapping | Remarks
FileID | string(41) | type: keyword, doc_values: false, ignore_above: 41 | Does not participate in sorting or aggregation
InfoKind | int | type: integer, coerce: false |
Source | string(2) | type: keyword, ignore_above: 2 |
FileName | string(0..256) | type: keyword, ignore_above: 256 |
StoragePath | string(256) | type: keyword, doc_values: false, ignore_above: 256 | Does not participate in sorting or aggregation
FileHash | string(32) | type: keyword, doc_values: false, ignore_above: 32 | Does not participate in sorting or aggregation
FileFormat | string(32) | type: keyword, ignore_above: 32 |
Title | string(128) | type: keyword, ignore_above: 128 |
SecurityLevel | string(1) | type: keyword, ignore_above: 1 |
SubmiterName | string(0..50) | type: keyword, ignore_above: 50 |
SubmiterOrg | string(0..100) | type: keyword, ignore_above: 100 |
EntryTime | dateTime | type: date, format: yyyyMMddHHmmss | The format is YYYYMMDDhhmmss
FileSize | int | type: integer, coerce: false |

5.4. Mapping definition of File object

We use the following command to create the index file in ES (note that index.mapping.coerce is set to false here):

curl -iXPUT 'localhost:9200/file?pretty' -H "Content-type: application/json" -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 1, "index.mapping.coerce": false}}'

Use the following command to define the mapping of the file index:

curl -iXPUT 'localhost:9200/file/_mapping/object?pretty' -H "Content-type: application/json" -d '{
  "properties": {
    "FileObject": {
      "properties": {
        "FileID": {"type": "keyword", "doc_values": false, "ignore_above": 41},
        "InfoKind": {"type": "integer", "coerce": false},
        "Source": {"type": "keyword", "ignore_above": 2},
        "FileName": {"type": "keyword", "ignore_above": 256},
        "StoragePath": {"type": "keyword", "doc_values": false, "ignore_above": 256},
        "FileHash": {"type": "keyword", "doc_values": false, "ignore_above": 32},
        "FileFormat": {"type": "keyword", "ignore_above": 32},
        "Title": {"type": "keyword", "ignore_above": 128},
        "SecurityLevel": {"type": "keyword", "ignore_above": 1},
        "SubmiterName": {"type": "keyword", "ignore_above": 50},
        "SubmiterOrg": {"type": "keyword", "ignore_above": 100},
        "EntryTime": {"type": "date", "format": "yyyyMMddHHmmss"},
        "FileSize": {"type": "integer", "coerce": false}
      }
    }
  }
}'
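With more than 30 object types in the view library, writing each mapping body by hand is error-prone. One option, sketched here as an assumption rather than as part of the original workflow, is to keep the field table as data and generate the body from it; FILE_FIELDS and build_mapping() are illustrative names.

```python
import json

# (field name, ES type, extra mapping parameters), following the field table.
FILE_FIELDS = [
    ("FileID", "keyword", {"doc_values": False, "ignore_above": 41}),
    ("InfoKind", "integer", {"coerce": False}),
    ("Source", "keyword", {"ignore_above": 2}),
    ("FileName", "keyword", {"ignore_above": 256}),
    ("StoragePath", "keyword", {"doc_values": False, "ignore_above": 256}),
    ("FileHash", "keyword", {"doc_values": False, "ignore_above": 32}),
    ("FileFormat", "keyword", {"ignore_above": 32}),
    ("Title", "keyword", {"ignore_above": 128}),
    ("SecurityLevel", "keyword", {"ignore_above": 1}),
    ("SubmiterName", "keyword", {"ignore_above": 50}),
    ("SubmiterOrg", "keyword", {"ignore_above": 100}),
    ("EntryTime", "date", {"format": "yyyyMMddHHmmss"}),
    ("FileSize", "integer", {"coerce": False}),
]

def build_mapping(root, fields):
    """Build the request body for the mapping of one root object."""
    properties = {name: {"type": es_type, **extra} for name, es_type, extra in fields}
    return {"properties": {root: {"properties": properties}}}

mapping = build_mapping("FileObject", FILE_FIELDS)
print(json.dumps(mapping, indent=2))
```

The printed JSON can be passed as the request body of the mapping command above; json.dumps turns Python's False into the JSON literal false.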

Use the following command to view the mapping information of file:

curl -iXGET 'localhost:9200/file/_mapping?pretty'

The returned mapping information looks like this:

{
  "file": {
    "mappings": {
      "object": {
        "properties": {
          "FileObject": {
            "properties": {
              "EntryTime": {"type": "date", "format": "yyyyMMddHHmmss"},
              "FileFormat": {"type": "keyword", "ignore_above": 32},
              "FileHash": {"type": "keyword", "doc_values": false, "ignore_above": 32},
              "FileID": {"type": "keyword", "doc_values": false, "ignore_above": 41},
              "FileName": {"type": "keyword", "ignore_above": 256},
              "FileSize": {"type": "integer", "coerce": false},
              "InfoKind": {"type": "integer", "coerce": false},
              "SecurityLevel": {"type": "keyword", "ignore_above": 1},
              "Source": {"type": "keyword", "ignore_above": 2},
              "StoragePath": {"type": "keyword", "doc_values": false, "ignore_above": 256},
              "SubmiterName": {"type": "keyword", "ignore_above": 50},
              "SubmiterOrg": {"type": "keyword", "ignore_above": 100},
              "Title": {"type": "keyword", "ignore_above": 128}
            }
          }
        }
      }
    }
  }
}

We can now submit data objects to the file index, using the following command:

curl -iXPOST 'localhost:9200/file/object/31000000001190000138022019021416121100001?pretty' -H "Content-type: application/json" -d '{
  "FileObject": {
    "FileID": "31000000001190000138022019021416121100001",
    "InfoKind": 1,
    "Source": "3",
    "FileName": "tollgate_3_lane_4_20190214161211.jpg",
    "StoragePath": "/tollgate/3/lane/4/images",
    "FileHash": "38b8c2c1093dd0fec383a9d9ac940515",
    "FileFormat": "Jpeg",
    "Title": "tollgate_3_lane_4_20190214161211",
    "SecurityLevel": "3",
    "SubmiterName": "zhangkai",
    "SubmiterOrg": "pudong",
    "EntryTime": "20190214161214",
    "FileSize": 94208
  }
}'

Summary

The fields of objects in the view library do not need full-text retrieval and could be stored in a relational database. However, the JSON data would then have to be deserialized and parsed field by field on the way into the database, and the fields serialized back into JSON when queried out. Although ORM and JSON serialization middleware can do this work in code, efficiency can suffer under a large number of requests. With ES, we can use its RESTful interface and its native JSON storage format to meet the requirements of the specification.

The view library specification contains some custom constraints, and the services that validate data against them should be deployed in front of ES, before the data is stored. In this example, ES is used more like a NoSQL database.

6. References

1. https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

2. Clinton Gormley & Zachary Tong, Elasticsearch: The Definitive Guide, 2015

3. GA/T 1400.3, Public Security Video Image Information Application System, Part 3: Database Technical Requirements, 2017
