What is the analysis and principle of ElasticSearch? 04/28 Update SLTechnology News&Howtos

2025-04-28 Update From: SLTechnology News&Howtos

Today I will walk through the analysis and principles of ElasticSearch, which many people may not understand well. To make things easier to follow, the editor has summarized the key points below; I hope you get something out of this article.

I. Introduction to the ElasticSearch cluster

a. What is ElasticSearch?

1. Concepts:

Index: where ElasticSearch stores data

Document: the main entity stored in ElasticSearch

Document type: lets documents that describe different kinds of objects be told apart

Nodes and clusters: ElasticSearch can run on multiple servers (nodes) that work together as a cluster

Shard: when the computing power or hardware of a single node is not enough, the data can be split into parts. Each part is a separate Apache Lucene index, called a shard.

Replica: to improve query throughput or achieve high availability, you can enable shard replicas; a replica is an exact copy of the original shard
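
To make the shard and replica idea concrete, here is a minimal sketch in Python. It is not how ElasticSearch is implemented: the crc32 hash, the round-robin placement, and the names pick_shard and place_copies are all assumptions made for illustration. It only shows that a document id maps to one primary shard and that each shard's copies can be spread over different nodes.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return the primary shard number a document id maps to."""
    # crc32 is only a stand-in hash for this sketch; ElasticSearch uses
    # its own internal hash function for routing.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_copies(num_primary_shards: int, num_replicas: int,
                 nodes: list[str]) -> dict[int, list[str]]:
    """Spread each shard's primary and replicas over distinct nodes, round-robin."""
    copies = 1 + num_replicas  # one primary plus its replicas
    if copies > len(nodes):
        raise ValueError("not enough nodes to keep every copy on a separate node")
    return {
        shard: [nodes[(shard + copy) % len(nodes)] for copy in range(copies)]
        for shard in range(num_primary_shards)
    }


# Hypothetical cluster: 3 nodes, 2 primary shards, 1 replica per shard.
layout = place_copies(num_primary_shards=2, num_replicas=1,
                      nodes=["node1", "node2", "node3"])
print(layout)  # the first node in each list holds the primary copy
print(pick_shard("book-42", 2))
```

Because no node holds two copies of the same shard, losing a single node still leaves at least one copy of every shard available.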

2. Status view:

http://localhost:9200/

http://localhost:9200/_cluster/health?pretty

3. Operations: manipulate data through the REST API using the HTTP methods GET, POST, PUT, and DELETE

II. Search data

a. The query and indexing process

1. Indexing process: the process of preparing a document that is sent to ES and storing it in the index

2. Search process: the process of matching documents that meet the query criteria

3. Analysis process: the process of preparing the contents of a field and converting them into terms that can be written into a Lucene index

Tokenization: the input text is converted into a token stream by the tokenizer.

Filtering: several filters process the tokens in the token stream

4. Analyzer: a tokenizer with zero or more filters
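
The analysis steps above can be sketched as a small pipeline in Python. This is an illustration only, not Lucene's implementation: the regular-expression tokenizer, the stop-word list, and the function names are assumptions chosen for the example.

```python
import re

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def word_tokenizer(text):
    """Tokenization: split the input text into a stream of word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Filter: normalize every token to lowercase."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Filter: drop very common words that carry little meaning."""
    return [token for token in tokens if token not in STOP_WORDS]


def make_analyzer(tokenizer, *filters):
    """An analyzer is one tokenizer followed by zero or more filters."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer(word_tokenizer, lowercase_filter, stop_filter)
print(analyzer("Crime and Punishment"))  # ['crime', 'punishment']
```

The order of the filters matters: the stop filter here only recognizes lowercase words, so it has to run after the lowercase filter.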

b. Query ElasticSearch

1. One or more simple queries are wrapped in a JSON object and sent to ElasticSearch; this format is called the query DSL

2. Syntax:

curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty=true'

curl -XGET 'localhost:9200/library/book/_search?pretty=true' -d '{"query": {"term": {"title": "crime"}}}'

curl -XGET 'localhost:9200/library/book/_search?pretty=true' -d @query.json
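
As a rough illustration of the two request styles above, the following Python sketch builds the same search twice: once as a URI search with the q parameter and once as a query DSL body. The helper names are hypothetical, and nothing is sent to a server; the output of render_term_query is what could be saved as query.json.

```python
import json
from urllib.parse import urlencode


def build_uri_search(index, doc_type, field, value,
                     host="localhost:9200"):
    """Build a URI search request: the query travels in the q parameter."""
    params = urlencode({"q": f"{field}:{value}", "pretty": "true"})
    return f"http://{host}/{index}/{doc_type}/_search?{params}"


def render_term_query(field, value):
    """Build the equivalent query DSL body as a JSON string."""
    body = {"query": {"term": {field: value}}}
    return json.dumps(body, indent=2)


print(build_uri_search("library", "book", "title", "crime"))
print(render_term_query("title", "crime"))
```

Note that urlencode percent-encodes the colon in title:crime, which is equivalent to the unencoded form used in the curl command.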

c. Basic queries

1. term: matches documents that contain the given term in a given field

2. terms: matches documents that contain any of several given terms

3. match: takes the value given in its parameters, analyzes it, and builds an appropriate query from the result

4. multi_match: similar to match, except that its fields parameter lets it act on multiple fields

5. query_string: supports the full query syntax of Apache Lucene

6. field: a simplified version of the query_string query

7. ids: returns only the documents that have the specified identifiers; it works on the _uid field

8. prefix: finds documents in which a field begins with a given prefix

9. fuzzy_like_this: finds all documents that are similar to a given text; it treats the text as fuzzy strings and selects the best differentiating terms generated from them

10. fuzzy_like_this_field: similar to fuzzy_like_this, except that it only acts on a single field and does not support the fields attribute

11. fuzzy: the third kind of fuzzy query; it finds results by calculating the edit distance between the given term and the terms in the documents. It is CPU-intensive but useful for scenarios that need fuzzy matching.

12. match_all: a query that matches all documents in the index

13. wildcard: allows the characters * and ? in the value being queried; its query body is very similar to that of term, and its performance is poor

14. more_like_this: finds documents that are similar to the text provided

15. more_like_this_field: similar to more_like_this, except that it only acts on a single field and does not support the fields attribute

16. range: finds documents whose numeric or string field values fall within a certain range; it acts on a single field only, and the query parameters are wrapped in an object named after the field
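
The edit distance mentioned for the fuzzy query (item 11) can be illustrated with a plain Levenshtein distance in Python. This is a sketch of the idea only; it is not the algorithm Lucene actually runs, and the max_edits parameter and function names are assumptions for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(term, indexed_terms, max_edits=1):
    """Keep the indexed terms within max_edits of the query term."""
    return [t for t in indexed_terms if edit_distance(term, t) <= max_edits]


print(fuzzy_match("crime", ["crime", "prime", "crimea", "punishment"]))
```

Every candidate term needs its own distance computation in this naive form, which hints at why fuzzy matching costs noticeably more CPU than an exact term lookup.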

d. Filtering query results

1. Add a filter field under the query property to use filters in any search

2. range: limits the search to documents whose field values are within given bounds

3. exists: selects only documents that have the specified field

4. missing: the opposite of exists; you can also specify which values are treated as null values

5. script: filters documents using a computed value

6. type: returns all documents of the specified type

7. limit: limits the number of documents returned from each shard for a given query

8. ids: suitable for scenarios where some specific documents need to be filtered

9. bool, and, or, and not can be used to combine filters

10. Use "_name" to name a filter
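
A minimal sketch of a filtered search body, built as Python dictionaries. Where filters sit in the request differs between ElasticSearch versions; this example assumes a version that accepts a filter clause inside a bool query, and the field names year and isbn are hypothetical.

```python
import json


def range_filter(field, gte=None, lte=None):
    """range: the bounds are wrapped in an object named after the field."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


def exists_filter(field):
    """exists: keep only documents that have a value for the field."""
    return {"exists": {"field": field}}


def search_with_filters(query, filters):
    """Combine one scoring query with filters that only include or exclude."""
    return {"query": {"bool": {"must": [query], "filter": filters}}}


body = search_with_filters(
    {"match": {"title": "crime"}},
    [range_filter("year", gte=1800, lte=1900), exists_filter("isbn")],
)
print(json.dumps(body, indent=2))
```

In this layout the filters narrow the result set without contributing to the score, so only the match clause influences ranking.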

e. Compound queries

1. bool: should clauses may or may not match, must clauses must match, and must_not clauses must not match

2. boosting: wraps two queries together and reduces the score of the documents returned by one of them

3. constant_score: wraps another query (or filter). Every document returned by the wrapped query (or filter) gets a constant score, which lets us strictly control the score assigned to each document matched by the query or filter.

4. indices: useful when you need to execute a query on multiple indexes

5. custom_filters_score: allows us to wrap a query and several filters

6. custom_boost_factor: allows us to wrap another query and multiply the score of the documents returned by that query by a specified factor

7. custom_score: customizes the score of another query through a script
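
As an illustration of the bool query from item 1, here is a small Python helper that assembles the must, should, and must_not sections. The helper name and the sample field names are assumptions for the example; only the section names come from the query DSL.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Build a bool query; empty sections are left out of the body."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # every clause has to match
    if should:
        clauses["should"] = list(should)      # optional; a match raises the score
    if must_not:
        clauses["must_not"] = list(must_not)  # matching documents are excluded
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"term": {"title": "crime"}}],
    should=[{"range": {"year": {"gte": 1900}}}],
    must_not=[{"term": {"available": False}}],
)
print(json.dumps(body, indent=2))
```

Here a document must contain crime in its title, ranks higher if it was published in 1900 or later, and is dropped if it is marked unavailable.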

f. Data sorting

1. "sort": [{"_score": "desc"}]; by default, results are sorted by score, highest first

g. Using scripts

1. script: contains the script code; lang: the language used by the script, mvel by default; params: an object containing parameters

2. Available objects: doc, which gives access to the current document together with its calculated score and field values; _source, which gives access to the source of the current document and the values defined in it; and _fields, which gives access to the values of the document's fields

III. Extending structure and search

1. Turn off dynamic mapping: dynamic: false

2. Geospatial indexing: geo_point
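
The two settings above could be combined in one mapping, sketched here as a Python dictionary. The layout assumes a mapping without a document type level; older releases nest the same settings under the type name. The field name location and the sample coordinates are hypothetical.

```python
import json


def build_mapping(geo_field="location"):
    """Mapping with dynamic mapping turned off and one geo_point field."""
    return {
        "mappings": {
            "dynamic": False,  # serialized as JSON false
            "properties": {
                geo_field: {"type": "geo_point"},
            },
        }
    }


def geo_document(name, lat, lon, geo_field="location"):
    """A document whose geo_point value is given as a lat/lon object."""
    return {"name": name, geo_field: {"lat": lat, "lon": lon}}


print(json.dumps(build_mapping(), indent=2))
print(json.dumps(geo_document("City library", 52.23, 21.01)))
```

With dynamic mapping off, a field such as name that is absent from the mapping stays in the stored document but is not indexed, so it cannot be searched.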

IV. Search optimization

1. boost: the boost weight affects the ranking of results

2. Synonym filter: synonym

3. Span queries: span_term, span_first, span_near, span_or, span_not; a span is the start and end position of a term within a field
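
A short sketch of span query bodies, written as Python helpers. The helper names, the title field, and the sample terms are assumptions for illustration; the slop value is the number of positions allowed between the spans.

```python
import json


def span_term(field, value):
    """span_term: the positions at which one term occurs in a field."""
    return {"span_term": {field: value}}


def span_first(match, end):
    """span_first: the wrapped span must end within the first `end` positions."""
    return {"span_first": {"match": match, "end": end}}


def span_near(clauses, slop, in_order=True):
    """span_near: the wrapped spans must lie within `slop` positions of each other."""
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


body = {
    "query": span_near(
        [span_term("title", "crime"), span_term("title", "punishment")],
        slop=1,
    )
}
print(json.dumps(body, indent=2))
```

With slop set to 1 and in_order enabled, a title such as "crime and punishment" would qualify, because only one position separates the two terms.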

V. Combining indexing, analysis, and search

1. Parent-child mapping: _parent

2. Getting data from other systems: river

VI. Beyond search

1. Statistics (facets): query, filter, terms, range, histogram, statistical, terms_stats, and geo_distance

2. Similar documents (more like this)

3. Reverse search (percolator)

VII. Managing clusters

a. Monitoring cluster status and health

1. Health status: curl http://localhost:9200/_cluster/health?pretty

2. Index statistics: curl http://localhost:9200/library/_stats?pretty
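
As a rough sketch of how the health response might be used, the Python below fetches the health endpoint and turns the result into a one-line summary. The sample response is made up for the example, and fetch_health is defined but not called because it needs a running cluster.

```python
import json
from urllib.request import urlopen

STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def fetch_health(base_url="http://localhost:9200"):
    """Call the cluster health endpoint (requires a running cluster)."""
    with urlopen(f"{base_url}/_cluster/health") as response:
        return json.load(response)


def summarize_health(health):
    """Turn a health response into a short human-readable line."""
    status = health.get("status", "unknown")
    meaning = STATUS_MEANING.get(status, "status not recognized")
    return (f"{health.get('cluster_name', '?')}: {status} ({meaning}), "
            f"{health.get('number_of_nodes', '?')} node(s), "
            f"{health.get('unassigned_shards', '?')} unassigned shard(s)")


# Made-up sample response, used instead of a live cluster.
sample = {"cluster_name": "library-cluster", "status": "yellow",
          "number_of_nodes": 1, "unassigned_shards": 5}
print(summarize_health(sample))
```

A single-node cluster with replicas enabled typically reports yellow, since a replica cannot be placed on the same node as its primary.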

b. Instance and cluster status diagnostic tools

1. Bigdesk plug-in

2. elasticsearch-head plug-in

3. elasticsearch-paramedic plug-in

4. SPM tool

c. Gateway

1. Available gateway types: local, Hadoop, and Amazon S3

d. Node discovery

1. Zen discovery is enabled by default and provides two discovery methods: multicast and unicast.

VIII. Problem handling

1. Rebalancing is the process of moving shards between different nodes in a cluster

2. Warming up: _warmer
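
The rebalancing idea from item 1 can be pictured with a toy Python example that moves shards from the fullest node to the emptiest one until the counts are even. This is only an illustration under simple assumptions (shard count is the only criterion, and there is at least one node); ElasticSearch's real allocator weighs many more factors.

```python
def rebalance(allocation):
    """Move shards between nodes until shard counts differ by at most one.

    `allocation` maps a node name to the list of shards it holds.
    Returns the new allocation and the list of (shard, source, target) moves.
    """
    nodes = {node: list(shards) for node, shards in allocation.items()}
    moves = []
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return nodes, moves
        shard = nodes[busiest].pop()  # take one shard off the fullest node
        nodes[idlest].append(shard)   # and hand it to the emptiest node
        moves.append((shard, busiest, idlest))


# Hypothetical situation: node2 has just joined an existing cluster.
balanced, moves = rebalance({"node1": ["s0", "s1", "s2", "s3"], "node2": []})
print(balanced)
print(moves)
```

Each move in the list corresponds to copying one shard over the network, which is why rebalancing a large cluster can take a while.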
