Authoritative Guide to Elasticsearch Search Tuning (Part 2 of 3)

Updated 2025-07-09. Source: SLTechnology News&Howtos (Shulou.com)

This article was first published on the WeChat official account "vivo Internet Technology".

Original English text: https://qbox.io/blog/elasticsearch-search-tuning-part-2

Author: Adam Vanderbush

Translator: Yang Zhentao

Contents

1. Pre-indexed data
2. Mapping
3. Avoid using scripts
4. Force merge read-only indexes

The Authoritative Guide to Elasticsearch Search Tuning is a series of three articles published by QBOX on its blog, and this is the second of them; the first was published earlier in the same series. It covers tuning techniques related to search performance: pre-indexing data, choosing mappings, avoiding scripts, and force-merging read-only indexes.

The series discusses search tuning techniques, strategies, and recommendations for Elasticsearch 5.0 and above.

1. Pre-indexed data

Patterns in your queries can be used to optimize the way data is indexed. For example, if all documents have a price field and most queries run a range aggregation on a fixed list of ranges, you can speed up the aggregation by pre-indexing the range into the index and using a terms aggregation instead.

For example, there are the following documents:

curl -XPUT 'ES_HOST:ES_PORT/index/type/1?pretty' -H 'Content-Type: application/json' -d '{"designation": "bowl", "price": 13}'

And the following search request:

curl -XGET 'ES_HOST:ES_PORT/index/_search?pretty' -H 'Content-Type: application/json' -d '{"aggs": {"price_ranges": {"range": {"field": "price", "ranges": [{"to": 10}, {"from": 10, "to": 100}, {"from": 100}]}}}}'

You can then add a price_range field during the indexing phase, which should be mapped to a keyword:

curl -XPUT 'ES_HOST:ES_PORT/index?pretty' -H 'Content-Type: application/json' -d '{"mappings": {"type": {"properties": {"price_range": {"type": "keyword"}}}}}'

curl -XPUT 'ES_HOST:ES_PORT/index/type/1?pretty' -H 'Content-Type: application/json' -d '{"designation": "bowl", "price": 13, "price_range": "10-100"}'
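The price_range value has to be computed on the client side before each document is indexed. A minimal Python sketch of that step follows, assuming the same three buckets as the range aggregation above (lower bound inclusive, upper bound exclusive). The helper names and the "*" marker for open-ended buckets are illustrative choices, not part of the original article.

```python
# Fixed buckets mirroring the range aggregation in the example above.
# Each entry is (lower bound inclusive, upper bound exclusive); None means open-ended.
PRICE_BUCKETS = [(None, 10), (10, 100), (100, None)]


def price_range_label(price):
    """Return the keyword to store in the price_range field for a given price."""
    for low, high in PRICE_BUCKETS:
        if (low is None or price >= low) and (high is None or price < high):
            # The "*" marker for open-ended buckets is an assumption of this sketch.
            low_text = "*" if low is None else str(low)
            high_text = "*" if high is None else str(high)
            return f"{low_text}-{high_text}"
    raise ValueError(f"no bucket covers price {price!r}")


def enrich(document):
    """Return a copy of the document with the price_range field added."""
    enriched = dict(document)
    enriched["price_range"] = price_range_label(document["price"])
    return enriched


print(enrich({"designation": "bowl", "price": 13}))
# {'designation': 'bowl', 'price': 13, 'price_range': '10-100'}
```

Keeping the bucket list in one place matters: if the boundaries change, existing documents keep their old labels until they are reindexed.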

The search request can then aggregate on the new field instead of running a range aggregation on the price field.

curl -XGET 'ES_HOST:ES_PORT/index/_search?pretty' -H 'Content-Type: application/json' -d '{"aggs": {"price_ranges": {"terms": {"field": "price_range"}}}}'

2. Mapping

The fact that some data is numeric does not mean it should always be mapped as a numeric field. Typically, fields that store identifiers such as an ISBN, or any number that identifies a record in another database, may be better mapped as keyword than as integer or long.

The keyword type is used to index structured content such as email addresses, host names, status codes, zip codes, or tags.

Keyword fields are typically used for filtering (for example, finding all blog posts whose status is published), sorting, and aggregations. They are only searchable by their exact value.

If you need to index full-text content such as email content or product descriptions, you may want to use a text field.

The following is an example of a keyword field mapping:

curl -XPUT 'ES_HOST:ES_PORT/my_index?pretty' -H 'Content-Type: application/json' -d '{"mappings": {"my_type": {"properties": {"tags": {"type": "keyword"}}}}}'
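Because keyword fields match only on their exact value, they are usually queried with a term filter. The sketch below builds a mapping body and a matching search body in Python. The isbn field, the sample ISBN value, and the helper names are hypothetical and serve only as illustration.

```python
import json


def keyword_mapping(type_name, field_names):
    """Build a create-index body that maps every given field as keyword."""
    properties = {name: {"type": "keyword"} for name in field_names}
    return {"mappings": {type_name: {"properties": properties}}}


def exact_match_query(field, value):
    """Build a search body that filters on the exact value of a keyword field."""
    # Identifiers are sent as strings so leading zeros and check characters survive.
    return {"query": {"bool": {"filter": {"term": {field: str(value)}}}}}


mapping_body = keyword_mapping("my_type", ["tags", "isbn"])
search_body = exact_match_query("isbn", "080442957X")

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The sample identifier shows why a numeric type is a poor fit here: it starts with a zero and ends with a letter, neither of which an integer or long field could preserve.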

Indexes imported from version 2.x do not support the keyword type; instead, they attempt to downgrade keyword to string. This makes it possible to merge modern mappings with legacy mappings. Long-lived indexes have to be recreated before upgrading to version 6.x, but the mapping downgrade lets you do that on your own schedule.

3. Avoid using scripts

In general, scripts should be avoided as much as possible. If they are absolutely needed, prefer the Painless and expression engines.

Painless is a simple and secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can be safely used for inline and stored scripts. For a detailed description of Painless syntax and language features, refer to the Painless language specification.

For a more in-depth guide to the Painless scripting language, see "Painless Scripting in Elasticsearch".

Lucene expression language

Lucene expressions compile a JavaScript expression to bytecode. They are designed for high-performance custom ranking and sorting functions, and are enabled for inline and stored scripts by default.

Performance

Expressions were designed to perform competitively with custom Lucene code. This comes from their low per-document overhead compared with other script engines: expressions do more of their work up front.

This allows very fast execution, even faster than a native script you write yourself.

Syntax

Expressions support a subset of JavaScript syntax: a single expression. See the documentation for the expression module for supported operators and functions.

The variables that are accessible in the expression script are:

Document fields, such as doc['myfield'].value

Variables and methods supported by the field, such as doc['myfield'].empty

Parameters passed to the script, such as mymodifier

The current document's score, _score (available only when used in a script_score)

Expression scripts can be used for script_score, script_fields, sort scripts, and numeric aggregation scripts; simply set the lang parameter to expression.
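As an illustration of the variables listed above, the following Python sketch builds a function_score search body whose script_score uses the expression engine. The popularity field, the sample match query, and the helper name are assumptions made for this example. The script text is passed under the source key here; earlier 5.x releases used inline instead, so check the documentation for the version in use.

```python
import json


def expression_score_query(base_query, field, modifier):
    """Build a function_score search body scored by a Lucene expression."""
    script = {
        "lang": "expression",
        # Single expression: relevance score scaled by a numeric field and a parameter.
        "source": f"_score * doc['{field}'].value * mymodifier",
        "params": {"mymodifier": modifier},
    }
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "script_score": {"script": script},
            }
        }
    }


body = expression_score_query({"match": {"designation": "bowl"}}, "popularity", 1.5)
print(json.dumps(body, indent=2))
```

Passing the multiplier through params rather than formatting it into the script text keeps the expression identical across requests, so only the parameter values vary.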

4. Force merge read-only indexes

Read-only indexes benefit from being merged down to a single segment. This is typically the case with time-based indexes: only the index for the current time window receives new documents, while older indexes are read-only.

The force merge API allows forcing a merge of one or more indexes. Merging relates to the number of segments a Lucene index holds within each shard; the force merge operation reduces the number of segments by merging them.

The call blocks until the merge is complete. If the HTTP connection is lost, the request continues in the background, and any new force merge requests block until the previous one has completed.

curl -XPOST 'ES_HOST:ES_PORT/twitter/_forcemerge?pretty'

The force merge API accepts the following request parameters:

max_num_segments: the number of segments to merge down to. To fully merge the index, set it to 1. By default, the call simply checks whether a merge needs to be performed and, if so, performs it.

only_expunge_deletes: whether the merge should only expunge segments that contain deletes. In Lucene, a document is not deleted directly from a segment but is marked as deleted. During a merge, a new segment is created that does not contain those deleted documents. This flag restricts the merge to segments that have deletes, and defaults to false. Note that it does not override the index.merge.policy.expunge_deletes_allowed threshold.

flush: whether a flush should be performed after the force merge. Defaults to true.
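The three parameters above are passed in the query string of the request. A minimal Python sketch that assembles the request path is shown below; the helper name and its argument validation are assumptions of this example, and sending the request is left to whichever HTTP client is in use.

```python
from urllib.parse import urlencode


def forcemerge_path(indices, max_num_segments=None, only_expunge_deletes=None, flush=None):
    """Build the path and query string for a force merge request."""
    params = {}
    if max_num_segments is not None:
        if max_num_segments < 1:
            raise ValueError("max_num_segments must be at least 1")
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes is not None:
        # Booleans are sent in lowercase in the query string.
        params["only_expunge_deletes"] = str(bool(only_expunge_deletes)).lower()
    if flush is not None:
        params["flush"] = str(bool(flush)).lower()
    # Several indexes can be addressed at once as a comma-separated list.
    path = "/" + ",".join(indices) + "/_forcemerge"
    return path + ("?" + urlencode(params) if params else "")


print(forcemerge_path(["twitter"], max_num_segments=1))
# /twitter/_forcemerge?max_num_segments=1
```

Parameters left as None are omitted from the query string, so the server-side defaults described above apply.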
