Several things you must know about ES performance tuning

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article covers several things you must know about ES performance tuning. It walks through the relevant analysis and solutions, in the hope of helping readers who face this problem find a simpler and easier approach.

(Zero) ElasticSearch Architecture Overview

ElasticSearch is a big data engine at the forefront of technology. A common combination is ES + Logstash + Kibana as a mature logging system, where Logstash is an ETL tool and Kibana is a data analysis and display platform. What makes ES impressive is its strong search capability and its disaster recovery strategy. ES also exposes interfaces that let developers write their own plug-ins; combined with a Chinese word segmentation plug-in, ES becomes far more useful for search and analysis. ElasticSearch indexes and searches using the open source full-text search library Apache Lucene, so any discussion of its architecture has to touch on Lucene.

About Lucene:

Apache Lucene organizes all the information written to the index into an inverted index, a data structure that maps terms to documents. It works differently from a traditional relational database in that an inverted index is, roughly speaking, term-oriented rather than document-oriented. A Lucene index also stores other information, such as term vectors. Each Lucene index is composed of multiple segments. A segment is written only once but queried many times; once created, it is never modified. Multiple segments are merged together during the segment merge stage, which is driven by Lucene's internal mechanisms. After a merge there are fewer segments, but each one is larger. Segment merging is very I/O intensive, and it is also the point at which information that is no longer in use is cleaned up. In Lucene, the process of converting data into an inverted index, turning complete strings into searchable terms, is called analysis. Text analysis is performed by an Analyzer, which consists of a Tokenizer, Filters, and a Character Mapper, whose roles follow from their names. Lucene also has its own complete query language for searching and reading the index.
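To make the ideas above concrete, here is a minimal sketch of an inverted index built from write-once segments. It is a toy model, not Lucene's implementation: the function names (`analyze`, `build_segment`, `merge_segments`) and the sample documents are assumptions made up for illustration.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: a tokenizer followed by a lowercase filter."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)  # tokenizer: split on non-alphanumerics
    return [t.lower() for t in tokens]          # filter: normalize case

def build_segment(docs):
    """Build one write-once 'segment': a mapping term -> sorted tuple of doc IDs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: tuple(sorted(ids)) for term, ids in postings.items()}

def merge_segments(*segments):
    """Merge segments into a new, larger one; the inputs are left untouched."""
    merged = defaultdict(set)
    for seg in segments:
        for term, ids in seg.items():
            merged[term].update(ids)
    return {term: tuple(sorted(ids)) for term, ids in merged.items()}

# Hypothetical documents, indexed into two segments and then merged.
seg_a = build_segment({1: "Lucene builds an inverted index", 2: "ES uses Lucene"})
seg_b = build_segment({3: "Segments are merged by Lucene"})
index = merge_segments(seg_a, seg_b)

print(index["lucene"])  # (1, 2, 3)
print(index["merged"])  # (3,)
```

A lookup goes from a term to the documents containing it, which is the term-oriented access pattern described above. Merging produces one larger segment from two smaller ones without modifying either input.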

[Note] An index in ES is a field in the URI used when querying or addressing, as in [host]:[port (9200)]/[index]/[type]/[ID]?[option], whereas an index in Lucene corresponds more closely to the concept of a shard in ES.
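As a small illustration of that addressing scheme, the sketch below assembles such a URI from its parts. The helper name `document_url` and the sample index and type names are hypothetical; only the URI layout comes from the note above.

```python
from urllib.parse import urlencode

def document_url(host, index, doc_type, doc_id, port=9200, **options):
    """Assemble [host]:[port]/[index]/[type]/[ID]?[option] as an HTTP URL."""
    url = f"http://{host}:{port}/{index}/{doc_type}/{doc_id}"
    if options:
        url += "?" + urlencode(options)
    return url

# "logs" and "event" are made-up index and type names.
print(document_url("localhost", "logs", "event", 42, pretty="true"))
# http://localhost:9200/logs/event/42?pretty=true
```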

Returning to ElasticSearch, the ES architecture follows a design philosophy with several characteristics:

1. Reasonable default configuration: just edit the YAML configuration file on a node and it is quickly configured. This is similar to the configuration simplification in Spring 4.

2. Distributed working mode: ES's Zen discovery mechanism supports both multicast and unicast, and embodies the neat idea that "knowing one point is knowing the whole world."

3. Peer-to-peer architecture: shards are replicated automatically between nodes, and a shard and its replicas are placed as far apart as possible to avoid single points of failure. Master nodes and data nodes are almost identical.

4. Easy to add new nodes to the cluster: this greatly simplifies the work for developers or operations staff when adding new nodes.

5. No restrictions on the data structures in an index: ES supports multiple data types within one index.

6. Near-real-time search and version synchronization: since ES is a distributed application, one of the major challenges is consistency, for both index and document data, but ES has proven to perform well here.

(i) Sharding strategy

Choose an appropriate number of shards and replicas. ES shards come in two kinds: primary shards and replicas. By default, ES creates five primary shards per index (this was the default before version 7.0; later versions default to one), even in a standalone environment. This redundancy is called over-allocation. It may seem unnecessary at first, and it adds complexity to distributing documents across shards and processing queries; fortunately, ES performs well enough to mask this cost. Suppose an index consists of a single shard. When the index grows beyond the capacity of a single node, ES cannot split it into multiple shards, so the required number of shards must be specified when the index is created. All we can do is create a new index and specify more shards when initializing it. On the other hand, excessive over-allocation increases the cost of merging the per-shard query results, and therefore the query time. This leads to the following conclusion:

We should use as few shards as possible!
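One way to see why the shard count has to be fixed at index creation is to look at how a document is assigned to a shard. The sketch below assumes the usual scheme of hashing a routing key modulo the number of primary shards. `zlib.crc32` stands in for the real hash function, and the function name and document IDs are hypothetical.

```python
import zlib

def shard_for(routing_key, num_primary_shards):
    """Pick a shard as hash(routing_key) % num_primary_shards.

    CRC32 is a stand-in hash chosen for illustration only.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs, routed with 5 shards and then with 6.
doc_ids = [f"doc-{i}" for i in range(1000)]
before = {d: shard_for(d, 5) for d in doc_ids}
after = {d: shard_for(d, 6) for d in doc_ids}

moved = sum(1 for d in doc_ids if before[d] != after[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Under this assumption, changing the shard count changes the result of the modulo for most documents, so existing documents could no longer be found where they were written. That is why growing an index means creating a new one with more shards and reindexing.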

The following relationship exists between the number of primary shards, replicas, and the maximum number of nodes:

number of nodes <= number of primary shards * (number of replicas + 1)
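As a worked example, the relationship is commonly stated as: the number of nodes should not exceed the total number of shard copies, which is primaries times (replicas + 1). The sketch below computes that bound and shows where the two numbers would be set when creating an index. The function name is hypothetical, and the settings body assumes the standard `number_of_shards` and `number_of_replicas` index settings.

```python
import json

def max_useful_nodes(primary_shards, replicas_per_primary):
    """Total shard copies = primaries * (replicas + 1).

    With at most one copy per node, this is the largest node count
    that still leaves every node holding a shard.
    """
    return primary_shards * (replicas_per_primary + 1)

print(max_useful_nodes(5, 1))  # 10
print(max_useful_nodes(1, 0))  # 1

# Request body for creating an index with 5 primaries and 1 replica each.
create_index_body = {
    "settings": {"number_of_shards": 5, "number_of_replicas": 1}
}
print(json.dumps(create_index_body, indent=2))
```

With five primary shards and one replica each there are ten shard copies, so an eleventh node would have nothing to hold.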
