Elasticsearch Architecture and Principles
I. Introduction
With the rapid development of the mobile Internet, the Internet of Things, cloud computing, and other information technologies, data volumes are growing explosively. Today we can easily find the information we want in this huge mass of data, thanks to search engine technology.
As the most popular open-source search engine, Elasticsearch lets us implement basic full-text retrieval without a deep understanding of the complex information-retrieval theory behind it. Even at the scale of billions or tens of billions of documents, it can still return search results within seconds.
Practical concerns such as disaster recovery, data security, scalability, and maintainability can also be addressed effectively on Elasticsearch.
II. Introduction to Elasticsearch
Elasticsearch (ES) is an open-source distributed search and analysis engine built on Lucene that can index and retrieve data in near real time. It is highly reliable, easy to use, and backed by an active community, and it is widely applied to full-text retrieval, log analysis, monitoring, and other scenarios.
Thanks to its high scalability, a cluster can be expanded to hundreds of nodes to handle PB-scale data. Writes, queries, cluster management, and other operations are all performed through a simple RESTful API.
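As a minimal sketch of what this API looks like (assuming a local single-node cluster at http://localhost:9200; the index name articles and its fields are made up for illustration):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Index (write) a document; ES creates the "articles" index on first write
requests.put(f"{ES}/articles/_doc/1",
             json={"title": "Elasticsearch architecture", "views": 42})

# Full-text query via the _search endpoint
resp = requests.get(f"{ES}/articles/_search",
                    json={"query": {"match": {"title": "architecture"}}})
print(resp.json()["hits"]["total"])

# Cluster management goes through the same API, e.g. cluster health
print(requests.get(f"{ES}/_cluster/health").json()["status"])
```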
Beyond retrieval, it also provides rich statistical-analysis functions, and the official extension pack X-Pack covers further needs such as data encryption, alerting, and machine learning.
In addition, specific functional requirements can be met through custom plug-ins, such as COS backup and the QQ word-segmentation (analyzer) plug-in.
1. Elasticsearch architecture and principle
Basic concepts:
Cluster "cluster": consists of ES nodes deployed on multiple machines to handle large datasets and achieve high availability
Node "Node": an ES process on a machine that can be configured with different types of nodes
Master Node "master node": used to select the master of the cluster. One of the nodes acts as the master node and is responsible for cluster metadata management, such as index creation, node leaving to join the cluster, etc.
Data Node "data node": responsible for index data storage
Index "index": a logical collection of index data, which can be compared to the DataBase of relational data
Shard "sharding": index a subset of data and achieve data scale-out by assigning shards to different nodes in the cluster. To solve the problem of insufficient CPU, memory and disk processing capacity of a single node
Primary Shard "master shard": data sharding adopts master-slave mode, and the index operation is received by the shard.
Replica Shard "replica fragmentation": a copy of the main shard to improve query throughput and achieve high data reliability. When the master shard is abnormal, one of the replica shards is automatically promoted to the new master shard.
To make the data model in ES easier to understand, compare it with the relational database MySQL: an Index corresponds roughly to a database, a Type to a table, a Document to a row, and a Field to a column.
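To make the shard and replica concepts concrete, here is a hedged sketch of creating an index with explicit shard and replica counts; the index name logs-demo and the counts are illustrative:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Create an index with 3 primary shards, each with 1 replica:
# the 3 primaries spread the data across nodes (scale-out), and each
# replica can serve queries and take over if its primary fails.
requests.put(f"{ES}/logs-demo", json={
    "settings": {
        "number_of_shards": 3,      # fixed at index creation time
        "number_of_replicas": 1     # adjustable later, on demand
    }
})
```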
The ES architecture is very concise. Built-in automatic discovery is implemented by Zen Discovery: when a node starts, it joins the cluster by contacting a list of existing cluster members.
One node is elected as the master, which manages the cluster metadata and maintains the distribution of shards across nodes. When a new node joins the cluster, the master automatically migrates some shards to it to balance the cluster load.
Node failures are unavoidable in a distributed cluster. The master periodically probes the liveness of the other nodes; when a node fails, it is removed from the cluster, and the shards it held are automatically recovered on the remaining nodes.
When a primary shard fails, one of its replicas is promoted to become the new primary. Likewise, when the master node itself fails, the remaining master-eligible nodes elect a new master through a built-in Raft-like protocol, and split-brain is avoided by configuring the minimum number of master-eligible candidates required for election (a quorum).
Besides cluster management, reading and writing index data is the other major concern. ES uses a peer-to-peer architecture: every node stores the full shard-routing table, so any node can accept user reads and writes.
If a write request is sent to node 1, by default the hash of the document ID determines which primary shard the document is written to; assume it is shard 0.
After the primary shard P0 is written, the request is forwarded to the node holding the replica shard R0; once that node confirms a successful write, success is reported back to the client, which ensures data safety. In addition, before writing, ES checks that a quorum of shard copies is available, to avoid inconsistent writes caused by network partitions.
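A toy sketch of that routing rule (ES actually hashes the routing value, which defaults to the document ID, with murmur3; Python's built-in hash() merely stands in for the idea):

```python
# Illustrative sketch of ES document routing. ES uses
# shard = murmur3(_routing) % number_of_primary_shards;
# hash() here is only a stand-in for the real hash function.
def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    # _routing defaults to the document ID
    return hash(doc_id) % num_primary_shards

# With 3 primary shards, each document ID maps to exactly one primary
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))
```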
Queries use distributed search. For example, a request arriving at node 3 is forwarded to the nodes holding either the primary shard or a replica shard of the target index.
Of course, if a write or query carries a routing field, the request is sent only to the relevant subset of shards, avoiding a full shard scan. After the query completes, those nodes return their results to the coordinating node, which merges them and returns the final result to the client.
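A hedged sketch of routed writes and queries (the index name orders and the routing value user-42 are illustrative):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Write and query with an explicit routing value (here a hypothetical
# user ID) so that only the shard owning that routing key is touched,
# avoiding a scan of every shard.
requests.put(f"{ES}/orders/_doc/1", params={"routing": "user-42"},
             json={"user": "user-42", "amount": 9.9})

resp = requests.get(f"{ES}/orders/_search", params={"routing": "user-42"},
                    json={"query": {"match": {"user": "user-42"}}})
```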
2. Lucene principle
Having covered the basic principles of an ES cluster, here is a brief introduction to Lucene, the storage engine underlying ES.
Lucene is a high-performance information-retrieval library that provides the basic indexing and retrieval functions. On top of it, ES solves reliability and distributed cluster management, forming a productized full-text retrieval system.
The core problem Lucene solves is full-text retrieval. Unlike traditional retrieval methods, full-text retrieval avoids scanning all the content at query time.
When data is written, the contents of each document field are first segmented into terms, forming a term dictionary and the inverted lists associated with it. At query time, the segmented keywords are matched directly against the dictionary to obtain the list of related documents, so the result set is retrieved quickly; ranking rules then present the best-matching documents first.
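A toy Python sketch of such a dictionary and its inverted lists, assuming naive whitespace tokenization; real Lucene stores far more (positions, frequencies, skip structures):

```python
from collections import defaultdict

# A toy inverted index: term -> sorted list of document IDs.
def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):  # naive word segmentation
            index[term].append(doc_id)
    return index

docs = {1: "distributed search engine", 2: "search and analytics engine"}
index = build_inverted_index(docs)
print(index["search"])  # -> [1, 2]: the query term matches the dictionary
                        # and yields the related document list directly
```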
To speed up indexing, Lucene adopts an LSM-Tree-like structure that first buffers index data in memory. When memory pressure is high, or after a fixed interval, the in-memory data is flushed to disk as a segment file; segments contain the term dictionary, inverted lists, field data, and so on.
To balance write performance with data safety, such as avoiding the loss of buffered data when a machine fails, ES also appends to a transaction log (translog) while writing to memory. The in-memory data periodically generates new segments, which can be opened and read from the comparatively cheap file-system cache, enabling near-real-time search.
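The knobs governing this near-real-time cycle can be tuned per index; a sketch with illustrative values (logs-demo is the hypothetical index from above):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Tune the near-real-time cycle on a hypothetical index: how often the
# in-memory buffer is refreshed into a searchable segment, and whether
# the translog is fsynced on every request for durability.
requests.put(f"{ES}/logs-demo/_settings", json={
    "index.refresh_interval": "1s",         # default refresh cadence
    "index.translog.durability": "request"  # fsync translog per request
})
```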
3. ElasticSearch application scenarios
Typical usage scenarios of ES include log analysis, time series analysis, full-text retrieval and so on.
1. Real-time log analysis scenario
Logs are a pervasive form of data in the Internet industry. Typical logs include operational logs used to locate business problems, such as slow logs and exception logs; business logs used to analyze user behavior, such as click and access logs; and audit logs used for security analysis.
The Elastic ecosystem provides a complete logging solution: with a simple deployment, a full real-time log-analysis service can be built. The ecosystem covers the requirements of real-time log analysis end to end, which is an important reason for the rapid adoption of ES in recent years.
Logs are generally searchable within about ten seconds of being generated, which is very timely compared with the tens of minutes or hours of traditional big-data solutions.
At the bottom layer, ES supports data structures such as the inverted index and column storage (doc values), which makes its very flexible search and analysis capabilities available in log scenarios. With ES's interactive analysis capability, log searches respond within seconds even across trillions of log entries.
The basic log-processing flow is: log collection -> data cleaning -> storage -> visual analysis. The Elastic Stack covers this full pipeline with a complete logging solution.
Where:
Log collection: the lightweight collector FileBeat reads business log files in real time and ships the data to downstream components such as Logstash.
Text parsing: regular expressions and similar mechanisms convert raw log text into structured data. This can be done by a standalone Logstash service or by Elasticsearch's built-in lightweight processing module, Ingest Pipeline (a small pipeline sketch follows this list).
Data storage: the Elasticsearch search and analysis platform stores the data persistently and provides full-text search and analysis capabilities.
Visual analysis: a rich graphical interface, such as the visualization component Kibana, supports searching and analyzing the log data.
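As an illustration of the Ingest Pipeline option, a minimal sketch that parses access-log lines with a grok processor; the pipeline name, index, and pattern are all assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# A minimal ingest pipeline that parses raw access-log lines into
# structured fields with a grok processor (pattern is illustrative).
requests.put(f"{ES}/_ingest/pipeline/access-logs", json={
    "description": "parse access logs",
    "processors": [
        {"grok": {
            "field": "message",
            "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:path}"]
        }}
    ]
})

# Documents written with ?pipeline=access-logs are cleaned on ingest
requests.post(f"{ES}/weblogs/_doc", params={"pipeline": "access-logs"},
              json={"message": "1.2.3.4 GET /index.html"})
```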
2. Time series analysis scenario
Time-series data records the state changes of devices and systems in chronological order. Typical time-series data include traditional server-monitoring metrics, application performance-monitoring data, and sensor data from smart hardware and the industrial Internet of Things.
As early as 2017 we explored time-series analysis scenarios on ES. Such scenarios are characterized by highly concurrent writes, low query latency, and multi-dimensional analysis.
Because ES offers cluster scaling, batch writing, routing-aware reads and writes, data sharding, and so on, we have reached online single clusters of 600+ nodes with write throughput of 10 million documents per second, while keeping query latency for a single curve or single timeline within 10 ms.
ES provides flexible, multi-dimensional statistical analysis, making it possible to slice monitoring views by region and business module. ES also supports column storage with high compression ratios, and replica counts can be adjusted on demand to lower storage costs. Finally, time-series data is easily visualized through the Kibana component.
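A sketch of such a multi-dimensional time-series query, assuming a hypothetical metrics-cpu index where region is a keyword field:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Multi-dimensional analysis: average CPU per region per minute over a
# hypothetical metrics index (index and field names are illustrative).
resp = requests.get(f"{ES}/metrics-cpu/_search", json={
    "size": 0,
    "aggs": {
        "per_region": {
            "terms": {"field": "region"},       # assumed keyword field
            "aggs": {
                "per_minute": {
                    "date_histogram": {"field": "@timestamp",
                                       "fixed_interval": "1m"},
                    "aggs": {"avg_cpu": {"avg": {"field": "cpu"}}}
                }
            }
        }
    }
})
```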
3. Search service scenarios
Typical search service scenarios include product search on e-commerce platforms such as JD.com, Pinduoduo, and Mogujie; APP search in app stores; and site search for forums, online documents, and the like.
Users in such scenarios care about high performance, low latency, high reliability, and search quality. For example, a single service may require up to 100k+ QPS, an average response time under 20 ms, and query latency spikes below 100 ms; search scenarios usually demand four-nines availability and the ability to recover from a single data-center failure.
Currently, the Elasticsearch service on the cloud supports disaster recovery across multiple availability zones and recovery from failures within minutes. Full-text retrieval requirements are met through ES's efficient inverted index together with customizable scoring and sorting and rich word-segmentation plug-ins. In the open-source full-text retrieval field, ES has ranked first in the DB-Engines search-engine category for years.
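A hedged sketch of customizing relevance with a function_score query, folding a sales count into the ranking (the index products and its fields are illustrative):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Product-search sketch: full-text match on the title, with the sales
# count folded into the final score so popular items rank higher.
resp = requests.get(f"{ES}/products/_search", json={
    "query": {
        "function_score": {
            "query": {"match": {"title": "wireless earphones"}},
            # multiply the text-relevance score by log(1 + sales)
            "field_value_factor": {"field": "sales", "modifier": "log1p"}
        }
    }
})
```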
4. Tencent ElasticSearch service
Inside and outside Tencent there are a large number of scenarios that demand real-time log analysis, time-series analysis, and full-text retrieval.
We have partnered with Elastic to provide a kernel-enhanced ES cloud service on Tencent Cloud, called CES for short. The kernel enhancements include the X-Pack commercial suite and our own kernel optimizations.
Serving businesses inside the company and customers on the public cloud has raised many problems and challenges, such as ultra-large clusters, data writes at the level of tens of millions of documents per second, and the rich usage scenarios of cloud users.
The following describes our kernel-level optimizations in terms of availability, performance, and cost.
1. Availability optimization
Availability problems manifest in three aspects:
(1) insufficient robustness of the ES kernel
This is a common problem in distributed systems. For example, abnormal queries and overload can easily push a cluster into an avalanche. Cluster scalability is limited: once the number of shards exceeds 100k, metadata management becomes an obvious bottleneck. And when a cluster is scaled out, or a node rejoins after a failure, data can become unbalanced across nodes and across the multiple disks of a node.
(2) lack of disaster-recovery plans
Reliability and data security must be guaranteed: rapid service recovery after a data-center network failure, protection against data loss under natural disasters, and fast restoration of data after accidental deletion.
(3) system defects
This also includes ES defects discovered during operations, such as master-node congestion, distributed deadlocks, and slow rolling restarts.
To address the above problems, on the robustness side we use service rate limiting to tolerate the instability caused by machine and network failures and abnormal queries.
By optimizing the cluster metadata management logic, we improved cluster scalability by an order of magnitude, supporting clusters of a thousand nodes and millions of shards. For cluster balancing, we optimized shard placement across nodes and across multiple disks to keep the pressure even in large clusters.
For disaster recovery, we implemented data backup and rollback through ES's plug-in mechanism, backing ES data up to COS to keep it safe. Through the management-and-control system we support cross-availability-zone disaster recovery, so users can deploy across multiple zones as needed to tolerate a single data-center failure. A recycle-bin mechanism ensures that cluster data can be recovered quickly in scenarios such as overdue payments or accidental deletion.
On system defects, we fixed a series of bugs around rolling restarts, master blocking, distributed deadlocks, and so on. The rolling-restart optimization speeds up node restarts by more than 5x, and the master-congestion problem was addressed together with the official ES 6.x releases.
2. Performance optimization
Performance problems show up in, for example, time-series scenarios such as logging and monitoring, which demand very high write performance, with write concurrency reaching 10 million documents per second. Yet we found that ES write performance degrades by more than 1x when writing with user-specified primary keys (document IDs).
We also found that the CPU could not be fully utilized in stress-test scenarios. Search services, in turn, have very demanding query requirements: typically 200k QPS with an average response time under 20 ms, while avoiding query latency spikes caused by GC or poor execution plans.
To solve these problems, on the write side, for primary-key deduplication scenarios we use the maximum and minimum key values recorded on each segment file to skip irrelevant segments during deduplication, improving write performance by 45%. For details, see Lucene-8980 [1].
For the CPU under-utilization in stress tests, we optimized the lock granularity used when ES refreshes the translog, avoiding resource contention and improving performance by 20%. For details, see ES-45765 / 47790 [2]. We are also experimenting with vectorized execution to optimize write performance, which is expected to double it by reducing branch jumps and instruction cache misses.
On the query side, we optimized the segment-merging policy: inactive segment files are automatically merged so that the segment count converges, reducing resource overhead and improving query performance.
Query pruning based on the maximum and minimum values recorded in each segment improves query performance by 40%. A cost-based (CBO) strategy avoids caching clauses whose cache construction is expensive, which previously caused 10x+ query latency spikes. For details, see Lucene-9002 [3].
Other work includes fixing performance issues in Composite aggregation and implementing true pagination there, and optimizing aggregations combined with sorting, improving performance by 3-7x. In addition, we are exploring new hardware, such as Intel's AEP, Optane, and QAT, for further gains.
3. Cost optimization
Cost shows up mainly as machine-resource consumption in the time-series scenarios represented by logs and monitoring. Statistics over typical online log and time-series businesses show that the cost ratio of disk, memory, and compute resources is close to 8:4:1.
Disk and memory are therefore the main contradiction, with compute cost secondary. These time-series scenarios also have distinct access patterns: the data is clearly hot-cold skewed.
Time-series access favors recent data. For example, data from the last 7 days can account for more than 95% of accesses, while historical data is accessed rarely, and usually only as aggregate statistics.
On disk cost, because the data has obvious hot-cold characteristics, we use a hot-cold separation architecture and a hybrid storage scheme to balance cost against performance.
Since historical data is usually accessed only for statistics, we use precomputed Rollups, similar to materialized views, to trade storage for query performance. Historical data that is not used at all can be backed up to cheaper storage systems such as COS. Other optimizations include multi-disk strategies that balance data throughput against disaster recovery, and the periodic deletion of expired data through lifecycle management, as sketched below.
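For the expired-data case, a minimal sketch of an ILM (index lifecycle management) policy of the kind recent ES versions provide; the policy name and retention age are assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# A lifecycle policy that deletes indices once they are 30 days old
# (policy name and age are illustrative); indices reference the policy
# via the index.lifecycle.name setting.
requests.put(f"{ES}/_ilm/policy/logs-retention", json={
    "policy": {
        "phases": {
            "delete": {"min_age": "30d", "actions": {"delete": {}}}
        }
    }
})
```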
On memory cost, we found that memory runs out while storage is still underused, with large-storage node types in particular exhausting memory when disks are only about 20% full. To solve this, we use off-heap techniques to improve the utilization of heap memory, reduce GC overhead, and increase the amount of disk a single node can manage.
We moved the FST, which consumes a large share of memory, off the heap, and avoid copying data between heap and off-heap by keeping only the addresses of the off-heap objects in the heap. Off-heap objects are reclaimed through Java's weak-reference mechanism, further improving memory utilization.
With this implementation, a 32 GB heap can manage about 50 TB of disk space, roughly 10x the original version, with equivalent performance and significantly better GC behavior.
Beyond the kernel-level work, the platform layer hosts the service through a management-and-control platform covering cloud resource management, instance management, and so on, making it quick and convenient to create instances and adjust specifications.
Service quality is guaranteed through the monitoring systems and maintenance tools of the operations-support platform, and an intelligent diagnosis platform under construction uncovers potential service problems, delivering stable and reliable ES services both internally and externally.