Example Analysis of Elasticsearch Parameter configuration

2025-07-03 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article walks through a sample Elasticsearch parameter configuration and explains what each setting does.

The config folder of Elasticsearch contains two configuration files: elasticsearch.yml and logging.yml. The first is the basic configuration file of ES, and the second is the log configuration file. ES uses log4j for logging.

cluster.name: gh-cluster

Sets the ES cluster name. The default is elasticsearch. Every node in a cluster must use the same cluster.name.

node.name: "gh-cluster-node-01"

Sets the node name. By default, a name is picked at random from a built-in list of names. Within a cluster, each node name should be unique.

node.master: true

Specifies whether the node is eligible to be elected master. The default is true. By default, ES treats the first machine in the cluster as master; if that machine crashes, a new master is elected.

node.data: true

Specifies whether the node stores index data, which defaults to true.

index.number_of_shards: 5

Sets the default number of shards per index. The default is 5.

index.number_of_replicas: 1

Sets the default number of replicas per index. The default is 1.

path.conf: /path/to/conf (changing this is recommended)

Sets the storage path for configuration files. The default is the config folder under the ES root directory.

path.data: /path/to/data (changing this is recommended)

Sets the storage path for index data. The default is the data folder under the ES root directory. Multiple storage paths can be given, separated by commas, for example:

path.data: /path/to/data1,/path/to/data2

path.work: /path/to/work (changing this is recommended)

Sets the storage path for temporary files. The default is the work folder under the ES root directory.

path.logs: /path/to/logs (changing this is recommended)

Sets the storage path for log files. The default is the logs folder under the ES root directory.

path.plugins: /path/to/plugins (changing this is recommended)

Sets the storage path for plugins. The default is the plugins folder under the ES root directory.

bootstrap.mlockall: true

Set to true to lock the process memory. ES performs poorly once the JVM starts swapping, so to keep it from being swapped out, set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory to allocate to ES. The elasticsearch process must also be allowed to lock memory. On Linux, this can be done with the command `ulimit -l unlimited`.
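
Whether the memory-lock limit has actually been lifted can be checked from a small script before starting ES. The following is a minimal sketch, assuming a Unix system where Python's resource module is available; the function names are hypothetical and not part of Elasticsearch.

```python
import resource


def memlock_is_unlimited(soft_limit: int) -> bool:
    """Return True when a RLIMIT_MEMLOCK soft limit means 'unlimited'."""
    return soft_limit == resource.RLIM_INFINITY


def current_memlock_soft_limit() -> int:
    """Read the soft memlock limit (in bytes) of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft


if __name__ == "__main__":
    soft = current_memlock_soft_limit()
    if memlock_is_unlimited(soft):
        print("memlock: unlimited")
    else:
        # A finite limit suggests that locking the whole heap may fail.
        print(f"memlock soft limit: {soft} bytes")
```

The check only reflects the limits of the shell or user that runs it, so it should be run as the same user that starts the elasticsearch process.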

network.bind_host: 192.168.0.1 (changing it to the IP of the server it runs on is recommended)

Sets the IP address to bind to, either IPv4 or IPv6. The default is 0.0.0.0.

network.publish_host: 192.168.0.1

Sets the IP address that other nodes use to communicate with this node. If unset, it is determined automatically. The value must be a real IP address.

network.host: 192.168.0.1

Sets both bind_host and publish_host at once.

transport.tcp.port: 9300

Sets the TCP port for communication between nodes. The default is 9300.

transport.tcp.compress: true

Sets whether data transferred over TCP is compressed. The default is false (no compression).

http.port: 9200

Sets the HTTP port for external services. The default is 9200.

http.max_content_length: 100mb

Sets the maximum size of HTTP request content. The default is 100mb.

http.enabled: false

Sets whether to serve requests over HTTP. The default is true (enabled).

An ES cluster sometimes needs a full restart, for example to upgrade hardware, the operating system, or the major version of ES. Restarting all nodes can cause a problem: some nodes join the cluster before others, and the early nodes may already be able to elect a master and start recovery immediately. Because the cluster's data is still incomplete at that point, the master instructs some nodes to start replicating data to each other. When a late node joins and finds that its local data has already been copied to other nodes, it deletes its local, now "invalid" data outright. Once the whole cluster has recovered, the data is unevenly distributed, so the master triggers rebalancing and moves data between nodes. The whole process needlessly consumes a lot of network traffic. Setting the recovery-related parameters sensibly prevents this problem.

gateway.expected_nodes

gateway.expected_master_nodes

gateway.expected_data_nodes

With these three parameters, recovery starts as soon as the given number of nodes has joined the cluster. The difference is that the first counts nodes of either kind (master or data), while the other two count master nodes and data nodes respectively.

Until the expected node count is reached, recovery waits for up to gateway.recover_after_time (default 5 minutes). Once the wait times out, whether recovery starts is decided by the following conditions:

gateway.recover_after_nodes

gateway.recover_after_master_nodes

gateway.recover_after_data_nodes

For example, for a cluster with 10 data nodes, suppose you have the following settings:

gateway.expected_data_nodes: 10

gateway.recover_after_time: 5m

gateway.recover_after_data_nodes: 8

If 10 data nodes join the cluster within 5 minutes, or at least 8 data nodes have joined once 5 minutes have passed, recovery starts immediately.
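
The decision described in this example can be written down as a small function. This is only a sketch of the rule as the article states it, not Elasticsearch's actual implementation; the function name and parameters are hypothetical, and the defaults simply mirror the three settings above.

```python
def should_start_recovery(joined_data_nodes: int,
                          elapsed_seconds: float,
                          expected_data_nodes: int = 10,
                          recover_after_data_nodes: int = 8,
                          recover_after_time_seconds: float = 300.0) -> bool:
    """Decide whether recovery may start, following the article's example."""
    # Condition 1: the expected number of data nodes has joined,
    # so recovery starts without waiting for the timeout.
    if joined_data_nodes >= expected_data_nodes:
        return True
    # Condition 2: the wait has timed out and at least the minimum
    # number of data nodes is present.
    timed_out = elapsed_seconds >= recover_after_time_seconds
    return timed_out and joined_data_nodes >= recover_after_data_nodes


if __name__ == "__main__":
    print(should_start_recovery(10, 30))    # all 10 nodes joined early
    print(should_start_recovery(9, 120))    # still waiting for the timeout
    print(should_start_recovery(8, 300))    # timeout reached with 8 nodes
```

With 7 or fewer data nodes the function keeps returning False regardless of elapsed time, which matches the intent of holding recovery back until enough data is present.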

cluster.routing.allocation.cluster_concurrent_rebalance: 2

Specifies how many shards may be rebalanced concurrently. The right value depends on the hardware, such as the number of CPUs and I/O performance. A poorly chosen value hurts Elasticsearch indexing performance.

cluster.routing.allocation.node_initial_primaries_recoveries: 4

The number of concurrent recovery threads during initial data recovery. The default is 4.

cluster.routing.allocation.node_concurrent_recoveries: 2

The number of concurrent recovery threads when nodes are added or removed, or during rebalancing. The default is 2.

indices.recovery.max_size_per_sec: 0

Limits the bandwidth used for data recovery, for example 100mb. The default is 0, meaning no limit.

indices.recovery.concurrent_streams: 5

Limits the maximum number of concurrent streams that can be open at once when recovering data from other shards. The default is 5.

discovery.zen.minimum_master_nodes: 1

Ensures that a node sees N other master-eligible nodes before it is considered operational within the cluster. The default is 1. For larger clusters, set a higher value (2-4).
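
The article does not say how to pick the value. A commonly used rule for zen discovery, stated here as an addition rather than something from the text above, is a majority of the master-eligible nodes: (N / 2) + 1, rounded down. A minimal sketch, with a hypothetical function name:

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, "master-eligible nodes ->", quorum(n))
```

For clusters with 3 to 7 master-eligible nodes this yields values from 2 to 4, which lines up with the range suggested above.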

discovery.zen.ping.timeout: 3s (changing this is recommended)

Sets the ping timeout used when automatically discovering other nodes in the cluster. The default is 3 seconds. On poor networks, a higher value helps prevent errors during automatic discovery.

discovery.zen.ping.multicast.enabled: false

Sets whether nodes are discovered via multicast. The default is true.

discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]

Sets the initial list of master nodes in the cluster, through which new nodes joining the cluster can be discovered automatically.
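
The host list mixes bare host names and host:port entries. The sketch below shows one way to read such entries. It assumes that an entry without a port falls back to the default transport port 9300 mentioned earlier, which the article does not state explicitly; the function name is hypothetical, and IPv6 literals and port ranges are not handled.

```python
from typing import Tuple

# Assumption: entries without a port use the transport.tcp.port default.
DEFAULT_TRANSPORT_PORT = 9300


def parse_unicast_host(entry: str,
                       default_port: int = DEFAULT_TRANSPORT_PORT) -> Tuple[str, int]:
    """Split a 'host' or 'host:port' entry into (host, port)."""
    host, sep, port = entry.rpartition(":")
    if not sep:
        # No colon found, so the whole entry is the host name.
        return entry, default_port
    return host, int(port)


if __name__ == "__main__":
    for entry in ["host1", "host2:9301", "host3"]:
        print(parse_unicast_host(entry))
```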

Here are some slow log parameter settings for queries:

index.search.slowlog.level: TRACE

index.search.slowlog.threshold.query.warn: 10s

index.search.slowlog.threshold.query.info: 5s

index.search.slowlog.threshold.query.debug: 2s

index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s

index.search.slowlog.threshold.fetch.info: 800ms

index.search.slowlog.threshold.fetch.debug: 500ms

index.search.slowlog.threshold.fetch.trace: 200ms
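
To see how these thresholds are read, the sketch below maps the duration of a query or fetch phase to the level it would be logged at, picking the most severe level whose threshold is exceeded. It is an illustration under that assumption, not Elasticsearch code; the names are hypothetical and the values are copied from the settings above.

```python
from typing import List, Optional, Tuple

# Thresholds in milliseconds, ordered from most to least severe.
QUERY_THRESHOLDS_MS: List[Tuple[str, int]] = [
    ("warn", 10_000), ("info", 5_000), ("debug", 2_000), ("trace", 500),
]
FETCH_THRESHOLDS_MS: List[Tuple[str, int]] = [
    ("warn", 1_000), ("info", 800), ("debug", 500), ("trace", 200),
]


def slowlog_level(took_ms: int,
                  thresholds: List[Tuple[str, int]]) -> Optional[str]:
    """Return the log level for a phase that took took_ms, or None."""
    for level, limit_ms in thresholds:
        if took_ms > limit_ms:
            return level
    # Faster than every threshold: nothing is written to the slow log.
    return None


if __name__ == "__main__":
    print(slowlog_level(6_000, QUERY_THRESHOLDS_MS))
    print(slowlog_level(600, FETCH_THRESHOLDS_MS))
    print(slowlog_level(100, QUERY_THRESHOLDS_MS))
```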
