ES Learning Notes-the study of ClusterState 04/18 Update SLTechnology News&Howtos

ES Learning Notes-the study of ClusterState

2025-04-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

I studied the overall idea of ES's get api earlier, which can be used as a reference when writing ES plug-ins. The focus at that time was on understanding the overall flow, mainly the invocation logic within the method of shardOperation (), which weakened the shards () method. In fact, the shards () method is more useful in understanding the structure of ES. Let's start with get api to understand shards ().

Let's review the process of using get api:

Add document to ES: curl-XPUT 'http://localhost:9200/test1/type1/1'-d' {"name": "hello"} 'read data according to document ID: curl-XGET' http://localhost:9200/test1/type1/1'

It's easy to use. But if you consider the distribution, the logic behind it is not simple. If the ES cluster has three nodes, the index in which the data resides also has three shards, one copy for each shard. That is, index is set as follows:

{"test1": {"settings": {"index": {"number_of_replicas": "1", "number_of_shards": "3"}

Which shard should a doc with an id of 1 be distributed to? This question needs to be answered by a detailed blog post. Let's give a brief conclusion here:

By default, ES calculates a hash value based on the document id, using Murmur3HashFunction, and then takes the module based on the id and the number of fragments. The implementation code is MathUtils.mod (hash, indexMetaData.getNumberOfShards ()); the final result is the shard id where the document is located, so the shard label of ES starts at 0.

If you don't know how to save, how can you know how to take.

Reorganize the core process of fetching data:

S1: locate the shard where the data is located according to the document id. Because it can be set to multiple copies, a tile is mapped to multiple nodes. S2: according to the mapping information of the sharding node, select a node to obtain the data. The focus here is on the selection of nodes. In short, we need load balancing, otherwise there is no point in setting up replicas.

Both of the above steps are associated with a core data structure, ClusterState, which we can view using _ cluster/state?pretty:

# http://localhost:9200/_cluster/state?pretty{ "cluster_name": "elasticsearch", "version": 4, "state_uuid": "b6B739p5SbanNLyKxTMHfQ", "master_node": "KnEE25tzRjaXblFJq5jqRA", "blocks": {}, "nodes": {"KnEE25tzRjaXblFJq5jqRA": {"name": "Mysterio", "transport_address": "127.0.0.1 version" 9300 " "attributes": {}, "metadata": {"cluster_uuid": "ZIl7g86YRiGv8Dqz4DCoAQ", "templates": {}, "indices": {"test1": {"state": "open", "settings": {"index": {"creation_date": "1553995485603" Uuid: "U7v5t_T7RG6rNU3JlGCCBQ", "number_of_replicas": "1", "number_of_shards": "1", "version": {"created": "2040599"}, "mappings": {} "aliases": []}}, "routing_table": {"indices": {"test1": {"shards": {"0": [{"state": "STARTED", "primary": true, "node": "KnEE25tzRjaXblFJq5jqRA" "relocating_node": null, "shard": 0, "index": "test1", "version": 2, "allocation_id": {"id": "lcSHbfWDRyOKOhXAf3HXLA"}, {"state": "UNASSIGNED" "primary": false, "node": null, "relocating_node": null, "shard": 0, "index": "test1", "version": 2, "unassigned_info": {"reason": "INDEX_CREATED" "at": "2019-03-31T01:24:45.845Z"}, "routing_nodes": {"unassigned": [{"state": "UNASSIGNED", "primary": false, "node": null, "relocating_node": null, "shard": 0 "index": "test1", "version": 2, "unassigned_info": {"reason": "INDEX_CREATED", "at": "2019-03-31T01:24:45.845Z"}}], "nodes": {"KnEE25tzRjaXblFJq5jqRA": [{"state": "STARTED", "primary": true "node": "KnEE25tzRjaXblFJq5jqRA", "relocating_node": null, "shard": 0, "index": "test1", "version": 2, "allocation_id": {"id": "lcSHbfWDRyOKOhXAf3HXLA"}

The whole structure is quite complicated, so we take it apart slowly and break it one by one. The idea of disassembly is to start with the use of the scene.

The study of IndexMetaData

The format of metaData is as follows: "metadata": {"cluster_uuid": "ZIl7g86YRiGv8Dqz4DCoAQ", "templates": {}, "indices": {"test1": {"state": "open", "settings": {"index": {"creation_date": "1553995485603", "uuid": "U7v5t_T7RG6rNU3JlGCCBQ", "number_of_replicas": "1" "number_of_shards": "1", "version": {"created": "2040599"}}, "mappings": {}, "aliases": []}

That is, metadata stores the number of fragments and replicas of each index in the cluster, the status of the index, the mapping of the index, the alias of the index, and so on. The function provided by this structure is to obtain index metadata according to the index name, as follows:

# OperationRouting.generateShardId () IndexMetaData indexMetaData = clusterState.metaData (). Index (index); if (indexMetaData = = null) {throw new IndexNotFoundException (index);} final Version createdVersion = indexMetaData.getCreationVersion (); final HashFunction hashFunction = indexMetaData.getRoutingHashFunction (); final boolean useType = indexMetaData.getRoutingUseType ()

Here we focus on the code clusterState.metaData (). Index (index), which implements the function of getting index metadata based on the index name. By combining the number of shards in the metadata with the document id, we can locate the shards in which the document is located. This function is required in Delete, Index, and Get API. Here we can also understand why the number of index shards in ES cannot be modified: if so, the hash function cannot correctly locate the shards where the data is located.

IndexRoutingTable learning "routing_table": {"indices": {"test1": {"shards": {"0": [{"state": "STARTED", "primary": true, "node": "KnEE25tzRjaXblFJq5jqRA", "relocating_node": null, "shard": 0 "index": "test1", "version": 2, "allocation_id": {"id": "lcSHbfWDRyOKOhXAf3HXLA"}}, {"state": "UNASSIGNED", "primary": false, "node": null "relocating_node": null, "shard": 0, "index": "test1", "version": 2, "unassigned_info": {"reason": "INDEX_CREATED" "at": "2019-03-31T01:24:45.845Z"}]}

Routing_table stores the shard information of each index, and through this structure, we can clearly understand the following information:

1. The distribution of index fragments in each node 2. Whether the index fragment is the main part or not

If a shard has two copies, and each is assigned to different nodes, then get api has a total of three data nodes to choose from. Which one should be chosen? The preference parameter is not considered here for the time being.

In order to enable each node to be selected fairly and achieve the purpose of load balancing, random numbers are used here. Reference RotateShuffer

/ * Basic {@ link ShardShuffler} implementation that uses an {@ link AtomicInteger} to generate seeds and uses a rotation to permute shards. * / public class RotationShardShuffler extends ShardShuffler {private final AtomicInteger seed; public RotationShardShuffler (int seed) {this.seed = new AtomicInteger (seed);} @ Override public int nextSeed () {return seed.getAndIncrement ();} @ Override public List shuffle (List shards, int seed) {return CollectionUtils.rotate (shards, seed);}}

That is to say, use ThreadLocalRandom.current (). NextInt () to generate a random number as a seed, and then rotate it in turn.

The effect of Collections.rotate () can be demonstrated with the following code:

Public static void main (String [] args) {List list = Lists.newArrayList ("a", "b", "c"); int a = ThreadLocalRandom.current () .nextInt (); List L2 = CollectionUtils.rotate (list, a); List L3 = CollectionUtils.rotate (list, axi1); System.out.println (L2); System.out.println (L3) }-[b, c, a] [c, a, b]

For example, if the list of nodes obtained by request An is [bMagnec _ a], then the list of nodes obtained by request B is [cmeno _ a _ b]. In this way, the purpose of load balancing is achieved.

DiscoveryNodes's study.

Because the id of the node is stored in routing_table, you also need to know the configuration information such as the ip and port of the node when sending the request to the target node. This information is stored in nodes. "nodes": {"KnEE25tzRjaXblFJq5jqRA": {"name": "Mysterio", "transport_address": "127.0.0.1 name 9300", "attributes": {}

Once the node information is obtained through this nodes, you can send the request, and the communication of all ES internal nodes is based on transportService.sendRequest ().

To sum up, this article combs several core structures in ES's ClusterState based on get api: metadata,nodes, routing_table. There is another routing_nodes that is not used here. After sorting out the usage scene, record it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.