

Etcd Super-complete solution: principle explanation and best practices for deployment settings




Introduction

Etcd is an open-source distributed key-value store developed by the CoreOS team and now managed by the Cloud Native Computing Foundation. The name is pronounced "et-cee-dee" and refers to distributing the Unix "/etc" directory, home to a large number of global configuration files, across multiple machines. Etcd is the backbone of many distributed systems, providing a reliable way to store data across a cluster of servers. It runs on a variety of operating systems, including Linux, BSD, and OS X.

Etcd has the following properties:

Full replication: every node in the cluster has access to the full data store

High availability: Etcd is designed to avoid single points of failure from hardware faults or network problems

Consistency: every read returns the latest write, regardless of which host is queried

Simple: a well-defined, user-facing API (gRPC)

Security: automatic TLS with optional client certificate authentication

Fast: benchmarked at 10,000 writes per second

Reliable: storage is correctly distributed using the Raft algorithm

How Etcd works

Before we look at how Etcd works, let's define three key concepts: leaders, elections, and terms. In a Raft-based system, the cluster elects a leader for a given term.

The leader handles all client requests that require cluster consensus. Requests that do not require consensus, such as reads, can be processed by any cluster member. The leader accepts new changes, replicates the information to the follower nodes, and commits the changes once the followers acknowledge receipt. Each cluster can have only one leader at any given time.

If the leader dies or stops responding, the remaining nodes open a new term and hold a new election after a predetermined timeout. Each node maintains a randomized election timer that determines how long the node waits before calling a new election and putting itself forward as a candidate.

If a node receives no message from the leader before its timeout expires, it starts a new election by opening a new term, marking itself as a candidate, and asking the other nodes to vote. Each node votes for the first candidate that requests its vote. If a candidate obtains votes from a majority of the nodes in the cluster, it becomes the new leader. If multiple candidates receive the same number of votes, however, the term ends without a leader and a new term begins with fresh random election timers.

As mentioned above, any change must go through the leader node. Rather than accepting and committing a change immediately, Etcd uses the Raft algorithm to ensure that a majority of nodes agree to it. The leader sends the proposed new value to every node in the cluster, and each node replies with a message confirming receipt. If a majority of nodes acknowledge receipt, the leader commits the new value and instructs each node to commit the value to its log. In other words, every change requires a quorum of cluster nodes before it can be committed.

Etcd in Kubernetes

Since becoming part of Kubernetes in 2014, the Etcd community has grown exponentially. CoreOS, Google, Red Hat, IBM, Cisco, Huawei, and others are contributing members, and large cloud providers such as AWS, Google Cloud Platform, and Azure use Etcd successfully in production.

Etcd's job in Kubernetes is to store critical data for the distributed system safely. It is best known as Kubernetes' primary data store, holding configuration data, state, and metadata. Because Kubernetes usually runs on a cluster of several machines, it is a distributed system that requires a distributed data store such as Etcd.

Etcd makes it easy to store data across a cluster and watch for changes, allowing any node in the Kubernetes cluster to read and write data. Kubernetes uses Etcd's watch function to monitor the actual and desired states of the system; if the two differ, Kubernetes makes changes to reconcile them. Every kubectl read retrieves data stored in Etcd, every change applied (kubectl apply) creates or updates an entry in Etcd, and every crash triggers changes to values in Etcd.

Deployment and hardware recommendations

Etcd can run on a laptop or a lightweight cloud instance for testing or development. However, when running an Etcd cluster in production, we should follow the guidance in the official Etcd documentation, which provides a good starting point for a stable production deployment. In particular:

Etcd writes data to disk, so SSDs are strongly recommended

Always use an odd number of cluster members, because a quorum is needed to agree on updates to the cluster state

For performance reasons, clusters usually have no more than seven nodes

Let's review the steps required to deploy an Etcd cluster in Kubernetes, then demonstrate some basic CLI commands and API calls. We will lean on Kubernetes concepts such as StatefulSets and PersistentVolumes in the deployment.

Prerequisites

Before continuing with the demo, we need to prepare:

An account on Google Cloud Platform: the free tier should be enough. You could also choose most other cloud providers with only a few modifications.

A server running Rancher

Start the Rancher instance

Start the Rancher instance on the server you control. Here is a very simple and intuitive guide to getting started: https://rancher.com/quick-start/

Use Rancher to deploy a GKE cluster. Refer to this guide to set up and configure a Kubernetes cluster in your GCP account with Rancher:

https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/hosted-kubernetes-clusters/gke/

Install the Google Cloud SDK and the kubectl command-line tool on the same server that runs the Rancher instance. Install the SDK by following the link provided above, and install kubectl through the Rancher UI.

Using gcloud init and gcloud auth login, make sure that the gcloud command has access to your GCP account.

After the cluster is deployed, enter the following command to check basic kubectl functionality:
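kubectl get nodes

(Any simple read, such as listing the nodes, is enough to confirm that kubectl can reach the cluster.)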

Before deploying the Etcd cluster (by importing the YAML file through kubectl or in Rancher's UI), we need to configure a few items. In GCE, the default persistent disk type is pd-standard. We will configure pd-ssd for the Etcd deployment. This is not mandatory, but following Etcd's recommendation, SSD is a very good choice. Check this link to learn about the storage classes of other cloud providers:

https://kubernetes.io/docs/concepts/storage/storage-classes/

Let's examine the available storage classes provided by GCE. As expected, we see a default result called standard:
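kubectl get storageclasses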

Apply the following YAML file, updating the value of zone to match your preferences, so that we can use SSD storage:
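A minimal manifest along these lines should work for GCE; the name ssd is what our StatefulSet will reference later, and the zone shown here is only an example:

# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zone: us-central1-a # replace with your preferred zone

kubectl apply -f storage-class.yaml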

Once again, we can see that in addition to the default standard class, ssd is now available:
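kubectl get storageclasses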

Now we can proceed to deploy the Etcd cluster. We will create a StatefulSet with three replicas, each with a dedicated volume using the ssd storageClass. We also need to deploy two services: one for internal cluster communication and one for accessing the cluster externally through the API.

When bootstrapping the cluster, we need to pass some parameters to the Etcd binary. The listen-client-urls and listen-peer-urls options specify the local addresses the Etcd server uses to accept incoming connections; specifying 0.0.0.0 as the IP address means that Etcd will listen on all available interfaces. The advertise-client-urls and initial-advertise-peer-urls parameters specify the addresses that Etcd clients or other Etcd members should use to contact this server.

The following YAML file defines our two services and the Etcd StatefulSet:

# etcd-sts.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-client
spec:
  type: LoadBalancer
  ports:
  - name: etcd-client
    port: 2379
    protocol: TCP
    targetPort: 2379
  selector:
    app: etcd
---
apiVersion: v1
kind: Service
metadata:
  name: etcd
spec:
  clusterIP: None
  ports:
  - port: 2379
    name: client
  - port: 2380
    name: peer
  selector:
    app: etcd
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: etcd
  labels:
    app: etcd
spec:
  serviceName: etcd
  replicas: 3
  template:
    metadata:
      name: etcd
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:latest
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        volumeMounts:
        - name: data
          mountPath: /var/run/etcd
        command:
        - /bin/sh
        - -c
        - |
          PEERS="etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380"
          exec etcd --name ${HOSTNAME} \
            --listen-peer-urls http://0.0.0.0:2380 \
            --listen-client-urls http://0.0.0.0:2379 \
            --advertise-client-urls http://${HOSTNAME}.etcd:2379 \
            --initial-advertise-peer-urls http://${HOSTNAME}:2380 \
            --initial-cluster-token etcd-cluster-1 \
            --initial-cluster ${PEERS} \
            --initial-cluster-state new \
            --data-dir /var/run/etcd/default.etcd
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: ssd
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

Enter the following command to apply YAML:
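kubectl apply -f etcd-sts.yaml

(This assumes the manifest above was saved as etcd-sts.yaml, matching the comment on its first line.)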

After applying the YAML file, we can view the resources in the different tabs provided by Rancher:

There are two main ways to interact with Etcd: using the etcdctl command-line tool or directly through the RESTful API. We will briefly introduce both; you can find more in-depth information and examples in the official documentation.

Etcdctl is the command-line interface for interacting with an Etcd server. It can be used to perform a variety of operations, such as setting, updating, or removing keys, verifying cluster health, adding or removing Etcd nodes, and generating database snapshots. By default, etcdctl communicates with the Etcd server using the v2 API for backward compatibility; if you want etcdctl to use the v3 API, set the version to 3 via the ETCDCTL_API environment variable.

As for the API, every request sent to the Etcd server is a gRPC remote procedure call. A gRPC gateway provides a RESTful proxy that translates HTTP/JSON requests into gRPC messages.

Let's find the external IP required for the API call:
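kubectl get svc etcd-client

The address we need appears in the EXTERNAL-IP column of the output.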

We also need the names of the three pods so that we can run etcdctl commands:
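kubectl get pods -l app=etcd

(The app=etcd label comes from the StatefulSet template, so this should list etcd-0, etcd-1, and etcd-2.)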

Now let's check the Etcd version. We can use the API or the CLI (v2 and v3); the output differs slightly depending on the method you choose.

Use this command to contact the API directly:
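curl http://<EXTERNAL-IP>:2379/version

(Replace <EXTERNAL-IP> with the address of the etcd-client service found above.)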

To check the etcdctl client with API version v2, enter:
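kubectl exec -it etcd-0 -- etcdctl --version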

To check the etcdctl client with API version v3, enter:
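kubectl exec -it etcd-0 -- sh -c 'ETCDCTL_API=3 etcdctl version'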

Next, let's list the cluster members, using the same approaches as above. Contacting the API directly:
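curl http://<EXTERNAL-IP>:2379/v2/members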

{"members": [{"id": "2e80f96756a54ca9", "name": "etcd-0", "peerURLs": ["http://etcd-0.etcd:2380"],"clientURLs":["http://etcd-0.etcd:2379"]},{"id":"7fd61f3f79d97779","name":"etcd-1","peerURLs":["http://etcd-1.etcd:2380"],"clientURLs":["http://etcd-1.etcd:2379"]}," {"id": "b429c86e3cd4e077", "name": "etcd-2", "peerURLs": ["http://etcd-2.etcd:2380"],"clientURLs":["http://etcd-2.etcd:2379"]}]}"

V2 version of etcdctl:
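kubectl exec -it etcd-0 -- etcdctl member list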

V3 version of etcdctl:
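kubectl exec -it etcd-0 -- sh -c 'ETCDCTL_API=3 etcdctl member list'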

Set and retrieve values in Etcd

In the final example, we will create a key and check its value on all three pods in the Etcd cluster. Then we will kill the leader, which in our scenario is etcd-0, and watch how a new leader is elected. Finally, after the cluster has recovered, we will verify the value of the previously created key on all members. We will see that there is no data loss; the cluster simply has a new leader.

We can verify that the cluster is initially healthy by entering the following command:
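kubectl exec -it etcd-0 -- etcdctl cluster-health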

Next, verify the current leader. The last field of the output indicates that etcd-0 is the leader in our cluster:
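kubectl exec -it etcd-0 -- etcdctl member list

(With the v2 client, each member line ends in an isLeader field; here etcd-0 shows isLeader=true.)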

Using the API, we will create a key named message and assign it a value. Remember to replace the IP address in the following command with the address you obtained from your cluster:
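curl http://<EXTERNAL-IP>:2379/v2/keys/message -XPUT -d value="Hello"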

No matter which member we query, the key has the same value. This verifies that the value has been replicated to the other nodes and committed to the log.
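For example:

kubectl exec -it etcd-0 -- etcdctl get /message
kubectl exec -it etcd-1 -- etcdctl get /message
kubectl exec -it etcd-2 -- etcdctl get /message

Each command should print the same value.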

Next, let's demonstrate high availability and recovery by killing the Etcd cluster's leader, so we can see how a new leader is elected and how the cluster recovers from a degraded state. Delete the pod associated with the leader found above:
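kubectl delete pod etcd-0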

Let's check the health of the cluster:
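Since etcd-0 is down, we query one of the surviving members:

kubectl exec -it etcd-1 -- etcdctl cluster-health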

failed to check the health of member 2e80f96756a54ca9 on http://etcd-0.etcd:2379: Get http://etcd-0.etcd:2379/health: dial tcp: lookup etcd-0.etcd on 10.15.240.10:53: no such host
member 2e80f96756a54ca9 is unreachable: [http://etcd-0.etcd:2379] are all unreachable
member 7fd61f3f79d97779 is healthy: got healthy result from http://etcd-1.etcd:2379
member b429c86e3cd4e077 is healthy: got healthy result from http://etcd-2.etcd:2379
cluster is degraded

The command terminated with exit code 5, indicating that the cluster is in a degraded state due to the loss of the leader node.

Once Kubernetes responds to the deleted pod by launching a new instance, the Etcd cluster should recover:
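kubectl get pods -l app=etcd

Once the replacement etcd-0 pod is Running, etcdctl cluster-health should report that the cluster is healthy again.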

Enter the following command and we can see that a new leader has been elected:
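kubectl exec -it etcd-1 -- etcdctl member list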

In our example, the etcd-1 node was elected leader.

If we check the value of the message key again, we will find that there is no data loss:
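kubectl exec -it etcd-1 -- etcdctl get /message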

Conclusion

Etcd is a powerful, highly available, and reliable distributed key-value store designed for specific use cases. Common examples include storing database connection details, cache settings, feature flags, and so on. It is designed to be sequentially consistent, so every event is stored in the same order throughout the cluster.

We learned how to set up and run an Etcd cluster in Kubernetes with the help of Rancher, and then worked with some basic Etcd commands. To understand the project more deeply, such as how keys are organized, how to set TTLs on keys, or how to back up all data, the official Etcd repository is a good reference:

https://github.com/etcd-io/etcd/tree/master/Documentation
