This article is a tutorial on building an Elasticsearch cluster on Kubernetes. I find it very practical, so I am sharing it here; the following walks through building the Elasticsearch cluster step by step.
Elasticsearch is a real-time, distributed, scalable search engine that allows full-text, structured search, typically used to index and search large amounts of log data, but also to search many different types of documents.
Elasticsearch is usually deployed with Kibana, a powerful Elasticsearch data visualization Dashboard that allows you to browse Elasticsearch log data through a web interface.
Fluentd is a popular open source data collector. We will install Fluentd on the Kubernetes cluster nodes to tail the container log files, filter and transform the log data, and then forward it to the Elasticsearch cluster, where it will be indexed and stored.
We will first configure and start a scalable Elasticsearch cluster, then create a Kibana application in the Kubernetes cluster, and finally run Fluentd as a DaemonSet so that one Pod runs on every Kubernetes worker node.
1. Create an Elasticsearch cluster
Before creating the Elasticsearch cluster, we create a namespace.
Create a kube-efk.yaml
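A minimal sketch of kube-efk.yaml; the namespace name logging is an assumption, inferred from the elasticsearch.logging.svc.cluster.local domain used later in this article:

```yaml
# kube-efk.yaml -- minimal sketch; the namespace name "logging" is assumed
apiVersion: v1
kind: Namespace
metadata:
  name: logging
```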
kubectl apply -f kube-efk.yaml
Then run kubectl get ns to check that the namespace for the EFK stack now exists.
Here we use three Elasticsearch Pods to avoid the "split brain" problem that can occur in a highly available multi-node cluster: when one or more nodes cannot communicate with the others, several nodes may each be elected master.
A key point is the parameter discovery.zen.minimum_master_nodes=N/2+1, where N is the number of master-eligible nodes in the Elasticsearch cluster. We have 3 nodes here, so this parameter should be set to 2. This way, if one node is temporarily disconnected from the cluster, the other two can still elect a new master and the cluster can keep running while the lost node tries to rejoin. Be sure to revisit this parameter whenever you scale the Elasticsearch cluster.
First create a headless service called elasticsearch and create a new file elasticsearch-svc.yaml with the following contents:
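A minimal sketch of the headless Service, reconstructed from the description below (the logging namespace is an assumption):

```yaml
# elasticsearch-svc.yaml -- minimal sketch of the headless Service
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: logging          # assumed namespace
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None             # headless: DNS returns A records for the matching Pods
  ports:
    - port: 9200
      name: rest              # REST API
    - port: 9300
      name: inter-node        # node-to-node communication
```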
We define a Service named elasticsearch with the label app=elasticsearch. When we associate the Elasticsearch StatefulSet with this Service, the Service returns DNS A records for the Elasticsearch Pods carrying the label app=elasticsearch. Setting clusterIP: None makes the Service headless. Finally, we define ports 9200 and 9300, used for the REST API and for inter-node communication respectively.
And then we create this headless service
kubectl apply -f elasticsearch-svc.yaml
Now that we have a headless service and a stable domain, elasticsearch.logging.svc.cluster.local, for our Pods, let's create the actual Elasticsearch Pods with a StatefulSet.
A Kubernetes StatefulSet gives each Pod a stable identity and persistent storage. Elasticsearch needs stable storage so that a Pod's data survives rescheduling or restarts, which is why we manage the Pods with a StatefulSet.
We use a StorageClass object named es-data-db, so we need to create this object in advance. We use NFS as the storage backend here, so we need to install a corresponding provisioner driver.
Let's start by creating elasticsearch-storageclass.yaml
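A sketch of the StorageClass; the provisioner value is hypothetical and must match the NFS provisioner actually installed in your cluster:

```yaml
# elasticsearch-storageclass.yaml -- sketch
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-data-db
provisioner: example.com/nfs   # hypothetical; replace with your NFS provisioner's name
```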
Then we create a PVC that uses this StorageClass
elasticsearch-pvc.yaml
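A sketch of one such PVC; the claim name and size are assumptions, and if the StatefulSet below uses volumeClaimTemplates with a dynamic provisioner, the claims are created automatically and this file is only needed when provisioning them by hand:

```yaml
# elasticsearch-pvc.yaml -- sketch; one claim per Elasticsearch Pod
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-es-cluster-0      # hypothetical name
  namespace: logging           # assumed namespace
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: es-data-db
  resources:
    requests:
      storage: 10Gi            # assumed size
```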
Finally, we create this statefulset.
elasticsearch-statefulset.yaml
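A sketch of the StatefulSet, reconstructed from the details mentioned in this article (cluster name k8s-logs, Pods es-cluster-0/1/2, ports 9200/9300, discovery.zen.minimum_master_nodes=2); the image version, heap size and storage size are assumptions, and init containers for vm.max_map_count, ulimits and data-directory permissions are omitted for brevity:

```yaml
# elasticsearch-statefulset.yaml -- sketch, not the original manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: logging
spec:
  serviceName: elasticsearch       # the headless Service created above
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3   # assumed version
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: inter-node
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.zen.ping.unicast.hosts
          value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
        - name: discovery.zen.minimum_master_nodes
          value: "2"               # N/2+1 with N=3 master-eligible nodes
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"   # assumed heap size
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: es-data-db
      resources:
        requests:
          storage: 10Gi            # assumed size
```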
Then we create it using kubectl
kubectl apply -f elasticsearch-storageclass.yaml
kubectl apply -f elasticsearch-pvc.yaml
kubectl apply -f elasticsearch-statefulset.yaml
Then we check that the Pods come up properly.
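For example (namespace and label assumed to match the sketches above):

```bash
kubectl get pods -n logging -l app=elasticsearch
```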
Once the Pods are deployed, we can check whether the Elasticsearch cluster is working properly by requesting a REST API. Forward local port 9200 to the port corresponding to the Elasticsearch node (e.g. es-cluster-0) using the following command:
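A sketch of the port-forward command, assuming the logging namespace:

```bash
kubectl port-forward es-cluster-0 9200:9200 -n logging
```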
Then we open another window.
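In that window we can query the cluster state over the forwarded port, for example:

```bash
# lists the cluster name, the nodes and the elected master
curl "http://localhost:9200/_cluster/state?pretty"
```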
Normally, the response shows that our Elasticsearch cluster, named k8s-logs, has successfully created 3 nodes, es-cluster-0, es-cluster-1 and es-cluster-2, and that the current master node is es-cluster-0.
2. Creating Kibana Services
The Elasticsearch cluster has started successfully. Next we can deploy the Kibana service; create a new file named kibana.yaml.
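A sketch of what kibana.yaml might contain, reconstructed from the description below; the image version and the logging namespace are assumptions:

```yaml
# kibana.yaml -- sketch; a NodePort Service plus a single-replica Deployment
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  type: NodePort               # exposed for testing; the assigned port shows up in `kubectl get svc`
  ports:
  - port: 5601
  selector:
    app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana-oss:6.4.3   # assumed version
        env:
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch:9200   # the headless Service, resolved by cluster DNS
        ports:
        - containerPort: 5601
```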
Above we define two resource objects, a Service and a Deployment. For easier testing we set the Service type to NodePort. The Kibana Pod configuration is fairly simple; the only thing to note is that we use the ELASTICSEARCH_URL environment variable to set the endpoint and port of the Elasticsearch cluster. We can simply use Kubernetes DNS here: the endpoint's service name is elasticsearch, and because it is a headless service, the domain resolves to the list of IP addresses of the 3 Elasticsearch Pods.
And then we create this service.
kubectl apply -f kibana.yaml
After a while, the Kibana service is up.
If the Pod is Running, the application has been deployed successfully. You can then access Kibana through the NodePort: open http://<NodeIP>:30245 in your browser, and if you see the welcome page, Kibana has been successfully deployed to the Kubernetes cluster.
3. Deploy Fluentd
Fluentd is an efficient log aggregator, written in Ruby and easily extensible. For most enterprises Fluentd is efficient enough and consumes relatively few resources. Fluent Bit is even lighter and uses fewer resources, but its plugin ecosystem is not as rich as Fluentd's, so overall Fluentd is more mature and more widely used. That is why we also use Fluentd as the log collection tool here.
Working principle
Fluentd grabs log data from a given set of data sources, processes it (converts it into a structured data format), and forwards it to other services, such as Elasticsearch, object storage, and so on. Fluentd supports over 300 log storage and analytics services, so it's very flexible in this regard. The main operation steps are as follows:
First Fluentd gets data from multiple log sources
Structuring and labeling this data
Data is then sent to multiple target services based on matching tags
Log Source Configuration
For example, in order to collect all container logs on Kubernetes nodes, we need to configure the log source as follows:
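A sketch of such a source block, using Fluentd's tail input plugin; the paths are the conventional kubelet/Docker log locations and may differ on your nodes:

```
<source>
  @id fluentd-containers.log
  @type tail                          # tail the container log files
  path /var/log/containers/*.log      # symlinks to the Docker container logs
  pos_file /var/log/es-containers.log.pos
  tag kubernetes.*                    # tag used for routing below
  read_from_head true
  <parse>
    @type json                        # the docker json-file driver writes one JSON object per line
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
```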
Routing configuration
That covers the log source configuration; now let's see how to send the log data on to Elasticsearch:
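A sketch of the corresponding match block; the host, port and buffer settings are assumptions, and the individual options are explained below:

```
<match **>
  @id elasticsearch
  @type elasticsearch                 # Elasticsearch output plugin
  @log_level info
  host elasticsearch                  # the headless Service, no authentication
  port 9200
  logstash_format true                # write logstash-* style indices
  <buffer>
    @type file                        # buffer to disk when the target is unavailable
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_interval 5s                 # assumed flush settings
    retry_max_interval 30
  </buffer>
</match>
```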
match: identifies the target; the tag pattern that follows is matched against the tags of the log sources. We want to capture all logs and send them to Elasticsearch, so we configure it as **.
id: A unique identifier of the target.
type: the identifier of the output plugin to use. We want to output to Elasticsearch here, so we configure it as elasticsearch, which uses Fluentd's elasticsearch output plugin.
log_level: the minimum log level to capture. We configure it as info here, meaning any logs at or above that level (INFO, WARNING, ERROR) are routed to Elasticsearch.
host/port: define the address of Elasticsearch; authentication information can also be configured here. Our Elasticsearch does not require authentication, so we only specify the host and port.
logstash_format: the Elasticsearch service searches the log data by building an inverted index. With logstash_format set to true, Fluentd forwards the structured log data in logstash format (index names such as logstash-*).
Buffer: Fluentd allows caching when the target is unavailable, for example, if there is a network failure or Elasticsearch is unavailable. Buffer configuration also helps reduce IO to disk.
4. Installation
To collect the logs of the Kubernetes cluster, we deploy the Fluentd application with a DaemonSet controller so that it can collect logs from the Kubernetes nodes, ensuring that a Fluentd container is always running on every node in the cluster. You can of course install it with Helm in one click, but to understand the implementation details better we install it manually here.
First, we specify the Fluentd configuration file through the ConfigMap object, and create a new fluentd-configmap.yaml file with the following contents:
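The full ConfigMap is fairly long; a trimmed sketch, combining the source and match blocks shown earlier with the commonly used kubernetes_metadata filter (the ConfigMap name and the config key are assumptions):

```yaml
# fluentd-configmap.yaml -- trimmed sketch, not the complete original file
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluentd.conf: |
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_key time
      </parse>
    </source>
    # enrich records with Pod and namespace metadata (needs the kubernetes_metadata plugin)
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      host elasticsearch
      port 9200
      logstash_format true
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_interval 5s
      </buffer>
    </match>
```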
In the configuration file above, we configure collection of the Docker container log directory as well as the docker and kubelet application logs. The collected data is processed and sent to the elasticsearch service on port 9200.
Then create a new file called fluentd-daemonset.yaml. We mount the ConfigMap object created above into the Fluentd container through a volume. In addition, to control flexibly which nodes' logs are collected, we also add a nodeSelector attribute. And since our cluster was built with kubeadm, the master node is tainted by default, so if you also want to collect the master node's logs, you need to add a matching toleration. All of these pieces appear in the sketch below.
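A sketch of the DaemonSet covering the ConfigMap volume, the nodeSelector and the master toleration; the image, the node label and the config mount path are assumptions, and the RBAC objects (ServiceAccount/ClusterRole) needed by the kubernetes_metadata filter are omitted for brevity:

```yaml
# fluentd-daemonset.yaml -- sketch
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: logging
  labels:
    app: fluentd-es
spec:
  selector:
    matchLabels:
      app: fluentd-es
  template:
    metadata:
      labels:
        app: fluentd-es
    spec:
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"    # assumed label; only labelled nodes are collected
      tolerations:
      - key: node-role.kubernetes.io/master            # allow scheduling onto kubeadm master nodes
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd-es
        image: quay.io/fluentd_elasticsearch/fluentd:v2.4.0   # assumed image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d              # assumed config path for this image
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config                         # the ConfigMap sketched above
```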
Then we create the ConfigMap object and the DaemonSet described above:
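For example:

```bash
kubectl apply -f fluentd-configmap.yaml
kubectl apply -f fluentd-daemonset.yaml
kubectl get pods -n logging -l app=fluentd-es    # the DaemonSet Pods should be Running
```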
We can see that the pod is working properly.
Then we go to the Kibana page and click Discover.
Here you can configure the Elasticsearch index pattern we need. In the Fluentd configuration above, the logs we collect use the logstash format, so you only need to enter logstash-* in the text box to match all of the log data in the Elasticsearch cluster, then click Next to go to the following page.
On this page, configure which field to use for filtering log data by time: select the @timestamp field in the drop-down list, and then click Create index pattern. Once the index pattern is created, click Discover in the left navigation menu and you will see histograms and the recently collected log data.
This tutorial on building an Elasticsearch cluster in Kubernetes ends here. I hope the content above helps you learn more, and if you found the article useful, feel free to share it so that more people can see it.