How to implement an ESaaS Architecture in Kubernetes


Many newcomers are not clear about how to implement an ESaaS architecture on Kubernetes. To help solve this problem, the following explains it in detail; readers with this need are welcome to learn from it, and hopefully you will gain something.

In an ES cluster, what should we do after a data/master/client node dies?

PS: in my environment, all data nodes also act as client nodes, and clients are not deployed independently.

If client nodes are deployed independently, a client that dies can simply be recreated; just be careful to clean up the old /data directory.

The master's /data directory stores cluster metadata. When a master goes down, a new one can be recreated; the new master node may or may not reuse the old master's /data data, so no scheduling intervention is needed. However, it is recommended to clean up the /data directory.

After a data node dies, it must not be restarted with the original data; a new, blank data node has to be recreated, and scheduling intervention is needed to ensure that the new data node does not use the old data node's /data data. In other words, when the old data node dies, its /data directory must be cleaned up, and the new data node cannot reuse the old data. To guard against the cleanup failing, it is recommended to schedule the new node onto a different server.

Do both client and master have persistent data in the /data directory?

The client's /data directory also holds metadata; the client acts as a "smart router" that accepts and forwards cluster requests.

The master's /data directory also holds cluster metadata, and it must not share a /data directory with a data node.

How to guarantee HA of data nodes? Spread them across servers and across racks.

First, label each server with its rack (esaas.hosts.rack=${rack_id}).

Then constrain Pod scheduling through Pod anti-affinity:

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: es-cluster-name
        operator: In
        values:
        - ${es-cluster-name}
    topologyKey: esaas.hosts.rack

Note that when using pod anti-affinity of type requiredDuringSchedulingIgnoredDuringExecution, the k8s admission controller LimitPodHardAntiAffinityTopology must be disabled; otherwise only the default topologyKey kubernetes.io/hostname is allowed instead of a custom topologyKey.

Cross-rack deployment as above is used in production; in the development and test environments, cross-server deployment is enough, using topologyKey: kubernetes.io/hostname.

How do I set the kernel parameter vm.max_map_count for the ES container?

Set it through an init container, which must run with privileged: true.

All node types (client, master, data) use the same vm configuration by default.

initContainers:
- name: init-sysctl
  image: busybox
  imagePullPolicy: IfNotPresent
  command: ["sysctl", "-w", "vm.max_map_count=262144"]
  securityContext:
    privileged: true

How do I disable swap for ES?

Method 1: disable swap on the physical server.

Method 2: set the configuration item bootstrap.mlockall: true on each ES node, which requires the es container to add the two Linux capabilities IPC_LOCK and SYS_RESOURCE.

securityContext:
  privileged: false
  capabilities:
    add:
    - IPC_LOCK
    - SYS_RESOURCE
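For reference, the matching elasticsearch.yml entry is the single setting below (bootstrap.mlockall is the ES 2.x name mentioned in the text; later versions renamed it to bootstrap.memory_lock):

bootstrap.mlockall: true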

The ES configuration item minimum_master_nodes is also injected through an env variable, and elasticsearch.yml is then regenerated by the container startup script run.sh.

Inject the environment variable NUMBER_OF_MASTERS into the Pod.

If the cluster is scaled up/down, this environment variable may need to be updated dynamically.

Personally, I think the minimum_master_nodes configuration item of the ES cluster should not be set manually; the ES cluster will adjust dynamically according to the number of master nodes when an election is triggered.
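A minimal sketch of the env injection described above, assuming the image's run.sh reads NUMBER_OF_MASTERS and writes it into elasticsearch.yml as discovery.zen.minimum_master_nodes (the value below is only an example for a three-master cluster):

env:
- name: NUMBER_OF_MASTERS
  value: "2"   # example quorum for 3 master nodes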

Increase the file descriptor limit; the recommended value is 212644 (64K).

Modifying /etc/security/limits.conf inside the container also requires privileged mode or the corresponding Linux capability.

ES data nodes are deployed through a Kubernetes StatefulSet. Each es data Pod creates a corresponding PV through volumeClaimTemplates and mounts /data/${es_cluster_name} on the host to the /data directory inside the es data container. When a container drifts to another node or is recreated, the stale es data left on the old server needs to be cleaned up.
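A hedged sketch of the volumeClaimTemplates part of such a StatefulSet; the storage class name and size are assumptions for illustration, not values from the text:

volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: local-hostpath   # assumed name for the hostPath-backed PVs
    resources:
      requests:
        storage: 100Gi                 # assumed size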

HostPath PVs support reclaim policies such as Recycle (currently only HostPath and NFS support Recycle).

Recycle -> basic scrub (rm -rf /thevolume/*)
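A hedged sketch of such a HostPath PV with the Recycle reclaim policy; the name, capacity, and host path are illustrative:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-data-pv-0              # illustrative name
spec:
  capacity:
    storage: 100Gi                # illustrative size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: /data/es-cluster-1      # illustrative host path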

How are scale down and scale up performed?

Pre-checks before scaling down:

The monitoring status of the ES cluster is green.

Make sure that after downsizing, max(index1.replicas, index2.replicas, …) + 1 < data-node-num.

Other checks.

Scale down/up of es-clients: scale directly following the HA scheme; no other operation is required.

Scale down/up of es-masters: after scaling according to the HA scheme, the ES API must be called to update minimum_master_nodes.

Scale down/up of es-datas: after scaling up according to the HA scheme, no other operation is required; when scaling down, the data directories on the corresponding hostPath need to be cleaned up. Scale down by only one es data node at a time, and perform the pre-checks above before operating.

What should be done when a physical server needs to go offline?

As the HA scheme above ensures, each server holds at most one client/master/data node of a given ES cluster, so taking a server offline brings down at most a single client/master/data node of that cluster and does not affect a properly sized ES cluster. In this case, run kubectl drain directly on the server to evict the Deployment/StatefulSet pods; new client/master/data nodes are then automatically recreated on other suitable servers, and the ES cluster keeps working throughout the process.

If a user deploys a single-node ES instance, following the above steps will inevitably lose the user's data and leave the ES service unavailable for a long time. Users must therefore be warned of this risk; the recommendation is to scale out first and kill the original es instance only after the new node has finished synchronizing the data.

If a server suddenly goes down, what is the automatic process that follows?

After the server goes down, since HA was taken into account when scheduling the deployment, normal use of the ES cluster is not affected.

Then, after about 5 minutes (set via pod-eviction-timeout), new client/master/data containers are recreated on other Nodes, so the original ES cluster size is maintained.
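For reference, pod-eviction-timeout is a kube-controller-manager flag (default 5m0s); a hedged sketch of where it could be set in the controller-manager static pod manifest, with all other flags omitted:

spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --pod-eviction-timeout=5m0s   # how long to wait before evicting pods from a down node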

What is the installation process for ES plugins?

An Elasticsearch plugin repository is provided inside the CaaS cluster to store common plugins for download.

When an ES cluster is first deployed, users can choose which plugins to install (site and jar types are supported); the plugin files are downloaded into the plugin-volume by an init container, and ES loads these plugins automatically when it starts.
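A hedged sketch of such a plugin-download init container; the repository URL, plugin file name, and use of busybox wget are illustrative assumptions, not details from the text:

initContainers:
- name: install-plugins
  image: busybox
  command: ["sh", "-c", "wget -O /plugins/analysis-ik.zip http://es-plugin-repo/analysis-ik.zip"]   # illustrative repo URL and plugin
  volumeMounts:
  - name: plugin-volume
    mountPath: /plugins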

If a plugin is installed while the ES cluster is already in use: for site-type plugins, call the Kubernetes exec API to download the corresponding site plugin files into the plugin directory; for jar-type plugins, copy the plugin into the corresponding plugin-volume directory first. Since the ES instance then needs to be restarted, restart the ES container by executing kubectl exec POD_NAME -c CONTAINER_NAME reboot or docker kill $containerName rather than recreating the Pod.

Since multiple ES instances cannot share the same plugin directory, each ES instance needs its own plugin-volume, mounted from a different hostPath on the host.

For ES management-type plugins, you need to specify which ES node the plugin is deployed on (a master node is recommended); afterwards, the plugin can only be accessed through the plugin API of that ES node.

For ES functional plugins (such as the ik analyzer), the plugin needs to be installed on all ES cluster nodes (client, master, data).

Before installing a jar plugin and restarting an es node, check that the health status of the es cluster is green.

When an es node is restarted, pay attention to the ES fault detection mechanism and the settings discovery.zen.fd.ping_interval (1s), ping_timeout (30s), and ping_retries (3): with the defaults, a node that cannot be pinged within about 90s is considered failed and shard migration starts. Keep an eye on these settings and increase them appropriately if necessary.
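For reference, the corresponding elasticsearch.yml settings, with the values cited in the text:

discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3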

After discussion, it was decided to support only jar-type plugins at first and to consider site-type plugins later (for example, storing site plugins locally through ESaaS and sharing them across all ES clusters).

How can the self-developed ES monitoring tool interface with the ES cluster?

The monitoring tool provides an API for dynamically adding ES cluster information: simply register the IP and port (9200) of any node (client/master/data) in the cluster with the monitoring tool. If there are client nodes, give it the client information; if there are only data nodes, give it port 9200 of a data node.

When ESaaS creates an ES cluster, there is a configuration option for whether the cluster is automatically added to the monitoring platform; if so, the "ADD_ES_CLUSTER" interface of the monitoring platform is called back after the ES cluster has been deployed.

The ESaaS page provides a jump link to the monitoring platform, and the monitoring platform handles access control for the monitoring information; when adding an ES cluster, the user's ID must be passed along.

How does the deployment of the Kibana service interface with the ES cluster?

When initializing the deployment of an ES cluster, users can check whether to automatically create the corresponding Kibana service.

A page entry is also provided for creating a standalone Kibana service, which requires the user to select the ES cluster it connects to.

The connection is made by injecting the environment variable ELASTICSEARCH_URL: http://es-client.namespace-1.svc.pro.es:9200
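A minimal sketch of how the Kibana container might receive this variable; the image tag and port are assumptions for illustration:

containers:
- name: kibana
  image: kibana:4.6                # assumed tag matching an ES 2.x cluster
  env:
  - name: ELASTICSEARCH_URL
    value: http://es-client.namespace-1.svc.pro.es:9200
  ports:
  - containerPort: 5601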

In the elasticsearch.yml of ES nodes, the discovery hosts only need to list the domain names or IPs of the master nodes; of course, configuring the full IP list (client, data, and master nodes) also causes no problems.

When an ES cluster is first created, make sure all master nodes have started successfully before creating the data nodes; the creation order of client nodes does not matter.

The ES node JVM heap supports 8g, 16g, and 32g. It must not exceed 32g, otherwise performance gets worse because of GC problems.
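A hedged sketch of passing the heap size through an environment variable; ES_HEAP_SIZE is the variable used by the ES 2.x images, and how the deployment injects it here is an assumption:

env:
- name: ES_HEAP_SIZE
  value: "16g"   # one of the supported sizes: 8g / 16g / 32g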

Logs of the ES cluster are collected and shipped to the EFK logging system. A web console for each container of the ES cluster is also provided, so that users can view (slow) logs and perform other operations in the web console.

ES zen discovery uses a domain name (the K8S Service name) to resolve the es master IP list; pay attention to hosts.resolve_timeout, which defaults to 5s.
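A hedged elasticsearch.yml sketch of this discovery configuration, assuming a master Service reachable at the domain below (the Service domain and the minimum_master_nodes value are illustrative):

discovery.zen.ping.unicast.hosts: ["es-master.namespace-1.svc.pro.es"]   # illustrative Service domain
discovery.zen.minimum_master_nodes: 2                                    # illustrative quorum for 3 masters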

ES 2.x, 5.x, and 6.x differ in how node roles are divided. So far only version 2.x has been tested; 5.x and 6.x will need appropriate adjustments.

Bootstrap checks

Startup error message: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Solution: add bootstrap.system_call_filter: false to elasticsearch.yml

Startup error: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Solution: edit /etc/sysctl.conf, add vm.max_map_count=262144, and then run sysctl -p

Startup error: max number of threads [1024] for user [push] is too low, increase to at least [2048]

Solution: edit /etc/security/limits.d/90-nproc.conf and add:

* soft nproc 2048

Startup error: max file descriptors [65535] for elasticsearch process likely too low, increase to at least [65536]

Solution: run ulimit -n 65536, or edit /etc/security/limits.conf and add:

* soft nofile 65536
* hard nofile 65536

Heap size check

File descriptor check

Memory lock check

Maximum number of threads check

Maximum size virtual memory check

Maximum map count check

Client JVM check

Use serial collector check

System call filter check

OnError and OnOutOfMemoryError checks

Early-access check

G1GC check

When users apply for an ES cluster on the ESaaS Portal, they specify not only the cpu and mem of each node but also the amount of local storage space needed. On Kubernetes 1.9, scheduling needs to take into account whether a server has enough storage space to satisfy the request.
