
How to understand shared memory between Pods in Kubernetes


How should we understand shared memory between Pods in Kubernetes? This article analyzes the problem in detail and walks through a solution, in the hope of helping readers facing the same issue find a simpler, more practical approach.

In the pursuit of performance, some common infrastructure components are so tightly coupled with the business that they get baked into the base image. As a result, the business container starts with a large number of processes inside it, which makes Kubernetes's management of the Pod risky. To shrink the business container and make the infrastructure components easier to manage independently, these components are stripped out of the business image and deployed as DaemonSets. However, some component Agents communicate with the business Pods through shared memory, so a cross-Pod shared memory scheme on the same Node is the first problem to solve.

I. Why deploy the common infrastructure component Agents as a DaemonSet

Self-developed common infrastructure components, such as service routing and security components, are usually deployed on each Node as a host process that serves all the businesses on that Node. After moving to microservices and containers, the number of service instances grows into the hundreds and thousands. If we keep a Per Pod Per Agent model, deploying the Agent as a sidecar or packaging it into the business image, the load on the components' server side may also grow by a factor of hundreds, which is very risky. Therefore, we want to deploy the Agents of these components as DaemonSets.

Given how Kubernetes is commonly used today, let's first look at the problems that arise if these infrastructure components are not stripped out of the business Pod:

The business container holds many processes. When we request resources for the Pod (cpu/mem request and limit), we must account not only for the business application itself but also for these infrastructure components. And once an Agent has a bug, such as a memory leak, the whole Pod gets dragged in and rebuilt, and the cgroup OOM killer may even kill the business process along the way.

It violates the Kubernetes and microservice deployment best practice of Per Process Per Container, where the business process runs in the foreground so that it lives and dies with the container; otherwise Kubernetes cannot derive the container state from the business process state and perform high-availability management accordingly.

If a Node runs 10 Pods, it now runs 10 copies of each infrastructure component. Before containerization, a Node only needed one process per component; after containerization, the number of component Agents in the cluster grows dozens of times, and even more once the business is split into microservices. It is unclear whether the server side of these components can withstand traffic that is dozens or hundreds of times higher than before.

Upgrading an infrastructure component Agent fleet-wide can drive you crazy: you have to rebuild all the business images and then grayscale-upgrade every business across the fleet, rebuilding business Pods just because of an Agent upgrade. You may argue that these Agents have their own hot-upgrade scheme we could rely on, but that introduces plenty of trouble: Kubernetes cannot perceive a hot upgrade of the Agents, which leads to inconsistent state in the cluster and effectively brings us back to how virtual machines or physical machines were operated. We could of course try to meet this need with an Operator, but the cost is too high, and it is not cloud native.

All of the above problems can be solved by splitting the infrastructure component Agents out of the business Pod; the benefits of architectural decoupling go without saying. We can then also manage these Agents through Kubernetes and enjoy self-healing, rolling upgrades, and so on.

II. Linux shared memory mechanisms

However, the ideal is beautiful and the reality is harsh. The first problem to solve is that some component Agents communicate with the business Pods through shared memory, which runs counter to Kubernetes and microservice best practices.

As we all know, containers within a single Kubernetes Pod share the same IPC namespace, and they can also share memory by mounting the same EmptyDir Volume with medium set to Memory.
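For reference, here is a minimal sketch of intra-Pod sharing through a Memory-medium EmptyDir; the Pod, container, and image names are hypothetical and not from the original setup:

apiVersion: v1
kind: Pod
metadata:
  name: shm-demo                 # hypothetical name
spec:
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /dev/shm        # both containers see the same tmpfs
      name: shm
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /dev/shm
      name: shm
  volumes:
  - name: shm
    emptyDir:
      medium: Memory             # RAM-backed tmpfs, shared only within this Pod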

First, let's take a look at two mechanisms of Linux shared memory:

POSIX shared memory (shm_open(), shm_unlink())

System V shared memory (shmget(), shmat(), shmdt())

Among them, System V shared memory has a long history and is available on UNIX systems in general, while the POSIX shared memory interface is more convenient to use and is generally combined with memory mapping (mmap).
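As a quick illustration (not from the original article), both families can be observed from a shell on a typical Linux host:

$ ipcmk -M 65536        # create a System V shared memory segment of 64 KiB
$ ipcs -m               # list System V segments (key, shmid, owner, bytes, ...)
$ ipcrm -m <shmid>      # remove the segment using the id printed by ipcmk
$ ls -l /dev/shm        # POSIX objects created via shm_open() appear here as files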

The main differences between mmap and System V shared memory are:

System V shm is persistent: unless explicitly deleted by a process, it remains in memory until the system shuts down.

An mmap mapping is not persistent: when the process exits, the mapping disappears unless it was backed by a file in advance.

/dev/shm is the default tmpfs mount point used for POSIX shared memory under Linux.

POSIX shared memory is implemented on top of tmpfs. Going further, not only POSIX shared memory but also System V shared memory is implemented in the kernel on top of tmpfs.

From this we can see that tmpfs serves two main purposes:

Backing System V shared memory and anonymous memory mappings; this part is managed by the kernel and is invisible to users.

Backing POSIX shared memory, for which the user is responsible for mounting it, usually at /dev/shm; this depends on CONFIG_TMPFS.

Although both System V and POSIX shared memory are implemented through tmpfs, they are subject to different limits: /proc/sys/kernel/shmmax only affects System V shared memory, and the size of /dev/shm only affects POSIX shared memory. In fact, System V and POSIX shared memory use two different tmpfs instances.

The memory available to System V shared memory is limited only by /proc/sys/kernel/shmmax, while the user-mounted /dev/shm defaults to half of physical memory.

To sum up:

In the kernel, both POSIX and System V shared memory are implemented through tmpfs, but they correspond to two different tmpfs instances and are independent of each other.

/proc/sys/kernel/shmmax limits the maximum size of System V shared memory, while the size of the /dev/shm mount limits the total amount of POSIX shared memory, as the commands sketched below illustrate.
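For reference, a few standard commands (not from the original article) for inspecting and adjusting these limits on a Linux host:

$ cat /proc/sys/kernel/shmmax          # System V: max size of a single segment, in bytes
$ ipcs -lm                             # System V: current shared memory limits
$ df -h /dev/shm                       # POSIX: size of the tmpfs backing /dev/shm
$ mount -o remount,size=2G /dev/shm    # POSIX: resize the /dev/shm tmpfs (needs root)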

III. Cross-Pod shared memory scheme on the same Node

Once the infrastructure component Agents are deployed as DaemonSets, the Agents and the business workloads live in different Pods on the same Node, so how can Kubernetes support these two shared memory mechanisms across Pods? As the DaemonSet sample below shows, the approach is to mount the host's /dev/shm into the containers through a HostPath Volume (for POSIX shared memory) and to enable hostIPC (for System V shared memory).

Of course, this sacrifices some security, but there was no IPC isolation before containerization either, so it is acceptable.
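To make the scheme concrete, here is a minimal sketch of the business-Pod side under this approach; the names and image are hypothetical, and only the hostIPC/hostPath parts mirror the DaemonSet sample below:

apiVersion: v1
kind: Pod
metadata:
  name: demo-business          # hypothetical name
spec:
  hostIPC: true                # share the host IPC namespace, for System V shared memory
  containers:
  - name: business
    image: demo_business:1.0   # hypothetical image
    volumeMounts:
    - mountPath: /dev/shm      # same host tmpfs as the Agent, for POSIX shared memory
      name: shm
  volumes:
  - name: shm
    hostPath:
      path: /dev/shm
      type: Directory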

IV. Grayscale rollout

For the existing workloads in the cluster, the Agents and the business are still packaged in the same Docker image, so a grayscale rollout plan is needed to make sure the existing workloads are not affected.

First, create the corresponding Kubernetes ClusterRole, ServiceAccount, ClusterRoleBinding, and PodSecurityPolicy (PSP) objects. For more information about PSP, refer to the official pod-security-policy documentation.
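The article does not show the PSP itself; a minimal sketch of what it would need to allow for this scheme (the object name is hypothetical) might look like:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: agents-daemonset       # hypothetical name
spec:
  hostIPC: true                # required for System V shared memory with the host
  hostNetwork: true            # the Agent DaemonSet below uses hostNetwork
  allowedHostPaths:
  - pathPrefix: "/dev/shm"     # only allow the shared memory mount
  volumes:
  - hostPath
  - emptyDir
  - secret
  - configMap
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny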

Select a subset of the Nodes in the cluster and apply the Label (AgentsDaemonSet:YES) and Taint (AgentsDaemonSet=YES:NoSchedule) to them:

$ kubectl label node $nodeName AgentsDaemonSet=YES
$ kubectl taint node $nodeName AgentsDaemonSet=YES:NoSchedule


Deploy the DaemonSet for the Agent (note that the DaemonSet needs the corresponding nodeSelector, toleration, and critical-pod annotation). A sample follows:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo-agent
  namespace: kube-system
  labels:
    k8s-app: demo-agent
spec:
  selector:
    matchLabels:
      name: demo-agent
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: demo-agent
    spec:
      tolerations:
      - key: "AgentsDaemonSet"
        operator: "Equal"
        value: "YES"
        effect: "NoSchedule"
      hostNetwork: true
      hostIPC: true
      nodeSelector:
        AgentsDaemonSet: "YES"
      containers:
      - name: demo-agent
        image: demo_agent:1.0
        volumeMounts:
        - mountPath: /dev/shm
          name: shm
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: shm
        hostPath:
          path: /dev/shm
          type: Directory

Deploy business Pods that no longer bundle the infrastructure component Agent onto these Nodes, and check whether all the infrastructure components and businesses work properly. If they do, label and taint the remaining Nodes batch by batch with Label (AgentsDaemonSet:YES) and Taint (AgentsDaemonSet=YES:NoSchedule); the DaemonSet Controller will automatically create the Agent Pods on those Nodes. In this way, the grayscale rollout of the infrastructure component Agents is completed batch by batch across the cluster, as sketched below.
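One possible way to script a single batch (the node names are placeholders, not from the original article):

# label and taint one batch of Nodes, then check that the Agent DaemonSet lands on them
for nodeName in node-11 node-12 node-13; do    # placeholder node names
  kubectl label node "$nodeName" AgentsDaemonSet=YES
  kubectl taint node "$nodeName" AgentsDaemonSet=YES:NoSchedule
done
# verify the Agent Pods are Running on the newly labeled Nodes
kubectl -n kube-system get pods -l name=demo-agent -o wide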

Under high-concurrency workloads, especially with infrastructure components implemented in C/C++, shared memory is often used for communication in pursuit of high performance. This article puts forward a compromise scheme for POSIX/System V shared memory between Pods on the same Node at the expense of some security; please be aware of that trade-off. Of course, if the server side of the infrastructure service is under no pressure after the microservice/containerization transformation, it is recommended to deploy the service's Agent as a sidecar container in the same Pod as the business container, and to share memory through the Pod's shared IPC namespace and a Memory-medium EmptyDir Volume.

This is the answer to the question of how to understand shared memory between Pods in Kubernetes. I hope the content above is of some help to you. If you still have questions, you can follow the industry information channel for more related knowledge.
