Case Analysis of Kubernetes Container isolation problem

2025-02-24 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article analyzes container isolation problems in Kubernetes: what a container sees in /proc, how UID/GID mapping affects permissions, and how LXCFS and user namespaces address them.

Background

The /proc pseudo file system seen inside the container is the host's /proc. Because /proc is not isolated, the container cannot obtain proc information scoped to its own processes, and applications that read proc information get wrong data. Affected files include /proc/meminfo, /proc/cpuinfo, /proc/stat, /proc/uptime, /proc/loadavg, and others.

User UIDs/GIDs are not remapped, so a process in the container has the permissions of the host user with the same uid/gid.
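To make the first problem concrete, here is a minimal Python sketch of how an application typically reads MemTotal from /proc/meminfo. The sample text and its 64 GiB figure are hypothetical; the point is only that, without /proc isolation, the value parsed is the host's total, whatever limit the container has.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Hypothetical sample: what a container on a 64 GiB host would read when
# /proc is not isolated, regardless of the container's own memory limit.
SAMPLE_HOST_MEMINFO = """\
MemTotal:       67108864 kB
MemFree:        12345678 kB
SwapTotal:             0 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_HOST_MEMINFO)
    print("MemTotal seen by the application:", info["MemTotal"], "kB")
```

On a real system the same function could be fed the contents of /proc/meminfo; runtimes that size heaps or thread pools from this value are the ones most affected.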

Requirements Analysis and Solution Survey

Similar issues have been discussed in the Docker community (https://github.com/docker/docker/issues/8427); they can be addressed with a kernel patch or by bind-mounting files over /proc. The Taobao team published a kernel patch a few years ago: https://github.com/alibaba/taobao-kernel/blob/master/patches.taobao/overlayfs-0005-vfs-introduce-clone_private_mount.patch

The main options discussed in the industry are:

Option 1: modify the proc file system directly
- https://lkml.org/lkml/2012/5/28/299
- mount -t proc -o meminfo-from-cgroup none /path/to/container/proc
- Disadvantage: it cannot be merged into the kernel.

Option 2: the procg scheme
- Mount a file system into the container to replace the original proc file system.
- Replace the original /proc/meminfo information with information read from the cgroup.
- https://github.com/fabiokung/procg/
- Disadvantage: the kernel's cgroup code exposes no function interface for reading memory data.

Option 3: upgrade scheme for the Docker container procps package, based on lxcfs
- Modify the source code of free, top, uptime, and similar commands.
- Disadvantage: it is not a widely accepted scheme, quite apart from the bugs and cost of modifying the commands yourself. Different Linux versions may require different patches.

Solution

LXCFS, the FUSE file system prepared for LXC, provides the following features:

* a cgroupfs-compatible view for unprivileged containers
* a set of cgroup-aware files:
  * cpuinfo
  * meminfo
  * stat
  * uptime

User space file system (Filesystem in Userspace, FUSE)

A user space file system is an operating-system concept: a file system implemented entirely in user mode.

Linux currently supports this through a kernel module. Some file systems, such as the GlusterFS native client and the zfs-fuse port of ZFS, are implemented using FUSE.

The figure above shows how FUSE works. Suppose a FUSE-based user-mode file system named hello is mounted on /tmp/fuse. When an application accesses a file under /tmp/fuse, it issues system calls through glibc; the VFS functions that handle those system calls invoke the FUSE file system in the kernel; the kernel FUSE file system forwards the request to the user-mode file system hello; hello processes the request and returns the result to the kernel FUSE file system; finally, the kernel FUSE file system returns the data to the application.

The Linux kernel has included the FUSE module since version 2.6.14.

Implement a file system in user space

libfuse: the user-space FUSE library, usable by non-privileged users.

LXCFS: a user-space file system based on FUSE

From the file system's point of view, LXCFS interacts with the kernel FUSE module by calling the libfuse library.

Two basic functions

Give each container its own view of the cgroup file system, similar to a cgroup namespace

Provide a virtual proc file system inside the container

LXCFS perspective

As you can see from the main function, the initialization process includes:

Mount a tmpfs file system on the runtime working directory /run/lxcfs/controllers/

Remount each cgroup subsystem of the current system under the /run/lxcfs/controllers/ directory

Call fuse_main(), the main function of the libfuse library, specifying /var/lib/lxcfs/ as the target directory of the user-mode file system

Use the ops methods of struct fuse_operations to interact with the FUSE module in the kernel (lxcfs.c:701)

Container perspective

Mount the virtual proc file system into the docker container

Users read /proc/meminfo, /proc/cpuinfo, and other information inside the container

Reading meminfo is implemented in the proc_meminfo_read operation

Process: get the pid of the process reading meminfo and pass it to lxcfs --> look up the cgroup of that pid --> read that process's cgroup subsystem information from the host's /cgroup directory
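The flow above can be sketched in Python. This is a simplified illustration, not the lxcfs implementation (which is written in C): it assumes the cgroup v1 format of /proc/<pid>/cgroup, the sample cgroup path is hypothetical, and the limit and usage values stand in for what would be read from memory.limit_in_bytes and memory.usage_in_bytes in the process's memory cgroup.

```python
def memory_cgroup_path(proc_cgroup_text):
    """Return the memory cgroup path from /proc/<pid>/cgroup text (cgroup v1), or None."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)  # "hierarchy-id:controllers:path"
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        if "memory" in controllers.split(","):
            return path
    return None


def render_meminfo(limit_bytes, usage_bytes):
    """Build a cgroup-aware meminfo excerpt from the cgroup limit and usage."""
    total_kb = limit_bytes // 1024
    used_kb = usage_bytes // 1024
    free_kb = max(total_kb - used_kb, 0)
    return "MemTotal: %8d kB\nMemFree:  %8d kB\n" % (total_kb, free_kb)


# Hypothetical /proc/<pid>/cgroup contents for a process inside a container.
SAMPLE_PROC_CGROUP = """\
11:memory:/kubepods/pod1234/abcd
10:cpu,cpuacct:/kubepods/pod1234/abcd
"""

if __name__ == "__main__":
    print(memory_cgroup_path(SAMPLE_PROC_CGROUP))
    # 256 MiB limit with 2876 kB in use, supplied directly for illustration.
    print(render_meminfo(256 * 1024 * 1024, 2876 * 1024), end="")
```

The key design point is that the answer depends on the caller: the same read of /proc/meminfo yields different totals for processes in different cgroups.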

The existing problems and how to solve them

How should LXCFS be deployed, and what are the impact and cost? See the "Kubernetes practice of lxcfs" section below.

Fault recovery: how can the mounts be restored automatically? If the lxcfs process restarts, reads of /proc/cpuinfo and similar files in the container fail with "Transport endpoint is not connected", because /var/lib/lxcfs is deleted and recreated and its inode changes. Following Douban's practice, the approach is to share mount events and remount inside the container.
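A stale FUSE mount of this kind can be detected from user space, because the failing system call returns the ENOTCONN error code, whose message is "Transport endpoint is not connected". The helper below is a minimal sketch of such a check; the function name is hypothetical and the remount itself is not shown.

```python
import errno
import os


def is_stale_fuse_mount(path):
    """Return True if accessing path fails with ENOTCONN (a disconnected FUSE mount)."""
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


if __name__ == "__main__":
    # Inside an affected container this would be a path such as /proc/meminfo.
    print(is_stale_fuse_mount("/proc/meminfo"))
```

A watchdog could run this periodically against the lxcfs-backed files and trigger whatever remount procedure the platform uses when it returns True.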

https://github.com/lxc/lxcfs/issues/193

https://github.com/alibaba/pouch/issues/140

User Namespace

What problem does it solve?

User UIDs/GIDs are not remapped, so a process in the container has the permissions of the host user with the same uid/gid.

Docker has supported user namespace isolation since version 1.10. Usage parameter: DOCKER_OPTS="--userns-remap=default"
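The effect of remapping can be illustrated with the mapping table the kernel exposes in /proc/<pid>/uid_map, where each line has the form "ID-inside-namespace ID-outside-namespace length". The sketch below translates a container uid to a host uid; the remapped range starting at 100000 is a hypothetical example value, not something stated in this article.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into a list of (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            entries.append((inside, outside, length))
    return entries


def to_host_id(container_id, id_map):
    """Translate an ID inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


if __name__ == "__main__":
    no_remap = parse_id_map("0 0 4294967295")    # identity map: no isolation
    remapped = parse_id_map("0 100000 65536")    # hypothetical remapped range
    print(to_host_id(0, no_remap))   # container root is host root
    print(to_host_id(0, remapped))   # container root is an unprivileged host uid
```

With the identity map, uid 0 in the container is uid 0 on the host, which is exactly the problem described above; with a remapped range it becomes an unprivileged host uid.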

Links to related documents:

1. Analysis of Docker container isolation enhanced by LXCFS (2015-12-09)

2. Docker container display problems and repair (2017-03-23)

3. Detailed annotated analysis of the lxc-1.0.9, lxcfs-2.0.0, and fuse-2.8.7 source code

4. Road to Kubernetes 2: using LXCFS to improve visibility of container resources

5. Kubernetes Initializers

6. How Alibaba's pouch solves the remount problem

Kubernetes practice of lxcfs

Note: the following is taken from item 4 of the related documents above, with minor modifications.

First, we install and start lxcfs on the cluster nodes. We run the lxcfs FUSE file system in containers, managed by a Kubernetes DaemonSet.

All sample code in this article can be obtained from GitHub at the following address:

git clone https://github.com/denverdino/lxcfs-initializer
cd lxcfs-initializer

The manifest file is as follows

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: lxcfs
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        image: dockerhub.nie.netease.com/whale/lxcfs:2.0.8
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: rootfs
          mountPath: /host
      volumes:
      - name: rootfs
        hostPath:
          path: /

Note: lxcfs FUSE needs to share the host's PID namespace and requires privileged mode, so the corresponding container startup parameters (hostPID and privileged) are configured.

The following command installs and deploys lxcfs on all cluster nodes automatically. Simple, isn't it? :-)

kubectl create -f lxcfs-daemonset.yaml

So how do you use lxcfs in Kubernetes? As above, we can add volume and volumeMounts definitions for the files under /proc to the Pod definition. However, this makes the Kubernetes deployment files more complex. Is there a way to have the system mount the corresponding files automatically?

Kubernetes provides an Initializer extension mechanism (an alpha API, as the v1alpha1 version in the manifest indicates) that can intercept resource creation and inject changes. We can use it to automate the mounting of lxcfs files cleanly.

The manifest file is as follows

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: lxcfs-initializer-default
  namespace: kube-system
rules:
- apiGroups: ["*"]
  resources: ["deployments"]
  verbs: ["initialize", "patch", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: lxcfs-initializer-service-account
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: lxcfs-initializer-role-binding
subjects:
- kind: ServiceAccount
  name: lxcfs-initializer-service-account
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: lxcfs-initializer-default
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  initializers:
    pending: []
  labels:
    app: lxcfs-initializer
  name: lxcfs-initializer
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: lxcfs-initializer
      name: lxcfs-initializer
    spec:
      serviceAccountName: lxcfs-initializer-service-account
      containers:
      - name: lxcfs-initializer
        image: dockerhub.nie.netease.com/whale/lxcfs-initializer:0.0.2
        imagePullPolicy: Always
        args:
        - "-annotation=initializer.kubernetes.io/lxcfs"
        - "-require-annotation=true"
---
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: lxcfs.initializer
initializers:
- name: lxcfs.initializer.kubernetes.io
  rules:
  - apiGroups:
    - "*"
    apiVersions:
    - "*"
    resources:
    - deployments

Note: this is a typical Initializer deployment description. First, we create a service account, lxcfs-initializer-service-account, and grant it permission to list, watch, patch, and initialize "deployments" resources. Then we deploy an Initializer named "lxcfs-initializer", which uses that service account to run a container that handles the creation of "deployments" resources. If a deployment carries the annotation initializer.kubernetes.io/lxcfs set to "true", the lxcfs files are mounted into the containers of that application.
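The decision logic described in this note can be sketched as follows. This is an illustrative Python model over plain dictionaries, not the code of lxcfs-initializer; the function names are hypothetical, while the annotation key, the initializer name, and the require-annotation behaviour come from the manifest above.

```python
ANNOTATION = "initializer.kubernetes.io/lxcfs"
INITIALIZER_NAME = "lxcfs.initializer.kubernetes.io"


def needs_lxcfs(deployment, require_annotation=True):
    """Decide whether lxcfs files should be injected into this deployment."""
    if not require_annotation:
        return True
    annotations = deployment.get("metadata", {}).get("annotations") or {}
    return annotations.get(ANNOTATION) == "true"


def mark_initialized(deployment):
    """Remove this initializer from metadata.initializers.pending when done."""
    meta = deployment.setdefault("metadata", {})
    pending = (meta.get("initializers") or {}).get("pending") or []
    remaining = [p for p in pending if p.get("name") != INITIALIZER_NAME]
    if remaining:
        meta["initializers"] = {"pending": remaining}
    else:
        meta.pop("initializers", None)
    return deployment


if __name__ == "__main__":
    web = {
        "metadata": {
            "name": "web",
            "annotations": {ANNOTATION: "true"},
            "initializers": {"pending": [{"name": INITIALIZER_NAME}]},
        }
    }
    print(needs_lxcfs(web))
    print(mark_initialized(web)["metadata"])
```

Requiring an explicit annotation keeps the injection opt-in, so deployments that do not ask for lxcfs are passed through unchanged.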

Run the following command; once the deployment completes, the initializer is ready to use.

kubectl apply -f lxcfs-initializer.yaml

Let's deploy a simple Apache application, allocate 256 MiB of memory to it, and declare the annotation "initializer.kubernetes.io/lxcfs": "true"

The manifest file is as follows

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  annotations:
    "initializer.kubernetes.io/lxcfs": "true"
  labels:
    app: web
  name: web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: web
      name: web
    spec:
      containers:
      - name: web
        image: httpd:2
        imagePullPolicy: Always
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "256Mi"
            cpu: "500m"

We can deploy and test it as follows:

$ kubectl create -f web.yaml
deployment "web" created

$ kubectl get pod
NAME                   READY     STATUS    RESTARTS   AGE
web-7f6bc6797c-rb9sk   1/1       Running   0          32s

$ kubectl exec web-7f6bc6797c-rb9sk free
             total       used       free     shared    buffers     cached
Mem:        262144       2876     259268       2292          0        304
-/+ buffers/cache:       2572     259572
Swap:            0          0          0

We can see that the total memory returned by the free command is the container resource capacity we set.
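The figure reported by free can be checked against the manifest: free prints kibibytes, and the limit was declared as 256Mi. A small Python sketch of the conversion follows; the helper is hypothetical and handles only the binary suffixes Ki, Mi, and Gi.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def quantity_to_kib(quantity):
    """Convert a memory quantity such as "256Mi" to kibibytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor // 1024
    raise ValueError("unsupported quantity: %r" % quantity)


if __name__ == "__main__":
    print(quantity_to_kib("256Mi"))  # 262144, the "total" column shown by free
```

256 × 1024 × 1024 bytes divided by 1024 gives 262144 KiB, which matches the Mem total above.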

We can check the configuration of the above Pod, and sure enough, all the relevant procfs files have been mounted correctly.

$ kubectl describe pod web-7f6bc6797c-rb9sk
...
    Mounts:
      /proc/cpuinfo from lxcfs-proc-cpuinfo (rw)
      /proc/diskstats from lxcfs-proc-diskstats (rw)
      /proc/meminfo from lxcfs-proc-meminfo (rw)
      /proc/stat from lxcfs-proc-stat (rw)
...
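Each mount listed above corresponds to one volume and one volumeMounts entry in the Pod spec. The sketch below generates such entries in Python. The volume names follow the lxcfs-proc-<file> pattern visible in the output; the host path layout under /var/lib/lxcfs/proc/ is an assumption based on the lxcfs target directory named earlier, and the function name is hypothetical.

```python
import json

LXCFS_ROOT = "/var/lib/lxcfs"  # lxcfs target directory named earlier in the article


def lxcfs_volume_entries(proc_files):
    """Build (volumes, volumeMounts) lists for the given /proc file names."""
    volumes, mounts = [], []
    for name in proc_files:
        volume_name = "lxcfs-proc-" + name
        volumes.append({
            "name": volume_name,
            # Assumed layout: lxcfs exposes its proc files under <root>/proc/.
            "hostPath": {"path": "%s/proc/%s" % (LXCFS_ROOT, name)},
        })
        mounts.append({"name": volume_name, "mountPath": "/proc/" + name})
    return volumes, mounts


if __name__ == "__main__":
    volumes, mounts = lxcfs_volume_entries(["cpuinfo", "diskstats", "meminfo", "stat"])
    print(json.dumps({"volumes": volumes, "volumeMounts": mounts}, indent=2))
```

Writing these entries by hand for every workload is the repetition that the initializer removes.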

A similar result can be achieved in Kubernetes with PodPreset; for reasons of space it is not covered in this article.
