How to understand cgroups in docker

2025-07-01 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report

This article explains in detail how to understand cgroups in docker. I hope you will come away with a working understanding of the topic after reading it.

To understand docker, it helps to look at several aspects: namespace, cgroups, the union file system, the runtime (runC), and networking. We will take some time to introduce them separately.

Namespace is mainly used for isolation; cgroups is mainly for resource limits; the union file system is mainly used for layered image storage and management; and runC is the runtime, which follows the OCI interface and is generally based on libcontainer. Networking mainly covers docker single-host networking and multi-host communication.

Introduction to cgroups

What is cgroups?

Cgroup is short for control group, a feature provided by the Linux kernel that limits and isolates the system resources used by a group of processes, that is, it provides resource QoS. The resources mainly include CPU, memory, block I/O, and network bandwidth. Cgroup has been in the mainline kernel since 2.6.24, and the feature is enabled by default in all major distributions.

Cgroups provides the following four main functions:

Resource limits (Resource Limitation): cgroups can limit the total amount of resources used by a process group. For example, if you set an upper limit on the memory an application may use while running, an OOM (Out of Memory) is triggered once that limit is exceeded.

Priority allocation (Prioritization): by controlling the number of CPU time slices allocated and the disk IO bandwidth, cgroups effectively controls the priority at which processes run.

Resource Statistics (Accounting): cgroups can count the resource usage of the system, such as CPU usage, memory usage, and so on. This function is very suitable for billing.

Process control (Control): cgroups can perform suspending, resuming, and other operations on a process group.

Three components in Cgroups

Cgroup (control group). A cgroup is a mechanism for grouping and managing processes. A cgroup contains a group of processes, and parameters of the various Linux subsystems can be configured on it, which associates that group of processes with a set of subsystem parameters.

Subsystem. A subsystem is a resource control module. Subsystems are described in more detail below.

Hierarchy (hierarchical tree). A hierarchy strings a set of cgroups into a tree structure; one such tree is a hierarchy, and through this structure cgroups can inherit settings. For example, suppose my system limits CPU utilization for a group of scheduled-task processes through cgroup1, and one of those processes, which periodically dumps logs, also needs its disk IO limited. To avoid affecting the other processes, you can create cgroup2, which inherits from cgroup1 and also limits disk IO. cgroup2 then inherits the CPU restrictions of cgroup1 and adds the disk IO restriction without affecting the other processes in cgroup1.

Cgroups subsystem

The subsystems implemented in cgroup and their functions are as follows:

devices: device permission control.

cpuset: assigns specific CPUs and memory nodes.

cpu: controls CPU usage.

cpuacct: reports CPU usage statistics.

memory: limits the upper bound of memory usage.

freezer: freezes (pauses) the processes in a cgroup.

net_cls: works with tc (traffic controller) to limit network bandwidth.

net_prio: sets the priority of network traffic for the processes.

hugetlb: limits the use of HugeTLB.

perf_event: allows perf tools to do performance monitoring per cgroup.
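To see which of these subsystems a given process is attached to, the kernel exposes /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:path. The following is a minimal sketch of parsing that text; the type and function names, and the shortened sample paths, are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup: "hierarchy-ID:controllers:path".
type cgroupEntry struct {
	HierarchyID string
	Controllers []string
	Path        string
}

// parseProcCgroup parses the text of /proc/<pid>/cgroup.
func parseProcCgroup(content string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		if line == "" {
			continue
		}
		// The path may itself contain ':', so split into at most 3 fields.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line: %q", line)
		}
		var ctrls []string
		if parts[1] != "" {
			// Co-mounted controllers are comma separated, e.g. "cpu,cpuacct".
			ctrls = strings.Split(parts[1], ",")
		}
		entries = append(entries, cgroupEntry{
			HierarchyID: parts[0],
			Controllers: ctrls,
			Path:        parts[2],
		})
	}
	return entries, nil
}

func main() {
	// Sample input for illustration only; real paths use the full container ID.
	sample := "4:cpu,cpuacct:/docker/a730051832e7\n3:memory:/docker/a730051832e7\n"
	entries, err := parseProcCgroup(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Controllers, e.Path)
	}
}
```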

There are more detailed settings under the directory of each subsystem, such as:

CPU:

In addition to limiting CPU usage, cgroup can also bind tasks to specific CPUs so that they run only on those CPUs; this is the job of the cpuset subsystem. Besides CPUs, you can also bind memory nodes.

Before adding a task to a cpuset's tasks file, the user must set the cpuset.cpus and cpuset.mems parameters.

cpuset.cpus: sets the CPUs that tasks in the cgroup can use, formatted as a comma-separated list in which a minus sign (-) indicates a range. For example, 0-2,7 denotes cores 0, 1, 2, and 7.

cpuset.mems: sets the memory nodes that tasks in the cgroup can use, in the same format as cpuset.cpus.
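The list format used by cpuset.cpus and cpuset.mems can be expanded with a few lines of code. This is a small sketch, not kernel or runc code; the function name parseCPUList is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a cpuset list such as "0-2,7" into [0 1 2 7].
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		// "0-2" is a range; "7" is a single CPU.
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad CPU %q: %w", part, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad range %q: %w", part, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("descending range %q", part)
		}
		for c := first; c <= last; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

func main() {
	cpus, err := parseCPUList("0-2,7")
	if err != nil {
		panic(err)
	}
	fmt.Println(cpus)
}
```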

Memory:

memory.limit_in_bytes: the hard limit on memory usage; the value can use k, m, or g as a unit suffix. Writing -1 means no limit.

memory.soft_limit_in_bytes: the soft limit, which is meaningful only when it is smaller than the hard limit. The format is the same as above. When overall memory is tight, the memory a task can obtain is limited to the soft limit, so that not too many processes starve for memory. As you can see, adding memory limits does not mean there is no resource competition.

memory.memsw.limit_in_bytes: sets the limit on the sum of memory and swap usage. The format is the same as above.
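As an illustration of the value format these limit files accept (a number with an optional k, m, or g suffix, or -1 for no limit), here is a sketch that converts such a string to bytes. The helper parseMemLimit is hypothetical and assumes binary multiples (1k = 1024 bytes).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemLimit converts "512m", "2g", "1024" or "-1" into bytes.
// It returns -1 for "no limit".
func parseMemLimit(s string) (int64, error) {
	v := strings.ToLower(strings.TrimSpace(s))
	if v == "-1" {
		return -1, nil
	}
	mult := int64(1)
	switch {
	case strings.HasSuffix(v, "k"):
		mult, v = 1<<10, strings.TrimSuffix(v, "k")
	case strings.HasSuffix(v, "m"):
		mult, v = 1<<20, strings.TrimSuffix(v, "m")
	case strings.HasSuffix(v, "g"):
		mult, v = 1<<30, strings.TrimSuffix(v, "g")
	}
	n, err := strconv.ParseInt(v, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid memory limit %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"512m", "2g", "-1"} {
		n, err := parseMemLimit(in)
		fmt.Println(in, n, err)
	}
}
```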

Here we look specifically at the parameters related to monitoring and statistics, such as those collected by cAdvisor.

memory.usage_in_bytes: reports the current total amount of memory (in bytes) used by processes in this cgroup.

memory.max_usage_in_bytes: reports the maximum amount of memory used by processes in this cgroup.
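On cgroup v1 these counters are plain files named memory.usage_in_bytes and memory.max_usage_in_bytes, each holding a single integer, so a monitoring agent can read them directly. The sketch below shows the idea; readCounter and demoCounters are hypothetical names, and the demo fakes a cgroup directory in a temp dir (with invented values) so it runs without a real cgroup.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readCounter reads a cgroup file that holds a single unsigned integer.
func readCounter(cgroupDir, name string) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(cgroupDir, name))
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

// demoCounters fakes a memory cgroup directory in a temp dir and reads it back.
// The values written here are invented for the demo.
func demoCounters() (usage, peak uint64, err error) {
	dir, err := os.MkdirTemp("", "memcg")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	files := map[string]string{
		"memory.usage_in_bytes":     "22040576\n",
		"memory.max_usage_in_bytes": "30000000\n",
	}
	for name, val := range files {
		if err = os.WriteFile(filepath.Join(dir, name), []byte(val), 0o644); err != nil {
			return 0, 0, err
		}
	}
	if usage, err = readCounter(dir, "memory.usage_in_bytes"); err != nil {
		return 0, 0, err
	}
	peak, err = readCounter(dir, "memory.max_usage_in_bytes")
	return usage, peak, err
}

func main() {
	usage, peak, err := demoCounters()
	if err != nil {
		panic(err)
	}
	fmt.Println("usage:", usage, "peak:", peak)
}
```

On a real host, cgroupDir would be the container's directory under the memory subsystem mount point.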

How docker uses cgroup

Create a container

    # Run a container that will spawn 300 processes.
    docker run cirocosta/stress pid -n 300
    Starting to spawn 300 blocking children
    [1] Waiting for SIGINT

    # Open another window and see that we have 300 PIDs
    docker stats
    CONTAINER       ...    MEM USAGE / LIMIT       PIDS
    a730051832      ...    21.02MiB / 1.951GiB     300

Verify that Docker has created cgroups for this container

    # let's get the ID of the container. Docker uses that ID
    # to name things in the host so we can probably use it to
    # find the cgroup created for the container
    # under the parent docker cgroup
    docker ps
    CONTAINER ID    IMAGE               COMMAND
    a730051832e7    cirocosta/stress    "pid -n 300"

    # Having the prefix in hands, let's search for it under the
    # mountpoint for cgroups in our system
    find /sys/fs/cgroup/ -name "a730051832e7*"
    /sys/fs/cgroup/cpu,cpuacct/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/cpuset/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/devices/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/pids/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/freezer/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/perf_event/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/blkio/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/memory/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/net_cls,net_prio/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/hugetlb/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959
    /sys/fs/cgroup/systemd/docker/a730051832e7d776442b2e969e057660ad108a7d6e6a30569398ec660a75a959

    # There they are! Docker creates a control group with the name
    # being the exact ID of the container under all the subsystems.

    # What can we discover from this inspection? We can look at the
    # subsystem that we want to place constraints on (PIDs), for instance:
    tree /sys/fs/cgroup/pids/docker/a7300518327d...
    /sys/fs/cgroup/pids/docker/a73005183...
    ├── cgroup.clone_children
    ├── cgroup.procs
    ├── notify_on_release
    ├── pids.current
    ├── pids.events
    ├── pids.max
    └── tasks

    # Which means that, if we want to know how many PIDs are in use right
    # now we can look at 'pids.current', to know the limits 'pids.max' and
    # to know which processes have been assigned to this control group,
    # look at tasks. Let's do it:
    cat /sys/fs/cgroup/pids/docker/a730...c660a75a959/tasks
    5329
    5371
    5372
    5373
    5374
    5375
    5376
    5377
    (...)
    # continues until the 300th entry, as we have 300 processes in this container

    # 300 pids
    cat /sys/fs/cgroup/pids/docker/a730051832e7d7764...9/pids.current
    300

    # no max set
    cat /sys/fs/cgroup/pids/docker/a730051832e7d77.../pids.max
    max

PS

The following error is often encountered during the installation of k8s:

Create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

The error message here is very clear: the cgroup drivers specified for docker and kubelet are not the same. Docker supports both the systemd and cgroupfs drivers. You can understand this more intuitively through the runc code.

Cgroup can only limit CPU usage; it cannot guarantee it. That is, --cpuset-cpus lets a container run on the specified CPUs or cores, but there is no guarantee that the container has those CPUs to itself. --cpu-shares is a relative value that takes effect only when CPU is scarce: when there is enough CPU, each container gets what it needs; when there is not enough CPU, it is divided among the containers in the specified proportion.

For memory, cgroups can limit the maximum amount of memory used by the container. Use the -m parameter to set the maximum amount of memory that can be used.
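The proportional behaviour of cpu-shares under contention comes down to simple arithmetic: each container's fraction is its shares divided by the sum of all shares. The sketch below illustrates that; the function name and the container names are made up, and 1024 is Docker's default cpu-shares value.

```go
package main

import "fmt"

// cpuFractions returns, for each container, the fraction of CPU time it can
// expect when every container is busy: its shares divided by the total.
func cpuFractions(shares map[string]int) map[string]float64 {
	total := 0
	for _, s := range shares {
		total += s
	}
	out := make(map[string]float64, len(shares))
	if total == 0 {
		return out
	}
	for name, s := range shares {
		out[name] = float64(s) / float64(total)
	}
	return out
}

func main() {
	// Hypothetical containers: "web" keeps the default of 1024 shares.
	f := cpuFractions(map[string]int{"web": 1024, "batch": 512})
	fmt.Printf("web=%.2f batch=%.2f\n", f["web"], f["batch"])
}
```

With these numbers "web" gets two thirds and "batch" one third of the CPU, but only while both are competing; an idle neighbour's share can be used by the other container.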

Code interpretation

You can read the cgroups portion of the runc source code for the details. We will only give a general idea here.

First, a container is created by the factory calling its create method. For cgroups, the factory is given a function that creates a new CgroupsManager according to the cgroup driver set in the configuration file, systemd or cgroupfs:

    // SystemdCgroups is an options func to configure a LinuxFactory to return
    // containers that use systemd to create and manage cgroups.
    func SystemdCgroups(l *LinuxFactory) error {
        l.NewCgroupsManager = func(config *configs.Cgroup, paths map[string]string) cgroups.Manager {
            return &systemd.Manager{
                Cgroups: config,
                Paths:   paths,
            }
        }
        return nil
    }

    // Cgroupfs is an options func to configure a LinuxFactory to return containers
    // that use the native cgroups filesystem implementation to create and manage
    // cgroups.
    func Cgroupfs(l *LinuxFactory) error {
        l.NewCgroupsManager = func(config *configs.Cgroup, paths map[string]string) cgroups.Manager {
            return &fs.Manager{
                Cgroups: config,
                Paths:   paths,
            }
        }
        return nil
    }
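These two functions follow Go's "functional options" pattern: each option is a function that mutates the factory while it is being built. The following self-contained sketch shows the pattern in isolation. All types here (Manager, Factory, Option, New) are simplified, hypothetical stand-ins rather than the real runc types, and defaulting to cgroupfs is an assumption of this sketch.

```go
package main

import "fmt"

// Manager is a hypothetical stand-in for a cgroup manager.
type Manager interface{ Name() string }

type fsManager struct{}

func (fsManager) Name() string { return "cgroupfs" }

type systemdManager struct{}

func (systemdManager) Name() string { return "systemd" }

// Factory holds a constructor that is chosen by the options.
type Factory struct {
	NewCgroupsManager func() Manager
}

// Option mutates a Factory while it is being built.
type Option func(*Factory) error

// Cgroupfs selects the filesystem-based manager.
func Cgroupfs(f *Factory) error {
	f.NewCgroupsManager = func() Manager { return fsManager{} }
	return nil
}

// SystemdCgroups selects the systemd-based manager.
func SystemdCgroups(f *Factory) error {
	f.NewCgroupsManager = func() Manager { return systemdManager{} }
	return nil
}

// New applies a default driver first, then each option in order,
// so a later option overrides an earlier one.
func New(opts ...Option) (*Factory, error) {
	f := &Factory{}
	if err := Cgroupfs(f); err != nil {
		return nil, err
	}
	for _, opt := range opts {
		if err := opt(f); err != nil {
			return nil, err
		}
	}
	return f, nil
}

func main() {
	f, err := New(SystemdCgroups)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.NewCgroupsManager().Name())
}
```

The benefit of this design is that the code creating containers never needs to know which driver was chosen; it only calls NewCgroupsManager.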

The cgroup manager is abstracted as an interface. The API is as follows:

    type Manager interface {
        // Applies cgroup configuration to the process with the specified pid
        Apply(pid int) error

        // Returns the PIDs inside the cgroup set
        GetPids() ([]int, error)

        // Returns the PIDs inside the cgroup set & all sub-cgroups
        GetAllPids() ([]int, error)

        // Returns statistics for the cgroup set
        GetStats() (*Stats, error)

        // Toggles the freezer cgroup according with specified state
        Freeze(state configs.FreezerState) error

        // Destroys the cgroup set
        Destroy() error

        // The option func SystemdCgroups() and Cgroupfs() require following attributes:
        //     Paths   map[string]string
        //     Cgroups *configs.Cgroup
        // Paths maps cgroup subsystem to path at which it is mounted.
        // Cgroups specifies specific cgroup settings for the various subsystems

        // Returns cgroup paths to save in a state file and to be able to
        // restore the object later.
        GetPaths() map[string]string

        // Sets the cgroup as configured.
        Set(container *configs.Config) error
    }
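Because callers depend only on the interface, any implementation can be swapped in, including an in-memory one for tests. The sketch below uses a trimmed-down, hypothetical version of the interface with just three methods; memManager and startInCgroup are invented names, not part of runc.

```go
package main

import "fmt"

// Manager is a trimmed-down, hypothetical version of the interface above.
type Manager interface {
	Apply(pid int) error
	GetPids() ([]int, error)
	Destroy() error
}

// memManager keeps its "cgroup" in memory instead of on a cgroup filesystem.
type memManager struct{ pids []int }

func (m *memManager) Apply(pid int) error {
	if pid <= 0 {
		return fmt.Errorf("invalid pid %d", pid)
	}
	m.pids = append(m.pids, pid)
	return nil
}

// GetPids returns a copy so callers cannot modify the internal slice.
func (m *memManager) GetPids() ([]int, error) {
	return append([]int(nil), m.pids...), nil
}

func (m *memManager) Destroy() error {
	m.pids = nil
	return nil
}

// startInCgroup depends only on the interface, not on a concrete driver.
// It returns the number of PIDs in the cgroup after adding pid.
func startInCgroup(m Manager, pid int) (int, error) {
	if err := m.Apply(pid); err != nil {
		return 0, err
	}
	pids, err := m.GetPids()
	return len(pids), err
}

func main() {
	m := &memManager{}
	n, err := startInCgroup(m, 42)
	fmt.Println(n, err)
}
```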

While a container is created and managed, the methods of the above interface are called. For example:

In container_linux.go

    func (c *linuxContainer) Set(config configs.Config) error {
        c.m.Lock()
        defer c.m.Unlock()
        status, err := c.currentStatus()
        if err != nil {
            return err
        }
        ...
        if err := c.cgroupManager.Set(&config); err != nil {
            // Set configs back
            if err2 := c.cgroupManager.Set(c.config); err2 != nil {
                logrus.Warnf("Setting back cgroup configs failed due to error: %v, your state.json and actual configs might be inconsistent.", err2)
            }
            return err
        }
        ...
    }
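The interesting part of Set is the rollback: if applying the new configuration fails, it re-applies the old one and only logs a warning if that second attempt also fails. Here is a stripped-down sketch of the same idea; the setter interface, fakeManager, and applyWithRollback are hypothetical and use a single integer limit in place of a full config.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// setter is a hypothetical one-method slice of a cgroup manager.
type setter interface {
	Set(limit int64) error
}

// fakeManager rejects negative limits and remembers the last accepted one.
type fakeManager struct{ current int64 }

func (f *fakeManager) Set(limit int64) error {
	if limit < 0 {
		return errors.New("negative limit")
	}
	f.current = limit
	return nil
}

// applyWithRollback tries the new value and, if that fails, re-applies the
// old one. The original error is returned either way.
func applyWithRollback(m setter, oldLimit, newLimit int64) error {
	if err := m.Set(newLimit); err != nil {
		if err2 := m.Set(oldLimit); err2 != nil {
			log.Printf("rollback failed: %v", err2)
		}
		return err
	}
	return nil
}

func main() {
	m := &fakeManager{current: 100}
	err := applyWithRollback(m, 100, -5)
	fmt.Println(err, m.current)
}
```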

Next, let's focus on the implementation of fs.

In fs, basically each subsystem has its own source file, as shown in the figure above.

We focus on memory.go, the memory subsystem; the other subsystems are similar.

Key methods:

    func (s *MemoryGroup) Apply(d *cgroupData) (err error) {
        path, err := d.path("memory")
        if err != nil && !cgroups.IsNotFound(err) {
            return err
        } else if path == "" {
            return nil
        }
        if memoryAssigned(d.config) {
            if _, err := os.Stat(path); os.IsNotExist(err) {
                if err := os.MkdirAll(path, 0755); err != nil {
                    return err
                }
                // Only enable kernel memory accouting when this cgroup
                // is created by libcontainer, otherwise we might get
                // error when people use `cgroupsPath` to join an existed
                // cgroup whose kernel memory is not initialized.
                if err := EnableKernelMemoryAccounting(path); err != nil {
                    return err
                }
            }
        }
        defer func() {
            if err != nil {
                os.RemoveAll(path)
            }
        }()

        // We need to join memory cgroup after set memory limits, because
        // kmem.limit_in_bytes can only be set when the cgroup is empty.
        _, err = d.join("memory")
        if err != nil && !cgroups.IsNotFound(err) {
            return err
        }
        return nil
    }
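Apply relies on a named return value together with a deferred function: whatever error the function ends up returning, the deferred cleanup sees it and removes the directory it created. The sketch below isolates that idiom; setup and demoCleanup are hypothetical helpers, and the demo works in a temp dir rather than a real cgroup mount.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// setup creates dir and then runs step. If step fails, the deferred function
// sees the non-nil named result err and removes dir again.
func setup(dir string, step func() error) (err error) {
	if err = os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	defer func() {
		if err != nil {
			os.RemoveAll(dir)
		}
	}()
	return step()
}

// demoCleanup reports whether the directory still exists after a failing
// step and after a successful step.
func demoCleanup() (afterFail, afterOK bool) {
	base, err := os.MkdirTemp("", "cg-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(base)

	bad := filepath.Join(base, "bad")
	_ = setup(bad, func() error { return errors.New("boom") })
	_, statErr := os.Stat(bad)
	afterFail = statErr == nil

	good := filepath.Join(base, "good")
	_ = setup(good, func() error { return nil })
	_, statErr = os.Stat(good)
	afterOK = statErr == nil
	return afterFail, afterOK
}

func main() {
	afterFail, afterOK := demoCleanup()
	fmt.Println("exists after failure:", afterFail, "exists after success:", afterOK)
}
```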

d.path("memory") finds the memory path of the cgroup:

    func (raw *cgroupData) path(subsystem string) (string, error) {
        mnt, err := cgroups.FindCgroupMountpoint(subsystem)
        // If we didn't mount the subsystem, there is no point we make the path.
        if err != nil {
            return "", err
        }

        // If the cgroup name/path is absolute do not look relative to the cgroup of the init process.
        if filepath.IsAbs(raw.innerPath) {
            // Sometimes subsystems can be mounted together as 'cpu,cpuacct'.
            return filepath.Join(raw.root, filepath.Base(mnt), raw.innerPath), nil
        }

        // Use GetOwnCgroupPath instead of GetInitCgroupPath, because the creating
        // process could in container and shared pid namespace with host, and
        // /proc/1/cgroup could point to whole other world of cgroups.
        parentPath, err := cgroups.GetOwnCgroupPath(subsystem)
        if err != nil {
            return "", err
        }

        return filepath.Join(parentPath, raw.innerPath), nil
    }
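The two branches of path can be tried out without a real cgroup mount by passing the looked-up values in as arguments. This is a sketch under that assumption; cgroupPath and its parameters are hypothetical, and it uses package path instead of path/filepath so the result is the same on any OS.

```go
package main

import (
	"fmt"
	"path"
)

// cgroupPath mirrors the two branches of the function above, with the mount
// point (mnt) and the caller's own cgroup directory (ownCgroup) passed in
// instead of being looked up from the system.
func cgroupPath(root, mnt, ownCgroup, innerPath string) string {
	if path.IsAbs(innerPath) {
		// Use only the last element of the mount point, which may be a
		// combined name such as "cpu,cpuacct".
		return path.Join(root, path.Base(mnt), innerPath)
	}
	// A relative path is resolved below the caller's own cgroup.
	return path.Join(ownCgroup, innerPath)
}

func main() {
	// Example values for illustration only.
	fmt.Println(cgroupPath("/sys/fs/cgroup", "/sys/fs/cgroup/cpu,cpuacct", "", "/docker/abc"))
	fmt.Println(cgroupPath("/sys/fs/cgroup", "/sys/fs/cgroup/memory",
		"/sys/fs/cgroup/memory/user.slice", "docker/abc"))
}
```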

d.join("memory") writes the pid to the memory path:

    func (raw *cgroupData) join(subsystem string) (string, error) {
        path, err := raw.path(subsystem)
        if err != nil {
            return "", err
        }
        if err := os.MkdirAll(path, 0755); err != nil {
            return "", err
        }
        if err := cgroups.WriteCgroupProc(path, raw.pid); err != nil {
            return "", err
        }
        return path, nil
    }
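Joining a cgroup ultimately means writing a PID into a file inside the cgroup's directory. The sketch below assumes the target file is cgroup.procs and runs against a temp dir rather than a real cgroup mount; joinCgroup and demoJoin are hypothetical helpers, not the runc implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// joinCgroup creates dir if needed and writes pid into its cgroup.procs file.
// Assumption: cgroup.procs is the file that accepts the PID.
func joinCgroup(dir string, pid int) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	procs := filepath.Join(dir, "cgroup.procs")
	return os.WriteFile(procs, []byte(strconv.Itoa(pid)), 0o644)
}

// demoJoin runs joinCgroup against a temp dir and returns what was written.
func demoJoin() (string, error) {
	base, err := os.MkdirTemp("", "cg-join")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(base)

	dir := filepath.Join(base, "memory", "docker", "demo")
	if err := joinCgroup(dir, 1234); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "cgroup.procs"))
	return string(data), err
}

func main() {
	written, err := demoJoin()
	if err != nil {
		panic(err)
	}
	fmt.Println("wrote pid:", written)
}
```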
