Docker container CPU, memory resource restrictions


Background

When you run containers with docker, by default docker imposes no hardware resource limits on them. When hundreds of containers run on one host, they are isolated from each other, but they share the same underlying CPU, memory and disk resources. If the resources a container may use are not limited, containers will interfere with each other: at best this leads to unfair use of resources between containers; at worst it can exhaust the resources of the host or the cluster and make services completely unavailable.

As the manager of containers, docker naturally provides functions to control container resources. Just as it uses kernel namespaces to isolate containers, docker uses kernel cgroups to limit container resources, including CPU, memory and disk, which covers the common resource quota and usage controls.

Docker memory control and OOME: on Linux systems, if the kernel detects that the host has no available memory left, it raises an OOME (Out Of Memory Exception) and starts killing processes to free memory.

Once an OOME occurs, any process may be killed, including the docker daemon. For this reason docker adjusts the OOM priority (oom_score_adj) of the docker daemon so that it is unlikely to be killed, but it does not adjust the priority of containers. Through a calculation inside the kernel, every process gets an OOM score; the higher its oom_score_adj (which can be adjusted with docker run), the higher the score, and the process with the highest score is killed first. You can also specify that certain important containers must not be killed by the OOM killer by passing --oom-kill-disable=true when starting the container.
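As a quick illustration (a minimal sketch, assuming a recent installation where the daemon process is named dockerd and an ubuntu:16.04 image is available locally):

$ cat /proc/$(pidof dockerd)/oom_score_adj    # the daemon runs with a negative adjustment, so it is killed last
$ docker run -d -m 512m --oom-kill-disable ubuntu:16.04 sleep 3600    # this container's processes are exempt from the OOM killer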

Reference: Docker monitors the utilization of container resources

Introduction to cgroup

Cgroup, short for Control Groups, is a mechanism provided by the Linux kernel to limit, account for and isolate the physical resources (such as CPU, memory and disk IO) used by groups of processes. It is used by LXC, docker and many other projects to control process resources. Cgroups are the kernel facility for managing arbitrary processes in groups: the cgroup core provides the infrastructure and interface for grouping processes, while concrete resource management functions such as IO or memory allocation control are implemented on top of it. These specific resource management functions are called cgroup subsystems and include the following:

Blkio: sets input and output limits for block devices, such as disks, CDs and USB devices.

Cpu: uses the scheduler to control the cgroup tasks' access to the CPU.

Cpuacct: generate cpu resource reports for cgroup tasks.

Cpuset: if it is a multi-core cpu, this subsystem allocates separate cpu and memory for the cgroup task.

Devices: allow or deny cgroup task access to the device.

Freezer: pauses and resumes cgroup tasks.

Memory: set memory limits for each cgroup and generate memory resource reports.

Net_cls: tags network packets so that they can be identified as coming from a particular cgroup.

Ns: namespace subsystem.

Perf_event: adds the ability to monitor and trace each group, that is, to monitor all threads belonging to a particular group as well as threads running on a particular CPU.

At present, docker only uses some of these subsystems to control the quota and use of resources.

You can use the stress tool to test CPU and memory. Use the Dockerfile below to create an Ubuntu-based image with the stress tool installed.

FROM ubuntu:14.04

RUN apt-get update && apt-get install -y stress
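For example, the image can be built and exercised roughly as follows (a sketch; the tag ubuntu-stress and the stress parameters are arbitrary choices here):

$ docker build -t ubuntu-stress .
$ docker run -it --rm ubuntu-stress stress --cpu 1 --vm 1 --vm-bytes 128M --timeout 60s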

Key files for resource monitoring (read them out with cat):

Memory used:

/sys/fs/cgroup/memory/docker/<container ID>/memory.usage_in_bytes

Total memory allocated (the memory limit):

/sys/fs/cgroup/memory/docker/<container ID>/memory.limit_in_bytes

CPU used (in nanoseconds):

/sys/fs/cgroup/cpuacct/docker/<container ID>/cpuacct.usage

Current CPU time of the whole system:

$ cat /proc/stat | grep 'cpu'    # values are in jiffies (clock ticks / time slices)

The sum of the numbers divided by HZ (check HZ with cat /boot/config-$(uname -r) | grep '^CONFIG_HZ='; it is 250 on Ubuntu 14.04) gives the system CPU time in seconds; multiplying that by 10^9 gives the system CPU time in nanoseconds.

Examples

$ cat /proc/stat
cpu 432661 13295 86656 422145968 171474 ...
(the per-core lines cpu0 ... cpu3 and the intr/ctxt/btime lines that follow are omitted here)

The cpu0, cpu1, cpu2 and cpu3 lines have the same format as the first line. Taking the first line as the example, the fields mean:

user (432661): CPU time spent in user mode since the system started (in jiffies), excluding processes with a negative nice value.
nice (13295): CPU time spent since the system started by processes with a negative nice value (in jiffies).
system (86656): CPU time spent in kernel mode since the system started (in jiffies).
idle (422145968): time spent idle since the system started, excluding time spent waiting for disk IO (in jiffies).
iowait (171474): time spent waiting for disk IO since the system started (in jiffies).
irq (233): time spent servicing hard interrupts since the system started (in jiffies).
softirq (5346): time spent servicing soft interrupts since the system started (in jiffies).

CPU usage:

(container CPU used at time 2 - container CPU used at time 1) / (system CPU time at time 2 - system CPU time at time 1) * 100%
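A minimal sketch of this calculation (assumptions: the full container ID is in $CID and the kernel HZ value is 250, as on Ubuntu 14.04; adapt both to your host):

$ CID=<full container ID>; HZ=250
$ jiffies() { awk '/^cpu /{s=0; for(i=2;i<=NF;i++) s+=$i; print s}' /proc/stat; }
$ c1=$(cat /sys/fs/cgroup/cpuacct/docker/$CID/cpuacct.usage); s1=$(jiffies); sleep 1
$ c2=$(cat /sys/fs/cgroup/cpuacct/docker/$CID/cpuacct.usage); s2=$(jiffies)
$ echo "scale=4; ($c2 - $c1) / (($s2 - $s1) * 1000000000 / $HZ) * 100" | bc    # container CPU usage in percent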

Memory limit

The memory limitations provided by Docker include the following:

The memory and swap partition size that the container can use.

The core memory size of the container.

The swapping behavior of container virtual memory.

The soft limit of container memory.

Whether to kill containers that take up too much memory.

The priority of the container for being killed.

In general, containers that reach the memory limit will be killed by the system after a period of time.

Parameters related to memory limit

All the options related to memory limitations that can be used when executing the docker run command are as follows.

-m, --memory: the memory limit; the value is a number plus a unit, where the unit can be b, k, m or g. The minimum is 4M.

--memory-swap: the limit on memory plus swap; same format as above. Must be larger than the -m value.

--memory-reservation: the soft limit of memory; same format as above.

--oom-kill-disable: whether to prevent the OOM killer from killing the container; not set by default.

--oom-score-adj: the container's priority for being killed by the OOM killer, in the range [-1000, 1000]; 0 by default.

--memory-swappiness: controls the container's virtual memory (swapping) behavior; an integer between 0 and 100.

--kernel-memory: the core memory limit; same format as above, with a minimum of 4M.

User memory limit

The user memory limit restricts the memory and swap that the container can use. There are two intuitive rules to follow when using it: the minimum value of the -m/--memory option is 4M, and --memory-swap is not the swap size but the total of memory plus swap, so --memory-swap must be larger than -m/--memory. Under these two rules, there are generally four ways to set them.

When experimenting with memory limits, you may see a warning from the docker run command: WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.

This is because the relevant functions of the host kernel are not turned on. Just follow the settings below.

Step 1: edit the /etc/default/grub file and change the GRUB_CMDLINE_LINUX line to GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

Step 2: update GRUB, that is, execute $ sudo update-grub

Step 3: restart the system.
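After the reboot you can quickly confirm that the kernel was booted with these parameters (a simple check, independent of docker):

$ cat /proc/cmdline    # should now contain: cgroup_enable=memory swapaccount=1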

1. Set neither option

By default, that is, if you set neither -m/--memory nor --memory-swap, the container can use all of the host's memory and swap. Note, however, that if the container occupies all of the host's memory and swap for some period of time, it will be killed by the host system (unless --oom-kill-disable=true is set).

2. Set -m/--memory but not --memory-swap

Set -m or --memory to a value a of at least 4M, and either do not set --memory-swap or set it to 0. In this case the container can use a of memory and a of swap, because by default Docker gives the container a swap allowance equal to its memory limit.

If you run a program inside the container that keeps requesting memory, you will observe that it can eventually occupy up to 2a of memory.

For example, with $ docker run -m 1G ubuntu:16.04 the container can use 1G of memory and 1G of swap, so the processes in the container can request a total of 2G.
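This can be verified from the cgroup files (a sketch; <container ID> stands for the full ID of the container started here):

$ docker run -tid --name m1 -m 1G ubuntu:16.04 /bin/bash
$ cat /sys/fs/cgroup/memory/docker/<container ID>/memory.limit_in_bytes        # 1073741824, i.e. 1G
$ cat /sys/fs/cgroup/memory/docker/<container ID>/memory.memsw.limit_in_bytes  # 2147483648, i.e. 2G (memory + swap)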

3. Set both -m/--memory and --memory-swap

Set -m to a value a and --memory-swap to a value b. Then a is the amount of memory the container can use and b is the total of memory plus swap the container can use, so b must be greater than a, and b - a is the amount of swap the container can use.

For example, with $ docker run -m 1G --memory-swap 3G ubuntu:16.04 the container can use 1G of memory and 2G of swap, so the processes in the container can request a total of 3G.

4. Set -m/--memory and set --memory-swap to -1

Set -m to a normal value a and --memory-swap to -1. This limits the memory that the container can use to a, but does not limit the swap the container can use.

In this case, the processes in the container can request up to a of memory plus all of the host's swap.

Memory reservation

Memory reservation is a soft limit used to control the container's memory usage. If --memory-reservation is set to a value smaller than -m, then although the container can use up to the amount of memory set by -m, when memory on the host becomes tight, the next memory reclaim will take back part of the container's memory pages, pushing the container's memory consumption back down to the value set by --memory-reservation.

When it is not set (the default), --memory-reservation has the same value as the -m limit. Setting it to 0 or to a value larger than -m is equivalent to not setting it.

Memory reservation is a soft mechanism: it does not guarantee that the container's memory usage will never exceed the --memory-reservation limit at any moment; it only ensures that the container does not exceed that limit for long periods.

For example:

$ docker run -it -m 500m --memory-reservation 200m ubuntu:16.04 /bin/bash

If the container uses more than 200m but less than 500m of memory, the next time the system reclaims memory it will try to push the container's memory usage back below 200m.

For example:

$ docker run -it --memory-reservation 1G ubuntu:16.04 /bin/bash

The container can use as much memory as it needs; --memory-reservation only ensures that it does not occupy too much memory for long periods.
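Under cgroup v1 (the layout used throughout this article) the soft limit can be read back from the container's memory cgroup; a sketch:

$ cat /sys/fs/cgroup/memory/docker/<container ID>/memory.soft_limit_in_bytes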

OOM killer

By default, when an out-of-memory (OOM) error occurs, the system kills processes inside the container to obtain more free memory. This mechanism that kills processes to free memory is called the OOM killer. We can prevent the OOM killer from killing processes in the container by setting the --oom-kill-disable option. However, only use --oom-kill-disable when the -m/--memory option is also set. If -m is not set but the OOM killer is disabled, the host may end up killing its own system processes in order to free memory.

The following example limits the container's memory to 100m and disables OOM killer:

$ docker run -it -m 100m --oom-kill-disable ubuntu:16.04 /bin/bash

This is the correct way to use it.

The following container does not set a memory limit, but it is very dangerous to disable OOM killer:

$ docker run -it --oom-kill-disable ubuntu:16.04 /bin/bash

This container has no memory limit, which may cause the system to run out of memory and start killing system processes in order to get more available memory.

Generally speaking, a container runs only one process, and if that only process is killed, the container dies. We can use the --oom-score-adj option to set the container's priority for being killed when the system runs out of memory: negative values make it less likely to be killed, positive values make it more likely.
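For example (a sketch), a container started with a positive adjustment becomes a preferred victim of the OOM killer compared with containers that keep the default value of 0:

$ docker run -it -m 100m --oom-score-adj=500 ubuntu:16.04 /bin/bash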

Core memory

The difference between core (kernel) memory and user memory is that core memory cannot be swapped out. Because it cannot be swapped, a container that consumes too much core memory can block some system services. Core memory includes:

Stack pages (stack page)

Slab pages

Socket memory pressure

Tcp memory pressure

These can be constrained by setting a core memory limit. For example, every process consumes some stack pages; by limiting core memory you can prevent new processes from being created when core memory usage is too high.

Core memory and user memory are not independent, and core memory must be limited in the context of user memory restrictions.

Suppose the limit value of user memory is U and the limit value of core memory is K. There are three possible ways to limit core memory:

U != 0, K not set: no core memory limit. This is the standard default setting.

K < U: the core memory is a subset of the user memory. This setup is useful in deployments where the total memory of all cgroups is overcommitted. Overcommitting core memory limits is never recommended, because the system can still run out of non-reclaimable memory. In this case you can set K so that the sum over all cgroups never exceeds the total physical memory, and then freely set U according to the desired quality of service.

K > U: since changes in core memory are also charged to the user memory counter, both core and user memory usage of the container will trigger reclaim. This configuration gives administrators a unified view of memory, and is also useful for users who want to track core memory usage.

For example:

$ docker run -it -m 500m --kernel-memory 50m ubuntu:16.04 /bin/bash

Processes in the container can use up to 500m of memory, and of these 500m, there is up to 50m of core memory.

$ docker run -it --kernel-memory 50m ubuntu:16.04 /bin/bash

No user memory limit is set, so the processes in the container can use as much memory as they want, but at most 50m of it can be core memory.
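The core memory limit can likewise be read back from the memory cgroup (a sketch, assuming cgroup v1):

$ cat /sys/fs/cgroup/memory/docker/<container ID>/memory.kmem.limit_in_bytes    # 52428800, i.e. 50m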

Swappiness

By default, the kernel can swap out a certain percentage of a container's anonymous pages. --memory-swappiness sets this percentage and can take values from 0 to 100: 0 turns off swapping of anonymous pages, 100 means all anonymous pages may be swapped. By default, if --memory-swappiness is not set, the value is inherited from the parent cgroup.

For example:

$ docker run -it --memory-swappiness=0 ubuntu:16.04 /bin/bash

Setting --memory-swappiness to 0 keeps the container's working set in memory and avoids the performance cost of swapping.

$ docker run -tid --name mem1 --memory 128m ubuntu:16.04 /bin/bash

$ cat /sys/fs/cgroup/memory/docker/<container ID>/memory.limit_in_bytes

$ cat /sys/fs/cgroup/memory/docker/<container ID>/memory.memsw.limit_in_bytes

Overview of CPU restrictions

Docker's resource restriction and isolation are based entirely on Linux cgroups, and CPU resources are restricted the same way cgroups do it. Docker's CPU restriction options can limit which vCPUs a container may run on on a multicore system. There are two ways to limit the maximum CPU time a container can use: the first sets, relatively, the proportion of CPU time each container gets when several CPU-intensive containers compete for the CPU; the second sets, absolutely, the maximum CPU time the container can use in each scheduling period.

Parameters related to CPU restriction

All options related to the docker run command and CPU restrictions are as follows:

--cpuset-cpus="": the set of CPUs the container is allowed to run on; the value can be, for example, 0-3.

-c, --cpu-shares=0: the container's CPU shares (relative weight).

--cpu-period=0: the CPU CFS period, ranging from 1ms to 1s, that is, [1000, 1000000] (in us).

--cpu-quota=0: the CPU CFS quota; must be at least 1ms, that is, >= 1000 (in us).

--cpuset-mems="": the memory nodes (MEMs) the container is allowed to use; effective only on NUMA systems.

Here --cpuset-cpus sets which vCPU cores the container may run on; -c/--cpu-shares sets the proportion of CPU time each container gets when multiple containers compete for the CPU; and --cpu-period together with --cpu-quota set, in absolute terms, the CPU time the container may use.

--cpuset-mems is rarely needed for now, so we will not say much about it here.

CPU set

We can set which CPU cores the container can run on.

For example:

$ docker run -it --cpuset-cpus="1,3" ubuntu:14.04 /bin/bash

Indicates that processes in the container can be executed on cpu 1 and cpu 3.

$ docker run -it --cpuset-cpus="0-2" ubuntu:14.04 /bin/bash

$ cat /sys/fs/cgroup/cpuset/docker/<container ID>/cpuset.cpus

Indicates that processes in the container can run on cpu 0, cpu 1 and cpu 2.

On the NUMA system, we can set the memory nodes that the container can use.

For example:

$ docker run -it --cpuset-mems="1,3" ubuntu:14.04 /bin/bash

Indicates that processes in the container can only use memory on memory nodes 1 and 3.

$ docker run -it --cpuset-mems="0-2" ubuntu:14.04 /bin/bash

Indicates that processes in the container can only use memory on memory nodes 0, 1, 2.
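As with cpuset.cpus, the memory node setting can be read back from the cpuset cgroup (a sketch):

$ cat /sys/fs/cgroup/cpuset/docker/<container ID>/cpuset.mems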

Relative limitations of CPU resources

By default, all containers get an equal proportion of CPU cycles. When there are multiple containers competing for CPU, we can set the percentage of CPU time that each container can use. This ratio is called shared weight and is set by-c or-- cpu-shares. Docker defaults to a weight of 1024 per container. If you do not set it or set it to 0, this default value will be used. The system allocates CPU time to containers according to the shared weights of each container and the shared weights and proportions of all containers.

Suppose you have three containers that are running, and the tasks in all three containers are CPU-intensive. The cpu share weight of the first container is 1024, and the cpu share weight of the other two containers is 512. The first container will get 50% CPU time, while the other two containers will each get 25% CPU time. If you add a fourth container with a cpu share value of 1024, the CPU time for each container will be recalculated. The CPU time of the first container is 33%, and the CPU time of other containers is 16.5%, 16.5% and 33%, respectively.

It is important to note that this ratio is only useful when performing CPU-intensive tasks. On a quad-core system, assume that there are four single-process containers that can each use 100% CPU time of one core, regardless of their cpu shared weights.

On multi-core systems, CPU time weights are calculated on all CPU cores. Even if the CPU time limit of a container is less than 100%, it can use 100% of the time of each CPU core.

For example, suppose a system has three or more cores. Container C0 is started with -c 512 and runs a single process, and container C1 is started with -c 1024 and runs two processes. The CPU time could then be distributed as follows:

PID 100, container C0, on CPU 0: 100% of CPU0
PID 101, container C1, on CPU 1: 100% of CPU1
PID 102, container C1, on CPU 2: 100% of CPU2

$ docker run -it --cpu-shares=100 ubuntu:14.04 /bin/bash

$ cat /sys/fs/cgroup/cpu/docker/<container ID>/cpu.shares

Indicates that the cpu share value of the processes in the container is 100.
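A small experiment that makes the relative weights visible (a sketch, assuming the ubuntu-stress image built earlier; both containers are pinned to core 0 so that they actually compete):

$ docker run -d --name s1 --cpuset-cpus 0 -c 1024 ubuntu-stress stress --cpu 1
$ docker run -d --name s2 --cpuset-cpus 0 -c 512 ubuntu-stress stress --cpu 1
$ docker stats s1 s2    # s1 should settle at roughly twice the CPU usage of s2 (about 66% vs 33% of the core)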

Absolute limitations of CPU resources

Linux uses CFS (the Completely Fair Scheduler) to schedule each process's use of the CPU. The default scheduling period of CFS is 100ms.

For more information about CFS, refer to CFS documentation on bandwidth limiting.

We can set the scheduling period for each container and the maximum CPU time the container can use within that period. Use --cpu-period to set the scheduling period and --cpu-quota to set the CPU time the container can use in each period; the two are generally used together.

For example:

$ docker run -it --cpu-period=50000 --cpu-quota=25000 ubuntu:16.04 /bin/bash

This sets the CFS scheduling period to 50000 and the container's quota per period to 25000, which means the container can get 50% of one CPU's time every 50ms.

$ docker run -it --cpu-period=10000 --cpu-quota=20000 ubuntu:16.04 /bin/bash

$ cat /sys/fs/cgroup/cpu/docker/<container ID>/cpu.cfs_period_us

$ cat /sys/fs/cgroup/cpu/docker/<container ID>/cpu.cfs_quota_us

Here the container's CPU quota is set to twice the CFS period. How can the usable CPU time be longer than the period? The explanation is simple: the container is effectively given two vCPUs. This configuration means the container can use two vCPUs at 100% in every period.
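For example (a sketch):

$ docker run -it --cpu-period=100000 --cpu-quota=200000 ubuntu:16.04 /bin/bash    # up to two full CPUs in every 100ms period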

The valid range of the CFS period is 1ms to 1s, so the corresponding value of --cpu-period ranges from 1000 to 1000000. The container's CPU quota must be at least 1ms, so the value of --cpu-quota must be >= 1000. Both options are expressed in microseconds (us).

A correct understanding of "absolute"

Note that with --cpu-quota we only set an upper limit on the CPU time the container can use in a scheduling period; it does not mean the container will necessarily use that much CPU time. For example, start a container and bind it to cpu 1, setting both --cpu-quota and --cpu-period to 50000.

$ docker run --rm --name test01 --cpuset-cpus 1 --cpu-quota=50000 --cpu-period=50000 deadloop:busybox-1.25.1-glibc

The scheduling period is 50000, and the container can use at most 50000 of CPU time in each period, that is, the whole core.

Using docker stats test01, we can observe that the utilization rate of CPU in this container is about 100%. Then we start another container with the same parameters.

$ docker run --rm --name test02 --cpuset-cpus 1 --cpu-quota=50000 --cpu-period=50000 deadloop:busybox-1.25.1-glibc

Observing the two containers with docker stats test01 test02 shows that each container's CPU usage is about 50%, which indicates that a container does not necessarily use its full 50000 of CPU time per period.

Stop the second container with the docker stop test02 command, then start it again with the extra argument -c 2048:

$ docker run --rm --name test02 --cpuset-cpus 1 --cpu-quota=50000 --cpu-period=50000 -c 2048 deadloop:busybox-1.25.1-glibc

Using docker stats test01 test02, you can now observe that the first container's CPU usage is about 33% and the second container's is about 66%. Because the second container's share value is 2048 and the first container's default share value is 1024, the second container gets twice as much CPU time per period as the first.

Disk IO quota control

Compared with the quota controls for CPU and memory, docker's control of disk IO is relatively immature, and most of the options must be used together with host devices. The main parameters are the following:

--device-read-bps: limit the read rate (bytes per second) from a device; the unit can be kb, mb or gb.

--device-read-iops: limit the read rate from a device in IO operations per second.

--device-write-bps: limit the write rate (bytes per second) to a device; the unit can be kb, mb or gb.

--device-write-iops: limit the write rate to a device in IO operations per second.

--blkio-weight: the container's default disk IO weight. Valid values range from 10 to 1000.

--blkio-weight-device: IO weight control for a specific device, in the format DEVICE_NAME:WEIGHT.

For parameters related to storage quota control, you can refer to the blkio chapter in the Red Hat documentation for their detailed role.

Disk IO quota control example

Blkio-weight

For-blkio-weight to take effect, you need to ensure that the scheduling algorithm of IO is CFQ. You can view it in the following ways:

Root@ubuntu:~# cat / sys/block/sda/queue/scheduler

Noop [deadline] cfq

Create two containers with different --blkio-weight values using the following commands:

$ docker run -ti --rm --blkio-weight 100 ubuntu:stress

$ docker run -ti --rm --blkio-weight 1000 ubuntu:stress

Execute the following dd command in the container at the same time to test:

$ time dd if=/dev/zero of=test.out bs=1M count=1024 oflag=direct

The final output is shown as a figure in the original article (not reproduced here).

In my test environment the expected result was not achieved. From the Docker issue "blkio-weight doesn't take effect in docker Docker version 1.8.1" (#16173), we can see that this problem exists in some environments, but Docker has not given an official solution.

Device-write-bps

Use the following command to create a container and execute the command to verify the write speed limit.

$ docker run -tid --name disk1 --device-write-bps /dev/sda:1mb ubuntu:stress

Verify the write speed with dd (the output figure from the original article is not reproduced here).

You can see that the container's disk write speed is successfully limited to 1MB/s. Other disk IO limit parameters such as --device-read-bps can be verified in a similar way.
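Without the figure, the limit can still be checked roughly as follows (a sketch; the dd invocation is an assumption, and 8:0 is the major:minor number of /dev/sda on a typical system):

$ docker exec -it disk1 dd if=/dev/zero of=/tmp/test.out bs=1M count=16 oflag=direct    # should take about 16s at 1MB/s
$ cat /sys/fs/cgroup/blkio/docker/<container ID>/blkio.throttle.write_bps_device        # e.g. 8:0 1048576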

Container space size limit

When docker uses devicemapper as the storage driver, the default maximum size of each container and image is 10G. If you need to adjust it, you can specify dm.basesize in the daemon startup parameters. Note, however, that changing this value not only requires restarting the docker daemon service, but also causes all local images and containers on the host to be cleaned up.
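For example (a sketch; 20G is an arbitrary value), the daemon could be started with a larger base size:

$ dockerd --storage-driver=devicemapper --storage-opt dm.basesize=20G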

There is no such limitation when using other storage drivers such as aufs or overlay.
