Example Analysis of Docker Security Support

Updated 2025-07-04. Source: SLTechnology News&Howtos (Shulou.com)

This article walks through Docker's security support with worked examples. The material is kept simple and is meant to be followed step by step.

As one of the container technologies that places the most emphasis on security, Docker ships with strong secure defaults in many areas, including Capability limits for the container's root user, Seccomp system call filtering, AppArmor MAC access control, ulimit restrictions, pids-limit support, an image signature mechanism, and so on.

Preface

Docker uses namespaces to provide six kinds of isolation. This looks complete, but in fact Linux resources are not fully isolated. For example, directories such as /proc, /sys, and /dev/sd* are not fully isolated, and everything that lies outside the existing namespaces, such as SELinux, time, and syslog, is not isolated at all. Docker has nevertheless done a lot of work on security, including the following aspects:

1. Linux kernel Capability limits

Docker supports setting capabilities for a container, which specifies the permissions granted to it. As a result, the root user in the container has far fewer permissions than the real root. Since version 0.6, Docker also supports enabling super permissions for a container, which gives it host root permissions.

2. Image signature mechanism

Since version 1.8, Docker provides an image signature mechanism to verify the origin and integrity of an image. The feature must be enabled manually; the image publisher can then sign an image before pushing it. When pulling, Docker refuses image tags that fail verification or are unsigned.

3. AppArmor MAC access control

AppArmor can tie a process's permissions to its capabilities to enforce mandatory access control (MAC) on the process. In Docker, we can use AppArmor to restrict which commands a container may run, its network access, its file read and write permissions, and so on.

4. Seccomp system call filtering

Seccomp limits the set of system calls that a process may invoke. The default Seccomp profile provided by Docker disables about 44 of the more than 300 system calls, which still meets the needs of most containers.

5. User Namespace isolation

Namespaces isolate running processes and restrict their access to system resources without the processes being aware of the restrictions. The best way to prevent privilege escalation attacks from within a container is to configure the container's application to run as an unprivileged user. For containers whose processes must run as root inside the container, you can remap that user to a less privileged user on the Docker host. The mapped user is assigned a range of UIDs that behave as normal UIDs from 0 to 65536 inside the namespace but have no privileges on the host.
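To make the remapping concrete, here is a minimal Python sketch of how a UID inside a user namespace translates to a host UID. It assumes the three-column layout of /proc/<pid>/uid_map (first ID inside the namespace, first ID outside, range length). The function name map_uid and the starting host UID 100000 are illustrative assumptions, not values taken from this article.

```python
def map_uid(container_uid, uid_map_text):
    """Translate a UID inside a user namespace to the host UID.

    uid_map_text follows the /proc/<pid>/uid_map layout: each line holds
    "<first UID inside> <first UID outside> <length>".
    Returns None when the UID is not mapped at all.
    """
    for line in uid_map_text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container UIDs 0..65535 start at host UID 100000.
example_map = "0 100000 65536\n"

print(map_uid(0, example_map))      # container root on the host
print(map_uid(1000, example_map))   # an ordinary container user
print(map_uid(70000, example_map))  # outside the mapped range
```

Under this assumed mapping, container root (UID 0) is just an ordinary high-numbered UID on the host, which is why escaping such a container does not by itself yield host root.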

6. SELinux

SELinux mainly provides mandatory access control (MAC): access is no longer decided solely by the owner of the process and the rwx permissions of the file. It adds a further barrier after an attacker has broken out of a container. Docker supports SELinux.

7. pids-limit support

Before discussing pids-limit, we need to explain what a fork bomb is. A fork bomb creates a large number of processes very quickly, which exhausts the resources the system sets aside for processes and saturates the process table, so the system can no longer start new programs. On the subject of limiting the number of processes, you may know that the nproc setting of ulimit has a pitfall: unlike other ulimit options, nproc is a per-user setting, that is, it caps the total number of processes belonging to a user's UID. That topic is covered in the next article. Docker supports the --pids-limit option, which limits the number of processes in a container.

8. Tool support for other kernel security features

The container ecosystem offers many tools that support container security. For example, you can use the Docker Bench audit tool (https://github.com/docker/docker-bench-security) to check your Docker runtime environment, Sysdig Falco (https://sysdig.com/opensource/falco/) to detect abnormal activity in containers, and GRSEC and PAX to harden the system kernel.

In this article we look at how Docker supports the first four items; the next article covers the rest.

1. Linux kernel Capability limits

Capabilities are, simply put, the permissions granted to a process, such as permission to access the network or read files. A Docker container is essentially a process. By default, Docker drops all capabilities except the necessary ones; the complete list of available capabilities is in the Linux man pages. Since version 0.6, Docker supports the --privileged startup option, which enables super permissions for a container.

Capability support matters a great deal for container security, because we often run as root inside containers. With capability limits in place, root in the container has far fewer rights than the real root user. This means that even if an intruder manages to gain root inside the container, it is difficult to do serious damage or to gain root on the host.

When we specify the --privileged option in docker run, Docker actually does two things:

It grants the container all the capabilities of the host's root user.

It scans the host for all device files and mounts them into the container.

Here is a practical demonstration.

Without the --privileged option in docker run:

lynzabo@ubuntu:~$ docker run --rm --name def-cap-con1 -d alpine /bin/sh -c "while true; do echo hello; sleep 1; done"
f216f9261bb9c3c1f226c341788b97c786fa26657e18d7e52bee3c7f2eef755c
lynzabo@ubuntu:~$ docker inspect def-cap-con1 -f '{{.State.Pid}}'
43482
lynzabo@ubuntu:~$ cat /proc/43482/status | grep Cap
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
lynzabo@ubuntu:~$ docker exec def-cap-con1 ls /dev
core fd full mqueue null ptmx pts random shm stderr stdin stdout tty urandom zero
... 15 entries in total
lynzabo@ubuntu:~$

With the --privileged option specified:

lynzabo@ubuntu:~$ docker run --privileged --rm --name pri-cap-con1 -d alpine /bin/sh -c "while true; do echo hello; sleep 1; done"
ad6bcff477fd455e73b725afe914b82c8aa6040f36326106a9a3539ad0be03d2
lynzabo@ubuntu:~$ docker inspect pri-cap-con1 -f '{{.State.Pid}}'
44312
lynzabo@ubuntu:~$ cat /proc/44312/status | grep Cap
CapInh: 0000003fffffffff
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
lynzabo@ubuntu:~$ docker exec pri-cap-con1 ls /dev
agpgart autofs bsg btrfs-control bus core cpu_dma_latency cuse dmmidi dri ecryptfs
... 186 entries in total
lynzabo@ubuntu:~$

Comparing /proc/$pid/status shows the difference between the capability bitmaps of the two container processes: the container started with --privileged has the same capability bitmap as the superuser. Comparing the contents of /dev shows that with --privileged all of the host's device files are mounted into the container.

As you can see, --privileged grants the container far too many permissions, so use it with caution. If you need a specific device, use --device to mount only the devices you need into the container instead of mounting all of the host's devices. For example, to mount the host's sound card into the container:
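The two bitmaps are easier to compare once they are decoded into capability names. The Python sketch below does that for the CapEff values shown in the two transcripts. The bit numbering follows the kernel's linux/capability.h header; the helper name decode_caps is an illustrative choice, and the table stops at bit 37, the highest bit set in the privileged bitmap above.

```python
# Capability names indexed by bit number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_caps(hex_mask):
    """Return the names of the capabilities set in a /proc/<pid>/status mask."""
    mask = int(hex_mask, 16)
    names = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            # Bits beyond the table (newer kernels) are reported by number.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"bit_{bit}")
    return names


default_caps = decode_caps("00000000a80425fb")     # container without --privileged
privileged_caps = decode_caps("0000003fffffffff")  # container with --privileged

print(len(default_caps), default_caps)
print(len(privileged_caps))
print(sorted(set(privileged_caps) - set(default_caps)))
```

The default container holds 14 capabilities, while the privileged one holds all 38 in the table, including powerful ones such as CAP_SYS_ADMIN and CAP_SYS_MODULE.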

$ docker run --device=/dev/snd:/dev/snd ...

In addition, the container's capabilities can be adjusted with the --cap-add and --cap-drop options to make the container as secure as possible.

For example, to give the container the capability to modify the system time:

$ docker run --cap-drop ALL --cap-add SYS_TIME ntpd /bin/sh

Look up the container's PID and run getpcaps PID to view the process's capabilities. The result is as follows:

[root@VM_0_6_centos ~]# getpcaps 652
Capabilities for `652': = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_time,...
[root@VM_0_6_centos ~]#

You can see that the sys_time capability has been added to the container, so you can modify the system time.
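How the two options combine can be sketched as a small set computation. This is a simplified Python model, not Docker's code: the function name effective_caps is hypothetical, the default set is the one decoded from the a80425fb bitmap in the demonstration above, and the model assumes that drops are applied before adds and that dropping ALL empties the set.

```python
# Default capability set of a container (the bitmap a80425fb from the
# demonstration above), written without the CAP_ prefix as on the command line.
DEFAULT_CAPS = {
    "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "SETGID", "SETUID",
    "SETPCAP", "NET_BIND_SERVICE", "NET_RAW", "SYS_CHROOT", "MKNOD",
    "AUDIT_WRITE", "SETFCAP",
}


def effective_caps(cap_add=(), cap_drop=()):
    """Simplified model of --cap-add / --cap-drop: drop first, then add.

    "ALL" in cap_drop empties the default set. "ALL" in cap_add is not
    modelled here and is rejected.
    """
    if "ALL" in cap_add:
        raise NotImplementedError("cap_add=ALL is outside this sketch")
    caps = set() if "ALL" in cap_drop else DEFAULT_CAPS - set(cap_drop)
    return caps | set(cap_add)


# --cap-drop ALL --cap-add SYS_TIME
print(sorted(effective_caps(cap_add=["SYS_TIME"], cap_drop=["ALL"])))
# --cap-add SYS_TIME only
print(len(effective_caps(cap_add=["SYS_TIME"])))
# --cap-drop NET_RAW only
print("NET_RAW" in effective_caps(cap_drop=["NET_RAW"]))
```

Under this model, --cap-drop ALL --cap-add SYS_TIME leaves only SYS_TIME, whereas a listing that still shows cap_chown and the other defaults next to cap_sys_time corresponds to adding SYS_TIME on top of the default set.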

2. Docker image signature mechanism

When we run docker pull, the image registry first verifies the user's identity and then returns a manifest.json file containing the image name, the tag, the SHA256 value of every layer, and the image signature information. The Docker daemon then downloads the layer files in parallel. Since version 1.8, Docker provides a digital signature mechanism, content trust, to verify the origin and integrity of images from the official repository. Put simply, the image publisher can choose whether to sign an image tag when creating the image. When the image is pulled, it is verified against this signature; if the two match, the source is considered reliable and the image is downloaded.

Content trust is turned off by default. To turn it on, set an environment variable:

$ export DOCKER_CONTENT_TRUST=1

With content trust enabled, docker does not pull image tags that fail verification or are unsigned. You can lift this restriction for a single pull by adding --disable-content-trust.
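The integrity half of this check can be sketched in a few lines of Python: recompute a layer's SHA256 digest and compare it with the value listed in the manifest. This only illustrates the digest comparison. Real content trust also verifies a publisher signature over the tag, which is not modelled here, and the function name and sample bytes are hypothetical.

```python
import hashlib
import hmac


def layer_matches_digest(layer_bytes, expected_digest):
    """Check layer content against a digest of the form "sha256:<hex>"."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    actual_hex = hashlib.sha256(layer_bytes).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual_hex, expected_hex)


layer = b"example layer content"  # stand-in for a downloaded layer blob
digest = "sha256:" + hashlib.sha256(layer).hexdigest()

print(layer_matches_digest(layer, digest))                # content is intact
print(layer_matches_digest(layer + b"tampered", digest))  # content was altered
```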

3. AppArmor MAC access control

AppArmor and SELinux are both Linux security modules that can tie a process's permissions to its capabilities to enforce mandatory access control (MAC) on the process. Because SELinux is somewhat complex, it is often simply turned off, while AppArmor is comparatively simple. Docker also officially recommends this approach.

Docker automatically generates and loads a default profile named docker-default for containers. In Docker 1.13.0 and later, the Docker binary generates the profile in tmpfs and then loads it into the kernel. In Docker versions earlier than 1.13.0, the profile is generated in /etc/apparmor.d/docker. docker-default is the default profile for running containers. It is moderately protective while providing wide application compatibility.

Note: this profile applies to containers, not to the Docker daemon. The docker-default policy is used when running a container unless it is overridden with the --security-opt option.

Let's use Nginx to demonstrate a custom AppArmor profile:

1. Create a custom profile, assuming the file path is /etc/apparmor.d/containers/docker-nginx.

#include <tunables/global>

profile docker-nginx flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  ...
  deny network raw,
  ...
  deny /bin/** wl,
  deny /root/** wl,
  deny /bin/sh mrwklx,
  deny /bin/dash mrwklx,
  deny /usr/bin/top mrwklx,
  ...
}

2. Load the configuration file

$ sudo apparmor_parser -r -W /etc/apparmor.d/containers/docker-nginx

3. Run the container using this configuration file

$ docker run --security-opt "apparmor=docker-nginx" -p 80:80 -d --name apparmor-nginx nginx

4. Enter the running container and try some operations to test whether the configuration is effective:

$ docker container exec -it apparmor-nginx bash
root@6da5a2a930b9:~# ping 8.8.8.8
ping: Lacking privilege for raw socket.
root@6da5a2a930b9:/# top
bash: /usr/bin/top: Permission denied
root@6da5a2a930b9:~# touch ~/thing
touch: cannot touch 'thing': Permission denied
root@6da5a2a930b9:/# sh
bash: /bin/sh: Permission denied

As you can see, we can protect the container through the apparmor configuration file.
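To confirm from the host which profile confines a container process, you can read that process's AppArmor label from /proc. The Python sketch below parses the label text; it assumes the usual "profile (mode)" format of /proc/<pid>/attr/current, and the helper names are illustrative. The PID would come from docker inspect, as in the capability demonstration earlier.

```python
def parse_apparmor_label(text):
    """Split the content of /proc/<pid>/attr/current into (profile, mode)."""
    label = text.strip()
    if label.endswith(")") and " (" in label:
        profile, _, mode = label[:-1].rpartition(" (")
        return profile, mode
    return label, None  # for example "unconfined", which carries no mode


def apparmor_label_of(pid):
    """Read and parse the AppArmor label of a running process (host side)."""
    with open(f"/proc/{pid}/attr/current") as handle:
        return parse_apparmor_label(handle.read())


print(parse_apparmor_label("docker-nginx (enforce)\n"))
print(parse_apparmor_label("unconfined\n"))
```

A container started with the custom profile above would be expected to report docker-nginx in enforce mode, while an unconfined process reports no mode at all.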

4. Seccomp system call filtering

Seccomp is a security mechanism supported by the Linux kernel since version 2.6.23. It limits the set of system calls that a process may invoke. Linux exposes a large number of system calls directly to user-mode programs, but not all of them are needed, and abuse of system calls by unsafe code is a security threat to the system. With Seccomp we can stop a program from using certain system calls, which reduces the system's attack surface and puts the program into a "safe" state. Each time a process makes a system call, the kernel checks the corresponding whitelist to decide whether the process is permitted to use it. Seccomp support was added to Docker's security features in Docker 1.10.

Seccomp can only be used if Docker was built with Seccomp and CONFIG_SECCOMP is enabled in the kernel. You can check whether the kernel supports Seccomp as follows:

$ cat /boot/config-`uname -r` | grep CONFIG_SECCOMP=

CONFIG_SECCOMP=y
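Besides the kernel configuration, you can check whether a given process actually runs under a seccomp filter by looking at the Seccomp field of /proc/<pid>/status. The following Python sketch parses that field; the function name and the sample status text are illustrative, and the numeric values (0 disabled, 1 strict, 2 filter) are the ones documented for that field.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode named in the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split(":", 1)[1]), "unknown")
    return None  # field absent: kernel built without seccomp support


# Hypothetical excerpt of a container process's status file.
sample = "Name:\tnginx\nSeccomp:\t2\nSeccomp_filters:\t1\n"
print(seccomp_mode(sample))
```

A container running with a seccomp profile is expected to show the filter mode, whereas a process without any seccomp restriction shows disabled.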

The default seccomp profile provides a sensible setup for running containers with seccomp and disables about 44 of the more than 300 system calls. It is moderately protective while providing wide application compatibility. The default Docker profile can be found in the moby source tree under profiles/seccomp/.

The default seccomp profile snippet is as follows:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "archMap": [
    {
      "architecture": "SCMP_ARCH_X86_64",
      "subArchitectures": [
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ]
    },
    ...
  ],
  "syscalls": [
    {
      "names": [
        "reboot"
      ],
      "action": "SCMP_ACT_ALLOW",
      "args": [],
      "comment": "",
      "includes": {
        "caps": [
          "CAP_SYS_BOOT"
        ]
      },
      "excludes": {}
    },
    ...
  ]
}

A seccomp profile consists of three parts: the default action (defaultAction), the Linux architectures supported for system calls (archMap), and the specific system call rules (syscalls). In each rule, names lists the system calls the rule applies to, action is what seccomp does when one of those calls occurs, and args holds constraints on the call's arguments. For example, the SCMP_ACT_ALLOW action above means that the reboot system call is allowed for a process that holds CAP_SYS_BOOT, so such a process can restart the system.

In fact, the profile is a whitelist that blocks access to all system calls by default, and then whitelists specific system calls.
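The whitelist logic can be sketched as a small lookup in Python. This is a simplified reading of the profile format, not how the kernel evaluates filters: only names, action, and includes.caps are honoured, while args, excludes, and the architecture map are ignored. The function name action_for is hypothetical, and the profile dictionary reuses the reboot rule from the snippet above.

```python
def action_for(profile, syscall, caps=frozenset()):
    """Return the action a profile assigns to a syscall.

    Simplified: a rule applies when it names the syscall and the process
    holds every capability listed under includes.caps. Anything that no
    rule matches falls back to defaultAction.
    """
    for rule in profile.get("syscalls", []):
        if syscall not in rule.get("names", []):
            continue
        required = rule.get("includes", {}).get("caps", [])
        if all(cap in caps for cap in required):
            return rule["action"]
    return profile["defaultAction"]


profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["reboot"],
            "action": "SCMP_ACT_ALLOW",
            "args": [],
            "includes": {"caps": ["CAP_SYS_BOOT"]},
            "excludes": {},
        },
    ],
}

print(action_for(profile, "reboot", {"CAP_SYS_BOOT"}))  # rule matches
print(action_for(profile, "reboot"))                    # capability missing
print(action_for(profile, "mount"))                     # no rule at all
```

Only the first call is allowed; the other two fall through to the default action, which is what makes the profile a whitelist.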

Seccomp helps to run the Docker container with minimal privileges. Changing the default seccomp profile is not recommended.

If it is not overridden with the --security-opt option when running a container, the default profile is used. For example, the following explicitly specifies a profile:

$ docker run --rm \
    -it \
    --security-opt seccomp=/path/to/seccomp/profile.json \
    hello-seccomp

Thank you for reading. That concludes this example analysis of Docker security support. Hopefully you now have a deeper understanding of the topic; how best to apply these features still needs to be verified in practice.
