Shulou (Shulou.com), SLTechnology News&Howtos, updated 2025-10-26
As Docker's official slogan puts it: "Build once, Run anywhere, Configure once, Run anything". Docker is known for being lightweight, starting in seconds, supporting versioning and being portable, and these advantages drew a lot of attention from the moment it appeared. Today Docker is no longer just a tool for the development and testing phase; it is widely used in production environments. This article introduces a pitfall related to container isolation. Before that, let's review the underlying implementation of Docker containers.
The underlying implementation of the container
 
As is well known, virtual machines and containers are built on different underlying principles.
A virtual machine isolates resources by running an independent Guest OS and using a Hypervisor to virtualize the CPU, memory, I/O devices and so on. To virtualize memory, for example, the Hypervisor creates a shadow page table, whereas normally a single page table is enough to translate virtual memory to physical memory. Compared with the way a virtual machine isolates resources and environment, Docker is much simpler. Unlike a virtual machine, it does not load another operating system kernel; booting and loading a kernel costs both time and resources. Docker achieves isolation using features of the Linux kernel, so starting a container is almost as fast as starting a process directly.
Docker's implementation can be summarized briefly as follows:
Namespaces isolate the system environment. They give a process and its children a private view, visible only to themselves, of shared host kernel resources (network stack, process list, mount points and so on). Processes in the same Namespace see each other's changes and know nothing about processes outside it, as if they were running on an operating system of their own.
CGroups limit resource usage within that environment, for example allowing a container to use only 2 cores and 4GB on a 16-core, 32GB machine. CGroups can also assign weights to resources, account for usage, and control the starting and stopping of tasks (processes or threads).
Image management uses Docker's image layering, copy-on-write, content addressing and union mount techniques to provide a complete container file system and runtime environment. Combined with an image registry, images can be downloaded and shared quickly, which makes deployment across multiple environments convenient.
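To make the Namespace idea concrete, here is a minimal Python sketch. It relies on the Linux convention that each entry under /proc/<pid>/ns is a symlink whose target names the namespace, for example pid:[4026531836]. The function names read_namespaces and shared_namespaces are illustrative, not part of Docker or any library.

```python
import os


def read_namespaces(ns_dir="/proc/self/ns"):
    """Return {name: identifier} for the namespace symlinks under ns_dir.

    Each entry is a symlink whose target looks like 'pid:[4026531836]'.
    Two processes are in the same namespace when the identifiers match.
    """
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        if os.path.islink(path):
            result[name] = os.readlink(path)
    return result


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])


# Usage on a Linux host (not executed here):
#   mine = read_namespaces()                 # this process
#   init = read_namespaces("/proc/1/ns")     # PID 1, may need root
#   print(shared_namespaces(mine, init))
```

Run inside a container and on the host, the identifiers for pid, net, mnt and so on would normally differ, which is exactly the isolation described above.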
Because Docker does not virtualize a Guest OS the way a virtual machine does, but instead uses the host's resources and shares the host's kernel, the following problems exist:
Note: a problem is not necessarily a security risk. Docker is one of the container technologies that takes security most seriously and ships with strong secure defaults in many areas, including Capability restrictions for the container's root user, Seccomp system call filtering, AppArmor MAC access control, ulimit restrictions, and an image signature mechanism.
1. Docker uses CGroups to restrict resources. CGroups can only cap the maximum amount of a resource that is consumed; they cannot stop other programs from taking resources the container needs.
2. The six Namespace isolations look complete, but in fact Linux resources are not fully isolated. Directories such as /proc, /sys and /dev/sd* are not completely isolated, and everything not covered by an existing Namespace, such as SELinux, time and syslog, is not isolated at all.
Pitfalls of container isolation
 
You may have run into these problems when using containers:
1. When you run top, free and similar commands in a Docker container, the resource usage you see is that of the host. What you actually want to know is how much CPU and memory the container is limited to and how much the processes in the container are using.
2. If you modify /etc/sysctl.conf in the container, you get the message "sysctl: error setting key 'net.ipv4 ….': Read-only file system".
3. A program running in the container calls an API to get the system's memory and CPU and receives the host's resource sizes.
4. For multi-process programs, the number of workers can usually be set to auto so that it adapts to the number of CPU cores of the system. In a container, however, the number of cores obtained this way is wrong; Nginx is one example. Other applications may also get the wrong core count and need to be tested.
These problems share the same root cause. On Linux, many commands calculate resource usage by reading files under /proc or /sys. Take the free command as an example:
lynzabo@ubuntu:~$ strace free
execve("/usr/bin/free", ["free"], [/* 66 vars */]) = 0
...
statfs("/sys/fs/selinux", 0x7ffec90733a0) = -1 ENOENT (No such file or directory)
statfs("/selinux", 0x7ffec90733a0) = -1 ENOENT (No such file or directory)
open("/proc/filesystems", O_RDONLY) = 3
...
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
...
open("/proc/meminfo", O_RDONLY) = 3
+++ exited with 0 +++
lynzabo@ubuntu:~$
The same applies to language runtimes such as Java and NodeJS. Take NodeJS as an example:
const os = require('os')
const total = os.totalmem()
const free = os.freemem()
// percentage of memory in use
const usage = (total - free) / total * 100
NodeJS also gets its memory information by reading the /proc/meminfo file. The same goes for Java.
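As a rough illustration of what such runtimes do, the sketch below parses /proc/meminfo-style text in Python. The function names parse_meminfo and used_percent are hypothetical, and the sample figures are made up so the example is deterministic. Inside a container the real file would normally report the host's totals, which is the problem described here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in bytes}.

    Lines look like 'MemTotal:       16384 kB'. Fields without a
    'kB' unit (for example HugePages_Total) are kept as plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def used_percent(info):
    """Percentage of memory in use, computed from MemTotal and MemFree."""
    total = info["MemTotal"]
    return (total - info["MemFree"]) / total * 100


# Made-up sample; on Linux you would pass open("/proc/meminfo").read().
SAMPLE = "MemTotal:       16384 kB\nMemFree:         4096 kB\nHugePages_Total:    0\n"
print(used_percent(parse_meminfo(SAMPLE)))
```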
As is well known, the JVM's default maximum Heap size is 1/4 of system memory. If the physical machine has 10G of memory and you do not set the Heap size manually, the JVM's default Heap size is 2.5G. Only from Java SE 8 (8u131) onward can the JVM take the container's limits into account.
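The arithmetic can be sketched as follows. The 1 GB container limit is a hypothetical figure chosen for illustration, and default_max_heap is not a JVM API, only a restatement of the 1/4 rule above.

```python
GIB = 1024 ** 3


def default_max_heap(visible_memory_bytes):
    """Default maximum heap under the 1/4-of-memory rule described above."""
    return visible_memory_bytes // 4


# The JVM sees the 10 GB host, even inside a container.
host_based = default_max_heap(10 * GIB)
# What the same rule would give for a hypothetical 1 GB container limit.
limit_based = default_max_heap(1 * GIB)

print(host_based / GIB, limit_based / GIB)  # prints: 2.5 0.25
```

A 2.5 GB default heap inside a container limited to 1 GB is how a JVM can end up exceeding its memory limit without any explicit misconfiguration.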
When you start a container, Docker calls libcontainer to manage it, which includes creating Namespaces such as UTS, IPC and Mount to isolate the container and using CGroups to limit its resources. In the process, Docker mounts some host directories into the container read-only, including /proc, /dev, /dev/shm and /sys, and sets up the following links (each written as target -> link name):
/proc/self/fd -> /dev/fd
/proc/self/fd/0 -> /dev/stdin
/proc/self/fd/1 -> /dev/stdout
/proc/self/fd/2 -> /dev/stderr
This ensures that system I/O works without problems, and it is also why processes in the container end up reading the host's resources.
Knowing this, how can we get the container instance's resource usage from inside the container? Here are two methods.
Read from CGroups
 
Since version 1.8, Docker mounts the CGroups information assigned to a container into the container, so a program in the container can obtain the container's resource information by parsing it.
You can run the mount command in the container to see these mounts:
...
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (ro,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
...
We will not explain here exactly how CGroups limits CPU and memory. Instead, we describe how compute resource management in the Kubernetes orchestration engine maps onto container CGroups:
When requests is specified for a Pod, requests.cpu is passed to the docker run command as the value of the --cpu-shares parameter. This parameter takes effect when several containers on a host compete for CPU: the larger the value, the larger the share of CPU time the container is given. requests.memory is not passed to Docker as a parameter; it is used in Kubernetes resource QoS management.
When limits is specified for a Pod, limits.cpu is passed to the docker run command as the value of the --cpu-quota parameter. Another docker run parameter, --cpu-period, defaults to 100000. Together these two parameters cap the number of CPU cores the container can use. limits.memory is passed to the docker run command as the --memory parameter to limit the container's memory. Kubernetes currently does not support limiting the size of Swap, so it is recommended to disable Swap when deploying Kubernetes.
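The mapping above can be sketched in Python. This is an approximation under stated assumptions, not Kubernetes code: CPU is expressed in millicores, one full core is assumed to correspond to 1024 CPU shares, and the lower bounds of 2 shares and 1000 microseconds of quota are assumptions as well. Only the default period of 100000 comes from the text.

```python
CPU_PERIOD_US = 100000  # default --cpu-period, as stated above
SHARES_PER_CORE = 1024  # assumption: weight given to one full core
MIN_SHARES = 2          # assumption: smallest accepted shares value
MIN_QUOTA_US = 1000     # assumption: smallest accepted quota


def cpu_shares(request_millicores):
    """--cpu-shares value derived from requests.cpu (in millicores)."""
    return max(MIN_SHARES, request_millicores * SHARES_PER_CORE // 1000)


def cpu_quota(limit_millicores, period_us=CPU_PERIOD_US):
    """--cpu-quota value derived from limits.cpu (in millicores)."""
    return max(MIN_QUOTA_US, limit_millicores * period_us // 1000)


# A Pod with requests.cpu=500m and limits.cpu=2 (2000m):
print(cpu_shares(500), cpu_quota(2000))
```

With these assumptions a limit of 2 cores becomes a quota of 200000 per 100000 period, meaning two cores' worth of CPU time in each scheduling period.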
Since Kubernetes 1.10 you can also assign a fixed number of CPUs to a Pod. We will not describe that in detail here and will focus on conventional compute resource management. The following briefly covers how to read the CGroups resource limits of a container when Kubernetes is the orchestration engine:
1. Read container CPU cores
# this value divided by 100000 is the number of cores available to the container
~ # cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
400000
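A program that sizes its worker pool could apply the same division itself. The sketch below assumes the CGroups v1 layout shown above and that a quota of -1 means no limit is set. The helper names are illustrative.

```python
import math


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def container_cpu_limit(quota_us, period_us=100000):
    """CPU limit in cores, or None when no quota is set (quota of -1)."""
    if quota_us <= 0:
        return None
    return quota_us / period_us


def worker_count(quota_us, period_us, host_cpus):
    """Workers to start: the container limit rounded up, capped by host CPUs."""
    limit = container_cpu_limit(quota_us, period_us)
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


# Usage inside a container (CGroups v1 paths, not executed here):
#   quota = read_int("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
#   period = read_int("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
#   workers = worker_count(quota, period, os.cpu_count())
print(container_cpu_limit(400000))  # 4.0 cores for the value shown above
```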
2. Get the container memory usage (USAGE / LIMIT)
~ # cat /sys/fs/cgroup/memory/memory.usage_in_bytes
4289953792
~ # cat /sys/fs/cgroup/memory/memory.limit_in_bytes
4294967296
Dividing the first value by the second gives the fraction of memory in use; multiply by 100 to get a percentage.
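As a small worked example, the following Python sketch performs that division on the two files. It assumes the CGroups v1 memory directory shown above, and the function names are illustrative.

```python
import os


def memory_usage_percent(usage_bytes, limit_bytes):
    """Memory in use as a percentage of the container's limit."""
    return usage_bytes / limit_bytes * 100


def read_memory_usage(cgroup_dir="/sys/fs/cgroup/memory"):
    """Return (usage, limit) in bytes from a CGroups v1 memory directory."""
    def read(name):
        with open(os.path.join(cgroup_dir, name)) as f:
            return int(f.read().strip())
    return read("memory.usage_in_bytes"), read("memory.limit_in_bytes")


# With the values shown above, the container is at about 99.88% of its limit.
print(round(memory_usage_percent(4289953792, 4294967296), 2))
```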
3. Check whether the OOM killer is enabled for the container and whether OOM has occurred
~ # cat /sys/fs/cgroup/memory/memory.oom_control
oom_kill_disable 0
under_oom 0
Here is an explanation:
oom_kill_disable defaults to 0, which means the OOM killer is enabled: processes are killed when memory use exceeds the limit. You can pass --oom-kill-disable to docker run to set this value to 1 and turn the OOM killer off.
under_oom only indicates whether the CGroup is currently in an OOM state; if so, it is shown as 1.
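A program could interpret these two fields along the following lines. The sketch only assumes the "name value" line format shown above, and the function names are hypothetical.

```python
def parse_oom_control(text):
    """Parse memory.oom_control text into {field: int}."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            state[parts[0]] = int(parts[1])
    return state


def oom_killer_enabled(state):
    """True when oom_kill_disable is 0, i.e. the OOM killer is on."""
    return state.get("oom_kill_disable", 0) == 0


def is_under_oom(state):
    """True when the CGroup is currently in an OOM state."""
    return state.get("under_oom", 0) == 1


# The output shown above: killer enabled, no OOM in progress.
state = parse_oom_control("oom_kill_disable 0\nunder_oom 0\n")
print(oom_killer_enabled(state), is_under_oom(state))
```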
4. Get the container's disk I/O
~ # cat /sys/fs/cgroup/blkio/blkio.throttle.io_service_bytes
253:16 Read 20015124480
253:16 Write 24235769856
253:16 Sync 0
253:16 Async 44250894336
253:16 Total 44250894336
Total 44250894336
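This file is easy to parse because each per-device line has three fields (device as major:minor, operation, bytes) and the final line has two. The Python sketch below assumes exactly that layout. The function name parse_io_service_bytes is illustrative.

```python
def parse_io_service_bytes(text):
    """Parse blkio.throttle.io_service_bytes text.

    Returns (per_device, total), where per_device maps a 'major:minor'
    device to {operation: bytes} and total is the grand total, or None
    if the final 'Total' line is missing.
    """
    per_device = {}
    total = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            device, operation, value = parts
            per_device.setdefault(device, {})[operation] = int(value)
        elif len(parts) == 2 and parts[0] == "Total":
            total = int(parts[1])
    return per_device, total


# The output shown above.
SAMPLE = """253:16 Read 20015124480
253:16 Write 24235769856
253:16 Sync 0
253:16 Async 44250894336
253:16 Total 44250894336
Total 44250894336
"""
devices, total = parse_io_service_bytes(SAMPLE)
print(devices["253:16"]["Read"], devices["253:16"]["Write"], total)
```

In the sample, Read plus Write equals the Total for the device, which is a handy sanity check when consuming this file.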
5. Get the inbound and outbound traffic of the container's virtual NIC
~ # cat /sys/class/net/eth0/statistics/rx_bytes
10167967741
~ # cat /sys/class/net/eth0/statistics/tx_bytes
15139291335
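These counters are cumulative, so a traffic rate comes from the difference between two readings. The Python sketch below assumes the interface is named eth0, as in the example above. The function names are illustrative.

```python
import os


def read_net_bytes(iface="eth0", base="/sys/class/net"):
    """Return (rx_bytes, tx_bytes) for a network interface."""
    stats = os.path.join(base, iface, "statistics")

    def read(name):
        with open(os.path.join(stats, name)) as f:
            return int(f.read().strip())

    return read("rx_bytes"), read("tx_bytes")


def throughput(before, after, seconds):
    """Bytes per second (rx, tx) between two (rx, tx) samples."""
    return tuple((b - a) / seconds for a, b in zip(before, after))


# Usage inside a container (not executed here):
#   first = read_net_bytes()
#   time.sleep(10)
#   print(throughput(first, read_net_bytes(), 10))
```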
If you are interested in reading CGroups information from inside the container, the docker stats source code is a useful reference implementation.
Use LXCFS
 
Out of habit, people still commonly want to use commands such as top and free inside a container, but /proc and /sys in the container still reflect the host. An open source project, LXCFS, addresses this. LXCFS is a user-space file system built on FUSE; with it, top, free and similar commands continue to work in the container. Note, however, that LXCFS may have many problems, so using it in a production environment is not recommended.
Summary
Containers have brought a lot of convenience, and many companies have moved or are moving their workloads to containers. During migration you need to know whether the problems described above affect the normal operation of your applications and take appropriate measures to avoid these pitfalls.