How to analyze Docker file system

2025-04-19 Update From: SLTechnology News&Howtos

Let's build a Docker image, start a container from it, and analyze in detail how Docker stores files across that life cycle, along with strategies for optimizing the Dockerfile.

Before getting hands-on, let's introduce one concept: the union file system (Union File System). A union file system is the technical basis of Docker images. It records each modification to the file system as a commit and stacks these commits layer by layer, and it can mount different directories into a single virtual file system at the same time. The layered storage and inheritance of images are built on this feature.
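The layering idea can be sketched in a few lines of Python. This is only an illustrative model, not how any storage driver is implemented: each layer is a dictionary of path to content, and a value of None is a hypothetical marker meaning "this layer deletes the path". The file names are made up for the example.

```python
# Minimal model of a union file system: each layer is a dict of path -> content,
# and None marks a path that the layer deletes. Illustration only.

def merge_layers(layers):
    """Overlay layers from bottom to top and return the merged view."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                merged.pop(path, None)   # a later layer hides the file
            else:
                merged[path] = content   # a later layer adds or replaces the file
    return merged

base = {"/etc/os-release": "debian stretch", "/bin/bash": "bash binary"}
add_jdk = {"/usr/local/nim/jdk.tar.gz": "archive"}
unpack_jdk = {
    "/usr/local/nim/jdk1.8.0_202/bin/java": "java binary",
    "/usr/local/nim/jdk.tar.gz": None,   # removed in this layer
}

view = merge_layers([base, add_jdk, unpack_jdk])
for path in sorted(view):
    print(path)
```

Note that the archive disappears from the merged view while the layer that added it is left untouched, which is the behavior the rest of this article examines.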

Here is the official Docker diagram of the file system, showing how a union file system joins the image layers and the container layer.

Docker supports a variety of storage drivers, most commonly aufs, devicemapper, overlay, and overlay2. This article uses Debian 9.1 with Docker 17.06.2-ce, where the default driver is overlay2.

If you now have a basic picture of the Docker file system, let's get started and look more closely at how the layered file system is stored.

Image layer

This is a base JDK 8 image built on the Debian system image for the Yunxin privatization project. To make it easier to read and analyze, we have simplified the Dockerfile, leaving only the core content.

FROM hub.c.163.com/library/debian:stretch
MAINTAINER nim
# download jdk
ADD http://10.173.11.100/nim/jdk-8u202-linux-x64.tar.gz /usr/local/nim/
# decompress jdk and delete
RUN tar -xzvf /usr/local/nim/jdk-8u202-linux-x64.tar.gz -C /usr/local/nim/ \
    && rm /usr/local/nim/jdk-8u202-linux-x64.tar.gz
# set environment variables
ENV JAVA_HOME=/usr/local/nim/jdk1.8.0_202
ENV PATH=$JAVA_HOME/bin:$PATH
CMD ["/bin/bash"]

Build the image from this Dockerfile and check the result. The original base image is 100 MB, and the built image is 697 MB.

Image storage

Now let's look at how the built image is stored at the file level. First, run docker history on the image we just built. The base image occupies 100 MB, and the two new image layers occupy 194 MB and 403 MB.

Next, let's look at the storage in the file system. The storage driver in this environment is overlay2, and the default path for Docker image layer storage is /var/lib/docker/overlay2/. There are four directories under the image storage directory: one of 110 MB corresponds to the base image, and two others hold the ADD JDK layer (186 MB) and the unpacked JDK layer (389 MB).

The l directory contains symbolic links to all layers. The links use short names so that the arguments passed to mount do not hit the page size limit.
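A rough calculation shows why short names matter. The sketch below is an illustration under stated assumptions: a page size of 4096 bytes, 64-character layer directory names, 26-character short names, and a made-up stack of 60 layers. The IDs are generated from hashes purely as stand-ins; real short names are random identifiers.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes; the real value depends on the platform
STORAGE_ROOT = "/var/lib/docker/overlay2"

def fake_layer_id(index):
    """Stand-in for a 64-character layer directory name."""
    return hashlib.sha256(str(index).encode()).hexdigest()

def lowerdir_option(paths):
    """Join layer paths the way an overlay lowerdir option lists them."""
    return "lowerdir=" + ":".join(paths)

layer_ids = [fake_layer_id(i) for i in range(60)]  # hypothetical 60-layer image

# Full paths to each layer's diff directory versus short links under l/.
long_option = lowerdir_option(f"{STORAGE_ROOT}/{lid}/diff" for lid in layer_ids)
short_option = lowerdir_option(f"l/{lid[:26].upper()}" for lid in layer_ids)

print("full paths: ", len(long_option), "bytes, fits:", len(long_option) <= PAGE_SIZE)
print("short links:", len(short_option), "bytes, fits:", len(short_option) <= PAGE_SIZE)
```

With these assumed numbers the full-path form overflows a single page while the short-link form fits comfortably.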

Let's look at the contents of each layer. The base image layer contains a diff folder and a link file: the diff folder stores the content of the layer, and the link file records the layer's short name.

Next, look at the layer generated by ADD JDK. Its diff folder stores the JDK archive. Compared with the base image layer, this layer has three more entries: lower, merged, and work. The lower file records the layers beneath this one (here, the base image layer), the merged directory provides the unified view when a layer serves as a container's read-write layer, and the work directory is the working directory required by the union mount; it is used internally and is not visible to users.
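The layout described above can be inspected with a small helper. The sketch below builds a throwaway directory that mimics one layer so that it runs without Docker; the layer name and short names are hypothetical, and the colon-separated l/ format of the lower file is an assumption about how overlay2 writes it.

```python
import tempfile
from pathlib import Path

def describe_layer(layer_dir):
    """Summarize one overlay2-style layer directory."""
    layer_dir = Path(layer_dir)
    link = layer_dir / "link"
    lower = layer_dir / "lower"
    return {
        # 'link' holds this layer's short name
        "short_name": link.read_text().strip() if link.is_file() else None,
        # 'lower' lists the layers beneath, assumed colon-separated
        "lower_layers": lower.read_text().strip().split(":") if lower.is_file() else [],
        "has_diff": (layer_dir / "diff").is_dir(),
        "has_work": (layer_dir / "work").is_dir(),
    }

# Mimic the layout in a temporary directory (all names are made up).
with tempfile.TemporaryDirectory() as root:
    layer = Path(root) / "hypothetical-layer-id"
    (layer / "diff").mkdir(parents=True)
    (layer / "work").mkdir()
    (layer / "link").write_text("SHORTNAMEOFTHISLAYER")
    (layer / "lower").write_text("l/SHORTNAMEOFPARENT:l/SHORTNAMEOFBASE")
    info = describe_layer(layer)

print(info)
```

Pointing describe_layer at a real directory under /var/lib/docker/overlay2/ would normally require root access.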

The directory structure of the unpacked JDK layer is similar to that of the previous layer. The point to note is that the JDK archive appears in this layer with a size of 0, indicating that it has been deleted.

Now let's focus on one problem: the image size equals the sum of all its layers, so the JDK archive deleted in a later layer still takes up storage space. That is not what we intended, and it is an opportunity to optimize the image. The optimized Dockerfile is as follows:

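FROM hub.c.163.com/library/debian:stretch
MAINTAINER nim
# download, unpack, and delete the JDK archive in a single layer
# (assumes curl is available in the base image)
RUN mkdir -p /usr/local/nim \
    && curl -o /usr/local/nim/jdk-8u202-linux-x64.tar.gz http://10.173.11.100/nim/jdk-8u202-linux-x64.tar.gz \
    && tar -xzvf /usr/local/nim/jdk-8u202-linux-x64.tar.gz -C /usr/local/nim/ \
    && rm /usr/local/nim/jdk-8u202-linux-x64.tar.gz
# ENV persists into the image; an export inside RUN is lost when that RUN step ends
ENV JAVA_HOME=/usr/local/nim/jdk1.8.0_202
ENV PATH=$JAVA_HOME/bin:$PATH
CMD ["/bin/bash"]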

With this optimized Dockerfile in mind, here are the points that can save time and space when building a Docker image:

Combine RUN statements: merging build statements of the same type effectively reduces the number of image layers.

Use the build cache: put fixed steps such as time synchronization and base software installation early in the Dockerfile, so that rebuilds reuse the cached layers and save time.

Clean up intermediate files: software and packages used only during installation must be removed in the same layer that created them; otherwise they still occupy image space.

Optimize build statements: for example, ADD can unpack a local archive directly, doing the work of COPY plus RUN tar.

Optimize base mirror sources: universities and large IT companies in China run mirror sites. Choosing a stable, up-to-date mirror can effectively shorten build time.

The optimized Dockerfile applies three of these points: curl replaces ADD and is merged with the unpack and delete steps into one layer, the Dockerfile has fewer layers, and the JDK archive is cleaned up in the same layer that downloaded it. The following figure shows the change in image size after optimization:

Is it always better to build an image with as few layers as possible? Not necessarily. Especially when early versions of an image are unstable or iterations are frequent, reasonable layering reduces build time, lowers the chance of errors, and makes the Dockerfile more readable. The image can be optimized further once a stable version has taken shape.
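The size accounting behind this optimization can be sketched as simple arithmetic. The original layer sizes are the ones reported by docker history earlier in this article. The optimized layer size is an assumption made for illustration: it supposes the single merged RUN layer holds only the unpacked JDK and is about as large as the earlier unpack layer.

```python
# Layer sizes in MB for the original image, as reported earlier in the article.
original_layers = [
    ("base image (debian:stretch)", 100),
    ("ADD jdk archive", 194),
    ("RUN tar + rm (archive hidden, not reclaimed)", 403),
]

# Assumed sizes for the optimized image (not taken from the article).
optimized_layers = [
    ("base image (debian:stretch)", 100),
    ("RUN curl + tar + rm in one layer", 403),
]

def image_size(layers):
    """An image's size is the sum of its layers, deleted files included."""
    return sum(size for _, size in layers)

original_total = image_size(original_layers)
optimized_total = image_size(optimized_layers)

print(f"original:  {original_total} MB")
print(f"optimized: {optimized_total} MB (assumed)")
print(f"saved:     {original_total - optimized_total} MB")
```

The saving equals the size of the layer that only existed to carry the archive, which is why cleanup has to happen in the layer that created the file.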

Image metadata

To analyze image metadata, we focus on three directories:

/var/lib/docker/image/overlay2/imagedb/

/var/lib/docker/image/overlay2/layerdb/

/var/lib/docker/overlay2/

The first directory holds the basic image metadata, the second holds the layer metadata, and the third is the layer storage directory described above, which holds the actual layer content. Let's look at how the metadata relates to the stored content in practice.

The basic information of a Docker image is saved under /var/lib/docker/image/overlay2/imagedb/content/sha256/. The file whose name begins with the image ID can be found in this directory. It stores the image's layer list, build information, related containers, and other details as JSON.
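A short sketch of reading such a JSON file follows. The sample document is hand-written for illustration: the digests are placeholders and the commands are abbreviated, so none of the values come from a real image. The field names (rootfs.diff_ids, history, created_by, empty_layer) follow the image configuration format as commonly documented; treat the pairing logic as an assumption about that format rather than Docker's own code.

```python
import json

# Hand-written sample with placeholder digests and abbreviated commands.
sample_config = json.loads("""
{
  "architecture": "amd64",
  "os": "linux",
  "config": {"Env": ["JAVA_HOME=/usr/local/nim/jdk1.8.0_202"], "Cmd": ["/bin/bash"]},
  "rootfs": {"type": "layers",
             "diff_ids": ["sha256:aaaa", "sha256:bbbb", "sha256:cccc"]},
  "history": [
    {"created_by": "base image file system"},
    {"created_by": "MAINTAINER nim", "empty_layer": true},
    {"created_by": "ADD jdk archive"},
    {"created_by": "RUN tar and rm"},
    {"created_by": "ENV JAVA_HOME", "empty_layer": true}
  ]
}
""")

def layers_with_commands(config):
    """Pair each layer digest with the build step that produced it."""
    diff_ids = iter(config["rootfs"]["diff_ids"])
    pairs = []
    for step in config["history"]:
        if step.get("empty_layer"):
            continue  # metadata-only steps create no file system layer
        pairs.append((next(diff_ids), step["created_by"]))
    return pairs

pairs = layers_with_commands(sample_config)
for diff_id, command in pairs:
    print(diff_id, "<-", command)
```

Steps such as MAINTAINER and ENV appear in the history but contribute no layer content, which matches the three sized layers seen earlier.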

The second directory, /var/lib/docker/image/overlay2/layerdb/sha256/, stores the layer metadata. Each layer's metadata directory contains cache-id, diff, and size files: cache-id points to the layer's directory in the layer storage, and diff links the layer back to the basic image metadata.
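One detail helps when following these links by hand: the directories under layerdb/sha256/ are, to the best of our understanding, named by chain ID rather than by the diff ID stored in the diff file. The sketch below derives chain IDs using the definition from the OCI image specification (the first chain ID is the first diff ID; each later one is the SHA-256 of the previous chain ID, a space, and the next diff ID). The diff IDs are placeholders, and the mapping to directory names is an assumption to verify against your own Docker version.

```python
import hashlib

LAYERDB = "/var/lib/docker/image/overlay2/layerdb/sha256"

def chain_ids(diff_ids):
    """Compute chain IDs from an ordered list of diff IDs (bottom layer first)."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            chain.append(diff_id)  # the bottom layer's chain ID is its diff ID
        else:
            joined = f"{chain[-1]} {diff_id}".encode()
            chain.append("sha256:" + hashlib.sha256(joined).hexdigest())
    return chain

# Placeholder diff IDs; real ones come from the image JSON described above.
diff_ids = ["sha256:" + char * 64 for char in "abc"]

ids = chain_ids(diff_ids)
for chain_id in ids:
    hex_part = chain_id.split(":", 1)[1]
    # The cache-id file here would name the directory under /var/lib/docker/overlay2/.
    print(f"{LAYERDB}/{hex_part}/cache-id")
```

Reading each printed cache-id file on a real host would give the name of the matching directory in the layer storage.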

Container layer

First, let's start a container and mount the host directory /opt/yunxin at /usr/local/yunxin in the container.

After the container is created, its init layer and read-write layer are generated in the storage directory /var/lib/docker/overlay2/. Both use the same ID, with the init layer carrying a -init suffix. The init layer mainly stores environment information set up when the container is initialized, such as the container hostname, hosts information, and DNS resolver files. The read-write layer holds the container's reads and writes: processes in the container have write permission only on the read-write layer and read-only access to files in the other layers.
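Because of this naming rule, container layers can be told apart from image layers just by looking at directory names. A minimal sketch, using made-up directory names in place of real IDs:

```python
INIT_SUFFIX = "-init"

def container_layers(dir_names):
    """Return IDs that have both a read-write directory and a matching -init directory."""
    names = set(dir_names)
    found = []
    for name in names:
        if name.endswith(INIT_SUFFIX):
            layer_id = name[: -len(INIT_SUFFIX)]
            if layer_id in names:
                found.append(layer_id)
    return sorted(found)

# Hypothetical listing of /var/lib/docker/overlay2/ (real names are long IDs).
listing = ["l", "base-layer", "add-jdk-layer", "unpack-jdk-layer",
           "container-1", "container-1-init"]

print(container_layers(listing))
```

Everything not returned by the helper, apart from l, is an image layer shared by all containers.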

Next, we enter the container and perform a series of operations, then analyze how the read-write layer stores and handles the files. The operations, the corresponding results, and the actual file storage in the read-write layer are as follows.

| No. | Type | Operation | Result |
| --- | --- | --- | --- |
| 1 | Write a new file | Write /root/container_file.txt | Written to the read-write layer |
| 2 | Write a new file in the mounted directory | Write /usr/local/yunxin/mount_file.txt | Not written to the read-write layer; saved only in the mounted directory |
| 3 | Modify an original image file | Modify /usr/local/nim/jdk1.8.0_202/THIRDPARTYLICENSEREADME.txt | Written to the read-write layer |
| 4 | Delete an original image file | Delete /usr/local/nim/jdk1.8.0_202/README.html | Saved in the read-write layer |

The merged folder in the read-write layer presents the user with the unified view that results from the union mount.
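The four operations can be replayed against a toy model. This is a simplified illustration, not Docker's implementation: the class name, the dictionaries standing in for layers, and the set used to record deletions are all hypothetical, and the file contents are made up.

```python
class ContainerFS:
    """Toy model of a container file system: read-only image, read-write layer, mounts."""

    def __init__(self, image_files, mounts=None):
        self.lower = dict(image_files)   # read-only image layers, already merged
        self.upper = {}                  # the container's read-write layer
        self.whiteouts = set()           # deletions recorded in the read-write layer
        self.mounts = mounts if mounts is not None else {}  # container prefix -> host dir

    def _mount_for(self, path):
        for prefix, host_dir in self.mounts.items():
            if path.startswith(prefix.rstrip("/") + "/"):
                return host_dir
        return None

    def write(self, path, data):
        host_dir = self._mount_for(path)
        if host_dir is not None:
            host_dir[path] = data        # goes to the host, bypassing the read-write layer
            return
        self.whiteouts.discard(path)
        self.upper[path] = data          # new file, or modified copy of an image file

    def delete(self, path):
        host_dir = self._mount_for(path)
        if host_dir is not None:
            host_dir.pop(path, None)
            return
        self.upper.pop(path, None)
        if path in self.lower:
            self.whiteouts.add(path)     # hide the image file; the image is unchanged

    def merged(self):
        view = {p: d for p, d in self.lower.items() if p not in self.whiteouts}
        view.update(self.upper)
        for host_dir in self.mounts.values():
            view.update(host_dir)
        return view


jdk = "/usr/local/nim/jdk1.8.0_202"
image = {f"{jdk}/THIRDPARTYLICENSEREADME.txt": "license text", f"{jdk}/README.html": "readme"}
host_opt_yunxin = {}
fs = ContainerFS(image, mounts={"/usr/local/yunxin": host_opt_yunxin})

fs.write("/root/container_file.txt", "new file")              # row 1
fs.write("/usr/local/yunxin/mount_file.txt", "on the host")   # row 2
fs.write(f"{jdk}/THIRDPARTYLICENSEREADME.txt", "edited")      # row 3
fs.delete(f"{jdk}/README.html")                               # row 4

print("read-write layer:", sorted(fs.upper))
print("deletions:       ", sorted(fs.whiteouts))
print("host directory:  ", sorted(host_opt_yunxin))
```

In the model, as in the table, only rows 1, 3, and 4 leave a trace in the read-write layer, and the image's own copy of each file stays intact.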

Next, we start several container instances from the same image and then query the disk usage of the containers. Only the first container takes up space, 154 KB, because the files modified above amount to only 154 KB; the newly launched containers take up no additional space. When containers are created from the same image, they all share the contents of the image layers, which saves space. The read-write layer stores only the changes, and when a file from an image layer is modified, Docker uses a copy-on-write strategy. If you now look back at the two diagrams from the first section, you will have a deeper understanding of Docker's file system.
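The saving from sharing image layers is easy to quantify. The sketch below uses the 154 KB read-write layer reported above and assumes, for illustration only, that the containers were started from the 697 MB image built earlier and that three containers are running.

```python
IMAGE_MB = 697.0            # assumed: the un-optimized image built earlier
rw_layers_kb = [154, 0, 0]  # first container modified files; the others did not

rw_total_mb = sum(rw_layers_kb) / 1024

# Layers shared: the image is stored once, plus each container's changes.
shared_total_mb = IMAGE_MB + rw_total_mb

# Hypothetical alternative: every container keeps a private copy of the image.
copied_total_mb = len(rw_layers_kb) * IMAGE_MB + rw_total_mb

print(f"with shared layers: {shared_total_mb:.2f} MB")
print(f"with full copies:   {copied_total_mb:.2f} MB")
print(f"saved by sharing:   {copied_total_mb - shared_total_mb:.2f} MB")
```

Each additional container costs only the size of its own read-write layer, which stays at zero until the container changes something.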

Knowledge of the Docker image and container file system provides theoretical support for image management and operations storage management in Yunxin privatized products, but it is only the beginning of an in-depth understanding of Docker. As time passes and the privatization of IM, audio and video, VOD, and many related Yunxin products deepens, we face more modules and images, more customers and requirements, and more complex networks and environments. Docker is the foundation of the Yunxin privatized service, and only a deeper understanding of its principles lets us optimize the products and operate them well. We hope to provide users with a more reliable Yunxin privatization service and to share more Docker knowledge in later articles.
