How to solve the problem of missing containerd Image Files 07/09 Update SLTechnology News&Howtos


2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--


While providing technical support to a customer, we recently ran into files missing from containerd images. Through analysis, inference, reproduction and troubleshooting, we found the root cause and a solution. This article walks through the whole process. We hope it offers a useful troubleshooting approach and helps you understand the underlying principles.

The problem: files missing from containerd images

Some customers reported a strange phenomenon: files disappeared from certain container images. After reproducing it, we summarized the behavior as follows:

Certain images consistently lose files.

The loss reproduces consistently on some Linux distributions, but not on Ubuntu.

containerd v1.2 loses files; v1.3 does not.

By reading the source code and documentation, I eventually solved the problem. I wrote this article to share both the troubleshooting experience and how image layers are generated. For readers in a hurry, the answer comes first.

Root cause and solutions

The cause is a bug in the kernel overlay module. containerd downloads an image's compressed layer tarballs from the registry and unpacks them into snapshot layers. During that unpacking, overlay mistakenly copies the trusted.overlay.opaque=y xattr from a lower-layer directory to the same directory in the upper layer. Overlay treats a directory with this attribute as opaque. In the union mount, that directory therefore hides the same-named directories beneath it, and the image's files appear to be lost.

There are two solutions. The first is simple and direct: fix the bug at its source by upgrading the kernel to a version whose overlay module contains the fix.

The second is to upgrade containerd from v1.2 to v1.3. containerd v1.3 sets the opaque attribute explicitly itself, so it does not trigger the overlayfs bug. Of course, this only works around the bug rather than fixing it.

How the snapshotter generates image layers

The root cause is simple, but the analysis was tortuous. Before describing the troubleshooting process and what we learned, this section covers the containerd and overlayfs background involved. Readers who already know it, or are not interested, can skip it.

Unlike the original monolithic design of the docker daemon, containerd is built from multiple plug-in modules to reduce coupling. As the following figure shows, the modules involved in image handling are:

Metadata is a key-value store that containerd implements on top of bbolt. It stores meta-information such as images, containers and layers. For example, when ctr lists all snapshots or kubelet fetches all pods, the data is queried through the metadata module.

Content is the module that stores blobs. For an image, there are generally three kinds of content:

The image manifest (a plain JSON document that specifies the image's config and its array of layers)

The image config (also JSON, describing the image's meta-information, such as the startup command and environment variables)

The image layers (tarballs that become the image's layers after decompression and processing)

Snapshots is the general term for the snapshotter modules. containerd can be configured to use different snapshotters, such as overlayfs, aufs or native. During unpack, the snapshotter generates the image layers and saves them to the filesystem. When a container runs, the snapshotter is called to provide the container's rootfs.

There are three main container image specifications: docker, and oci v1 and v2. They are broadly similar in principle, so the following example is enough. Think of the manifest as the single piece of meta-information each image has, pointing to the image's config and layers. The config is the image configuration needed to run the image as a container, and each layer is one layer of the image.

    type manifest struct {
        config config
        layers []layer
    }
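The following minimal Go sketch makes the manifest-to-config/layers relationship concrete. It decodes a document shaped like an OCI image manifest (schemaVersion, config, layers). The type and function names are illustrative, not containerd's own types, and the digests are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor points at one blob in the content store.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// Manifest follows the shape of an OCI image manifest: one config blob and an
// ordered list of layer blobs, where index 0 is the bottom layer.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
}

// parseManifest decodes a manifest blob read from the content store.
func parseManifest(data []byte) (Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return Manifest{}, fmt.Errorf("parse manifest: %w", err)
	}
	return m, nil
}

func main() {
	// Placeholder digests; real ones are "sha256:" plus 64 hex characters.
	raw := []byte(`{
  "schemaVersion": 2,
  "config": {"mediaType": "application/vnd.oci.image.config.v1+json", "digest": "sha256:aaa", "size": 1470},
  "layers": [
    {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:bbb", "size": 2811},
    {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:ccc", "size": 512}
  ]
}`)
	m, err := parseManifest(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("config:", m.Config.Digest)
	for i, l := range m.Layers {
		fmt.Printf("layer %d (0 = bottom): %s\n", i, l.Digest)
	}
}
```

Layers come back in array order, bottom first, which is the order in which the snapshotter later unpacks them.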

The image download process follows the numbered steps in figure 1. Each step works as follows:

First, the image is added to the metadata module, so that it appears when images are listed.

Second, the image itself is downloaded. An image consists of a manifest, a config and layers. The manifest is downloaded first and saved to the content module. It is then parsed to obtain the addresses of the config and the layers, which are in turn downloaded and saved to the content module. Note that an image layer is ultimately a directory, union-mounted as the container's root when the container is created. For convenient network transfer and storage, however, it is distributed as a compressed tarball, and it is saved to content in that form, still compressed.

Steps ③, ④ and ⑤ are closely related and are explained together. The snapshot module reads the manifest from the content module and finds all the image layers. It then reads these layers from content from bottom to top, decompresses and processes them one by one, and finally places them under the snapshot module's directory, such as 1001/fs and 1002/fs in figure 1. These are the image layers. (When a container is created, these layers are union-mounted to produce the container's rootfs, which can be understood as 1001/fs + 1002/fs + ... => 1008/work.)

Figure 2 shows the function call chain of the whole process, for readers who like to read the source code.

For clarity, from here on "layer" means a layer in the snapshot. A newly downloaded, unprocessed layer is called an image layer tarball, or simply a tarball.

Downloading the image and saving it to content is relatively simple, so we skip it. Generating snapshot layers from the image tarballs is more ingenious, and it is also where the bug appears, so it is described in detail next.

First, the image's manifest is read from content, which tells us which layers make up the image. The bottom layer is simple: it is just decompressed into a directory provided by the snapshotter, such as 10/fs.

Now suppose the next step is to generate the second layer in 11/fs, which is still empty. The snapshotter runs mount -t overlay overlay -o lowerdir=10/fs,upperdir=11/fs,workdir=11/work tmp. This mounts the already generated layer 10 and the not-yet-generated layer 11 on a tmp directory, with 11/fs, the layer being generated, as the writable layer. The snapshotter then fetches the tarball for layer 11 from content and traverses it. For each entry it writes or deletes files under the mount point tmp. Because of the union mount, operations on the mount point become operations on the writable layer.

The logic for converting a tarball into a layer follows the simplified source below. If the tarball contains a whiteout file, or if an entry in the current layer (e.g. 11/fs) conflicts with the previous layer (e.g. 10/fs), the lower entry is deleted. After each tarball entry is written, xattrs are set on the file according to the PAXRecords stored in the tarball. PAXRecords can be seen as a key-value array attached to every file in the tar, used to carry file attributes into the filesystem.

    // tmp is the overlay mount point
    func applyNaive(tar, tmp) {
        for tar.HasNext() {
            tar_file := tar.Next()                     // entry in the tar package
            real_file := path.Join(tmp, tar_file.Name) // real file under the mount point
            // delete files
            if isWhiteout(tar_file) {
                whiteRM(real_file) // remove the entry the whiteout refers to
                continue           // the whiteout marker itself is not written
            }
            if !(tar_file.IsDir() && IsDir(real_file)) {
                rm(real_file)
            }
            // write the file from the tar package into the layer
            createFileOrDir(tar_file, real_file)
            for k, v := range tar_file.PAXRecords {
                setxattr(real_file, k, v)
            }
        }
    }

The cases that require deletion are summarized as follows:

If a directory with the same name exists in both layers, the two are merged.

If an entry with the same name exists but the two are not both directories, the lower entry must be deleted (upper file over lower directory, upper directory over lower file, upper file over lower file).

If there is a .wh. file, the lower entry it covers must be removed. For example, if a directory contains a .wh..wh..opq file, the corresponding directory in lowerdir must be deleted.

Of course, the deletion is not that simple. Remember that the lower files are being deleted through the mount point. In overlay, deleting lower content through the mount point does not actually remove the file from the lower directory. Instead, a whiteout is added to the upper layer. One way to add a whiteout is to set the xattr trusted.overlay.opaque=y on the upper directory.

When the traversal of the tarball finishes, tmp is unmounted, and 11/fs is the layer we want. To generate layer 12/fs, 11/fs and 10/fs are mounted as the lowerdir and 12/fs as the upperdir. Overlay lists lower layers top first, so the option is lowerdir=11/fs:10/fs. In other words, generating each subsequent layer requires mounting all the previous layers. The following figure illustrates the whole process.
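Only the option string changes from one layer to the next. The following minimal sketch builds it, assuming the <id>/fs and <id>/work layout from the 10/fs, 11/fs, 12/fs example. overlayOptions is a hypothetical helper, not containerd's snapshotter code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// overlayOptions builds the -o argument for unpacking layer id on top of its
// already committed parents, given bottom to top. overlayfs stacks lowerdir
// entries from the rightmost (bottom) to the leftmost (top), so the parent
// list is reversed. The bottom layer needs no overlay mount at all.
func overlayOptions(root string, parents []string, id string) string {
	if len(parents) == 0 {
		return "" // bottom layer: decompress straight into <root>/<id>/fs
	}
	lowers := make([]string, 0, len(parents))
	for i := len(parents) - 1; i >= 0; i-- {
		lowers = append(lowers, filepath.Join(root, parents[i], "fs"))
	}
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowers, ":"),
		filepath.Join(root, id, "fs"),
		filepath.Join(root, id, "work"))
}

func main() {
	// Layer IDs follow the example in the text.
	fmt.Println("mount -t overlay overlay -o", overlayOptions(".", []string{"10"}, "11"), "tmp")
	fmt.Println("mount -t overlay overlay -o", overlayOptions(".", []string{"10", "11"}, "12"), "tmp")
}
```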

Why go to so much trouble? There are two key reasons.

First, deleting lower-level files in an image follows the definition of whiteout files in the OCI image-spec. A whiteout file is only a marker inside the tarball and has no effect by itself. What actually matters is that when applyNaive encounters a whiteout file, it calls the union filesystem to delete the lower directory. For overlay, that deletion is recorded as an opaque marker.

Second, files and directories can overwrite each other, so every file in a tarball has to be compared against the contents of all previous tarballs. Without borrowing the "superpower" of the union filesystem, each file in the tarball would have to be checked by traversing all previous layers.

The troubleshooting process

With this background on images, let's look at how the problem was tracked down. First, we examined the user's container. After simplification and anonymization, its directory structure is as follows. The modules directory is where the files go missing.

    /data
    └── prom
        ├── bin
        └── modules
            ├── file
            └── lib/

Next, we looked at the layers of the user's image. Numbering the layers with increasing IDs from the bottom up, layers 5099, 5101, 5102, 5103 and 5104 modify this directory. In the running container, the modules directory looks exactly as it does in 5104. Nothing from 5103 or the other layers below is merged in, as if 5104 overwrote the lower directories (and 5104 and 5103 do contain different files).

Why does 5104 overwrite the lower directory?

My first thought was that the parameters used to create the container's rootfs were wrong, so that fewer layers were mounted. So I manually mounted the top two layers with mount -t overlay overlay -o lowerdir=5104:5103 point. 5104 still covered 5103. I then suspected an overlay whiteout (.wh.) file, but searching both layers for .wh. files found nothing. So I looked up the overlayfs documentation:

A directory is made opaque by setting the xattr "trusted.overlay.opaque" to "y". Where the upper filesystem contains an opaque directory, any directory in the lower filesystem with the same name is ignored.

In other words, a directory with trusted.overlay.opaque=y set becomes "opaque". When a directory in the upper filesystem is opaque, same-named directories in the lower layers are ignored. Overlay sets this attribute whenever the upper layer needs to hide a lower directory.

Running getfattr -n "trusted.overlay.opaque" on the directory showed that /data/asr_offline/modules in 5104 does have this attribute. That is why the lower directories are "overwritten".

    [root@] $ getfattr -n "trusted.overlay.opaque" 5104/fs/data/asr_offline/modules
    # file: 5102 qpxqfqpxxxxxxxxxxxxxxxxxxxxxxxxx

After all these twists and turns, the question became: why does this only happen on certain distributions? We pulled the image on Ubuntu and found that the same directory was not set opaque! Image layers are generated by decompressing and unpacking the source tarballs. So we first confirmed that the md5 of the source tarballs was identical across operating systems. We then unpacked them with tar -zxf on each system and mounted them again manually, and found that 5104 did not overwrite 5103.

From this we inferred that on some distributions, containerd was mistakenly setting this attribute on snapshot directories. This would happen while it read the tarballs from content and unpacked them into snapshot layers.

To verify this, I combed through the source code and found a suspicious spot (code below). When generating a layer, the traversal of the tarball reads each file's PAXRecords and sets them as the file's xattrs. (The tarball carries PAXRecords for each file, somewhat like a Pod's labels.)

    func applyNaive() {
        // ...
        for k, v := range tar_file.PAXRecords {
            setxattr(real_file, k, v)
        }
    }

    func setxattr(path, key, value string) error {
        return unix.Lsetxattr(path, key, []byte(value), 0)
    }

We had tested containerd v1.3 earlier and it did not have this problem. Comparing the code of the two versions showed that the logic for setting xattrs from the tarball's PAXRecords differs. The v1.3 code is:

    func setxattr(path, key, value string) error {
        // Do not set trusted attributes
        if strings.HasPrefix(key, "trusted.") {
            return errors.Wrap(unix.ENOTSUP, "admin attributes from archive not supported")
        }
        return unix.Lsetxattr(path, key, []byte(value), 0)
    }

In other words, v1.3.0 does not set xattrs whose names begin with trusted.! Suppose a directory in the tarball carries a PAX record for trusted.overlay.opaque=y. The older containerd sets it on the snapshot directory, while the newer one does not. So if the user's packaging also wrote the opaque attribute into the tarball, the corresponding directory of the unpacked layer would carry it too. That could explain why the 5104 directory became opaque.
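The difference between the two versions can be reproduced without root by injecting the setxattr call. The following is a minimal sketch; applyXattrs and setFunc are hypothetical. It assumes xattrs travel in the tarball as SCHILY.xattr.* PAX records, the convention used by Go's archive/tar. The quoted v1.3 function returns an ENOTSUP error for trusted.* keys; this sketch simply skips and reports them instead. The code above does not show how the caller reacts to that error.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const schilyXattr = "SCHILY.xattr."

// setFunc stands in for unix.Lsetxattr so the logic runs without root.
type setFunc func(path, key, value string) error

// applyXattrs replays the xattr PAX records of one tar entry onto path.
// filterTrusted=false copies every xattr, like the v1.2 setxattr above;
// filterTrusted=true leaves out trusted.* names, as the v1.3 check does.
func applyXattrs(path string, pax map[string]string, filterTrusted bool, set setFunc) ([]string, error) {
	keys := make([]string, 0, len(pax))
	for k := range pax {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the example
	var skipped []string
	for _, k := range keys {
		if !strings.HasPrefix(k, schilyXattr) {
			continue // not an xattr record (e.g. mtime)
		}
		name := strings.TrimPrefix(k, schilyXattr)
		if filterTrusted && strings.HasPrefix(name, "trusted.") {
			skipped = append(skipped, name)
			continue
		}
		if err := set(path, name, pax[k]); err != nil {
			return skipped, fmt.Errorf("setxattr %s %s: %w", path, name, err)
		}
	}
	return skipped, nil
}

func main() {
	pax := map[string]string{
		"SCHILY.xattr.trusted.overlay.opaque": "y",
		"SCHILY.xattr.user.note":              "kept",
	}
	show := func(p, k, v string) error {
		fmt.Printf("  setxattr(%s, %s=%s)\n", p, k, v)
		return nil
	}
	fmt.Println("v1.2-style:")
	if _, err := applyXattrs("modules", pax, false, show); err != nil {
		panic(err)
	}
	fmt.Println("v1.3-style:")
	skipped, err := applyXattrs("modules", pax, true, show)
	if err != nil {
		panic(err)
	}
	fmt.Println("  skipped:", skipped)
}
```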

To test this, I wrote a simple program to scan the content of each layer for this attribute. Layers 5102, 5103 and 5104 did not have it. At this point I began to doubt the theory: if the flag existed only in the tarball, it should not behave differently across operating systems.

With a last glimmer of hope I scanned 5099 and 5101, and sure enough, neither had the attribute. During the scan, however, I noticed the file /data/asr_offline/modules/.wh..wh..opq in the tarball of 5101. Recall from applyNaive that encountering .wh..wh..opq means deleting /data/asr_offline/modules at the mount point. In overlay, deleting a lower directory adds trusted.overlay.opaque=y to the same-named upper directory. In other words, 5101 is generated with 5100 and 5099 mounted beneath it in advance. When the traversal of its tarball reaches this whiteout file, modules is first deleted at the mount point, which adds opaque=y to the corresponding directory in 5101.

Once again, to verify the source-code reading, I checked the modules directory in the snapshot's 5101/fs. The files it whites out live in the lower layers, so the corresponding overlayfs operation should set trusted.overlay.opaque=y on /data/asr_offline/modules in the upper layer, that is, layer 5101. As expected, the directory in 5101 had the attribute. Curiosity drove me to check the same directory in 5102, 5103 and 5104, and all of them had it too.

Does that mean each of these layers overwrites the ones below? That defies common sense. On the healthy Ubuntu system, only 5101 had the attribute. I repeatedly confirmed that the tarballs of 5102, 5103 and 5104 contain no whiteout file for the modules directory. So the image's intent is clear: 5101 should cover the layers below it, and the modules directories of 5101, 5102, 5103 and 5104 should be merged. In the whole image generation process, only one step involves the operating system: the one that "borrows" overlay to generate the snapshot layers.

The clouds parted and the fog cleared, and a bold guess emerged. As shown below, could a bug in the kernel's overlay module have added the opaque attribute to modules when layer 5102 was generated?

To test this behavior in isolation, I wrote a simple script. Running it showed what happens on this distribution when a lower overlay directory has this attribute and the same directory is created in the upper layer: the opaque attribute "propagates" to the upper directory. When layers are generated recursively, as containerd does, every layer above the whiteout layer inherits the attribute. As a result, the container sees only the top layer in certain directories.

    #!/bin/bash
    mkdir 1 2 work p
    mkdir 1/func
    touch 1/func/min
    mount -t overlay overlay p -o lowerdir=1,upperdir=2,workdir=work
    rm -rf p/func
    mkdir -p p/func
    touch p/func/max
    umount p
    getfattr -n "trusted.overlay.opaque" 2/func
    mkdir 3
    mount -t overlay overlay p -o lowerdir=2:1,upperdir=3,workdir=work
    touch p/func/sqrt
    umount p
    getfattr -n "trusted.overlay.opaque" 3/func

With the help of several kernel experts, this was finally confirmed to be a bug in the kernel overlayfs module. When copy_up is performed on a lower-layer directory, the xattr is not checked, so the opaque xattr propagates to the upper layer. In a union mount, an upper directory that acquires this attribute naturally hides the lower directory, and files disappear from the image.
