How to reduce the size of Docker images by 99%

2025-07-06 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to reduce the size of Docker images by 99%. Many people run into this problem in practice, so let's walk through how to deal with it.

1. The root of all evil

I bet everyone who builds a Docker image from their own code for the first time is shocked by the size of the image. Let's take a look at an example.

Let's bring out the tried-and-tested hello world C program:

    /* hello.c */
    #include <stdio.h>

    int main() {
        puts("Hello, world!");
        return 0;
    }

And use the following Dockerfile to build the image:

    FROM gcc
    COPY hello.c .
    RUN gcc -o hello hello.c
    CMD ["./hello"]

You will then find that the resulting image is well over 1 GB, because it contains the entire contents of the gcc image.

If you instead use the Ubuntu image, install the C compiler, and then compile the program, you will get an image of about 300 MB, which is much smaller than the one above. But that is still not small enough, because the compiled executable is less than 20 kB:

    $ ls -l hello
    -rwxr-xr-x 1 root root 16384 Nov 18 14:36 hello
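The article does not show the Dockerfile for the Ubuntu variant. A minimal sketch might look like the following. The package names gcc and libc6-dev, and the use of --no-install-recommends, are assumptions rather than details from the original, and the resulting image size will vary with the Ubuntu release.

```dockerfile
FROM ubuntu
# Assumption: gcc plus the libc headers are enough to compile hello.c.
# Removing the apt lists afterwards keeps this layer a little smaller.
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*
COPY hello.c .
RUN gcc -o hello hello.c
CMD ["./hello"]
```

Even in this form the compiler stays in the final image, which is why the result is still hundreds of MB for a program of a few kB.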

Similarly, the Go version of hello world gives the same kind of result:

    package main

    import "fmt"

    func main() {
        fmt.Println("Hello, world!")
    }

The image built from the golang base image is 800 MB, while the compiled executable is only 2 MB:

    $ ls -l hello
    -rwxr-xr-x 1 root root 2008801 Jan 15 16:41 hello

That is still far from ideal. Is there a way to drastically reduce the image size? Read on.

To make it easier to compare the sizes of different images, all images use the same image name with different tags, for example hello:gcc, hello:ubuntu, hello:thisweirdtrick, and so on. That way you can run docker images hello to list every image named hello without being distracted by other images.

2. Multi-stage builds

Multi-stage builds are essential for significantly reducing image size. The idea of a multi-stage build is simple: "I don't want to include a bunch of C or Go compilers and the entire compilation toolchain in the final image, I just want the compiled executable!"

A multi-stage build can be recognized by its multiple FROM instructions. Each FROM instruction starts a new build stage, and a stage can be named with the AS keyword, for example:

    FROM gcc AS mybuildstage
    COPY hello.c .
    RUN gcc -o hello hello.c

    FROM ubuntu
    COPY --from=mybuildstage hello .
    CMD ["./hello"]

This example uses the gcc base image to compile hello.c, and then starts a new build stage that uses ubuntu as the base image and copies the hello executable from the previous stage into the final image. The final image is 64 MB, roughly 95% smaller than the previous 1.1 GB:

    $ docker images minimage
    REPOSITORY   TAG                  ...   SIZE
    minimage     hello-c.gcc          ...   1.14GB
    minimage     hello-c.gcc.ubuntu   ...   64.2MB

Can we continue to optimize? Of course. Before we proceed with the optimization, let me remind you:

You don't have to use the AS keyword when declaring a build stage. When copying files in the final stage, you can refer to an earlier stage by its index (starting from 0). That is, the following two lines are equivalent:

    COPY --from=mybuildstage hello .
    COPY --from=0 hello .

If the Dockerfile is not very complex and has only a few build stages, you can refer to stages by index. Once the Dockerfile becomes complex and the number of stages grows, it is better to name each stage with the AS keyword, which also makes later maintenance easier.

Use classic base images

I strongly recommend using classic base images in the first stage of the build, where classic images means images such as CentOS, Debian, Fedora and Ubuntu. You may also have heard of the Alpine image. Don't use it! At least not for now. I'll explain the pitfalls later.

COPY --from uses absolute paths

When copying files from a previous build stage, the path is interpreted relative to the root directory of that stage. If you use the golang image as the base image for the build stage, you will run into this problem. Suppose you use the following Dockerfile to build the image:

    FROM golang
    COPY hello.go .
    RUN go build hello.go

    FROM ubuntu
    COPY --from=0 hello .
    CMD ["./hello"]

You will see an error like this:

    COPY failed: stat /var/lib/docker/overlay2/1be...868/merged/hello: no such file or directory

This is because the COPY instruction tries to copy /hello, while the WORKDIR of the golang image is /go, so the real path to the executable is /go/hello.

Of course you can use the absolute path to solve this problem, but what if the base image changes its WORKDIR later? You would have to keep updating the absolute path, so this approach is not very elegant. The best way is to set WORKDIR explicitly in the first stage and copy the file by absolute path in the second stage, so that even if the base image changes its WORKDIR, the build is unaffected. For example:

    FROM golang
    WORKDIR /src
    COPY hello.go .
    RUN go build hello.go

    FROM ubuntu
    COPY --from=0 /src/hello .
    CMD ["./hello"]

The final result is impressive, reducing the image size from 800 MB to 66 MB:

    $ docker images minimage
    REPOSITORY   TAG                              ...   SIZE
    minimage     hello-go.golang                  ...   805MB
    minimage     hello-go.golang.ubuntu-workdir   ...   66.2MB

3. The magic of FROM scratch

Back to our hello world: the C version of the program is 16 kB and the Go version is 2 MB. Can we shrink the image down to that size? Can we build an image that contains only the program we need, with no extra files?

The answer is yes. You just need to change the base image of the second stage of the multi-stage build to scratch. scratch is a virtual image that cannot be pulled or run, because it is empty: it contains nothing at all. This means the new image is built from scratch, with no other image layers underneath. For example:

    FROM golang
    COPY hello.go .
    RUN go build hello.go

    FROM scratch
    COPY --from=0 /go/hello .
    CMD ["./hello"]

The size of the image built this time is exactly 2 MB, which is perfect!

However, using scratch as the base image brings several inconveniences. Let's go through them one by one.

Missing shell

The first drawback of the scratch image is that it has no shell, which means the string (shell) form cannot be used in CMD/RUN instructions, for example:

    ...
    FROM scratch
    COPY --from=0 /go/hello .
    CMD ./hello

If you create and run a container from the resulting image, you will get the following error:

    docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: no such file or directory": unknown.

As the error message shows, the image does not contain /bin/sh, so the program cannot be run. This is because when you use a string as the argument of a CMD/RUN instruction, it is executed by /bin/sh. That is, the following two instructions are equivalent:

    CMD ./hello
    CMD ["/bin/sh", "-c", "./hello"]

The solution is simple: use JSON syntax instead of string syntax. For example, replace CMD ./hello with CMD ["./hello"], so that Docker runs the program directly instead of running it through a shell.

Lack of debugging tools

The scratch image does not contain any debugging tools: no ls, ps, or ping, and of course no shell (as mentioned above). You cannot use docker exec to enter the container, you cannot inspect the network stack, and so on.

If you want to view the files in the container, you can use docker cp. If you want to view or debug the network stack, you can use docker run --net container:, or use nsenter. To make debugging containers easier, Kubernetes also introduced a new concept called Ephemeral Containers, which was still an Alpha feature at the time of writing.

Although there are many such workarounds for debugging containers, they make things more complicated. We are after simplicity, and the simpler the better.

As a compromise, you can use the busybox or alpine image instead of scratch. They add a few MB, but overall it is worth sacrificing a little space for debugging convenience.
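As a rough illustration of this compromise, the Go hello world could be placed on busybox instead of scratch. This is only a sketch: it reuses the /src working directory from the earlier WORKDIR example and assumes the default busybox tag is acceptable.

```dockerfile
FROM golang
WORKDIR /src
COPY hello.go .
RUN go build hello.go

# busybox adds a shell and basic tools (ls, ps, ping) for a few extra MB.
FROM busybox
COPY --from=0 /src/hello .
CMD ["./hello"]
```

Because the final image now contains a shell, docker exec with sh becomes usable again for a running container built from it.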

Missing libc

This is the most difficult problem to solve. With scratch as the base image, the Go version of hello world runs happily, but the C version fails, and so does a more complex Go program (for example, one that uses network-related packages). You will get an error similar to the following:

    standard_init_linux.go:211: exec user process caused "no such file or directory"

The error message tells us that a file is missing, but not which one. In fact, the missing files are the dynamic libraries that the program needs in order to run.

So, what is a dynamic library? Why do I need dynamic libraries?

Dynamic and static libraries differ in how they are linked into the executable during the linking phase of compilation. With a static library, the object files produced by the compiler are bundled together with the referenced library into the executable during linking; this is called static linking. A dynamic library is not linked into the object code at compile time but is loaded when the program runs; this is called dynamic linking.

Programs in the 1990s were mostly statically linked, because most of them ran from floppy disks or cassettes, and there was no standard library at that time. A statically linked program does not depend on any library at run time, which makes it easy to port. But on time-sharing systems such as Linux, multiple programs run concurrently from the same hard disk, and almost all of them use the standard C library, which is where the advantage of dynamic linking shows. With dynamic linking, executables do not contain the standard library code, only references to the library files. For example, a program that relies on the cos and sin functions in the library file libtrigonometry.so will locate and load libtrigonometry.so via that reference when it runs, and can then call the functions in the library.

The benefits of dynamic linking are obvious:

It saves disk space, because different programs can share common libraries.

It saves memory, because a shared library only needs to be loaded from disk into memory once and is then shared between different programs.

It is easier to maintain, because after a library file is updated there is no need to recompile every program that uses it.

Strictly speaking, it is the combination of dynamic libraries and shared libraries that achieves the memory savings. On Linux, dynamic libraries have the extension .so (shared object), while on Windows they have the extension .DLL (Dynamic-link library).

Going back to the original question: by default, C programs are dynamically linked, and so are Go programs that rely on cgo, such as those using network-related packages. The C hello world program above uses the standard library file libc.so.6, so it can only run if that file is included in the image. Using scratch as the base image certainly won't work, and neither will busybox or alpine, because busybox does not contain a standard library, while the standard library used by alpine is musl libc, which is not compatible with the commonly used glibc. A later article will explain this in detail, so I won't go into it here.

So how do we solve the standard library problem? There are three options.

1. Use static libraries

There are many ways to make the compiler link the program statically. If you use gcc as the compiler, you only need to add the -static flag:

    $ gcc -o hello hello.c -static

The compiled executable is 760 kB, much larger than the previous 16 kB, because it now contains the library code it needs to run. The compiled program can run in the scratch image.
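Put together with a multi-stage build, the static approach might look like the sketch below. The stage name build and the /src directory are illustrative choices, and the example assumes the gcc image ships the static C library that -static needs.

```dockerfile
FROM gcc AS build
WORKDIR /src
COPY hello.c .
# -static bundles libc into the executable, so no .so files are needed at run time.
RUN gcc -o hello hello.c -static

FROM scratch
COPY --from=build /src/hello /hello
CMD ["/hello"]
```

For Go programs, the commonly used equivalent is to disable cgo (for example, setting CGO_ENABLED=0 before go build), which makes the toolchain produce a statically linked binary.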

If you compile using the alpine image as the base image, the resulting executable will be even smaller (< 100 kB); the next article will cover this in detail.

2. Copy the library files into the image

To find out which library files the program needs at run time, you can use the ldd tool:

    $ ldd hello
        linux-vdso.so.1 (0x00007ffdf8acb000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007ff897ef6000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ff8980f7000)

As the output shows, the program only needs one library file, libc.so.6. linux-vdso.so.1 belongs to a mechanism called VDSO, which speeds up certain system calls and is optional. ld-linux-x86-64.so.2 is the dynamic linker itself, which locates and loads all the dependent library files.

You can copy all the library files listed by ldd into the image, but this can be hard to maintain, especially if the program depends on many libraries. For a hello world program, copying library files is no problem at all, but more complex programs (such as those that use DNS) run into a puzzling issue: glibc (the GNU C library) implements DNS through a rather complex mechanism called NSS (Name Service Switch). It requires a configuration file, /etc/nsswitch.conf, and additional libraries, but these libraries are not shown by ldd because they are only loaded while the program runs. If you want DNS resolution to work correctly, you must also copy these additional library files (/lib64/libnss_*).

Personally, I do not recommend copying library files directly, because it is very hard to maintain, needs constant adjustment later, and hides many unknown pitfalls.
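For completeness, copying the libraries for the C hello world might be sketched as follows. The library paths are assumptions for a Debian-based x86-64 build stage and differ from the ldd output shown earlier, so they should be checked by running ldd inside the build stage. The stage name build and /src are illustrative.

```dockerfile
FROM gcc AS build
WORKDIR /src
COPY hello.c .
RUN gcc -o hello hello.c

FROM scratch
# Assumed paths: verify with "ldd /src/hello" in the build stage before relying on them.
COPY --from=build /lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6
COPY --from=build /lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
COPY --from=build /src/hello /hello
CMD ["/hello"]
```

Every extra dependency adds another COPY line tied to one distribution's layout, which illustrates why this option is hard to maintain.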

3. Use busybox:glibc as the base image

There is one image that solves all these problems neatly: busybox:glibc. It is only 5 MB and includes glibc and various debugging tools. If you need an image for running dynamically linked programs, busybox:glibc is the best choice.

Note: if your program uses libraries other than the standard library, you still need to copy those library files into the image.
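A minimal sketch of this third option for the dynamically linked C hello world could look like this. The stage name build and /src are illustrative, and the example assumes the glibc in busybox:glibc is recent enough for a binary compiled in the gcc image.

```dockerfile
FROM gcc AS build
WORKDIR /src
COPY hello.c .
# Default (dynamic) linking: the binary expects libc.so.6 at run time.
RUN gcc -o hello hello.c

# busybox:glibc provides glibc plus a shell and basic tools.
FROM busybox:glibc
COPY --from=build /src/hello /hello
CMD ["/hello"]
```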
