
How to solve the problem of containers being killed due to insufficient memory in a K8s cluster environment

2025-01-17 Update From: SLTechnology News & Howtos


Shulou (Shulou.com) 06/01 Report --

This article mainly introduces how to solve the problem of containers being killed due to insufficient memory in a K8s cluster environment. In daily operation, many people probably have doubts about this problem. The editor has consulted all kinds of materials and sorted out a simple, practical approach, and hopes it will help answer the question. Now, please follow the editor and study it!

Background

Recently we ran into a problem in the online environment: the tomcat container in a Pod in the k8s cluster is killed outright after running for a while, yet at other times everything looks normal, so it is hard to pin down exactly when the kill happens.

This article explains why Linux runs out of memory and why a specific process gets killed, and walks through how to troubleshoot the issue in a Kubernetes cluster environment.

Analysis of why the tomcat process was killed

While troubleshooting this kill problem, we were fairly sure the operating system had killed the process, because the whole review confirmed that nobody had issued a kill manually. The tomcat log only showed that the process was killed, with no detail about why. Looking at the syslog with grep -i kill /var/log/messages* gave a much more detailed hint: the application used more memory than the cgroup limit allows and was killed outright. As follows:

Oct 1 20:37:21 k8swork kernel: Memory cgroup out of memory: Kill process 13547 (java) score 1273 or sacrifice child
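Besides the syslog, the same OOM kill usually shows up in a couple of other places. The commands below are a rough sketch for cross-checking (the pod name is a placeholder, not from the original environment):

# The kernel ring buffer records the cgroup OOM kill as well
dmesg -T | grep -i "memory cgroup out of memory"

# On the Kubernetes side, the container status of the affected Pod typically shows Reason: OOMKilled
kubectl describe pod <tomcat-pod-name> | grep -A 5 "Last State"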

Using free to check memory usage after the service has already died is not very helpful for troubleshooting, because by that point the memory the service occupied has been released along with it. As follows:

[root@k8swork log]# free -lm
                    total       used       free     shared    buffers     cached
Mem:                  498         93        405          0         15         32
Low:                  498         93        405
High:                   0          0          0
-/+ buffers/cache:                 44        453
Swap:                1023          0       1023

However, we can record memory usage over time with vmstat and redirect its output to a file using the command below. We can even adjust the interval and sample count to cover a longer period, and while the command is running we can look at the output file at any time to see the results. Here we sample memory 1000 times, once every 120 seconds. The trailing & runs it in the background so we get the terminal back.

vmstat -SM 120 1000 > memoryuse.out &
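While the background job is running, we can peek at the latest samples at any time, for example:

# Show the most recent samples; watch the free, buff and cache columns over time
tail -n 5 memoryuse.out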

From the above information, it can be determined that the culprit is that the Java process takes up more memory than the resource limit and is directly killed by the system. Why did this problem arise?

First of all, the orchestration file already caps the container's memory at 4G, so in theory a container in the Pod should not be able to use that much. By default, the JVM sizes its heap at roughly 1/4 of physical memory; since the kill happened anyway, the memory used by the Java process must have exceeded the 4G limit.
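A quick way to confirm what maximum heap size the JVM picks by default is to print its ergonomics from inside the container. The commands below are a sketch; on an older JDK 8 the figure reflects the host's physical memory rather than the cgroup limit (roughly 8 GB on a 32 GB host):

# Print the heap size the JVM has chosen for itself
java -XX:+PrintFlagsFinal -version | grep -i maxheapsize

# A more compact view of the same information
java -XshowSettings:vm -version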

Searching online turned up the following information, which roughly says that since version 8u131 the JDK has been able to honour a container's memory and CPU limits through JVM options, as described at the link below:

https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits

When I opened the release notes for 8u131 I did not see any container-related changes, so I kept looking at later versions and finally found, in 8u191, that Java properly supports containers.

https://www.oracle.com/java/technologies/javase/8u191-relnotes.html

Checking the Java version in the problematic container showed it was clearly older than this, which confirmed the cause.
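For reference, the container-related JVM options differ by version. The lines below are a sketch of the commonly cited flags, not settings taken from the original project:

# JDK 8u131 - 8u190: cgroup awareness is experimental and must be switched on explicitly
JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"

# JDK 8u191 and later: container support is on by default, and the heap can be sized
# as a percentage of the cgroup memory limit
JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"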

"

The Java virtual machine is not aware of the resource limit on the Pod, so it simply takes about 1/4 of the host's memory (the host has 32 GB). The cgroup then detects that the Pod's memory consumption exceeds its limit (the Pod is limited to 4G) and performs the kill.

"

The solution is also simple: configure the maximum and minimum memory footprint directly in the tomcat service, limiting its memory at the Java level. However, the business developers still need to figure out why the Java process uses so much memory in the first place.
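For tomcat this is typically done in bin/setenv.sh, which catalina.sh picks up automatically on startup. The values below are placeholders chosen to stay under the Pod's 4G limit, not the original project's settings:

# bin/setenv.sh
# Leave headroom below the 4G cgroup limit for metaspace, threads and off-heap buffers
CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx3g"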

Summary

Through this article, we can see that for projects built on the Java virtual machine, we should move to a newer, container-aware JDK version when containerizing wherever possible. If that is not possible, we must limit the amount of memory the Java service can use at the JVM level. In addition, be sure to add a liveness probe to the service. Without one, for a container service like tomcat, even if the service inside dies, Kubernetes will not restart it for you, for the simple reason that it cannot tell whether your service is alive. Therefore, an HTTP liveness probe must be added to the service (a probe at the TCP layer only checks whether the port is open; in many cases the service is effectively dead while the port can still be reached).
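As a rough sketch of such a probe (the deployment name, container name, health-check path and port below are all assumptions, not taken from the original environment), it can be patched onto an existing Deployment like this:

kubectl patch deployment tomcat-demo -p '
{"spec":{"template":{"spec":{"containers":[{
  "name":"tomcat",
  "livenessProbe":{
    "httpGet":{"path":"/healthz","port":8080},
    "initialDelaySeconds":60,
    "periodSeconds":10,
    "failureThreshold":3
  }
}]}}}}'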

Recommended problem troubleshooting guide

First of all, this book is compiled by Alibaba Cloud. It not only introduces the core concepts of Kubernetes in easy-to-understand language, but also walks through how to approach problems in a Kubernetes cluster, which makes it well worth consulting. For example, one of its cases, about CA certificates expiring at two o'clock in the morning, not only covers the whole troubleshooting process and solution in detail, but also explains how the certificate authentication system works in a cluster environment.

Citadel certificate system

At this point, the study of how to solve the problem of containers being killed due to insufficient memory in a K8s cluster environment is over. I hope it helps resolve your doubts. Combining theory with practice is the best way to learn, so go and try it! If you want to keep learning more, please continue to follow the website; the editor will keep working hard to bring you more practical articles!
