

How to optimize JVM Warm-up on Kubernetes



Genesis

A few years ago, we gradually moved from a monolithic architecture to a microservice architecture and deployed the services on Kubernetes. Most of the new services were written in Java. We first ran into this problem when we brought one of these Java services live. We went through the normal capacity-planning process with load testing and determined that N containers were sufficient to handle more than the expected peak traffic.

Although the service handled peak traffic effortlessly, we began to see problems during deployments. Each of our Pods handled more than 10k requests per minute (RPM) at peak, and we used the Kubernetes rolling-update mechanism to deploy. During a deployment, the service's response time would spike for a few minutes and then settle back to its usual steady state. In our NewRelic dashboard, we would see a graph similar to the following:

At the same time, other services that depended on ours experienced high response times and timeout errors during the same window.

Take 1: increase the number of Pods

We quickly realized that the problem was related to the JVM warm-up phase, but with other important work underway we didn't have much time to dig into it. So we tried the simplest solution: increasing the number of containers to reduce the throughput each one handles. We nearly tripled the number of Pods, so each Pod handled roughly 4k RPM at peak. We also adjusted the deployment strategy so that at most 25% of Pods were rolled at a time (using the maxSurge and maxUnavailable parameters). This solved the problem: although we were running at three times the capacity needed for steady state, we could deploy our service, or any dependent service, without issues.
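For reference, a rolling-update strategy like the one described above might look roughly like the following Deployment snippet. This is only a sketch; the name, replica count, and image are placeholders rather than our actual values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-service            # placeholder name
spec:
  replicas: 12                  # placeholder replica count
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%             # at most 25% extra Pods created during a roll-out
      maxUnavailable: 25%       # at most 25% of Pods taken down at once
  selector:
    matchLabels:
      app: java-service
  template:
    metadata:
      labels:
        app: java-service
    spec:
      containers:
        - name: app
          image: example.com/java-service:latest   # placeholder image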

Over the next few months, as we migrated more services, we started noticing the same problem in them as well. We then decided to spend some time troubleshooting it properly and finding a better solution.

Take 2: warm-up script

After reading various articles, we decided to try a warm-up script. The idea was to run a script that sends synthetic requests to the service for a few minutes, warming up the JVM before real traffic is allowed through.

To build the warm-up script, we collected actual URLs from production traffic. We then created a Python script that sent parallel requests to those URLs. We configured the readiness probe's initialDelaySeconds accordingly, to ensure the warm-up script finished before the Pod became ready and started accepting traffic.
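A readiness probe delayed in this way might look roughly like the snippet below in the container spec. The endpoint and timings here are illustrative assumptions, not the values we actually used:

readinessProbe:
  httpGet:
    path: /health               # placeholder health endpoint
    port: 8080                  # placeholder port
  initialDelaySeconds: 180      # hold readiness until the warm-up script has finished
  periodSeconds: 10
  failureThreshold: 3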

To our surprise, although we saw some improvement, it was not significant. We still observed elevated response times and errors. The warm-up script also introduced new problems. Previously our Pods were ready in 40-50 seconds; with the script they took about 3 minutes, which was an issue during deployments and, more importantly, during auto-scaling. We made a few adjustments to the warm-up mechanism, such as allowing a brief overlap between the warm-up script and real traffic, and changes to the script itself, but saw no significant improvement. In the end we decided the small benefit of the warm-up strategy wasn't worth the cost and dropped it completely.

Take 3: explore heuristic techniques

With the warm-up script idea abandoned, we decided to try a few tuning heuristics (a sketch of where these knobs live in the Pod spec follows the list):

GC algorithms (G1, CMS, and Parallel) and various GC parameters

Heap memory

CPU allocated
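These knobs live in two places in the Pod spec: JVM flags (for the GC algorithm and heap size) and the container's resource settings (for CPU). A minimal sketch of a container spec wiring them together is shown below; the flag values and names are illustrative, not the exact settings we experimented with:

containers:
  - name: app
    image: example.com/java-service:latest        # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS                   # picked up automatically by the JVM
        value: "-XX:+UseG1GC -Xms1g -Xmx1536m"    # GC algorithm and heap size
    resources:
      requests:
        cpu: 1000m
        memory: 2000Mi
      limits:
        cpu: 1000m
        memory: 2000Mi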

After several rounds of experiments, we finally made a breakthrough. The service we were testing was configured with the following Kubernetes resource requests and limits:

resources:
  requests:
    cpu: 1000m
    memory: 2000Mi
  limits:
    cpu: 1000m
    memory: 2000Mi

We increased the CPU request and limit to 2000m and deployed the service to see the impact. We saw a huge improvement in response times and errors compared with the warm-up script.

For further testing, we bumped the configuration to 3000m CPU, and surprisingly the problem disappeared completely. As shown below, there were no response-time spikes.

We soon discovered that the problem was CPU throttling. During the warm-up phase, the JVM needs more CPU time than it does in steady state, but the Kubernetes resource-handling mechanism (cgroups, via the CFS quota) was throttling the CPU according to the configured limit.

There was a direct way to verify this. Kubernetes exposes the metric container_cpu_cfs_throttled_seconds_total for each container, which indicates how many seconds of CPU have been throttled for that container since it started. If our hypothesis was right, with the 1000m configuration we should see heavy throttling right after startup that then settles down within a few minutes. We deployed with that configuration, and here is the container_cpu_cfs_throttled_seconds_total chart for all Pods in Prometheus:

As expected, there was heavy throttling in the first 5 to 7 minutes after container startup, usually between 500 and 1000 seconds, after which it stabilized, confirming our hypothesis.

When we deployed with a 3000m CPU configuration, we observed the following figure:

CPU throttling was almost negligible (under 4 seconds for almost every Pod), which is why the deployment went smoothly.

Take 4: configuring Burstable QoS

Although we had found the bottleneck, the solution (tripling the CPU request/limit) was not feasible from a cost perspective. It might actually have been worse than just running more Pods, because Kubernetes schedules Pods based on their requests, so the higher request could cause the cluster autoscaler to trigger frequently and add more nodes to the cluster.

Let's restate the problem:

During the initial warm-up phase, which lasts a few minutes, the JVM needs more CPU (~3000m) than the configured limit (1000m).

After warm-up, the JVM can reach its full potential even with the CPU limited to 1000m.

Kubernetes schedules Pods using the "request", not the "limit".

Once we had stated the problem this clearly, the answer presented itself: Kubernetes Burstable QoS.

Kubernetes assigns a QoS class to each Pod based on its configured resource requests and limits.

So far we had been using the Guaranteed QoS class by specifying equal requests and limits (initially 1000m, then 3000m). While Guaranteed QoS has its benefits, we didn't need the full power of three CPUs for the whole Pod lifecycle; we only needed it for the first few minutes. The Burstable QoS class does exactly that: it lets us specify a request that is lower than the limit, for example:

resources:
  requests:
    cpu: 1000m
    memory: 2000Mi
  limits:
    cpu: 3000m
    memory: 2000Mi

Because Kubernetes schedules Pods based on the request value, it will find a node with 1000m of spare CPU capacity to place this Pod. But since the limit is much higher at 3000m, if the application needs more than 1000m of CPU at any point and the node has spare capacity available, the application will not be throttled; it can use up to 3000m.

Finally, it was time to test the hypothesis. We changed the resource configuration and deployed the application, and it worked! We deployed several more times to check that the results were repeatable, and they were consistent. We also kept monitoring the container_cpu_cfs_throttled_seconds_total metric; here is the chart from one of those deployments:

As we can see, this chart is very similar to the one for the Guaranteed QoS setting with 3000m CPU. Throttling is almost negligible, which confirms that the Burstable QoS solution works.

Conclusion

Kubernetes resource limits are an important concept. We have since rolled this solution out to all of our Java-based services, and both deployments and auto-scaling now work without any problems.

Three things in particular deserve your attention: your CPU requests and limits, the QoS class they imply, and the container_cpu_cfs_throttled_seconds_total metric, which tells you how much CPU throttling each container has experienced.
