Introduction
This article shares the problems I ran into when deploying microservice projects on K8s in production:
1. Containers with resource limits keep getting killed
2. The importance of health checks for rolling updates
3. Traffic loss during rolling updates
Let's start with the first question: why do containers keep getting killed even though resource limits are set?
The symptom is that the deployed Java application keeps "restarting". The restart is really a rebuild: K8s decides the pod is unhealthy, kills it, and pulls it up again, so you have to track down why. Run kubectl describe on the pod and look at the events; typically you will see that the health check failed and the container was restarted. For a Java application the usual cause is a heap memory overflow: the container was killed, and the last lines of the log show the kill, which tells you why it restarted.
What I ran into was that the heap memory was not limited. The JVM's heap is where most of the application's data lives, so it grows easily, exceeds the container's limit, and the container is likely to be killed by K8s. Why does K8s kill it? Because it exceeded the limit. By default a container can use all of the host's resources; without limits, one runaway container can affect the whole host, its pods get evicted and drift to other hosts, and the failure can snowball into an avalanche. So we must set resource limits.
But can that limit actually constrain a Java application? Not by itself. K8s makes deploying applications convenient, but it is not a perfect fit for Java: the JVM does not recognize the limit we specify for the container in the yaml, so nothing constrains its heap, and the heap can grow past the limits value. When that happens, K8s enforces its policy: the container is killed and then pulled up again.
Java heap usage is also bursty: when data volume comes up, heap usage jumps, the container exceeds its limit, gets killed by K8s, and is pulled up again. This kill-and-restart cycle can repeat hundreds of times a day.
So how is this resource limit actually enforced? The limiting itself is done by Docker; K8s just translates the limits in the yaml into calls to the container runtime. The real problem, then, is how to make the JVM respect the heap limit.
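For reference, a minimal sketch of how the container limit is usually declared under the container in the Deployment yaml; the numbers here are illustrative, not a recommendation:
resources:
  requests:
    memory: "1Gi"      # scheduling guarantee
    cpu: "500m"
  limits:
    memory: "1200Mi"   # exceeding this gets the container killed
    cpu: "1"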
The way to solve it is to configure the JVM heap explicitly. There are two flags: -Xmx sets the maximum heap size, and -Xms sets the initial heap size.
You generally set a maximum heap size. If it is not set, the JVM will keep eating into the host's memory until physical memory runs short and the heap overflows, which is very common. With -Xmx set, the JVM triggers garbage collection as the heap approaches the limit, which keeps the Java application running stably. Configuring resource limits in the yaml alone is not enough; we must also set the heap size for the JVM. Hard-coding the value in the Dockerfile is not practical, so instead the Dockerfile references a variable and we set that variable in the yaml file:
env:
- name: JAVA_OPTS
  value: "-Xmx1g"
This sits alongside the container resource limits we configured earlier. The variable is passed into the pod, and the CMD in the image references $JAVA_OPTS, so the JVM reads the value from the environment and uses it as its heap size. It is recommended to set this value about 10% below limits, because exceeding limits means the container gets killed and pulled up again.
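Putting the pieces together, here is a hedged sketch of what the container section might look like; the container name, image, jar path, and memory values are all assumptions for illustration:
containers:
- name: java-app                              # assumed name
  image: registry.example.com/java-app:v1     # assumed image
  # mirrors a Dockerfile CMD that references $JAVA_OPTS
  command: ["sh", "-c", "java $JAVA_OPTS -jar /app/app.jar"]
  env:
  - name: JAVA_OPTS
    value: "-Xmx1g"                           # ~10% below the memory limit
  resources:
    limits:
      memory: "1200Mi"
      cpu: "1"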
Once this is set up, rebuild the image, exec into the container, and check the Java process to confirm the heap flags took effect; everything else stays the same.
The second question is the importance of health checks for rolling updates.
Rolling update is the default strategy in K8s, and it is usually the first thing we rely on after deploying to K8s. When a health check is configured, the rolling update uses the probe status to decide whether the new pod can receive traffic and whether the update should continue. That way, throughout the rollout, there is always a replica available to serve traffic, which guarantees a smooth upgrade. So this is where rolling update configuration starts, and the health check is very important!
What is the role of health checks when rolling updates are initiated?
Consider a replica that needs a minute after startup before it can serve requests; Java, for example, starts slowly. Without a health check to confirm it is ready, K8s assumes it is ready immediately, yet for that minute it cannot serve, so the new traffic it receives is certainly not handled. That is the first case. The second case is human configuration error: for example the new version cannot connect to the database or some other dependency, or a configuration file is wrong, and then a rolling update is triggered. The rollout completes, every new replica has the same problem, and all of the old replicas have been replaced. In a production environment the consequences are severe, because many services can no longer be provided. So rolling updates must always be paired with health checks: new traffic is only forwarded to a new replica once its check passes, and if the check does not pass the old replicas are not fully replaced, that is, the update does not continue, because the strategy limits how many replicas may be unavailable (see the sketch below) and the rollout will not proceed if that limit cannot be met.
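As a hedged illustration, those rollout constraints are expressed in the Deployment's update strategy; the numbers below are illustrative:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1          # at most one extra pod beyond the desired replica count
    maxUnavailable: 0    # never take an old pod down before a new one is Ready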
There are two types of health checks:
readinessProbe: the readiness check. If the check fails, the pod is not added behind the Service and no new traffic is forwarded to it, because the Service is the unified entry point for your whole application. With an HTTP probe you hit a page and judge the returned status code; with a port probe, if the port is not reachable the check fails. A failed readiness check only keeps traffic away; it does not restart the pod. Two related fields: initialDelaySeconds: 60 waits 60 seconds before the first check, because a Java application usually takes about a minute to start, and periodSeconds: 10 repeats the check every 10 seconds.
livenessProbe: the liveness check. If the check fails, the container is killed and, according to your restart policy, usually rebuilt and pulled up again, after which readiness is judged anew. The detection methods are the same: a port probe, an HTTP request, or exec. In general both probes should be configured: the readiness check keeps new traffic away from a pod that is not ready, and the liveness check pulls a broken pod back up.
Both checks support the same three probe methods: httpGet, which probes a URL; tcpSocket, which probes a port; and exec, which runs a shell command and judges its return value.
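A minimal sketch of the two probes on the container, using the timings mentioned above; the health path and port are assumptions, use whatever your application actually exposes:
readinessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint
    port: 8080            # assumed application port
  initialDelaySeconds: 60
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 8080            # assumed application port
  initialDelaySeconds: 60
  periodSeconds: 10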
The last question: traffic loss during rolling updates.
The typical symptoms are refused connections, error responses, and callers finding the service unavailable.
A rolling update shuts down existing pods and starts new ones. Shutting down an existing pod means deleting it: the apiserver notifies the kubelet, the kubelet stops the container, the pod is removed from the Service backend so no new traffic is distributed to it, and kube-proxy on each node is told to refresh its forwarding rules. This is essentially the pod's offline lifecycle.
The problem is that while traffic is being shifted to the new pods, there is a gap: after the pod starts shutting down there is a window in which new traffic may still be routed to it, but the application is no longer handling new requests, so connections are refused. How do we solve this? Note that the readiness probe cannot help here: during normal operation it decides whether the pod sits behind the Service, but once the endpoints controller receives the pod's delete event, the readiness probe result no longer matters.
So how do we make sure the pod goes offline gracefully?
The answer is to add a sleep when the pod is shut down, which is enough to solve the problem. Containers have hooks for both start and stop, so we can run a hook before the container is closed. The hook can execute a shell command or issue an HTTP request, the two supported types, and it is defined under the container spec.
Here we sleep for 5 seconds: the container does not exit immediately, it sleeps for 5 seconds first and only then shuts down the application. Those 5 seconds are enough for kube-proxy to refresh its forwarding rules, so newly arriving traffic is no longer forwarded to the pod being closed. The hook simply delays the pod's shutdown, giving kube-proxy extra time to update the rules. As follows:
lifecycle:
  preStop:
    exec:
      command:
      - sh
      - -c
      - "sleep 5"
This way, the problem is solved without having to modify the application's code.