
Misconfiguring the scheduling priority of K8s containers leads to a cluster avalanche: an incident record


On my way to work today, I suddenly received a flood of alerts about container scheduling failures in the cluster.

Seeing a large number of container scheduling failures, I checked the management platform to see whether any business had deployed during that window, and found that a transcoding service (CPU-intensive) had released a batch of tasks at exactly that time. But why would this batch of releases cause a pile of scheduling failures?

We then spot-checked several of the alerting containers and found they were in the Preempted state: the higher-priority Pods could only be scheduled onto those nodes by evicting the Pods already running there. But the service that had just been released did not set any container scheduling priority, so why was its priority higher than everyone else's?
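(A quick way to check this, as a minimal sketch: the priority a Pod actually received is recorded in its spec, so it can be read back with kubectl; the pod and namespace names below are placeholders.)

# Read back the priority actually assigned to a pod (placeholder names)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.priorityClassName}{" "}{.spec.priority}{"\n"}'
# The pod's events (shown by describe) also record scheduling and preemption activity
kubectl describe pod <pod-name> -n <namespace>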

After troubleshooting, the cause turned out to be that a colleague had changed the default container priority on the management platform from the original low priority to high priority. The background is as follows:

A few days ago, a business team reported that one of their running containers suddenly died. It turned out that a newly scheduled high-priority container had preempted their original low-priority Pod. The business side felt this policy was unreliable: if everyone simply chooses the highest priority, everyone ends up with the same priority anyway. They suggested making the highest priority the default and letting teams lower it as needed, with low-priority containers given an appropriate discount when costs are allocated. So the container platform team changed the default scheduling priority on the management platform to the highest priority.

A colleague set globalDefault to true on the highest-priority PriorityClass, which means that by default every newly scheduled container gets the highest priority.

The PriorityClass configuration:

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: highest-priority
value: 400
globalDefault: true
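To verify which PriorityClass is acting as the cluster-wide default, the classes can be listed with kubectl; this is a sketch of the check rather than output captured during the incident.

# List all PriorityClasses; the output shows each class's value and whether it is the global default
kubectl get priorityclass
# Inspect the offending class in detail
kubectl describe priorityclass highest-priority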

The official documentation explains PriorityClass as follows:

PriorityClass also has two optional fields: globalDefault and description. The globalDefault field indicates that the value of this PriorityClass should be used for Pods without a priorityClassName. Only one PriorityClass with globalDefault set to true can exist in the system. If there is no PriorityClass with globalDefault set, the priority of Pods with no priorityClassName is zero.

Note:

1. If you upgrade an existing cluster and enable this feature, the priority of your existing Pods is effectively zero.

2. Adding a PriorityClass with globalDefault set to true does not change the priorities of existing Pods. The value of such a PriorityClass is used only for Pods created after the PriorityClass is added.

In other words, if globalDefault is set to true in a PriorityClass, it only takes effect for Pods created afterwards; Pods that were already deployed keep a priority of 0.
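This is easy to see in practice: dumping the priority of every Pod in the cluster would show Pods created before the change still at priority 0 and freshly created Pods at 400. A minimal sketch of such an audit:

# List every pod across all namespaces together with its priority and priority class
kubectl get pods -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PRIORITY:.spec.priority,CLASS:.spec.priorityClassName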

The problem this morning was triggered when a machine in the cluster went down and all the Pods on it were rescheduled onto other machines. Because they were newly created, these rescheduled Pods all carried the highest priority, so when they landed on other nodes they preempted the original low-priority Pods there. The evicted Pods were then recreated, also at the highest priority, and scheduled onto yet other nodes, preempting their Pods in turn, and the cascade rolled through the entire cluster as an avalanche (essentially every container was restarted).

Summary:

1. Set globalDefault to false in the PriorityClass and have the management platform apply the highest priority by default on the frontend instead; in other words, specify the default priority on each Pod rather than through globalDefault (see the sketch after this summary).

2. Keep an eye on cluster resource usage and expand capacity promptly when resources run short, so that even if at least two nodes in the cluster go down, the containers they carried can still drift to the remaining nodes.
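To illustrate recommendation 1, here is a minimal sketch (the names, value, and image are illustrative, not our actual configuration): the PriorityClass keeps globalDefault set to false, and any workload that really needs it opts in explicitly through priorityClassName in its Pod spec.

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: highest-priority
value: 400
globalDefault: false              # no longer applied implicitly to every new Pod
description: "Only for workloads that explicitly request it"
---
apiVersion: v1
kind: Pod
metadata:
  name: transcode-worker          # illustrative name
spec:
  priorityClassName: highest-priority   # priority is now an explicit, per-Pod choice
  containers:
  - name: worker
    image: example/transcoder:latest     # illustrative image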
