Author | Niu Qiulin (Dongdao), technical expert at Alibaba Cloud Intelligent Business Group
Guide: developing a Serverless engine from scratch is not easy. Today we start with Knative's health checks: by looking at how health checking works, we can see how the Serverless model differs from the traditional model, and how Knative thinks about Serverless scenarios.
The core principle of the Knative Serving module is shown in the figure below, in which Route can be understood as the role of Istio Gateway.
When the service has been scaled down to zero, incoming traffic is routed to the Activator; when the number of Pods is not zero, traffic is routed directly to the corresponding Pods and does not pass through the Activator. In the latter case, the Autoscaler module dynamically scales capacity in real time according to the request Metrics.
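As a simplified model of that scaling decision (illustrative only; the function name, the minimum-of-one rule and the scale-to-zero shortcut are assumptions, not the actual Autoscaler code), concurrency-based scaling can be thought of as dividing the observed concurrency by a per-Pod target:

package main

import (
	"fmt"
	"math"
)

// desiredPods is a simplified model of concurrency-based autoscaling:
// divide the total in-flight requests observed across the Revision by the
// target concurrency each Pod should handle, and round up.
func desiredPods(observedConcurrency, targetPerPod float64) int {
	if observedConcurrency <= 0 {
		return 0 // scale to zero when no requests are in flight
	}
	n := int(math.Ceil(observedConcurrency / targetPerPod))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// 25 concurrent requests with a target of 10 per Pod -> 3 Pods.
	fmt.Println(desiredPods(25, 10)) // 3
	fmt.Println(desiredPods(0, 10))  // 0 (scale to zero)
}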
Knative's Pod is made up of two containers: Queue-Proxy and the business container user-container. The architecture is as follows:
Take HTTP/1 as an example: business traffic first enters the Istio Gateway, which forwards it to port 8012 of Queue-Proxy; Queue-Proxy then forwards the request to the listening port of user-container. At this point, one business request has been served.
That is the rough idea of the basic principle. Now let's dig into a few details to see the internal mechanisms:
- Why introduce Queue-Proxy?
- When a Pod is scaled to zero, traffic is forwarded to the Activator; how does the Activator handle these requests?
- The business Pod in Knative contains both Queue-Proxy and user-container; how are the Pod's readinessProbe and livenessProbe configured for each?
- What is the relationship between the Pod's readinessProbe, livenessProbe and the health of the business?
- How does the Istio Gateway choose which Pod to forward traffic to?

Why introduce Queue-Proxy?
One of the core demands of Serverless is to sink complexity out of the business and into the underlying platform, so that business code can iterate quickly and resources are used on demand. For now, however, the focus is mainly on using resources on demand.
To use resources on demand, we need to collect the relevant Metrics and use them to guide resource scaling. Knative first implements the KPA policy, which decides whether to scale out based on the number of requests, so Knative needs a mechanism to collect the number of business requests. Besides the request count, the following also needs to be handled in a unified way:
- access log management;
- Tracing;
- the Pod health check mechanism;
- the interaction between the Pod and the Activator, i.e. how to receive traffic forwarded by the Activator after the Pod has been scaled to zero;
- other logic, such as determining whether the Ingress is Ready, which is also based on Queue-Proxy.
To implement these functions while keeping a loose coupling with the business, Knative introduces Queue-Proxy to take care of them. In this way, the Serverless capabilities can be provided without the business code being aware of them.
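To make the sidecar idea concrete, here is a minimal, hypothetical sketch of a reverse proxy that forwards requests to the business container while tracking in-flight requests as a concurrency metric (the ports and paths are assumptions; this is not the actual Queue-Proxy implementation):

package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// inFlight counts requests currently being handled; an autoscaler could
// scrape this value as a concurrency metric.
var inFlight int64

func main() {
	// Assume the business container listens on localhost:8080.
	target, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// All business traffic passes through the sidecar, which records the
	// metric and forwards the request; the business code is unaware of it.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)
		proxy.ServeHTTP(w, r)
	})

	// Expose the metric so a collector (or autoscaler) can read it.
	http.HandleFunc("/metrics/concurrency", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%d\n", atomic.LoadInt64(&inFlight))
	})

	log.Fatal(http.ListenAndServe(":8012", nil))
}

Because the metric is collected in the sidecar, the same pattern can also host access logging, tracing and probe handling without touching the business image.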
The process from zero to one
When Pods have been scaled down to zero, traffic is directed to the Activator. After receiving traffic, the Activator actively "notifies" the Autoscaler to scale out. Once scale-out happens, the Activator checks the health of the new Pods and must wait for the first Pod to become ready before forwarding traffic. So here comes the first piece of health check logic: the Activator checks whether the first Pod is ready.
This health check is done by calling the Pod's port 8012. The Activator issues an HTTP health check and sets the header K-Network-Probe=queue, so the Queue-Proxy container recognizes from K-Network-Probe=queue that this is a check from the Activator and executes the corresponding logic.
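As an illustration of the prober side (a hypothetical sketch: the retry interval, timeout handling and the readiness criterion are assumptions, not the actual Activator code), an Activator-style check could look like this:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForPodReady probes a Pod's queue-proxy port until it answers the
// K-Network-Probe check successfully, or until the deadline expires.
func waitForPodReady(podIP string, timeout time.Duration) error {
	client := &http.Client{Timeout: 1 * time.Second}
	deadline := time.Now().Add(timeout)

	for time.Now().Before(deadline) {
		req, err := http.NewRequest(http.MethodGet, fmt.Sprintf("http://%s:8012/", podIP), nil)
		if err != nil {
			return err
		}
		// Mark this request as a probe so queue-proxy does not treat it
		// as business traffic (header value per the article).
		req.Header.Set("K-Network-Probe", "queue")

		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // Pod is ready; real traffic can be forwarded now
			}
		}
		time.Sleep(100 * time.Millisecond) // retry until ready or timeout
	}
	return fmt.Errorf("pod %s not ready within %s", podIP, timeout)
}

func main() {
	// Example Pod IP for illustration only.
	if err := waitForPodReady("10.0.0.12", 30*time.Second); err != nil {
		fmt.Println(err)
	}
}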
Reference reading:
- Activator to perform health checks before forwarding real requests
- Activator: Retry on Get Revision error
- Always pass Healthy dests to the throttler
- Consolidate queue-proxy probe handlers
- Queue proxy logging, health check of metrics and end to end traces
- End to end traces from queue proxy

VirtualService health check
After a Knative Revision is deployed, an Ingress (formerly called ClusterIngress) is created automatically. The Ingress is ultimately rendered by the Ingress Controller into an Istio VirtualService configuration, and the Istio Gateway can then forward the corresponding traffic to the relevant Revision.
So every time a new Revision is added, an Ingress and an Istio VirtualService need to be created for it, but a VirtualService has no status field to indicate whether the Envoy configuration managed by Istio has taken effect. The Ingress Controller therefore issues an HTTP request to probe whether the VirtualService is ready. This HTTP check eventually reaches the Pod's port 8012 with the header K-Network-Probe=probe; Queue-Proxy uses this header to recognize the probe and execute the corresponding logic.
(The relevant Queue-Proxy code is shown as screenshots in the original article.)
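As a rough, hypothetical sketch of the handler side (not the actual Knative source; the ports and the readiness helper are assumptions), a queue-proxy-style server could dispatch on the K-Network-Probe header before falling through to business traffic:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Assume the business container listens on localhost:8080.
	target, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		switch r.Header.Get("K-Network-Probe") {
		case "queue":
			// Probe from the Activator: report whether user-container is ready.
			if userContainerReady() {
				w.WriteHeader(http.StatusOK)
			} else {
				w.WriteHeader(http.StatusServiceUnavailable)
			}
		case "probe":
			// Probe from the Ingress controller: confirm the network path
			// (Gateway/VirtualService) to this Pod has taken effect.
			w.WriteHeader(http.StatusOK)
		default:
			// Ordinary business traffic: forward to user-container.
			proxy.ServeHTTP(w, r)
		}
	})

	log.Fatal(http.ListenAndServe(":8012", nil))
}

// userContainerReady is a placeholder for the real readiness check
// against the business container.
func userContainerReady() bool { return true }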
The Gateway uses this health check to determine whether the Pod can serve traffic.
Reference reading:
- New probe handling in Queue-Proxy & Activator
- Extend VirtualService/Gateway probing to HTTPS
- Probe Envoy pods to determine when a ClusterIngress is actually deployed
- ClusterIngress Status
- Consolidate queue-proxy probe handlers

Kubelet health check
The Pods that Knative eventually generates run in a Kubernetes cluster, and a Pod in Kubernetes has two health check mechanisms: readinessProbe and livenessProbe.
The livenessProbe determines whether the container is alive; if the check fails, the kubelet tries to restart the container. The readinessProbe determines whether the business is Ready; only when the business is Ready is the Pod added to the Endpoints of the Kubernetes Service, which ensures that traffic is not routed to a broken Pod.
The problem is that by default a Knative Pod contains two containers: Queue-Proxy and user-container.
You may also have noticed that the "first half" of the traffic path has to pass through Queue-Proxy before reaching the current Pod, while in Kubernetes whether a Pod joins the Service Endpoints is decided entirely by the result of its readinessProbe. These two mechanisms are independent, so a scheme is needed to coordinate them. This is also a problem Knative must solve as a Serverless orchestration engine in order to control traffic more finely. Knative therefore converges the readinessProbe of user-container into Queue-Proxy, and the readiness state of the Pod is determined by the result reported by Queue-Proxy.
In addition, as mentioned in this Issue, when Istio is enabled the TCP check initiated by the kubelet may be intercepted by Envoy, so a TCP probe configured directly on user-container cannot reliably tell whether user-container is ready. This is another motivation for converging readiness checking into Queue-Proxy.
Knative converges the user-container health check capability in the following way:
- the readinessProbe of user-container is left empty in the generated Pod spec;
- the readinessProbe that the user configured for user-container is serialized into a JSON string and placed into Queue-Proxy's env;
- Queue-Proxy's readiness probe command parses that JSON string and performs the actual health check against user-container, and this mechanism is combined with the Activator health check mechanism described earlier.

This also ensures that user-container is already Ready when the Activator forwards traffic to the Pod (a rough sketch of this probe logic follows the reference list below). See the following for reference:
- Consolidate queue-proxy probe handlers
- Use user-defined readinessProbe in queue-proxy
- Apply default livenessProbe and readinessProbe to the user container
- Good gRPC deployment pods frequently fail at least one health check
- Fix invalid helloworld example
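Here is a minimal sketch of that idea, assuming a hypothetical env variable name, a simplified probe struct and localhost ports (the real logic lives in the Knative queue-proxy source):

package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"os"
	"time"
)

// probeSpec mirrors only the parts of a Kubernetes readinessProbe needed
// for this sketch; the real implementation uses the corev1.Probe type.
type probeSpec struct {
	HTTPGet *struct {
		Path string `json:"path"`
		Port int    `json:"port"`
	} `json:"httpGet,omitempty"`
	TCPSocket *struct {
		Port int `json:"port"`
	} `json:"tcpSocket,omitempty"`
}

func main() {
	// Hypothetical env variable holding the user-container readinessProbe
	// serialized as JSON by the controller.
	raw := os.Getenv("USER_CONTAINER_READINESS_PROBE")
	var p probeSpec
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		fmt.Println("no usable probe definition:", err)
		os.Exit(1)
	}

	switch {
	case p.HTTPGet != nil:
		// Execute the user-defined HTTP check against user-container.
		url := fmt.Sprintf("http://127.0.0.1:%d%s", p.HTTPGet.Port, p.HTTPGet.Path)
		resp, err := (&http.Client{Timeout: time.Second}).Get(url)
		if err != nil || resp.StatusCode >= 400 {
			os.Exit(1) // not ready
		}
		resp.Body.Close()
	case p.TCPSocket != nil:
		// Execute the user-defined TCP check against user-container.
		conn, err := net.DialTimeout("tcp", fmt.Sprintf("127.0.0.1:%d", p.TCPSocket.Port), time.Second)
		if err != nil {
			os.Exit(1) // not ready
		}
		conn.Close()
	}
	// Exit code 0 signals "ready" to whoever invoked this probe command.
}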
A more detailed discussion of the design, and the plan finally chosen by the community, can be found in:
- Allow probes to run on a more granular timer
- Merge 8022/health to 8012/8013
- TCP probe the user-container from the queue-proxy before marking the pod ready
- Use user-defined readiness probes through queue-proxy
- queue-proxy /health to perform TCP connect to user container

How to use
Readiness can be defined in Knative Service as shown below.
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: readiness-prober
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4db7
          readinessProbe:
            httpGet:
              path: /
            initialDelaySeconds: 3
Two points need to be explained:
Compared with the native Kubernetes Pod readiness configuration, timeoutSeconds, failureThreshold, periodSeconds and successThreshold in Knative must be configured together and must not be zero, otherwise the Knative webhook validation fails. Also, if periodSeconds is set, then once one probe succeeds the user-container is never probed again (so setting periodSeconds is not recommended; it is better to let the system handle probing automatically).
If periodSeconds is not configured, the default probe policy is used, with the following defaults:

timeoutSeconds: 60
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
From this we can see that Knative is gradually converging the user-container configuration: in the Serverless model the system needs to automate a lot of logic, and these "system behaviors" should not be something the user has to worry about.
Summary
The three health check mechanisms discussed above compare roughly as follows:
- the Activator probes the Pod's port 8012 with the header K-Network-Probe=queue, and only forwards traffic once the first Pod is ready;
- the Ingress Controller probes port 8012 with the header K-Network-Probe=probe to confirm that the Gateway/VirtualService configuration has taken effect;
- the kubelet's readinessProbe is executed through Queue-Proxy, which runs the user-defined check against user-container and thereby decides whether the Pod joins the Service Endpoints.
"Alibaba Yun × × icloudnative × × erverless, containers, Service Mesh and other technical fields, focusing on cloud native popular technology trends, cloud native large-scale landing practice, to do the best understanding of cloud native development × ×
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.