Call chain principle and scenario
Just as Service Mesh was born to solve the governance problems of large-scale distributed service access, the call chain emerged in response to the fault location and demarcation problems encountered when operating large, complex distributed systems. A single request can involve a large number of service calls that cross processes, servers, and possibly multiple physical data centers. Whether a fault originates in a service itself or in the network environment, the problems along such a path are complex, and locating them is far harder than finding the offending method from the exception stack printed by a single-process service. What is needed is call link tracing: by completely recording the logical path of a request, you can observe the call relationships at each stage and see the time consumed and the call details of each phase.
The paper Dapper, a Large-Scale Distributed Systems Tracing Infrastructure describes the principles and the general mechanism. The model defines many terms; the two most important to understand are:
Trace: one complete distributed call link, tracking a request end to end.
Span: a single cross-service call; multiple Spans combine into one Trace record.
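To make the two terms concrete, here is a minimal sketch of the data model in Python. It is illustrative only: the field names and the new_id helper are ours, not Dapper's or Zipkin's exact schema.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

def new_id() -> str:
    """Generate a random 64-bit hex id, as B3-style tracers do."""
    return uuid.uuid4().hex[:16]

@dataclass
class Span:
    name: str                        # operation name, e.g. "GET /reviews"
    trace_id: str                    # shared by every Span in the same Trace
    span_id: str = field(default_factory=new_id)
    parent_id: Optional[str] = None  # empty for the root Span
    start: float = field(default_factory=time.time)

def child_of(parent: Span, name: str) -> Span:
    """A child Span inherits the trace_id and points at its parent."""
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)

# Root Span for the whole request, then one child per downstream call:
root = Span(name="A", trace_id=new_id())
span_b = child_of(root, "B")
span_c = child_of(root, "C")
span_d = child_of(span_c, "D")  # D and E are children of C's Span
span_e = child_of(span_c, "E")
```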
The figure above is the classic illustration from the Dapper paper, with the distributed invocation relationship on the left: a front end (A), two middle layers (B and C), and two back ends (D and E). When a user initiates a request, it first arrives at the front end, which then calls services B and C. B replies directly; C calls the back ends D and E and then replies to A, which in turn returns the final response. To apply call chain tracing, trace identifiers and timestamps such as TraceId and SpanId are added to each call.
The right side shows the corresponding Span relationships. Each node is a Span, representing one invocation, and contains at least the Span's name, its SpanId, and its parent SpanId. The edges between nodes express the parent-child relationships between Spans. All of these Spans belong to one Trace and share one TraceId. As the figure shows, the Spans for the calls to B and C are the two child Spans of front-end A's Span, and the Spans for the back-end calls to D and E are both child Spans of C's Span.
There are many implementations of call chain systems, such as Zipkin and, increasingly widely used, Jaeger, which has joined the CNCF foundation; many of them conform to the OpenTracing semantic standard.
A complete call chain tracing system includes several important components: call chain embedding points (instrumentation), call chain data collection, call chain data storage and processing, and call chain data retrieval (which, besides a retrieval API server, generally also includes a very cool call chain front end). The figure above is a complete Jaeger implementation. Here we focus only on the part related to the application, the embedding points, to see whether a "non-intrusive" call chain embedding point can be achieved in Istio. At the end, we will also look at the different call chain data collection methods provided under the Istio mechanism.
Istio's standard Bookinfo example
For simplicity, we take Istio's most classic Bookinfo sample as our example. Bookinfo simulates an online bookstore's category page, displaying information about a book. It is a heterogeneous application whose services are written in different languages.
The simulated function and invocation relationships of each service are:
Productpage: the productpage service calls the details and reviews services to generate the page.
Details: this microservice contains information about the book.
Reviews: this microservice contains reviews related to the book, and calls the ratings microservice.
Ratings: this microservice contains rating information derived from book reviews.
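For reference, the invocation relationships in the list above can be written down as a simple adjacency map (a trivial Python rendering of the text, nothing more):

```python
# The Bookinfo call graph as described above.
call_graph = {
    "productpage": ["details", "reviews"],  # generates the page
    "details": [],                          # book information
    "reviews": ["ratings"],                 # reviews call ratings
    "ratings": [],                          # rating information
}
```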
Call chain output
Run this typical example on Istio without making any code changes, and you can see the following call chain output in Zipkin.
You can see that the call chain shown matches the invocation relationships designed in the Bookinfo scenario: the productpage service calls the details and reviews services, and reviews calls the ratings microservice. In addition to the call relationships, it shows the time consumed and the details of each intermediate call. With this view, service operators can intuitively identify slow or failing services, drill into the details of the invocation at that time, and then locate the problem.
What we should pay attention to is where and how the embedding points are made along this calling path.
In Istio, all governance logic is implemented in the Envoy Sidecar deployed alongside the business container; load balancing, circuit breaking, traffic routing, security, and observability data are all generated on Envoy. The Sidecar intercepts all traffic flowing into and out of the business program and performs various actions according to the rules it receives. In practice this is generally based on the InitContainer mechanism provided by Kubernetes, which performs initialization tasks in a Pod: an iptables script is executed in the InitContainer, and it is through these iptables rules that traffic in the Pod is intercepted and redirected to Envoy. Envoy handles intercepted traffic differently for Inbound and Outbound: it performs the configured operations and then forwards the request. For Outbound traffic, it finds the corresponding target service backend based on service discovery; Inbound traffic is sent directly to the local service instance.
Our focus today is what the Sidecar does at the call chain embedding point after intercepting the traffic.
Istio call chain embedding logic
Envoy's embedding rules do not differ much from the corresponding embedding logic of any other service caller and callee:
Inbound traffic: for traffic flowing into the application through the Sidecar, if the Header carries no tracing information when it passes through the Sidecar, a root Span is created along with its SpanId and TraceId, and the request is then passed to the service in the business container. If the request already contains Trace information, the Sidecar extracts the Trace context and forwards it to the application.
Outbound traffic: for traffic flowing out through the Sidecar, if the Header carries no tracing information when it passes through the Sidecar, a root Span is created, and the Span's context information is put into the request header and passed to the next service being called. When Trace information already exists, the Sidecar extracts the Span information from the Header, creates a child Span based on that Span, and passes the new Span's information in the request header.
In particular, the call chain embedding logic of the Outbound part can be described by the pseudo code shown in the figure:
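Since the figure is not reproduced here, the following is a minimal Python sketch of that Outbound logic as just described. It is our reconstruction, not Envoy's source: the x-b3-* names are the Zipkin B3 headers Envoy propagates, and new_id is an illustrative helper.

```python
import uuid

def new_id() -> str:
    return uuid.uuid4().hex[:16]  # B3-style 64-bit hex id

def on_outbound_request(headers: dict) -> dict:
    """Sketch of the Outbound embedding: create a root or child Span context."""
    trace_id = headers.get("x-b3-traceid")
    if trace_id is None:
        # No trace context in the Header: start a new Trace with a root Span.
        trace_id = new_id()
        span_id = trace_id          # as in the example below: root SpanId == TraceId
        parent_id = ""              # the root Span has no parent
    else:
        # Trace context exists: the incoming Span becomes the parent.
        span_id = new_id()
        parent_id = headers["x-b3-spanid"]
    # Propagate the (possibly new) context to the next hop.
    headers.update({
        "x-b3-traceid": trace_id,
        "x-b3-spanid": span_id,
        "x-b3-parentspanid": parent_id,
    })
    return headers
```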
Detailed analysis of the call chain
The figure shows a detailed view of one Trace from the earlier Zipkin output, where you can observe the details of each call and see how the four services at each stage cooperate with the Sidecar deployed next to them. Only the main Span information generated by the Sidecars is marked on the diagram.
Because the Sidecar's logic for handling Inbound and Outbound traffic differs, the two are drawn separately as blocks in the diagram. Take productpage as an example: receiving the external request is one piece of processing, sending a request to details is a second, and sending a request to reviews is a third, so there are three black processing blocks around the productpage app, although it is actually one Sidecar doing all the work.
Also, to avoid drawing too many arrows, the final Responses are not shown; in fact every request arrow in the diagram has a Response in the opposite direction. When the Sidecar of the calling service receives the Response, it records a CR (Client Received) to mark the time the response was received, from which the duration of the whole Span is calculated.
Next, let's work out the embedding logic by parsing the concrete data:
Start with the Gateway at the invocation entrance. The Gateway, an Envoy process deployed independently in its own Pod, forwards incoming requests to the entry service, productpage. The request arriving at the Gateway's Envoy contains no Trace information, so Envoy generates a root Span whose SpanId and TraceId are both f79a31352fe7cae9. Because it is the first Span on the call chain, that is, the root Span, its ParentId is empty, and a CS (Client Send) is recorded at this point.
The Inbound traffic entering the productpage app from the ingress Gateway's Envoy is intercepted by the Envoy in the productpage Pod. That Envoy processes the Trace information in the request header, records an SR (Server Received), and sends the request to the productpage business container for processing. When productpage handles the request in its business method, it not only accepts the ordinary business parameters but also parses the call chain Header information from the request, and the Trace information in the Header is passed on to the Details and Reviews microservices that it calls.
Before the request from productpage reaches the reviews service, its Outbound traffic passes through the Envoy of the same Pod again, and the embedding logic finds that the Header already contains Trace information. Before sending the request, Envoy performs the client-side embedding: it takes the current Span as the parent and generates a child Span, whose new SpanId is cb4c86fb667f3114, whose TraceId remains f79a31352fe7cae9, and whose ParentId is the Id of the previous Span, f79a31352fe7cae9.
The request from productpage to reviews passes through productpage's Sidecar and, after load balancing, is sent to one instance of reviews. Before it reaches the reviews service container, it is likewise intercepted by reviews' Envoy, which parses the Trace information from the Header and passes it to reviews. The server-side code of reviews that handles the request also receives and parses the Trace-carrying Header information and forwards it when calling the next service, Ratings.
So far we have only walked through the path of the request from the entry Gateway to the productpage service and then to the reviews service. You can see that at every access stage, the Inbound and Outbound traffic of the service is intercepted by Envoy and the corresponding call chain embedding logic is executed. The Reviews-to-Ratings and productpage-to-Details paths shown in the figure are similar and are not repeated here.
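Summarizing the walkthrough with the concrete Ids quoted above, the B3 headers at the two hops would look roughly as follows (a sketch showing only the fields discussed):

```python
# Headers generated by the Gateway's Envoy (root Span):
gateway_headers = {
    "x-b3-traceid": "f79a31352fe7cae9",
    "x-b3-spanid": "f79a31352fe7cae9",        # root Span: SpanId == TraceId
    "x-b3-parentspanid": "",                  # root Span has no parent
}

# Headers after productpage's Envoy creates the child Span toward reviews:
productpage_to_reviews_headers = {
    "x-b3-traceid": "f79a31352fe7cae9",       # unchanged for the whole Trace
    "x-b3-spanid": "cb4c86fb667f3114",        # new child SpanId
    "x-b3-parentspanid": "f79a31352fe7cae9",  # the previous Span's Id
}
```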
The above process confirms the Envoy embedding logic we described earlier. Notice that besides the embedding logic Envoy executes when processing Inbound and Outbound traffic, something is needed to string the steps of the call together, and the application actually does do something: when it sends a request to the next service, it must pass on the call chain information it received, even though it does not generate the Trace and Span identifiers itself. Only then can the outbound proxy, before it initiates the request to the next-hop service, generate a child Span and associate it with the original Span, forming a complete call chain. Otherwise, if the application container did not relay the Trace in the Header, the Sidecar would create a new root Span when processing the outgoing request, eventually producing several fragmented Spans that cannot be associated into one Trace, and the problem we mentioned at the beginning would arise.
Two questions are often asked to challenge whether the business code really has to cooperate with this modification to implement the call chain. Question 1: since the incoming request already carries these Headers, why not just pass them down automatically? Wouldn't it work for the Sidecar to call the app with these Headers, and the app to call the Sidecar with the same Headers? Question 2: since the TraceId and SpanId are generated by the Sidecar, why bother having the app parse them on receiving a request and hand them back to the Sidecar when sending one?
To answer question 1, you only need to see that the app's business code here handles the request rather than forwarding it; that is, the request shown on the left of the figure, from the client to productpage, ends at productpage. What the service interface does with it is entirely up to productpage: it can reply directly from local processing logic, or, as in this example's scenario, construct new requests to invoke other services. The requests productpage sends on the right are entirely new requests constructed by the service.
To answer question 2, you need to know that Envoy currently uses separate Listeners to handle Inbound and Outbound requests. Inbound only processes incoming traffic and forwards it to the local service instance; Outbound finds the corresponding target service backend according to service discovery. The two have no relationship beyond living in the same process. Moreover, as described for question 1, because the Outbound request is a newly constructed one, maintaining a map inside the proxy to correlate the Trace information is not feasible.
So the main flow of a call chain, walked through with an example, ends here. Attached below are the Span details of productpage accessing reviews, with some data removed so that only the main information is retained, roughly as follows:
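The original screenshot is not reproduced here; in its place is a hedged reconstruction of what such a Span looks like in Zipkin v1 JSON form, written as a Python dict. The Ids come from the walkthrough above, while the span name, timestamps, and sizes are invented for illustration.

```python
# A hedged reconstruction of the reported Span (Zipkin v1 shape).
# Timestamps are microseconds since epoch, invented for illustration.
span = {
    "traceId": "f79a31352fe7cae9",
    "id": "cb4c86fb667f3114",
    "parentId": "f79a31352fe7cae9",
    "name": "reviews:9080/*",  # illustrative span name
    "annotations": [
        {"value": "cs", "timestamp": 1_550_000_000_000_000,
         "endpoint": {"serviceName": "productpage"}},  # client sent
        {"value": "sr", "timestamp": 1_550_000_000_002_000,
         "endpoint": {"serviceName": "reviews"}},      # server received
        {"value": "ss", "timestamp": 1_550_000_000_010_000,
         "endpoint": {"serviceName": "reviews"}},      # server sent
        {"value": "cr", "timestamp": 1_550_000_000_012_000,
         "endpoint": {"serviceName": "productpage"}},  # client received
    ],
    "binaryAnnotations": [
        {"key": "http.status_code", "value": "200"},   # illustrative
        {"key": "response_size", "value": "375"},      # illustrative
    ],
}
```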
productpage's proxy reports CS and CR, and reviews' proxy reports SR and SS: these mark when productpage, as the client, sent the request and when it finally received the response, and when reviews' proxy received the request from the client and when it sent the response out. The Span also includes other information about this access, such as the Response Code, the Response Size, and so on.
From the preceding analysis we can draw a conclusion: the embedding logic is completed in the Sidecar proxy, and the application does not have to deal with the complex embedding logic, but the application does need to cooperate by passing the generated Trace-related information along on the request headers.
Example of service code modification
Earlier, through a typical example, we analyzed in detail how Envoy as a Sidecar and the application cooperate in Istio's call chain embedding. The conclusion was that the call chain embedding is executed by Envoy, but the business program must be modified appropriately. Let's extract the service code to verify this.
When the productpage service, written in Python, handles a request on the server side, it first extracts the relevant Headers from the received Request, then constructs the request used to call the details service interface and forwards the Headers along.
First, productpage's REST method extracts the Trace-related Headers from the Request.
Then it constructs a new request to send out to the reviews service interface; you can see that this request carries the received Headers.
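The code screenshots are not reproduced here; below is a minimal sketch in the spirit of the Bookinfo productpage code, assuming Flask and requests. The header list mirrors the B3 headers discussed above, and the URLs follow Bookinfo's conventions but should be treated as assumptions.

```python
import requests
from flask import Flask, request

app = Flask(__name__)

# Headers that must be propagated so Envoy can join the Spans into one Trace.
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
]

def get_forward_headers(req) -> dict:
    """Extract the trace-related headers from the incoming request."""
    return {h: req.headers[h] for h in TRACE_HEADERS if h in req.headers}

@app.route("/productpage")
def front_page():
    headers = get_forward_headers(request)
    # The outgoing calls are new requests; attaching the extracted headers
    # is what lets the Sidecar create child Spans of this Trace.
    details = requests.get("http://details:9080/details/0", headers=headers)
    reviews = requests.get("http://reviews:9080/reviews/0", headers=headers)
    return {"details": details.json(), "reviews": reviews.json()}
```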
The REST code of the Java-based reviews service is similar: when the server receives a request, it not only reads the business parameters from the Request but also extracts the Header information and passes it on when calling the Ratings service. The other paths, productpage calling details and reviews calling ratings, follow the same logic.
Of course, this is just a demo showing where the code needs to change. In a real project we would not modify every business method this way, which would intrude on, even pollute, the code. Depending on the characteristics of the language, we would try to extract this logic into a common layer.
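In Python, for example, one common way to extract it (a sketch of the idea, not the Bookinfo code) is to centralize the propagation in a small client wrapper, so business methods never touch the headers:

```python
import threading
import requests

# Same B3 header list as in the previous sketch.
TRACE_HEADERS = ["x-request-id", "x-b3-traceid", "x-b3-spanid",
                 "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags"]

_ctx = threading.local()  # per-thread storage for the current trace headers

def save_trace_context(incoming_headers) -> None:
    """Call once per request, e.g. from a framework before-request hook."""
    _ctx.headers = {h: incoming_headers[h]
                    for h in TRACE_HEADERS if h in incoming_headers}

class TracedSession(requests.Session):
    """A requests.Session that transparently forwards the trace headers."""
    def request(self, method, url, **kwargs):
        headers = dict(kwargs.pop("headers", None) or {})
        headers.update(getattr(_ctx, "headers", {}))
        return super().request(method, url, headers=headers, **kwargs)

http = TracedSession()  # business code uses this, with no tracing logic
# e.g. http.get("http://ratings:9080/ratings/0")
```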
Istio call chain data collection: by Envoy
A complete embedding process, in addition to handling Span information, such as extract and inject, and creating Spans, also has to report the Spans to the call chain server, where they are stored and made retrievable. In Istio this is all handled in the Envoy Sidecar, and the business program does not need to care. The backend address is written in automatically when the proxy is injected into the business Pod.
That is, Envoy connects to the Zipkin server to report the call chain data, and the business containers do not need to be concerned at all. The backend address for this call chain collection can also be configured to point at Jaeger, because Jaeger is compatible with the Zipkin format when receiving data.
Istio call chain data collection: by Mixer
In addition to Envoy reporting call data directly to the Zipkin backend, Istio provides Mixer as a unified panel for docking different backends to collect telemetry data; Trace data can, of course, be collected the same way.
That is, as described by the TraceSpan template, you create a TraceSpan instance describing which data to extract from an access and hand over to Mixer; you can see that the several Trace-related Ids are extracted from the request Headers. Beyond this basic data, based on Mixer's integration with Kubernetes, Mixer can supplement the metadata of certain objects, such as information about the Pod. In practice, Mixer connects to kube-apiserver to obtain the corresponding Pod resources, so the collected data can carry more business context than what is collected directly from Envoy: information such as namespace and cluster, which APM data uses but which Envoy itself does not generate, can be supplemented automatically from Kubernetes. This is very convenient.
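For a feel of what such a TraceSpan instance expresses, here is a sketch of its parameter mapping, rendered as a Python dict purely for illustration; the real configuration is Istio YAML, the field names follow the tracespan template, and the exact attribute expressions may differ by version.

```python
# Sketch of the mapping a Mixer TraceSpan instance describes: each field
# is filled from a Mixer attribute expression evaluated per request.
tracespan_params = {
    "traceId":        'request.headers["x-b3-traceid"]',
    "spanId":         'request.headers["x-b3-spanid"] | ""',
    "parentSpanId":   'request.headers["x-b3-parentspanid"] | ""',
    "spanName":       'destination.workload.name | "unknown"',
    "startTime":      "request.time",
    "endTime":        "response.time",
    "httpStatusCode": "response.code | 0",
}
```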
This is also an excellent illustration of the observability practice of Mixer, a core component of Istio.
Update to the official Istio statement
Recently I have been communicating with the community, urging that users be told clearly that using Istio for governance does not mean zero code changes in all scenarios: for the call chain in particular, users do not need to instrument business code, but they still need to modify some code, and in particular the home page claim of "without any change" should be avoided as misleading. The response is that in 1.1 the community's what-is-istio home page has modified this description: it no longer says "without any changes in service code" as in 1.0, but "with few or no code changes in service code". The suggestion is that when you use Istio's call chain embedding, the application needs to make appropriate changes; of course, once you understand the principle, doing so is not too troublesome.
The change is to a lighter qualifier: changes are rarely needed, or basically not needed. This is also the community's consistent view.
Combining the analysis of the Istio call chain principle with a typical example, examined down to the detail fields, the flow, and the code, plus the views exchanged with the community, we can draw the following conclusions:
Most of Istio's governance capabilities are implemented in the Sidecar rather than in the application, so they are non-intrusive.
Istio's call chain embedding logic is also completed in the Sidecar proxy and is non-intrusive to the application, but the application needs to make appropriate modifications: it must cooperate by passing the generated Trace-related information along on the request headers.
Huawei Cloud Istio service mesh is in public beta
At Tencent's venue we talk only about practical technology and advertise as little as possible; here is just one slide of PPT briefly introducing the Istio service mesh offering that Huawei Cloud currently has in public beta.
It is deeply integrated with Huawei Cloud's container engine CCE: once enabled, you get all the governance capabilities of the Istio service mesh. Based on a panoramic view of the running application, you can configure and manage intelligent traffic management functions such as circuit breaking, fault injection, and load balancing. Built-in canary and A/B testing support typical grayscale release flows: one-click deployment of a grayscale version, one-click traffic switching, grayscale policy configuration by traffic proportion or request content, and one-stop health, performance, and traffic monitoring, making the grayscale release process quantified, intelligent, and visualized. Huawei Cloud APM is integrated to inspect, diagnose, and manage running applications by means of call chains, application topology, and other tools.
Huawei Cloud Istio community contributions
Huawei is a founding member and platinum member of the CNCF Foundation and holds a CNCF/Kubernetes TOC seat. Its Kubernetes community contributions rank first in China and third in the world, with more than 3000 PRs, and it has successively contributed important features such as cluster federation, advanced scheduling policies, IPVS load balancing, and container storage snapshots.
As the Istio project moves further toward production use, the team is also actively contributing to the Istio community. At present its contributions rank first in China and third in the world, with 3 Approver seats, 6 Member seats, and a number of Contributors.
HTTP health checks via Pilot agent forwarding: in an mTLS-enabled environment the traditional Kubernetes HTTP health check cannot work, so a sidecar forwarding capability was implemented, injected automatically by the injector.
Istioctl debugging enhancements: to address istioctl's missing ability to query the endpoints in a sidecar, the proxy-config endpoint and proxy-status endpoint commands were added, improving debugging efficiency.
HTTPRetry API enhancement: added the RetryOn policy to the HTTPRetry configuration item, allowing control over sidecar retries.
MCP configuration implementation: Pilot supports mesh configuration and can interact with Galley and other servers implementing the MCP protocol to obtain configuration, decoupling it from the backend registry.
Pilot CPU anomaly fix: in 1.0.0-snapshot.0, Pilot's idle CPU utilization was above 20%; it was reduced to below 1%.
Pilot service data delivery optimization: cache services to avoid repeated conversion on every push.
Pilot service instance query optimization: query endpoints by label selector (covering more than 95% of scenarios) to avoid traversing the endpoints of all namespaces.
Pilot data push performance optimization: changed the original serial, sequential configuration push to parallel push, reducing the delivery latency of configuration.