Example Analysis of Istio Traffic Management

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article shares an example analysis of Istio traffic management. The editor finds it very practical and shares it here for study. I hope you take something away from it.

Summary

This page provides an overview of how traffic management works in Istio, including the benefits of its traffic management principles. It assumes that you have read "What is Istio?" and are familiar with Istio's high-level architecture. You can find more information about individual traffic management features in the other guides in this section.

Pilot and Envoy

The core component of traffic management in Istio is Pilot, which manages and configures all the Envoy proxy instances deployed in an Istio service mesh. It lets you specify the rules used to route traffic between Envoy proxies and configure failure recovery features such as timeouts, retries, and circuit breakers. It also maintains a canonical model of all the services in the mesh and uses it to let Envoy instances know about the other instances in the mesh through its discovery service.

Each Envoy instance maintains load balancing information based on information obtained from Pilot and periodic health checks of other instances in its load balancing pool, enabling it to intelligently distribute traffic between target instances while following its specified routing rules.

Benefits of traffic management

Istio's traffic management model essentially decouples traffic flow from infrastructure scaling. Operators specify through Pilot the rules that traffic should follow, rather than which specific pods/VMs should receive it, and Pilot and the intelligent Envoy proxies take care of the rest. For example, you can specify through Pilot that 5% of the traffic for a particular service should go to a canary version, regardless of the size of the canary deployment, or send traffic to a particular version depending on the content of the request.
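
As a rough sketch of the 5% canary split mentioned above, the rule could be expressed with a VirtualService, a resource covered in the Rule configuration section. The service name my-svc and the subset names stable and canary are hypothetical, and the subsets are assumed to be defined in a matching DestinationRule.

```yaml
# Hypothetical sketch: send 5% of requests to a canary subset.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-svc              # hypothetical service name
spec:
  hosts:
  - my-svc
  http:
  - route:
    - destination:
        host: my-svc
        subset: stable      # assumed to be defined in a DestinationRule
      weight: 95
    - destination:
        host: my-svc
        subset: canary      # assumed to be defined in a DestinationRule
      weight: 5
```

The split is expressed as a percentage of requests, so it holds no matter how many canary instances happen to be running.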

Decoupling traffic flow from the infrastructure lets Istio provide a variety of traffic management features that live outside the application code. In addition to A/B testing, gradual rollouts, and canary releases, it handles failure recovery using timeouts, retries, and circuit breakers, and offers fault injection to test the compatibility of recovery policies across services. These capabilities are implemented by the Envoy sidecars/proxies deployed across the service mesh.

Pilot

Pilot is responsible for the life cycle of the Envoy instances deployed across the Istio service mesh.

As shown in the figure above, Pilot maintains a canonical representation of the services in the mesh that is independent of the underlying platform, and all services join the mesh through their Envoy proxies. Platform-specific adapters in Pilot are responsible for populating this canonical model appropriately. For example, the Kubernetes adapter in Pilot implements the controllers needed to watch the Kubernetes API server for changes to pod registration information, ingress resources, and the third-party resources that store traffic management rules. This data is translated into the canonical representation, and the Envoy-specific configuration is then generated from it.

Pilot exposes APIs for service discovery and for dynamic updates to load balancing pools and routing tables. These APIs decouple Envoy from platform-specific nuances, which simplifies the design and improves portability across platforms.

Operators can specify high-level traffic management rules through Pilot's rules API. These rules are translated into low-level configuration and distributed to the Envoy instances through the discovery API.

Request routing

This section describes how requests are routed between services in an Istio service mesh.

Service model and service version

As described in the Pilot section, Pilot maintains the canonical representation of the services in the mesh. Istio's model of a service is independent of how it is represented on the underlying platform (Kubernetes, Mesos, Cloud Foundry, etc.). Platform-specific adapters are responsible for populating the internal Istio model with metadata from the platform.

Istio introduces the concept of a service version, which is a finer-grained way to subdivide service instances by version (v1, v2) or environment (staging, prod). These variants are not necessarily different API versions: they may be iterative changes to the same service, deployed in different environments (prod, staging, dev, etc.). Common scenarios include A/B testing and canary releases. Istio's traffic routing rules can refer to service versions to provide additional control over traffic between services.

Communication between services

As shown in the figure above, clients of a service have no knowledge of its different versions. They can keep accessing the service using its hostname or IP address. The Envoy sidecar/proxy intercepts and forwards all requests and responses between the client and the service.

Envoy determines the actual service version dynamically, based on the routing rules that the operator specifies using Pilot. This model lets application code decouple itself from the evolution of the services it depends on, while providing other benefits as well (see Mixer). Routing rules allow Envoy to select a version based on criteria such as headers, the labels associated with the source/destination, and the weights assigned to each version.

Istio also load balances traffic across multiple instances of the same service version. You can find more information about this in Discovery and Load-Balancing.

Istio does not provide DNS. Applications can try to resolve the FQDN using the DNS service present in the underlying platform (kube-dns, mesos-dns, etc.).

Ingress and egress

Istio assumes that all traffic entering and leaving the service mesh passes through Envoy proxies. By deploying an Envoy proxy in front of services, operators can carry out A/B testing, canary deployments, and so on for user-facing services. Similarly, by routing traffic to external web services (for example, a Maps API or a video service API) through the sidecar Envoy, operators can add failure recovery features such as timeouts, retries, and circuit breakers, and obtain detailed metrics on the connections to these services.

Service discovery and load balancing

This section describes how Istio load balances traffic between the instances of a service in the service mesh.

Service registration: Istio assumes that a service registry exists to keep track of the pods/VMs of each service in the application. It also assumes that new instances of a service are automatically registered with the service registry and that unhealthy instances are automatically removed. Platforms such as Kubernetes and Mesos already provide this capability for container-based applications, and plenty of solutions exist for VM-based applications.

Service discovery: Pilot consumes information from the service registry and provides a platform-independent service discovery interface. Envoy instances in the mesh perform service discovery and dynamically update their load balancing pools accordingly.

As shown in the figure above, services in the mesh access each other using their DNS names. All HTTP traffic bound for a service is automatically rerouted through Envoy, which distributes it among the instances in the load balancing pool. Although Envoy supports several sophisticated load balancing algorithms, Istio currently allows three load balancing modes: round robin, random, and weighted least request.

In addition to load balancing, Envoy periodically checks the health of each instance in the load balancing pool. Envoy follows a circuit breaker style pattern and classifies instances as healthy or unhealthy according to the failure rate of their health check API calls. In other words, when the number of health check failures for a given instance exceeds a pre-specified threshold, the instance is ejected from the load balancing pool. Similarly, when the number of health checks that pass exceeds a pre-specified threshold, the instance is added back to the load balancing pool. You can find more information about Envoy's failure handling features in Fault handling.

A service can actively shed load by responding to a health check with an HTTP 503. In that case, the service instance is immediately removed from the caller's load balancing pool.
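
In terms of the rule resources described later in this article, the ejection of unhealthy instances corresponds most closely to the outlierDetection block of a DestinationRule traffic policy. The following is a minimal sketch with illustrative values that are assumptions, not recommendations. Field names differ between Istio releases (some v1alpha3 releases nest these fields under http, and newer releases replace consecutiveErrors with consecutive5xxErrors), so check the reference for the version you run.

```yaml
# Sketch only: thresholds are illustrative assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 5      # eject an instance after 5 consecutive errors
      interval: 10s             # how often the pool is analysed
      baseEjectionTime: 30s     # minimum time an ejected instance stays out
      maxEjectionPercent: 50    # never eject more than half of the pool
```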

Fault handling

Envoy provides a set of out-of-the-box, opt-in failure recovery features that the services in an application can take advantage of. Features include:

1. Timeouts

2. Retries, bounded by a timeout budget and a retry count, with a variable interval between attempts

3. Limits on the number of concurrent connections and requests to upstream services

4. Active (periodic) health checks on each member of the load balancing pool

5. Fine-grained circuit breakers (passive health checks), applied per instance in the load balancing pool

These features can be dynamically configured at run time through Istio's traffic management rules.

The jitter between retries minimizes the impact of retries on an overloaded upstream service, while timeout budgets ensure that the calling service gets a response (success or failure) within a predictable time frame.

A combination of active and passive health checks (4 and 5 above) minimizes the chance of accessing an unhealthy instance in the load balancing pool. Combined with platform-level health checks, such as those supported by Kubernetes or Mesos, this lets applications ensure that unhealthy pods, containers, or VMs are quickly removed from the service mesh, minimizing request failures and the impact on latency.

Together, these features enable the service mesh to tolerate failing nodes and prevent local failures from cascading into instability on other nodes.

Fine tuning

Istio's traffic management rules allow operators to set global defaults for failure recovery per service/version. However, consumers of a service can also override the timeout and retry defaults by providing request-level overrides through special HTTP headers. With the Envoy proxy implementation, these headers are x-envoy-upstream-rq-timeout-ms and x-envoy-max-retries, respectively.

FAQ

Q: Do applications still need to handle failures when running in Istio?

Yes. Istio improves the reliability and availability of the services in the mesh. However, applications still need to handle failures (errors) and take appropriate fallback actions. For example, when all instances in a load balancing pool have failed, Envoy returns HTTP 503. The application is responsible for handling HTTP 503 errors from upstream services.

Q: Do Envoy's failure recovery features break applications that already use fault tolerance libraries such as Hystrix?

No. Envoy is completely transparent to the application. A failure response returned by Envoy is indistinguishable from a failure response returned by the upstream service being called.

Q: How are failures handled when application-level libraries and Envoy are used at the same time?

Given two failure recovery policies for the same destination service (for example, two timeouts, one set in Envoy and the other in the application library), the more restrictive of the two is triggered when a failure occurs. For example, if an application sets a 5-second timeout for an API call to a service while the operator has configured a 10-second timeout, the application's timeout kicks in first. Similarly, if Envoy's circuit breaker trips before the application's circuit breaker, API calls to the service get a 503 from Envoy.

Fault injection

Although Envoy sidecar/proxy provides a number of failure recovery mechanisms for services running on Istio, it is still necessary to test the end-to-end failure resilience of the entire application. A misconfigured recovery policy (for example, incompatible / restrictive timeouts across service invocations) can result in persistent unavailability of critical services in the application, resulting in a poor user experience.

Istio supports protocol-specific fault injection into the network, instead of killing pods or delaying or corrupting packets at the TCP layer. The rationale is that the failures observed at the application layer are the same regardless of the network-level failure, and that more meaningful failures (for example, HTTP error codes) can be injected at the application layer to exercise the resilience of the application.

Operators can configure faults to be injected into requests that match specific criteria, and can further restrict the percentage of requests that should be subjected to faults. Two types of faults can be injected: delays and aborts. Delays are timing failures, mimicking increased network latency or an overloaded upstream service. Aborts are crash failures that mimic failures in upstream services. Aborts usually take the form of an HTTP error code or a TCP connection failure.

For more details, see the traffic management rules for Istio in the next section.

Rule configuration

Istio provides a simple configuration model to control how API calls and layer 4 traffic flow across the services in an application deployment. The configuration model lets operators configure service-level properties such as circuit breakers, timeouts, and retries, and set up common continuous deployment tasks such as canary rollouts, A/B testing, and staged rollouts with percentage-based traffic splits.

For example, 100% of the incoming traffic for the reviews service can be sent to version "v1" using the following configuration:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1

This configuration says that traffic sent to the reviews service (specified in the hosts field) should be routed to the v1 subset of the reviews service instances. The subset in the route names a subset defined in the corresponding destination rule configuration:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

A subset specifies one or more labels that identify the instances of a version. For example, in a Kubernetes deployment of Istio, "version: v1" means that only pods carrying the label "version: v1" receive traffic.

You can configure rules with the istioctl CLI or, in a Kubernetes deployment, with the kubectl command. However, only istioctl validates that the configuration is correct, so we recommend using istioctl. See the configuring request routing task for an example.

There are four traffic management configuration resources in Istio: VirtualService, DestinationRule, ServiceEntry, and Gateway. Some important aspects of using these resources are described below. See the reference for details.

Virtual Services

A VirtualService defines the rules that control how requests for a service are routed within an Istio service mesh. For example, a virtual service can route requests to different versions of a service, or even to a completely different service than the one requested. Requests can be routed based on the request source and destination, HTTP paths and header fields, and the weights associated with individual service versions.
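
To illustrate routing on the HTTP path to entirely different services, here is a minimal sketch. The resource name and the path prefixes are hypothetical. The hosts reviews and ratings are the example services used elsewhere in this article.

```yaml
# Hypothetical sketch: one virtual host, two backing services chosen by path.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo-paths        # hypothetical name
spec:
  hosts:
  - bookinfo.com
  http:
  - match:
    - uri:
        prefix: /reviews      # hypothetical path prefix
    route:
    - destination:
        host: reviews
  - match:
    - uri:
        prefix: /ratings      # hypothetical path prefix
    route:
    - destination:
        host: ratings
```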

Rule destinations

Routing rules correspond to one or more request destination hosts specified in the VirtualService configuration. These hosts may or may not be the same as the actual destination workload, and may not even correspond to an actual routable service in the mesh. For example, to define routing rules for requests to the reviews service using either its internal mesh name reviews or the host bookinfo.com, a VirtualService can have a hosts field like this:

hosts:
- reviews
- bookinfo.com

The hosts field specifies, implicitly or explicitly, one or more fully qualified domain names (FQDNs). The short name reviews above is implicitly expanded to an FQDN. For example, in a Kubernetes environment the full name is derived from the cluster and namespace of the VirtualService (for example, reviews.default.svc.cluster.local).
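
For comparison, the same host can be written out explicitly. This sketch assumes the VirtualService lives in the default namespace and that the cluster uses the cluster.local domain, as in the example name above.

```yaml
# Sketch: explicit FQDN instead of the short name "reviews".
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
  namespace: default          # assumed namespace
spec:
  hosts:
  - reviews.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: reviews.default.svc.cluster.local
```

Spelling out the FQDN avoids any ambiguity about which namespace a short name resolves in.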

Qualify rules through source/headers

Rules can optionally be limited to requests that match certain conditions, such as the following:

Restrict to a specific caller. For example, a rule can specify that it applies only to calls from the workloads (pods) of the reviews service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - sourceLabels:
        app: reviews
    ...

The value of sourceLabels depends on the implementation of the service. In Kubernetes, for example, it would typically be the same labels used in the pod selector of the corresponding Kubernetes service.

Restrict to a specific version of the caller. For example, the following rule refines the previous example so that it applies only to calls from version "v2" of the reviews service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - sourceLabels:
        app: reviews
        version: v2
    ...

Select a rule based on HTTP headers. For example, the following rule applies only to requests whose "cookie" header contains the substring "user=jason".

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        cookie:
          regex: "^(.*?)?(user=jason)(;.*)?$"
    ...

If more than one header is configured, all of them must match for the rule to apply.
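
A minimal sketch of a rule with two header conditions follows. The x-debug header and its value are hypothetical, and the v2 subset is assumed to be defined in a DestinationRule for reviews. The rule applies only when both headers match.

```yaml
# Sketch: both header conditions must hold for the rule to apply.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        cookie:
          regex: "^(.*?)?(user=jason)(;.*)?$"
        x-debug:              # hypothetical header
          exact: "true"
    route:
    - destination:
        host: reviews
        subset: v2            # assumed to be defined in a DestinationRule
```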

Multiple criteria can be set at the same time. In that case, AND or OR semantics apply depending on the nesting. If multiple criteria are nested in a single match clause, the conditions are ANDed. For example, the following rule applies only if the source of the request is "reviews:v2" and the "cookie" header contains "user=jason".

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - sourceLabels:
        app: reviews
        version: v2
      headers:
        cookie:
          regex: "^(.*?)?(user=jason)(;.*)?$"
    ...

Conversely, if the criteria appear in separate match clauses, only one of them needs to match (OR semantics):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - sourceLabels:
        app: reviews
        version: v2
    - headers:
        cookie:
          regex: "^(.*?)?(user=jason)(;.*)?$"
    ...

Split traffic between service versions

Each routing rule identifies one or more weighted backends to call when the rule is activated. Each backend corresponds to a specific version of the destination service, where versions can be expressed using labels. If multiple registered instances carry the specified label, traffic is distributed among them according to the load balancing policy configured for the service, which defaults to round robin.

For example, the following rule routes 25% of the traffic for the reviews service to the instances with the "v2" label and the remaining traffic (that is, 75%) to "v1".

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 75
    - destination:
        host: reviews
        subset: v2
      weight: 25

Timeouts and retries

By default, the timeout for HTTP requests is 15 seconds, but this can be overridden in a routing rule, as follows:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    timeout: 10s

You can also specify the number of retries for HTTP requests in a routing rule. The maximum number of attempts, or as many as fit within the default or overridden timeout period, can be set as follows:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    retries:
      attempts: 3
      perTryTimeout: 2s

Note that request timeouts and retries can also be overridden on a per-request basis.

See the request timeouts task for an example of timeout control.

Injecting faults in the request path

A routing rule can specify one or more faults to inject while forwarding HTTP requests to the rule's corresponding request destination. The faults can be either delays or aborts.

The following example introduces a 5-second delay in 10% of the requests to the "v1" version of the ratings microservice.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percent: 10
        fixedDelay: 5s
    route:
    - destination:
        host: ratings
        subset: v1

The other kind of fault, abort, can be used to simulate a request that is terminated prematurely.

The following example returns an HTTP 400 error code for 10% of the requests to the "v1" version of the ratings service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      abort:
        percent: 10
        httpStatus: 400
    route:
    - destination:
        host: ratings
        subset: v1

Sometimes delay and abort faults are used together. For example, the following rule delays all requests from the "v2" version of the reviews service to the "v1" version of the ratings service by 5 seconds and then aborts 10% of them:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - sourceLabels:
        app: reviews
        version: v2
    fault:
      delay:
        fixedDelay: 5s
      abort:
        percent: 10
        httpStatus: 400
    route:
    - destination:
        host: ratings
        subset: v1

To see fault injection in action, see the fault injection task.

HTTP routing rule priority

When there are multiple rules for a given destination, they are evaluated in the order in which they appear in the VirtualService, that is, the first rule in the list has the highest priority.

Why is priority important? When the routing for a particular service is purely weight-based, it can be specified in a single rule. When other criteria (for example, requests from a specific user) are used to route traffic, however, more than one rule is needed to specify the routing. This is where rule priority must be considered carefully, to make sure the rules are evaluated in the right order.

A common pattern for generalized route specification is to provide one or more higher-priority rules that qualify requests by source/headers, followed by a single weight-based rule with no match criteria that provides the weighted distribution of traffic for all other cases.

For example, the following VirtualService contains two rules that together specify that all requests for the reviews service that include a header named "Foo" with the value "bar" are sent to the "v2" instances. All remaining requests are sent to "v1".

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        Foo:
          exact: bar
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1

Note that the header-based rule comes first in the configuration and therefore has the higher priority. If it came second, the rules would not work as expected, because the rule with no match criteria would be evaluated first and would simply route all traffic to "v1", including requests carrying the matching "Foo" header. Once a rule is found that applies to the incoming request, it is executed and rule evaluation stops. This is why the priority of each rule must be considered carefully when there is more than one.
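
The general pattern described in this section, a qualified rule first and a weighted catch-all last, can be sketched by combining the header match with a weighted split. The 75/25 weights are illustrative, and the v1 and v2 subsets are assumed to be defined in a DestinationRule for reviews.

```yaml
# Sketch: header-qualified rule first, weighted catch-all rule last.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:                    # evaluated first: requests with Foo: bar
    - headers:
        Foo:
          exact: bar
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                    # no match criteria: applies to everything else
    - destination:
        host: reviews
        subset: v1
      weight: 75
    - destination:
        host: reviews
        subset: v2
      weight: 25
```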

Destination Rules

A DestinationRule configures the set of policies that are applied to a request after VirtualService routing has occurred. Destination rules are meant to be written by service owners and describe circuit breakers, load balancer settings, TLS settings, and so on.

A DestinationRule also defines the addressable subsets (that is, named versions) of the corresponding destination host. These subsets are used in VirtualService route specifications when sending traffic to a specific version of the service.

The following DestinationRule configures policies and subsets for the reviews service:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
  - name: v3
    labels:
      version: v3

Note that multiple policies can be specified in a single DestinationRule configuration (in this example, a default policy and a v2-specific policy).

Circuit breaker

A simple circuit breaker can be set based on a number of criteria, such as connection and request limits.

For example, the following DestinationRule sets a limit of 100 connections to the "v1" version of the reviews service backend.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100

See the circuit breaking task for a demonstration of circuit breaker control.
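
The connection limit shown above can be complemented with request limits in the same connectionPool block. A minimal sketch follows, with values that are illustrative assumptions rather than recommendations.

```yaml
# Sketch: connection and request limits together (values are illustrative).
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100           # cap on TCP connections to the subset
        http:
          http1MaxPendingRequests: 10   # cap on requests queued for a connection
          maxRequestsPerConnection: 1   # 1 effectively disables keep-alive
```

Requests beyond these limits are rejected by Envoy rather than queued indefinitely, which the application sees as 503 responses.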

DestinationRule evaluation

Like routing rules, policies are associated with a particular host. If they are subset-specific, however, whether they are activated depends on the result of routing rule evaluation.

The first step in the rule evaluation process evaluates the routing rules (if any are defined) in the VirtualService corresponding to the requested host, to determine the subset (that is, the specific version) of the destination service that the current request will be routed to. Next, the set of policies corresponding to the selected subset, if any, is evaluated to determine whether they apply.

Note: one subtlety of the algorithm is that policies defined for a specific subset are applied only when the corresponding subset is explicitly routed to. For example, consider the following configuration as the one and only rule defined for the reviews service (that is, there are no routing rules in a corresponding VirtualService).

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100

Because no specific routing rule is defined for the reviews service, the default round-robin routing behavior applies, which will sometimes call "v1" instances, or always call them if "v1" is the only running version. However, because the default routing happens at a lower level, the policy above is never invoked: the rule evaluation engine does not know the final destination and so cannot match the subset policy to the request.

You can fix the above example in one of two ways. You can move the traffic policy up one level so that it applies to any version:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
  subsets:
  - name: v1
    labels:
      version: v1

Or, better yet, define proper routing rules for the service. For example, you can add a simple routing rule for "reviews:v1".

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1

Although the default Istio behavior conveniently sends traffic from any source to all versions of a destination service without any rules being set, rules are needed as soon as version discrimination is desired. It is therefore a best practice to set a default rule for every service right from the start.

Service Entries

A ServiceEntry is used to add additional entries to the service registry that Istio maintains internally. It is most commonly used to enable requests to services outside of the Istio service mesh. For example, the following ServiceEntry can be used to allow external calls to services hosted under the *.foo.com domain.

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-ext-svc
spec:
  hosts:
  - "*.foo.com"
  ports:
  - number: 80
    name: http
    protocol: HTTP
  - number: 443
    name: https
    protocol: HTTPS

The destination of a ServiceEntry is specified using the hosts field, which can be either a fully qualified or a wildcard domain name. It represents a whitelisted set of one or more services that services in the mesh are allowed to access.

A ServiceEntry is not limited to external service configuration. It can be of two types: mesh-internal or mesh-external. Mesh-internal entries are like all other internal services, but are used to explicitly add services to the mesh. They can be used to add services as part of expanding the service mesh to include unmanaged infrastructure (for example, VMs added to a Kubernetes-based service mesh). Mesh-external entries represent services outside the mesh. For them, mTLS authentication is disabled and policy enforcement is performed on the client side, instead of on the server side as is usual for internal service requests.
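
A mesh-internal entry for a VM-hosted workload might look like the following sketch. The resource name, hostname, port, and IP address are all hypothetical. The location, resolution, and endpoints fields are the parts of the ServiceEntry resource that distinguish this case.

```yaml
# Hypothetical sketch: registering a VM-hosted service as part of the mesh.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vm-billing                  # hypothetical name
spec:
  hosts:
  - billing.internal.example        # hypothetical hostname
  location: MESH_INTERNAL           # treat the service as part of the mesh
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  resolution: STATIC                # use the endpoint addresses listed below
  endpoints:
  - address: 10.0.0.12              # hypothetical VM IP address
    labels:
      version: v1
```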

A service entry can be used together with virtual services and destination rules, as long as they refer to the service using matching hosts. For example, the following rule can be used in conjunction with the ServiceEntry above to set a 10-second timeout for calls to the external service at bar.foo.com.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bar-foo-ext-svc
spec:
  hosts:
  - bar.foo.com
  http:
  - route:
    - destination:
        host: bar.foo.com
    timeout: 10s

Rules to redirect and forward traffic, and to define retry, timeout, and fault injection policies, are all supported for external destinations. Weighted (version-based) routing, however, is not possible, because there is no notion of multiple versions of an external service.

See the egress task for more information about accessing external services.
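
As a sketch of a retry policy on an external destination, the timeout rule for bar.foo.com can be extended with a retries block. The attempt count and per-try timeout are illustrative assumptions.

```yaml
# Sketch: timeout plus retries for an external host (values are illustrative).
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bar-foo-ext-svc
spec:
  hosts:
  - bar.foo.com
  http:
  - route:
    - destination:
        host: bar.foo.com
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 2s
```

Applying a resource with the same name replaces the earlier timeout-only rule rather than adding a second one.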

Gateways

A Gateway configures a load balancer for HTTP/TCP traffic, most commonly operating at the edge of the service mesh to enable ingress traffic for an application.

Unlike Kubernetes Ingress, an Istio Gateway configures only the L4-L6 functions (for example, the ports to expose and the TLS configuration). Users can then use standard Istio rules to control HTTP requests, as well as TCP traffic entering the Gateway, by binding a VirtualService to it.

For example, the following Gateway configures a load balancer to allow external HTTPS traffic for the host bookinfo.com into the mesh:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - bookinfo.com
    tls:
      mode: SIMPLE
      serverCertificate: /tmp/tls.crt
      privateKey: /tmp/tls.key

To configure the appropriate route, you must define a VirtualService for the same host and bind to the gateway using the gateways field in the configuration:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - bookinfo.com
  gateways:
  - bookinfo-gateway #
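
The listing above ends at the gateways field. Purely as an illustration of what the routing part of a gateway-bound VirtualService can look like, and not as a reconstruction of the original listing, the following sketch adds one HTTP route. The path prefix and the choice of destination are assumptions.

```yaml
# Hypothetical sketch: a gateway-bound VirtualService with one HTTP route.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - bookinfo.com
  gateways:
  - bookinfo-gateway          # binds these routes to the Gateway
  http:
  - match:
    - uri:
        prefix: /reviews      # hypothetical path prefix
    route:
    - destination:
        host: reviews         # assumed in-mesh destination
```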
