2025-04-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to hijack ServiceMesh traffic in a production environment. Many teams run into the problems described here in practice, so the sections below walk through how they can be handled.
Background
The ServiceMesh community uses iptables to hijack traffic. We ran into some problems when using this mechanism in Baidu's production environment, so we explored other traffic hijacking mechanisms, such as hijacking based on service discovery, hijacking based on an SDK, and hijacking based on a fixed Virtual IP.
This article mainly introduces the traffic hijacking mechanism based on service discovery, which "forges" the address returned in the service discovery step to hijack the traffic.
Traffic hijacking mechanism based on iptables
Let's first take a brief look at the community's traffic hijacking solution, starting with Inbound traffic hijacking, as shown in figure 1:
All inbound traffic passes through iptables
iptables ignores non-TCP traffic and traffic accessing the Istio management port, and forwards all other inbound traffic to Envoy.
After Envoy processing, the traffic is forwarded to App.
This traffic is matched against the iptables rules once more. iptables treats traffic with the following characteristics as traffic from Envoy and forwards it directly to its destination:
It is sent by Envoy.
The output device is lo.
The destination is 127.0.0.1.
At this point, iptables has hijacked the Inbound traffic to Envoy and forwarded it on to App.
Figure 1 iptables traffic hijacking
Next, let's look at Outbound traffic hijacking, as shown in figure 2:
App sends traffic to Server
iptables forwards traffic to Envoy if it meets the following conditions:
It is not sent by Envoy.
The output device is not lo.
The destination address is not localhost.
After processing the traffic, Envoy selects an endpoint of Server and forwards the traffic to it.
iptables sends traffic directly to the destination, that is, Server, if the following conditions are met:
Sent by Envoy.
The output device is not lo.
Figure 2 iptables hijacking outbound traffic
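The inbound and outbound rules listed above can be modelled as a small decision function. This is only an illustrative sketch of the matching logic, not real iptables configuration: the `Packet` type, the function names, the returned labels, and the values of `ENVOY_UID` and `MANAGEMENT_PORTS` are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass

ENVOY_UID = 1337            # assumed uid that the Envoy process runs as
MANAGEMENT_PORTS = {15020}  # assumed Istio management port(s)


@dataclass
class Packet:
    protocol: str          # "tcp", "udp", ...
    dst_ip: str
    dst_port: int
    out_device: str = ""   # output device; only meaningful for locally sent traffic
    sender_uid: int = -1   # uid of the local sender; -1 for traffic from the network


def is_localhost(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_loopback


def on_arrival(pkt: Packet) -> str:
    """Inbound step 2: traffic arriving from the network."""
    if pkt.protocol != "tcp" or pkt.dst_port in MANAGEMENT_PORTS:
        return "ignore"
    return "redirect_to_envoy"


def on_local_send(pkt: Packet) -> str:
    """Rules applied to traffic sent by a local process (App or Envoy)."""
    from_envoy = pkt.sender_uid == ENVOY_UID
    if from_envoy and pkt.out_device == "lo" and pkt.dst_ip == "127.0.0.1":
        return "deliver"            # inbound: Envoy -> App over loopback
    if from_envoy and pkt.out_device != "lo":
        return "deliver"            # outbound: Envoy -> Server
    if not from_envoy and pkt.out_device != "lo" and not is_localhost(pkt.dst_ip):
        return "redirect_to_envoy"  # outbound: App -> Envoy
    return "no_match"
```

For example, a TCP packet sent by App (any uid other than `ENVOY_UID`) to a remote address via `eth0` is redirected to Envoy, while the same packet sent by Envoy itself is delivered unchanged, which is what prevents a redirect loop.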
At this point, iptables has hijacked both inbound and outbound traffic. The advantage of this mechanism is that it hijacks business traffic transparently, but it has some problems in Baidu's production environment:
Poor controllability. Container networks on the internal network are not isolated from one another, and iptables is a global tool that other users can modify, which can break traffic hijacking. As a single-machine traffic management tool, it also lacks a mature platform or product for unified management.
Forwarding performance suffers under high concurrency, and rule changes take a long time to apply when there are too many rules.
Therefore, we explored other hijacking mechanisms. Next, I will introduce the one currently used in Baidu's production environment: the traffic hijacking mechanism based on service discovery.
Traffic hijacking mechanism based on service discovery
First, let's look at the design idea behind the mechanism. Service traffic can be divided by direction into Outbound and Inbound. As shown in figure 3, there are two services, Client and Server. Client's Envoy is called EnvoyC and Server's Envoy is called EnvoyS (they are essentially the same; the different names are only for ease of expression). The traffic EnvoyC needs to hijack is the Outbound traffic sent by Client on the same machine, while most of the traffic EnvoyS needs to hijack is sent to Server by services on other machines.
The hijacking mechanisms for these two kinds of traffic can be designed separately. Since the commonly used ServiceMesh policies take effect on EnvoyC, we first design a scheme for EnvoyC to hijack Outbound traffic.
Figure 3 ServiceMesh traffic hijacking
Outbound traffic hijacking
A complete request roughly goes through three steps: domain name resolution (or service discovery), establishing a connection, and sending the request. Since iptables is not an option, and other hijacking schemes that rely on the kernel are not available for the time being, we turn our attention to the first step: service discovery. Services in Baidu's production environment basically all rely on the Naming system to resolve the real IP list of a service. We only need the Naming system to return the IP address of Envoy in order to hijack the service's Outbound traffic to Envoy.
As shown in figure 4, Naming Agent is the agent responsible for service discovery on a single machine. Before sending a request, Client asks Naming Agent: "I want to send a request to Server, please give me its address." Naming Agent then returns the address of Envoy as if it were the address of Server. Client sends the request to Envoy, and Envoy forwards it to Server according to a series of policies.
Figure 4 Outbound traffic hijacking
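The address "forgery" described above can be sketched in a few lines. This is a minimal illustration, not the actual Naming Agent: the `NamingAgent` class, its `resolve` method, the `meshed` set, and the `ENVOY_EGRESS_ADDR` value are all hypothetical names chosen for the example.

```python
from typing import Dict, List, Set, Tuple

Address = Tuple[str, int]

# Hypothetical address of the local Envoy listener; deployment-specific in practice.
ENVOY_EGRESS_ADDR: Address = ("127.0.0.1", 8000)


class NamingAgent:
    """Toy single-machine service discovery agent."""

    def __init__(self, registry: Dict[str, List[Address]], meshed: Set[str]):
        self.registry = registry  # service name -> real instance list
        self.meshed = meshed      # services whose Outbound traffic is hijacked

    def resolve(self, service: str) -> List[Address]:
        # For a hijacked service, answer with Envoy's address as if it were
        # the address of the service itself.
        if service in self.meshed:
            return [ENVOY_EGRESS_ADDR]
        return list(self.registry[service])


agent = NamingAgent(
    registry={"Server": [("10.0.0.11", 8080), ("10.0.0.12", 8080)]},
    meshed={"Server"},
)
addresses = agent.resolve("Server")  # Client connects to Envoy instead of Server
```

Client code does not change at all in this sketch: it still calls the resolver and connects to whatever comes back, which is why the hijacking is transparent to services that already use the Naming system.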
The advantage of this hijacking mechanism is that the changes are concentrated in the Naming system, so every service that uses the Naming system can have its Outbound traffic hijacked transparently through this scheme.
In addition, the hijacking mechanism based on the Naming system can dynamically return traffic governance parameters, such as timeout and retry settings, to business services. One use of this capability is to avoid the service avalanche that multi-level retries can cause after Mesh hijacking. As shown in figure 5, when business traffic is hijacked by Envoy, Envoy sets the number of retries of the business service to 0 through Naming Agent.
Figure 5 Dynamically returning traffic governance configuration
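To see why multi-level retries matter, consider the worst case where every attempt fails: each attempt made by the business service triggers a full round of attempts in Envoy, so the counts multiply. The sketch below works through that arithmetic and shows the retry count being overridden to 0 once traffic is hijacked. The function names and the dictionary layout are assumptions for illustration; the source does not describe the actual parameter format.

```python
def worst_case_attempts(client_retries: int, envoy_retries: int) -> int:
    """Requests that can reach Server for one logical call if every attempt fails."""
    return (1 + client_retries) * (1 + envoy_retries)


def governance_for_client(hijacked: bool, configured_retries: int,
                          configured_timeout_ms: int) -> dict:
    """Parameters returned to the business service along with the address."""
    # When Envoy handles retries, the business service must not retry as well.
    retries = 0 if hijacked else configured_retries
    return {"retries": retries, "timeout_ms": configured_timeout_ms}


before = worst_case_attempts(client_retries=3, envoy_retries=3)   # 4 * 4 = 16
params = governance_for_client(True, configured_retries=3, configured_timeout_ms=500)
after = worst_case_attempts(params["retries"], envoy_retries=3)   # 1 * 4 = 4
```

With three retries configured at both levels, one failing call can turn into 16 requests; with the business service's retries set to 0, the bound drops to the 4 attempts made by Envoy alone.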
In addition, to reduce the impact of data plane (Envoy) failures on business services, we have added capabilities such as automatic disaster recovery for the data plane and actively disabling Mesh hijacking:
Automatic disaster recovery for data plane failures: when Envoy is abnormal, Naming Agent automatically returns the real instance list of Server, and Client automatically falls back to non-Mesh mode.
Actively disabling Mesh hijacking: users can also turn off Mesh hijacking themselves. In this case, Client likewise falls back to non-Mesh mode.
At this point, Envoy can hijack Outbound traffic. However, an Envoy that can only hijack Outbound traffic is incomplete: functions such as ingress traffic limiting also require Inbound traffic hijacking.
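Both fallback paths above reduce to the same decision inside the resolver: return Envoy's address only when hijacking is enabled and Envoy is healthy, otherwise return the real instances. A minimal sketch, assuming the health state and the user switch are available as booleans (how Naming Agent actually detects an abnormal Envoy is not described in the source, and the function name is hypothetical):

```python
from typing import List, Tuple

Address = Tuple[str, int]


def select_addresses(real_instances: List[Address], envoy_addr: Address,
                     envoy_healthy: bool, mesh_enabled: bool) -> Tuple[str, List[Address]]:
    """Return the mode in effect and the address list handed to Client."""
    if mesh_enabled and envoy_healthy:
        return "mesh", [envoy_addr]
    # Envoy is abnormal or the user turned hijacking off: fall back to direct access.
    return "non-mesh", list(real_instances)


real = [("10.0.0.11", 8080), ("10.0.0.12", 8080)]
envoy = ("127.0.0.1", 8000)  # hypothetical local Envoy address
normal = select_addresses(real, envoy, envoy_healthy=True, mesh_enabled=True)
failed = select_addresses(real, envoy, envoy_healthy=False, mesh_enabled=True)
disabled = select_addresses(real, envoy, envoy_healthy=True, mesh_enabled=False)
```

Because the fallback happens in the service discovery answer, Client needs no special logic: the next resolution simply yields the real instances.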
Inbound traffic hijacking
Inbound traffic comes mainly from other machines, so we can no longer rely on the single-machine Naming Agent to forge addresses and have to find another way. Still following the idea behind the Naming system: EnvoyS and Server are deployed on the same machine, so the only difference between the addresses they expose is the port. Therefore, as long as we can change the port that EnvoyC uses when it accesses Server, we can hijack Inbound traffic to EnvoyS.
As shown in figure 4, EgressPort receives Outbound traffic and IngressPort receives Inbound traffic.
The control plane (Istio) sends the IngressPort of EnvoyS to EnvoyC as the port of Server.
EnvoyC forwards traffic destined for Server to IngressPort, where it is received by EnvoyS.
EnvoyS then forwards the traffic to the Server service port NamedPort.
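The three steps above amount to a port substitution on the endpoint list. The sketch below illustrates the idea only; the `Endpoint` type, the function names, and the port numbers are assumptions, and it does not use Istio's real configuration API.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


def endpoints_for_envoyc(server_endpoints: List[Endpoint],
                         ingress_port: int) -> List[Endpoint]:
    """Step 1: keep each Server IP but substitute EnvoyS's IngressPort."""
    return [replace(ep, port=ingress_port) for ep in server_endpoints]


def envoys_forward_target(named_port: int) -> Endpoint:
    """Step 3: EnvoyS hands the traffic to Server on the same machine."""
    return Endpoint("127.0.0.1", named_port)


NAMED_PORT = 8080     # hypothetical Server service port
INGRESS_PORT = 8001   # hypothetical EnvoyS IngressPort

servers = [Endpoint("10.0.0.11", NAMED_PORT), Endpoint("10.0.0.12", NAMED_PORT)]
pushed = endpoints_for_envoyc(servers, INGRESS_PORT)  # what EnvoyC connects to
local_target = envoys_forward_target(NAMED_PORT)
```

Since EnvoyS and Server share a machine, the IP in each endpoint stays valid and only the port changes, which is what makes the substitution sufficient.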
At this point, Envoy has partial Inbound traffic hijacking capability. It is only partial because this mechanism cannot hijack the traffic of ingress services. The upstream (Client) of an ingress service is an external service whose configuration is not controlled by Istio, so this mechanism cannot be used to hijack its traffic. This capability needs further improvement.
Pitfalls in Inbound traffic hijacking
In addition to the problem mentioned above, Inbound traffic hijacking has some pitfalls. We found that when EnvoyS hijacks Inbound traffic, part of the health check mechanism for L3/L4 communication protocols stops working.
Partial failure of active health checks for L3/L4 communication protocols
Reason: by default, the active health check for L3/L4 communication protocols checks whether a port is alive. Once traffic is hijacked to EnvoyS, the check actually probes the IngressPort of EnvoyS, so it no longer reflects whether Server's NamedPort is alive.
Our current solution is a two-stage active health check mechanism:
Health check between Envoys: EnvoyC health-checks EnvoyS, and the result reflects the status of both EnvoyS and Server.
Health check between Envoy and the local service: EnvoyS checks whether Server's port is alive and feeds the result back to EnvoyC.
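The two stages can be sketched as follows. This is an illustration under assumptions rather than Envoy's actual health checking code: the report format, the function names, and the use of `None` to represent an unreachable EnvoyS are all invented for the example.

```python
import socket
from typing import Callable, Optional


def port_alive(host: str, port: int, timeout: float = 0.5) -> bool:
    """Plain L4 check: can a TCP connection be established?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def envoys_report(named_port: int,
                  check: Callable[[str, int], bool] = port_alive) -> dict:
    """Stage 2: EnvoyS probes Server's NamedPort on the local machine."""
    return {"server_alive": check("127.0.0.1", named_port)}


def envoyc_considers_healthy(report: Optional[dict]) -> bool:
    """Stage 1: EnvoyC checks EnvoyS; `None` means EnvoyS did not answer."""
    if report is None:
        return False                # EnvoyS itself is down
    return report["server_alive"]   # EnvoyS is up, so trust its view of Server
```

The key difference from a plain port check against IngressPort is that a live EnvoyS with a dead Server is reported as unhealthy, so EnvoyC stops sending traffic to that instance.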
Failure of outlier eviction (passive health check) for L3/L4 communication protocols
Reason: the outlier eviction condition for L3/L4 communication protocols is a connection exception. Once traffic is hijacked to EnvoyS, this function actually checks whether EnvoyC can establish a connection with EnvoyS, not with Server.
Our current solution is to extend the eviction condition for L3/L4 communication protocols by adding access timeout as a condition. When Server becomes abnormal, EnvoyC keeps failing to get a reply and therefore marks that Server instance as abnormal.
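A minimal sketch of the extended eviction condition, assuming a simple consecutive-failure counter: the class name, the outcome labels, and the threshold are hypothetical and are not taken from Envoy's implementation or from the source.

```python
from collections import defaultdict


class OutlierDetector:
    """Evicts a host after `threshold` consecutive failed accesses."""

    # Timeouts count as failures in addition to connection errors.
    FAILURES = {"connect_error", "timeout"}

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.consecutive = defaultdict(int)
        self.evicted = set()

    def record(self, host: str, outcome: str) -> None:
        """`outcome` is "ok", "connect_error", or "timeout"."""
        if outcome not in self.FAILURES:
            self.consecutive[host] = 0  # any success resets the streak
            return
        self.consecutive[host] += 1
        if self.consecutive[host] >= self.threshold:
            self.evicted.add(host)


detector = OutlierDetector(threshold=3)
# EnvoyS accepts the connection, but Server never answers, so EnvoyC only sees timeouts.
for _ in range(3):
    detector.record("10.0.0.11:8001", "timeout")
```

With connection errors as the only condition, the host in this example would never be evicted, because the connection to EnvoyS always succeeds; counting timeouts is what lets the Server failure show through.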
Summary
Finally, a simple comparison of the above two schemes:
The traffic hijacking mechanism based on service discovery has been applied to hundreds of services and tens of thousands of instances across business lines such as Baidu App, the information feed, and Baidu Maps. It reduces the forwarding performance loss, recovers automatically from data plane failures, and can dynamically return traffic governance parameters. However, it still lacks one capability: it cannot hijack the traffic of ingress services, which we plan to address later.