2025-03-31 Update. From: SLTechnology News&Howtos (shulou), Internet Technology
Shulou (Shulou.com) 06/01 Report
This article explains in detail the exploration and practice behind building OCTO2.0 and is shared here as a reference.
1. Analysis of the current state of OCTO
OCTO is Meituan's standardized service governance infrastructure. It offers unified governance capabilities, strong performance and ease of use, and a rich governance ecosystem, and it is widely used across Meituan's business lines. The current state of OCTO can be summarized as follows:
It has become Meituan's highly unified service governance technology stack, covering 90% of the company's applications and handling more than a trillion calls a day on average.
It has been tested at large scale, covering tens of thousands of services and hundreds of thousands of nodes.
Together with the surrounding governance ecosystem, it provides rich governance capabilities, including but not limited to SET, link-level complex routing, full-link stress testing, authentication and encryption, and rate limiting and circuit breaking.
One system supports multiple businesses and covers all of the company's business lines.
Meituan already has a relatively complete governance system, but many pain points and challenges remain:
Multi-language support is not good enough. Meituan's technology stack is mainly Java, which accounts for more than 80%, and many of the governance capabilities described above are concentrated in the Java ecosystem. However, nearly 10 other backend languages are also in use at Meituan, and the governance ecosystem for these languages is very weak. Demand for multiple languages is also bound to grow as the number of businesses grows. Building a comprehensive governance system for each language is expensive and unlikely to be practical.
Middleware and business code are bound together and constrain each other's iteration. The core governance capabilities are generally carried by the communication framework. Although the middleware logic is logically isolated from the business logic, it is inevitably physically coupled with the business process. In this mode, a bug introduced in the middleware requires every business to upgrade, which hurts R&D efficiency. New features likewise depend on each business upgrading one by one, so the middleware has no independent control over its own releases.
Integrating heterogeneous governance systems is technically very expensive.
Governance decisions are scattered. Each node can only make decisions based on its own state and cannot coordinate with other nodes to arbitrate.
To address these pain points, we chose to rely on a Service Mesh solution. In Service Mesh mode, a Sidecar proxy is deployed alongside each business instance, and all traffic into and out of the application passes through the Sidecar. Service governance is mainly performed by the Sidecar, while all Sidecars are controlled globally by a unified, centralized control plane (the control brain). How does this model solve the four problems above?
In Service Mesh mode, the communication framework for each language is generally responsible only for encoding and decoding, and that logic rarely changes. The core governance functions (such as routing and rate limiting) are handled mainly by the Sidecar proxy and the control brain, so a single governance system serves all languages.
The changeable middleware logic is moved into the Sidecar and the control brain as far as possible, so subsequent middleware upgrades basically require no cooperation from the business. The SDK contains only very thin, rarely changing logic, which decouples the business from the middleware.
A newly integrated heterogeneous technology system can connect to Meituan's governance system through a lightweight SDK. (Technology systems are hard to make compatible essentially because each has its own operating specification; in Service Mesh mode, the core of the operating specification is the control plane and the Sidecar.) Meituan already has such cases running in production.
The control brain centrally holds the information of all nodes and can therefore make globally optimal decisions, such as service warm-up and dynamically adjusting routing according to load.
To sum up, transforming the current governance system into a Mesh can further improve its governance capabilities. Meituan defines the Mesh-based OCTO as its next-generation service governance system, OCTO2.0 (internally named OCTO Mesh).
2. Technology selection and architecture design
2.1 OCTO Mesh technology selection
Meituan's Service Mesh construction started at the end of 2018. A core question at the time was which aspects mattered most when weighing the overall plan. Going into the design phase, we were very clear about one thing: moving to a Mesh when you already operate at large scale with rich governance capabilities raises quite different concerns from the case where the governance system is relatively weak and Service Mesh is expected to supply the rich governance capabilities. Technology selection needed to focus on the following four aspects:
OCTO has gone through nearly 5 years of iteration and has formed a series of standards and norms. A Service Mesh transformation touches a very large part of the governance architecture. The technical solution must be feasible to roll out, and the upgrade must also be transparent to the business or require only low-cost changes from it.
Governance capabilities must not be weakened. Once parity is ensured, more refined and easier-to-use operational capabilities should gradually be provided.
To cope with very large scale, the technical solution must support the current magnitude and even N times the current magnitude, and the system itself must not become the bottleneck of the whole governance system.
Stay as close to the community as possible and, to some extent, evolve together with it.
Given these considerations, we chose to build the data plane through secondary development on Envoy and to develop the control plane mainly in-house.
On the data plane, Envoy at the time had the chance to become the de facto standard. Its Filter model and xDS design were also relatively friendly to extension, and future feature enrichment and performance optimization were only weakly tied to the standard. The decision to develop the control plane in-house was more complex. Broadly, the following aspects were considered:
As of press time, Meituan's containerization mainly uses the rich container mode, which is very costly to fit to the data model of Istio and Kubernetes, and the Istio API has not yet been finalized.
As of press time, Istio is prone to performance problems as cluster size grows and cannot support Meituan's scale of tens of thousands of applications and hundreds of thousands of nodes. Kubernetes clusters with hundreds of thousands of nodes also still require continuous optimization and exploration.
Istio's features cannot meet OCTO's complex, fine-grained governance requirements, such as traffic recording and replay for stress testing and more complex routing policies.
When the project started, the proportion of non-containerized applications was relatively high, and the technical solution needed to be compatible with the existing non-containerized applications.
2.2 OCTO Mesh architecture design
The figure above shows the overall architecture of OCTO Mesh. From bottom to top, it is logically divided into the business process and communication framework SDK layer, the data plane layer, the control plane layer, and the surrounding ecosystem layer of systems that cooperate with the governance system.
First, let's look at the business process, SDK layer, and data plane layer:
OCTO Proxy (the internal name of the data plane Sidecar proxy) is deployed one-to-one with the business process.
OCTO Proxy and the business process communicate over a UNIX Domain Socket. (We did not use Istio's default iptables traffic hijacking, mainly because Meituan basically uses a unified private protocol and the rich container mode does not use the Kubernetes naming service model, so iptables management would be very complex, and that complexity would lead to high performance loss.) OCTO Proxy instances communicate across nodes over TCP using the same protocol as the inter-process channel, which allows the client and the server to be upgraded independently.
To improve efficiency and reduce human error, we built a separate OCTO Proxy management system. The LEGO Agent deployed on each instance is responsible for keeping OCTO Proxy alive and for hot upgrades, similar to Istio's Pilot Agent. This keeps human intervention low and improves operations efficiency.
The data plane communicates with the control plane through bidirectional streaming. Routing is exchanged using xDS with enhanced semantics, because xDS as it stood could not meet Meituan's more complex routing needs. Beyond routing, this channel also carries the instructions and configuration for many governance functions, for which we designed a series of custom protocols.
The control plane (internally named Adcore) is mainly developed in-house and consists of Adcore Pilot, Adcore Dispatcher, a centralized health check system, a node management module, and a monitoring and alerting module. In addition, a Meta Server module was built separately to provide unified metadata management and service registration and discovery inside the Mesh system. The responsibilities of each module are as follows:
Adcore Pilot is an independent cluster. It carries the management and control of most core governance functions, acts as the brain of the whole system, and is the module that interacts directly with the data plane.
Adcore Dispatcher is also an independent cluster. It is the access hub through which the many subsystems that cooperate with the governance system connect to the Mesh system.
Unlike Envoy's P2P node health check mode, the OCTO Mesh system uses centralized health checks.
The control plane's node management system collects runtime information from every node and makes and executes globally optimal governance decisions based on node state.
The monitoring and alerting system is built to ensure the stability of the Mesh itself. It provides the Mesh's own observability so that faults can be located quickly, and it also inspects the whole system in real time.
Unlike Istio, which relies on Kubernetes for addressing and metadata management, OCTO Mesh uses an independent Meta Server to manage the Mesh's own metadata and naming.
3. Key design and analysis
The keys to successfully rolling out a Mesh for a large-scale governance system are as follows:
System-level scalability: it can support the governance of tens of thousands of applications and millions of nodes.
Functional extensibility: it can support the integration of all kinds of heterogeneous governance subsystems.
It can meet the complex availability and reliability requirements of the call path after the Mesh transformation.
It has a mature and complete Mesh operations system.
With these four points in place, system capability, governance capability, stability, and operational efficiency are sufficient to support rolling out the new architecture at several times Meituan's current scale.
3.1 Building system capabilities for a large-scale Mesh
Landing the community Istio solution on a very large application cluster would require a great deal of technical transformation, mainly because Istio's horizontal scalability is relatively weak, it performs many redundant internal operations, and its overall stability work is relatively weak. Our approach to these problems is as follows:
No control plane node carries all the governance data. The system as a whole scales horizontally, and on that basis the throughput and performance of each instance are improved.
The system can cope with a sudden surge in traffic when abnormal events occur, such as a data center losing its network connection.
Only the necessary P2P health checks are performed, combined with centralized health checks to manage nodes at the million level.
On-demand loading and data sharding are implemented mainly by Adcore Pilot and Meta Server.
The logical architecture of Pilot has three parts: SessionMgr, Snapshot, and Diplomat. SessionMgr manages the life cycle of each data plane session, including creation, interaction, and destruction. Snapshot maintains the latest consistent snapshot of the data, synchronizes resource updates to SessionMgr for processing, responds to data change notifications from the various platforms, and computes and caches groups of related data. Diplomat interfaces with the many platforms of the service governance system, and it is the only module that depends directly on third-party platforms.
A Pilot node in the control plane does not load the entire registry and other data. It loads on demand only the governance data needed by the Sidecars it controls, that is, the governance data of the applications that have sessions in SessionMgr plus the registration information of the peer services those applications care about. In addition, all OCTO Proxy instances of the same application should be controlled by the same Pilot instance. Otherwise, across the cluster, the data each Pilot loads would easily approach the full set. How is this achieved? The answer is Meta Server, which provides service discovery for the control plane machines and also defines fine-grained routing rules for control, thereby sharding data at the application level.
Meta Server manages which Pilot node is responsible for each application's OCTO Proxy instances. When a Pilot instance starts, it registers with Meta Server and then sends periodic heartbeats to renew its lease. An instance whose heartbeat has been abnormal for a long time is removed automatically. Meta Server implements a fairly complex consistent hashing strategy that groups nodes using information such as application, data center, and load. When a Pilot node fails or is being released, the OCTO Proxy instances that belong to it connect to a successor node in a regular pattern rather than reconnecting at random across the cluster, which would cause a storm on the backend registry. When the failed or released node recovers, the OCTO Proxy instances that were moved away are reassigned to it in the same regular way. For applications whose OCTO Proxy instances watch a large number of nodes, dedicated Pilots can be deployed, with routing still managed uniformly through Meta Server.
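The assignment behavior described above can be illustrated with a minimal consistent-hashing sketch. This is not Meituan's implementation: the class name PilotRing, its methods, and the use of virtual nodes are assumptions made for illustration, and the grouping by data center and load, as well as heartbeat leases, are left out.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so every caller computes the same position on the ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class PilotRing:
    """Hypothetical consistent-hash ring mapping application names to Pilot nodes."""

    def __init__(self, virtual_nodes: int = 64):
        self.virtual_nodes = virtual_nodes
        self._points = []   # sorted hash points on the ring
        self._owner = {}    # hash point -> pilot id

    def register(self, pilot: str) -> None:
        # Each Pilot owns several points so that load spreads evenly.
        for i in range(self.virtual_nodes):
            point = _hash(f"{pilot}#{i}")
            if point not in self._owner:
                bisect.insort(self._points, point)
            self._owner[point] = pilot

    def remove(self, pilot: str) -> None:
        # Called when a Pilot's lease expires or the node is being released.
        self._points = [p for p in self._points if self._owner[p] != pilot]
        self._owner = {p: o for p, o in self._owner.items() if o != pilot}

    def owner_of(self, app: str) -> str:
        # Keyed by application, so all proxies of one application share a Pilot.
        if not self._points:
            raise LookupError("no Pilot registered")
        idx = bisect.bisect_right(self._points, _hash(app)) % len(self._points)
        return self._owner[self._points[idx]]
```

With this structure, removing one Pilot only moves the applications it owned to their successors on the ring, and registering it again restores the previous assignment, which is the property that avoids a cluster-wide random reconnect.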
The naming service of the Mesh system requires Pilot to communicate with the registry. In the conventional implementation (the left side of the figure, using ZooKeeper as an example), each time an OCTO Proxy establishes a session with Pilot, the session, acting as a client, registers listeners with the registry for changes to the servers it cares about. If a service needs to access 100 applications, at least 100 Watchers must be registered. If 1,000 instances of that service run at the same time, 100 × 1,000 = 100,000 Watchers are registered, and Meituan has many applications with more than 1,000 nodes. Many applications also care about the same peer nodes, which causes a great deal of redundant watching. At large scale, network jitter or a concentrated wave of business releases can easily trigger a storm that takes down the control plane and the backend registry.
To solve this problem, we adopted hierarchical subscription. An OCTO Proxy session does not interact directly with the registry or any other publish-subscribe system. All change notifications are managed by the Snapshot layer. Snapshot has three layers: the Data Cache layer connects to the registry and other systems and caches their raw data at application granularity; the Node Snapshot layer holds the computed data at node granularity; and the Ability Manager layer manages indexing and mapping. When a node's state changes in the registry, the change is pushed through the index to the OCTO Proxy instances that care about it.
For the scenario above, with this extra layer in between, the 1,000 nodes need to register only 100 Watchers. When a Watcher fires, only one change message reaches the Data Cache layer, which then notifies the 1,000 OCTO Proxy instances according to the index. This greatly reduces the load on the registry and on Pilot.
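A minimal sketch of the idea follows, assuming a cache layer that registers one registry watcher per watched application and fans changes out through its own index. FakeRegistry and DataCache are hypothetical names, and the Node Snapshot and Ability Manager layers are collapsed into a single index for brevity.

```python
from collections import defaultdict


class FakeRegistry:
    """Stand-in for a registry such as ZooKeeper; it only stores watchers."""

    def __init__(self):
        self.watchers = {}  # app name -> callback

    def watch(self, app, callback):
        self.watchers[app] = callback

    def change(self, app, nodes):
        # Simulate a node-list change for one application.
        self.watchers[app](app, nodes)


class DataCache:
    """Hypothetical cache layer: one registry watcher per application."""

    def __init__(self, registry):
        self.registry = registry
        self.nodes = {}                      # app -> latest node list
        self.subscribers = defaultdict(set)  # index: app -> proxy ids
        self.delivered = []                  # (proxy id, app) notifications

    def subscribe(self, proxy_id, app):
        if app not in self.subscribers:
            # First proxy interested in this app: register the only watcher.
            self.registry.watch(app, self._on_change)
        self.subscribers[app].add(proxy_id)

    def _on_change(self, app, nodes):
        self.nodes[app] = nodes
        # Fan out through the index instead of through the registry.
        for proxy_id in sorted(self.subscribers[app]):
            self.delivered.append((proxy_id, app))


registry = FakeRegistry()
cache = DataCache(registry)
for proxy in range(1000):
    for app in range(100):
        cache.subscribe(f"proxy-{proxy}", f"app-{app}")

registry.change("app-7", ["10.0.0.1:8080"])
print(len(registry.watchers), len(cache.delivered))  # prints "100 1000"
```

In this sketch the number of registry watchers depends on the number of distinct applications being watched, not on the number of proxies, while a single registry event still reaches every interested proxy.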
Besides cutting unnecessary interactions and improving performance, the Snapshot layer also caches data in its computed form. On the one hand, large numbers of identical requests are absorbed by the cache at the Snapshot layer. On the other hand, related data can easily be packaged together to avoid concurrency problems. This follows the design of Envoy-Control-Plane, with one difference: Envoy-Control-Plane packages all xDS data together, whereas we isolate data such as routing and authentication, so authentication information is not pulled and updated when routing data changes.
The main purpose of preloading is to improve cold-start performance. Because we define the routing rules in Meta Server, we can load the latest data into the Pilot node in advance. When the business process starts, the Proxy can obtain the data from Snapshot immediately, which avoids a slow first access.
By default, each Envoy proxy in Istio performs P2P health checks against all other Envoy proxies in the cluster. With N nodes in the cluster, on the order of N^2 probes are needed in each detection period (which is usually not long). As the cluster grows, the load on every node increases accordingly, which becomes a major obstacle to large-scale deployment.
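To make the growth concrete, the short calculation below compares the number of probes per detection period for full P2P checking with a single central checker that probes each node once. The function names are illustrative, and the model ignores retries and probe fan-out details.

```python
def p2p_probes(n: int) -> int:
    # Every node probes every other node once per detection period.
    return n * (n - 1)


def centralized_probes(n: int) -> int:
    # A central checker probes each node once per detection period.
    return n


for n in (1_000, 100_000):
    print(n, p2p_probes(n), centralized_probes(n))
```

At 1,000 nodes the P2P model already needs 999,000 probes per period against 1,000 for the central checker, and the gap widens quadratically as the cluster grows.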
Unlike full-cluster probing, Meituan uses centralized health checks combined with the necessary P2P checks. A central service, Scanner, monitors the state of all nodes. When Scanner detects a node anomaly through active probing, or when Pilot notices a connection change and asks Scanner to scan and confirm the anomaly, Pilot immediately pushes the updated node state to the Proxies through EDS. In this mode, only N probes are needed per detection period. Google's Traffic Director adopts a similar design. Using it at large scale takes some care, however: first, probing is done from within the same data center so that data center autonomy is not affected; second, a Double Check mechanism reduces misjudgments caused by GC pauses or network anomalies on the central checker itself. Beyond centralized health checks, heartbeat probes are also run against peers that fail frequently, and those peers are down-weighted or removed according to the results, which improves the call success rate.
3.2 Integration design for heterogeneous governance systems
OCTO Mesh needs to match the core governance capabilities of the current system, which inevitably means connecting the Mesh to all the surrounding subsystems of the governance ecosystem. Istio and Kubernetes base all data storage and publish-subscribe on etcd, but Meituan's more than 10 governance subsystems differ in function, in storage, and in publish-subscribe mechanism, and are clearly heterogeneous. If integrating each function required reworking the platform's storage or some other large-scale transformation, integration would not be feasible at all. One idea is to decouple the governance subsystems from Pilot with a module that receives all changes and forwards them to Pilot, but this approach raises its own problems. As described earlier, each Pilot node cares about different data, and the sharding rules may change in real time, so a mechanism is needed to deliver each message to the Pilot nodes that care about it.
In short, we need to achieve three sub-goals: connect all the systems so that governance capabilities are aligned, respond quickly when new systems are integrated in the future, and deliver changes to the nodes that care about them. Our solution is an independent, unified access center that hides the storage and publish-subscribe mechanisms of all the heterogeneous systems, with Meta Server managing the metadata of the real-time sharding rules.
The mechanism is shown in the figure above. When a system changes, it uses a client to push a change notification to the message queue. Only the fact of the change is pushed, not the specific value: when Pilot receives the notification, it actively fetches the full data. This keeps the Mafka message small, and it also means that multiple changes do not need to be kept in order in the queue to resolve version conflicts. Adcore Dispatcher consumes the messages and pushes each change to the Pilot machines that care about it according to the index. When the set of Proxies controlled by a Pilot changes, the change is synchronized to Meta Server, which updates the index and synchronizes it to Dispatcher in real time. To handle messages lost in the gap while the mapping between Pilots and applications is changing, Dispatcher compensates by checking back for lost changes, which improves the reliability of the system.
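The notify-then-fetch pattern can be sketched as follows. The Pilot and Dispatcher classes here are simplified stand-ins with hypothetical method names; the message queue, the Meta Server synchronization, and the compensation for lost messages are not modeled.

```python
from collections import defaultdict


class Pilot:
    """Hypothetical Pilot that re-fetches current data when told a key changed."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # source of truth owned by a governance subsystem
        self.view = {}      # app -> latest fetched value

    def on_change(self, app):
        # The notification carries no value; always fetch the current state.
        self.view[app] = self.store[app]


class Dispatcher:
    """Hypothetical dispatcher routing notifications by an app -> Pilot index."""

    def __init__(self):
        # In the article this mapping is maintained by Meta Server.
        self.index = defaultdict(set)

    def bind(self, app, pilot):
        self.index[app].add(pilot)

    def consume(self, notification):
        app = notification["app"]
        for pilot in self.index.get(app, ()):
            pilot.on_change(app)


store = {"order-service": {"version": 1}}
pilot_a = Pilot("pilot-a", store)
pilot_b = Pilot("pilot-b", store)
dispatcher = Dispatcher()
dispatcher.bind("order-service", pilot_a)

# Two changes happened before the notifications were consumed.
store["order-service"] = {"version": 3}
dispatcher.consume({"app": "order-service"})
dispatcher.consume({"app": "order-service"})
```

Because each notification triggers a fetch of the current value, the two notifications can arrive in any order, or be duplicated, and pilot_a still ends with version 3. pilot_b, which is not in the index for this application, receives nothing.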
3.3 Stability guarantee design
A system transformed to Service Mesh is inevitably both new and complex, and either property can bring stability risk. The availability and reliability of the whole call path must therefore be built in advance before the Mesh can be rolled out widely. Meituan's work focuses on limiting the blast radius of faults, real-time self-healing from anomalies, real-time rollback, graceful degradation, and improving the Mesh's own observability and regression testing capability.
Testing the control plane deserves a separate mention, since there is little in the industry to learn from. The bidirectional xDS communication is complex, so it is hard to test functionally the way a traditional interface is tested, and customizing multiple Envoy instances to simulate the data plane is expensive. We developed Mock-Sidecar to simulate the behavior of a real data plane when testing the control plane. From the control plane's point of view, it is indistinguishable from the real data plane. Mock-Sidecar breaks the overall behavior of the data plane into composable Steps and separates mechanism from policy. The execution engine is the mechanism: it simply executes Steps one by one. A YAML file is the policy: it describes a combination of Steps. We hand-write a variety of YAML files to simulate the behavior of real Sidecars and use them for regression verification of the control plane. Running different YAML files in parallel also allows stress testing.
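The separation of mechanism and policy can be sketched as a small step engine. Everything below is hypothetical: the step names, the FakeControlPlane stand-in, and the scenario format are assumptions, and the scenario is written as an already-parsed Python list rather than a YAML file so the example has no external dependencies.

```python
class FakeControlPlane:
    """Stand-in for the control plane under test (hypothetical)."""

    def __init__(self):
        self.sessions = {}

    def connect(self, node_id):
        self.sessions[node_id] = set()

    def subscribe(self, node_id, resource):
        self.sessions[node_id].add(resource)
        return {"resource": resource, "version": 1}

    def disconnect(self, node_id):
        self.sessions.pop(node_id, None)


# Mechanism: each step is a small reusable action on a shared context.
def step_connect(ctx, args):
    ctx["plane"].connect(args["node"])


def step_subscribe(ctx, args):
    ctx["last"] = ctx["plane"].subscribe(args["node"], args["resource"])


def step_expect_version(ctx, args):
    assert ctx["last"]["version"] == args["version"], ctx["last"]


def step_disconnect(ctx, args):
    ctx["plane"].disconnect(args["node"])


STEPS = {
    "connect": step_connect,
    "subscribe": step_subscribe,
    "expect_version": step_expect_version,
    "disconnect": step_disconnect,
}


def run_scenario(plane, scenario):
    # The engine knows nothing about the scenario; it runs steps in order.
    ctx = {"plane": plane, "last": None}
    for step in scenario:
        STEPS[step["step"]](ctx, step.get("args", {}))
    return ctx


# Policy: the parsed equivalent of one scenario file.
SCENARIO = [
    {"step": "connect", "args": {"node": "mock-1"}},
    {"step": "subscribe", "args": {"node": "mock-1", "resource": "routes/order-service"}},
    {"step": "expect_version", "args": {"version": 1}},
    {"step": "disconnect", "args": {"node": "mock-1"}},
]
```

Calling run_scenario(FakeControlPlane(), SCENARIO) replays the scenario against the stand-in. New behaviors are added by writing new scenarios, and new capabilities by registering another step, without changing the engine.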
3.4 Operations system design
To cope with the future pressure of operating millions of Proxies, Meituan built its own OCTO Proxy operations system, LEGO, which both keeps Proxies alive and centrally controls releases. The release process is as follows: operations staff publish a version on the LEGO platform and determine the scope and version of the release; the resources of the new version are uploaded to the resource repository; the rules and scope are written to the DB; the upgrade instruction is sent to the machines in scope; and the LEGO Agent on each machine receives the instruction and pulls the new version from the resource repository (if something fails along the way, an active polling mechanism ensures the upgrade still succeeds). Once the new version has been downloaded, the LEGO Agent starts the new version of OCTO Proxy.
4. Summary and outlook
4.1 Summary of experience
Service governance construction should focus on three aspects: standardization, ease of use, and high performance.
Turning a large-scale governance system into a Mesh should focus on the following:
Fitting the company's technical system matters more than adopting trendy technology, with particular attention to compatibility with containerization and with the governance system.
Build a systematic stability guarantee system and operations system.
The four key techniques of the OCTO Mesh control plane are: Meta Server managing service registration, discovery, and metadata inside the Mesh; layered and sharded design; a unified access center that decouples the Mesh from, and connects it to, the existing governance subsystems; and centralized health checks.
4.2 Future outlook
We will continue to explore the path of OCTO Mesh, including but not limited to the following:
Improving the system: gradually enrich the OCTO Mesh governance system, explore other types of traffic, and comprehensively improve the efficiency of service governance.
Large-scale rollout: continue to build a robust OCTO Mesh governance system and steadily advance large-scale adoption within the company.
Centralized governance: explore globally optimal governance capabilities under the centralized control of the new governance model.