How to use the distributed system architecture design of JARVIS Meituan real-time logistics

2025-07-09 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01 report

This article looks at the distributed system architecture design of Meituan's real-time logistics, including the JARVIS platform. It is detailed and written from a practitioner's point of view, and we hope you take something away from it.

Background

Meituan's takeout business has been in development for five years, and its exploration of real-time logistics has been under way for more than three. The business was incubated from scratch and has begun to take shape, and along the way the team has accumulated experience in building distributed, high-concurrency systems. The main lessons are twofold:

First, real-time logistics has very little tolerance for failures and high latency. As business complexity grows, the system must be distributed, scalable and disaster tolerant. The real-time logistics system upgraded to a distributed architecture in stages and eventually resolved the risk of system downtime.

Second, the real-time logistics system focuses on three core elements: cost, efficiency and experience. It applies a large number of AI techniques across pricing, ETA, scheduling, capacity planning, capacity intervention, subsidies, accounting, voice interaction, LBS mining, business operations and metrics monitoring. Business breakthroughs combined with architecture upgrades have helped grow scale, protect the experience and reduce costs.

This article mainly describes the technical obstacles and challenges encountered as Meituan's real-time logistics distributed system architecture evolved layer by layer:

The number of orders and riders is large, and matching supply with demand requires computation at a very large scale.

During holidays or bad weather, orders cluster together and peak traffic exceeds ten times the usual level.

Logistics fulfillment is the key link between online and offline. Its fault tolerance is very low: the system cannot go down and cannot lose orders, so the availability requirement is very high.

Data must be real-time and accurate, and the system is very sensitive to delays and anomalies.

Meituan real-time logistics framework

Meituan's real-time logistics and delivery platform revolves around three things. The first is to offer users a fulfillment SLA, including the estimated time of arrival (ETA) and delivery fee pricing. The second is to match the most suitable rider under multi-objective optimization (cost, efficiency, experience). The third is to give riders decision support throughout fulfillment, including intelligent voice, route recommendation and store reminders.

Behind these services is Meituan's technical system, which gave rise to the delivery business architecture: a platform, algorithms, systems and services built on that foundation. A logistics system this large depends on a distributed system architecture, and that architecture must ensure high availability and high concurrency.

A distributed architecture is defined in contrast to a centralized one. Its design is guided by the CAP theorem (Consistency, Availability, Partition tolerance). In a distributed architecture, a service is deployed on multiple peer nodes that communicate over the network, and together the nodes form a service cluster that provides highly available, consistent service.

In the early days, Meituan split the system into several vertical service architectures by business domain. As the business grew, it adopted a layered service architecture for the sake of availability. Later, as the business became more and more complex, it gradually evolved toward a microservice architecture, driven by operations, quality and similar concerns. Two principles were followed: do not move to a microservice design prematurely, and a good architecture evolves rather than being designed up front.

Distributed system practice

The figure above shows a typical distributed system structure within Meituan's technology stack. It relies on Meituan's shared components and services to provide regional scale-out, disaster recovery and monitoring. Front-end traffic is distributed and load balanced by HLB. Within a partition, services communicate with each other through OCTO, which provides service registration, automatic discovery, load balancing, fault tolerance, grayscale release and more. Services can also communicate through message queues such as Kafka and RabbitMQ. The storage layer uses Zebra to access the distributed database for reads and writes. CAT (Meituan's open-source distributed monitoring system) collects, reports and monitors distributed business and system logs. Distributed caching uses a combination of Squirrel and Cellar, and distributed task scheduling is handled by Crane.
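To make service registration, discovery and load balancing concrete, here is a minimal in-memory sketch. It is not OCTO's API; the class name `ServiceRegistry`, its methods and the addresses are hypothetical, and a real registry would also need heartbeats and persistence.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry: register, deregister, round-robin discovery."""

    def __init__(self):
        # service name -> list of node addresses, in registration order
        self._nodes = defaultdict(list)
        # service name -> ever-increasing call counter used for round robin
        self._counters = defaultdict(itertools.count)

    def register(self, service, address):
        """Add a node to a service; registering twice has no extra effect."""
        if address not in self._nodes[service]:
            self._nodes[service].append(address)

    def deregister(self, service, address):
        """Remove a node, for example after it fails a health check."""
        if address in self._nodes[service]:
            self._nodes[service].remove(address)

    def pick(self, service):
        """Return the next node for the service in round-robin order."""
        nodes = self._nodes[service]
        if not nodes:
            raise LookupError(f"no live node for service {service!r}")
        index = next(self._counters[service]) % len(nodes)
        return nodes[index]
```

A caller would register each node at startup and call `pick("eta")` before every request, so that removing a failed node immediately takes it out of rotation.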

Several problems had to be solved in practice. A typical one is cluster scalability: stateful clusters scale poorly, machines cannot be added quickly, and traffic pressure cannot be relieved. There are also node hotspot problems, such as unevenly distributed resources and uneven CPU usage.

First, the delivery back-end team turned stateful nodes into stateless nodes through an architecture upgrade and used parallel computing so that small business nodes share the computing load, which makes rapid scale-out possible.
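The idea of stateless work shared across interchangeable workers can be sketched as follows. This is an illustration only: threads stand in for separate nodes, and `score_pair`, its cost formula and `score_all` are hypothetical names, not part of Meituan's system.

```python
from concurrent.futures import ThreadPoolExecutor


def score_pair(pair):
    """Stateless unit of work: the result depends only on its input."""
    order_distance_km, rider_load = pair
    # Hypothetical cost formula, used only to have something to compute.
    return order_distance_km * 2.0 + rider_load


def score_all(pairs, workers=4):
    """Fan the pairs out to a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_pair, pairs))
```

Because `score_pair` keeps no state between calls, the result is the same whether one worker or many handle the pairs, which is what allows capacity to be added simply by adding workers.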

The second problem is consistency. When data is written to both the DB and the cache, having the business code write the cache cannot guarantee consistency. At Meituan this is mainly solved with Databus, a real-time database change propagation system that offers high availability, low latency, high concurrency and data consistency. The upstream side of Databus listens to the business Binlog, and change records are delivered through pipelines to ES, other DBs or other KV systems. Databus's high availability ensures that the data is eventually synchronized to those systems.
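A minimal sketch of the change-log approach, assuming each change carries an increasing sequence number. It does not reproduce Databus or the Binlog format; `ChangeEvent` and `CacheSyncer` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row change as it might appear in a change log (assumed shape)."""
    sequence: int          # position in the log, increasing over time
    key: str
    value: Optional[str]   # None means the row was deleted


class CacheSyncer:
    """Applies change events to a cache, skipping duplicate or stale events."""

    def __init__(self):
        self.cache = {}
        self.last_applied = {}   # key -> sequence of the last applied event

    def apply(self, event):
        """Apply one event; return False if it was stale or a redelivery."""
        if event.sequence <= self.last_applied.get(event.key, -1):
            return False
        if event.value is None:
            self.cache.pop(event.key, None)
        else:
            self.cache[event.key] = event.value
        self.last_applied[event.key] = event.sequence
        return True
```

In this design the business code writes only to the database, and the cache is updated solely from the change stream, so a retried or reordered delivery cannot overwrite newer data.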

The third problem, which the team has worked on continuously, is ensuring high availability of the clusters. The work falls into three phases: before, during and after an incident. Before an incident: full-link stress testing to assess peak capacity, periodic cluster health checks, and random fault drills (services, machines, components). During an incident: anomaly alerting (performance, business metrics, availability), fast fault localization (single-machine failure, cluster failure, IDC failure, component anomaly, service anomaly), and collection of system changes made before and after the failure. After an incident: system rollback; scale-out, rate limiting, circuit breaking and degradation; and, as a last resort, "nuclear option" fallback measures.
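Circuit breaking and degradation can be illustrated with a small sketch. This is a generic textbook pattern, not Meituan's implementation; the class name, thresholds and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures and serves a fallback while open."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        """Run func, or return fallback() if the circuit is open or func fails."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()    # open: do not touch the dependency
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open or re-open
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The fallback is where degradation happens: for example, a caller might return a cached or default estimate instead of calling a failing downstream service.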

Rapid single-IDC deployment & disaster recovery

After a single IDC fails, the ingress service identifies the fault and switches traffic automatically. To expand a single IDC quickly, data is synchronized and services are deployed in advance, and ingress traffic is opened once the IDC is ready. All services involved in data synchronization and traffic distribution must detect faults automatically and remove failed services automatically, and they must be able to scale capacity up or down per IDC.
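The ingress behavior described above, detecting a failed IDC and redistributing traffic by capacity, might look like the following sketch. `IngressRouter`, the consecutive-failure rule and the capacity numbers are assumptions for illustration only.

```python
class IngressRouter:
    """Splits traffic across healthy IDCs in proportion to their capacity."""

    def __init__(self, capacities, max_failures=3):
        self.capacities = dict(capacities)   # IDC name -> relative capacity
        self.max_failures = max_failures
        self._failures = {idc: 0 for idc in self.capacities}

    def report_health(self, idc, ok):
        """Record one health-check result; a success resets the failure count."""
        self._failures[idc] = 0 if ok else self._failures[idc] + 1

    def healthy(self):
        """IDCs with fewer than max_failures consecutive failed checks."""
        return [idc for idc in self.capacities
                if self._failures[idc] < self.max_failures]

    def traffic_shares(self):
        """Fraction of ingress traffic each healthy IDC should receive."""
        live = self.healthy()
        total = sum(self.capacities[idc] for idc in live)
        if total == 0:
            raise RuntimeError("no healthy IDC available")
        return {idc: self.capacities[idc] / total for idc in live}
```

A newly prepared IDC would be added to `capacities` only after its data and services are ready, which corresponds to opening ingress traffic at that point.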

Multi-center attempt

Meituan's IDCs are organized by partition. When a partition's resources are fully allocated, the partition cannot be expanded. Meituan's plan is for multiple IDCs to form a virtual center, with the center as the partition unit. Services are deployed identically across the center, and when the center runs short of capacity, a new IDC is added directly to expand it.

Unitization attempt

Compared with multi-center, unitization is a better approach to regional disaster recovery and capacity expansion. For traffic routing, Meituan routes mainly by region or city, according to the characteristics of the business. For data synchronization, replication across locations introduces latency. For SET disaster recovery, when a local or remote SET has a problem, its traffic can be quickly switched to another SET.
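City-based routing with SET failover can be sketched as a lookup table plus a backup mapping. The class `SetRouter`, the city names and the SET names are hypothetical; the source does not describe how Meituan stores or updates these mappings.

```python
class SetRouter:
    """Routes a city's traffic to its SET, or to a backup SET on failure."""

    def __init__(self, city_to_set, backup_of):
        self.city_to_set = dict(city_to_set)   # city -> primary SET
        self.backup_of = dict(backup_of)       # SET -> SET that takes over
        self.down = set()                      # SETs currently marked failed

    def mark_down(self, set_name):
        self.down.add(set_name)

    def mark_up(self, set_name):
        self.down.discard(set_name)

    def route(self, city):
        """Return the SET that should serve this city right now."""
        primary = self.city_to_set[city]
        if primary not in self.down:
            return primary
        backup = self.backup_of.get(primary)
        if backup is None or backup in self.down:
            raise RuntimeError(f"no available SET for city {city!r}")
        return backup
```

Since replication between locations is delayed, a real switch would also need to account for data that has not yet reached the backup SET; this sketch covers only the routing decision.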

Core technical capabilities and platform accumulation of intelligent logistics

The machine learning platform is a one-stop platform for model training and algorithm application, from offline to online. It was built to resolve several pain points: too many algorithm application scenarios, repeated reinvention of the wheel, and inconsistent data quality between online and offline. When the process is unclear and disjointed, deploying features and models online suffers from low iteration efficiency and data quality problems.

JARVIS is an AIOps platform for intelligent business operations whose goal is to guarantee stability. It mainly addresses problems in handling system failures: there are many alarm sources and a large number of duplicate alarms, so useful information is easily drowned out. In addition, failures in small-scale distributed clusters used to be analyzed and located mainly by people relying on experience, which was inefficient and slow, produced unpredictable outcomes from one incident to the next, and could not guarantee effectiveness or timeliness. An AIOps platform is needed to solve these problems.
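One simple way to reduce duplicate alarms is to suppress repeats of the same fault within a time window. The sketch below illustrates that idea only; it is not how JARVIS works, and the fingerprint of `(service, kind)`, the window length and the class name are assumptions.

```python
class AlarmDeduplicator:
    """Forwards the first alarm for a fault and suppresses repeats in a window."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._last_forwarded = {}   # fingerprint -> time of last forwarded alarm
        self.suppressed = {}        # fingerprint -> number of suppressed repeats

    def offer(self, source, service, kind, timestamp):
        """Return True if the alarm should be forwarded to a human."""
        # The source is left out of the fingerprint so that the same fault
        # reported by several monitoring systems collapses into one alarm.
        fingerprint = (service, kind)
        last = self._last_forwarded.get(fingerprint)
        if last is not None and timestamp - last < self.window_seconds:
            self.suppressed[fingerprint] = self.suppressed.get(fingerprint, 0) + 1
            return False
        self._last_forwarded[fingerprint] = timestamp
        return True
```

The `suppressed` counts can be attached to the forwarded alarm so that the operator still sees how often the fault recurred.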

That concludes our look at the distributed system architecture design of JARVIS and Meituan's real-time logistics. If you have similar questions, the analysis above may serve as a reference. To learn more, you are welcome to follow the industry information channel.
