
What is the concept of MetaQ?


This article mainly explains the concept of MetaQ. The explanation is simple and clear, and easy to learn and understand. Please follow the editor's train of thought as we study "what is the concept of MetaQ".

Message middleware: the broadcaster of distributed messages

Summary

Message middleware is one of the most typical middleware technologies, built on a message transmission mechanism or a message queue model. Through message middleware, applications or components can communicate asynchronously and reliably, reducing the coupling between systems and thus improving the scalability and availability of the whole system.

3.1 Notify

Notify is a message service engine developed in-house by Taobao. It is one of the core systems supporting Double 11 and is widely used in the core transaction scenarios of Taobao and Alipay. The core value of a messaging system comes down to three points: decoupling, asynchrony, and parallelism. Let me illustrate what decoupling, asynchrony, and parallelism mean with a practical example:

Suppose we have the following application scenario: to complete a user registration on Taobao, we may need to write the user information to the user database, notify the red packet center to send a newcomer red packet to the user, notify Alipay to prepare the corresponding Alipay account and verify its validity, tell the SNS system to import the new user, and so on, for a total of some ten steps.

The simplest design for this scenario is to execute the entire process sequentially, as shown in figure 3-1:

The biggest problem with this approach is that as more and more back-end steps are added, each step adds extra time, and the user's waiting delay keeps growing. Naturally, we can run the steps in parallel instead, which greatly reduces latency, as shown in figure 3-2.

But parallelism introduces a new problem. In the user registration step, the system now issues four requests in parallel. If one of them, say notifying SNS, takes a long time, for example 10 seconds, then even if sending the newcomer red packet, preparing the Alipay account, and verifying its validity all finish quickly, the user still has to wait 10 seconds before registration completes, because the registration process is only truly "complete", and the user's information state only consistent, once all downstream operations have finished. Worse, if a more serious incident occurs at this point, for example all the servers that send newcomer red packets go down because of a bug in the business logic, then the whole business process fails because the user's registration never fully completes. This is clearly not what we want: as downstream steps are gradually added, users wait longer and longer, and with more and more downstream systems, the probability of an error somewhere in the whole flow also keeps rising.

Business analysis tells us that the user's actual core step is only one: the registration itself. The follow-up operations, preparing the Alipay account, notifying SNS, and so on, must eventually be completed, but the user does not need to wait for them.

There is a technical term for this model: eventual consistency. To achieve eventual consistency, we introduce an MQ system. The business process then becomes the following (a minimal code sketch of the idea follows the two figures below):

The main process is shown in figure 3-3:

Figure 3-3: User registration process after introducing the MQ system (main process)

The asynchronous process is shown in figure 3-4:

Figure 3-4: User registration process after introducing the MQ system (asynchronous process)
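To make the decoupling concrete, here is a minimal sketch, assuming hypothetical `UserDao`, `MessageProducer`, and `Message` types rather than the real Notify client API: the request path performs only the core registration write and publishes one event, while the red packet, Alipay, and SNS steps become independent consumers of that event and catch up asynchronously.

```java
// Hypothetical types standing in for an MQ client and a user store; not the real Notify API.
interface MessageProducer { void send(Message message); }
record Message(String topic, String body) {}
interface UserDao { void insert(String userId); }

public class RegistrationService {
    private final UserDao userDao;
    private final MessageProducer producer;

    public RegistrationService(UserDao userDao, MessageProducer producer) {
        this.userDao = userDao;
        this.producer = producer;
    }

    // The request path contains only the core step plus one message send; sending the
    // red packet, preparing the Alipay account, and importing the user into SNS are
    // handled by independent consumers of the USER_REGISTERED topic.
    public void register(String userId) {
        userDao.insert(userId);                                 // 1. the only step the user waits for
        producer.send(new Message("USER_REGISTERED", userId));  // 2. publish the event and return
    }
}
```

A slow or failing downstream system then only delays its own step; the user's registration response time no longer depends on it.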

Core principles

Notify's design philosophy differs from that of a traditional MQ. Its core design ideas are:

Design a system for message accumulation

No single point of failure; a freely scalable design

Next, let us step inside the messaging system and look at the core principles behind its design.

Design a system for message accumulation

For most MQ products on the market, the core scenario is a point-to-point message transmission channel, and they use memory very aggressively to improve overall system performance. Although the nominal TPS can be very high, this design approach struggles to meet the real needs of large-scale distributed scenarios.

In real distributed scenarios, such a system hits a serious bottleneck. With a large number of consumers on the back end, it is very common for some consumers to run into problems, and the messaging system must guarantee that producers can still write normally and that write TPS does not drop while back-end consumption is unstable. This is the practical scenario that truly tests a messaging system's capability.

Because of this, the overall design of Notify gives top priority to message accumulation. In the current design we use persistent disks: every time a user sends a message to Notify, the message is first written to disk and then delivered asynchronously, instead of using an aggressive in-memory solution to speed up delivery.

As a result, peak performance is lower than that of the MQ products currently on the market, but as the core component of the whole business logic, stability, safety, and reliability are this system's core requirements.
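As a rough illustration of "persist first, deliver asynchronously", the sketch below (not Notify's actual code; the broker class and file layout are invented for illustration) appends each message to a disk log and forces it to disk before acknowledging the producer, and only then hands it to an asynchronous dispatcher, so a slow consumer merely grows the on-disk backlog.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PersistFirstBroker {
    private final FileChannel log;
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    public PersistFirstBroker(Path logFile) throws IOException {
        this.log = FileChannel.open(logFile, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Producer request path: the message is durable on disk before we acknowledge it.
    public synchronized void publish(byte[] message) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + message.length);
        buf.putInt(message.length).put(message).flip();
        log.write(buf);
        log.force(false);                            // flush to disk first...
        dispatcher.submit(() -> deliver(message));   // ...then deliver asynchronously
    }

    private void deliver(byte[] message) {
        // Push to subscribers here. A slow or failing consumer only grows the backlog
        // on disk; it never blocks producers from writing.
    }
}
```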

No single point of failure; a freely scalable design

Figure 3-5 shows the five core parts that make up the entire Notify ecosystem.

The message-sending cluster consists mainly of the business side's machines. These application machines hold no state, so as user requests grow you can add or remove sender machines at any time to expand or shrink that part of the cluster.

The configuration server cluster (Config Server) mainly serves to dynamically detect machines in the application and message clusters going online or offline, and to broadcast these changes to the other clusters in time. For example, when a machine that receives business messages goes offline, the config server detects it, kicks the machine out of the target consumer group, and notifies the Notify Servers; after receiving the notification, each Notify Server removes the offline machine from its delivery target list. In this way the cluster can scale automatically.

The message server cluster (Notify Server) is the set of servers that actually handle sending and receiving messages, and it is also a cluster. When an application sends a message it can pick any server at random; if one server goes down, the system keeps running normally. When more processing capacity is needed, you simply add more Notify Servers.

The storage cluster (Storage): Notify's storage can be implemented in many different ways to meet the storage needs of different applications. For applications with high message-safety requirements, we store message data on multiple disks; for scenarios that need throughput but not message safety, we can use an in-memory storage model. Naturally, all storage is also designed as a stateless, randomly writable storage model so that it can be scaled freely.

The message receiving cluster is the group of servers on which the business side processes messages. The config server also dynamically detects these machines going online and offline, so this cluster can scale automatically as well.

3.2 Notify's preparation and optimization for Double 11

Throughout the preparation for Singles' Day, Notify bears a great deal of pressure, because our core assumption is that back-end systems will certainly fail, and we must handle the real scenario in which all the messages of the entire trading peak pile up in the database.

In repeated stress tests our system performance remained very stable: with 450 million messages accumulated and writes arriving at 600,000 per second, the whole system stayed calm and reliable. When the real promotion arrived, the response efficiency of our back-end systems was better than expected, so we easily handled all of the users' message delivery requests and met their actual needs.

3.3 MetaQ

MetaQ is a message middleware with a complete queue model. The server is written in Java and can be deployed on a variety of software and hardware platforms; the client supports the Java and C++ programming languages. It has been open source since March 2012, at http://metaq.taobao.org/. MetaQ has gone through the following three stages:

MetaQ 1.0 was released in January 2011. It was derived from Apache Kafka and was mainly used internally for log transfer.

MetaQ 2.0 was released in September 2012. It solved the problem of the limited number of partitions and has been widely used for database binlog synchronization.

With the release of MetaQ 3.0 in July 2013, MetaQ has been widely used in order processing, cache synchronization, stream computing, IM real-time messaging, binlog synchronization, and other fields. MetaQ 3.0 is open source.

In summary, MetaQ borrows ideas from Kafka and, combining them with the performance requirements of Internet application scenarios, redesigns the data storage structure. At the functional level, it adds features better suited to the characteristics of large-scale Internet applications.

Introduction to MetaQ

Figure 3-6: Overall structure of MetaQ

As shown in figure 3-6, MetaQ exposes a queue service, and the internal implementation is also a complete queue model. The queues are persistent disk queues, which gives high reliability, and the operating system cache is fully used to improve performance.

MetaQ is a queue-model message middleware with high performance, high reliability, high real-time capability, and distributed characteristics:

Producer, Consumer, and queues can be distributed.

A Producer sends messages to a set of queues in turn; that queue set is called a Topic. If a Consumer does broadcast consumption, one consumer instance consumes all the queues of the Topic; if it does cluster consumption, multiple Consumer instances share the Topic's queue set evenly.

Guarantees strict message ordering (see the sketch after this list)

Provides rich message pull modes

Efficient horizontal scalability of subscribers

Real-time message subscription mechanism

Hundred-million-level message accumulation capability
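As referenced above, here is a small illustration of how strict ordering is typically achieved in a queue model like this one: route every message that must stay ordered (for example, all events of one order) to the same queue via a sharding key, so that a single queue, written and consumed sequentially, preserves their order. The routing function is an assumption for illustration, not MetaQ's client API.

```java
public class OrderedRouting {

    // Pick a queue for a message based on its sharding key.
    public static int selectQueue(String shardingKey, int queueCount) {
        // Same key -> same queue -> strict order within that queue.
        return Math.floorMod(shardingKey.hashCode(), queueCount);
    }

    public static void main(String[] args) {
        int queues = 8;
        // "Created", "Paid", "Shipped" events for order 42 all land in one queue,
        // so a single consumer sees them in the order they were sent.
        System.out.println(selectQueue("order-42", queues));
        System.out.println(selectQueue("order-42", queues));
        System.out.println(selectQueue("order-7", queues));
    }
}
```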

MetaQ storage structure

The storage structure of MetaQ was completely redesigned around the needs of Alibaba's large-scale Internet applications. With this storage structure it can support tens of thousands of queues on a single machine, as well as message query, distributed transactions, timed queues, and other features, as shown in figure 3-7.

Figure 3-7: MetaQ storage architecture

Tens of thousands of queues on a single MetaQ machine

Most of MetaQ's internal features are driven by queues, so it must support enough queues to serve the business well. As shown in the figure, MetaQ can support tens of thousands of queues on a single machine, and all of these queues are persisted to disk, which poses a challenge to IO performance. MetaQ solves it as follows:

All messages are written to a single data file, completely sequentially

Each message's location in that file is written to a separate index file, and the index files are also written sequentially

In this way we achieve both reliable data and support for a very large number of queues, as shown in figure 3-8.

Figure 3-8: Tens of thousands of queues on a single MetaQ machine
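The sketch below illustrates this layout under stated assumptions (it is not MetaQ's real on-disk format): every message body is appended to one shared commit log, and each Topic-Partition queue appends only a small fixed-size index entry recording where that message lives, so both kinds of file are written strictly sequentially.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

public class CommitLogSketch {
    private final Path dir;
    private final FileChannel commitLog;
    private final Map<String, FileChannel> indexFiles = new HashMap<>(); // one per Topic-Partition

    public CommitLogSketch(Path dir) throws IOException {
        this.dir = dir;
        this.commitLog = open(dir.resolve("commitlog"));
    }

    public synchronized void append(String topic, int partition, byte[] body) throws IOException {
        long offset = commitLog.size();                     // where this message will start
        commitLog.write(ByteBuffer.wrap(body));             // sequential append of the payload

        String queue = topic + "-" + partition;
        FileChannel index = indexFiles.get(queue);
        if (index == null) {
            index = open(dir.resolve(queue + ".idx"));
            indexFiles.put(queue, index);
        }
        ByteBuffer entry = ByteBuffer.allocate(12);
        entry.putLong(offset).putInt(body.length).flip();   // 12-byte entry: offset + size
        index.write(entry);                                 // also strictly sequential
    }

    private static FileChannel open(Path p) throws IOException {
        return FileChannel.open(p, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }
}
```

Adding another queue only adds another small, sequentially written index file; the message bodies themselves still land in the single commit log.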

The difference between MetaQ and Notify

Notify focuses on transaction messages and distributed transaction messages.

MetaQ focuses on sequential message scenarios such as binlog synchronization, and on actively pulled message scenarios such as stream computing.

MetaQ's preparation and optimization for Double 11

Improving the scheme for mirroring Notify transaction messages into MetaQ

The MetaQ transaction cluster is essentially a mirror of Notify's transaction messages. The old solution was to subscribe to Notify transaction messages through Notify-Client and forward them to the MetaQ cluster. Its disadvantages: 1. subscribing to the messages puts extra pressure on the Notify cluster; 2. once the MetaQ cluster cannot process in time, messages pile up on the Notify side, which causes a chain of adverse effects. The new scheme pulls the binlog directly from the Notify DB into MetaQ, which brings the following advantages: 1. it relieves the pressure on the NotifyServer cluster; 2. batch processing of the binlog improves the system's throughput.

Low-latency optimization of the transaction cluster

The Tmall live broadcast room aims to obtain transaction data in real time on the day of the event and to display various business metrics promptly and accurately through real-time stream computing. Its requirements are complete, accurate data with very low latency. During full-link stress testing we found large delays when pulling data from the Notify MySQL binlog, the worst reaching 4 hours, which obviously did not meet the requirements. To address this, we made a number of optimizations on the Notify MySQL side:

Scale out the MySQL database instances to improve the overall throughput of the cluster

Optimize the binlog storage location, separating data storage from binlog storage to maximize the DB's write performance

Because MySQL's binlog operations involve locking, tune the MySQL binlog generation configuration so that pulling the binlog introduces no delay

Tuning the running parameters of different clusters

The operating parameters of different clusters are tuned according to the characteristics of the business, for example the batch pull size, the disk flush mode, and the data retention period; at the same time, parameters such as IO scheduling and virtual memory are tuned to provide more efficient accumulation.

Monitoring and real-time alerting

Any trustworthy system must at least detect and handle anomalies in time, eliminating potential failures as early as possible, so as to improve the availability of the product. Before the Singles' Day event, we implemented monitoring statistics and proactive alerting on the business metrics of each message inside the MetaQ system, based on cluster status, with dynamic adjustment through Diamond, improving the timeliness and flexibility of monitoring.

Looking back at Singles' Day itself, the total number of messages written on Taobao was 11.2 billion and the total delivered was 22 billion; on Alipay, 2.4 billion messages were written and 2.4 billion were delivered. The peak write rate of the real-time live-room cluster was 131,000 messages per second, and the peak delivery rate was 278,000 per second.

On the whole we were well prepared. The MetaQ clusters performed stably during the peak and throughout the day. A few subscription groups replayed messages and some messages accumulated slightly, but this did not affect the system, and the overall result was very good. 75% of the transactions were completed on the Jushita ("gathering tower") platform, real-time live-room transaction statistics were delayed by about 1 second, and inventory increments and decrements produced zero overselling.

Summary

At present, our distributed message middleware products serve the whole group, supporting more than 500 business application systems across the companies of Alibaba Group and processing more than 35 billion messages a day. Guaranteeing highly reliable, high-performance, distributed transaction processing for all transaction data, it is one of the middleware team's longest-standing products.

Supplementary reading: how Notify implements distributed transactions

Source: http://club.alibabatech.org/resource_detail.htm?topicId=61

Business operations and message storage are carried out in the local transaction domain, and there are no transactions across resources.

The commit / rollback message may fail and the system will be in a temporary inconsistent state

The Broker actively sends Check messages to confirm whether a message should be committed or rolled back

Eventual consistency

Decompose a distributed transaction into two local transactions

The price paid by the client:

Implement the CheckMessageListener interface (a minimal sketch follows below)
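A hedged sketch of this pattern is shown below. The source names the CheckMessageListener interface, but the signature used here and the surrounding types (CheckReply, OrderDao) are hypothetical stand-ins, not the real Notify API: when the broker is unsure about a prepared message, it calls back the listener, which re-reads the state of the local business transaction and tells the broker whether to commit or roll back the message.

```java
enum CheckReply { COMMIT, ROLLBACK, UNKNOWN }

interface CheckMessageListener {
    // Called by the broker when a prepared message's commit/rollback was lost.
    CheckReply check(String messageId, String bizKey);
}

interface OrderDao {
    boolean orderExists(String orderId);   // state of the local business transaction
}

class OrderCheckListener implements CheckMessageListener {
    private final OrderDao orderDao;

    OrderCheckListener(OrderDao orderDao) { this.orderDao = orderDao; }

    @Override
    public CheckReply check(String messageId, String orderId) {
        // Re-examine the local transaction: if the order was durably created,
        // the message must be committed; otherwise it is safe to roll it back.
        return orderDao.orderExists(orderId) ? CheckReply.COMMIT : CheckReply.ROLLBACK;
    }
}
```

This is the price the client pays: it must be able to answer, at any later time, whether its local transaction actually committed.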

Extended reading: the application of MetaQ in the Double Twelve lottery

Original post link: http://jm-blog.aliapp.com/?p=3405&utm_source=tuicool

The Double Twelve promotion is Taobao's year-end sale, and the activity of scanning a QR code on the home page to receive a lottery ticket that day got everyone "playing". Facing an instantaneous peak several times the usual load, how do we give users a good experience and keep the system running stably? Let's reveal how it was done.

To summarize, the system needs to achieve the following:

Response time (RT) short enough

Uniform pressure distribution

Complex logic split out and made asynchronous

System structure diagram

The system is divided into two parts, the activity system and the lottery system, and they are driven by messages.

The activity system only updates the lottery allocation status and returns to the user as soon as the update succeeds; the logic is kept simple enough that the RT is very short. The lottery system is responsible for the time-consuming operations such as actually issuing the lottery ticket and updating the user's status.

The user scans the QR code on the home page, which sends a request to the activity system; the activity system updates the lottery allocation status, generates a message, and returns immediately. The lottery system receives the message and completes the follow-up business operations such as issuing the ticket. The whole process is driven by the message service provided by MetaQ. During the event a large number of requests pour into the activity system and a large number of messages are generated, with QPS estimated to reach 240,000; messages must not be lost, so that every user is guaranteed to receive a lottery ticket; the activity system and lottery system have complex business logic and heavy processing, so a large backlog may build up; and to give users a good experience, message consumption must also be timely enough. In summary, this poses several challenges for MetaQ:

High throughput

Reliable data

Efficient accumulation

Low enough message delivery latency

Here's how MetaQ does this.

Introduction to MetaQ

MetaQ is a message middleware with a complete queue model. The server is written in Java and can be deployed on a variety of software and hardware platforms; the client supports the Java and C++ programming languages, and the project has been open source since March 2012. MetaQ's design goals are high throughput and efficient accumulation, and the complete queue model also provides features such as sequential messages and message backtracking.

Basic concepts

Topic: the message topic

Partition: a partition, representing one consumption queue (a Topic can be divided into multiple partitions; the more partitions, the greater the parallelism and the higher the supported QPS)

Group: a consumption group, representing a consumer cluster

What MetaQ provides is a queue service, and the internal implementation is also a complete queue model: the queues are persistent disk queues with high reliability, and the operating system cache is fully used to improve performance. These characteristics all come from the design of the storage layer.

MetaQ storage structure

The storage structure of MetaQ was completely redesigned around the needs of Alibaba's large-scale Internet applications. With this structure it can support tens of thousands of queues, as well as message query, distributed transactions, timed queues, and other features, as shown in figure 3.

The storage layer can be roughly divided into two parts: data files (the CommitLog) and index files. The data file holds the content of every message, while the index files store each message's offset within the data file. Message data is not separated by Topic; it is appended sequentially to the CommitLog, whereas the index files are organized by Topic-Partition, with messages of different partitions appended to different index queues.

Message writing

When a client sends a message, the data is first written to the file cache and a write-index request is issued; the message index is then built asynchronously.

Message reading

When a client reads a message, the index entry gives the message's physical position and length, and the data is read from the CommitLog, with the file cache accelerating the read. Hot data usually sits in the cache and needs no IO; non-hot data triggers a page fault, is loaded from disk into the file cache, and is written directly to the socket buffer, so the data never enters the Java heap.
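Under the same assumptions as the storage sketch earlier (a 12-byte index entry holding offset and size; not MetaQ's real format), the read path looks roughly like this: the consumer's logical queue position selects a fixed-size index entry, which points at the message's physical offset and length in the CommitLog.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadPathSketch {
    private static final int INDEX_ENTRY_SIZE = 12;    // 8-byte offset + 4-byte size, as in the write sketch

    private final FileChannel commitLog;
    private final FileChannel indexFile;               // index file of one Topic-Partition

    public ReadPathSketch(FileChannel commitLog, FileChannel indexFile) {
        this.commitLog = commitLog;
        this.indexFile = indexFile;
    }

    public byte[] read(long queuePosition) throws IOException {
        // 1. Locate the fixed-size index entry for this queue position.
        ByteBuffer entry = ByteBuffer.allocate(INDEX_ENTRY_SIZE);
        indexFile.read(entry, queuePosition * INDEX_ENTRY_SIZE);
        entry.flip();
        long offset = entry.getLong();                 // physical position in the CommitLog
        int size = entry.getInt();                     // message length

        // 2. Read the message body from the CommitLog. For hot data this is served
        //    from the OS file cache without touching the disk.
        ByteBuffer body = ByteBuffer.allocate(size);
        commitLog.read(body, offset);
        return body.array();
    }
}
```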

Flushing data to disk

Flushing is what finally persists the data to disk. There are two flush modes: synchronous flushing, where data is flushed to disk immediately after being written to the cache, so that success is returned to the client only once the data is on disk (MetaQ also applies some optimizations to the synchronous flush path to avoid excessive performance loss); and asynchronous flushing, where data is flushed to disk in batches at intervals.

Data cleaning

Message data is cleaned up periodically according to the file retention period.

Data replication

Applications that require high data reliability can use data replication to guarantee it. MetaQ provides two data synchronization modes: synchronous double write, where data written to the master is written to the slave at the same time, both master and slave must succeed before success is returned to the client, and there is no lag between master and slave (MetaQ has its own efficient master-slave replication scheme); and asynchronous replication, where success is returned to the client as soon as the data is written to the master, the slave synchronizes asynchronously, and there is some lag between master and slave.
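The following sketch only illustrates the semantics of these two knobs; the enum and class names are invented for illustration and are not MetaQ configuration options. It shows when a broker may acknowledge a write under each combination of flush mode and replication mode.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class ReliabilityModes {
    enum FlushMode { SYNC, ASYNC }                               // flush before ack, or in batches later
    enum ReplicationMode { SYNC_DOUBLE_WRITE, ASYNC_REPLICATION }

    private final FlushMode flushMode;
    private final ReplicationMode replicationMode;

    public ReliabilityModes(FlushMode flushMode, ReplicationMode replicationMode) {
        this.flushMode = flushMode;
        this.replicationMode = replicationMode;
    }

    // Decide when the broker may acknowledge a write back to the client.
    public void acknowledgeAfterWrite(CompletableFuture<Void> diskFlush,
                                      CompletableFuture<Void> slaveAck)
            throws ExecutionException, InterruptedException {
        if (flushMode == FlushMode.SYNC) {
            diskFlush.get();                 // wait until the data is on the master's disk
        }
        if (replicationMode == ReplicationMode.SYNC_DOUBLE_WRITE) {
            slaveAck.get();                  // wait until the slave has the data too
        }
        // ASYNC variants return immediately; durability and replication catch up in the background.
    }
}
```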

Tens of thousands of queues on a single MetaQ machine

Most of MetaQ's features are driven by queues. Queues are stored as files and operated through memory mapping. All messages are appended to the data file (the CommitLog) completely sequentially, avoiding random IO; message indexes are written serially to separate index files organized by Topic and Partition. With this organization a single machine can support tens of thousands of queues, as shown in figure 4.

Data flow graph

Message data is first written from the Java heap into the file cache and then flushed to disk according to the configured flush strategy. When messages are consumed, hot data is sent to the remote end directly from the system cache to the socket; for non-hot data, the page fault raised by the system loads the data from disk into the system cache, and it is then written to the socket without ever entering the application's memory space. Memory management and page swapping are handled by the OS. Message pulls are also batched, processing multiple messages per round trip to minimize round-trip time.
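The "data does not enter the application's memory space" part is what Java's FileChannel.transferTo provides: the kernel can move bytes from the file cache straight into the socket buffer. The helper below is an illustrative sketch of such a send path, not MetaQ's code.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySend {

    // Send `count` bytes of the commit log, starting at `position`, to a consumer's socket.
    public static void sendToConsumer(FileChannel commitLog, SocketChannel socket,
                                      long position, long count) throws IOException {
        long sent = 0;
        while (sent < count) {
            // transferTo may send fewer bytes than requested, so loop until done;
            // the bytes travel kernel-side and never enter the Java heap.
            sent += commitLog.transferTo(position + sent, count - sent, socket);
        }
    }
}
```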

MetaQ's performance depends on system memory allocation and effective use of disk IO, so we also tuned operating system memory allocation, the dirty-page writeback strategy, and the IO scheduling algorithm, so that resource allocation overhead stays stable and throughput remains high even when messages accumulate.

Load balancing

Sender load

By default the sender writes messages to the brokers in a round-robin fashion, as shown in figure 6; you can also specify explicitly where a message should be sent.

Consumer load

By default, all consumption queues are divided evenly among the machines of the consumer cluster, with any leftover queues consumed by the consumers at the front of the list, as shown in figure 7. Consumer load balancing can also be customized, for example to prefer consumers in the same data center.
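As a rough illustration of both defaults (round-robin on the sender side; even division with the remainder going to the first consumers on the consumer side), here is a small sketch; the class and method names are invented and this is not MetaQ's actual rebalancing code.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultLoadBalancing {
    private int sendCounter = 0;

    // Sender side: rotate through the queue/broker list.
    public int nextQueue(int queueCount) {
        return sendCounter++ % queueCount;
    }

    // Consumer side: queues per consumer = total/size, remainder handed to the first consumers.
    public static List<Integer> allocate(int queueCount, int consumerIndex, int consumerCount) {
        int base = queueCount / consumerCount;
        int remainder = queueCount % consumerCount;
        int size = base + (consumerIndex < remainder ? 1 : 0);
        int start = consumerIndex * base + Math.min(consumerIndex, remainder);
        List<Integer> queues = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            queues.add(start + i);
        }
        return queues;
    }

    public static void main(String[] args) {
        // 10 queues, 3 consumers -> [0,1,2,3], [4,5,6], [7,8,9]
        for (int c = 0; c < 3; c++) {
            System.out.println("consumer " + c + " -> " + allocate(10, c, 3));
        }
    }
}
```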

Thank you for reading. That concludes "what is the concept of MetaQ". After studying this article, I believe you have a deeper understanding of what the concept of MetaQ is; specific usage still needs to be verified in practice. The editor will push more articles on related knowledge points for you; welcome to follow!
