What are the core foundations of the MQ series?

Updated 2025-02-24. Source: SLTechnology News&Howtos (shulou), Development.

This article covers the core foundations of MQ (message queues): the message model, its typical application scenarios, and how to approach designing an MQ.

01 Starting from the essence of MQ

Strip any MQ down to its parts and it comes to "one send, one store, one consume". Put bluntly, it is a forwarder.

The producer delivers a message to a container called a queue; the message is then taken out of the container and forwarded to the consumer. That is all.

The figure above is the original model of the message queue, which contains two keywords: message and queue.

1. Message: the data to be transferred. It can be a simple text string or a custom complex format, as long as it can be parsed according to an agreed format.

2. Queue: a first-in, first-out data structure that everyone should be familiar with. It is the container that stores messages. Messages are enqueued at the tail and dequeued from the head; enqueuing corresponds to sending a message, and dequeuing corresponds to receiving one.
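The original model can be sketched in a few lines. This is only an illustration: the class name `MessageQueue` and the method names `send` and `receive` are made up for this example and do not come from any particular MQ product.

```python
from collections import deque


class MessageQueue:
    """The original model: a FIFO container of messages."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        # Sending a message enqueues it at the tail.
        self._items.append(message)

    def receive(self):
        # Receiving dequeues from the head; None means the queue is empty.
        return self._items.popleft() if self._items else None


mq = MessageQueue()
for text in ["a", "b", "c"]:
    mq.send(text)
received = [mq.receive() for _ in range(3)]
```

Messages come out in the order they went in, and a received message is gone from the queue.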

02 Evolution of the original model

If you look at today's most widely used message queue products (RocketMQ, Kafka, and so on), you will find that they all extend the original message model and introduce new terms such as topic, partition, and queue.

To understand these new concepts thoroughly, let's simplify and start from the evolution of the message model (as the saying goes, architecture is never designed, it evolves).

2.1 Queue model

The earliest message queue is the original model described in the previous section, a queue in the strict sense: messages are read in the order in which they were written. However, a queue has no non-destructive "read" operation; reading means dequeuing, that is, removing the message from the head of the queue.

This is the queue model. It allows multiple producers to send messages to the same queue. If there are multiple consumers, however, they compete with each other: each message is received by only one of the consumers and is deleted once read.
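A small simulation of this competition, under the simplifying assumption that two consumers take turns pulling from the queue (real consumers would pull concurrently, and the names `p1`, `c1`, and so on are placeholders):

```python
from collections import deque

queue = deque()

# Multiple producers may write to the same queue.
for producer in ("p1", "p2"):
    for i in range(3):
        queue.append(f"{producer}-msg{i}")

# Two consumers compete: receiving removes the message from the head,
# so each message is seen by exactly one of them.
inbox = {"c1": [], "c2": []}
consumers = ["c1", "c2"]
turn = 0
while queue:
    consumer = consumers[turn % 2]
    inbox[consumer].append(queue.popleft())
    turn += 1
```

Between them the consumers see all six messages, but no message reaches both.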

2.2 Publish-subscribe model

Suppose a message needs to be distributed to multiple consumers, and each consumer must receive the full set of messages. The queue model clearly cannot meet this requirement.

One possible solution is to create a separate queue for each consumer and have the producer send a copy to each. This approach is clumsy: the same data is copied several times, which wastes space.

To solve this problem, another message model evolved: the publish-subscribe model.

In the publish-subscribe model, the container that stores messages is called a "topic", and a subscriber must subscribe to the topic before it can receive messages. Each subscriber then receives the full set of messages on that topic.

Compare it carefully with the queue model: the producer is the publisher, the queue is the topic, and the consumer is the subscriber. The only difference is whether one message can be consumed multiple times.
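One way to let several subscribers read the same messages without copying them is to keep a single log and a read position per subscriber. The sketch below assumes that design; the `Topic` class and its methods are hypothetical, and the choice that a new subscriber starts at the end of the log is an assumption made for this example.

```python
class Topic:
    def __init__(self):
        self._log = []       # one stored copy of each message
        self._offsets = {}   # subscriber name -> next position to read

    def subscribe(self, name):
        # Assumption: a new subscriber starts at the current end of the log.
        self._offsets[name] = len(self._log)

    def publish(self, message):
        self._log.append(message)

    def poll(self, name):
        # Return everything this subscriber has not read yet.
        # Reading only advances the subscriber's position; nothing is deleted.
        start = self._offsets[name]
        self._offsets[name] = len(self._log)
        return self._log[start:]


topic = Topic()
topic.subscribe("points")
topic.subscribe("merchant")
topic.publish("m1")
topic.publish("m2")
points_msgs = topic.poll("points")
merchant_msgs = topic.poll("merchant")
```

Both subscribers receive the full set of messages, while each message is stored only once.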

2.3 Summary

To summarize, the difference between the two models is, put bluntly, the difference between unicast and broadcast. Moreover, when a topic has only one subscriber, the publish-subscribe model behaves the same as the queue model, so it is functionally fully compatible with the queue model.

This also explains why mainstream modern products such as RocketMQ and Kafka are built directly on the publish-subscribe model. It also explains why RabbitMQ has an Exchange module: the Exchange decides how messages are delivered to queues, which implements the publish-subscribe model in an indirect way.

The concepts of "consumer group", "cluster consumption", and "broadcast consumption" all relate to these two models, as does the most common arrangement at the application level: broadcast between groups and unicast within a group.
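Broadcast between groups combined with unicast within a group can be sketched as follows. This is an illustrative assumption, not how any specific product assigns messages: the `GroupedTopic` class is hypothetical, and members are picked by simple round-robin on the log position.

```python
from collections import defaultdict


class GroupedTopic:
    def __init__(self):
        self._log = []
        self._groups = defaultdict(list)   # group name -> member names
        self._offsets = defaultdict(int)   # group name -> next log position

    def join(self, group, member):
        self._groups[group].append(member)

    def publish(self, message):
        self._log.append(message)

    def dispatch(self, group):
        """Deliver unread messages to a group: one member per message."""
        members = self._groups[group]
        delivered = defaultdict(list)
        for pos in range(self._offsets[group], len(self._log)):
            # Round-robin by log position (an assumption for this sketch).
            member = members[pos % len(members)]
            delivered[member].append(self._log[pos])
        self._offsets[group] = len(self._log)
        return dict(delivered)


t = GroupedTopic()
t.join("points", "points-1")
t.join("points", "points-2")
t.join("merchant", "merchant-1")
for i in range(4):
    t.publish(f"m{i}")
points = t.dispatch("points")
merchant = t.dispatch("merchant")
```

Every group sees all four messages (broadcast), but inside the `points` group each message goes to only one member (unicast).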

So master the general theory first, and then study how each piece of message middleware is implemented. That makes it easier to grasp the essentials and tell the concepts apart.

03 Looking at MQ application scenarios through the model

MQ has many application scenarios today. The ones everyone can recite are system decoupling, asynchronous communication, and peak shaving. Others include delayed notification, eventual consistency, ordered messages, and stream processing.

So which came first, the message model or the application scenarios? The answer must be that the application scenarios (that is, the problems) came first and the message model came afterward, because the message model is just an abstraction of the solution.

After more than 30 years of development, MQ has evolved from the original queue model into today's message middleware (platform-level solutions). I think it has endured because of the wide adaptability of the message model.

Let's take another look at the message queue model. What it really solves is communication between producers and consumers. So how does it relate to, and differ from, RPC?

Comparing the two, we can clearly see two differences:

1. With MQ, what used to be one RPC becomes two RPCs, and the producer is coupled only to the queue; it does not need to know that consumers exist at all.

2. An intermediate node, the queue, is added to buffer messages, which turns synchronous communication into asynchronous communication.

If you go back over all the application scenarios of MQ, it is not hard to see why MQ fits them: these scenarios do nothing more than take advantage of the two features above.

Take a practical example, the common "order payment" scenario in e-commerce: after an order is paid successfully, the system needs to update the order status, update the user's points, notify the merchant of the new order, update the user profile in the recommendation system, and so on.

With MQ, order payment only needs to focus on its most important step: updating the order status. Everything else is triggered by notifications through MQ. This is the core problem MQ solves: system decoupling.

Before the change, the order system depended on three external systems; afterward, it depends only on MQ. Later business extensions (for example, the marketing system wants to reward paying users with coupons) require no changes to the order system, which keeps the core process stable and reduces maintenance cost.

The change brings another benefit. With MQ, updating user points, notifying merchants, and updating user profiles all run asynchronously, which shortens the overall time of an order payment and improves the throughput of the order system. This is another typical MQ application scenario: asynchronous communication.
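The order payment example can be sketched to show both effects at once. Everything here is a simplified stand-in: `pending` plays the role of the MQ, `subscribers` plays the role of topic subscriptions, and `pay_order` and `drain` are hypothetical names.

```python
from collections import deque

subscribers = []    # downstream handlers; stands in for topic subscriptions
pending = deque()   # stands in for the MQ
log = []


def pay_order(order_id):
    # Core process only: update the order status, then publish an event.
    log.append(f"order {order_id}: status=PAID")
    pending.append({"type": "order_paid", "order_id": order_id})


def drain():
    # Runs later, outside the payment path, for example in consumer processes.
    while pending:
        event = pending.popleft()
        for handler in subscribers:
            handler(event)


subscribers.append(lambda e: log.append(f"points updated for {e['order_id']}"))
subscribers.append(lambda e: log.append(f"merchant notified of {e['order_id']}"))
# A later requirement (coupons) is added without touching pay_order.
subscribers.append(lambda e: log.append(f"coupon issued for {e['order_id']}"))

pay_order(42)
after_payment = list(log)   # only the core step has run at this point
drain()
```

`pay_order` returns after the core step alone, and the coupon handler is added without modifying it.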

In addition, because the queue can buffer messages, MQ can act as a "funnel" that protects the system by limiting the rate when traffic exceeds its capacity. This is what is known as peak shaving.
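A tick-based simulation makes the funnel effect concrete. The numbers are invented for illustration: a burst of 10 requests arrives in the first tick, and the downstream system is assumed to handle at most 2 per tick.

```python
from collections import deque

CAPACITY_PER_TICK = 2          # assumed downstream capacity per tick
arrivals = [10, 0, 0, 0, 0]    # a burst of 10 requests in the first tick

buffer = deque()               # the queue absorbs the burst
processed_per_tick = []
next_id = 0
for arriving in arrivals:
    for _ in range(arriving):
        buffer.append(next_id)
        next_id += 1
    # The consumer takes at most CAPACITY_PER_TICK messages per tick.
    take = min(CAPACITY_PER_TICK, len(buffer))
    batch = [buffer.popleft() for _ in range(take)]
    processed_per_tick.append(len(batch))
```

The burst is spread over five ticks at a steady rate instead of hitting the downstream system all at once.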

We can also rely on the ordering of the queue itself for scenarios where messages must be delivered in sequence, and combine queues with scheduled tasks to implement delayed consumption.
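Delayed consumption can be sketched as a store of messages ordered by due time plus a periodic poll. Note the assumptions: this example swaps in a heap ordered by due time rather than a plain FIFO queue, time is passed in explicitly as `now` so the example is deterministic, and `DelayQueue` and `poll_due` are hypothetical names.

```python
import heapq
import itertools


class DelayQueue:
    def __init__(self):
        self._heap = []
        # Tie-breaker so messages with equal due times keep their send order.
        self._seq = itertools.count()

    def send(self, message, now, delay):
        heapq.heappush(self._heap, (now + delay, next(self._seq), message))

    def poll_due(self, now):
        """Called by the scheduled task: return messages whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


dq = DelayQueue()
dq.send("close unpaid order 1", now=0, delay=30)
dq.send("remind user 2", now=0, delay=10)
at_5 = dq.poll_due(5)
at_10 = dq.poll_due(10)
at_30 = dq.poll_due(30)
```

Each message becomes visible to consumers only once its delay has elapsed, regardless of the order in which it was sent.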

Other MQ application scenarios are much the same: each can be traced back to the characteristics of the message model to see why MQ applies. We will not analyze them one by one here.

In short, it helps to step back from complex, ever-changing practical scenarios to the theoretical level, and to think and abstract there, so that the material is understood more thoroughly.

04 How to design an MQ?

With the theory and application scenarios above in hand, let's look at how to design an MQ.

4.1 Prototype of MQ

Let's start with a simple version. If we only want a rough MQ and ignore the requirements of a production environment, how should we design it?

As said at the beginning of the article, any MQ is nothing more than one send, one store, and one consume; this is the core functional requirement of MQ. From a technical perspective, the communication model of MQ can be understood as two RPCs plus message buffering.

With this understanding, I believe anyone with some programming background can write an MQ prototype in less than an hour:

1. Use a mature RPC framework (Dubbo or Thrift) directly to implement two interfaces: sending messages and reading messages.

2. Store messages in local memory; the data structure can be the ArrayBlockingQueue that ships with the JDK.
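The two steps above can be sketched as follows. This sketch is written in Python rather than Java, so `queue.Queue` with a `maxsize` stands in for the JDK's ArrayBlockingQueue, and the RPC layer is left out entirely: in a real prototype the two methods would be exposed through the RPC framework. The class and method names are hypothetical.

```python
import queue


class PrototypeBroker:
    def __init__(self, capacity=1024):
        # Bounded in-memory buffer; a Python stand-in for ArrayBlockingQueue.
        self._queue = queue.Queue(maxsize=capacity)

    def send_message(self, message, timeout=1.0):
        """Interface 1: store a message; False if the queue stays full."""
        try:
            self._queue.put(message, timeout=timeout)
            return True
        except queue.Full:
            return False

    def read_message(self, timeout=1.0):
        """Interface 2: take the oldest message; None if none arrives in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


broker = PrototypeBroker(capacity=2)
ok1 = broker.send_message("m1")
ok2 = broker.send_message("m2")
ok3 = broker.send_message("m3", timeout=0.01)   # queue is full
first = broker.read_message()
```

The bounded capacity means a full queue pushes back on producers instead of growing without limit, which is also what a blocking queue gives you in Java.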

4.2 Writing an MQ suitable for a production environment

Of course, our goal is not just a prototype but message middleware that can be used in production. The difficulty is of a different order of magnitude. Where should we start?

1. Grasp the key points of the problem first.

Suppose we still consider only the most basic functions: sending, storing, and consuming messages (with support for the publish-subscribe model).

So what challenges do these basic functions face in a production environment? The following come to mind quickly:

1. How to ensure the performance of sending and receiving messages under high concurrency?

2. How to ensure the high availability and reliability of the message service?

3. How to ensure that the service can be scaled horizontally at will?

4. How to ensure that message storage is also horizontally scalable?

5. How to manage the various kinds of metadata in the cluster (nodes, topics, consumption relationships, and so on), and does the consistency of that data need to be considered?

As you can see, designing an MQ runs into the "three highs" of high-concurrency scenarios. How to meet non-functional requirements such as high performance and high reliability is the key to the problem.

2. Overall design ideas.

First, let's take a look at the overall architecture, which involves three types of roles:

After further refining the core process of "one send, one store, one consume", the more complete data flow is as follows:

Based on the two diagrams above, we can quickly identify the responsibilities of the three roles:

1. Broker (server side): the core of MQ, where almost all of the core logic lives. It provides RPC interfaces to producers and consumers and is responsible for storing, backing up, and deleting messages, as well as maintaining consumption relationships.

2. Producer: one of the MQ clients. It calls the RPC API provided by the Broker to send messages.

3. Consumer: the other MQ client. It calls the RPC API provided by the Broker to receive messages and to confirm consumption.

3. Detailed design

Next, we will discuss some specific technical difficulties and feasible solutions.

Difficulty 1: RPC communication

This addresses communication between the Broker, Producer, and Consumer. If you don't want to reinvent the wheel, you can use a mature RPC framework such as Dubbo or Thrift directly, and then you do not need to deal with service registration and discovery, load balancing, communication protocols, serialization, and so on.

Alternatively, you can build the underlying communication on Netty, use ZooKeeper, Eureka, or similar as the registry, and define a custom communication protocol (as Kafka does), or implement it on AMQP, a standardized MQ protocol (as RabbitMQ does). Compared with using an RPC framework directly, this approach offers more room for customization and optimization.

Difficulty 2: high availability design

High availability involves two aspects: the high availability of the Broker service and the high availability of the storage solution. They can be discussed separately.

For the Broker service, it is enough to make the Broker horizontally scalable and deploy it as a cluster. Availability is further ensured through automatic service registration and discovery, load balancing, timeout-and-retry mechanisms, and ack mechanisms for sending and consuming messages.
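The consume-side ack mechanism combined with timeout-based redelivery can be sketched as below. This is one possible shape, not a description of any particular broker: `AckQueue` is a hypothetical class, time is passed in as `now` to keep the example deterministic, and a message that is not acknowledged within `ack_timeout` is simply put back at the head of the queue.

```python
from collections import deque


class AckQueue:
    def __init__(self, ack_timeout):
        self._ready = deque()
        self._in_flight = {}   # message id -> (message, delivery time)
        self._ack_timeout = ack_timeout
        self._next_id = 0

    def send(self, message):
        self._ready.append((self._next_id, message))
        self._next_id += 1

    def receive(self, now):
        # Requeue deliveries that were not acknowledged in time.
        for msg_id, (message, sent_at) in list(self._in_flight.items()):
            if now - sent_at >= self._ack_timeout:
                del self._in_flight[msg_id]
                self._ready.appendleft((msg_id, message))
        if not self._ready:
            return None
        msg_id, message = self._ready.popleft()
        self._in_flight[msg_id] = (message, now)
        return msg_id, message

    def ack(self, msg_id):
        # The consumer confirms it has finished processing the message.
        self._in_flight.pop(msg_id, None)


q = AckQueue(ack_timeout=5)
q.send("m1")
first = q.receive(now=0)        # delivered, but the consumer never acks
nothing_yet = q.receive(now=1)  # still within the ack timeout
redelivered = q.receive(now=5)  # timeout passed: the message comes back
q.ack(redelivered[0])
after_ack = q.receive(now=100)
```

A message is removed for good only after an ack, so a crashed consumer does not lose it. The trade-off is that the same message may be delivered more than once, which is why consumers under this scheme usually need to tolerate duplicates.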

There are two approaches to high availability of storage: 1) follow Kafka's partition plus multi-replica model, which requires a data replication and consistency scheme for distributed scenarios (such as Zab or Raft) and automatic failover; 2) use a mainstream database, distributed file system, or KV store with persistence, all of which come with their own high-availability schemes.

Difficulty 3: storage design

Message storage is the core of MQ. Reliability was covered under high-availability design; if reliability requirements are low, memory or a distributed cache can be used directly. Here we focus on how to make storage fast. The decisive factor is the design of the storage structure.

The mainstream scheme today is append-only log files (the data) plus index files; many mainstream open-source MQs work this way. The index can be dense or sparse, message lookup can use skip lists, binary search, and the like, and the read and write performance of disk files can be improved with the operating system's page cache, zero-copy, and similar techniques.
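An append-only log with a sparse index and binary-search lookup can be sketched in memory as follows. The assumptions are significant: a `bytearray` stands in for the log file, two Python lists stand in for the index file, each record is a 4-byte length prefix followed by the payload, and one index entry is written for every fourth message. None of these choices is taken from a specific MQ.

```python
import bisect


class LogSegment:
    INDEX_INTERVAL = 4   # index every 4th message (arbitrary for this sketch)

    def __init__(self):
        self._data = bytearray()     # stands in for the append-only log file
        self._index_offsets = []     # message offsets that have an index entry
        self._index_positions = []   # byte position of each indexed message
        self._count = 0

    def append(self, payload: bytes) -> int:
        offset = self._count
        if offset % self.INDEX_INTERVAL == 0:
            self._index_offsets.append(offset)
            self._index_positions.append(len(self._data))
        # Record format: 4-byte big-endian length prefix, then the payload.
        self._data += len(payload).to_bytes(4, "big") + payload
        self._count += 1
        return offset

    def read(self, offset: int) -> bytes:
        if not 0 <= offset < self._count:
            raise IndexError(offset)
        # Binary search for the nearest index entry at or before the offset.
        slot = bisect.bisect_right(self._index_offsets, offset) - 1
        current = self._index_offsets[slot]
        pos = self._index_positions[slot]
        # Then scan forward record by record until the offset is reached.
        while True:
            size = int.from_bytes(self._data[pos:pos + 4], "big")
            if current == offset:
                return bytes(self._data[pos + 4:pos + 4 + size])
            pos += 4 + size
            current += 1


seg = LogSegment()
for i in range(10):
    seg.append(f"msg-{i}".encode())
sixth = seg.read(6)
```

A sparse index keeps the index small at the cost of a short forward scan on each lookup; a dense index (one entry per message) removes the scan but takes more space.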

If you do not pursue high performance, you can also consider off-the-shelf distributed file systems, KV storage, or database solutions.

Difficulty 4: consumption relationship management

To support publish-subscribe broadcast, the Broker needs to know which Consumers subscribe to each topic and deliver messages according to that relationship.

Because the Broker is deployed as a cluster, consumption relationships are usually kept in shared storage, and can be managed and propagated through a configuration center such as ZooKeeper or Apollo.

Difficulty 5: high performance design

High-performance storage was discussed above; performance can of course be optimized further in other ways.

Examples include a Reactor network I/O model, the design of the business thread pool, batch sending on the producer side, asynchronous disk flushing on the Broker side, and batch pulling on the consumer side.
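Of these, producer-side batching is the easiest to illustrate. In the sketch below, `BatchingProducer` is a hypothetical class and `send_batch` is a placeholder for whatever performs one network call to the Broker; the batch is flushed by size only, whereas a real client would likely also flush after a time limit.

```python
class BatchingProducer:
    def __init__(self, send_batch, batch_size):
        self._send_batch = send_batch   # callable that performs one network call
        self._batch_size = batch_size
        self._buffer = []

    def send(self, message):
        # Buffer locally and only call the Broker when the batch is full.
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send_batch(self._buffer)
            self._buffer = []


calls = []   # records each simulated network call
producer = BatchingProducer(calls.append, batch_size=3)
for i in range(7):
    producer.send(i)
producer.flush()   # push out the final partial batch
```

Seven messages are sent with three calls instead of seven, trading a little latency for fewer round trips.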

4.3 Summary

To sum up, how should we answer the question: how do you design an MQ?

1. Start from the functional requirements (sending and receiving messages) and the non-functional requirements (high performance, high availability, high scalability, and so on).

2. Functional requirements are not the focus; covering the most basic functions of MQ is enough. Advanced features such as delayed messages, transactional messages, and retry queues are just icing on the cake.

3. The core is to sort out the overall data flow from the functional requirements, and then follow that flow to consider how to meet the non-functional requirements, which is where the technical difficulty lies.

© 2024 shulou.com SLNews company. All rights reserved.