How to understand message middleware Kafka and RocketMQ

2025-04-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the message middleware Kafka and RocketMQ: where message queues are used, how the mainstream MQ frameworks compare, how Kafka stores and manages data, and how half messages support distributed transactions.

Application scenarios of message middleware

Asynchronous decoupling

Peak shaving and valley filling

Ordered sending and receiving of messages

Distributed transaction consistency

Mainstream MQ frameworks and their comparison

Description

Kafka: widely used across the whole industry

RocketMQ: developed at Alibaba and incubated at Apache

Pulsar: open-sourced by Yahoo; a message queue designed for cloud-native architectures, with an active community

RabbitMQ: relatively old architecture; its AMQP protocol is not supported by the other mainstream MQs

NSQ: memory-based, not the best choice

ActiveMQ and ZeroMQ: can be ignored here

Advantages of Kafka

Very mature, with a rich ecosystem that is closely integrated with Hadoop

Very high throughput and high availability; sharding speeds up replication

Main function: pub-sub, with good compression support

It can be configured for at-least-once or at-most-once delivery; exactly-once requires cooperation from the Consumer.

Cluster deployment is simple, but the controller logic is very complex because it has to handle multiple partition replicas and data consistency.

The controller depends on ZooKeeper

Flushes to disk asynchronously (apart from money-related business, synchronous flushing is rarely needed)

Shortcomings of Kafka

Unstable write latency. Kafka partitions are often placed on mechanical disks; random writes cause throughput to drop and latency to rise to 100 ms ~ 500 ms.

Operational complexity: after a single machine fails, replica data has to be re-replicated elsewhere. An optimization for fast migration: when a partition is migrated, the old data is not moved; new data is written to the new partition, and after a certain period the partition is switched over directly.

RocketMQ

Alibaba reworked the Kafka design to suit online business scenarios such as e-commerce.

It trades performance for richer functionality:

Messages can be queried by key; this requires maintaining a hash table, which affects I/O

To keep write latency stable when there are many shards, the broker puts the data written by all shards into one file, forming a commitlog list, and keeps several index files that maintain the logical topic information; this results in more random reads

There is no central management node; in hindsight this seems to be of little use, because there is not much metadata.

High-precision delayed messages (Kuaishou already supports delayed messages with second-level precision)
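
The single-commitlog layout described above (all shards written into one file, with separate indexes for topics and keys) can be illustrated with a toy model. This is only a sketch, not RocketMQ's storage code: the class `ToyCommitLogBroker` and its methods are hypothetical, and an in-memory list stands in for the commitlog file.

```python
from collections import defaultdict


class ToyCommitLogBroker:
    """Toy model: one shared commitlog plus per-topic and per-key indexes."""

    def __init__(self):
        self.commitlog = []                    # every message, in arrival order
        self.topic_index = defaultdict(list)   # topic -> positions in commitlog
        self.key_index = defaultdict(list)     # message key -> positions (hash table)

    def append(self, topic, key, body):
        # All topics share one sequential write stream.
        position = len(self.commitlog)
        self.commitlog.append((topic, key, body))
        self.topic_index[topic].append(position)
        self.key_index[key].append(position)
        return position

    def read_topic(self, topic):
        # Reading one topic jumps around the shared log, which is where
        # the extra random reads come from.
        return [self.commitlog[p][2] for p in self.topic_index[topic]]

    def query_by_key(self, key):
        return [self.commitlog[p][2] for p in self.key_index[key]]


broker = ToyCommitLogBroker()
broker.append("orders", "order-1", "created")
broker.append("payments", "order-1", "paid")
broker.append("orders", "order-2", "created")

print(broker.topic_index["orders"])     # positions of the "orders" messages
print(broker.read_topic("orders"))
print(broker.query_by_key("order-1"))
```

Writes stay sequential no matter how many topics exist, while a reader of a single topic has to follow scattered positions; that is the trade-off the list above describes.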

Pulsar

Storage and compute are separated, which makes capacity expansion easier:

Storage: BookKeeper

MQ logic: handled by stateless brokers

Development trends

Cloud native

Unified batch and stream processing: to run a task today, the Kafka data first has to be moved to HDFS, which consumes a lot of resources. If the data already lives in HDFS, much of that cost can be saved.

Serverless

What each company is doing

Kuaishou: Kafka is used in all scenarios, with a special form of read-write separation. Data is consumed to HDFS in real time; when a consumer reads with obvious lag, the broker forwards the request, so the local disk does not suffer noticeable random reads and writes caused by lagging consumers. Because of these in-house modifications, it is hard to bring in new features from the community.

Alibaba: open-sourced RocketMQ

NSQ → RocketMQ; offline scenario: Kafka → self-developed BMQ with storage-compute separation (the protocol layer is directly compatible with Kafka, so users do not have to change clients)

Baidu: self-developed BigPipe, which is not that good

Meituan: rewritten in Java on the basis of the Kafka architecture, internally called Mafka

Tencent: some teams use the self-developed PhxQueue, whose underlying layer is a KV system

Didi: DDMQ wraps RocketMQ and Kafka; data consistency across multiple data centers may be a problem

Xiaomi: self-developed Talos, with an architecture similar to Pulsar; storage is HDFS, and read scenarios are optimized

Kafka

Kafka official website: https://kafka.apache.org/documentation/#uses

Latest version: 2.7

What is Kafka?

Open source message engine system (message queuing / message middleware)

Distributed stream processing platform

Publish / subscribe model

Peak shaving and valley filling

Kafka terminology

Topic: the subject that messages are published to and subscribed from

Producer: the client that publishes messages to a Topic

Consumer: the client that consumes messages

Consumer Group: multiple consumers that form one group

Broker: a Kafka service process

Replication: backup; the same data is copied to multiple machines

Leader Replica

Follower Replica: does not interact with the outside world

Partition: solves the scalability problem; multiple Partitions form a Topic

Segment: a partition consists of multiple segments

How does Kafka persist data?

A message log (Log) holds the data. Writes to disk are append-only, which avoids slow random I/O operations and gives high throughput.

Messages are deleted periodically (by log segment)
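
As a rough illustration of these two points, the sketch below models an append-only log that rolls to a new segment when the current one is full and deletes whole old segments rather than individual messages. The class `AppendOnlyLog` and its parameters `segment_size` and `max_segments` are hypothetical; a segment-count limit is used here only to keep the example small and is not how Kafka expresses retention.

```python
class AppendOnlyLog:
    """Toy append-only log split into fixed-size segments (hypothetical model)."""

    def __init__(self, segment_size=3, max_segments=2):
        self.segment_size = segment_size
        self.max_segments = max_segments
        self.segments = [[]]          # each segment is a list of (offset, message)
        self.next_offset = 0

    def append(self, message):
        # Writes only ever go to the end of the newest segment.
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        offset = self.next_offset
        self.segments[-1].append((offset, message))
        self.next_offset += 1
        self._enforce_retention()
        return offset

    def _enforce_retention(self):
        # Deletion drops the oldest segment as a whole.
        while len(self.segments) > self.max_segments:
            self.segments.pop(0)

    def earliest_offset(self):
        return self.segments[0][0][0]


log = AppendOnlyLog(segment_size=3, max_segments=2)
for i in range(8):
    log.append(f"msg-{i}")

print(len(log.segments), log.earliest_offset())
```

Offsets keep growing even after old segments are removed, so a consumer can always tell whether the position it wants is still available.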

Kafka file storage mechanism

Https://www.open-open.com/lib/view/open1421150566328.html

Each partition is equivalent to one giant file that is split evenly into multiple equal-size segment data files

Each partition only needs to be read and written sequentially, and the life cycle of segment files is determined by configuration.

A segment consists of two files: an index file and a data file

Segment file naming rule: the first segment of a partition starts from 0, and each later segment file is named after the offset at which it starts, continuing from the maximum offset of the previous segment.
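
To make the naming idea concrete, here is a minimal sketch that builds segment file names from their starting offsets and uses binary search to find the segment that holds a given offset. The 20-digit zero-padded name and the `.log` / `.index` suffixes follow Kafka's on-disk layout; the helper names and the sample offsets are made up for illustration.

```python
import bisect


def segment_file_names(base_offset):
    """Return the (data file, index file) names for a segment's starting offset."""
    stem = f"{base_offset:020d}"
    return f"{stem}.log", f"{stem}.index"


def find_segment(base_offsets, target_offset):
    """Return the starting offset of the segment that contains target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is older than the first segment")
    return base_offsets[i]


# Hypothetical starting offsets of three segments in one partition.
base_offsets = [0, 368769, 737337]

print(segment_file_names(0))
print(segment_file_names(find_segment(base_offsets, 400000)))
```

Because the file names are sorted offsets, locating a message needs no extra metadata: a binary search over the names picks the segment, and the index file then narrows down the position inside it.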

A pair of segment files

Physical structure of a message

Partition

Why partition?

Message organization in Kafka: topic - partition - message

A message exists in only one partition

To improve scalability, different partitions can be placed on different machines, and reads and writes operate at partition granularity.

Partitioning strategies

Round robin

Random

By key: messages with the same key go to the same partition, and a single partition is ordered.
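
The three strategies can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and `zlib.crc32` is used as a stable stand-in hash, not the hash function that Kafka clients actually use.

```python
import itertools
import random
import zlib


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through partitions 0..num_partitions-1."""
    counter = itertools.count()
    return lambda key=None: next(counter) % num_partitions


def random_partitioner(num_partitions, rng=random):
    """Return a function that picks a partition uniformly at random."""
    return lambda key=None: rng.randrange(num_partitions)


def key_partitioner(num_partitions):
    """Return a function that maps equal keys to the same partition."""
    # crc32 is chosen only because it gives the same result on every run.
    return lambda key: zlib.crc32(key.encode("utf-8")) % num_partitions


pick_rr = round_robin_partitioner(3)
print([pick_rr() for _ in range(5)])          # cycles 0, 1, 2, 0, 1

pick_by_key = key_partitioner(3)
print(pick_by_key("user-42") == pick_by_key("user-42"))  # same key, same partition
```

Round robin and random spread load evenly but give no ordering across messages; hashing the key gives up some balance in exchange for per-key ordering inside one partition.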

Can Kafka lose messages?

Kafka gives a limited persistence guarantee, and only for "committed" messages

Committed messages: messages that have been written to the log files

Limited persistence guarantee: at least one of the N brokers survives

Producer loses data

producer.send(msg) sends messages asynchronously, and there is no guarantee that the data reaches Kafka

Use producer.send(msg, callback) and check the result in the callback

Consumer loses data

The consumer should consume the message first and then update the offset, in that order

New problem: messages may be processed more than once

When messages are processed asynchronously by multiple threads, the Consumer should not turn on automatic offset commits; the application commits offsets manually
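
Why the order of "process" and "commit" matters can be shown with a small simulation. It is a sketch under simplifying assumptions: `run_consumer` is a hypothetical function, the partition is a plain list, and a crash is modeled by returning early at a chosen offset.

```python
def run_consumer(messages, committed, crash_at, commit_first):
    """Consume from `committed` onward; simulate a crash at offset `crash_at`.

    Returns (processed messages, committed offset after the run).
    """
    processed = []
    for offset in range(committed, len(messages)):
        if commit_first:
            committed = offset + 1           # offset saved before the work is done
            if offset == crash_at:
                return processed, committed  # crash: message never processed
            processed.append(messages[offset])
        else:
            if offset == crash_at:
                # Crash after processing but before the commit.
                processed.append(messages[offset])
                return processed, committed
            processed.append(messages[offset])
            committed = offset + 1           # commit only after processing
    return processed, committed


messages = ["m0", "m1", "m2"]

# Commit first, then process: m1 is lost after the restart.
first, committed = run_consumer(messages, 0, crash_at=1, commit_first=True)
second, _ = run_consumer(messages, committed, crash_at=None, commit_first=True)
print(first + second)

# Process first, then commit: nothing is lost, but m1 is handled twice.
first, committed = run_consumer(messages, 0, crash_at=1, commit_first=False)
second, _ = run_consumer(messages, committed, crash_at=None, commit_first=False)
print(first + second)
```

Processing before committing turns message loss into duplicate processing, which is why the business logic on the consumer side usually has to be idempotent.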

Controller

Manage and coordinate the entire Kafka cluster with the help of ZooKeeper

During operation, only one Broker can be the controller.

How to choose the controller?

Brokers try to create the /controller node in ZooKeeper; the first Broker that creates it successfully becomes the controller.

What does the controller do?

Topic management (create, delete, add partitions)

Partition reassignment

Leader election

Cluster membership management (new Broker, Broker shut down on purpose, Broker crash), based on ZooKeeper ephemeral nodes

Data service: it holds the most complete cluster metadata

Controller failover

Only one Broker acts as the controller, so it is a single point of failure; when it goes down, a standby controller takes over immediately
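
The election and failover described above can be mimicked with a tiny stand-in for ZooKeeper. Everything here is hypothetical (`ToyZooKeeper`, `elect`, the broker ids), and failover is modeled as a fresh race among the surviving brokers, which is an assumption about how the standby is chosen.

```python
class ToyZooKeeper:
    """Tiny stand-in for ZooKeeper: holds at most one /controller node."""

    def __init__(self):
        self.controller = None   # broker id that owns the /controller node

    def try_create_controller(self, broker_id):
        # Creation succeeds only if the node does not exist yet.
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def session_lost(self, broker_id):
        # An ephemeral node disappears when its owner's session ends.
        if self.controller == broker_id:
            self.controller = None


def elect(zk, live_brokers):
    """Every live broker races to create the node; the first one wins."""
    for broker_id in live_brokers:
        zk.try_create_controller(broker_id)
    return zk.controller


zk = ToyZooKeeper()
brokers = [2, 0, 1]                  # order in which brokers reach ZooKeeper
print(elect(zk, brokers))            # broker 2 arrives first

zk.session_lost(2)                   # the controller goes down
brokers.remove(2)
print(elect(zk, brokers))            # the remaining brokers race again
```

Because the node can be created only once, no extra voting protocol is needed among the brokers; ZooKeeper's create-if-absent semantics decide the winner.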

ZooKeeper storage structure of Kafka

Application scenarios of distributed transactions

Within a team, some operations update multiple data sources at the same time

After business team A completes an operation, a certain operation of business B must also be completed, and business A cannot directly access B's database.

Between companies: after the user pays, the payment system (Alipay / WeChat) must notify the merchant's system to update the order status.

Two-phase eventual consistency

First complete the transaction on data source A (phase one)

After it succeeds, a mechanism ensures that the transaction on data source B (phase two) is also completed: if it fails, it is retried until it succeeds, or retrying stops after a certain number of attempts (followed by reconciliation or manual processing).
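
The phase-two rule (retry until success, or stop after a fixed number of attempts and hand over to reconciliation or manual processing) can be sketched as follows. The function names and the retry limit are hypothetical, and a real system would wait between attempts.

```python
def run_phase_two(operation, max_retries):
    """Retry `operation` until it succeeds or max_retries attempts are used.

    Returns ("done", attempts) or ("needs_manual_handling", attempts).
    """
    for attempt in range(1, max_retries + 1):
        try:
            operation()
            return "done", attempt
        except Exception:
            continue   # a real system would back off before retrying
    return "needs_manual_handling", max_retries


def make_flaky(failures_before_success):
    """Build an operation that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("data source B unavailable")

    return operation


print(run_phase_two(make_flaky(2), max_retries=5))
print(run_phase_two(make_flaky(9), max_retries=5))
```

Since the operation on data source B may run more than once, it needs to be safe to repeat.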

How to ensure eventual consistency?

To ensure eventual consistency, the messaging system and the business programs need to guarantee the following:

Consistency of message delivery: when a message is sent, the phase-one transaction and the message delivery must succeed or fail together

Messages are not lost in storage: after a message has been sent successfully and before it has been consumed successfully, the message server (broker) must store it safely so that it is not lost if a failure occurs.

Consumers do not lose messages: a message whose processing fails is not discarded, and processing is retried until it succeeds

How to ensure the consistency of message delivery?

Goal: the local transaction and the message delivery must succeed or fail together

Problem

If the local transaction is executed first and the message is sent afterwards, the send may fail.

The failed message can be kept in memory and retried later, but the success rate cannot reach 100%.

Solution

First send a half message (Half Msg, similar to a Prepare operation); it is not delivered to consumers

Once the half message has been sent successfully, perform the DB operation

After the DB operation succeeds, commit the half message
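
The half-message steps above can be walked through with a toy broker. This is a sketch of the idea only, not the RocketMQ client API: `ToyTxBroker`, `send_in_transaction` and the message ids are hypothetical names.

```python
class ToyTxBroker:
    """Toy broker: half messages stay invisible until they are committed."""

    def __init__(self):
        self.half = {}       # msg_id -> body, not visible to consumers
        self.visible = []    # bodies that consumers may read

    def send_half(self, msg_id, body):
        self.half[msg_id] = body

    def commit(self, msg_id):
        self.visible.append(self.half.pop(msg_id))

    def rollback(self, msg_id):
        self.half.pop(msg_id)


def send_in_transaction(broker, msg_id, body, local_db_operation):
    """Half message first, then the local DB operation, then commit or roll back."""
    broker.send_half(msg_id, body)
    try:
        local_db_operation()
    except Exception:
        broker.rollback(msg_id)
        return False
    broker.commit(msg_id)
    return True


def failing_db_operation():
    raise RuntimeError("local transaction failed")


broker = ToyTxBroker()
ok = send_in_transaction(broker, "m1", "order paid", lambda: None)
failed = send_in_transaction(broker, "m2", "order paid", failing_db_operation)
print(ok, failed, broker.visible, broker.half)
```

Consumers only ever see messages whose local transaction succeeded, so the message and the database change become visible together or not at all.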

What happens if something goes wrong?

Step 1 exception: the half message fails to be sent, the local DB operation is not executed, the whole operation fails, and the DB and the message stay consistent (neither is committed)

Step 2 exception / timeout: the producer thinks the send has failed and does not execute the DB operation. The broker has stored the half message successfully but never receives a follow-up operation, so it asks the producer whether to commit or roll back (step 6)

Step 3, DB operation fails: the producer tells the broker to roll back the half message in step 4

Step 4, commit / rollback of the half message fails: the broker never receives this operation and triggers a back-check (step 6)

Steps 5, 6 and 7, back-check fails: RocketMQ back-checks up to 15 times.
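
The back-check loop can be sketched as below. The limit of 15 comes from the text; everything else is an assumption of this sketch, including the function names, the three possible answers, and the choice to treat an unresolved half message as a rollback once the limit is reached.

```python
MAX_BACK_CHECKS = 15   # limit quoted in the text above


def resolve_half_message(check_local_transaction, max_checks=MAX_BACK_CHECKS):
    """Ask the producer about a pending half message until it gives an answer.

    check_local_transaction() returns "commit", "rollback" or "unknown".
    Returns (final decision, number of back-checks performed).
    """
    for attempt in range(1, max_checks + 1):
        state = check_local_transaction()
        if state in ("commit", "rollback"):
            return state, attempt
    # Giving up after the limit; treating this as a rollback is an
    # assumption of this sketch.
    return "rollback", max_checks


def producer_answers(answers):
    """Build a back-check handler that replays a fixed list of answers."""
    remaining = iter(answers)
    return lambda: next(remaining, "unknown")


print(resolve_half_message(producer_answers(["unknown", "unknown", "commit"])))
print(resolve_half_message(producer_answers([])))
```

The back-check lets the broker recover the outcome of the local transaction even when the producer's commit or rollback request was lost on the way.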
