In this issue, the editor walks through the message middleware Kafka and RocketMQ. The article is rich in content and analyzes the topic from a professional point of view; I hope you get something out of it after reading.
Application scenarios of message middleware
Asynchronous decoupling
Peak shaving and valley filling (smoothing traffic bursts)
Ordered sending and receiving
Distributed transaction consistency
Mainstream MQ frameworks and how they compare
Description
Kafka: widely used across the whole industry
RocketMQ: from Alibaba, incubated at Apache
Pulsar: open-sourced by Yahoo, a message queue that fits cloud-native architecture, with an active community
RabbitMQ: the architecture is relatively old, and AMQP is not adopted by the other mainstream MQs
NSQ: in-memory, not the best choice
ActiveMQ and ZeroMQ: can be ignored
Advantages of Kafka
Very mature, with a rich ecosystem, tightly integrated with Hadoop
Very high throughput and high availability; sharding speeds up replication
Core functionality: pub-sub, with good compression support
Can be configured for at-least-once or at-most-once delivery; exactly-once requires cooperation from the consumer (see the config sketch after this list)
Cluster deployment is simple, but the controller logic is complex (many partition replicas, data consistency)
The controller depends on ZooKeeper
Asynchronous disk flush (outside of payment-style businesses there is little need for synchronous flushing)
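A minimal sketch of how these delivery semantics map to producer settings with the standard Java client (the broker address and topic name are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DeliverySemanticsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // At-least-once: wait for all in-sync replicas and retry on transient failures.
            props.put("acks", "all");
            props.put("retries", "10");
            // At-most-once (roughly): acks=0 and retries=0, so nothing is re-sent.
            // Exactly-once additionally needs enable.idempotence=true plus transactions,
            // and consumers reading with isolation.level=read_committed.

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
            }
        }
    }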
Shortcomings of Kafka
Write-latency stability: Kafka partitions often sit on mechanical disks, and with many partitions the writes become random, so throughput drops and latency rises to 100 ms ~ 500 ms.
Operational complexity: after a single machine fails, replica data has to be re-copied quickly. One optimization: when a partition is migrated, do not move the old data; write new data into the new partition and switch over after a certain period of time.
RocketMQ
Alibaba adapted Kafka's design for online business scenarios such as e-commerce.
Trades some performance for extra functionality: querying messages by key requires maintaining a hash table, which costs I/O. To keep write latency stable with many shards, the broker writes the data of all shards into a single commitlog file and maintains the logical topic information in several index files, which leads to more random reads.
No central management node; in hindsight this matters little, since there is not much metadata.
High-precision delayed messages (Kuaishou also supports second-precision delayed messages).
Pulsar
Storage and compute are separated, which makes scaling out easier: storage is handled by BookKeeper, while the MQ logic runs in stateless brokers.
Development trends
Cloud native
Unified batch and stream processing: today, running a task means importing the data into Kafka first, which consumes a lot of resources; if the data already lives in HDFS, much of that can be saved.
Serverless
Development at each company
Kuaishou: uses Kafka for all scenarios, with a special form of read/write separation: data is consumed to HDFS in real time, and when a consumer reads with obvious lag the broker forwards its requests instead of serving them from the local disk, so lagging consumers do not cause noticeable random disk I/O. Because of the in-house modifications, it is hard to bring in new features from the community.
Alibaba: open source RocketMQ
ByteDance: online scenarios moved from NSQ → RocketMQ; for offline scenarios, Kafka → the self-developed BMQ with storage-compute separation (the protocol layer is directly compatible with Kafka, so users do not need to change their clients).
Baidu: self-developed BigPipe, not particularly good.
Meituan: rebuilt Kafka's architecture in Java, known internally as Mafka.
Tencent: parts of the company use the self-developed PhxQueue, which sits on top of a KV store.
Didi: DDMQ wraps RocketMQ and Kafka; data consistency across multiple data centers can be a problem.
Xiaomi: self-developed Talos, whose architecture is similar to Pulsar; storage is HDFS, and read scenarios are optimized.
Kafka
Kafka official website: https://kafka.apache.org/documentation/#uses
Latest version: 2.7
What is Kafka?
Open source message engine system (message queuing / message middleware)
Distributed stream processing platform
Publish / subscribe model
Peak shaving and valley filling
Kafka terminology
Topic: the subject that messages are published to and subscribed from
Producer: the client that publishes messages to a Topic
Consumer: the client that subscribes to and consumes messages
Consumer Group: a group of consumers working together (see the sketch after this list)
Broker: a Kafka service process
Replication: copies of the same data kept on multiple machines; the Leader Replica serves clients, while Follower Replicas do not interact with the outside world
Partition: partitions solve scalability; a Topic is made up of multiple Partitions
Segment: a Partition consists of multiple Segments
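A minimal sketch tying the terms together with the standard Java clients (the broker address, topic, and group name are placeholders): a Producer publishes to a Topic, and a Consumer inside a Consumer Group reads the message back from whichever Partition it landed on.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TerminologySketch {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092"); // Broker address (placeholder)
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                // The Producer publishes to the Topic; the key decides the Partition.
                producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "billing");                 // Consumer Group
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("auto.offset.reset", "earliest");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("orders"));
                // One poll for demonstration; real consumers poll in a loop.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }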
How does Kafka persist data?
Messages are held in a log (Log); append-only sequential disk writes avoid slow random I/O and give high throughput.
Messages are deleted periodically, one log segment at a time.
Kafka file storage mechanism
https://www.open-open.com/lib/view/open1421150566328.html
Each partition is equivalent to one giant file that is split into multiple equal-size segment data files.
Each partition only needs to be read and written sequentially; the life cycle of segment files is determined by configuration.
A segment consists of two files: an index file (.index) and a data file (.log).
Segment file naming rule: the partition's first segment starts at 0, and each subsequent segment file is named after the offset of the last message in the previous segment (illustrated in the sketch after this list).
A pair of segment files (.index and .log).
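A small illustration of the naming rule (a sketch based on the description above, not Kafka's source code): each segment's file name is a 20-digit, zero-padded offset.

    public class SegmentNameSketch {
        // A segment's .index/.log pair is named by a 20-digit, zero-padded offset:
        // the partition's first segment is 00000000000000000000.log, and a later
        // segment whose predecessor ended at offset 368769 becomes 00000000000000368769.log.
        static String segmentFileName(long offset, String suffix) {
            return String.format("%020d.%s", offset, suffix);
        }

        public static void main(String[] args) {
            System.out.println(segmentFileName(0L, "log"));      // first segment of the partition
            System.out.println(segmentFileName(0L, "index"));
            System.out.println(segmentFileName(368769L, "log")); // a later segment
            System.out.println(segmentFileName(368769L, "index"));
        }
    }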
Physical structure of message
Partition: why partition?
Kafka's message organization: topic → partition → message
A given message exists in only one partition.
Partitions improve scalability: different partitions can be placed on different machines, and reads and writes are performed at partition granularity.
Partitioning strategies?
Round robin
Random
By key, to preserve ordering: messages within a single partition are ordered (see the sketch after this list).
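A simplified sketch of the key-based strategy (an illustration only, not Kafka's exact default partitioner, which hashes the serialized key with murmur2): the same key always maps to the same partition, so per-key ordering holds inside that one partition.

    public class KeyPartitionSketch {
        // Simplified: hash the key and take it modulo the partition count.
        // Kafka's default partitioner uses murmur2 on the serialized key instead of hashCode().
        static int partitionFor(String key, int numPartitions) {
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }

        public static void main(String[] args) {
            int partitions = 3;
            for (String key : new String[] {"user-1", "user-2", "user-1"}) {
                // "user-1" always lands on the same partition, so its messages stay ordered.
                System.out.println(key + " -> partition " + partitionFor(key, partitions));
            }
        }
    }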
Will Kafka lose messages?
Kafka gives a limited persistence guarantee, and only for "committed" messages. Committed messages: messages that have been written to the log file. Limited persistence guarantee: at least one of the N brokers must survive.
Producer losing data: producer.send(msg) sends asynchronously and gives no guarantee that the data reaches Kafka; use producer.send(msg, callback) and check the result in the callback.
Consumer losing data: consume the message first, then update the offset, in that order. New problem: messages may be processed more than once. With multi-threaded asynchronous processing, the consumer should not enable automatic offset commits; the application commits offsets manually (see the sketch after this list).
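A minimal sketch of both fixes with the standard Java clients (topic and group names are placeholders): the producer checks the send callback, and the consumer turns off auto-commit and commits offsets only after processing.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class NoLossSketch {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("acks", "all");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                // send(msg, callback): the callback reports whether the broker accepted the message.
                producer.send(new ProducerRecord<>("demo-topic", "v"), (metadata, exception) -> {
                    if (exception != null) {
                        // Do not drop silently: log and retry, or route to a fallback store.
                        exception.printStackTrace();
                    }
                });
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "demo-group");
            c.put("enable.auto.commit", "false");        // no automatic offset commits
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("demo-topic"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    process(r);                          // consume first...
                }
                consumer.commitSync();                   // ...then commit the offsets
            }
        }

        static void process(ConsumerRecord<String, String> r) {
            System.out.println(r.value());
        }
    }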
Controller
Manage and coordinate the entire Kafka cluster with the help of ZooKeeper
During operation, only one Broker can be the controller.
How is the controller chosen?
The first Broker to successfully create the /controller node in ZooKeeper becomes the controller (sketched below).
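A sketch of the election mechanism only, using the plain ZooKeeper client (not Kafka's actual controller code; the connect string and broker id are placeholders): every broker tries to create the same ephemeral node, and the first one to succeed becomes the controller.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ControllerElectionSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connect string; every broker runs the same logic.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
            try {
                // EPHEMERAL: the node disappears automatically if this broker's session dies,
                // which is what lets a standby broker take over (controller failover).
                zk.create("/controller", "broker-1".getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                System.out.println("I am the controller");
            } catch (KeeperException.NodeExistsException e) {
                System.out.println("Another broker is already the controller");
            }
            zk.close();
        }
    }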
What does the controller do?
Topic management (create, delete, add partitions)
Partition reassignment
Leader election
Cluster membership management (new Brokers, active Broker shutdown, Broker failure) (ZooKeeper ephemeral nodes)
Data service: holds the most complete cluster metadata
Controller failover
Only one Broker acts as the controller, which makes it a single point of failure; a standby Broker takes over as controller immediately after a failure.
ZooKeeper storage structure of Kafka
Application scenarios of distributed transactions
Within a team: one operation updates multiple data sources at the same time.
Across teams: after business team A completes an operation, a certain operation of business B must also be completed, but A cannot directly access B's database.
Across companies: after the user pays, the payment system (Alipay / WeChat) must notify the merchant's system to update the order status.
Two-phase, eventually consistent approach:
First complete the transaction on data source A (phase one).
After it succeeds, a mechanism guarantees that the transaction on data source B (phase two) eventually completes: if it does not succeed, it is retried until it does, or retries stop after a certain number and fall back to reconciliation or manual handling.
How is eventual consistency guaranteed?
To achieve eventual consistency, the messaging system and the business program must together guarantee:
Consistent message delivery: when a message is sent, the phase-one transaction and the message delivery must either both succeed or both fail.
Message storage is not lost: after the message is sent successfully and before it is successfully consumed, the message server (broker) must store it reliably so that it is not lost even if a failure occurs.
Consumers do not lose messages: a message whose processing fails is not discarded; it is retried until it succeeds (see the sketch after this list).
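As an example of the consumer-side guarantee, a sketch with the RocketMQ push-consumer API (the topic, group, and name-server address are placeholders, and handle() is a hypothetical business method): returning RECONSUME_LATER makes the broker redeliver the message later instead of dropping it.

    import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
    import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
    import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
    import org.apache.rocketmq.common.message.MessageExt;

    public class RetryingConsumerSketch {
        public static void main(String[] args) throws Exception {
            DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("order-consumer-group");
            consumer.setNamesrvAddr("localhost:9876");    // placeholder name server
            consumer.subscribe("ORDER_TOPIC", "*");
            consumer.registerMessageListener((MessageListenerConcurrently) (msgs, context) -> {
                for (MessageExt msg : msgs) {
                    try {
                        handle(msg);                      // business processing
                    } catch (Exception e) {
                        // Do not discard the message: ask the broker to redeliver it later.
                        return ConsumeConcurrentlyStatus.RECONSUME_LATER;
                    }
                }
                return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
            });
            consumer.start();
        }

        static void handle(MessageExt msg) {
            System.out.println(new String(msg.getBody()));
        }
    }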
How is consistency of message delivery guaranteed?
Goal: the local transaction and the message delivery must succeed or fail together.
Problem
Executing the local transaction first and then sending the message: the send may fail.
The failed message can be kept in memory and retried later, but the success rate cannot reach 100%.
Solution
First send a half message (Half Msg, similar to a Prepare operation); it is not delivered to consumers.
After the half message is sent successfully, perform the DB operation.
After the DB operation succeeds, commit the half message (see the sketch below).
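A sketch of this flow with RocketMQ's transactional producer API (group, topic, and name-server address are placeholders; updateLocalDb() and localDbRecordExists() are hypothetical stand-ins for the real DB operations): the listener runs the local DB transaction after the half message is stored, and the check-back method answers the broker's query in step 6 of the exception handling described below.

    import org.apache.rocketmq.client.producer.LocalTransactionState;
    import org.apache.rocketmq.client.producer.TransactionListener;
    import org.apache.rocketmq.client.producer.TransactionMQProducer;
    import org.apache.rocketmq.common.message.Message;
    import org.apache.rocketmq.common.message.MessageExt;

    public class HalfMessageSketch {
        public static void main(String[] args) throws Exception {
            TransactionMQProducer producer = new TransactionMQProducer("order-tx-group");
            producer.setNamesrvAddr("localhost:9876");    // placeholder name server
            producer.setTransactionListener(new TransactionListener() {
                @Override
                public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                    // Called after the half message is stored: run the local DB transaction.
                    boolean dbOk = updateLocalDb();       // stand-in for the real DB operation
                    return dbOk ? LocalTransactionState.COMMIT_MESSAGE
                                : LocalTransactionState.ROLLBACK_MESSAGE;
                }

                @Override
                public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                    // The broker's check-back (step 6) when it never received commit/rollback:
                    // look up the DB to decide, instead of guessing.
                    return localDbRecordExists(msg.getTransactionId())
                            ? LocalTransactionState.COMMIT_MESSAGE
                            : LocalTransactionState.ROLLBACK_MESSAGE;
                }
            });
            producer.start();

            // Send the half message; it stays invisible to consumers until committed.
            Message half = new Message("ORDER_TOPIC", "order-1 created".getBytes());
            producer.sendMessageInTransaction(half, null);
            // Keep the producer alive so the broker's check-back can reach it.
        }

        static boolean updateLocalDb() { return true; }                   // hypothetical helper
        static boolean localDbRecordExists(String txId) { return true; }  // hypothetical helper
    }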
What happens if an exception occurs at some step?
Exception at step 1: the half message fails to be sent, the local DB operation is not executed, the whole operation fails, and the DB and message states stay consistent (neither is committed).
Exception or timeout at step 2: the producer thinks the send failed and does not execute the DB operation, but the broker has actually stored the half message; since it cannot wait forever for the follow-up, it asks the producer whether to commit or roll back (step 6).
DB operation fails at step 3: in step 4 the producer tells the broker to roll back the half message.
Committing or rolling back the half message fails at step 4: the broker cannot wait for this operation and triggers a check-back (step 6).
Check-back fails at steps 5, 6, or 7: RocketMQ retries the check-back up to 15 times.
The above is the editor's overview of the message middleware Kafka and RocketMQ. If you have similar questions, the analysis above may help clarify them; for more, you are welcome to follow the industry information channel.