This article explains Kafka's most primitive message model: where it came from and how it evolved step by step. I hope you read it carefully and come away with something!
01 Why start with Kafka?
The first article of this series revolved around the essence of MQ, "one send, one store, one consume", covered the general knowledge of MQ, and systematically answered the question: how do you start designing an MQ?

Starting with this article, I will dig into specific message middleware. There are three reasons I chose to start with Kafka:
First, RocketMQ and Kafka are the two most popular message middleware systems today and the most widely used at Internet companies, so they will be the focus of this series.

Second, looking at the history of MQ, Kafka was born before RocketMQ, and the Alibaba team drew heavily on Kafka's design ideas when implementing RocketMQ. Once you have mastered Kafka's design principles, understanding RocketMQ later becomes much easier.

Third, Kafka is actually a lightweight MQ: it has the most basic capabilities of an MQ but does not support advanced features such as delayed queues or retry mechanisms, which keeps its implementation simpler. Starting with Kafka helps everyone quickly grasp the core of MQ.
With the background explained, follow my train of thought as we analyze Kafka from the shallow end to the deep.
02 Lifting the veil on Kafka
Before analyzing a technology in depth, I don't recommend starting with its architecture and technical details. First figure out what it is and what problem it was created to solve.

Mastering this background knowledge helps us understand the design considerations and design ideas behind it.

While writing this article, I consulted a lot of material. Definitions of Kafka are all over the place, and it's easy to get confused without careful deliberation, so I think it's worth sorting them out here.
Let's first look at how Kafka's official website defines it:
Apache Kafka is an open-source distributed event streaming platform.
Isn't Kafka a messaging system? Why is it called a distributed event streaming platform? Are the two the same thing?

Some readers are bound to have these questions. To answer them, we need to start with the background of Kafka's birth.
Kafka began as a project incubated inside LinkedIn. It was designed as a "data pipeline" to handle the following two scenarios:
1. User activity scenarios: recording users' browsing, searching, clicking, and other behaviors.

2. Operations monitoring scenarios: monitoring servers' CPU, memory, request latency, and other performance metrics.
Both kinds of data fall into the log category. They are characterized by being produced in real time and in very large volumes.

LinkedIn initially tried to solve the data transfer problem with ActiveMQ, but its performance could not meet the requirements, so the team decided to develop Kafka itself.

So from the very beginning, Kafka was created for real-time log streams. With this background, it's not hard to understand Kafka's relationship to streaming data, or why Kafka is so widely used in the big data field: it was originally born to solve big data's pipeline problem.
That raises the next question: why does the official site define Kafka as a streaming platform? Doesn't it just provide a data channel? What does it have to do with being a platform?
This is because, starting with the 0.9 and 0.10 releases, Kafka began shipping components related to data processing, such as:
1. Kafka Streams: a lightweight stream processing library, similar in purpose to Spark Streaming and Flink.

2. Kafka Connect: a data integration tool that can move data between Kafka and external systems such as relational databases, Hadoop, and search engines.
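To make the Streams component concrete, here is a minimal sketch (my own illustration, not from the original article) of a Kafka Streams application that reads events from one topic, filters them, and writes the survivors to another topic. The topic names "page-views" and "page-searches", the application id, and the filter condition are all illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PageViewFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-filter"); // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read the raw event stream, keep only search events, write them out.
        KStream<String, String> views = builder.stream("page-views");
        views.filter((userId, event) -> event.startsWith("search:"))
             .to("page-searches");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start(); // runs until the JVM is stopped
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The design point worth noticing: the processing logic runs inside an ordinary Java application that talks to Kafka topics, with no separate compute cluster, which is what makes Kafka Streams "lightweight" compared to Spark or Flink.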
Clearly, Kafka's ambition was never to be just a messaging system. It has long been moving toward becoming a "real-time streaming platform".

With that, it is not hard to understand the three capabilities listed on Kafka's official website:
1. Publishing and subscribing to streams of data (message queue)

2. Distributed, durable storage of data (storage system)

3. Real-time processing of data (stream processing engine)
With that, Kafka's history and definition are basically clear. This series only focuses on the first two capabilities, because both are strongly related to MQ.
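As a small taste of the first capability, here is a minimal producer sketch (again my own illustration). The broker address, the topic name "user-activity", and the payload are assumptions; the point is that the producer publishes a record to a Topic and Kafka appends it to a persistent on-disk log, which is exactly where capabilities 1 and 2 meet:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ActivityProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record is appended to the topic's log and retained on disk,
            // which is what gives Kafka its storage-system character.
            producer.send(new ProducerRecord<>("user-activity", "user-42", "click:home"));
        }
    }
}
```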
03 Starting from Kafka's message model
Now that we understand Kafka's positioning and the background of its birth, let's analyze Kafka's design ideas.

As I mentioned in the previous article: to thoroughly understand an MQ, it is best to start from the core theoretical layer, the "message model", rather than looking at the technical architecture right away, let alone diving straight into technical details.

The so-called message model can be understood as a logical structure. It is a further abstraction above the technical architecture, and it often embodies the core design ideas.

Let's try to derive Kafka's message model and see how it evolved.
First, to distribute one message to multiple consumers so that each consumer receives the full set of messages, the natural first idea is broadcasting.

Then a problem arises: a message is broadcast to all consumers, but not every consumer wants every message. For example, consumer A only wants messages 1, 2, 3, and consumer B only wants messages 4, 5, 6. What do we do then?

The crux of this problem is that the MQ does not understand the semantics of messages, so it simply cannot classify and route them on its own.
At this point, MQ designers came up with a clever idea: push the problem back to the producer. The producer is required to classify messages logically when sending them, and this gave rise to the well-known Topic and the publish-subscribe model.

This way, consumers only need to subscribe to the Topics they are interested in and fetch messages from those Topics.
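A minimal consumer sketch (another illustration of mine) of that subscription step might look like this; the topic name "user-activity" and group id "analytics" are assumptions carried over from the producer sketch above:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ActivityConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "analytics"); // illustrative consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-activity")); // only the Topic we care about
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```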
But one question remains: what happens when multiple consumers are interested in the same Topic (say, a new consumer C alongside A and B)?

With the traditional queue model (unicast), once a consumer takes a message off the queue, the message is deleted, and no other consumer can get it.
At this point, a solution naturally comes to mind: every time a Topic gains a new consumer, "replicate" a complete copy of the data queue for it.

That solves the problem, but as the number of downstream consumers grows, MQ performance degrades rapidly. For Kafka in particular, which was born for big data scenarios, this kind of replication is obviously far too expensive.
This is where Kafka's masterstroke comes in: it persists all messages, and consumers fetch what they need on their own. Which messages to take, and when to take them, is entirely up to each consumer; all a consumer needs to supply is a message offset.

This fundamental change shifts the complexity of consumption onto the consumers, which greatly reduces the complexity of Kafka itself and lays the foundation for its high performance and high scalability. (This is the core way Kafka differs from ActiveMQ and RabbitMQ.)
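Here is a small sketch of what offset-driven consumption looks like in the Java client (an illustration under the same assumptions as before; partition 0 and offset 42 are arbitrary). Note that reading a message deletes nothing: any consumer can seek back to a retained offset at any time:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-demo"); // illustrative group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition tp = new TopicPartition("user-activity", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp)); // take the partition directly, no group rebalance
            consumer.seek(tp, 42L);       // jump to any retained offset we like
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```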
Finally, to put it all together: producers classify messages into Topics, Kafka persists every message in a log, and each consumer pulls whatever it needs by offset. That is Kafka's most primitive message model.
That concludes "What is Kafka's most primitive message model?". Thank you for reading!