Shulou (Shulou.com), SLTechnology News & Howtos, 06/03 Report (updated 2025-01-29)
This article summarizes the core ideas and underlying principles of Kafka. The approach is simple, fast, and practical; let's walk through it.
A summary of Kafka's core ideas
All messages are stored in an ordered log. Producers publish messages to the end of the log (an append operation), and consumers read sequentially starting from a logical offset.
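This log abstraction can be sketched in a few lines of Python (a toy model for illustration, not Kafka's actual storage code, which keeps records in segment files on disk):

```python
class Log:
    """Toy append-only log: producers append at the end,
    consumers read sequentially from a logical offset."""

    def __init__(self):
        self._records = []  # Kafka stores these as segment files on disk

    def append(self, message: bytes) -> int:
        """Producer side: append to the end, return the record's offset."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Consumer side: read sequentially starting at a logical offset."""
        return self._records[offset : offset + max_records]


log = Log()
for msg in (b"a", b"b", b"c"):
    log.append(msg)
print(log.read(1))  # → [b'b', b'c']
```

Note that records are never modified in place; consumers only advance an offset, which is what makes the optimizations below possible.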
[Scenario 1] Message middleware
When choosing message middleware, the main concerns are performance, message reliability, and ordering.
1. Performance
Kafka's high performance comes largely from low-level operating-system optimizations used in its implementation. Even programmers who mostly write business code benefit from understanding them.
[Optimization 1] Zero copy
This is an optimization on Kafka's consumer (read) path. The original article compared the two paths with a pair of diagrams; in text form:

Traditional way: disk → kernel page cache → user-space buffer → kernel socket buffer → network card. The data is copied repeatedly and crosses the user/kernel boundary on every read() and send().

Zero-copy way: disk → kernel page cache → network card.

The goal is to keep the data from ever passing through user space. Zero copy skips the copy into the user buffer: the kernel sends data from the page cache straight to the network interface via the file descriptor.
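On Linux this path is the sendfile(2) system call, which Python exposes as os.sendfile (Kafka's Java code achieves the same effect with FileChannel.transferTo). A minimal sketch, assuming a Unix-like OS:

```python
import os
import socket


def serve_file(path: str, sock: socket.socket) -> int:
    """Send a file over a socket without the data entering user space.

    os.sendfile wraps the sendfile(2) system call: the kernel moves
    pages from the page cache to the socket, skipping the user buffer.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        sent = 0
        while sent < size:  # sendfile may send fewer bytes than requested
            sent += os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
    return sent
```

The application never touches the bytes; it only orchestrates the transfer via file descriptors.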
[Optimization 2] Sequential writes to disk
Messages are written by appending to the log file, and written messages are never modified, so all disk writes are sequential. Disk I/O is usually considered slow, but that reputation comes from random reads and writes; sequential disk throughput is in fact close to that of random memory access (the original article included a benchmark chart comparing the two).
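You can get a feel for the difference with a rough micro-benchmark (a sketch only; absolute numbers depend heavily on the hardware, filesystem, and page cache):

```python
import os
import random
import time


def bench(path: str, n: int = 1000, size: int = 4096):
    """Time n sequential appends vs the same blocks rewritten
    at shuffled offsets. Returns (seq_seconds, rand_seconds)."""
    buf = os.urandom(size)
    # Sequential: append block after block, then flush to disk.
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    seq = time.perf_counter() - t0
    # Random: rewrite the same blocks in shuffled order.
    offsets = [i * size for i in range(n)]
    random.shuffle(offsets)
    t0 = time.perf_counter()
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off)
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    rnd = time.perf_counter() - t0
    return seq, rnd
```

On a spinning disk the gap is dramatic; on SSDs and with a warm page cache it narrows, but the append-only pattern also enables the zero-copy reads described above.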
[Optimization 3] Memory mapping
Summary: a region of a process's user-space address range is mapped onto the same physical pages the kernel uses for the file, so changes made in kernel space or user space to that region are directly visible to the other side.
Advantage: when large amounts of data move between kernel space and user space, this is very efficient.
Why it is more efficient: a traditional read() system call involves two copies of the data (disk into the page cache, then page cache into the user buffer), while an mmap() mapping needs only the first of those copies.
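In Python the same mechanism is available through the mmap module; a minimal sketch of reading a file through a mapping:

```python
import mmap


def mmap_read(path: str) -> bytes:
    """Read a file's contents through a memory mapping instead of read().

    The mapping shares the kernel's page-cache pages with the process,
    so accessing the bytes needs no extra kernel-to-user copy.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:]  # bytes come straight through the mapping
```

Slicing the mapping behaves like slicing a bytes object, but the pages are faulted in on demand by the kernel rather than copied up front.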
[Optimization 4] Batching and compression
Producer: accumulates messages into batches and compresses each batch before sending.
Consumer: actively pulls data, also fetching it in batches.
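The idea can be sketched with JSON plus gzip (purely illustrative; Kafka's real wire format is its own record-batch encoding with gzip, snappy, lz4, or zstd codecs):

```python
import gzip
import json


def encode_batch(messages: list) -> bytes:
    """Producer side: pack many messages into one compressed batch.
    Compressing whole batches amortizes per-message overhead and lets
    the codec exploit redundancy across similar messages."""
    return gzip.compress(json.dumps(messages).encode("utf-8"))


def decode_batch(batch: bytes) -> list:
    """Consumer side: pull one batch, decompress, fan out the messages."""
    return json.loads(gzip.decompress(batch).decode("utf-8"))


msgs = [{"key": "user1", "value": "event-%d" % i} for i in range(100)]
batch = encode_batch(msgs)
assert decode_batch(batch) == msgs
assert len(batch) < len(json.dumps(msgs))  # redundancy compresses away
```

Because consumers also fetch in batches, the compressed batch can travel end to end and be decompressed only once on the consumer side.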
2. Reliability
Kafka's replica mechanism is the core of its reliability guarantee.
I think of the replica mechanism as a Leader-Follower scheme: the same data exists in multiple copies on multiple servers, and the unit of replication is the partition. Such a strategy obviously has to solve several problems:
How do the replicas stay in sync?
ISR mechanism: the Leader dynamically maintains an In-Sync Replica (ISR) list.
When the Leader fails, how is a new Leader elected?
Answering this question brings in ZooKeeper, which is the foundation of Kafka's replica mechanism. Its internals deserve their own article; this one stays at the Kafka level. All we need to know here is that metadata about brokers, topics, and partitions is stored in ZooKeeper, and that when the Leader fails, a new Leader is elected from the ISR set.
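The election step can be caricatured in a few lines (a toy model; the real controller logic also handles leader epochs, fencing, and the unclean-election setting):

```python
from typing import List, Optional


def elect_leader(isr: List[str], failed: str) -> Optional[str]:
    """Promote a surviving ISR member when the current leader fails.

    A toy model: every ISR member has the latest committed data, so any
    survivor is a safe choice; here we simply take the first one.
    """
    survivors = [replica for replica in isr if replica != failed]
    return survivors[0] if survivors else None


assert elect_leader(["broker1", "broker2", "broker3"], failed="broker1") == "broker2"
```

If the ISR is empty, no safe leader exists; whether Kafka then promotes an out-of-sync replica (risking data loss) is a configurable trade-off.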
The producer's reliability level is configured with request.required.acks.
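The common settings, sketched as a producer config fragment (acks is the modern name for this setting; request.required.acks is the legacy equivalent, where -1 means "all"):

```properties
# producer.properties (names and values from the standard Kafka producer config)
# acks=0    fire-and-forget: fastest, messages may be lost silently
# acks=1    leader has written the message; lost if the leader dies before replication
# acks=all  leader waits for the whole ISR to acknowledge; strongest guarantee
acks=all
```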
(The original article summarized key points of the partition and replica mechanisms in a diagram here.)
3. Ordering
The ordering guarantee rests mainly on the partition mechanism plus offsets.
Before discussing partitions, let's define the related concepts and how they relate. My summary:
Broker: a single server in the cluster
Topic: a logical classification of messages, spanning brokers
Partition: the physical subdivision of a topic and the basic unit of storage
(The original article borrowed a diagram here to illustrate the relationship between these concepts.)
Why does the partitioning mechanism ensure the ordering of messages?
Kafka guarantees that messages within a partition are ordered and immutable.
Producer: a Kafka message is a key-value pair, and the key determines which partition of the topic the message is sent to.
By giving related messages the same key, you send them all to the same partition and so preserve their relative order.
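A sketch of the key-to-partition idea (the real default partitioner hashes the key with murmur2; zlib.crc32 stands in for it here):

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Sketch of the idea behind Kafka's default partitioner: hash the key
    (murmur2 in the real client; crc32 stands in here) modulo the
    partition count. Same key, same partition, so per-key ordering holds.
    """
    return zlib.crc32(key) % num_partitions


# All messages keyed "user-42" land in the same partition.
p = partition_for(b"user-42", 8)
assert all(partition_for(b"user-42", 8) == p for _ in range(5))
```

This also explains the limits of the guarantee: ordering holds within one partition, not across a whole topic, and changing the partition count remaps keys.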
Consumer: consumers record how far they have consumed by saving an offset. Before version 0.10 the offset was stored in ZooKeeper; since then it is stored in the internal __consumer_offsets topic.
[Scenario 2] Stream processing
Since version 0.10, Kafka has shipped a built-in stream-processing API, Kafka Streams, a stream-processing library built on Kafka that makes use of all the mechanisms above. Kafka has thus grown into a central streaming platform combining a messaging system, a storage system, and a stream-processing system.
Unlike Spark Streaming or Flink, which are full system architectures, Kafka Streams is just a library. It follows a deliberately simple design, which pays off in operations and maintenance, while retaining all the properties discussed above.
As for which scenarios suit each, experienced practitioners have already reached conclusions, which I will not try to restate:
Kafka Streams: suited to "Kafka → Kafka" scenarios
Spark Streaming: suited to "Kafka → database" or "Kafka → data science model" scenarios
At this point you should have a deeper understanding of Kafka's core ideas and underlying principles. The best next step is to try them out in practice.