How should one understand Kafka's design principles in depth? Many newcomers are unclear about this. To help answer the question, this article explains the design in detail; anyone interested can follow along, and hopefully you will gain something from it.
Kafka design principles
Having recently started studying Kafka, I would like to share its design principles. Kafka was originally intended to be a unified message collection platform that can collect feedback in real time, and it therefore needs to support very large volumes of data and provide good fault tolerance.
1. Persistence
Kafka uses files to store messages, which means that Kafka's performance depends heavily on the characteristics of the file system itself, and on any OS there is little room to optimize the file system beyond the usual techniques of file caching and direct memory mapping. Because Kafka only appends to its log files, the cost of disk seeks is small; at the same time, to reduce the number of disk writes, the broker temporarily buffers messages and flushes them to disk only when their count (or size) reaches a certain threshold, which reduces the number of disk I/O calls.
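As a concrete illustration: in recent Kafka versions, the two flush thresholds described above correspond to the broker settings log.flush.interval.messages and log.flush.interval.ms. A minimal sketch follows; the values are placeholders, and in practice these keys go in the broker's server.properties file rather than Java code:

```java
import java.util.Properties;

public class BrokerFlushConfig {
    public static void main(String[] args) {
        // Illustrative only: these keys normally live in server.properties.
        Properties brokerProps = new Properties();
        // fsync the log once this many messages have accumulated...
        brokerProps.put("log.flush.interval.messages", "10000");
        // ...or once this much time (ms) has passed since the last flush.
        brokerProps.put("log.flush.interval.ms", "1000");
        brokerProps.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```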
2. Performance
There are many performance points to consider. Besides disk I/O, network I/O also matters, and it directly affects Kafka's throughput. Kafka does not offer many exotic tricks here. On the producer side, messages can be buffered and sent to the broker in one batch once their number reaches a threshold; the consumer side works the same way, fetching multiple messages per batch, and the batch size can be specified through the configuration file. On the broker side, the sendfile system call can potentially improve network I/O performance: the file's data is mapped into system memory, and the socket reads the corresponding memory region directly, without an extra copy and exchange by the process. In fact, for producer, consumer, and broker alike the CPU cost should be small, so enabling message compression is a good strategy; compression consumes a small amount of CPU, but for Kafka network I/O is the greater concern. Any message transmitted over the network can be compressed, and Kafka supports gzip, snappy, and other compression codecs.
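A minimal producer sketch using the modern Java client, showing the batching and compression knobs discussed above (the broker address and sizes are placeholder assumptions):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class BatchedCompressedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", 16384);          // accumulate up to 16 KB per partition batch
        props.put("linger.ms", 5);               // wait up to 5 ms for a batch to fill
        props.put("compression.type", "snappy"); // compress whole batches on the wire
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                producer.send(new ProducerRecord<>("my_topic", Integer.toString(i), "message-" + i));
            }
        } // close() flushes any records still sitting in the batch buffer
    }
}
```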
3. Producer
Load balancing: the producer maintains a socket connection with every partition leader under a topic; messages are sent directly from the producer to the broker over this socket, without passing through any "routing layer". In fact, the producer client decides which partition a message is routed to, using a strategy such as "random", "key-hash", or "round-robin"; if a topic has multiple partitions, balanced message distribution has to be achieved on the producer side.
The location (host:port) of each partition leader is registered in ZooKeeper; the producer, acting as a ZooKeeper client, registers a watch to listen for partition leader change events.
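The routing strategies mentioned above ("random", "key-hash", "round-robin") are pluggable on the client. A sketch of a key-hash strategy against the modern Java client's Partitioner interface (the class name is our own; a real deployment would register it via the producer's partitioner.class setting):

```java
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import java.util.Arrays;
import java.util.Map;

// Hypothetical key-hash routing: the same key always lands on the same partition.
public class KeyHashPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // no key: fall back to a simple pseudo-random choice
            return (int) (Math.random() * numPartitions);
        }
        // mask the sign bit so the modulo result is always non-negative
        return (Arrays.hashCode(keyBytes) & Integer.MAX_VALUE) % numPartitions;
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}
```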
Asynchronous sending: multiple messages are temporarily buffered on the client and then sent to the broker in one batch. Too many small data I/Os would increase overall network latency, so batched delivery actually improves network efficiency. There is a hidden risk, however: if the producer fails, buffered messages that have not yet been sent are lost.
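That failure window can at least be observed on the client: send() returns immediately after buffering, and a callback later reports whether the batched record actually reached the broker. A sketch with the modern Java client:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class AsyncSendWithCallback {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() only appends to the client-side buffer; the callback fires
            // once the broker acknowledges the batch (or the send finally fails).
            producer.send(new ProducerRecord<>("my_topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            System.err.println("buffered send was lost: " + exception);
                        } else {
                            System.out.println("acked at offset " + metadata.offset());
                        }
                    });
            producer.flush(); // block until everything buffered has been sent
        }
    }
}
```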
4. Consumers
The consumer sends a "fetch" request to the broker, telling it the offset from which to consume; the consumer then receives a certain number of messages. The consumer can also reset the offset in order to re-consume messages.
In JMS implementations, the topic model is push-based: the broker pushes messages to the client. Kafka instead adopts a pull model: after establishing a connection with the broker, the consumer actively pulls (fetches) messages. This model has advantages. First, the client can fetch messages and process them in time according to its own consumption capacity, and it can control the progress of consumption (the offset); in addition, the consumer can control exactly how many messages it consumes by fetching in batches.
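A minimal pull-mode consumer loop with the modern Java client (the group name and broker address are assumptions); the client alone decides when to poll and how much to take:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PullModeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my_topic"));
            while (true) {
                // pull, not push: fetch a batch sized to our own consumption capacity
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
                }
            }
        }
    }
}
```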
In other JMS implementations, the position of message consumption is kept by the provider, in order to avoid duplicate sends, to resend messages that were not consumed successfully, and to track message state; all of this imposes a lot of extra work on the JMS broker. In Kafka, a partition is consumed by only one consumer in a group, and there is no message-state tracking or complex message acknowledgement mechanism, so the Kafka broker side stays quite lightweight. Once messages have been received, the consumer saves the offset of the last message locally and intermittently registers the offset with ZooKeeper; as a result, the consumer client is also very lightweight.
5. Message transmission mechanism
For JMS implementations, the message delivery guarantee is straightforward: exactly once. Kafka is slightly different:
1) at most once: the message is sent at most once. This resembles a "non-persistent" message in JMS: it is sent once and, whether it succeeds or fails, never resent.
2) at least once: the message is sent at least once. If it is not received successfully, it may be resent until it is received.
3) exactly once: the message is delivered exactly once.
At most once: the consumer fetches messages, saves the offset, and then processes the messages. If the client saves the offset but an exception occurs during processing, some messages remain unprocessed, and those "outstanding" messages will never be fetched again. This is "at most once".
At least once: the consumer fetches messages, processes them, and then saves the offset. If the messages are processed successfully but a ZooKeeper exception during the save-offset phase keeps the save from succeeding, the next fetch may return messages that were already processed, because the offset was not committed to ZooKeeper in time and ZooKeeper, once back to normal, still holds the previous offset state. This is "at least once".
Exactly once: Kafka has no strict implementation of this (it would require two-phase commit or transactions), and we do not consider this strategy necessary in Kafka.
Usually "at least once" is the one we choose: compared with "at most once", receiving data twice is better than losing it.
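The difference between the two practical guarantees is purely the ordering of "save offset" and "process". A sketch with the modern Java client, which stores offsets in an internal topic rather than ZooKeeper but follows exactly the logic described above (assumes enable.auto.commit=false and an already-subscribed consumer):

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;

public class DeliverySemantics {
    // at most once: save the offset first; a crash in process() loses the batch
    static void atMostOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
        consumer.commitSync();
        process(batch);
    }

    // at least once: process first; a crash before the commit redelivers the batch
    static void atLeastOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
        process(batch);
        consumer.commitSync();
    }

    static void process(ConsumerRecords<String, String> batch) {
        batch.forEach(r -> System.out.println(r.value()));
    }
}
```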
6. Replication
Kafka replicates each partition's data to multiple servers. Any partition has one leader and zero or more followers, and the number of replicas can be set through the broker configuration file. The leader handles all read and write requests; the followers must stay synchronized with the leader. A follower behaves like a consumer: it consumes messages and saves them in its local log. The leader tracks the state of all followers, and if a follower "lags" too far behind or fails, the leader removes it from the replica synchronization list. A message is considered "committed" only when all followers have saved it successfully, and only then may consumers consume it. Even if only a single replica instance survives, messages can still be sent and received normally, as long as the ZooKeeper cluster survives. (This differs from other distributed storage systems; HBase, for example, requires a "majority" to survive.)
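The replica count mentioned above is set per topic at creation time. A sketch using the modern AdminClient API (topic name and counts are placeholders):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        try (AdminClient admin = AdminClient.create(props)) {
            // 2 partitions, each replicated on 3 brokers: one leader + two followers
            NewTopic topic = new NewTopic("my_topic", 2, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```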
When a leader fails, a new leader must be chosen from among the followers. The followers may lag behind the leader, so an "up-to-date" follower should be chosen. One issue must be considered during the choice: how many partition leaders the new leader's server already carries. Too many partition leaders on one server mean that server bears more I/O pressure, so electing a new leader also has to take load balancing into account.
7. Log
If a topic is named "my_topic" and has two partitions, its log is saved in the two directories my_topic_0 and my_topic_1. The log file holds a sequence of "log entries"; each log entry's format is a 4-byte number N giving the length of the message, followed by the N bytes of message content, and each message is identified by a 64-bit offset that marks its starting position within the partition.
The list of segments held by each partition is stored in ZooKeeper.
When a segment file's size reaches a threshold (configurable through the configuration file; the default is 1 GB), a new file is created. When the number of buffered messages reaches a threshold, a flush of log data to the log file is triggered; a flush is also triggered if the "time since the last flush" reaches a threshold. If the broker fails, messages not yet flushed to the file are very likely lost. Because an unexpected server crash can still corrupt the log file format (the tail of the file), on startup the server has to check whether the structure of the last segment file is valid and make any necessary repairs.
When fetching a message, you specify the offset and a maximum chunk size. The offset marks the starting position of the message; the chunk size bounds the maximum total length of the messages (and thereby, indirectly, their number). From the offset, the segment file containing the message can be located; subtracting the segment's minimum offset then gives the message's relative position within the file, from which the data is read directly.
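In the modern Java client this fetch-by-offset interface surfaces as assign/seek, with max.partition.fetch.bytes playing the role of the chunk size. A sketch (partition, offset, and sizes are placeholders):

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class FetchFromOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("max.partition.fetch.bytes", 1048576); // the "chunk size": max bytes per partition per fetch
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my_topic", 0);
            consumer.assign(Collections.singletonList(tp)); // direct assignment, no group management
            consumer.seek(tp, 42L);                          // start reading at offset 42
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}
```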
The strategy for deleting log files is very simple: a background thread periodically scans the log file list and directly deletes files whose retention time exceeds the threshold (judged by the file's creation time). To avoid disturbing read operations (consumer consumption) while files are being deleted, a copy-on-write approach is used.
8. Distribution
Kafka uses ZooKeeper to store meta information and relies on the ZooKeeper watch mechanism to discover changes to that meta information and take the corresponding action (for example, a consumer failure triggers load balancing); a sketch of reading this registry appears at the end of this section.
1) Broker node registry: when a Kafka broker starts, it first registers its own node information with ZooKeeper as an ephemeral znode; when the broker disconnects from ZooKeeper, the znode is deleted.
Format: /brokers/ids/[0..N] -> host:port, where [0..N] is the broker id. Each broker's configuration file must specify a numeric id (globally unique), and the znode's value holds the broker's host:port information.
2) Broker Topic Registry: when a broker starts, it registers its topic and partition information with ZooKeeper, again as an ephemeral znode.
Format: /brokers/topics/[topic]/[0..N], where [0..N] is the partition index.
3) Consumer and Consumer group: when each consumer client is created, it registers its own information with ZooKeeper; this mechanism mainly exists for "load balancing".
Multiple consumers in a group can consume all of a topic's partitions in a staggered fashion. In short, all partitions of the topic are guaranteed to be consumed by the group, and for performance the partitions are distributed relatively evenly across the consumers.
4) Consumer id Registry: each consumer has a unique id (host:uuid, either specified in the configuration file or generated by the system), which is used to identify consumer information.
Format: /consumers/[group_id]/ids/[consumer_id]
This is still an ephemeral znode; its value is of the form {"topic_name": #streams ...}, representing the topics (and partition streams) currently consumed by this consumer.
5) Consumer offset Tracking: used to track the largest offset each consumer has currently consumed in each partition.
Format: /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] -> offset_value
This znode is a persistent node. Note that the offset is keyed by group_id, so that when one consumer in the group fails, another consumer can continue consuming from where it left off.
6) Partition Owner registry: used to mark which consumer a partition is being consumed by. Ephemeral znode.
Format: /consumers/[group_id]/owners/[topic]/[broker_id-partition_id] -> consumer_node_id
Actions triggered when a consumer starts:
A) first perform the "Consumer id Registry";
B) then register a watch under the "Consumer id Registry" node to listen for "leave" and "join" events of other consumers in the current group. Any change in the node list under this znode path triggers load balancing of the consumers in the group (for example, if one consumer fails, the other consumers take over its partitions);
C) register a watch under the "Broker id registry" node to monitor broker liveness; if the broker list changes, a rebalance of the consumers in all groups is triggered.
To summarize:
1) The producer uses ZooKeeper to "discover" the broker list, establishes a socket connection with each partition leader under the topic, and sends messages.
2) The broker uses ZooKeeper to register broker information and to monitor the liveness of partition leaders.
3) The consumer uses ZooKeeper to register consumer information (including the list of partitions it consumes), to discover the broker list, to establish socket connections with partition leaders, and to fetch messages.
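The registry layout above (which belongs to the older, ZooKeeper-based Kafka design) can be inspected directly with the ZooKeeper Java client. A minimal sketch that lists the live brokers, assuming ZooKeeper at localhost:2181:

```java
import org.apache.zookeeper.ZooKeeper;
import java.util.List;

public class BrokerRegistryReader {
    public static void main(String[] args) throws Exception {
        // connect to ZooKeeper (assumed local); 10 s session timeout, no-op watcher
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> {});
        // each live broker owns an ephemeral child znode under /brokers/ids
        List<String> brokerIds = zk.getChildren("/brokers/ids", false);
        for (String id : brokerIds) {
            byte[] data = zk.getData("/brokers/ids/" + id, false, null);
            System.out.println("broker " + id + " -> " + new String(data));
        }
        zk.close();
    }
}
```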