What Are the Basic Concepts of Kafka?

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article summarizes the basic concepts of Kafka: its design goals, cluster architecture, terminology, delivery semantics, and how topics and partitions are stored on disk.

Kafka, originally developed by LinkedIn, is a distributed, partitioned, multi-replica, multi-subscriber log system coordinated by ZooKeeper. It can also be used as a message queue (MQ) system, and it is commonly used for web/nginx logs, access logs, message services, and so on. LinkedIn open-sourced Kafka in early 2011 and contributed it to the Apache Software Foundation, where it became a top-level open source project in 2012.

The main application scenarios are log collection systems and messaging systems.

The main design objectives of Kafka are as follows:

Message persistence with O(1) time complexity, so that access time stays constant even for data sets of a terabyte or more.

High throughput. Even on very cheap commodity machines, a single machine can transmit 100K messages per second.

Message partitioning across Kafka servers and distributed consumption, while guaranteeing that messages within each partition are delivered in order.

Both offline data processing and real-time data processing are supported.
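The partitioning goal above can be illustrated with a small sketch. It assumes keyed messages and uses `zlib.crc32` as a stand-in hash (Kafka's own partitioner uses a different hash function); `choose_partition`, `route`, and `NUM_PARTITIONS` are hypothetical names chosen for this example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (stand-in hash, not Kafka's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages):
    """Append (key, value) pairs to per-partition lists in arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key)].append((key, value))
    return partitions


events = [("user-1", "login"), ("user-2", "login"),
          ("user-1", "click"), ("user-1", "logout")]
routed = route(events)

# All events for one key land in the same partition, in the order sent.
user1 = [v for k, v in routed[choose_partition("user-1")] if k == "user-1"]
print(user1)  # ['login', 'click', 'logout']
```

Ordering is only guaranteed inside a partition, which is why messages that must stay in order are given the same key in this sketch.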

A typical Kafka cluster contains several producers, several brokers, several consumers, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage the cluster configuration, elect leaders, and rebalance when a consumer group changes. Producers publish messages to brokers using the push pattern; consumers subscribe to and consume messages from brokers using the pull pattern.
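The push and pull roles can be sketched with a toy in-memory broker. This is not the Kafka client API; `Broker`, `append`, and `fetch` are hypothetical names, and the sketch ignores networking, partitions, and replication.

```python
class Broker:
    """Toy in-memory broker holding one log (a list) per topic."""

    def __init__(self):
        self.logs = {}

    def append(self, topic, message):
        """Called by producers: the producer pushes data to the broker."""
        log = self.logs.setdefault(topic, [])
        log.append(message)
        return len(log) - 1  # position of the new message

    def fetch(self, topic, position, max_messages=10):
        """Called by consumers: the consumer asks for data when it is ready."""
        return self.logs.get(topic, [])[position:position + max_messages]


broker = Broker()
for line in ["GET /", "GET /about", "POST /login"]:
    broker.append("access-log", line)  # push

position = 0
batch = broker.fetch("access-log", position, max_messages=2)  # pull
position += len(batch)
rest = broker.fetch("access-log", position)
print(batch, rest)
```

Because the consumer decides when and how much to fetch, a slow consumer simply falls behind instead of being flooded by the broker.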

Kafka terminology:

Broker: a message middleware processing node. One Kafka node is one broker, and multiple brokers form a Kafka cluster.

Topic: a category of messages. A Kafka cluster can handle the distribution of multiple topics at the same time.

Partition: a physical grouping within a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue.

Segment: a partition is physically composed of multiple segments.

Offset: each partition consists of a sequence of ordered, immutable messages that are continuously appended to it. Each message in a partition has a consecutive sequence number called the offset, which uniquely identifies the message within that partition.

Producer: responsible for publishing messages to Kafka brokers.

Consumer: the message consumer, that is, the client that reads messages from Kafka brokers.

Consumer Group: each consumer belongs to a specific consumer group.

At most once: a message is delivered at most once, similar to a "non-persistent" message in JMS. Once sent, it is never retransmitted, whether or not delivery succeeded. On the consumer side, the consumer fetches a message, saves the offset, and then processes the message. If the client saves the offset but an exception occurs during processing, the unprocessed messages will never be fetched again. This is "at most once".

At least once: a message is delivered at least once; if it is not received successfully, it may be resent until it is. On the consumer side, the consumer fetches a message, processes it, and then saves the offset. If processing succeeds but a ZooKeeper exception prevents the offset from being saved, the next fetch may return messages that were already processed. This is "at least once". It happens because the offset was not committed to ZooKeeper in time, so when ZooKeeper returns to normal it still holds the previous offset.

Exactly once: each message is delivered once and only once. Early Kafka versions had no strict implementation of this (it would require something like a two-phase commit), and we do not think this strategy is necessary for most Kafka use cases. Since version 0.11, Kafka provides idempotent producers and transactions that support exactly-once semantics.

Usually "at-least-once" is our first choice.
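The difference between the first two semantics comes down to whether the offset is saved before or after processing. The following is a minimal simulation under that assumption; `consume`, `run_with_restart`, and the `fail_at` crash point are hypothetical, and a plain integer stands in for the stored offset.

```python
def consume(messages, committed, commit_first, fail_at=None):
    """Process messages starting at `committed`; return (processed, committed).

    commit_first=True  -> save offset, then process (at most once)
    commit_first=False -> process, then save offset (at least once)
    fail_at is the offset at which a simulated crash happens.
    """
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:
            committed = offset + 1
            if offset == fail_at:
                return processed, committed  # crash before processing
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == fail_at:
                return processed, committed  # crash before saving the offset
            committed = offset + 1
        offset += 1
    return processed, committed


def run_with_restart(messages, commit_first, fail_at):
    """Run once with a crash, then restart from the saved offset."""
    first, committed = consume(messages, 0, commit_first, fail_at)
    second, _ = consume(messages, committed, commit_first)
    return first + second


msgs = ["m0", "m1", "m2"]
at_most_once = run_with_restart(msgs, commit_first=True, fail_at=1)
at_least_once = run_with_restart(msgs, commit_first=False, fail_at=1)
print(at_most_once)   # ['m0', 'm2'] -> m1 is lost
print(at_least_once)  # ['m0', 'm1', 'm1', 'm2'] -> m1 is processed twice
```

With at-least-once, duplicates are possible, so processing in this style is usually written to tolerate seeing the same message again.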

Topic & Partition

A topic can be thought of as a category of messages. Each topic is divided into multiple partitions, and at the storage level each partition is an append-only log file.
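As a rough illustration of a partition as an append-only log, the sketch below models a topic as a list of independent logs. `PartitionLog` is a hypothetical class that keeps messages in memory rather than on disk.

```python
class PartitionLog:
    """Toy append-only log: messages are only appended, never modified."""

    def __init__(self):
        self._messages = []

    def append(self, message: str) -> int:
        """Append a message and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10):
        """Return (offset, message) pairs starting at `offset`."""
        end = min(offset + max_messages, len(self._messages))
        return [(o, self._messages[o]) for o in range(offset, end)]


# A topic is modelled as a list of independent partition logs.
topic = [PartitionLog() for _ in range(2)]
first = topic[0].append("a")
second = topic[0].append("b")
other = topic[1].append("c")  # offsets start at 0 in each partition
print(first, second, other)   # 0 1 0
print(topic[0].read(1))       # [(1, 'b')]
```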

In Kafka file storage, a topic has several partitions, and each partition is a directory. The partition directory is named with the topic name plus an ordinal number: the first partition is numbered 0, and the largest number is the partition count minus 1.
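The naming rule can be written out in a few lines. The sketch assumes the topic name and the ordinal are joined with a hyphen; `partition_dirs` and the topic name `report_push` are made up for illustration.

```python
def partition_dirs(topic: str, num_partitions: int):
    """List the directory names for every partition of a topic."""
    # Assumption: "<topic>-<ordinal>", with ordinals 0 .. num_partitions - 1.
    return [f"{topic}-{i}" for i in range(num_partitions)]


dirs = partition_dirs("report_push", 4)
print(dirs)  # ['report_push-0', 'report_push-1', 'report_push-2', 'report_push-3']
```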

Each partition (directory) is equivalent to one giant file that is split evenly across multiple segment data files of equal size. The number of messages in each segment file is not necessarily equal, however. This layout makes it easy to delete old segment files quickly.

Each partition only needs to support sequential reads and writes, and the life cycle of segment files is determined by server configuration parameters.

The advantage of this is that useless files can be deleted quickly and disk utilization can be effectively improved.
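A time-based cleanup of whole segment files might look like the sketch below. It assumes a retention period in seconds (in a real broker this would come from a setting such as `log.retention.hours`) and that the newest segment is still being written and is therefore kept; `Segment` and `expire_segments` are hypothetical names, and no files are touched.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    last_modified: float  # seconds since some reference time


def expire_segments(segments, now, retention_seconds):
    """Split segments into (kept, deleted) by age; the newest is always kept."""
    if not segments:
        return [], []
    *closed, active = segments
    kept, deleted = [], []
    for seg in closed:
        if now - seg.last_modified > retention_seconds:
            deleted.append(seg)  # whole file can be dropped at once
        else:
            kept.append(seg)
    kept.append(active)
    return kept, deleted


hour = 3600.0
segments = [Segment(0, 0 * hour), Segment(1000, 2 * hour),
            Segment(2000, 8 * hour), Segment(3000, 10 * hour)]
kept, deleted = expire_segments(segments, now=10 * hour, retention_seconds=5 * hour)
print([s.base_offset for s in deleted])  # [0, 1000]
print([s.base_offset for s in kept])     # [2000, 3000]
```

Because deletion works on whole segment files, no data has to be rewritten or compacted to free disk space in this model.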

A segment file consists of two parts, an index file and a data file, which correspond one to one and always appear in pairs. The suffixes ".index" and ".log" denote the segment index file and the data file, respectively.

Segment file naming convention: the first segment of a partition starts at 0, and each subsequent segment file is named after its base offset, that is, the offset of the first message it contains (one more than the offset of the last message in the previous segment file). The value is a 64-bit long, written as a fixed-width decimal string of 20 digits and padded with leading zeros.
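One practical consequence of offset-based names is that the segment holding a given offset can be found with a binary search over the sorted file names. The sketch below assumes each name is the zero-padded starting offset of its segment, with a default width of 20 digits (adjustable through the `width` parameter); `segment_file_names`, `find_segment`, and the sample offsets are hypothetical.

```python
import bisect


def segment_file_names(base_offset: int, width: int = 20):
    """Return the (.index, .log) file names for one segment."""
    stem = str(base_offset).zfill(width)  # pad with leading zeros
    return f"{stem}.index", f"{stem}.log"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that contains target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is before the first segment")
    return base_offsets[i]


base_offsets = [0, 368769, 737337]  # made-up starting offsets
hit = find_segment(base_offsets, 368776)
index_name, log_name = segment_file_names(hit)
print(hit)  # 368769
print(index_name, log_name)
```

Once the segment is known, its ".index" file would be used to narrow down the position inside the matching ".log" file.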

That concludes this overview of the basic concepts of Kafka. We hope it serves as a useful reference.




© 2024 shulou.com SLNews company. All rights reserved.
