2025-01-17 Update — From: SLTechnology News & Howtos (shulou.com), 06/03 report
An introduction to several important Kafka concepts (you can also refer to the brief introduction in the previous blog post on Kafka):

Broker: a message-middleware processing node. One Kafka node is one broker, and multiple brokers form a Kafka cluster.
Topic: a category of messages. Page-view logs, click logs, and so on can each be stored as a topic, and a Kafka cluster can handle the distribution of many topics at the same time.
Partition: a physical grouping of a topic. A topic can be divided into multiple partitions, each of which is an ordered queue.
Segment: each partition is composed of multiple segment files.
Offset: each partition consists of a series of ordered, immutable messages that are continuously appended to it. Every message in a partition is assigned a sequential number called the offset, which uniquely identifies that message within the partition.
Message: the smallest unit of storage in a Kafka log file; a partition is effectively a commit log of messages.

To see how much has been produced and consumed in Kafka, and how many messages remain, we use KafkaOffsetMonitor as a monitoring plug-in. Its columns are:

topic: the topic name
partition: the partition number
offset: how many messages the consumer has consumed in this partition
logSize: how many messages have been produced into this partition
lag: how many messages have not yet been consumed (logSize - offset)
owner: the consumer that owns the partition
created: the partition entry's creation time
last seen: when the consumption status was last refreshed
Configuration and use of Kafka monitoring tool KafkaOffsetMonitor: https://www.cnblogs.com/dadonggg/p/8242682.html
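The relationship between the offset, logSize, and lag columns above reduces to a one-line calculation. The sketch below is purely illustrative (it is not KafkaOffsetMonitor's actual code):

```python
def consumer_lag(log_size: int, consumer_offset: int) -> int:
    """Lag = messages produced (logSize) minus messages consumed (offset)."""
    return log_size - consumer_offset

# Example: 1500 messages produced, consumer has committed offset 1200
print(consumer_lag(1500, 1200))  # 300 messages not yet consumed
```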
What is a topic? What is a partition?

A topic is the basic unit of data storage in Kafka.

To write data, you specify which topic to write to; to read data, you specify which topic to read from.

Put simply, a topic is similar to a table in a database. You can create any number of topics, and each topic has a unique name.
For example:

Program A generates a class of messages and puts them into the Kafka cluster; the messages produced by Program A form one topic.

Program B subscribes to these messages, and thereby becomes a consumer of that topic.
Each topic contains one or more partitions.

The data you write actually goes into one of the topic's partitions, and within that partition it is written in order.

Each partition maintains an ever-growing ID: every time new data is written, the ID increases. This ID is called the offset, and each message written to a partition corresponds to one offset.

Each partition keeps its own offsets. An offset can be used to order messages within a single partition, but comparing offsets from two different partitions is meaningless.

Data within a partition is ordered; across different partitions that ordering is lost. If a topic has multiple partitions, the order in which its data is consumed cannot be guaranteed. In scenarios where strict message ordering is required, the number of partitions must be set to 1.
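A minimal in-memory sketch (hypothetical names, not Kafka's API) illustrates why offsets only order messages within one partition:

```python
class Partition:
    """An append-only, ordered message queue; the offset is just the append index."""
    def __init__(self):
        self.messages = []

    def append(self, message):
        offset = len(self.messages)   # offsets grow by one per write
        self.messages.append(message)
        return offset

topic = [Partition(), Partition()]   # a topic with two partitions

# Each partition maintains its own independent offset sequence.
o0 = topic[0].append("page-view-1")  # offset 0 in partition 0
o1 = topic[1].append("click-1")      # offset 0 in partition 1 as well
o2 = topic[0].append("page-view-2")  # offset 1 in partition 0

# Offsets from different partitions cannot be compared: both first messages
# have offset 0, yet that says nothing about their global order.
print(o0, o1, o2)  # 0 0 1
```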
Each topic is divided into multiple partitions. In addition, Kafka can be configured with the number of replicas (backups) each partition requires.

Under a replicated scheme, each partition has one server acting as its "leader". The leader handles all reads and writes for that partition; if it fails, one of the followers takes over and becomes the new leader. Followers simply follow the leader and synchronize its messages. The leader therefore carries all of the request load for its partition, so across the cluster as a whole there are as many leaders as there are partitions. Kafka distributes the leaders evenly over the broker instances to keep overall performance stable.

The location (host:port) of each partition leader is registered in ZooKeeper.
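The even spread of leaders described above can be sketched as a simple round-robin assignment. This illustrates the balancing idea only; it is not Kafka's actual controller logic:

```python
def assign_leaders(num_partitions: int, brokers: list) -> dict:
    """Spread partition leaderships evenly across brokers, round-robin."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}

leaders = assign_leaders(6, ["broker-0", "broker-1", "broker-2"])
# Each broker leads 2 of the 6 partitions, balancing read/write load.
print(leaders)
```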
When you write data to Kafka, it is retained for 7 days (168 hours) by default; of course, this is configurable. Once the retention period passes, the data in Kafka expires, and the offsets that pointed to it no longer refer to anything.
Will data be deleted automatically after it is read from Kafka?

No. Deletion of data in Kafka has nothing to do with whether it has been consumed. Deletion is governed only by retention settings on the Kafka broker, such as these two:

log.retention.hours=48 # keep data for at most 48 hours
log.retention.bytes=1073741824 # keep at most 1 GB of data per partition
Tip: data written to Kafka cannot be changed. One of Kafka's key characteristics is immutability; in other words, there is no way to modify a message once it has been written.

If you want to update a message, you can only write a new message to Kafka; the new message gets a new offset that distinguishes it from the one written earlier.
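Because the log is immutable, an "update" is just a new append under the same key, and readers keep the entry with the highest offset. A minimal sketch of that idea (hypothetical names; this is also the principle Kafka's log compaction builds on):

```python
log = []  # append-only commit log of (offset, key, value) records

def produce(key, value):
    """Appending is the only write operation; existing records never change."""
    log.append((len(log), key, value))

produce("user-42", {"name": "Alice"})
produce("user-7",  {"name": "Bob"})
produce("user-42", {"name": "Alice Smith"})  # "update" = new message, new offset

# The current value per key is simply the one with the highest offset.
latest = {}
for offset, key, value in log:
    latest[key] = value
print(latest["user-42"])  # {'name': 'Alice Smith'}
```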
Each piece of data written to Kafka is assigned to one of the topic's partitions, effectively at random, with one exception: if you provide a key with the data, that key controls which partition the data is routed to.

Each topic can have multiple partitions; how many is up to you.
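Key-based routing can be sketched as hashing the key modulo the partition count. Kafka's real default partitioner uses a murmur2 hash; CRC32 below is just a stand-in to keep the example deterministic and self-contained:

```python
import zlib
from typing import Optional

def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """With a key, hash deterministically; without one, any partition may be used."""
    if key is None:
        return 0  # real clients pick randomly / round-robin in this case
    return zlib.crc32(key) % num_partitions

# The same key always lands in the same partition,
# so all messages for one key stay in order relative to each other.
assert choose_partition(b"user-42", 3) == choose_partition(b"user-42", 3)
```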