How to understand the directory layout of messages in Kafka stored on disk

Updated: 2025-01-19. From: SLTechnology News&Howtos

This article explains the directory layout Kafka uses to store messages on disk: how topics, partitions, logs, and log segments map to folders and files.

Messages in Kafka are organized by topic, and topics are logically independent of one another. Each topic is divided into one or more partitions; the number of partitions can be specified when the topic is created or changed later. Each message is appended to a partition chosen by the partitioning rules, and each message within a partition is assigned a unique sequence number, known as its offset. The logical structure of a topic with four partitions is shown in the figure below.
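The append-and-assign-offset behaviour described above can be sketched with a toy in-memory model. This is only an illustration: the class name ToyTopic and the CRC32-modulo partition rule are assumptions made for the example, not Kafka's actual partitioner or storage code.

```python
import zlib


class ToyTopic:
    """Toy in-memory model of a topic; not Kafka's real implementation."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition; the list index plays the role of the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Hypothetical partition rule: CRC32 of the key modulo the partition count.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        offset = len(log)  # offsets are unique only within this partition
        log.append((key, value))
        return partition, offset


topic = ToyTopic("topic-log", 4)
first = topic.append("user-1", "hello")
second = topic.append("user-1", "world")
# Both messages share a key, so they land in the same partition
# and receive consecutive offsets 0 and 1.
```

Because offsets are assigned per partition, two messages in different partitions can carry the same offset; ordering is only defined within a single partition.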

If the partitioning rules are set properly, messages are distributed evenly across the partitions, which allows horizontal scaling. Leaving replicas aside, a partition corresponds to one log (Log). To keep the Log from growing too large, Kafka introduces the concept of a log segment (LogSegment): the Log is split into multiple LogSegments, so that one huge file becomes several relatively small files, which also makes messages easier to maintain and clean up.

In fact, Log and LogSegment are not purely physical concepts. Physically, a Log is stored only as a folder, while each LogSegment corresponds to one log file and two index files on disk, plus possibly other files (such as a transaction index file with the suffix ".txnindex"). The following figure depicts the relationship between topics, partitions, replicas, Log, and LogSegment.

[Figure: relationship between topics, partitions, replicas, Log, and LogSegment]

Anyone who has worked with Kafka will know that a Log corresponds to a folder named <topic>-<partition>. For example, a topic called "topic-log" with four partitions is physically stored as the folders "topic-log-0", "topic-log-1", "topic-log-2", and "topic-log-3".
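The naming rule in the topic-log example, topic name plus partition number, is simple enough to write down directly. The helper name partition_dir_names is hypothetical and exists only for this illustration.

```python
def partition_dir_names(topic, num_partitions):
    """Return the folder name of each partition's Log, numbered from 0."""
    return [f"{topic}-{partition}" for partition in range(num_partitions)]


names = partition_dir_names("topic-log", 4)
print(names)
# ['topic-log-0', 'topic-log-1', 'topic-log-2', 'topic-log-3']
```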

Messages are appended to a Log sequentially, and only the last LogSegment can be written to; no earlier LogSegment accepts writes. For ease of description, we call the last LogSegment the "activeSegment", meaning the currently active log segment. As messages continue to be written, once the activeSegment meets certain conditions a new activeSegment is created, and subsequent messages are appended to it.
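A minimal sketch of this append-and-roll behaviour follows. The article only says the activeSegment is replaced when it "meets certain conditions"; the condition used here, a maximum message count per segment, is an assumption chosen to keep the example short (a real broker decides based on its own settings, for instance how large or how old the segment is). ToyLog is a hypothetical name.

```python
class ToyLog:
    """Toy model of one partition's Log split into segments."""

    def __init__(self, max_messages_per_segment):
        # Assumed roll condition: a fixed number of messages per segment.
        self.max_messages = max_messages_per_segment
        # Each segment is a (base_offset, messages) pair; the last one is active.
        self.segments = [(0, [])]
        self.next_offset = 0

    @property
    def active_segment(self):
        return self.segments[-1]

    def append(self, message):
        _, messages = self.active_segment
        if len(messages) >= self.max_messages:
            # Roll: the new segment's base offset is the offset of its first message.
            messages = []
            self.segments.append((self.next_offset, messages))
        messages.append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset


log = ToyLog(max_messages_per_segment=3)
for i in range(7):
    log.append(f"msg-{i}")
base_offsets = [base for base, _ in log.segments]
# Seven messages with three per segment give base offsets 0, 3 and 6.
```

Only the last entry of log.segments is ever mutated, which mirrors the rule that earlier segments are read-only.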

To make message retrieval easier, the log file of each LogSegment (file suffix ".log") has two corresponding index files: an offset index file (file suffix ".index") and a timestamp index file (file suffix ".timeindex"). Each LogSegment has a base offset, baseOffset, which is the offset of the first message in that LogSegment. The offset is a 64-bit long integer. The log file and both index files are named after the base offset: the name is fixed at 20 digits, left-padded with zeros. For example, the base offset of the first LogSegment is 0, so the corresponding log file is 00000000000000000000.log.
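The naming rule can be expressed as a zero-padded format string. The sketch below is an illustration of the rule as stated above; the function names are hypothetical.

```python
SEGMENT_SUFFIXES = (".log", ".index", ".timeindex")


def segment_file_names(base_offset):
    """Map each suffix to the file name derived from a segment's base offset."""
    if not 0 <= base_offset < 2**63:
        raise ValueError("base offset must fit in a non-negative 64-bit long")
    prefix = f"{base_offset:020d}"  # fixed 20 digits, left-padded with zeros
    return {suffix: prefix + suffix for suffix in SEGMENT_SUFFIXES}


def base_offset_of(file_name):
    """Recover the base offset from a segment file name."""
    return int(file_name.split(".", 1)[0])


first_segment = segment_file_names(0)
print(first_segment[".log"])
# 00000000000000000000.log
```

Twenty digits are enough because the largest 64-bit signed value has 19 decimal digits.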

For example, after a certain number of messages have been sent to the topic topic-log, the layout of the topic-log-0 directory at some point is as follows.

In the example, the base offset of the second LogSegment is 133, which means that the offset of the first message in that LogSegment is 133 and that the first LogSegment holds 133 messages in total (offsets 0 to 132).
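The same arithmetic can be sketched in code: given the sorted base offsets of a partition's segments, a binary search finds the segment that would hold a given offset, and the difference between consecutive base offsets gives the size of each closed segment. The base offsets 0 and 133 come from the example above; the function names are hypothetical, and the size calculation assumes offsets within a segment are contiguous.

```python
import bisect


def find_segment(base_offsets, offset):
    """Return the base offset of the segment that would contain `offset`.

    `base_offsets` must be sorted in ascending order.
    """
    index = bisect.bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return base_offsets[index]


def closed_segment_sizes(base_offsets):
    """Message count of every segment except the last (active) one.

    Assumes no gaps between consecutive offsets.
    """
    return {
        base: next_base - base
        for base, next_base in zip(base_offsets, base_offsets[1:])
    }


base_offsets = [0, 133]
sizes = closed_segment_sizes(base_offsets)
# Offset 132 is the last message of the first segment; 133 opens the second.
```

Note that find_segment only picks the candidate segment; it cannot tell whether the offset lies beyond the end of the log.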

Note that a LogSegment can contain more than the ".log", ".index", and ".timeindex" files: there may also be temporary files with suffixes such as ".deleted", ".cleaned", and ".swap", as well as files such as ".snapshot", ".txnindex", and "leader-epoch-checkpoint".
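One way to make sense of a partition folder is to classify each file by its suffix. The mapping below is an assumption built from the file kinds named in this article, using the spellings ".deleted" and ".txnindex"; the role labels and the sample file names in the usage lines are made up for illustration.

```python
import os

# Assumed suffix-to-role table for files inside a partition folder.
SUFFIX_ROLES = {
    ".log": "log data",
    ".index": "offset index",
    ".timeindex": "timestamp index",
    ".txnindex": "transaction index",
    ".snapshot": "snapshot",
    ".deleted": "temporary",
    ".cleaned": "temporary",
    ".swap": "temporary",
}


def classify(file_name):
    """Return a rough role for a file found in a partition folder."""
    if file_name == "leader-epoch-checkpoint":
        return "leader epoch checkpoint"
    # splitext keeps only the last suffix, so "x.log.deleted" yields ".deleted".
    _, suffix = os.path.splitext(file_name)
    return SUFFIX_ROLES.get(suffix, "unknown")


roles = [
    classify("00000000000000000000.log"),
    classify("00000000000000000133.timeindex"),
    classify("00000000000000000000.log.deleted"),
]
```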

From a broader perspective, the files mentioned above are not the only ones Kafka keeps; there are also some checkpoint files, for example. When a Kafka broker is started for the first time, five files are created in the default root directory: four checkpoint files (xxx-checkpoint) and meta.properties.
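A small check for these root-directory files might look like the sketch below. The article does not spell out the file names in this passage, so the list in EXPECTED_ROOT_FILES is an assumption: these are the names commonly seen in a broker's log root, and they may differ between Kafka versions. The function name is hypothetical.

```python
import os

# Assumed names of the five files; verify against your own Kafka version.
EXPECTED_ROOT_FILES = (
    "cleaner-offset-checkpoint",
    "log-start-offset-checkpoint",
    "meta.properties",
    "recovery-point-offset-checkpoint",
    "replication-offset-checkpoint",
)


def missing_root_files(root_dir):
    """List the expected files that are not present in `root_dir`."""
    present = set(os.listdir(root_dir))
    return [name for name in EXPECTED_ROOT_FILES if name not in present]
```

Calling missing_root_files on a freshly started broker's log root should return an empty list if the assumed names match that installation.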

The offsets committed by consumers are stored in Kafka's internal topic __consumer_offsets, which does not exist initially and is created automatically the first time a consumer consumes messages.

At some point, the file directory layout in Kafka looks like the figure above. Each root directory contains the four basic checkpoint files (xxx-checkpoint) and a meta.properties file. When a topic is created and the broker has more than one root directory configured, the root directory with the fewest partitions is chosen to complete the creation.
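The "fewest partitions" rule can be sketched as follows. This is an approximation rather than the broker's own code: it counts every subfolder of a root directory as one partition, and on a tie it simply keeps the first root in the list, which is an assumption. The function names are hypothetical.

```python
import os


def count_partition_dirs(root_dir):
    """Count subfolders of a root directory, treating each as one partition."""
    with os.scandir(root_dir) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def pick_log_dir(root_dirs):
    """Pick the root directory holding the fewest partition folders.

    Ties go to the earliest entry in `root_dirs` (assumed tie-break).
    """
    if not root_dirs:
        raise ValueError("at least one root directory is required")
    return min(root_dirs, key=count_partition_dirs)
```

Files such as the checkpoint files and meta.properties are ignored by the count because only directories are tallied.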
