2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces Kafka's principles and its stand-alone deployment mode. Many readers have questions about both topics, so the material below is organized into simple, easy-to-follow steps. Hopefully it resolves those doubts; follow along and try it yourself!
I. Introduction and principles of Kafka
Kafka is an open-source stream-processing platform published by the Apache Software Foundation and written in Scala and Java. It is a high-throughput, distributed publish-subscribe messaging system that can handle all the action-stream data of a consumer-scale website.
These actions (web browsing, searches, and other user activities) are a key ingredient of many social features on the modern web. Because of throughput requirements, such data is usually handled by log processing and log aggregation. That is a workable approach for log data consumed by offline analytics systems such as Hadoop, but it constrains real-time processing. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to deliver real-time messages across a cluster.
1. Characteristics of kafka
Kafka is a high-throughput distributed publish-subscribe messaging system with the following characteristics:
Persistence: messages are persisted through on-disk data structures that maintain stable performance even with terabytes of stored messages;
File storage: messages are written to log files, which must be flushed to the hard disk; data is only flushed after reaching a certain threshold, which reduces disk I/O. As a consequence, if Kafka crashes suddenly, the not-yet-flushed data is lost;
High throughput: even on very ordinary hardware, Kafka can support millions of messages per second;
Partitioning: messages can be partitioned across Kafka servers and consumed in a distributed way by consumer clusters;
Hadoop parallel data loading is supported.
2. Kafka related terms
Broker: Message middleware processing node, a Kafka node is a broker, one or more brokers can form a Kafka cluster;
Topic: Kafka classifies messages according to topic, and each message published to Kafka cluster needs to specify a topic;
Producer: message producer, client that sends messages to Broker;
Consumer: message consumer, client that reads messages from Broker;
Consumer Group: Each Consumer belongs to a specific Consumer Group. A message can be sent to multiple Consumer Groups, but only one Consumer in a Consumer Group can consume the message.
Partition: Physical concept, a topic can be divided into multiple partitions, each partition is ordered internally.
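To illustrate the consumer-group semantics above (a message is delivered to every consumer group, but only one consumer within a group consumes it), here is a minimal, hypothetical Python simulation. The class and method names are illustrative only and are not part of any real Kafka client API.

```python
# Toy simulation of consumer-group delivery semantics (not a Kafka API).

class ConsumerGroup:
    def __init__(self, name, consumers):
        self.name = name
        self.consumers = consumers          # list of consumer names
        self._next = 0                      # round-robin cursor
        self.received = {c: [] for c in consumers}

    def deliver(self, message):
        # Exactly one consumer in the group receives the message.
        consumer = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        self.received[consumer].append(message)

def publish(message, groups):
    # A message is fanned out to every consumer group.
    for group in groups:
        group.deliver(message)

g1 = ConsumerGroup("analytics", ["c1", "c2"])
g2 = ConsumerGroup("billing", ["c3"])
for msg in ["m0", "m1", "m2"]:
    publish(msg, [g1, g2])

# Every group saw all three messages...
assert sorted(g1.received["c1"] + g1.received["c2"]) == ["m0", "m1", "m2"]
# ...but within a group, each message went to exactly one consumer.
assert g2.received["c3"] == ["m0", "m1", "m2"]
```

The point of the sketch is the fan-out rule: groups are independent subscribers, while consumers inside one group split the work.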
3. Difference between Topic and Partition
A topic can be thought of as a class of messages. Each topic is divided into multiple partitions, and at the storage level each partition is an append-only log file. Any message published to a partition is appended to the end of its log file, and the position of each message in the file is called its offset, a long integer that uniquely identifies a message within the partition. Because every message is appended to its partition, disk writes are sequential, which is very efficient (sequential writes to disk are even faster than random writes to memory, and this is an important guarantee of Kafka's high throughput).
Each message sent to the broker is stored in a partition chosen by the partitioning rules (by default, round-robin is used to write data). If the partitioning rules are set properly, all messages are distributed evenly across the partitions, achieving horizontal scaling. (If a topic corresponded to a single file, the I/O of the machine holding that file would become the topic's performance bottleneck; partitions solve this problem.) Whether or not a message has been consumed, the append log is retained for a configurable period, for example two days, before being deleted.
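The partition-selection rule described above can be sketched as follows. This is an illustrative simulation, not Kafka's actual partitioner code: unkeyed messages rotate round-robin (the default noted above), while keyed messages are hashed so the same key always lands in the same partition.

```python
import itertools
import zlib

NUM_PARTITIONS = 4
_round_robin = itertools.count()

def choose_partition(key=None, num_partitions=NUM_PARTITIONS):
    """Pick a partition: hash the key if present, else round-robin."""
    if key is None:
        return next(_round_robin) % num_partitions
    # A stable hash ensures the same key always maps to the same partition.
    return zlib.crc32(key.encode()) % num_partitions

# Unkeyed messages rotate evenly across partitions.
assert [choose_partition() for _ in range(4)] == [0, 1, 2, 3]
# Keyed messages are sticky: same key, same partition.
assert choose_partition("user-42") == choose_partition("user-42")
```

Even distribution across partitions is what lets a topic scale horizontally: each partition can live on a different broker's disk.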
4. Kafka's architecture
As shown in the figure above, a typical Kafka architecture consists of several producers (server logs, business data, page views generated at the front end, and so on), several brokers (Kafka supports horizontal scaling; generally, the more brokers, the higher the cluster throughput), several consumers (organized into groups), and a ZooKeeper cluster. Kafka uses ZooKeeper to manage cluster configuration, elect leaders, and rebalance when consumer groups change. Producers publish messages to brokers in push mode; consumers subscribe to brokers and consume messages in pull mode.
A ZooKeeper cluster has two roles, leader and follower: the leader provides external services, while the followers synchronize the content written on the leader and maintain the replicas (copies).
Kafka's high reliability stems from its robust replica policy. By tuning the replica-related parameters, Kafka can balance performance against reliability. Kafka has provided partition-level replication since version 0.8.x.
5. Kafka's file storage mechanism
Messages in Kafka are classified by topic: producers send messages to Kafka brokers under a topic, and consumers read data by topic. At the physical level, however, a topic is divided into partitions, and each partition is further subdivided into segments; physically, a partition consists of multiple segments.
To illustrate, assume there is a Kafka cluster with only one broker, that is, a single physical machine. In the broker's server.properties configuration file, set the log file storage path to define where Kafka message files are kept, and create a topic named test with 4 partitions. After starting Kafka, 4 directories appear under the log storage path. In Kafka's file storage, a topic has multiple partitions, each partition is a directory, and the directory naming rule is: topic name plus a sequential number, with the first number starting from 0.
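The partition directory naming rule can be shown with a tiny hypothetical helper (the function is illustrative, not part of Kafka):

```python
def partition_dirs(topic: str, partitions: int) -> list[str]:
    """Directory names Kafka creates under the log path: <topic>-<n>."""
    return [f"{topic}-{i}" for i in range(partitions)]

# A topic "test" with 4 partitions yields 4 log directories.
assert partition_dirs("test", 4) == ["test-0", "test-1", "test-2", "test-3"]
```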
What is segment?
If a partition were the smallest storage unit, we can imagine that as producers keep sending messages, the partition file would grow without bound, seriously complicating both the maintenance of message files and the cleanup of already-consumed messages. Instead, each partition (directory) is split, as if it were one giant file, into multiple equally sized segment data files (though the number of messages in each segment file is not necessarily equal). This makes it easy to delete old segments, that is, to clean up consumed messages and improve disk utilization, while each partition still only needs to support sequential reads and writes.
A segment consists of two files, an ".index" file and a ".log" file, which are the segment's index file and data file respectively. The naming rule for these files is: the first segment of a partition starts from 0, and each subsequent segment file is named after the offset of the last message of the previous segment file. The value is 64 bits, represented as 20 decimal digits, padded on the left with zeros.
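The naming rule can be sketched with a small hypothetical helper that formats an offset into the 20-digit, zero-padded base name used for both segment files:

```python
def segment_files(base_offset: int) -> tuple[str, str]:
    """Return the (.index, .log) file names for a segment.

    The base name is the offset, zero-padded to 20 decimal digits
    as described above.
    """
    name = f"{base_offset:020d}"
    return f"{name}.index", f"{name}.log"

# The first segment of a partition starts at offset 0.
assert segment_files(0) == ("00000000000000000000.index",
                            "00000000000000000000.log")
# A later segment named after offset 368769.
assert segment_files(368769) == ("00000000000000368769.index",
                                 "00000000000000368769.log")
```

Because names are sorted offsets, Kafka can locate the segment holding a given offset with a simple binary search over file names.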
6. Data reliability and durability assurance
When the producer sends data to the leader, the reliability level of the data can be set through the request.required.acks parameter:
1 (default): the producer waits until the leader has successfully received the data and acknowledged it. If the leader then goes down before the followers replicate the data, that data is lost;
0: the producer does not wait for confirmation from the broker before sending the next batch of messages. Data transmission efficiency is the highest, but data reliability is the lowest;
-1: the producer waits until all followers have confirmed receipt of the data before considering the transmission complete. This gives the highest reliability.
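The three reliability levels can be contrasted with a toy simulation. This is illustrative only; the function and its arguments are hypothetical and do not correspond to a real producer API.

```python
def message_lost(acks: int, leader_crashes_before_replication: bool) -> bool:
    """Toy model of whether an acknowledged message can be lost.

    acks=0  -> producer never waits, so loss is always possible
    acks=1  -> safe only if the leader survives until followers replicate
    acks=-1 -> already on every follower before the ack, so it survives
    """
    if acks == 0:
        return True   # fire-and-forget: a broker failure can drop it
    if acks == 1:
        return leader_crashes_before_replication
    if acks == -1:
        return False  # all followers confirmed receipt before the ack
    raise ValueError("acks must be 0, 1 or -1")

assert message_lost(0, leader_crashes_before_replication=False) is True
assert message_lost(1, leader_crashes_before_replication=True) is True
assert message_lost(-1, leader_crashes_before_replication=True) is False
```

The trade-off is visible directly: lower acks means fewer round trips per batch but a wider window in which a crash loses acknowledged data.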
7. Leader election
A message is considered committed only after all followers have replicated it from the leader. This prevents the case where data written to the leader is lost because the leader goes down before any follower has copied it. The producer can choose whether to wait for a message to be committed.
A very common way to elect a leader is "the minority obeys the majority." During replication there are multiple followers, and each follower replicates data at a different speed. When the leader goes down, the follower that currently holds the most data becomes the new leader.
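The "most complete follower wins" idea above can be sketched as follows; this is a hypothetical simulation, not Kafka's actual controller logic (real Kafka elects from the in-sync replica set it tracks in ZooKeeper).

```python
def elect_leader(follower_offsets: dict[str, int]) -> str:
    """Pick the follower with the most replicated data (highest offset).

    Ties are broken by follower name for determinism; this is a
    simplification of Kafka's real ISR-based election.
    """
    return max(sorted(follower_offsets), key=lambda f: follower_offsets[f])

# f1 and f3 are tied at offset 120; f1 wins the deterministic tie-break.
assert elect_leader({"f1": 120, "f2": 98, "f3": 120}) == "f1"
```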
II. Deployment of stand-alone Kafka
1. Deployment of Kafka
The Kafka service depends on a Java environment, which is assumed to be already installed here.
The kafka installation package can be downloaded from my web link.
#Unpack
[root@kafka src]# tar zxf kafka_2.11-2.2.1.tgz
[root@kafka src]# mv kafka_2.11-2.2.1 /usr/local/kafka
[root@kafka src]# cd /usr/local/kafka/bin/
#Start zookeeper
[root@kafka bin]# ./zookeeper-server-start.sh ../config/zookeeper.properties &
#Start kafka
[root@kafka bin]# ./kafka-server-start.sh ../config/server.properties &
[root@kafka bin]# netstat -anpt | grep 9092    #Make sure the port is listening
Since Kafka is coordinated through ZooKeeper, even a stand-alone Kafka deployment needs a running ZooKeeper service. Kafka's installation directory ships with ZooKeeper by default, so it can be started directly.
2. Test Kafka
#Create a topic locally with 1 replica and 1 partition
[root@kafka bin]# ./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
#View local topics
[root@kafka bin]# ./kafka-topics.sh --list --bootstrap-server localhost:9092
#Send messages to test
[root@kafka bin]# ./kafka-console-producer.sh --broker-list localhost:9092 --topic test
>aaaa
>bbbb
>cccc
#Open a new terminal to read messages from test; "--from-beginning" means reading from the start of the log
[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
aaaa
bbbb
cccc
At this point, the study of "Kafka's principle and stand-alone deployment mode" is complete. Theory matched with practice makes the material stick, so go and try it!