What is the architecture of Kafka?

2025-07-13 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article describes the architecture of Kafka: what Kafka is, how message queues are classified, how the components of a Kafka cluster fit together, and how to deploy and test a single broker.

I. What is Kafka?

One of the most challenging parts of data engineering is collecting large amounts of data from many different sources and transferring it to distributed systems for processing and analysis. Message queuing decouples the systems that produce data from the systems that consume it, so that data which cannot be delivered during a failure can still be transferred and analyzed once the system recovers. There are two types of message queuing, both reliable and asynchronous: point-to-point and publish-subscribe. The following figure shows a typical messaging system in which the producer is responsible for generating messages and the consumer is responsible for processing them.

Kafka is a distributed publish-subscribe messaging system. Kafka is fast, scalable, and durable. It maintains feeds of messages in topics: producers write data to a topic, and consumers read data from that topic.

For the Kafka version used in this article (2.4.0), ZooKeeper coordinates the Kafka ecosystem, so you first need to download ZooKeeper, adjust its properties, and set up the environment. Once ZooKeeper is running, download Kafka; you can then use its command-line tools to create brokers, clusters, and topics.

II. Classification of message queues

Point to point (Queue)

In the point-to-point (one-to-one) model, a sender places messages on a queue and one or more consumers listen on that queue. Each message is delivered to exactly one consumer: once a consumer receives a message, it is removed from the queue and no other consumer can get it.
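The point-to-point behavior described above can be sketched in a few lines of Python. This is only an in-memory illustration, not Kafka code; the class name `PointToPointQueue` and the consumer names are hypothetical.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Taking a message removes it, so no other consumer can see it.
        return self._messages.popleft() if self._messages else None


queue = PointToPointQueue()
for text in ("m1", "m2", "m3"):
    queue.send(text)

# Two hypothetical consumers take turns pulling from the same queue.
received = {"consumer_a": [], "consumer_b": []}
for name in ("consumer_a", "consumer_b", "consumer_a"):
    received[name].append(queue.receive())

print(received)  # each message appears in exactly one consumer's list
```

No message shows up in both lists, which is the defining property of the queue model.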

Publish and subscribe system (Topic)

In the publish-subscribe model, the publisher sends messages to multiple consumers (subscribers) that listen at the same time, and every subscriber receives the same message. Data travels through a data pipeline, which is responsible for integrating data from the data sources.
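For contrast with the queue model, here is a minimal publish-subscribe sketch in Python. It is an in-memory illustration only; `PubSubTopic` and the two inbox lists are hypothetical names, and real Kafka consumers pull from the broker rather than being called back.

```python
class PubSubTopic:
    """Toy publish-subscribe topic: every subscriber receives every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Fan the message out to all current subscribers.
        for callback in self._subscribers:
            callback(message)


topic = PubSubTopic()
inbox_a, inbox_b = [], []
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)

for text in ("m1", "m2"):
    topic.publish(text)

print(inbox_a, inbox_b)  # both subscribers hold the same messages
```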

III. The architecture of Kafka

Topics and publishers

A publisher sends messages, which are categorized by topic. Each topic has one or more partitions, and each message within a partition has its own offset. For example, if a topic is given a replication factor of 2, Kafka keeps two identical copies of each partition and places them on brokers in the cluster.
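To make partitions, offsets, and the replication factor concrete, the following Python sketch models a partition as an append-only list and spreads replicas over brokers. The names `PartitionLog` and `place_replicas` are hypothetical, and the round-robin placement is a simplifying assumption; Kafka's actual replica assignment is more involved.

```python
class PartitionLog:
    """Append-only log: a message's offset is its position in the partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the record just written


def place_replicas(num_partitions, replication_factor, brokers):
    """Assign each partition to `replication_factor` distinct brokers.

    Simple round-robin placement, for illustration only.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


partitions = [PartitionLog() for _ in range(3)]
offsets = [
    partitions[0].append("a"),
    partitions[0].append("b"),
    partitions[1].append("c"),
]
print(offsets)  # offsets are counted per partition, not per topic

placement = place_replicas(num_partitions=3, replication_factor=2, brokers=[0, 1, 2])
print(placement)  # every partition lives on two different brokers
```

With a replication factor of 2, losing any single broker still leaves one copy of every partition.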

Cluster and Brokers

A Kafka cluster consists of brokers (servers or nodes), each of which can reside on a different machine and serve messages to subscribers. Replication therefore acts as a backup of partitions across brokers, which makes Kafka durable and helps with fault tolerance.
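The fault-tolerance benefit of replication can be illustrated with a small Python sketch in which a partition survives the loss of its leader broker. `ReplicatedPartition` and the broker names are hypothetical, and the sketch assumes every replica receives each write immediately, which glosses over how Kafka followers actually catch up with the leader.

```python
class ReplicatedPartition:
    """Toy partition with one leader replica and followers on other brokers."""

    def __init__(self, replica_brokers):
        self.logs = {broker: [] for broker in replica_brokers}
        self.leader = replica_brokers[0]

    def append(self, value):
        # Simplification: every live replica receives the write immediately.
        for log in self.logs.values():
            log.append(value)

    def fail_broker(self, broker):
        # Drop the failed replica and promote a survivor if it was the leader.
        del self.logs[broker]
        if broker == self.leader:
            if not self.logs:
                raise RuntimeError("no replica left for this partition")
            self.leader = next(iter(self.logs))

    def read(self):
        return list(self.logs[self.leader])


partition = ReplicatedPartition(["broker0", "broker1"])
partition.append("m1")
partition.append("m2")

partition.fail_broker("broker0")  # the original leader goes down
print(partition.leader, partition.read())  # the surviving replica still has the data
```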

Zookeeper

Kafka brokers do not keep the metadata of the ecosystem themselves, so Kafka relies on ZooKeeper to track it, and ZooKeeper must be started first. ZooKeeper acts as the coordination layer between brokers and consumers, and its presence is a prerequisite for fault tolerance. Kafka brokers are responsible for load balancing: suppose a topic has multiple partitions, each with a leader, and the consumed offsets are periodically recorded in ZooKeeper (this is how older Kafka consumers work; newer consumers store their offsets in an internal Kafka topic). If a node or broker fails, Kafka can then continue from the last recorded offset, which is why ZooKeeper plays a vital role in recovery after a crash.
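Recovery from the last recorded offset can be sketched as follows. `OffsetStore` is a hypothetical in-memory stand-in for the external store that holds committed offsets (ZooKeeper in this article), and `consume` is an illustrative helper, not a Kafka API.

```python
class OffsetStore:
    """Stand-in for the external store that holds committed offsets."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, next_offset):
        self._committed[(group, partition)] = next_offset

    def fetch(self, group, partition):
        # A group with no committed offset starts at the beginning of the log.
        return self._committed.get((group, partition), 0)


def consume(log, store, group, partition, max_records):
    """Read up to max_records starting at the committed offset, then commit."""
    start = store.fetch(group, partition)
    batch = log[start:start + max_records]
    store.commit(group, partition, start + len(batch))
    return batch


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

# A consumer processes two records and commits its position.
first = consume(log, store, "group1", 0, max_records=2)

# Suppose the consumer crashes here. Its replacement has no local state,
# but it picks up where the previous one stopped by asking the store.
second = consume(log, store, "group1", 0, max_records=10)

print(first, second)
```

Because the position lives outside the consumer, no record is skipped or reread after the restart in this simplified model.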

IV. Deployment of Kafka stand-alone Broker

Deploy ZooKeeper

Configure the /root/training/zookeeper-3.4.6/conf/zoo.cfg file:

    dataDir=/root/training/zookeeper-3.4.6/tmp
    server.1=hadoop112:2888:3888

Create the myid file in the /root/training/zookeeper-3.4.6/tmp directory and write the server id into it:

    echo 1 > /root/training/zookeeper-3.4.6/tmp/myid

Start ZooKeeper:

    zkServer.sh start

View the status of ZooKeeper:

    zkServer.sh status

Since we are deploying a single-node ZooKeeper, the reported mode will be standalone.

Deploy Kafka

Modify the config/server.properties file:

    broker.id=0
    port=9092
    log.dirs=/root/training/kafka_2.11-2.4.0/logs/broker0
    zookeeper.connect=localhost:2181

Start Kafka:

    bin/kafka-server-start.sh config/server.properties &

Use the jps command to view the background Java processes.

V. Test Kafka

Create a topic:

    bin/kafka-topics.sh --create --zookeeper bigdata111:2181 --replication-factor 1 --partitions 3 --topic mytopic1

Send messages:

    bin/kafka-console-producer.sh --broker-list bigdata111:9092 --topic mytopic1

Receive (consume) messages:

    bin/kafka-console-consumer.sh --bootstrap-server bigdata111:9092 --topic mytopic1

Consume from the beginning of the topic:

    bin/kafka-console-consumer.sh --bootstrap-server bigdata111:9092 --from-beginning --topic topicName

Consume and show message keys:

    bin/kafka-console-consumer.sh --bootstrap-server bigdata111:9092 --property print.key=true --topic mytopic1

That covers the architecture of Kafka, together with the steps for deploying and testing a single broker.
