How to get started with kafka 07/04 Update SLTechnology News&Howtos


2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

How do you get started with Kafka? This article analyzes the question in detail and presents the corresponding answers, in the hope of helping readers who face it find a simpler and easier approach.

Background:

Today, application systems of every kind, such as business, social networking, search, and browsing systems, act like information factories, constantly producing all kinds of data. In the era of big data, we face the following challenges:

How to collect this huge volume of information

How to analyze it

How to do both in a timely manner

These challenges define a business model in which producers generate information, consumers consume (process and analyze) it, and a messaging system is needed to bridge the gap between the two.

At a micro level, this requirement can also be understood as how messages are passed between different systems.

Enter Kafka: open-sourced by LinkedIn

Kafka is a framework for solving these problems: it connects producers and consumers seamlessly.

Kafka: a high-throughput distributed messaging system

Kafka features: Kafka describes its own design as unique. Here is where it excels:

Fast: a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.

Scalable: a single cluster can serve as a central big-data hub, bringing together data from many kinds of business systems.

Durable: messages are persisted to disk and replicated for fault tolerance, so Kafka can hold terabytes of data while keeping processing efficient.

Distributed: built for the big-data domain and distributed by design; a cluster can process millions of messages per second.

Real-time: messages become available to consumers as soon as they are produced.

Components of Kafka:

Topic: the named category (conceptually a directory) in which messages are stored

Producer: the party that publishes messages to a topic

Consumer: the party that subscribes to a topic and consumes its messages

Broker: each running Kafka server instance is called a broker

As shown in the figure below, messages produced by producers are sent over the network to the Kafka cluster, and consumers consume the messages from the cluster.

Topic and Partition:

Every message is sent to a topic, which is essentially a directory, and a topic is made up of one or more partition logs. Its structure is shown in the following figure:

(A topic can contain multiple partitions.)

As the figure shows, the messages within each partition are ordered: newly produced messages are continuously appended to the end of the partition log, and each message is assigned an offset that uniquely identifies it within that partition.
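To make the partition-log idea concrete, here is a minimal in-memory sketch in Python. The classes `PartitionLog` and `Topic` are hypothetical stand-ins, not part of any Kafka API; they assume offsets start at 0 and simply model a topic as a list of append-only logs, each of which hands out consecutive offsets.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical in-memory model of one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append a message to the end of the log and return the offset assigned to it."""
        offset = len(self.messages)  # offsets are consecutive and (in this sketch) start at 0
        self.messages.append(message)
        return offset

    def read(self, offset: int) -> str:
        """Read the message stored at a given offset."""
        return self.messages[offset]


@dataclass
class Topic:
    """A topic modelled as a fixed number of independent partition logs."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [PartitionLog() for _ in range(self.num_partitions)]


orders = Topic("orders", num_partitions=3)
p0 = orders.partitions[0]
print(p0.append("order-1"))                    # 0
print(p0.append("order-2"))                    # 1
print(orders.partitions[1].append("order-3"))  # 0: offsets are unique only within a partition
```

The last line shows why an offset alone does not identify a message: the pair (partition, offset) does.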

The Kafka cluster retains all messages, whether or not they have been consumed. We can set a retention period, and only data older than that period is automatically deleted to free disk space. For example, with a retention period of 2 days, every message produced within the last 2 days stays in the cluster, and a message is deleted only once it is more than 2 days old.
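The retention rule can be sketched as a simple time-based filter. This is a simplified model, assuming each message carries a timestamp; the function name `purge_expired` is hypothetical. Real Kafka deletes whole log segment files rather than individual messages, and the topic-level setting that corresponds to the article's 2-day example is `retention.ms` (2 days = 172800000 ms). The key point the sketch preserves is that consumption status plays no part in the decision.

```python
from datetime import datetime, timedelta

# Hypothetical retention setting mirroring the article's 2-day example.
RETENTION = timedelta(days=2)


def purge_expired(log, now, retention=RETENTION):
    """Keep only (timestamp, message) entries younger than the retention period.

    Whether a message has been consumed is irrelevant: an unread message is
    deleted once it expires, and an already-read message stays until then.
    """
    return [(ts, msg) for ts, msg in log if now - ts < retention]


now = datetime(2025, 7, 4, 12, 0)
log = [
    (now - timedelta(days=3), "old, already consumed"),
    (now - timedelta(days=3), "old, never consumed"),
    (now - timedelta(hours=5), "recent, already consumed"),
]
kept = purge_expired(log, now)
print([msg for _, msg in kept])  # ['recent, already consumed']
```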

The only per-consumer metadata Kafka needs to maintain is the consumer's offset, i.e. its consumption position, in each partition. Each time the consumer consumes a message, the offset advances by 1. In practice, this position is entirely under the consumer's control: the consumer can track and reset its offset, so it can read messages starting from any position.
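A minimal sketch of consumer-controlled offsets, assuming a single partition represented as a plain Python list. The `Consumer` class is hypothetical; its method names `poll` and `seek` echo the Java `KafkaConsumer.poll()` and `KafkaConsumer.seek()` methods, but it is a standalone stand-in with none of their networking or group logic.

```python
class Consumer:
    """Hypothetical consumer that keeps its own position in a single partition.

    The partition is just a list; nothing is marked as "consumed" in it.
    The only state needed to resume reading is self.offset.
    """

    def __init__(self, partition):
        self.partition = partition
        self.offset = 0  # next offset to read

    def poll(self, max_records=10):
        """Return up to max_records messages from the current offset and advance it."""
        batch = self.partition[self.offset:self.offset + max_records]
        self.offset += len(batch)  # the offset advances by 1 for every message consumed
        return batch

    def seek(self, offset):
        """Reset the position so that messages can be re-read (or skipped)."""
        self.offset = offset


partition = ["m0", "m1", "m2", "m3"]
c = Consumer(partition)
print(c.poll(max_records=3))  # ['m0', 'm1', 'm2']
print(c.offset)               # 3
c.seek(1)                     # rewind to re-read from offset 1
print(c.poll())               # ['m1', 'm2', 'm3']
```

Because rewinding only changes the consumer's own counter, re-reading old data costs the broker nothing extra, and other consumers are unaffected.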

Storing the message log as partitions serves several purposes. First, it makes the cluster easy to scale: each partition must fit on the server that hosts it, but a topic can consist of many partitions, so the cluster as a whole can handle data of any size. Second, it improves concurrency, because reads and writes are performed per partition.
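The concurrency argument can be illustrated by processing each partition in its own worker. This is only an analogy, assuming a hypothetical 4-partition topic held in memory and a placeholder processing step (uppercasing); in a real deployment the "workers" would be consumers or brokers on separate machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# A hypothetical topic with 4 partitions; each inner list is one partition log.
topic = [
    ["a1", "a2", "a3"],
    ["b1", "b2"],
    ["c1", "c2", "c3", "c4"],
    ["d1"],
]


def process_partition(index_and_log):
    """Process one partition independently; no coordination with other partitions is needed."""
    index, log = index_and_log
    # Within a partition, messages are handled strictly in offset order.
    return index, [msg.upper() for msg in log]


# One worker per partition: partitions are the unit of parallelism, so adding
# partitions (and workers) lets reads and writes scale out.
with ThreadPoolExecutor(max_workers=len(topic)) as pool:
    results = dict(pool.map(process_partition, enumerate(topic)))

print(sum(len(msgs) for msgs in results.values()))  # 10
```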

Distributed:

(Leader-follower, also called master-slave, replication)

Partitions are distributed across the servers in the cluster, and each partition can be replicated to multiple servers; the number of replicas is configurable.
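Here is one way replicas might be spread over brokers. The function `assign_replicas` is hypothetical and uses a plain round-robin shift; Kafka's own assignment logic is more involved, so treat this as an illustration of the constraints (each replica of a partition on a distinct broker, replica count configurable) rather than of Kafka's exact algorithm. The first broker in each list is treated as the partition's leader.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin style.

    Simplified stand-in for Kafka's assignment logic. The first broker in each
    list is treated as the leader; the rest hold follower copies.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition so leaders are spread out.
        assignment[p] = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
    return assignment


layout = assign_replicas(num_partitions=4, brokers=["broker-1", "broker-2", "broker-3"],
                         replication_factor=2)
for partition, replicas in layout.items():
    print(f"p{partition}: leader={replicas[0]}, followers={replicas[1:]}")
# p0: leader=broker-1, followers=['broker-2']
# p1: leader=broker-2, followers=['broker-3']
# p2: leader=broker-3, followers=['broker-1']
# p3: leader=broker-1, followers=['broker-2']
```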

Each partition has one leader server; the other servers holding its replicas are followers. The leader handles all read and write requests for the partition, while the followers passively copy the leader's data. If the leader fails, one of the followers is automatically promoted to leader. Because leadership is assigned per partition, each server acts as the leader for some partitions and as a follower for others, which balances the load across the cluster.
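Leader failover can be sketched as follows, assuming a hypothetical `fail_over` helper and a replica layout given as a dict. In real Kafka the cluster controller performs the election and, by default, only promotes a replica from the in-sync replica set (ISR); this sketch simply picks the first surviving replica. It also shows the load-balancing point: `broker-1` leads `p0` while following on `p2`.

```python
def fail_over(assignment, leaders, dead_broker):
    """Promote a surviving follower for every partition whose leader just died.

    assignment maps partition -> list of replica brokers; leaders maps
    partition -> current leader. Simplification: the first live replica wins.
    """
    new_leaders = dict(leaders)
    for partition, leader in leaders.items():
        if leader == dead_broker:
            survivors = [b for b in assignment[partition] if b != dead_broker]
            new_leaders[partition] = survivors[0] if survivors else None  # None: partition offline
    return new_leaders


assignment = {
    0: ["broker-1", "broker-2"],
    1: ["broker-2", "broker-3"],
    2: ["broker-3", "broker-1"],
}
leaders = {p: replicas[0] for p, replicas in assignment.items()}
print(fail_over(assignment, leaders, "broker-1"))
# {0: 'broker-2', 1: 'broker-2', 2: 'broker-3'}
```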

Producers:

A producer chooses which topic to publish each message to, and it can also decide which partition of that topic receives the message. We can either use the simple partition-selection algorithm provided by the API or implement our own.
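A common partition-selection strategy is: hash the message key if there is one, so that the same key always lands in the same partition; otherwise spread messages round-robin. The sketch below is hypothetical (`SimplePartitioner` is not a Kafka class). Kafka's Java client hashes keys with murmur2, and newer client versions batch unkeyed records to one partition at a time ("sticky" partitioning) instead of strict round-robin; `zlib.crc32` is used here only as a stable stand-in hash.

```python
import itertools
import zlib


class SimplePartitioner:
    """Hypothetical partition selector: hash keyed messages, round-robin unkeyed ones."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))  # round-robin state

    def partition(self, key=None):
        if key is None:
            # No key: rotate through partitions to spread the load evenly.
            return next(self._next)
        # Keyed: a stable hash means the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


p = SimplePartitioner(num_partitions=3)
print([p.partition() for _ in range(4)])               # [0, 1, 2, 0]
print(p.partition("user-42") == p.partition("user-42"))  # True
```

Writing a custom partitioner amounts to replacing the body of `partition` with a different rule, for example routing by customer region.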

Consumers:

Message delivery typically follows one of two models: queuing and publish-subscribe.

Queuing: a pool of consumers reads from the queue, and each message is delivered to exactly one of them

Publish-subscribe: each message is broadcast to every consumer

Kafka supports both models through a single consumer abstraction: the consumer group. Each consumer instance labels itself with a consumer group name. If all instances use the same group name, they work in queuing mode; if every instance uses a different group name, they work in publish-subscribe mode.

(A consumer's group is chosen by the client through the group name it sets. If clients define multiple groups, the broker delivers every message to each group, publish-subscribe style.)
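The group semantics can be simulated in a few lines. The `deliver` function is hypothetical and deliberately simplified: inside a group it hands messages to members in turn, whereas real Kafka balances whole partitions among group members. The property it demonstrates is the one described above: each group sees every message, and within a group each message goes to exactly one member.

```python
from collections import defaultdict


def deliver(messages, consumers):
    """Hypothetical delivery model for consumer groups.

    consumers is a list of (consumer_name, group_name) pairs. Every group receives
    every message (publish-subscribe between groups); inside a group each message
    goes to exactly one member (queuing within a group).
    """
    groups = defaultdict(list)
    for name, group in consumers:
        groups[group].append(name)

    received = defaultdict(list)
    for i, msg in enumerate(messages):
        for members in groups.values():
            received[members[i % len(members)]].append(msg)  # members take turns
    return dict(received)


msgs = ["m0", "m1", "m2", "m3"]
# Same group name -> queuing: the messages are split between c1 and c2.
print(deliver(msgs, [("c1", "g"), ("c2", "g")]))
# {'c1': ['m0', 'm2'], 'c2': ['m1', 'm3']}
# Different group names -> publish-subscribe: each consumer sees everything.
print(deliver(msgs, [("c1", "g1"), ("c2", "g2")]))
# {'c1': ['m0', 'm1', 'm2', 'm3'], 'c2': ['m0', 'm1', 'm2', 'm3']}
```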

As shown in the following figure, a cluster of two servers hosts four partitions, p0 to p3, which are read by two consumer groups. Within a group, the partitions are consumed in queuing mode; across groups, consumption follows the publish-subscribe model.
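The scenario in the figure can be reproduced by splitting p0 to p3 among the members of each group. The group sizes below (two consumers in one group, four in the other) and the consumer names are assumptions for illustration, and `assign_partitions` is a hypothetical contiguous-range split, similar in spirit to Kafka's range assignor but not its actual code.

```python
def assign_partitions(partitions, members):
    """Split a topic's partitions among the members of ONE consumer group.

    Each partition goes to exactly one member (queuing within the group).
    Members beyond the number of partitions would receive nothing.
    """
    assignment = {}
    per_member, extra = divmod(len(partitions), len(members))
    start = 0
    for i, m in enumerate(members):
        count = per_member + (1 if i < extra else 0)  # the first `extra` members get one more
        assignment[m] = partitions[start:start + count]
        start += count
    return assignment


partitions = ["p0", "p1", "p2", "p3"]
# Assumed group sizes: group A has two consumers, group B has four.
group_a = assign_partitions(partitions, ["C1", "C2"])
group_b = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])
print(group_a)  # {'C1': ['p0', 'p1'], 'C2': ['p2', 'p3']}
print(group_b)  # {'C3': ['p0'], 'C4': ['p1'], 'C5': ['p2'], 'C6': ['p3']}
```

Each group independently covers all four partitions, which is exactly what makes consumption publish-subscribe across groups.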

Message ordering:

How does Kafka guarantee that messages are consumed in order? As mentioned above, messages within a partition are ordered, but Kafka guarantees order only within a partition, not across the partitions of a topic. To get a total order over all messages in a topic, the topic must have only one partition, which also means that only one consumer in each consumer group can read from it.
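The sketch below shows what the per-partition guarantee means in practice, assuming a hypothetical 3-partition topic and keyed events routed by a stable hash (`zlib.crc32` stands in for Kafka's murmur2). Each partition log preserves send order, so all events for one key come out in order; a consumer reading several partitions, however, may interleave them arbitrarily. With `NUM_PARTITIONS = 1`, the single log is identical to `events`, i.e. a total order over the whole topic.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key):
    """Stable key -> partition mapping (crc32 stand-in for Kafka's murmur2 hashing)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def produce(events):
    """Append (key, value) events to per-partition logs in send order."""
    logs = [[] for _ in range(NUM_PARTITIONS)]
    for key, value in events:
        logs[partition_for(key)].append((key, value))
    return logs


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"),
          ("user-2", "logout"), ("user-1", "logout")]
logs = produce(events)

# Within each partition, events keep the order in which they were sent.
for log in logs:
    print(log)

# All events for one key live in one partition, so their relative order is preserved.
user1 = [v for log in logs for k, v in log if k == "user-1"]
print(user1)  # ['login', 'click', 'logout']
```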

That concludes our answer to how to get started with Kafka. We hope the content above is helpful.

