In this issue, we look at how to build a highly reliable and highly available messaging platform based on Kafka. The article covers Kafka's architecture and core concepts and then works through concrete recommendations from a practical point of view; I hope you get something out of reading it.
Kafka is a very popular messaging system. It was originally built at LinkedIn as the foundation of its activity-stream and operational-data processing pipelines. Activity-stream data mainly includes page views (PV), content viewed, search queries, and so on; operational data refers to server performance data (CPU and IO usage, request times, service logs, and so on). The traditional way to handle such data was to write each kind of activity to a log file and then analyze those files periodically.
In recent years, with the rapid development of the Internet, activity and operational data processing has become a vital part of a website's feature set, and it requires substantial infrastructure to support it.
Kafka is a distributed, partitioned, multi-replica messaging system coordinated through ZooKeeper; it has since been donated to the Apache Software Foundation. Its biggest strengths are that it can process large volumes of data in real time and scale out dynamically, which lets it serve many scenarios: Hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark streaming engines, log processing systems, message services, and so on.
At a high level, the Kafka architecture is production-storage-consumption and consists of the following four parts:
Producer cluster: responsible for publishing messages to the Kafka brokers; generally composed of multiple applications (a producer sketch follows this list).
Kafka cluster: the Kafka server cluster, and the most important part of the system. The brokers receive the data written by producers, persist it to file storage, and ultimately serve the messages to the consumer cluster.
ZooKeeper cluster: ZooKeeper maintains cluster-wide metadata for the whole Kafka cluster, such as topic information and the identity of the Kafka controller.
Consumer cluster: responsible for reading messages from the Kafka brokers; generally consists of multiple applications, each fetching the data it wants.
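To make the producer-to-broker leg of this flow concrete, here is a minimal producer sketch using the standard Apache Kafka Java client. The broker address broker1:9092 and the topic name page-views are hypothetical placeholders, not values from the article:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; point this at your own cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one activity-stream event to the hypothetical "page-views" topic.
            producer.send(new ProducerRecord<>("page-views", "user-42", "GET /home"));
            producer.flush(); // make sure the buffered record is actually sent
        }
    }
}
```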
Related Kafka concepts explained
Broker: a Kafka cluster contains one or more servers, each of which is called a broker.
Topic: every message published to a Kafka cluster has a category, called its topic. (Physically, messages with different topics are stored separately. Logically, a topic's messages may be stored on one or more brokers, but users only need to specify the message's topic to produce or consume data, regardless of where the data is stored.)
Partition: a physical concept; each topic contains one or more partitions.
Producer: responsible for publishing messages to the Kafka brokers.
Consumer: the message consumer; a client that reads messages from the Kafka brokers.
Consumer group: each consumer belongs to a specific consumer group (you can specify a group name for each consumer; if you don't, it falls into the default group). A sketch of consumer-group behavior follows this list.
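The consumer-group concept is easiest to see in code. In this sketch (the group id analytics-group, broker address, and topic name are hypothetical), every consumer started with the same group.id shares the topic's partitions among themselves, while consumers in a different group each receive the full stream:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // Consumers sharing this group id split the topic's partitions among themselves.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page-views"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```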
In Kafka's architecture, producers, consumers, and the message store can all scale dynamically and horizontally, which improves the throughput, scalability, durability, and fault tolerance of the whole cluster. Kafka was born a distributed system, which gives it the following characteristics:
High throughput, low latency: Kafka can handle message throughput of hundreds of MB per second with modest resource consumption, at latencies as low as a few milliseconds.
Scalability: Kafka clusters support dynamic horizontal scaling.
Persistence and reliability: messages are persisted to local disk, and data replication is supported to prevent data loss.
Fault tolerance: nodes in the cluster are allowed to fail (with a replication factor of n, up to n-1 nodes can fail without losing data); a topic-creation sketch follows this list.
High concurrency: supports thousands of clients reading and writing at the same time.
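The fault-tolerance property above is set per topic through its replication factor. Here is a minimal sketch with the Java AdminClient, assuming a cluster of at least three brokers and a hypothetical topic named orders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3: each partition survives
            // the loss of up to 2 of the 3 brokers holding its replicas.
            NewTopic topic = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```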
More and more open-source distributed processing platforms, such as Cloudera, Apache Storm, Spark, and Flink, support integration with Kafka. Kafka is widely used in scenarios such as the following:
Messaging: asynchronously decouple producers and consumers and smooth out bursty traffic (peak shaving and valley filling).
Log aggregation: Kafka can collect the operation logs of various services and expose them, through Kafka, as a unified interface service to all kinds of consumers; the aggregated logs can then be analyzed statistically with storage-and-analysis systems such as Hadoop.
User activity tracking: Kafka is often used to record the activities of web or app users, such as browsing, searching, and clicking. Each server publishes this activity information to Kafka topics, and subscribers consume those topics for real-time monitoring and analysis, or for offline analysis and mining.
Operational metrics: Kafka is also often used to record operational monitoring data, collecting data from distributed applications and producing centralized feeds for operations such as alerting and monitoring.
Stream processing: Kafka supports both offline and streaming data processing and makes it easy to aggregate and analyze data, for example with Spark Streaming or Storm.
JD.com Zhaopin Cloud's message queue Kafka edition not only hosts open-source Apache Kafka, letting users migrate existing business code to the cloud without modification, but also enhances the creation, management, operation, and monitoring of Kafka clusters. Deploying Kafka through JD.com Zhaopin Cloud's message queue gives users the following advantages:
Multi-version creation
Kafka versions V0.10, V1.0, and V2.4 are supported, in both prepaid and postpaid billing modes. Support for three major versions makes it easy for existing Kafka users to migrate to the cloud seamlessly, while the postpaid model lets users test and trial the service without paying for machines or repeating deployments.
Elastic expansion
Expansion is easy, convenient, and fast. Users can expand capacity on demand as resource usage grows, meeting the needs of business growth without affecting existing workloads, and avoiding the complex operations and business risk of expanding a self-built Kafka cluster.
Management component
Every Kafka cluster a user creates comes configured with Kafka Manager, giving users a visual interface for cluster management and sparing them the complexity of API calls or command-line tools.
Operation and maintenance monitoring
Clusters are operations-free at the cluster level: health checks and automatic failover of unhealthy nodes keep the service available without user intervention. Cluster status is also monitored, with multi-dimensional metrics and early-warning alerts, so users are never left unaware of an unavailable service.
Let's take a look at how to build a highly reliable, high-throughput Kafka service with JD.com Zhaopin Cloud's message queue Kafka.
1. Improve the throughput of JD.com Zhaopin Cloud Kafka
A Kafka topic, as the main carrier of incoming messages, is generally divided into one or more partitions. Each partition is equivalent to a sub-queue, and multiple partitions are multiple sub-queues writing to disk and serving clients in parallel, so increasing the partition count increases the throughput of a single topic.
Physically, each partition corresponds to its own log files on disk, and Kafka persists messages to the local file system with O(1) append-only writes, which keeps it extremely efficient. Disk IO is nonetheless the most expensive part of the pipeline, so disks with higher IOPS and throughput speed up message writes and raise throughput accordingly.
Topics in Kafka are consumed by consumer groups. If the number of consumers in a group is less than the number of partitions in the topic, some consumer threads will each process multiple partitions at the same time. If the number of consumers in the group is greater than the number of partitions, the extra consumer threads sit idle while the rest each process one partition, which wastes resources, because a single partition cannot be processed by two consumer threads of the same group.
Recommendations:
1) Increasing the number of partitions effectively improves message throughput, and the partition count is best set to an integer multiple of the number of broker nodes, so that the partitions and their replicas are distributed more evenly (see the sketch after this list).
2) Use disk specifications with high IOPS and high throughput, such as SSD-type disks.
3) Increase the number of producers and consumers; the number of consumers is ideally equal to the number of partitions.
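As an illustration of recommendations 1 and 3, here is a sketch with the Java AdminClient that reads the broker count, sizes the partition count as a multiple of it, and creates the topic. The topic name throughput-demo and the multiplier of 2 are hypothetical choices for illustration:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Recommendation 1: make the partition count an integer multiple
            // of the number of brokers so partitions spread evenly.
            int brokerCount = admin.describeCluster().nodes().get().size();
            int partitions = brokerCount * 2; // hypothetical multiplier

            NewTopic topic = new NewTopic("throughput-demo", partitions, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();

            // Recommendation 3: for maximum parallelism, run one consumer
            // (in the same consumer group) per partition of this topic.
            System.out.println("Created topic with " + partitions + " partitions");
        }
    }
}
```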
2. Improve the reliability of JD.com Zhaopin Cloud Kafka
The replica count of a Kafka topic is the number of backup copies of its data. If the replica count is 1, then even with multiple Kafka nodes, data access fails whenever the machine holding that single replica goes down. But more replicas are not always better: the replica count cannot exceed the number of broker nodes in the cluster, and more replicas mean more data synchronization, which affects service performance.
When a topic has multiple replicas, the replica synchronization policy also affects data reliability when sending messages; it is controlled mainly by the ack parameter.
ack=1: the broker returns success as soon as the producer's write reaches the leader, regardless of whether any follower has succeeded.
ack=2: the broker returns success once the write reaches the leader and one follower, regardless of whether the remaining followers have succeeded.
ack=-1: the write is considered successful only when all followers have written the message; only then does the Kafka broker return success.
Clearly, data reliability is best with ack=-1, but this affects service availability: every follower must succeed before the write counts, so a problem with a single follower can make the service unavailable. The two ends of this trade-off are sketched below.
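In the modern Java producer the corresponding setting is acks, which accepts only "0", "1", and "all" (with "-1" as an alias for "all"); intermediate numeric values such as the ack=2 described above belonged to older client APIs. A minimal configuration sketch of the trade-off, with a hypothetical broker address:

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class AckSettings {
    public static void main(String[] args) {
        Properties fast = new Properties();
        fast.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // acks=1: success as soon as the leader has the record;
        // a leader crash before replication can still lose the message.
        fast.put(ProducerConfig.ACKS_CONFIG, "1");

        Properties safe = new Properties();
        safe.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // acks=all (alias -1): success only after all in-sync replicas
        // have the record; most reliable, highest latency.
        safe.put(ProducerConfig.ACKS_CONFIG, "all");
    }
}
```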
Recommendations:
1) When creating a Kafka topic, if you have data-reliability requirements, set the topic's replica count to no fewer than 3.
2) When setting the ack parameter for sending messages, it is recommended to treat a send as successful once more than half of the replicas have acknowledged it; this preserves message reliability without reducing service availability.
3) Choose synchronous or asynchronous sending deliberately, check the result of each send, and handle messages that fail to send (a combined sketch follows this list).
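Putting recommendations 1-3 together, here is a minimal sketch assuming a 3-broker cluster: the topic is created with 3 replicas and min.insync.replicas=2 (so an acks=all write fails unless a majority of replicas are in sync, approximating the "more than half" rule), the producer uses acks=all, and a callback handles failed sends. The topic name billing-events, broker address, and payload are hypothetical:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class ReliableSend {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Recommendation 1: at least 3 replicas. With min.insync.replicas=2,
            // an acks=all write is rejected unless at least 2 replicas are in sync.
            NewTopic topic = new NewTopic("billing-events", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // recommendation 2
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Recommendation 3: inspect the result of every send and handle
            // failures (retry, log, or park in a dead-letter store).
            producer.send(new ProducerRecord<>("billing-events", "order-1", "charged"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            System.err.println("send failed: " + exception.getMessage());
                        } else {
                            System.out.printf("ok: partition=%d offset=%d%n",
                                    metadata.partition(), metadata.offset());
                        }
                    });
            producer.flush();
        }
    }
}
```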
That is how to build a highly reliable and highly available messaging platform based on Kafka. If you have run into similar questions, the analysis above should help you work through them; to learn more, keep following the industry information channel.