What are the characteristics of Kafka

2025-02-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

What are the characteristics of Kafka? This article analyzes and answers that question in detail, in the hope of giving readers who face it a simpler, more workable answer.

Kafka's features in brief: (1) high throughput for both publishing and subscribing; (2) messages are persisted to disk, so they can also be consumed in batches; (3) it is a distributed system that scales out easily; (4) it supports both online and offline scenarios.

Characteristics and use cases of Kafka

Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn and later became an Apache project. Kafka can be described as a distributed, partitioned, replicated, persistent log service.

It is mainly used to handle active streaming data. A common problem in big data systems is that the overall system is made up of many subsystems, and data has to flow continuously between them with high performance and low latency.

Traditional enterprise messaging systems are not well suited to large-scale data processing. Kafka was created to handle both online applications (messages) and offline applications (data files, logs). It serves two purposes:

Reduce the complexity of connecting systems to one another.

Reduce programming complexity. Subsystems no longer have to negotiate interfaces with each other; each one plugs into Kafka like a plug into a socket, and Kafka acts as a high-speed data bus.

The main features of Kafka:

High throughput for both publishing and subscribing. Reported figures put Kafka at about 250,000 messages per second (50 MB/s) when producing and 550,000 messages per second (110 MB/s) when processing.
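As a quick sanity check on those figures, the short calculation below derives the average message size they imply. It assumes that "MB" means 10**6 bytes, which the source does not state, and the function name is purely illustrative.

```python
# Average message size implied by the quoted throughput figures.
# Assumption: "MB" means 10**6 bytes; the source does not specify this.
MB = 10**6


def avg_message_bytes(messages_per_sec: int, mb_per_sec: float) -> float:
    """Return the average number of bytes per message."""
    return mb_per_sec * MB / messages_per_sec


produce_size = avg_message_bytes(250_000, 50)    # producing side
process_size = avg_message_bytes(550_000, 110)   # processing side
print(produce_size, process_size)
```

Under that assumption both figures correspond to the same average message size, which suggests the two numbers come from one benchmark run with fixed-size messages.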

Persistence. Messages are persisted to disk, so they can be used for bulk consumption such as ETL as well as for real-time applications. Persisting data to disk and replicating it prevents data loss.

Distributed and easy to scale out. There are multiple producers, brokers, and consumers, all of them distributed. Machines can be added without downtime.

The state of message consumption is maintained on the client (consumer) side, not on the server (broker) side. Consumers can rebalance automatically when a failure occurs.
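To make the idea of client-side state concrete, here is a minimal in-memory sketch. It is not Kafka's API: `PartitionLog` and `Consumer` are hypothetical names, and the "broker" is just a Python list. The point it illustrates is that the log serves ranges without remembering who has read what, while each consumer tracks its own offset.

```python
class PartitionLog:
    """Append-only list standing in for one partition on a broker (hypothetical)."""

    def __init__(self):
        self._messages = []

    def append(self, message: str) -> int:
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the message just written

    def read(self, offset: int, max_messages: int) -> list:
        # No per-consumer state is kept here; the caller supplies the offset.
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Keeps its own read position, so the log does not have to."""

    def __init__(self, log: PartitionLog):
        self.log = log
        self.offset = 0

    def poll(self, max_messages: int = 2) -> list:
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance only the client-side position
        return batch

    def seek(self, offset: int) -> None:
        self.offset = offset  # rewinding allows messages to be replayed


log = PartitionLog()
for text in ["a", "b", "c"]:
    log.append(text)

realtime = Consumer(log)
batch_job = Consumer(log)
first = realtime.poll()        # reads "a" and "b"
everything = batch_job.poll(10)  # independent position: reads all three
realtime.seek(0)
replayed = realtime.poll(1)    # re-reads "a" after rewinding
```

Because the position lives with the consumer, a real-time reader and a batch job can consume the same log at different speeds, and either can rewind without the broker being involved.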

Support for both online and offline scenarios.

The main design points of Kafka:

1. Kafka uses the Linux file system cache directly to cache data efficiently.

2. Linux zero-copy is used to improve transfer performance. A traditional data transfer requires 4 context switches. With the sendfile system call, data is moved directly in kernel space, and the number of context switches drops to 2. According to the test results, this can improve data transfer performance by 60%.
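The sendfile path can be tried from Python's standard library. The sketch below is only an illustration and is not how Kafka itself is implemented (Kafka runs on the JVM). It relies on `socket.sendfile`, which uses `os.sendfile` where the platform supports it and otherwise falls back to ordinary `send()` calls. The helper name, the payload, and the use of a local socket pair are assumptions made for the demo.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # Uses os.sendfile when available, so the data does not pass
        # through a user-space buffer in this process.
        return sock.sendfile(f)


# Demo: write a small file, then ship it across a local socket pair.
payload = b"kafka-message-" * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

sender, receiver = socket.socketpair()
try:
    sent = send_file_zero_copy(sender, path)
    sender.close()  # signals end of stream to the receiver
    received = b""
    while True:
        chunk = receiver.recv(4096)
        if not chunk:
            break
        received += chunk
finally:
    sender.close()
    receiver.close()
    os.unlink(path)
```

The payload is kept small so that it fits in the socket buffer and the single-threaded demo cannot block; a real sender and receiver would run in separate processes.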

3. The cost of accessing data on disk is O(1). Kafka organizes messages by topic. Each topic contains multiple partitions, and each partition corresponds to a logical log made up of multiple segments. Each segment stores multiple messages (see the figure below). A message's id is determined by its logical position in the log, so the id maps directly to the message's storage location and no additional id-to-location mapping is needed. Each partition has an in-memory index that records the offset of the first message in each segment.

Messages that a publisher sends to a topic are distributed evenly across its partitions, either randomly or by a callback function specified by the user. When the broker receives a published message, it appends the message to the last segment of the corresponding partition. Once the number of messages in a segment reaches a configured value, or the time since publication exceeds a threshold, the segment's messages are flushed to disk; subscribers can only read messages that have been flushed. When a segment reaches a certain size, no more data is written to it and the broker creates a new segment.
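The segment layout described above can be sketched in a few lines. This is a simplified in-memory model, not Kafka's storage code: `SegmentedLog` and `max_segment_messages` are hypothetical names, segments are Python lists rather than files, offsets count messages rather than bytes, and flushing is left out. It shows the in-memory index of first offsets and how a lookup goes from offset to segment to position without a per-message mapping.

```python
import bisect


class SegmentedLog:
    """One partition: a list of segments plus an index of their first offsets."""

    def __init__(self, max_segment_messages: int = 3):
        self.max_segment_messages = max_segment_messages  # hypothetical roll threshold
        self.base_offsets = [0]  # in-memory index: first offset of each segment
        self.segments = [[]]     # each segment holds its messages in order
        self.next_offset = 0

    def append(self, message: str) -> int:
        if len(self.segments[-1]) >= self.max_segment_messages:
            # The active segment is full: stop writing to it and start a new one.
            self.base_offsets.append(self.next_offset)
            self.segments.append([])
        self.segments[-1].append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int) -> str:
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        # Find the last segment whose first offset is <= the requested offset.
        i = bisect.bisect_right(self.base_offsets, offset) - 1
        # Position inside the segment follows directly from the offset.
        return self.segments[i][offset - self.base_offsets[i]]


log = SegmentedLog(max_segment_messages=3)
for n in range(8):
    log.append(f"m{n}")

index = list(log.base_offsets)  # [0, 3, 6]
fourth = log.read(4)            # found in the second segment at position 1
```

The index holds one entry per segment, not one per message, so it stays small even for a long log; the binary search over it is the only lookup step before the direct positional read.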

4. Explicit distribution: there are multiple producers, brokers, and consumers, all of them distributed. There is no load-balancing mechanism between producers and brokers. ZooKeeper is used for load balancing between brokers and consumers.

All brokers and consumers register in ZooKeeper, which stores some of their metadata. If a broker or consumer changes, all other brokers and consumers are notified.
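What happens after such a notification can be illustrated with a small rebalancing sketch. ZooKeeper is not involved here: the membership list is passed in directly, whereas a real client would learn it from ZooKeeper. The function name and the contiguous-range strategy are assumptions chosen for illustration, not a statement of the exact algorithm Kafka uses.

```python
def assign_partitions(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)  # every member sorts the same way
    if not consumers:
        return {}
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition each.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Three consumers share six partitions; then "c2" disappears.
before = assign_partitions(range(6), ["c1", "c2", "c3"])
after = assign_partitions(range(6), ["c1", "c3"])
```

Because the function is deterministic and depends only on the sorted membership and partition lists, each consumer can run it independently after a change notification and arrive at the same assignment without talking to the others.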

That covers the main characteristics of Kafka. I hope the content above is of some help to you.
