Advantages and disadvantages of important concepts in the basic principles of Kafka

2026-02-08 Update. From: SLTechnology News&Howtos

Thursday, 2019-02-21

Official website: https://kafka.apache.org/

Download: http://archive.apache.org/dist/kafka/

Introduction to Kafka

Kafka is a distributed message queue, a message caching system designed for log processing. Log data is large in volume but does not demand high reliability. It mainly consists of user behavior (logins, page views, clicks, shares, likes) and system operation logs (CPU, memory, disk, network, system and process status).

Many existing message queue services guarantee reliable delivery and assume by default that messages are consumed immediately, which makes them a poor fit for offline processing. Logs do not need highly reliable delivery, so performance can be improved by relaxing reliability. By building a distributed cluster, Kafka also allows messages to accumulate in the system, so it supports both offline and online log processing.

Supporting offline and real-time processing together is one of Kafka's design goals. A real-time streaming system such as Storm can process messages online as they arrive, a batch system such as Hadoop can process them offline, and the data can be backed up to another data center in real time, all at once. The only requirement is that the consumers used for these three tasks belong to different consumer groups.
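The point about different consumer groups can be illustrated with a small toy model. This is only a sketch in plain Python, not the Kafka client API: the Topic class and the group names (storm-realtime, hadoop-batch, dc-backup) are made up for illustration. It shows the assumption behind the paragraph above: messages stay in the log after being read, and each group keeps its own read position, so every group sees every message.

```python
# Toy model: one append-only log, one read position per consumer group.
class Topic:
    def __init__(self):
        self.log = []          # messages are retained after being read
        self.positions = {}    # group name -> next offset to read

    def publish(self, message):
        self.log.append(message)

    def poll(self, group, max_messages=10):
        """Return the next unread messages for `group` and advance its position."""
        start = self.positions.get(group, 0)
        batch = self.log[start:start + max_messages]
        self.positions[group] = start + len(batch)
        return batch


topic = Topic()
for event in ["login", "click", "share"]:
    topic.publish(event)

# Three hypothetical groups: each one receives the full stream independently.
realtime = topic.poll("storm-realtime")
offline = topic.poll("hadoop-batch")
backup = topic.poll("dc-backup")
```

Because the read position belongs to the group and not to the log, a slow offline job does not hold back the real-time reader.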

Kafka architecture

1. Every server in a Kafka cluster is called a broker.

2. Kafka has two types of clients: producers, which publish messages, and consumers, which read them. Clients connect to the broker servers over TCP.

3. Kafka distinguishes messages from different business systems by topic, and each topic is partitioned to spread the load of reading and writing messages.

4. Each partition can have multiple replicas to prevent data loss.

5. Any update to the data in a partition must go through the leader among that partition's replicas.

6. Consumers can be organized into consumer groups. Within one group, each message of a topic is delivered to only one consumer, so if consumers A and B belong to the same group and read the topic order_info, the messages they consume do not overlap.

For example, suppose order_info holds 100 messages, each with an id from 0 to 99. A might consume ids 0-49 while B consumes ids 50-99. (The split follows partitions: each consumer reads the partitions assigned to it.)

In a production environment, several consumers can also each consume all of the data in the same topic. To achieve this, place them in different consumer groups by giving each its own group id.

7. Consumers can specify the starting offset when consuming messages in a certain topic.
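Points 6 and 7 can be sketched in a few lines. The following is a simplified model, not Kafka's actual assignment protocol: it assumes consumers A and B share one group, that order_info has two partitions holding ids 0-49 and 50-99, and that partitions are handed out round-robin. The function names are hypothetical.

```python
def assign_partitions(consumers, num_partitions):
    """Spread the partitions of a topic over the consumers of one group."""
    assignment = {name: [] for name in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def consume(partition_log, start_offset=0):
    """Return (offset, message) pairs beginning at the requested offset."""
    return [(offset, partition_log[offset])
            for offset in range(start_offset, len(partition_log))]


# Hypothetical layout: order_info has 2 partitions holding ids 0-49 and 50-99.
order_info = [list(range(0, 50)), list(range(50, 100))]
assignment = assign_partitions(["A", "B"], num_partitions=len(order_info))

seen_by_a = [msg for p in assignment["A"] for _, msg in consume(order_info[p])]
seen_by_b = [msg for p in assignment["B"] for _, msg in consume(order_info[p])]

# Point 7: a consumer may start from a chosen offset instead of the beginning.
replay = consume(order_info[0], start_offset=47)
```

Each partition has exactly one owner inside the group, which is why the two consumers never see the same message.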

Why use Kafka

1. As a cache

2. Decoupling

3. Low latency: under 10 ms, which is effectively real time

It can simplify system design and improve a company's development speed and efficiency.

Why use a message system

Decoupling

At the start of a project it is extremely difficult to predict what requirements will come up later. A message system inserts an implicit, data-based interface layer into the middle of the processing pipeline, and the components on both sides implement that interface. This lets you extend or modify either side independently, as long as both keep to the same interface constraints.

Redundancy

Sometimes the process that handles data fails. Unless the data has been persisted, it is lost. A message queue persists data until it has been fully processed, which avoids that risk. In the insert-get-delete paradigm used by many message queues, your processing system must explicitly indicate that a message has been processed before it is deleted from the queue, so the data stays safely stored until you have finished with it.
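The insert-get-delete paradigm can be sketched as a small in-memory queue. This is an illustrative model only, with a hypothetical AckQueue class; it describes the generic pattern, not Kafka, which retains messages in its log and tracks consumer offsets instead of deleting on acknowledgement.

```python
import itertools


class AckQueue:
    """Messages stay stored until the consumer acknowledges them."""

    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}      # id -> message, waiting to be fetched
        self._in_flight = {}    # id -> message, fetched but not yet acknowledged

    def insert(self, message):
        msg_id = next(self._ids)
        self._pending[msg_id] = message
        return msg_id

    def get(self):
        """Hand out the oldest pending message without deleting it."""
        if not self._pending:
            return None
        msg_id = next(iter(self._pending))   # dicts preserve insertion order
        self._in_flight[msg_id] = self._pending.pop(msg_id)
        return msg_id, self._in_flight[msg_id]

    def ack(self, msg_id):
        """Delete a message only after the consumer confirms processing."""
        del self._in_flight[msg_id]

    def requeue_unacked(self):
        """Simulate recovery after a consumer crash: unacknowledged messages return."""
        recovered = dict(sorted(self._in_flight.items()))
        recovered.update(self._pending)
        self._pending = recovered
        self._in_flight = {}
```

If a consumer fetches a message and crashes before calling ack, requeue_unacked makes the same message available again, so nothing is lost.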

Scalability

Because a message queue decouples your processing, it is easy to increase the rate at which messages are enqueued and processed: just add more processing instances. No code changes or parameter tuning are needed. Scaling up is as simple as turning up a dial.

Flexibility & peak processing power

An application must keep working when traffic rises sharply, but such bursts are uncommon, and keeping resources on standby just to handle peak load would be a huge waste. A message queue lets key components absorb sudden pressure instead of collapsing completely under a burst of requests.

Recoverability

When one component of the system fails, the system as a whole is not affected. A message queue reduces the coupling between processes, so even if a process that handles messages dies, the messages already in the queue can still be processed once the system recovers.

Ordering guarantee

In most usage scenarios the order in which data is processed matters. Most message queues are inherently ordered and guarantee that data is processed in a specific order. Kafka guarantees the ordering of messages within a partition.
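A common way to benefit from per-partition ordering is to route all messages with the same key to the same partition. The sketch below illustrates that idea under stated assumptions: CRC32 stands in for whatever hash the real partitioner uses, and the keys and events are invented.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events, num_partitions=3):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [
    ("user-1", "login"), ("user-2", "login"), ("user-1", "click"),
    ("user-2", "share"), ("user-1", "logout"),
]
partitions = produce(events)


def values_for(key, num_partitions=3):
    """Read back one key's events from the single partition that holds them."""
    part = partitions[partition_for(key, num_partitions)]
    return [value for k, value in part if k == key]
```

Events for one key always sit in one partition in the order they were produced. No order is implied between events that land in different partitions.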

Buffer

Any non-trivial system contains steps that take different amounts of time. For example, loading an image takes less time than applying a filter to it. A message queue acts as a buffer layer that lets each task run as efficiently as possible: writes to the queue complete as quickly as they can, regardless of how fast the reader is. This buffer helps control and optimize the speed at which data flows through the system.

Asynchronous communication

In many cases, users do not want or need to process messages immediately. Message queuing provides an asynchronous processing mechanism that allows users to put a message on the queue, but does not process it immediately. Put as many messages as you want in the queue, and then process them when needed.
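The buffering and asynchronous behavior described above can be demonstrated with Python's standard queue and threading modules. This is a minimal in-process sketch, not a distributed queue; run_pipeline and the upper-casing step are placeholders for real work.

```python
import queue
import threading


def run_pipeline(items):
    """Producer puts items on a buffer; a worker thread processes them later."""
    buffer = queue.Queue()      # unbounded buffer between the two stages
    results = []

    def consumer():
        while True:
            item = buffer.get()
            if item is None:    # sentinel: the producer has finished
                break
            results.append(item.upper())   # stand-in for slow processing

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        buffer.put(item)        # returns immediately; the producer never waits
    buffer.put(None)
    worker.join()
    return results
```

The producer finishes enqueueing at its own pace, and the consumer drains the buffer at its own pace, in the order the items were put.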

Important concepts of Kafka

Several important Kafka concepts are introduced below.

Broker: a message middleware processing node. Each Kafka server node is a broker, and multiple brokers form a Kafka cluster.

Topic: a category of messages. Page view logs, click logs and so on can each exist as a topic. A Kafka cluster can handle the distribution of multiple topics at the same time.

Partition: a physical grouping within a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue.

Segment: each partition is made up of multiple segment files.

Offset: each partition consists of an ordered, immutable sequence of messages to which new messages are continuously appended. Every message in a partition has a consecutive sequence number called the offset, which uniquely identifies the message within that partition.

Message: the smallest unit of storage in a Kafka file, that is, one entry in the commit log.
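The relationship between partition, segment and offset can be modeled with a small class. This is a simplified in-memory sketch under stated assumptions: segments here roll after a fixed number of messages, whereas Kafka rolls them by size or time, and PartitionLog is a hypothetical name.

```python
import bisect


class PartitionLog:
    """Append-only log split into segments, with messages addressed by offset."""

    def __init__(self, segment_size=3):
        self.segment_size = segment_size
        self.base_offsets = []   # first offset stored in each segment
        self.segments = []       # one list of messages per segment

    def next_offset(self):
        if not self.segments:
            return 0
        return self.base_offsets[-1] + len(self.segments[-1])

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        if not self.segments or len(self.segments[-1]) == self.segment_size:
            # Roll a new segment; it is identified by its first offset.
            self.base_offsets.append(self.next_offset())
            self.segments.append([])
        offset = self.next_offset()
        self.segments[-1].append(message)
        return offset

    def read(self, offset):
        """Find the segment containing `offset` by binary search, then index into it."""
        if not 0 <= offset < self.next_offset():
            raise IndexError(offset)
        i = bisect.bisect_right(self.base_offsets, offset) - 1
        return self.segments[i][offset - self.base_offsets[i]]
```

Because each segment is identified by its base offset, locating a message takes one binary search over the segment list followed by a direct index.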

Fields shown when monitoring the consumption status of a topic:

Topic: the name of the topic

Partition: the partition number

Offset: how many messages have been consumed from the partition

Logsize: how many messages have been produced to the partition

Lag: how many messages have not yet been consumed (Logsize minus Offset)

Owner: the consumer that owns the partition

Create: the time the partition was created

Last seen: the most recent time the consumption status was updated
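The relationship between Offset, Logsize and Lag is simple arithmetic, sketched below. The PartitionStatus record and the sample numbers are invented for illustration and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class PartitionStatus:
    topic: str
    partition: int
    offset: int     # messages consumed so far
    logsize: int    # messages produced so far

    @property
    def lag(self):
        """Messages produced but not yet consumed."""
        return self.logsize - self.offset


def total_lag(rows):
    """Sum the lag over all partitions of a topic."""
    return sum(row.lag for row in rows)


# Hypothetical status rows for a two-partition topic.
rows = [
    PartitionStatus("order_info", 0, offset=40, logsize=50),
    PartitionStatus("order_info", 1, offset=50, logsize=50),
]
```

A lag that keeps growing means consumers are falling behind producers on that partition.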

Advantages of Kafka:

Kafka message queue features: https://blog.csdn.net/qq_36236890/article/details/81174504

1. Single-machine throughput:

At the 100,000 level (roughly 100,000 messages per second on one machine). High throughput is Kafka's biggest advantage. It is generally paired with big data systems for scenarios such as data computation and log collection.

2. Effect of the number of topics on throughput:

As the number of topics grows from dozens to hundreds, throughput drops sharply. On a given machine, keep the number of topics modest to preserve throughput; supporting topics at large scale requires more cluster resources.

3. Timeliness:

Latency stays at the millisecond level.

4. Availability:

Very high. Kafka is distributed and keeps multiple replicas of each piece of data, so the failure of a few machines neither loses data nor makes the service unavailable.

5. Message reliability:

With properly tuned parameters, zero message loss can be achieved.
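The article does not say which parameters it means. As an illustration only, the settings below are commonly used to favor durability; the keys are standard Kafka configuration names, while the values and the replication factor of 3 are assumptions, written here as Python dictionaries for readability.

```python
# Durability-oriented settings (assumed values, not taken from the article).
producer_config = {
    "acks": "all",                 # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,    # avoid duplicates introduced by producer retries
    "retries": 2147483647,         # keep retrying transient send failures
}

topic_config = {
    "min.insync.replicas": 2,                   # replicas that must confirm a write
    "unclean.leader.election.enable": False,    # never elect an out-of-sync replica
}

consumer_config = {
    "enable.auto.commit": False,   # commit offsets only after processing succeeds
}

REPLICATION_FACTOR = 3   # assumed number of replicas per partition


def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Replicas that can be lost while acks=all writes are still accepted."""
    return replication_factor - min_insync_replicas
```

With three replicas and min.insync.replicas set to 2, one replica can be down and writes still succeed; losing a second one makes the producer fail rather than silently lose data.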

6. Functional support

The feature set is fairly simple and mainly covers basic MQ functionality. Kafka is used at large scale for real-time computing and log collection in the big data field, where it is the de facto standard.

7. Summary of advantages and disadvantages

Kafka's character is very clear: it offers relatively few core features, but delivers very high throughput, millisecond-level latency, and high availability and reliability, and because it is distributed it can be scaled out freely. Kafka is best run with a small number of topics so that throughput is preserved. Its one disadvantage is that messages may be consumed more than once, which affects data accuracy; in the big data field and in log collection this impact can be ignored.
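Where repeated consumption cannot be ignored, a common remedy is to make the consumer idempotent. The sketch below assumes each message carries a unique id, which the article does not state, and uses a hypothetical IdempotentHandler class that remembers which ids it has already applied.

```python
class IdempotentHandler:
    """Apply each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen = set()    # ids of messages already applied
        self.total = 0       # example state updated by the messages

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.total += amount
        return True
```

In a real system the set of seen ids would live in durable storage and be updated together with the result, so that a crash between the two steps cannot reintroduce duplicates.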

These characteristics make Kafka a natural fit for real-time computing and log collection in big data.

Kafka is inherently a distributed message queue. A cluster is made up of multiple brokers, each of which is a node. When you create a topic, it can be divided into multiple partitions; each partition can live on a different broker and holds part of the topic's data.
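How partitions and their replicas might be spread over brokers, and what happens when a broker fails, can be sketched as follows. This is a simplified round-robin model with hypothetical broker names; Kafka's real placement and leader election are more involved.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Give each partition a leader and followers on consecutive brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    placement = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + r) % len(brokers)]
                    for r in range(replication_factor)]
        placement[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


def fail_over(placement, dead_broker):
    """Drop the dead broker and promote a surviving follower where it was leader."""
    for info in placement.values():
        info["followers"] = [b for b in info["followers"] if b != dead_broker]
        if info["leader"] == dead_broker:
            # None means no replica survived, so the partition is unavailable.
            info["leader"] = info["followers"].pop(0) if info["followers"] else None
    return placement
```

For example, place_replicas(3, ["b0", "b1", "b2"], 2) puts one leader on each broker, and after fail_over(placement, "b0") every partition still has a live leader because its other replica takes over.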
