2025-01-28 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article explains why you would use a distributed messaging middleware as powerful as Kafka. The content is concise and easy to understand, and I hope the introduction below gives you something to take away.
Why Kafka?
When we work with distributed databases and distributed computing clusters, we often run into requirements like these:
We want to analyze user behavior (page views) so that we can design better ad placements.
We want to collect statistics on users' search keywords and analyze current trends.
Some data would be wasteful to put in a database, yet writing it straight to disk is inefficient.
All of these scenarios have one thing in common:
Data is produced by an upstream module and consumed for computation, statistics, and analysis by a downstream module. That is exactly where a message system, especially a distributed message system, fits.
A messaging system is clearly necessary in a data processing pipeline, then, but why must it be Kafka? Kafka is hardly the only messaging system available today.
Introduction to Kafka
Kafka is an open-source messaging system created by LinkedIn and released in December 2010, designed mainly to handle active streaming data. Active streaming data is very common in web applications; it includes page views (Page View), information about the content being viewed, searches, and so on. This data is usually recorded as logs and then statistically analyzed at regular intervals.
Traditional log analysis systems process log information offline, so real-time processing usually involves a large delay. Existing message queue systems handle real-time or near-real-time applications well, but unconsumed data is usually not written to disk, which raises data-safety problems for offline applications, such as Hadoop jobs, that run at long intervals. Kafka was designed to solve both problems and serves offline and online applications alike.
Kafka deployment structure
A message queue (Message Queue, MQ) is, literally, a queue: FIFO, first in, first out, except that what the queue stores are messages. Its main purpose is communication between different processes and threads.
Key characteristics
High throughput: can sustain production and consumption of millions of messages per second.
Load balancing: the dynamic joining and leaving of Producers, Brokers, and Consumers is coordinated through ZooKeeper.
Pull model: because the Kafka broker persists data and is under no memory pressure, it is well suited to consumers pulling data at their own pace.
Dynamic scaling: when a broker node needs to be added, the new broker registers with ZooKeeper; producers and consumers notice the change through ZooKeeper and adjust promptly.
Message deletion policy: data files are deleted after a period of time set in the broker configuration. Kafka frees disk space in this simple way.
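The deletion policy above is time-based retention: data ages out regardless of whether anyone consumed it. Here is a minimal in-memory sketch of the idea; the `prune_expired` function and the `(timestamp, message)` log format are illustrative assumptions, not Kafka's actual segment-file implementation.

```python
import time

def prune_expired(log, retention_seconds, now=None):
    """Drop records older than the retention window.

    `log` is a list of (timestamp, message) tuples, oldest first,
    loosely mimicking how a broker deletes old data purely by age,
    regardless of whether it was ever consumed.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_seconds
    return [(ts, msg) for ts, msg in log if ts >= cutoff]

# With a 60-second window and "now" fixed at t=180, records older
# than t=120 are deleted.
log = [(100.0, "stale"), (150.0, "recent"), (170.0, "fresh")]
print(prune_expired(log, retention_seconds=60, now=180.0))
# → [(150.0, 'recent'), (170.0, 'fresh')]
```

Real Kafka applies this per log segment file (e.g. via `log.retention.hours`), deleting whole segments rather than individual records.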
Message sending and receiving process
Start ZooKeeper and the Broker.
A Producer connects to the Broker and publishes messages to a specified Topic on the Broker (a Partition may also be specified).
After receiving a message from the Producer, the Broker cluster persists it to disk and retains it for a configurable length of time, regardless of whether it has been consumed.
A Consumer connects to the Broker and starts a message pump that listens to the Broker. When a message arrives, the pump loop fetches it; once the message is obtained, the Consumer's message Offset is recorded in ZooKeeper.
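To make these steps concrete, here is a toy in-memory model of the flow. The `MiniBroker` class is an illustrative assumption, not Kafka code: a real broker persists the log on disk, and offsets live in ZooKeeper (or, in modern Kafka, an internal topic) rather than in the broker object itself.

```python
class MiniBroker:
    """Toy model of the publish/pull flow: messages are appended to a
    per-topic log and retained whether or not anyone consumes them;
    each consumer's position is tracked as an offset into that log."""

    def __init__(self):
        self.logs = {}      # topic -> list of messages (the persisted log)
        self.offsets = {}   # (topic, consumer) -> next offset to read

    def publish(self, topic, message):
        self.logs.setdefault(topic, []).append(message)

    def pull(self, topic, consumer, max_messages=10):
        offset = self.offsets.get((topic, consumer), 0)
        batch = self.logs.get(topic, [])[offset:offset + max_messages]
        self.offsets[(topic, consumer)] = offset + len(batch)
        return batch

broker = MiniBroker()
broker.publish("clicks", "page=/home")
broker.publish("clicks", "page=/search")
print(broker.pull("clicks", "analytics"))  # → ['page=/home', 'page=/search']
print(broker.pull("clicks", "analytics"))  # → [] (offset has advanced)
print(broker.pull("clicks", "billing"))    # independent consumer starts at 0
```

Note how consumption does not delete anything: the "billing" consumer still sees every message after "analytics" has read them, which is exactly why retention can be purely time-based.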
Kafka service
The Kafka service is like a large pool in which all kinds of messages are constantly produced, stored, and consumed. So what does Kafka consist of?
Broker: the Kafka message server, the message hub. One Broker can host multiple Topics.
Producer: the message producer, the client that sends messages to the Kafka broker.
Consumer: the message consumer, the client that fetches messages from the Kafka broker.
ZooKeeper: manages the dynamic joining and leaving of Producers, Brokers, and Consumers.
Topic: messages are divided into different topics, and the Topic is the topic name. A Producer produces to a topic, and a Consumer subscribes to a topic.
Consumer Group: Kafka distributes messages by broadcast across groups. When a group of Consumers consumes a Topic, ZooKeeper maintains a consumption Offset for that group; a newly joined Consumer can start consuming from the latest offset.
Partition: Kafka slices a Topic's data files into Partitions, so that a Topic can be stored distributed across multiple Brokers, and one Topic can be divided into multiple Partitions. Simultaneous access to a single partition by multiple Consumers is controlled with a synchronization lock.
The future of Kafka middleware
At present this middleware stack covers only first-stage functionality; many features are incomplete or shallow. As application workloads expand and future Kafka versions add support, a big-data processing platform centered on Kafka messaging middleware still has many goals to reach.
In general, the data flowing across the Internet falls into the following types:
Transactional data that needs a real-time response, such as a user submitting a form or entering some content. This data ultimately lands in a relational database (Oracle, MySQL), and part of it needs transaction support.
Activity stream data, which is near-real-time: page views, user behavior, searches, and so on. It can drive broadcasting, ranking, personalized recommendation, operational monitoring, and more. Such data is generally first written to files by front-end servers and then loaded in batches into a big-data analyzer such as Hadoop (an offline data analysis platform) for slower analysis.
Logs generated by programs at every level, such as HTTP logs, Tomcat logs, and logs from various other programs. Part of this data is used for monitoring and alerting, and part for analysis.
The above explains why a distributed messaging middleware as powerful as Kafka is worth using. Have you picked up some knowledge or skills? If you want to learn more or enrich your knowledge, you are welcome to follow the industry information channel.