
Kafka analysis and stand-alone usage notes


In this issue, the editor brings you an analysis of Kafka and notes from using it on a single machine. The article is rich in content and described from a practical point of view; I hope you get something out of reading it.

1. The system environment used

root@heidsoft:~# uname -a

Linux heidsoft 4.4.0-63-generic #84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 2017 x86_64 GNU/Linux

2. JDK environment

root@heidsoft:~# java -version

java version "1.8.0_131"

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

3. Software version environment

4. Configuration file environment

Kafka: server.properties default configuration

Zookeeper: zoo.cfg default configuration

5. Start the application

Zookeeper: sh zkServer.sh start

Kafka: bin/kafka-server-start.sh config/server.properties &

6. Kafka test

Producer test: sending messages

Echo "Hello, World" | bin/kafka-console-producer.sh-- broker-list localhost:9092-- topic TutorialTopic > / dev/null

Consumer test: receiving messages

bin/kafka-console-consumer.sh --new-consumer --topic TutorialTopic --from-beginning --bootstrap-server localhost:9092
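
As a programmatic counterpart to the console test above, a minimal Java producer might look like the following sketch. It assumes a kafka-clients dependency matching the installed broker is on the classpath and reuses the TutorialTopic topic from the commands above; if auto.create.topics.enable is disabled on the broker, the topic would have to be created first with bin/kafka-topics.sh.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HelloProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address used by the console tests above
        props.put("bootstrap.servers", "localhost:9092");
        // Keys and values are sent as plain strings
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same payload as the console producer test
            producer.send(new ProducerRecord<>("TutorialTopic", "Hello, World"));
            producer.flush();
        }
    }
}

Compiling this against the kafka-clients jar and running it while the console consumer above is active should show the message arriving.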

7. Show demo

8. Basic concepts

Broker

A Kafka cluster contains one or more servers, which are called brokers.

Topic

Every message published to the Kafka cluster has a category, which is called its Topic. (Physically, messages with different Topics are stored separately. Logically, the messages of one Topic may be stored on one or more brokers, but users only need to specify the Topic to produce or consume data, regardless of where the data is physically stored.)

Partition

Partition is a physical concept. Each Topic contains one or more Partitions.

Producer

Responsible for publishing messages to Kafka brokers.

Consumer

Message consumer: the client that reads messages from Kafka brokers.

Consumer Group

Each Consumer belongs to a specific Consumer Group (you can specify a group name for each Consumer; if you do not specify one, the Consumer belongs to the default group).
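
To make the Consumer Group idea concrete, the following sketch (assuming the same kafka-clients dependency as above; the group name demo-group is made up for illustration) subscribes to TutorialTopic with an explicit group.id. Starting a second copy of the same program with the same group.id would cause the topic's partitions to be divided between the two instances, while a copy started with a different group.id would receive every message again.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // All instances sharing this group.id form one Consumer Group
        props.put("group.id", "demo-group"); // hypothetical group name
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Read from the beginning of the log, like --from-beginning in the console test
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("TutorialTopic"));
            while (true) {
                // Long-based poll, as used by clients of the era this article describes
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}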

9. Framework overview

Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn and later became an Apache project. Kafka is a distributed, partitioned, replicated, persistent log service. It is mainly used to handle active streaming data.

In big data systems we often run into the problem that the whole platform is composed of various subsystems, and data needs to flow continuously between them with high throughput and low latency. Traditional enterprise messaging systems are not well suited to large-scale data processing. Kafka emerged to handle online applications (messages) and offline applications (data files, logs) at the same time. Kafka serves two purposes:

Reduce the complexity of connecting systems to each other.

Reduce programming complexity: subsystems no longer negotiate interfaces with each other; each subsystem simply plugs into Kafka like a plug into a socket, and Kafka takes on the role of a high-speed data bus.

10. Framework characteristics

Provides high throughput for both publishing and subscribing. Reportedly, Kafka can produce about 250,000 messages per second (50 MB) and consume about 550,000 messages per second (110 MB).

Supports persistence. Messages are persisted to disk, so they can be used both for batch consumption (such as ETL) and for real-time applications. Data loss is prevented by persisting data to disk and by replication.

A distributed system that is easy to scale out. There can be multiple producers, brokers, and consumers, all distributed. Machines can be added without downtime.

The processing state of a message is maintained on the client side, not on the server side, and rebalancing happens automatically on failure.

Supports both online and offline scenarios.

12. Kafka topology

A typical Kafka cluster contains several producers (which can be page views generated by a web front end, server logs, system CPU and memory metrics, and so on), several brokers (Kafka supports horizontal scaling; generally, the more brokers, the higher the cluster throughput), several Consumer Groups, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage cluster configuration, elect leaders, and rebalance when a Consumer Group changes. Producers publish messages to brokers using the push pattern, and consumers subscribe to and consume messages from brokers using the pull pattern.

13. The design of Kafka

1. Throughput

High throughput is one of Kafka's core goals. To achieve it, Kafka makes the following design choices:

Data persistence on disk: messages are not cached in memory but are written directly to disk, making full use of the sequential read and write performance of disks.

Zero-copy: reduces the number of I/O steps.

Batch sending of data

Data compression (batch-sending and compression settings are illustrated in the producer sketch after this list)

A Topic is divided into multiple partitions to improve parallelism.
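
To illustrate the batch-sending and compression points in the list above, the producer exposes configuration keys such as batch.size, linger.ms, and compression.type. The values below are purely illustrative, not tuning recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Batch sending: collect up to 64 KB per partition before a network send
        props.put("batch.size", "65536");
        // Wait up to 5 ms so batches have a chance to fill
        props.put("linger.ms", "5");
        // Compress each batch on the wire and on disk (gzip, snappy, lz4 are common choices)
        props.put("compression.type", "snappy");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("TutorialTopic", "message-" + i));
            }
        }
    }
}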

Load balancing

The producer sends each message to a partition chosen by a user-specified algorithm (see the partitioner sketch after this list).

There are multiple partitions; each partition has its own replicas, and the replicas are distributed across different broker nodes.

Among a partition's replicas, a lead partition must be elected; the lead partition handles reads and writes, and ZooKeeper is responsible for failover.

The dynamic joining and leaving of brokers and consumers is managed through ZooKeeper.
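
For the first point in the list above, the Java client lets the user plug in the partition-selection algorithm by implementing the Partitioner interface and registering it through the partitioner.class producer property. The class name and the hashing rule below are illustrative only:

import java.util.Arrays;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Illustrative partitioner: keyed messages are spread across partitions by key hash.
public class SimpleKeyPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null || numPartitions == 0) {
            return 0; // no key (or no metadata yet): fall back to partition 0 in this sketch
        }
        // Keep the hash non-negative, then map it onto the available partitions
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

// Registered on the producer with the fully qualified class name, e.g.:
//   props.put("partitioner.class", SimpleKeyPartitioner.class.getName());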

Pull system

Because the Kafka broker persists data and has no memory pressure, consumers are well suited to pulling data, which has the following benefits:

Simplifies Kafka's design.

Consumers independently control the speed at which messages are pulled, according to their consumption capacity.

Consumers choose their own consumption mode according to their own needs, such as batch consumption, repeated consumption, or consumption from the end of the log (see the consumer sketch after this list).
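
The consumer sketch below illustrates the last two points: max.poll.records caps how many messages one poll returns, and seekToEnd starts consumption from the end of the log. It assumes a reasonably recent kafka-clients library (the Collection-based seekToEnd appeared in 0.10.1) and the same TutorialTopic as before:

import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class TailConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // The consumer controls its own pace: at most 100 records per poll
        props.put("max.poll.records", "100");
        // Offsets are not committed in this illustrative tail reader
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manual assignment of partition 0 so we can seek explicitly
            List<TopicPartition> partitions =
                    Collections.singletonList(new TopicPartition("TutorialTopic", 0));
            consumer.assign(partitions);
            consumer.seekToEnd(partitions); // "consumption from the end": skip existing messages
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}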

Scalability

When a broker node needs to be added, the new broker registers itself with ZooKeeper, and producers and consumers perceive these changes through the watchers they have registered on ZooKeeper and adjust in time.
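
As an illustration of this mechanism, Kafka brokers register ephemeral nodes under the /brokers/ids path in ZooKeeper, and a client can watch that path for changes. The sketch below uses the plain ZooKeeper Java client (assuming the org.apache.zookeeper:zookeeper library is available; the session timeout value is arbitrary):

import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class BrokerWatcher {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper instance used by the Kafka broker
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                // Fired for session events and for changes on watched paths
                System.out.println("ZooKeeper event: " + event);
            }
        });

        // Kafka brokers register ephemeral nodes under /brokers/ids;
        // passing true installs the default watcher above for change notifications.
        List<String> brokerIds = zk.getChildren("/brokers/ids", true);
        System.out.println("Currently registered broker ids: " + brokerIds);

        Thread.sleep(Long.MAX_VALUE); // keep the process alive to receive watch events
    }
}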

Application scenarios of Kafka

1. Message queue

Compared with most messaging systems, Kafka has better throughput, built-in partitioning, redundancy, and fault tolerance, which makes it a good solution for large-scale message processing applications. Messaging systems generally have relatively low throughput but require lower end-to-end latency, and they benefit from the strong durability guarantees that Kafka provides. In this area, Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

2. Behavior tracking

Another application scenario for Kafka is tracking users' browsing, searching, and other behaviors, which are recorded in real time to the corresponding topics in a publish-subscribe manner. Once subscribers obtain these events, they can process them further in real time, monitor them in real time, or load them into Hadoop or an offline data warehouse for processing.

3. Meta-information monitoring

Kafka is used as a monitoring module for operational records, that is, to collect and record operational information; this can be understood as operations-and-maintenance-style data monitoring.

4. Log collection

For log collection there are actually many open source products, including Scribe and Apache Flume, and many people use Kafka for log aggregation instead. Log aggregation generally collects log files from servers and puts them in a centralized location (a file server or HDFS) for processing. Kafka, however, ignores the details of individual files and abstracts them more cleanly into a stream of log or event messages. This gives Kafka lower processing latency and makes it easier to support multiple data sources and distributed data consumption. Compared with log-centric systems such as Scribe or Flume, Kafka offers equally good performance, stronger durability thanks to replication, and lower end-to-end latency.

5. Stream processing

This scenario is perhaps the most common and the easiest to understand: collected stream data is saved so that it can later be handed to Storm or another stream computing framework for processing. Many users periodically process, summarize, enrich, or otherwise transform data from an original topic into a new topic before continuing with later processing. For example, the processing flow for article recommendations might grab article content from an RSS data source and throw it into a topic called "article"; subsequent steps might clean the content, for example normalizing the data or removing duplicates, and finally return the matching results to the user. This creates a chain of real-time data processing stages beyond the single original topic. Storm and Samza are well-known frameworks for implementing this type of data transformation.
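
A minimal version of such a pipeline can be built from the plain consumer and producer APIs already shown, reading from one topic, transforming each record, and writing to another. The topic names article and article-clean and the trivial "cleaning" step below are illustrative only:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ArticleCleaner {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092");
        cProps.put("group.id", "article-cleaner"); // hypothetical group name
        cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("article"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // "Cleaning" step: plain whitespace trimming as a stand-in
                    String cleaned = record.value() == null ? "" : record.value().trim();
                    producer.send(new ProducerRecord<>("article-clean", record.key(), cleaned));
                }
            }
        }
    }
}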

6. Event sourcing

Event sourcing is an application design approach in which state changes are recorded as a time-ordered sequence of records. Kafka can store very large amounts of log data, which makes it an excellent backend for applications designed this way, such as a dynamic news feed.

7. Persistent log (commit log)

Kafka can serve as an external persistent log for a distributed system. Such a log can replicate data between nodes and act as a resynchronization mechanism for recovering data on failed nodes. Kafka's log compaction feature supports this usage. In this role, Kafka is similar to the Apache BookKeeper project.

The above are the Kafka analysis and stand-alone usage notes shared by the editor. If you happen to have similar questions, you may find the analysis above useful for understanding them.
