In this issue, the editor brings you an analysis of Kafka along with stand-alone usage notes. The article is rich in content and analyzed from a professional point of view. I hope you get something out of it after reading.
1. The system environment used
root@heidsoft:~# uname -a
Linux heidsoft 4.4.0-63-generic #84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 2017 x86_64 GNU/Linux
2.JDK environment
root@heidsoft:~# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
3. Software version environment
4. Configuration file environment
Kafka: server.properties default configuration
Zookeeper: zoo.cfg default configuration
5. Start the application
Zookeeper: sh zkServer.sh start
Kafka: bin/kafka-server-start.sh config/server.properties &
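A quick way to verify that both JVMs are running is jps, which ships with the JDK; a minimal check (the PIDs below are illustrative):
jps -l
# 12345 kafka.Kafka
# 12346 org.apache.zookeeper.server.quorum.QuorumPeerMain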
6.kafka test
Producer test: sending messages
Echo "Hello, World" | bin/kafka-console-producer.sh-- broker-list localhost:9092-- topic TutorialTopic > / dev/null
Consumer test: receiving messages
bin/kafka-console-consumer.sh --new-consumer --topic TutorialTopic --from-beginning --bootstrap-server localhost:9092
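If automatic topic creation is disabled on the broker, the topic must be created before producing; a minimal sketch using the 0.10-era --zookeeper flag (the partition and replication counts are illustrative):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic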
7. Show demo
8. Basic concepts
Broker
A Kafka cluster contains one or more servers, which are called brokers.
Topic
Every message published to the Kafka cluster has a category, which is called its Topic. (Physically, messages with different Topics are stored separately. Logically, a Topic's messages are stored on one or more brokers, but users only need to specify a message's Topic to produce or consume data, regardless of where the data is actually stored.)
Partition
Partition is a physical concept. Each Topic contains one or more Partitions.
Producer
Responsible for publishing messages to the Kafka broker.
Consumer
Message consumer: the client that reads messages from the Kafka broker.
Consumer Group
Each Consumer belongs to a specific Consumer Group (you can specify a group name for each Consumer; if you do not specify one, it belongs to the default group).
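Consumer groups can be listed and inspected with the stock tooling; a sketch using the 0.10-era syntax to match section 6 ("my-group" is a hypothetical group name, and newer Kafka versions drop the --new-consumer flag):
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --list
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --describe --group my-group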
9. Framework overview
Kafka is a distributed publish-subscribe messaging system. It was originally developed by LinkedIn and later became part of the Apache project. Kafka is a distributed, partitioned, redundantly replicated persistent logging service, used mainly to handle active streaming data.
In big data systems, we often run into a problem: the whole system is composed of various subsystems, and data needs to flow between them continuously with high performance and low latency. Traditional enterprise messaging systems are not well suited to large-scale data processing. Kafka emerged to handle both online applications (messages) and offline applications (data files, logs) at the same time. Kafka serves two purposes:
Reduce the complexity of wiring systems together.
Reduce programming complexity: instead of each subsystem negotiating interfaces with every other subsystem, each subsystem simply plugs into Kafka like a plug into a socket, with Kafka acting as a high-speed data bus.
10. Framework characteristics
Provides high throughput for both publishing and subscribing. Reportedly, Kafka can produce about 250,000 messages (50 MB) per second and process 550,000 messages (110 MB) per second.
Supports persistence. Messages are persisted to disk, so they can serve batch consumers, such as ETL jobs, as well as real-time applications. Persisting data to disk, combined with replication, prevents data loss (see the sketch after this list).
A distributed system that scales out easily. There are multiple producers, brokers, and consumers, all distributed, and machines can be added without downtime.
The state of message processing is maintained on the client side, not the server side, and consumption rebalances automatically on failure.
Supports both online and offline scenarios.
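As a small illustration of the persistence point above: with the default server.properties, messages are written under /tmp/kafka-logs, one directory per topic partition (the path assumes the default log.dirs setting):
ls /tmp/kafka-logs/TutorialTopic-0/
# 00000000000000000000.index  00000000000000000000.log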
12. Kafka topology
A typical Kafka cluster contains several Producers (which can be page views generated by the web front end, server logs, system CPU and memory metrics, and so on), several brokers (Kafka supports horizontal scaling; generally, the more brokers, the higher the cluster throughput), several Consumer Groups, and a Zookeeper cluster. Kafka manages the cluster configuration through Zookeeper, elects leaders, and rebalances when a Consumer Group changes. Producers publish messages to brokers in push mode; Consumers subscribe to brokers and consume messages in pull mode.
13. The design of Kafka
1. Throughput
High throughput is one of Kafka's core goals, and for this reason Kafka makes the following design choices:
Data persistence on disk: messages are not cached in memory but written directly to disk, taking full advantage of the disk's sequential read/write performance
Zero-copy: reduces the number of IO copy steps
Batch sending of data
Data compression
A Topic is divided into multiple partitions to improve parallelism
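The partition count is set when a topic is created; a sketch (the topic name "ParallelTopic" and the counts are illustrative):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic ParallelTopic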
Load balancing
The Producer sends each message to a specific partition according to a user-specified algorithm
There are multiple partitions, each partition has its own replicas, and the replicas are distributed across different broker nodes
Among a partition's replicas, a lead partition is elected; the lead partition handles reads and writes, and Zookeeper is responsible for failover (see the sketch after this list)
Zookeeper manages the dynamic joining and leaving of brokers and consumers
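Partition leaders and replica placement can be inspected with the stock tooling; a sketch against the TutorialTopic from section 6 (the commented line shows the output shape; actual values will vary):
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic TutorialTopic
# Topic: TutorialTopic  Partition: 0  Leader: 0  Replicas: 0  Isr: 0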
Pull system
Because the Kafka broker persists data and is under no memory pressure, a pull-based consumer fits very well, with the following benefits:
Simplifies Kafka's design
The consumer independently controls the rate at which messages are pulled, according to its consumption capacity
The consumer chooses its own consumption mode according to its own situation, such as batch consumption, repeated consumption, or consuming from the end of the log
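With the console consumer this choice is just a flag; a sketch using the 0.10-era syntax from section 6:
# replay the whole log from the beginning
bin/kafka-console-consumer.sh --new-consumer --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
# or omit --from-beginning to receive only messages that arrive from now on
bin/kafka-console-consumer.sh --new-consumer --bootstrap-server localhost:9092 --topic TutorialTopic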
Scalability
When broker nodes need to be added, the new broker registers itself with Zookeeper, and producers and consumers perceive the change through the watchers they have registered on Zookeeper and adjust in time.
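Brokers register themselves as ephemeral nodes under /brokers/ids in Zookeeper; a quick way to see the registered broker ids is the zkCli.sh shell that ships with Zookeeper (a sketch; the [0] output assumes a single broker with id 0):
sh zkCli.sh -server localhost:2181
# then, inside the shell:
# ls /brokers/ids
# [0]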
Application scenarios of Kafka
1. Message queue
Compared with most messaging systems, Kafka has better throughput, built-in partitioning, redundancy, and fault tolerance, which makes it a good solution for large-scale message processing applications. Messaging applications generally have relatively low throughput, but they require low end-to-end latency and depend on the strong persistence guarantees that Kafka provides. In this area, Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.
2. Behavior tracking
Another Kafka scenario is tracking users' browsing, searching, and other behaviors, which are recorded in real time to the corresponding topics in a publish-subscribe fashion. Once subscribers obtain these messages, they can process them further in real time, monitor them in real time, or load them into Hadoop or an offline data warehouse for processing.
3. Meta-information monitoring
Kafka is used as a monitoring module for operational records, that is, to collect and record operational information; think of it as monitoring data of an operations-and-maintenance nature.
4. Log collection
For log collection, there are many open-source products, including Scribe and Apache Flume. Many people use Kafka for log aggregation instead. Log aggregation typically collects log files from servers and puts them in a centralized location (a file server or HDFS) for processing. Kafka, however, abstracts away the details of files and treats logs or events more cleanly as a stream of messages. This gives Kafka lower processing latency and makes it easier to support multiple data sources and distributed data consumption. Compared with log-centric systems such as Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees thanks to replication, and lower end-to-end latency.
5. Stream processing
This scenario may be the most common and the easiest to understand. The collected stream data is stored so that it can later be handed to Storm or another stream computing framework for processing. Many users periodically process, summarize, expand, or otherwise transform data from an original topic into a new topic before continuing with later processing. For example, an article recommendation flow might grab article content from an RSS feed and throw it into a topic called "article"; a subsequent step might clean up that content, for instance normalizing the data or removing duplicates, and finally return the matched content to the user. Beyond an individual topic, this produces a series of real-time data processing stages. Storm and Samza are well-known frameworks that implement this kind of data transformation.
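A toy clean-and-republish step can even be sketched with the console tools (the "article" topic comes from the example above; the sed filter and the "article-clean" output topic are hypothetical):
bin/kafka-console-consumer.sh --new-consumer --bootstrap-server localhost:9092 --topic article --from-beginning | sed 's/\r$//' | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic article-clean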
6. Event sourcing
Event sourcing is an application design style in which state changes are recorded as a chronologically ordered sequence of records. Kafka's ability to store large amounts of log data makes it an excellent backend for applications built this way, such as dynamic summarization (news feeds).
7. Persistent log (commit log)
Kafka can serve as an external persistent log for a distributed system. Such a log replicates data between nodes and provides a resynchronization mechanism for failed nodes to recover their data. The log compaction feature in Kafka supports this usage. In this role, Kafka is similar to the Apache BookKeeper project.
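Log compaction retains the latest record for each key, which is what commit-log usage needs; a sketch creating a compacted topic (the topic name "app-state" is hypothetical):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic app-state --config cleanup.policy=compact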
The above is the Kafka analysis and stand-alone usage notes shared by the editor. If you happen to have similar doubts, you may refer to the above analysis for understanding. If you want to know more, you are welcome to follow the industry information channel.