2025-04-06 Update From: SLTechnology News&Howtos
Through Getting Started with Kafka (https://www.cnblogs.com/tree1123/p/11150927.html) you can learn the basic deployment and use of Kafka. But how does it differ from other messaging middleware? What are Kafka's basic principles, terminology, and versions? What, exactly, is Kafka?
I. A brief introduction to Kafka
http://kafka.apache.org/intro
Kafka was open-sourced by LinkedIn in 2011. Version 1.0.0 was released on November 1, 2017, and version 2.0.0 followed on July 30, 2018.
Refer to the picture on the official website:
Kafka® is used to build real-time data pipelines and streaming applications. It is horizontally scalable, fault-tolerant, and fast, and it runs in production in thousands of companies.
The official site's current definition: Apache Kafka® is a distributed streaming platform.
Introduction:
Kafka has three key capabilities:
Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
Store streams of records in a fault-tolerant, durable way.
Process streams of records as they occur.
In short: messaging, persistence, and stream processing.
It is used for two broad classes of applications:
Building real-time streaming data pipelines that reliably move data between systems or applications.
Building real-time streaming applications that transform or react to the streams of data.
In short: real-time data pipelines and real-time streaming applications.
A few concepts:
Kafka runs as a cluster on one or more servers that can span multiple datacenters.
The Kafka cluster stores streams of records in categories called topics.
Each record consists of a key, a value, and a timestamp.
The key terms: cluster, topic, record.
Four core APIs:
The Producer API allows an application to publish a stream of records to one or more Kafka topics.
The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming input streams into output streams.
The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
Communication between clients and servers uses a simple TCP protocol, and client libraries are available in many languages.
Topics and logs
A topic can have zero, one, or many consumers that subscribe to the data written to it.
For each topic, the Kafka cluster maintains a partitioned log.
Each partition is an ordered, immutable sequence of records that is continually appended to, forming a structured commit log.
Each record in the partition is assigned a sequential ID number called an offset, which uniquely identifies each record in the partition.
The Kafka cluster durably retains all published records, whether or not they have been consumed, using a configurable retention period.
Kafka's performance is effectively constant with respect to data size, so storing data for a long time is not a problem.
The only metadata retained on a per-consumer basis is that consumer's offset, or position, in the log.
This offset is controlled by the consumer: normally a consumer advances its offset linearly as it reads records, but because the consumer controls the position, it can consume records in any order it likes. For example, a consumer can reset to an older offset to reprocess past data, or skip ahead to the most recent record and start consuming from "now".
This combination of features makes Kafka consumers very cheap.
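The offset mechanics described above can be sketched with a toy in-memory model. All names here are illustrative, not the Kafka client API:

```python
# Toy model of one Kafka partition: an append-only record log where
# each record gets a sequential offset, and every consumer tracks its
# own read position independently of the broker and of other consumers.

class Partition:
    def __init__(self):
        self._log = []                 # append-only sequence of records

    def append(self, record):
        """Append a record; its offset is simply its index in the log."""
        self._log.append(record)
        return len(self._log) - 1      # the offset just assigned

    def read(self, offset):
        """Read the single record stored at a given offset."""
        return self._log[offset]

class Consumer:
    def __init__(self, partition, start_offset=0):
        self.partition = partition
        self.offset = start_offset     # the only metadata the consumer keeps

    def poll(self):
        record = self.partition.read(self.offset)
        self.offset += 1               # advance linearly on each read
        return record

    def seek(self, offset):
        """Reset to an older offset to reprocess past data."""
        self.offset = offset

p = Partition()
for r in ["a", "b", "c"]:
    p.append(r)

c = Consumer(p)
first = c.poll()        # reads "a"
second = c.poll()       # reads "b"
c.seek(0)               # rewind to reprocess from the start
again = c.poll()        # reads "a" again
```

Because the broker only stores the log and the consumer owns its position, rewinding is as cheap as changing one integer.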
Producer:
Producers publish data to topics of their choice.
The producer chooses which partition within the topic each record goes to, e.g., round-robin for simple load balancing, or via a key-based partitioning function.
Consumers:
Consumer groups.
Each traditional messaging model has a drawback: a queue lets you scale out processing but is not multi-subscriber, while publish-subscribe delivers every message to every consumer, so processing cannot be scaled out.
The consumer-group model in Kafka solves both problems.
Kafka ensures that each partition is read by exactly one consumer within a group, so records are consumed in order; and because a topic has many partitions, the load can still be
balanced across many consumer instances.
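The group-balancing idea can be sketched as a simple round-robin assignment of partitions to the consumers in one group (illustrative only; Kafka's actual assignment strategies are pluggable):

```python
# Each partition is owned by exactly one consumer in the group; if there
# are more consumers than partitions, the extras are simply left idle.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        owner = consumers[i % len(consumers)]   # spread partitions evenly
        assignment[owner].append(p)
    return assignment

# 4 partitions over 2 consumers -> each consumer owns 2 partitions.
a = assign([0, 1, 2, 3], ["c1", "c2"])
# 2 partitions over 3 consumers -> one consumer gets nothing to do.
b = assign([0, 1], ["c1", "c2", "c3"])
```

This is why running more consumers than partitions is wasteful: the surplus consumers receive no partitions at all.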
Kafka can also be viewed as a storage system, and as a stream-processing system.
II. Common uses
http://kafka.apache.org/uses
Messaging
Kafka works well as a replacement for a more traditional message broker. Message brokers are used for various reasons (decoupling processing from data producers, buffering unprocessed messages, and so on). Compared with most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message-processing applications.
In our experience, messaging uses are often comparatively low-throughput, but may require low end-to-end latency, and often depend on the strong durability guarantees Kafka provides.
In this domain, Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.
Website activity tracking
Site activity (page views, searches, or other actions users may take) is published to central topics, with one topic per activity type. These feeds are available for real-time processing, real-time monitoring, and loading into Hadoop or offline data-warehousing systems for offline processing and reporting.
Metrics
Kafka is often used for operational monitoring data, aggregating statistics from distributed applications into centralized feeds.
Log aggregation
Many people use Kafka as a replacement for log-aggregation solutions. Log aggregation typically collects physical log files from servers and puts them in a central place (perhaps a file server or HDFS) for processing. Kafka abstracts away the details of files and presents log or event data more cleanly as a stream of messages.
Stream processing
Starting with version 0.10.0.0, Kafka ships a lightweight but powerful stream-processing library called Kafka Streams.
III. Official documentation: core mechanisms
http://kafka.apache.org/documentation/
The introduction and quick-start material has already been covered above.
Ecosystem: Kafka has a rich ecosystem. Various connectors can feed databases, Elasticsearch, and other systems directly, and there are integrations with other stream processors and a range of management tools.
Confluent specializes in the Kafka ecosystem:
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
The main pillars: Kafka Connect, Kafka Streams, and management tooling.
Several design issues Kafka addresses:
Throughput: Kafka leans on the operating system's page cache rather than reading and writing the disk directly.
Message persistence: this relies on its append-only log and offset design.
Load balancing: the partition and replica mechanism.
Because it uses zero-copy transfer and epoll-based networking in the client and broker, Kafka performs best when deployed on Linux.
Message: a Kafka message consists of a key, a value, and a timestamp. The header also carries metadata such as the CRC, the format version, and compression attributes.
Layout (old message format): CRC | version (magic) | attributes | timestamp | key length | key | value length | value.
Messages are stored in this compact binary format rather than as Java objects, which saves memory and avoids GC overhead.
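As a rough illustration of such a binary layout, the following packs a message in the spirit of the old v0/v1 format. Field widths here are simplified assumptions for illustration, not the exact wire format (that is defined by the Kafka protocol specification):

```python
import struct
import zlib

def pack_message(key: bytes, value: bytes, timestamp: int,
                 magic: int = 1, attrs: int = 0) -> bytes:
    """Serialize a message as: crc | magic | attributes | timestamp |
    key length | key | value length | value (big-endian fields)."""
    body = struct.pack(">bbq", magic, attrs, timestamp)
    body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(value)) + value
    crc = zlib.crc32(body)             # checksum guards everything after it
    return struct.pack(">I", crc) + body

msg = pack_message(b"k", b"v", 1_690_000_000_000)

# A reader can verify integrity by recomputing the CRC over the body.
stored_crc = struct.unpack(">I", msg[:4])[0]
crc_valid = stored_crc == zlib.crc32(msg[4:])
```

The point of the flat binary layout is that the broker can checksum, copy, and transmit messages without ever deserializing them into objects.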
Topic and partition:
This is Kafka's core and most distinctive mechanism.
An offset is the position of a record within one partition.
The triple (topic, partition, offset) uniquely identifies a message.
The producer's offset is the partition's latest offset, i.e., the position where the next write lands.
Each consumer maintains its own offset; it can choose to start from the beginning of a partition or from the latest record, and it can remember where it left off and resume there.
If there are more consumers in a group than partitions, some consumers sit idle; if there are fewer, consumption is balanced so that some consumers read several partitions.
Because Kafka does not allow concurrent consumers on one partition within a group, running more consumers than partitions is wasteful.
If a consumer reads from multiple partitions, ordering across them is not guaranteed: Kafka only guarantees ordering within a single partition, and the interleaving across partitions depends on the order in which you read them.
Adding or removing a consumer, broker, or partition triggers a rebalance, after which the partitions assigned to each consumer may change.
Consumer groups exist so that consumers in different groups can consume the same partition's messages at the same time, independently.
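A sketch of how two groups consume the same partition independently, each committing its own offsets (names and structure are illustrative, not the Kafka API):

```python
# One partition's log, shared by all groups; each group's progress is
# tracked separately under a (group, partition) key, so a real-time
# group and a batch group never interfere with each other.

log = ["e0", "e1", "e2", "e3"]
committed = {}                        # (group, partition) -> next offset

def poll(group, partition=0, max_records=2):
    start = committed.get((group, partition), 0)
    batch = log[start:start + max_records]
    committed[(group, partition)] = start + len(batch)   # commit progress
    return batch

realtime_1 = poll("realtime")                 # first two events
batch_all = poll("batch", max_records=4)      # same events, own offset
realtime_2 = poll("realtime")                 # resumes where it left off
```

This is the sense in which consumer groups generalize both queues (within a group) and publish-subscribe (across groups).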
Replica
Replication exists so that data survives a server failure.
There are two kinds of replica: the leader replica and follower replicas.
Only the leader replica serves client requests.
If the broker hosting the leader replica goes down, a new leader is elected.
Kafka guarantees that the multiple replicas of one partition are never placed on the same broker.
Followers continuously fetch from the leader to stay in sync.
ISR
The in-sync replica set: the subset of replicas that are caught up with the leader replica.
Normally all replicas are in the ISR, but a replica that responds too slowly is kicked out of the ISR; once it catches up again, it is re-added.
As long as at least one replica in the ISR is alive, committed data is not lost.
A message counts as committed once every replica in the ISR has received it.
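The ISR bookkeeping can be sketched like this. The lag threshold and names are assumptions for illustration; modern Kafka actually uses a time-based lag criterion (replica.lag.time.max.ms) rather than a fixed offset gap:

```python
# Replicas that lag the leader by more than a threshold are dropped
# from the ISR; a message counts as committed (is below the high
# watermark) once every remaining ISR member has replicated it.

LEADER_END = 100          # leader's log-end offset
MAX_LAG = 4               # illustrative offset-lag threshold

replica_offsets = {"leader": 100, "follower1": 99, "follower2": 90}

# follower2 lags by 10 (> MAX_LAG), so it is kicked out of the ISR.
isr = {name for name, off in replica_offsets.items()
       if LEADER_END - off <= MAX_LAG}

# Committed position = the minimum offset among ISR members, since a
# record is only safe once every in-sync replica has it.
high_watermark = min(replica_offsets[name] for name in isr)
```

Once follower2 catches back up to within the threshold, it would re-enter the ISR and the high watermark calculation would include it again.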
For more posts about real-time computing, follow Real-Time Streaming Computing.