
How to get started with Kafka


This article explains how to get started with Kafka. The editor finds it very practical and shares it here in the hope that you will gain something after reading it; without further ado, let's take a look.

1. Getting started

Brief introduction

Kafka is a distributed, partitioned, replicated commit log service. It provides functionality similar to JMS, but its design and implementation are completely different, and it is not an implementation of the JMS specification. Kafka groups messages by Topic when storing them; the sender is called the Producer and the receiver is called the Consumer. A Kafka cluster consists of multiple Kafka instances, and each instance (server) is called a broker. The Kafka cluster, producers and consumers all rely on ZooKeeper to store cluster metadata and to guarantee system availability.


Main features:

1) Message persistence

To get real value from big data, you cannot afford to lose any information. Apache Kafka is designed around an on-disk data structure with O(1) time complexity, which provides constant-time performance even when storing huge amounts of messages (on the order of terabytes).

2) High throughput

Keeping big data in mind, Kafka is designed to run on commodity hardware while supporting millions of messages per second.

3) Distributed

Kafka explicitly supports partitioning messages on the Kafka servers and distributing consumption over a cluster of consumer machines, while maintaining per-partition ordering semantics.

4) Multi-client support

Kafka supports integration with clients on different platforms, such as Java, .NET, PHP, Ruby and Python.

5) Real-time

Messages produced by a producer thread should be immediately visible to consumer threads; this is critical for event-based systems such as CEP (complex event processing) systems.

2. Concepts

Topics/Logs

A Topic can be thought of as a category of messages. Each topic is divided into multiple partitions, and at the storage level each partition is an append-only log file. Any message published to a partition is appended to the end of that log file. The position of each message in the file is called its offset, a long integer that uniquely identifies the message within its partition. Kafka provides no additional indexing mechanism for offsets, because messages in Kafka are rarely read or written randomly.
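As an illustration of partitions and offsets, here is a minimal sketch using the modern Java client (org.apache.kafka:kafka-clients), not the 0.8-era API this article otherwise describes; the topic name page_visits is borrowed from the commands later in the article. Every acknowledged send returns the partition and offset the broker assigned to the message.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class OffsetDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send one message and block until the broker acknowledges it.
            RecordMetadata md = producer.send(
                    new ProducerRecord<>("page_visits", "hello kafka")).get();
            // The broker reports which partition the message landed in and its offset there.
            System.out.printf("partition=%d offset=%d%n", md.partition(), md.offset());
        }
    }
}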

A difference between Kafka and JMS implementations (such as ActiveMQ) is that a message is not deleted immediately after it is consumed. Log files are deleted after a certain period according to the broker configuration: for example, if log files are retained for 2 days, a file is deleted after 2 days regardless of whether the messages in it have been consumed. Kafka uses this simple approach to free disk space, and it avoids the disk I/O cost of modifying file contents after messages are consumed.

A consumer needs to keep track of the offsets of the messages it has consumed, and the consumer itself controls how offsets are saved and used. When a consumer consumes messages normally, the offset advances "linearly", i.e. messages are consumed in order. In fact, a consumer can consume messages in any order; it only needs to reset the offset to the desired value. (Offsets are saved in ZooKeeper; see below.)
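For example, with the modern Java consumer (a sketch, not the ZooKeeper-based 0.8 consumer described above; the topic page_visits, partition 0 and the group name are illustrative assumptions), resetting the offset is a single seek() call:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "seek-demo");   // hypothetical group name
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("page_visits", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 0L);   // rewind: re-read the partition from offset 0
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}

Note that the modern client commits offsets to Kafka itself, whereas the 0.8-era consumer this article describes stored them in ZooKeeper.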

The Kafka cluster hardly needs to maintain any consumer or producer state; that information is kept in ZooKeeper. The producer and consumer client implementations are therefore very lightweight, and clients can come and go at will without any additional impact on the cluster.

Partitions serve several purposes. The most fundamental one is that Kafka is based on file storage: by partitioning, the log can be spread across multiple servers so that no single file grows beyond the capacity of one machine's disk, and each partition is stored by the server (Kafka instance) that hosts it. A topic can be split into as many partitions as needed for storing and consuming messages. In addition, more partitions means more consumers can be accommodated, which effectively improves concurrent consumption (see below for the details).

Distribution

The multiple partitions of a Topic are distributed across multiple servers in the Kafka cluster, and each server (Kafka instance) is responsible for reading and writing messages for the partitions it hosts. In addition, Kafka can be configured with the number of replicas per partition: each partition is backed up to multiple machines to improve availability.

Since partitions are replicated, the replicas need to be coordinated: each partition has one server acting as "leader". The leader handles all reads and writes, and if it fails, one of the followers takes over and becomes the new leader; followers simply follow the leader and synchronize its messages. The server acting as leader therefore carries all of the request load for that partition, so from the point of view of the cluster as a whole there are as many "leaders" as there are partitions. Kafka distributes leaders evenly across the instances to keep overall performance stable.

Producers

A producer publishes messages to a specified Topic, and the producer can also decide which partition a message belongs to, for example by round-robin or some other algorithm.
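As a sketch with the modern Java client (topic name reused from the commands below, key value assumed for illustration): when a record carries a key, the default partitioner hashes the key so that all messages with the same key land in the same partition; records without a key are spread across partitions.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitioningDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed record: the default partitioner hashes "user-42",
            // so every visit for this user goes to the same partition (and stays ordered).
            producer.send(new ProducerRecord<>("page_visits", "user-42", "/index.html"));
            // Unkeyed record: the partitioner spreads these across partitions.
            producer.send(new ProducerRecord<>("page_visits", "/about.html"));
            producer.flush();
        }
    }
}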

Consumers

In essence, Kafka only supports Topics. Each consumer belongs to one consumer group, and conversely each group can contain multiple consumers. A message sent to a Topic is consumed by only one consumer in each group subscribed to that Topic.

If all consumers have the same group, this behaves like a queue: messages are load-balanced across the consumers.

If all consumers have different groups, this is publish-subscribe: each message is broadcast to all consumers.

In Kafka, the messages of a partition are consumed by only one consumer within a group, and consumption in each group is independent of the others. A group can be thought of as a "subscriber": each partition of a Topic is consumed by only one consumer within that "subscriber", although a single consumer can consume messages from multiple partitions. Kafka only guarantees that the messages within a partition are consumed in order by their consumer; viewed at the Topic level, messages are still not globally ordered.

Kafka's design means that, for a topic, the number of consumers in the same group consuming it concurrently cannot usefully exceed the number of partitions; otherwise some consumers will not receive any messages.
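A minimal sketch with the modern Java consumer (the group name is an assumption; the topic is the one used in the commands below): every process started with the same group.id shares the partitions of page_visits like a queue, while a process started with a different group.id receives all messages again, i.e. publish-subscribe.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "page-visit-analyzers");   // consumers sharing this id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page_visits"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    // Within one group, each partition is read by exactly one consumer,
                    // so records from the same partition arrive here in order.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}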

Guarantees

1) Messages sent to a partition are appended to the log in the order in which they are received.

2) For consumers, the order in which messages are consumed is the same as their order in the log.

3) If a Topic has a replication factor of N, up to N-1 Kafka instances may fail without losing committed messages.

3. Applicable scenarios

1) Messaging

Kafka is a good choice for conventional messaging systems, and its partitioning/replication and fault tolerance give it good scalability and performance. However, so far we should be well aware that Kafka does not provide JMS enterprise features such as "transactions", "message delivery guarantees (message acknowledgement mechanisms)" or "message grouping"; Kafka is suited to "routine" messaging, and to some extent it does not ensure absolute reliability of message delivery (for example, message retransmission or message loss).

2) Website activity tracking

Kafka is an excellent tool for website activity tracking: information such as page views and user actions can be sent to Kafka and then monitored in real time or analyzed offline.

3) Metrics

Kafka is often used for operational monitoring data. This includes aggregating statistics from distributed applications to produce centralized feeds of operational data.

4) Log Aggregation

Kafka's characteristics make it very suitable as a "log collection center": applications can send operation logs to the Kafka cluster in batches and asynchronously instead of saving them locally or in a database. Kafka can batch and compress messages, so the cost to the producer side is almost negligible, and on the server side the logs can be handed to systems such as Hadoop for storage and analysis.
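As an illustration of the producer-side batching and compression mentioned above, here is a sketch of the relevant settings in the modern Java client (the property names come from the current kafka-clients library, not the 0.8-era client this article targets, and the topic name app_logs is assumed):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogShipperDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("compression.type", "gzip");   // compress each batch before it is sent
        props.put("batch.size", "65536");        // collect up to 64 KB of log lines per partition batch
        props.put("linger.ms", "50");            // wait up to 50 ms to fill a batch (asynchronous, amortizes I/O)
        props.put("acks", "1");                  // leader acknowledgement only; tune for durability needs

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous: log lines are buffered, batched and compressed in the background.
            producer.send(new ProducerRecord<>("app_logs", "2024-01-01T00:00:00 INFO service started"));
            producer.flush();
        }
    }
}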

4. Commands

1. Start Server

Kafka relies on ZooKeeper, so a ZooKeeper service must be running before the broker is started (the Kafka distribution ships a helper: bin/zookeeper-server-start.sh config/zookeeper.properties &).

nohup bin/kafka-server-start.sh config/server.properties &

2. Create Topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic page_visits

3. List topics

bin/kafka-topics.sh --list --zookeeper localhost:2181

4. Send a message

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic page_visits

5. Consume messages

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic page_visits --from-beginning

6. Multi-broker mode (server-1.properties and server-2.properties are copies of server.properties with a distinct broker.id, port and log.dirs for each instance)

bin/kafka-server-start.sh config/server-1.properties &

bin/kafka-server-start.sh config/server-2.properties &

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic visits

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic visits

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic visits

My message test1

My message test2

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic visits

7. Stop the server

pkill -9 -f config/server.properties

8. Delete an unused topic

bin/kafka-run-class.sh kafka.admin.DeleteTopicCommand --topic visits --zookeeper sjxt-hd02:2181,sjxt-hd03:2181,sjxt-hd04:2181

Topic deletion via kafka-topics.sh is a beta feature in 0.8.1:

bin/kafka-topics.sh --zookeeper zk_host:port --delete --topic my_topic_name

The above is an introduction to getting started with Kafka. The editor believes there are knowledge points here that you may encounter or use in your daily work, and hopes you can learn more from this article.
