Ubuntu16.04 installs Kafka cluster

2025-03-30 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Download

http://kafka.apache.org/downloads.html

http://mirror.bit.edu.cn/apache/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz

root@master:/usr/local/kafka_2.11-0.11.0.0/config# vim server.properties

broker.id=2 (each node must use a different id)

log.retention.hours=168

message.max.bytes=5242880

default.replication.factor=2

replica.fetch.max.bytes=5242880

zookeeper.connect=master:2181,slave1:2181,slave2:2181

Copy to other nodes
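The copy step can be scripted so each node gets a unique broker.id. A minimal Python sketch; the template contents and the host-to-id mapping are assumptions for illustration, not taken from this article:

```python
# Sketch: generate a per-node copy of server.properties with a unique
# broker.id before the file is copied out to each node.
def render_config(template, broker_id):
    out = []
    for line in template.splitlines():
        if line.startswith("broker.id="):
            out.append("broker.id=%d" % broker_id)  # per-node override
        else:
            out.append(line)                        # all other settings shared
    return "\n".join(out)

template = "broker.id=0\nlog.retention.hours=168"
for host, bid in [("master", 1), ("slave1", 2), ("slave2", 3)]:
    print(host, "->", render_config(template, bid).splitlines()[0])
```

In practice you would write each rendered config to a file and scp it to the matching host.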

Note: if zookeeper.connect uses a chroot path (e.g. ending in /kafka), the /kafka node must be created in ZooKeeper beforehand; otherwise an error is reported: java.lang.IllegalArgumentException: Path length must be > 0

root@master:/usr/local/zookeeper-3.4.9# bin/zkCli.sh -server master

[zk: master(CONNECTED) 7] create /kafka ''

Created /kafka

[zk: master(CONNECTED) 8] ls /

[cluster, controller, controller_epoch, brokers, zookeeper, kafka, admin, isr_change_notification, consumers, latest_producer_id_block, config]

[zk: master(CONNECTED) 9] ls /kafka

[]

Start Kafka in the background

root@master:/usr/local/kafka_2.11-0.11.0.0# nohup bin/kafka-server-start.sh config/server.properties &

Create a topic:

root@slave2:/usr/local/kafka_2.11-0.11.0.0# bin/kafka-topics.sh --create --zookeeper master:2181 --replication-factor 1 --partitions 1 --topic test

Created topic "test".

List all topics:

root@slave2:/usr/local/kafka_2.11-0.11.0.0# bin/kafka-topics.sh --list --zookeeper master:2181

test

Send a message

root@master:/usr/local/kafka_2.11-0.11.0.0# bin/kafka-console-producer.sh --broker-list master:9092 --topic test

> this is a message

> this is ant^H message

Consume messages

root@slave1:/usr/local/kafka_2.11-0.11.0.0# bin/kafka-console-consumer.sh --zookeeper master:2181 --topic test --from-beginning

Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].

this is a message

this is an message

View cluster status information

root@slave1:/usr/local/kafka_2.11-0.11.0.0# bin/kafka-topics.sh --describe --zookeeper slave1:2181 --topic my-replicated-topic

Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:

Topic: my-replicated-topic Partition: 0 Leader: 3 Replicas: 1,3,2 Isr: 3,2

Install kafka-manager

root@master:/usr/local/kafka_2.11-0.11.0.0# git clone https://github.com/yahoo/kafka-manager

root@master:/usr/local/kafka_2.11-0.11.0.0# cd kafka-manager/

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager# ./sbt clean dist

[success] Total time: 3453 s, completed Aug 7, 2017 8:48:15 PM

The packaged file is produced under target/universal:

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager/target/universal# ls

kafka-manager-1.3.3.12.zip  tmp

Modify the kafka-manager configuration file

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12# vim conf/application.conf

kafka-manager.zkhosts="192.168.117.243:2181,192.168.117.45:2181,192.168.117.242:2181"

Start kafka-manager

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12# bin/kafka-manager -Dconfig.file=conf/application.conf

Recommended startup method:

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12# nohup bin/kafka-manager -Dconfig.file=conf/application.conf -Dhttp.port=7778 &

Log in to kafka manager:

http://192.168.117.243:7778/

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12# netstat -antlup | grep 7778

tcp6       0      0 :::7778                 :::*                    LISTEN      100620/java

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12# bin/kafka-manager -Dconfig.file=conf/application.conf

This application is already running (Or delete /usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12/RUNNING_PID file).

Stop kafka-manager

root@master:/usr/local/kafka_2.11-0.11.0.0/kafka-manager-1.3.3.12# rm RUNNING_PID

Production server configuration

# Replication configurations

num.replica.fetchers=4

replica.fetch.max.bytes=1048576

replica.fetch.wait.max.ms=500

replica.high.watermark.checkpoint.interval.ms=5000

replica.socket.timeout.ms=30000

replica.socket.receive.buffer.bytes=65536

replica.lag.time.max.ms=10000

replica.lag.max.messages=4000

controller.socket.timeout.ms=30000

controller.message.queue.size=10

# Log configuration

num.partitions=8

message.max.bytes=1000000

auto.create.topics.enable=true

log.index.interval.bytes=4096

log.index.size.max.bytes=10485760

log.retention.hours=168

log.flush.interval.ms=10000

log.flush.interval.messages=20000

log.flush.scheduler.interval.ms=2000

log.roll.hours=168

log.retention.check.interval.ms=300000

log.segment.bytes=1073741824

# ZK configuration

zookeeper.connection.timeout.ms=6000

zookeeper.sync.time.ms=2000

# Socket server configuration

num.io.threads=8

num.network.threads=8

socket.request.max.bytes=104857600

socket.receive.buffer.bytes=1048576

socket.send.buffer.bytes=1048576

queued.max.requests=16

fetch.purgatory.purge.interval.requests=100

producer.purgatory.purge.interval.requests=100

Kafka is a high-throughput distributed publish-subscribe messaging system, originally developed at LinkedIn, where it served as the basis of LinkedIn's activity stream and operational data processing pipeline. It is now used by many companies of different types as a data pipeline and messaging system.

1 Introduction to Kafka message queuing

1.1 Basic terminology

Broker

A Kafka cluster contains one or more servers, each of which is called a broker.

Topic

Every message published to a Kafka cluster belongs to a category called a Topic. (Physically, messages of different Topics are stored separately. Logically, the messages of one Topic may be stored on one or more brokers, but users only need to specify the Topic to produce or consume data, regardless of where it is stored.)

Partition

Partition is a physical concept; each Topic contains one or more Partitions. (A common rule of thumb is the total number of CPU cores across the Kafka nodes.)

Producer

The client that publishes messages to Kafka brokers.

Consumer

Message consumer: the client that reads messages from Kafka brokers.

Consumer Group

Each Consumer belongs to a specific Consumer Group (you can specify a group name for each Consumer; if no group name is specified, it belongs to the default group).

1.2 Message queuing

1.2.1 Basic features

Scalable

Expand capacity without going offline

Data streams are partitioned and stored across multiple machines

High performance

A single broker can serve thousands of clients.

A single broker can read / write hundreds of megabytes per second

A cluster of multiple brokers achieves very high throughput.

Performance stays stable regardless of data volume

At the bottom layer, Kafka abandons the Java heap cache mechanism in favor of the operating system's page cache, and turns random writes into sequential writes; combined with zero-copy, this greatly improves IO performance.

Persistent storage

Store on disk

Redundant backup to other servers to prevent loss

1.2.2 message format

A topic corresponds to one message format, so messages are classified by topic

The messages of a topic are spread over one or more partitions

Each partition is stored on one or more servers:

One server acts as the leader

The other servers are followers

The leader accepts read and write requests

Followers only keep redundant backups

If the leader fails, a follower is automatically elected as the new leader, so the service is not interrupted.

Each server may be the leader for some partitions and a follower for others, so the whole cluster achieves load balancing.

If there is only one server, there is no redundant backup; that is a stand-alone machine rather than a cluster.
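The failover rule above can be sketched in Python. The broker ids echo the shape of the `--describe` output (Replicas/Isr); the logic is a toy simplification, not Kafka's actual controller code:

```python
# Sketch of leader failover for one partition: when the leader dies,
# a replica still in the ISR (in-sync replica set) is promoted.
def elect_leader(failed_leader, replicas, isr):
    # Prefer the first replica (in replica order) that is still in sync.
    for broker in replicas:
        if broker != failed_leader and broker in isr:
            return broker
    raise RuntimeError("no in-sync replica available")

# State like "Replicas: 1,3,2  Isr: 3,2" with leader 3 failing:
print(elect_leader(3, [1, 3, 2], [2]))  # broker 2 takes over
```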

If there are multiple servers:

Messages are stored in order and can only be appended, never inserted. Each message has an offset, which serves as its message ID and is unique within a partition. The offset is saved and managed by the consumer, so the reading order is actually determined entirely by the consumer and is not necessarily linear. Messages have a retention period; once they expire, they are deleted.

1.2.3 Producer

Producer writes messages to kafka

Writes specify a topic and partition

How messages are assigned to partitions is determined by an algorithm chosen by the producer
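One such assignment algorithm can be sketched as follows. This is illustrative Python, not Kafka's actual default partitioner (which hashes keys with murmur2):

```python
# Sketch of producer-side partition assignment: a keyed message always
# hashes to the same partition (preserving per-key order), while unkeyed
# messages are spread round-robin.
import itertools
from zlib import crc32

def make_partitioner(num_partitions):
    rr = itertools.cycle(range(num_partitions))
    def partition(key):
        if key is None:
            return next(rr)                      # no key: round-robin
        return crc32(key) % num_partitions       # same key -> same partition
    return partition

p = make_partitioner(8)
assert p(b"user-42") == p(b"user-42")            # per-key stickiness
```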

1.2.4 Consumer

Consumer reads messages and processes them

Consumer group

Consumers can process in parallel, up to the number of partitions

Each partition is read by only one consumer in the group, which ensures that messages are processed in the order they are stored in the partition. Note that this order is still affected by the algorithm the producer uses to assign messages to partitions.

This concept was introduced to support two scenarios: one consumer per message, and each message broadcast to all consumers.

When multiple consumer groups subscribe to a topic, messages from that topic are broadcast to all consumer groups.

After a message is delivered to a consumer group, it is received and consumed by only one consumer in that group.
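The two delivery rules above (broadcast across groups, exactly one consumer within a group) can be simulated with a toy model; the group and consumer names are made up for illustration:

```python
# Sketch of delivery semantics: every group sees every message, but
# inside a group each message goes to exactly one consumer.
from collections import defaultdict

def deliver(messages, groups):
    """groups: {group_name: [consumer names]}; returns per-consumer inboxes."""
    inbox = defaultdict(list)
    for group, consumers in groups.items():
        for i, msg in enumerate(messages):
            # Within a group, pick exactly one consumer (round-robin here).
            target = consumers[i % len(consumers)]
            inbox[(group, target)].append(msg)
    return dict(inbox)

out = deliver(["m1", "m2"], {"g1": ["c1"], "g2": ["c1", "c2"]})
# g1/c1 receives both messages; in g2 they are split between c1 and c2
```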

Each consumer in a group corresponds to a partition, which brings the following benefits:

A consumer can consume with multiple threads, but the number of threads should not exceed the number of partitions of the topic: within a consumer group, a partition can be allocated to only one consuming thread, so at most as many threads as there are partitions will receive work (the actual consuming threads are still determined by thread-pool scheduling). If the number of threads exceeds the number of partitions, the extra threads are never assigned a partition and sit idle, wasting resources.

If a consumer reads data from multiple partitions, ordering across partitions is not guaranteed: Kafka only guarantees that data is ordered within one partition, and the interleaving across partitions depends on the order in which you read them.
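The per-partition ordering and consumer-managed offsets described above can be sketched with a toy append-only log (illustrative only, not Kafka's storage format):

```python
# Sketch: a partition is an append-only log. A message's offset is its
# position in the log; offsets are unique only within one partition, and
# each consumer tracks its own read position.
class Partition:
    def __init__(self):
        self.log = []                    # append-only storage

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1         # offset = position in the log

    def read(self, offset):
        return self.log[offset:]         # replay from a consumer-chosen offset

part = Partition()
for m in ["a", "b", "c"]:
    part.append(m)
consumer_offset = 1                      # managed by the consumer, not the broker
assert part.read(consumer_offset) == ["b", "c"]
```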

Adding or removing consumers, brokers, or partitions triggers a rebalance; after a rebalance, the partitions assigned to a consumer may change.
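The effect of a rebalance can be shown with a toy assignment function; this is a simplification of Kafka's group-coordinator protocol and assignors, for illustration only:

```python
# Sketch of partition assignment and rebalance: partitions are split
# across a group's consumers; when membership changes, assignments are
# recomputed, so a consumer's partitions may change.
def assign(num_partitions, consumers):
    out = {c: [] for c in consumers}
    for p in range(num_partitions):
        out[consumers[p % len(consumers)]].append(p)
    return out

before = assign(4, ["c1", "c2"])
after = assign(4, ["c1", "c2", "c3"])   # c3 joins -> rebalance
print(before)  # {'c1': [0, 2], 'c2': [1, 3]}
print(after)   # {'c1': [0, 3], 'c2': [1], 'c3': [2]}
```

Note that after the rebalance, c1 keeps partition 0 but loses partition 2 and gains partition 3.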
