
How to configure Kafka in CDP

Updated 2025-07-02. Source: SLTechnology News&Howtos


This article explains how to configure Kafka in CDP: operating system requirements, performance considerations, quotas, JBOD, and user limits.

Apache Kafka is a high-performance, highly available, redundant streaming message platform.

Kafka functions much like a publish/subscribe messaging system, but with higher throughput, built-in partitioning, replication, and fault tolerance. Kafka is a good solution for large-scale message processing applications. It is often used together with Apache Hadoop and Spark Streaming.

Operating system requirements

A collection of operating system requirements for Kafka.

SUSE Linux Enterprise Server (SLES)

Unlike CentOS, SLES limits virtual memory by default. To change this default, add the following entries to the /etc/security/limits.conf file:

* hard as unlimited
* soft as unlimited

Kernel limits

You must correctly configure the following three settings for the kernel.

File descriptors

You can set the file descriptor in Cloudera Manager by going to Kafka > Configuration > Maximum Process File Descriptors and setting the desired value. Cloudera recommends that you configure with a value of 100000 or higher.

Max memory map

You must configure the maximum number of memory maps in your specific kernel settings. Cloudera recommends a value of 32000 or higher.

Max socket buffer size

Set the buffer size to be larger than any Kafka send buffer you define.
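The three kernel settings above can be checked together. The following Python sketch is only an illustration: the function names and the idea of passing the values in as plain integers are assumptions, not part of Cloudera Manager. The thresholds are the recommendations quoted in this section.

```python
# Recommended minimums taken from the text above.
MIN_FILE_DESCRIPTORS = 100_000
MIN_MEMORY_MAPS = 32_000


def check_kernel_limits(file_descriptors, memory_maps, socket_buffer_max, kafka_send_buffer):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if file_descriptors < MIN_FILE_DESCRIPTORS:
        problems.append(f"file descriptors: {file_descriptors} < {MIN_FILE_DESCRIPTORS}")
    if memory_maps < MIN_MEMORY_MAPS:
        problems.append(f"memory maps: {memory_maps} < {MIN_MEMORY_MAPS}")
    # The kernel maximum must be larger than the largest Kafka send buffer.
    if socket_buffer_max <= kafka_send_buffer:
        problems.append(f"socket buffer: {socket_buffer_max} <= {kafka_send_buffer}")
    return problems


def read_int(path):
    """Read a single integer from a /proc/sys style file (Linux only)."""
    with open(path) as handle:
        return int(handle.read().split()[0])
```

On a Linux host the inputs could plausibly come from read_int("/proc/sys/vm/max_map_count") and read_int("/proc/sys/net/core/wmem_max"). Those paths are an assumption about a typical Linux kernel and are not named in the original text, so verify them on your distribution. The file descriptor value should be the limit that applies to the Kafka process itself.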

Performance considerations

A collection of basic recommendations for Kafka clusters.

The simplest recommendation for running Kafka with the best performance is to use dedicated hosts for the Kafka brokers and a dedicated ZooKeeper cluster for the Kafka cluster. If this is not an option, consider the following additional guidelines for sharing resources with the Kafka cluster:

Running in a virtual machine

In modern data centers, it is common to run processes in virtual machines (VMs). In general, this allows for better sharing of resources. Kafka is sensitive enough to I/O throughput that VMs can interfere with the normal operation of brokers. For this reason, running Kafka in VMs is generally not recommended. If you do run Kafka in a virtual environment, you need to rely on your VM vendor for help with optimizing Kafka performance.

Do not run other processes with brokers or ZooKeeper

Because of I/O contention with other processes, it is generally recommended to avoid running other processes on the same hosts as Kafka brokers.

Keep the Kafka-ZooKeeper connection stable

Kafka relies heavily on a stable ZooKeeper connection. An unreliable network between Kafka and ZooKeeper makes ZooKeeper appear offline to Kafka. To avoid this:

Do not put Kafka/ZooKeeper nodes on separate networks

Do not put Kafka/ZooKeeper nodes on the same network as other high network loads

Quotas

Learn about quotas and how to set them.

Kafka can enforce quotas on produce and fetch requests. Producers and consumers can use very large volumes of data. This can monopolize broker resources, saturate the network, and effectively deny service to other clients and to the brokers themselves. Quotas protect against these problems and are important for large multi-tenant clusters, where a small number of clients using large volumes of data can degrade the experience for everyone else.

Quotas are byte-rate thresholds defined per client ID. The client ID logically identifies the application that makes the request. A client ID can span multiple producer and consumer instances, and the quota applies to all of those instances as a single entity. For example, if the produce quota for a client ID is 10 MB/s, that quota is shared among all instances with the same ID.
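To make the shared-quota rule concrete, here is a minimal Python sketch that models the accounting on a single broker. It is not how the broker is implemented; the function names and the client IDs "billing" and "reports" are hypothetical.

```python
from collections import defaultdict

MB = 1024 * 1024


def aggregate_by_client_id(instances):
    """Sum observed byte rates of (client_id, bytes_per_second) pairs per client ID."""
    totals = defaultdict(int)
    for client_id, bytes_per_second in instances:
        totals[client_id] += bytes_per_second
    return dict(totals)


def clients_over_quota(instances, quota_bytes_per_second):
    """Return the sorted client IDs whose combined rate exceeds the quota."""
    totals = aggregate_by_client_id(instances)
    return sorted(cid for cid, rate in totals.items() if rate > quota_bytes_per_second)


# Three producer instances share the hypothetical client ID "billing".
observed = [("billing", 4 * MB), ("billing", 4 * MB), ("billing", 4 * MB), ("reports", 6 * MB)]
print(clients_over_quota(observed, 10 * MB))
```

Each "billing" instance stays under 10 MB/s on its own, but the combined 12 MB/s exceeds the quota, so "billing" is the only client ID reported.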

When Kafka runs as a service, quotas can enforce API limits. Unless overridden, each unique client ID receives the fixed quota, in bytes per second, that is configured for the cluster (quota.producer.default, quota.consumer.default). This quota is defined on a per-broker basis: each client can publish or fetch up to X bytes per second per broker before it is throttled.

When a client exceeds its quota, the broker does not return an error but instead slows the client down. The broker computes the delay needed to bring the client back within its quota and holds the response for that amount of time. This approach keeps quota violations transparent to clients (apart from client metrics). It also means clients do not need to implement special backoff and retry behavior.
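The delay calculation described above can be sketched as simple arithmetic. This is a simplified model under the assumption that the broker measures bytes over a fixed window; it is not the broker's actual code, and the function name is made up for illustration.

```python
MB = 1024 * 1024


def throttle_delay_seconds(bytes_in_window, window_seconds, quota_bytes_per_second):
    """Delay needed so that bytes_in_window spread over (window + delay) meets the quota."""
    if quota_bytes_per_second <= 0:
        raise ValueError("quota must be positive")
    # Time the quota would allow for this many bytes.
    required_seconds = bytes_in_window / quota_bytes_per_second
    # Only the part beyond the measurement window has to be added as delay.
    return max(0.0, required_seconds - window_seconds)


print(throttle_delay_seconds(20 * MB, 1.0, 10 * MB))
```

A client that sent 20 MB in a one-second window against a 10 MB/s quota would be held back for one extra second in this model, while a client under its quota gets a delay of zero.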

You can override the default quota for client IDs that need a higher or lower quota. The mechanism is similar to per-topic log configuration overrides. Client ID overrides are written to ZooKeeper under /config/clients. All brokers read the overrides, and they take effect immediately, so you can change quotas without a rolling restart of the entire cluster.
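One common way to write such an override is the kafka-configs command-line tool. The Python sketch below only assembles the argument list; it does not run anything. The helper name, the host, and the client ID are hypothetical, and which connection flag your release accepts (--zookeeper on older releases, --bootstrap-server on newer ones) is something to verify against your version's documentation.

```python
import shlex


def quota_override_command(client_id, producer_byte_rate, consumer_byte_rate, connection_args):
    """Build an argument list for kafka-configs that sets quota overrides for one client ID."""
    config = f"producer_byte_rate={producer_byte_rate},consumer_byte_rate={consumer_byte_rate}"
    return [
        "kafka-configs",
        *connection_args,
        "--alter",
        "--add-config", config,
        "--entity-type", "clients",
        "--entity-name", client_id,
    ]


# Hypothetical host and client ID, for illustration only.
args = quota_override_command(
    "billing", 20971520, 20971520, ["--bootstrap-server", "broker1.example.com:9092"]
)
print(shlex.join(args))
```

Keeping the connection flags as a separate parameter is a deliberate choice here, since that part differs most between Kafka versions.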

By default, each client ID receives an unlimited quota. The following configuration sets the default quota for each producer and consumer client ID to 10 MB/s.

quota.producer.default=10485760
quota.consumer.default=10485760
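The value 10485760 is 10 MB expressed in bytes (10 × 1024 × 1024). As a small sketch, the conversion for other rates could look like this in Python; the helper names are made up for illustration, and only the two property names come from the text.

```python
def mb_per_second_to_bytes(megabytes):
    """Convert a rate in MB/s (1 MB = 1024 * 1024 bytes) to bytes per second."""
    return int(megabytes * 1024 * 1024)


def default_quota_lines(producer_mb, consumer_mb):
    """Render the two default-quota properties named in the text."""
    return [
        f"quota.producer.default={mb_per_second_to_bytes(producer_mb)}",
        f"quota.consumer.default={mb_per_second_to_bytes(consumer_mb)}",
    ]


print("\n".join(default_quota_lines(10, 10)))
```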

To set quotas using Cloudera Manager, open the Kafka Configuration page and search for Quota. Use the fields provided to set the default consumer quota or default producer quota.

JBOD

JBOD refers to a system configuration in which disks are used independently rather than organized into redundant arrays (RAID). Using RAID usually results in a more reliable disk configuration even when the individual disks are not reliable. Such RAID setups are common in large-scale big data environments built on commodity hardware. RAID-enabled configurations are more expensive and more complex to set up. In many environments, a JBOD configuration is preferred for the following reasons:

Reduced storage cost: RAID-10 is recommended to protect against disk failures. However, scaling a RAID-10 configuration can become very expensive. Storing data redundantly on each node multiplies the storage requirement, because the data is also replicated between nodes.

Improved performance: as with HDFS, the slowest disk in a RAID-10 configuration limits the overall throughput, and writes have to go through the RAID controller. With JBOD, on the other hand, I/O performance improves because writes are isolated across disks without a controller.
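The storage-cost argument above comes down to multiplication. The sketch below assumes that RAID-10 mirrors every disk (doubling raw capacity per node) and uses a replication factor of 3 purely as an example; neither number is prescribed by the original text, and the function name is hypothetical.

```python
def raw_storage_tb(logical_data_tb, replication_factor, raid10):
    """Raw disk capacity needed for the given logical data volume.

    RAID-10 mirrors every disk, so it doubles the raw capacity on each node,
    on top of the copies Kafka already keeps on other brokers.
    """
    per_node_multiplier = 2 if raid10 else 1
    return logical_data_tb * replication_factor * per_node_multiplier


print(raw_storage_tb(10, 3, raid10=True))
print(raw_storage_tb(10, 3, raid10=False))
```

Under these assumptions, 10 TB of data needs 60 TB of raw disk with RAID-10 and 30 TB with JBOD.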

Set user limits for Kafka

Learn about user limits for Kafka and how to monitor them.

Kafka can have many files open at the same time. On most Unix-like systems, the default limit of 1024 open files is not enough. Any significant load can cause failures, with error messages such as java.io.IOException... (Too many open files) logged in the Kafka or HDFS log files. You may also notice errors such as the following:

ERROR Error in acceptor (kafka.network.Acceptor)

java.io.IOException: Too many open files

Cloudera recommends that you start with a relatively high value, for example 32768.

You can monitor the number of file descriptors in use on the Kafka Broker dashboard. In Cloudera Manager:

Go to the Kafka service.

Select a Kafka broker.

Open Charts Library > Process Resources and scroll down to the File Descriptors chart.
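Outside Cloudera Manager, the same figure can be approximated on a Linux host. This Python sketch is an illustration only: the function names and the 0.8 warning threshold are assumptions, and reading /proc/<pid>/fd for another user's process usually requires elevated permissions.

```python
import os


def fd_usage_ratio(open_descriptors, limit):
    """Fraction of the file descriptor limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return open_descriptors / limit


def count_open_descriptors(pid):
    """Count open file descriptors of a process by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def needs_attention(open_descriptors, limit, threshold=0.8):
    """True when usage reaches the warning threshold (0.8 is an arbitrary example)."""
    return fd_usage_ratio(open_descriptors, limit) >= threshold
```

For example, needs_attention(count_open_descriptors(broker_pid), 32768) would flag a broker that has used 80 percent or more of the limit suggested above, where broker_pid is the process ID of the Kafka broker.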
