Is it appropriate to run Kafka on Kubernetes?
Introduction
Kubernetes was originally designed to run stateless workloads. These workloads, typically built as microservices, are lightweight, horizontally scalable, follow the twelve-factor app methodology, and can cope with circuit breakers and Chaos Monkey testing.
Kafka, on the other hand, is essentially a distributed database. That means you have to deal with state, and it is much more heavyweight than a microservice. Kubernetes supports stateful workloads, but you have to handle them with care, as Kelsey Hightower pointed out in two recent tweets:
So should you run Kafka on Kubernetes? My counter-question: will Kafka run better without it? That is why I want to point out where Kafka and Kubernetes complement each other and which pitfalls you may encounter.
Runtime
Let's first look at the basics: the runtime itself.
Process
Kafka brokers are CPU friendly. TLS may introduce some overhead. Kafka clients need more CPU if they use encryption, but this does not affect the brokers.
Memory
Kafka brokers are heavy memory consumers. The JVM heap can usually be limited to 4-5 GB, but because Kafka makes heavy use of the page cache, it also needs sufficient system memory. In Kubernetes, set the container's resource requests and limits accordingly.
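As an illustration, here is a hedged sketch of how such settings might look in a broker container spec; the container name, image tag, and sizing are assumptions that must be tuned to your workload:

```yaml
# Fragment of a Pod/StatefulSet container spec (values are illustrative only).
# Keep the memory limit well above the JVM heap so the OS page cache has room to work.
containers:
  - name: kafka                              # assumed container name
    image: confluentinc/cp-kafka:7.6.0       # any Kafka image; the tag is an assumption
    env:
      - name: KAFKA_HEAP_OPTS                # read by Kafka's start scripts
        value: "-Xms4g -Xmx4g"               # cap the JVM heap at ~4-5 GB
    resources:
      requests:
        cpu: "2"
        memory: 8Gi                          # heap plus page-cache headroom
      limits:
        memory: 8Gi
```

Keeping the memory limit well above the JVM heap leaves room for the page cache, which Kafka relies on for throughput.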
Storage
Storage in containers is ephemeral: data is lost after a restart. You can use an emptyDir volume for Kafka data, but the effect is the same: the broker's data is gone after the Pod goes down. Your messages may still be available as replicas on other brokers, but after a restart the failed broker has to replicate all of its data, which can be a time-consuming process.
This is why you should use persistent storage. Non-local persistent block storage with XFS or ext4 is the better fit. A word of warning: do not use NFS. Neither NFS v3 nor v4 will work. In short, a Kafka broker will terminate itself when it cannot delete a data directory because of NFS's "silly rename" problem. If you still don't believe me, read this blog post carefully. The storage must also be non-local so that Kubernetes has more flexibility in choosing another node after a restart or relocation.
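As a sketch, a StorageClass for such non-local block storage could look like the following; the provisioner and its parameters are assumptions that depend entirely on your cloud or storage backend:

```yaml
# Illustrative StorageClass for network-attached block storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-block
provisioner: ebs.csi.aws.com              # example: AWS EBS CSI driver; use your own provisioner
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: xfs          # XFS or ext4, as recommended above
volumeBindingMode: WaitForFirstConsumer   # bind the volume only where the Pod gets scheduled
allowVolumeExpansion: true
```

WaitForFirstConsumer delays volume binding until the broker Pod is scheduled, which avoids creating the volume in a zone where the Pod cannot run.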
Network
Like most distributed systems, Kafka's performance depends heavily on low network latency and high bandwidth. Do not try to put all brokers on the same node, as this reduces availability: if that Kubernetes node fails, the whole Kafka cluster fails with it. Also, do not stretch the Kafka cluster across data centers; the same applies to the Kubernetes cluster. Different availability zones are a good trade-off.
Configuration
Manifests
The Kubernetes website contains a very good tutorial on how to set up ZooKeeper using manifests. Since ZooKeeper is part of Kafka, this is a good place to learn which Kubernetes concepts apply here. Once understood, you can use the same concepts for the Kafka cluster.
Pods: a Pod is the smallest deployable unit in Kubernetes. It contains your workload and represents a process in the cluster. A Pod contains one or more containers. Each ZooKeeper server in the ensemble and each broker in the Kafka cluster runs in its own Pod.
StatefulSets: a StatefulSet is a Kubernetes object that handles stateful workloads which need coordination. StatefulSets guarantee the ordering and uniqueness of Pods.
Headless Services: Services decouple Pods from clients by providing a logical name, and Kubernetes takes care of load balancing. However, for stateful workloads such as ZooKeeper and Kafka, clients must communicate with a specific instance. This is where headless Services come in: as a client you still get a logical name, but you reach the Pod directly.
Persistent Volumes: as mentioned above, non-local persistent block storage needs to be provisioned.
Yolean provides a comprehensive set of manifests to help you get started with Kafka on Kubernetes.
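To tie these concepts together, here is a heavily trimmed sketch of a headless Service plus a StatefulSet with per-broker volume claims; the names, image, and sizes are assumptions, and a real manifest additionally needs listener, ZooKeeper (or KRaft), and other broker configuration:

```yaml
# Minimal illustration: a headless Service and a StatefulSet skeleton for Kafka brokers.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  clusterIP: None                 # headless: DNS resolves to the individual broker Pods
  selector:
    app: kafka
  ports:
    - name: broker
      port: 9092
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless     # gives each Pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.6.0   # image and tag are assumptions
          ports:
            - containerPort: 9092
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:           # one PersistentVolumeClaim per broker
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: kafka-block          # the StorageClass sketched above
        resources:
          requests:
            storage: 100Gi
```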
Helm Charts
Helm is a package manager for Kubernetes, comparable to OS package managers such as yum, apt, Homebrew, or Chocolatey. It lets you install predefined software packages described in Helm charts. A well-curated Helm chart simplifies the complex task of configuring all the parameters correctly to run Kafka on Kubernetes. There are several charts available for Kafka: an official one in incubating state, one from Confluent, and another from Bitnami, to name just a few.
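As an illustration only, a values override for such a chart might look roughly like this; the key names are hypothetical and differ between the official, Confluent, and Bitnami charts, so always check the chart's own values.yaml:

```yaml
# Hypothetical values.yaml override for a Kafka Helm chart; key names vary per chart.
replicaCount: 3
persistence:
  enabled: true
  size: 100Gi
  storageClass: kafka-block
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    memory: 8Gi
```

It would then typically be installed with helm install plus the chart reference and -f values.yaml.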
Operators
Because Helm has some limitations, another tool has become quite popular: Kubernetes Operators. An Operator does not just package software for Kubernetes, it also lets you deploy and manage that software on Kubernetes.
The curated list of awesome operators mentions two Operators for Kafka. One of them is Strimzi. Strimzi makes it very easy to spin up a Kafka cluster in a few minutes with almost no configuration, and it adds some nice features such as cluster-internal point-to-point TLS encryption. Confluent has also announced an upcoming Operator.
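For a flavor of the Operator approach, here is a sketch of a Strimzi Kafka custom resource; the field names follow Strimzi's v1beta2 API, and the cluster name and sizes are assumptions, so consult the Strimzi documentation for the version you deploy:

```yaml
# Sketch of a Strimzi-managed Kafka cluster (illustrative values).
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true                  # the cluster-internal TLS mentioned above
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}              # enables topic management via KafkaTopic resources
```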
Performance
It is important to run performance tests to benchmark your Kafka installation. They give you insight into possible bottlenecks before you run into trouble. Fortunately, Kafka already ships with two performance testing tools: kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh. Use them regularly. For reference, you can consult the results in Jay Kreps's blog post or Stéphane Maarek's review of Amazon MSK.
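One convenient pattern is to run these scripts from inside the cluster, for example as a one-off Kubernetes Job; the sketch below assumes an image that bundles the Kafka tools and a bootstrap address matching the headless Service from earlier, and the tool may carry a .sh suffix depending on the distribution:

```yaml
# Sketch: run the bundled producer performance test as a one-off Job (values are illustrative).
apiVersion: batch/v1
kind: Job
metadata:
  name: kafka-producer-perf-test
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: perf-test
          image: confluentinc/cp-kafka:7.6.0     # assumed image shipping the Kafka tools
          command:
            - kafka-producer-perf-test           # kafka-producer-perf-test.sh in the Apache tarball
            - --topic=perf-test
            - --num-records=1000000
            - --record-size=1024
            - --throughput=-1                    # -1 = produce as fast as possible
            - --producer-props
            - bootstrap.servers=kafka-headless:9092
```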
Operations
Monitoring
Visibility is very important; otherwise you won't know what's going on. Today there is good tooling for cloud-native monitoring of metrics. Two popular tools are Prometheus and Grafana. Prometheus can collect metrics from all Java processes (Kafka, ZooKeeper, Kafka Connect) via the JMX exporter in a straightforward way. Adding cAdvisor metrics provides additional information about resource usage in Kubernetes.
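As a sketch, a Prometheus scrape job that discovers broker Pods via the conventional prometheus.io annotations could look like this; it assumes a JMX exporter is exposed on the annotated port and that the Pods carry the app: kafka label used earlier:

```yaml
# Fragment of prometheus.yml: keep only Pods that opt in via annotations and carry the kafka label.
scrape_configs:
  - job_name: kafka-jmx
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: kafka
```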
Strimzi provides an elegant example Grafana dashboard for Kafka. It visualizes key metrics such as under-replicated and offline partitions in a very intuitive way. It complements these metrics with resource usage, performance, and stability indicators. So you get basic Kafka cluster monitoring for free!
Source: https://strimzi.io/docs/master/#kafka_dashboard
This can be complemented by client-side monitoring (consumer and producer metrics), lag monitoring with Burrow, and end-to-end monitoring with Kafka Monitor.
Logging
Logging is another critical part. Make sure that all containers in your Kafka installation log to standard output (stdout) and standard error (stderr), and that the Kubernetes cluster aggregates all logs in a central logging infrastructure such as Elasticsearch.
Health checks
Kubernetes uses liveness and readiness probes to find out whether your Pods are healthy. If a liveness probe fails, Kubernetes kills the container and restarts it automatically if the restart policy is set accordingly. If a readiness probe fails, Kubernetes removes the Pod from serving requests through its Service. This means that human intervention is no longer needed in such cases, which is a big advantage.
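A simple and common choice is a TCP check on the broker port; the fragment below belongs in the broker container spec, and the thresholds are illustrative and should be adjusted to your startup and recovery times:

```yaml
# Fragment of the Kafka container spec: basic TCP-based health checks (illustrative thresholds).
livenessProbe:
  tcpSocket:
    port: 9092
  initialDelaySeconds: 60     # give the broker time to start and recover its logs
  periodSeconds: 10
  failureThreshold: 6
readinessProbe:
  tcpSocket:
    port: 9092
  initialDelaySeconds: 30
  periodSeconds: 10
```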
Rolling updates
StatefulSets support automated updates: the RollingUpdate strategy updates one Kafka Pod at a time. This way zero-downtime updates can be achieved, which is another big advantage Kubernetes brings.
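In the StatefulSet spec this is just the update strategy; the fragment below is illustrative, and RollingUpdate is already the default for StatefulSets:

```yaml
# Fragment of the StatefulSet spec: replace one broker Pod at a time, in reverse ordinal order.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0       # 0 updates all Pods; raise it to canary only ordinals >= N
```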
Scaling
Scaling a Kafka cluster is not an easy task. Kubernetes does make it easy to scale Pods to a given number of replicas, which means you can declaratively define the desired number of Kafka brokers. The hard part, though, is reassigning partitions after scaling up or before scaling down, and Kubernetes will not help you with that.
Administration
Administrative tasks on the Kafka cluster, such as creating topics and reassigning partitions, can be performed with the existing shell scripts by opening a shell inside one of the Pods. This is not a very nice solution, though. Strimzi supports managing topics with another Operator. There is still room for improvement here.
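With Strimzi's Topic Operator, a topic becomes a declarative Kubernetes resource; the sketch below follows the v1beta2 API, and the cluster label and settings are assumptions:

```yaml
# Sketch of a Strimzi KafkaTopic managed by the Topic Operator (illustrative values).
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster   # must match the name of the Kafka resource
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 604800000          # 7 days
```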
Backup and restore
The availability of Kafka now also depends on the availability of Kubernetes. If the Kubernetes cluster fails, then in the worst case the Kafka cluster fails too. Murphy's law tells us that this will happen to you as well, and you will lose data. To mitigate this risk, make sure you have a backup concept. MirrorMaker is one option; another is to back up to S3 via Kafka Connect, as described in a blog post from Zalando.
So, is it appropriate to run Kafka on Kubernetes? It can work well, but whether it is the right fit for your setup is something you need to verify in practice.