2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
How does Kafka handle Netflix's two trillion messages a day? Many people new to the topic are unsure how to answer this, so this article walks through the underlying architecture. By the end, you should be able to answer the question yourself.
From the beginning, microservices have needed ways to communicate with each other. Some teams prefer HTTP REST APIs, but then must handle queuing themselves; others prefer older message queues such as RabbitMQ, but then face scaling and operational concerns. Kafka-centric architectures were born to address both sets of problems.
We'll discuss how Apache Kafka improves on the HTTP REST API and message queue architecture used in microservices in the past, and how it extends its service capabilities further.
The story of the two camps
The first camp refers to communications that are handled directly by invoking other services such as HTTP REST APIs or Remote Procedure Calls (RPC).
The second camp borrows the concept of an Enterprise Service Bus from Service-Oriented Architecture (SOA) and uses a message queue (such as RabbitMQ) responsible for communicating with other services as a message broker to implement various operations.
While this approach relieves services of the burden of communicating with each other directly, it adds an extra network "hop" to every message.
Microservices using HTTP REST APIs
HTTP REST APIs are a popular way to make RPCs between services. Their main benefits are simple initial setup and relatively efficient message delivery.
However, this pattern requires its implementers to handle queuing themselves, including what to do when incoming requests exceed a node's capacity. For example, suppose you have a long chain of services in which one service receives more requests than it can process. To cope, the same kind of back-pressure handling must be propagated to every upstream service in the chain.
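The back-pressure idea above can be sketched with a bounded queue: rather than buffering requests without limit, an overloaded service rejects new work immediately so callers know to back off. This is a minimal illustration, not a real HTTP server; the `Service` class and status strings are hypothetical stand-ins.

```python
import queue

# A hypothetical service with a bounded request buffer. When the buffer
# is full, new requests are rejected right away (the analog of an HTTP
# 503), signaling back pressure to the caller instead of queuing forever.
class Service:
    def __init__(self, capacity):
        self.requests = queue.Queue(maxsize=capacity)

    def accept(self, request):
        try:
            self.requests.put_nowait(request)
            return "202 Accepted"
        except queue.Full:
            return "503 Service Unavailable"  # caller must back off or retry

svc = Service(capacity=2)
print(svc.accept("r1"))  # 202 Accepted
print(svc.accept("r2"))  # 202 Accepted
print(svc.accept("r3"))  # 503 Service Unavailable
```

In a real chain, the caller receiving the 503 would slow its own intake, propagating the signal upstream.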
In addition, this pattern requires every individual HTTP REST API service to be highly available. In a long pipeline of microservices, no service can afford to lose all of its instances; communication keeps working only as long as at least one process in each group is still healthy.
Of course, we usually need to put a load balancer in front of these microservices. And since each microservice must know where to send its calls, a service-discovery mechanism is usually necessary as well.
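The combination of service discovery and load balancing can be sketched as a small in-process registry. The `Registry` class, service name, and addresses below are all hypothetical, for illustration only: callers resolve a service by name and the registry cycles round-robin through known instances.

```python
import itertools

# A toy service registry with round-robin load balancing: each named
# service maps to a list of instance addresses, and resolve() cycles
# through them so load is spread evenly.
class Registry:
    def __init__(self):
        self._instances = {}
        self._cursors = {}

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)
        # rebuild the cursor so it covers the updated instance list
        self._cursors[name] = itertools.cycle(self._instances[name])

    def resolve(self, name):
        return next(self._cursors[name])

reg = Registry()
reg.register("billing", "10.0.0.1:8080")
reg.register("billing", "10.0.0.2:8080")
print(reg.resolve("billing"))  # 10.0.0.1:8080
print(reg.resolve("billing"))  # 10.0.0.2:8080
print(reg.resolve("billing"))  # 10.0.0.1:8080
```

Production systems delegate this to dedicated components (DNS, Consul, Envoy, cloud load balancers), but the division of labor is the same.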
One advantage of this pattern is low latency. With middlemen largely eliminated from a given request path, only battle-tested components such as web servers and load balancers sit between services, and they perform well.
As you can see, RPC-style microservices force us to manage the shared dependencies between them, so such systems tend to grow complex quickly, which eventually slows down development.
Today, the industry has also introduced some new solutions. Envoy, for example, uses a service mesh to solve such problems.
While this pattern solves problems such as load balancing and service discovery, the overall complexity of the system grows considerably compared to simple, direct RPC calls.
Many companies start with just a few microservices that need to talk to each other, but as the system grows, the call relationships and communication channels between them become as tangled as a bowl of spaghetti.
Message queues
Another way to structure communication between microservices is based on the use of a message bus or message queue system.
Older service-oriented architectures called this approach an enterprise service bus (ESB); typically, a broker such as RabbitMQ or ActiveMQ fills the role.
As a centralized messaging service, the message broker facilitates communication among all connected microservices, and its queuing mechanism and high availability help guarantee delivery between services.
For example, with a message queue in place, messages are received in order and processed later. When a traffic peak exceeds the system's processing capacity, the queue buffers the backlog rather than dropping it.
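The buffering behavior described above can be shown with a minimal in-memory queue: a burst of producer messages is absorbed, nothing is dropped, and the consumer drains them later in FIFO order. This is a toy sketch, not a real broker.

```python
from collections import deque

# A minimal stand-in for a broker's queue: producers enqueue a burst
# faster than the consumer drains, but nothing is lost and first-in,
# first-out order is preserved.
broker = deque()

for i in range(5):            # burst of 5 messages arrives at once
    broker.append(f"msg-{i}")

processed = []
while broker:                 # the consumer drains at its own pace
    processed.append(broker.popleft())

print(processed)  # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
```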
However, many message brokers are known to scale poorly, or are limited in how they handle message delivery and persistence in a clustered environment.
Another area worth focusing on for message queues is how they handle errors when they occur.
For example, does the system guarantee at-most-once, at-least-once, or exactly-once delivery? The answer depends entirely on the message queue implementation. That is, you must be familiar with the messaging system you choose and the delivery semantics that come with it.
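To make these semantics concrete: under at-least-once delivery, a message may be redelivered (for instance, after a missed acknowledgment), so consumers often deduplicate by message id to get effectively-once processing. The sketch below is illustrative; the `handle` function and message shape are hypothetical.

```python
# Under at-least-once delivery, a message can arrive more than once.
# An idempotent consumer tracks the ids it has already processed and
# ignores duplicates, so each logical message takes effect only once.
seen_ids = set()
results = []

def handle(message):
    if message["id"] in seen_ids:
        return                      # duplicate redelivery: ignore
    seen_ids.add(message["id"])
    results.append(message["body"])

handle({"id": 1, "body": "charge card"})
handle({"id": 1, "body": "charge card"})   # redelivered duplicate
handle({"id": 2, "body": "send receipt"})
print(results)  # ['charge card', 'send receipt']
```

At-most-once systems skip the redelivery (risking loss); exactly-once systems push this deduplication into the infrastructure itself.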
In addition, adding message queues to the architecture of an existing system inevitably adds new components to be operated and maintained.
At the same time, routing every message through the broker adds "one more hop" to the network, which introduces some extra latency.
On the other hand, this pattern simplifies security: message queuing systems support centralized access control lists (ACLs), uniformly applying rules that limit who may read and write which kinds of messages.
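A centralized ACL check boils down to a lookup table keyed by principal and topic. The table contents and service names below are made up for illustration; real brokers store and enforce these rules server-side.

```python
# A minimal centralized ACL: (principal, topic) -> set of allowed operations.
ACL = {
    ("billing-service", "payments"): {"write"},
    ("fraud-service", "payments"): {"read"},
}

def allowed(principal, topic, operation):
    # Deny by default: anything not explicitly granted is refused.
    return operation in ACL.get((principal, topic), set())

print(allowed("billing-service", "payments", "write"))  # True
print(allowed("billing-service", "payments", "read"))   # False
print(allowed("unknown-service", "payments", "read"))   # False
```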
Another benefit of centralized communication is network security. Previously, microservices talked to each other directly; with a message broker, all connections pass through the message queue service, and firewall-like rules can block direct contact between microservices, reducing the attack surface.
Kafka-centric advantages
Apache Kafka, created at LinkedIn, is an open-source event streaming platform. What sets it apart from older message queuing systems is that it completely decouples senders from receivers: a sender does not need to know who will consume the messages it sends.
In many other message broker systems, the sender must know in advance who will read the message, which discourages adding new, unforeseen use cases to a traditional queuing system.
With Apache Kafka, messages are written by the sender into a log stream called a topic, and they don't have to care who or what applications will actually read the message.
This leaves room for new use cases: new consumers are free to decide how to process a topic's contents for their own purposes.
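The decoupling described above rests on the topic being an append-only log with per-consumer read positions. The toy `Topic` class below is a hypothetical sketch of that idea: producers append without knowing who reads, and each consumer tracks its own offset, so a brand-new consumer can replay the whole history at any time.

```python
# A toy append-only topic log. The log itself stores every message;
# consumers do not remove anything, they just remember how far they
# have read (their offset).
class Topic:
    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)

    def read(self, offset):
        # return everything at or after the given offset
        return self.log[offset:]

topic = Topic()
topic.append("user-signed-up")
topic.append("user-upgraded")

# An existing consumer that is caught up (offset 2) sees nothing new...
print(topic.read(2))   # []
# ...while a brand-new consumer replays the full history from offset 0.
print(topic.read(0))   # ['user-signed-up', 'user-upgraded']
```

Real Kafka topics add partitioning, retention, and durable offset storage on top of this basic shape.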
Kafka also ignores the payload of the messages it carries and lets them be serialized in any format. In practice, most users serialize their data as JSON, Avro, or Protobuf.
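Since Kafka treats message values as opaque bytes, serialization is the application's choice. A JSON round trip, the simplest of the three options just mentioned, looks like this (the event fields are made up for illustration):

```python
import json

# The producer serializes a structured event to bytes before sending;
# the consumer decodes the bytes back into the same structure.
event = {"type": "page_view", "user": 42, "path": "/home"}

value = json.dumps(event).encode("utf-8")    # producer side: dict -> bytes
decoded = json.loads(value.decode("utf-8"))  # consumer side: bytes -> dict

print(decoded == event)  # True
```

Avro and Protobuf trade JSON's readability for compact binary encoding and schema enforcement, often paired with a schema registry.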
In addition, by setting up ACLs you can easily restrict which topics individual producers and consumers may read or write, giving you centralized security control over all messages.
As a result, you will often see Kafka used as a firehose-style data pipe to receive potentially large amounts of data.
Netflix, for example, claims that it is using Kafka to process two trillion messages per day.
It is worth noting an important feature of Kafka's consumers: as message load grows, or as consumers fail or are added to meet capacity demands, Kafka automatically rebalances the processing load across the consumers in a group.
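The core of that rebalancing is reassigning topic partitions across the live consumers in a group. The sketch below shows a simple round-robin assignment; Kafka's real assignors (range, round-robin, sticky, cooperative) are more sophisticated, and the consumer names here are hypothetical.

```python
# Round-robin partition assignment: deal partitions out to consumers
# like cards. When a consumer joins or leaves, running assign() again
# with the new membership is the "rebalance".
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))
print(assign(partitions, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}

# Consumer c3 joins the group -> the same partitions now spread over three.
print(assign(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Because each partition belongs to exactly one consumer in the group at a time, adding consumers (up to the partition count) scales throughput linearly.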
The burden of ensuring high availability thus shifts from each individual microservice to Apache Kafka itself.
Accordingly, with its ability to process streaming data, Kafka has evolved from a messaging system into a streaming data platform.
And thankfully, although using Apache Kafka as a microservice communication bus adds an extra hop to the network, in practice it adds very little latency to requests.
After reading all this, do you have a clearer picture of how Kafka handles Netflix's two trillion messages a day? Thank you for reading!