How to configure high availability in Fluentd

2025-07-12 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com) 06/02 Report

This article explains how to configure Fluentd for high availability.

For high-traffic websites or services, Fluentd can be deployed in a high-availability configuration.

Message delivery semantics

Fluentd is designed primarily as an event-log delivery system. Such systems can offer several delivery guarantees:

At most once. The message is sent immediately. If the transfer succeeds, the message is never sent again; if it fails, the message is lost. In practice, many conditions can cause a transfer to fail, such as a temporary network outage.

At least once. Each message is delivered at least once; if a transfer fails, the message is sent again. This ensures that no message is lost, but the receiver may get duplicates.

Exactly once. Each message is delivered once and only once, with no loss and no duplicates. This is the most desirable guarantee, but implementing it may require synchronous logging: when the log layer can no longer keep up, it has to tell the application that it cannot accept more events.

To collect large volumes of logs without affecting application performance, the log layer must run asynchronously. Fluentd therefore provides only the first two guarantees: at most once and at least once.
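The difference between the two guarantees Fluentd offers can be illustrated with a small Python sketch. This is not Fluentd code: `ScriptedLink`, `send_at_most_once`, and `send_at_least_once` are hypothetical names invented here, and the link simply replays a scripted list of outcomes so the behavior is easy to follow.

```python
from typing import List, Sequence


class ScriptedLink:
    """Hypothetical network link that replays scripted outcomes.

    "lost"     - the message never reaches the receiver
    "ack_lost" - the receiver stores the message, but the sender sees a failure
    "ok"       - the receiver stores the message and the sender sees success
    Once the script is exhausted, every attempt is "ok".
    """

    def __init__(self, outcomes: Sequence[str]) -> None:
        self.outcomes: List[str] = list(outcomes)
        self.received: List[str] = []

    def deliver(self, message: str) -> bool:
        outcome = self.outcomes.pop(0) if self.outcomes else "ok"
        if outcome == "lost":
            return False
        self.received.append(message)
        return outcome == "ok"


def send_at_most_once(link: ScriptedLink, message: str) -> bool:
    """Try exactly once; a failed transfer means the message is lost."""
    return link.deliver(message)


def send_at_least_once(link: ScriptedLink, message: str, max_attempts: int = 5) -> bool:
    """Retry until the sender sees success; the receiver may get duplicates."""
    for _ in range(max_attempts):
        if link.deliver(message):
            return True
    return False
```

With the script `["lost"]`, `send_at_most_once` returns `False` and the receiver ends up with nothing. With `["lost", "ack_lost"]`, `send_at_least_once` succeeds on the third attempt, but the receiver holds the message twice, which is the duplicate case described above.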

Network topology

To make Fluentd highly available, a typical deployment uses Fluentd nodes in two roles: forwarders and aggregators. The topology is shown in the following figure.

A forwarder is deployed on each application node. It collects the log events generated locally by the application and sends them to the aggregators.

An aggregator continuously receives logs from the forwarders, buffers them, and periodically uploads them to the next stage (typically storage).

The aggregators run in active/standby mode. In the figure above, 192.168.0.1 is the active aggregator and 192.168.0.2 is the standby.

Forwarder configuration

A typical forwarder configuration looks like this:

# TCP input
<source>
  @type forward
  port 24224
</source>

# HTTP input
<source>
  @type http
  port 8888
</source>

# Log forwarding (mytag.** is a placeholder; use your own tag pattern)
<match mytag.**>
  @type forward

  # primary host
  <server>
    host 192.168.0.1
    port 24224
  </server>

  # use secondary host
  <server>
    host 192.168.0.2
    port 24224
    standby
  </server>

  # use longer flush_interval to reduce CPU usage.
  # note that this is a trade-off against latency.
  flush_interval 60s
</match>

This configuration defines two input sources and uses the forward output plugin to send log events to the two aggregator servers. The standby parameter marks 192.168.0.2 as the backup aggregator. If neither aggregator is available, logs are buffered on the forwarder node.
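The selection rule described here (prefer the primary, fall back to the standby, otherwise keep buffering) can be sketched in Python. This is only an illustration of the idea, not how the forward plugin is implemented; `Server`, `pick_server`, and the `is_alive` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Server:
    """One aggregator entry, mirroring host/port/standby in the config."""
    host: str
    port: int = 24224
    standby: bool = False


def pick_server(
    servers: Sequence[Server],
    is_alive: Callable[[Server], bool],
) -> Optional[Server]:
    """Return the first reachable primary, else the first reachable standby.

    Returns None when no aggregator is reachable; the caller is then
    expected to keep the events in its local buffer and try again later.
    """
    for want_standby in (False, True):
        for server in servers:
            if server.standby == want_standby and is_alive(server):
                return server
    return None
```

For example, with `servers = [Server("192.168.0.1"), Server("192.168.0.2", standby=True)]`, the function returns the 192.168.0.1 entry while it is reachable, the 192.168.0.2 entry when only the standby is reachable, and `None` when both are down.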

Aggregator configuration

A typical configuration of an aggregator is as follows:

# Input
<source>
  @type forward
  port 24224
</source>

# Output
...

This configuration is simple: it uses the forward plugin as the input source. Incoming logs are buffered locally and delivered to the destination, and failed deliveries are retried.

Failure scenarios

Forwarder failure

When a forwarder receives a log event from the application, it first writes the event to a local disk buffer (its location is set by buffer_path). Every flush_interval, the buffered events are forwarded to the aggregators.

If the forwarder process crashes, the buffered logs are resent automatically after the process restarts. If the network between the forwarder and the aggregators fails, the forwarder also retries the transfer. This makes the forwarder reasonably robust.

However, data can still be lost in some situations:

The forwarder process crashes after receiving an event from the application but before writing it to the buffer.

The disk that holds the buffer is damaged.
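The buffer-then-flush behavior described in this section can be sketched in Python as follows. It is a simplified illustration under stated assumptions, not Fluentd's buffer implementation: `buffer_event` and `flush_buffer` are hypothetical helpers, events are stored one per line (so they must not contain newlines), and `send` stands in for the transfer to an aggregator.

```python
import os
from typing import Callable, List


def buffer_event(buffer_file: str, event: str) -> None:
    """Append one event to the on-disk buffer before anything is sent."""
    with open(buffer_file, "a", encoding="utf-8") as f:
        f.write(event + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to persist the write


def flush_buffer(buffer_file: str, send: Callable[[List[str]], bool]) -> bool:
    """Try to send every buffered event; keep the file if sending fails.

    Because the file is removed only after a successful send, a crash or
    network failure leaves the events on disk for the next flush attempt.
    """
    if not os.path.exists(buffer_file):
        return True  # nothing buffered
    with open(buffer_file, encoding="utf-8") as f:
        events = [line.rstrip("\n") for line in f]
    if events and not send(events):
        return False  # retry on the next flush interval
    os.remove(buffer_file)
    return True
```

The sketch also shows where the two loss cases come from: an event that never reaches `buffer_event` is gone if the process dies, and a damaged disk takes the buffer file with it.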

Aggregator failure

The aggregator uses the same failure-handling mechanism as the forwarder, so its failure scenarios are similar.

Troubleshooting

With this architecture you may sometimes see the error "no nodes are available". This can be caused by a lack of network connectivity between nodes. Note that nodes communicate over port 24224 using both TCP and UDP.

Connectivity can be checked with the following commands:

$ telnet host 24224
$ nmap -p 24224 -sU host
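Where telnet is not installed, the TCP half of this check can be approximated with a few lines of Python. This is a minimal sketch: `tcp_port_open` is a hypothetical helper name, and it only tests TCP. UDP is connectionless, so a simple connect cannot confirm that UDP traffic gets through; the nmap command above remains the way to probe UDP.

```python
import socket


def tcp_port_open(host: str, port: int = 24224, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `tcp_port_open("192.168.0.1")` from a forwarder node would then report whether the primary aggregator accepts TCP connections on port 24224.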

