
How to do the best practice of K8s log collection


This article explains best practices for collecting K8s logs in detail. It is shared here for reference; after reading it, you should have a solid understanding of the topic.

Difficulties in Kubernetes log collection

In Kubernetes, log collection is much more complex than on traditional virtual machines or physical machines. The most fundamental reason is that Kubernetes shields underlying failures, provides finer-grained resource scheduling, and delivers an environment that is both stable and dynamic. Log collection therefore faces a richer, more dynamic environment, with more factors to consider.

For example:

For Job-type applications with very short lifetimes, only a few seconds pass between start and stop. How do you ensure that collection keeps up in real time and no data is lost?

K8s generally recommends large nodes, and each node can run 10-100+ containers. How do you collect logs from 100+ containers with as little resource consumption as possible?

In K8s, applications are deployed via yaml, while log collection is still mostly configured through manually maintained configuration files. How can log collection itself be deployed the K8s way?

| | Kubernetes | Traditional (VM / physical machine) |
| Log categories | files, stdout, host files, journal | files, journal |
| Log sources | business containers, system components, host | business, host |
| Collection methods | Agent (Sidecar, DaemonSet), direct write (DockerEngine, business) | Agent, direct write |
| Applications per node | 10-100 | 1-10 |
| Application dynamism | high | low |
| Node dynamism | high | low |
| Collection deployment | manual, Yaml | manual, custom |

Collection method: active or passive

Log collection is divided into passive collection and active push. In K8s, passive collection is generally divided into Sidecar and DaemonSet, and active push is divided into DockerEngine push and business direct writing.

DockerEngine itself provides a LogDriver feature: by configuring different LogDrivers, a container's stdout can be written to remote storage through DockerEngine, achieving log collection. This method offers very little customization, flexibility, or resource isolation and is not recommended for production environments.

Business direct writing integrates a log collection SDK into the application, which sends logs directly to the server. This method skips the collect-from-disk logic and needs no additional Agent deployment, so its resource consumption is the lowest overall. However, because the business code is tightly bound to the log SDK, overall flexibility is very low; it is generally used only in scenarios with a very large volume of logs.

The DaemonSet approach runs only one log agent on each node, which collects all the logs on that node. DaemonSet consumes far fewer resources, but its scalability and tenant isolation are limited, so it is better suited to clusters with a single purpose or few businesses.
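
A minimal sketch of the DaemonSet pattern (the agent image, resource limits, and host paths are illustrative placeholders, not the actual Logtail deployment):

```yaml
# DaemonSet pattern: one log agent per node, reading all container logs
# from the host. Image name and paths below are illustrative placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: example.com/log-agent:latest   # placeholder agent image
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
```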

The Sidecar approach deploys a separate log agent for each POD, and that agent is responsible only for collecting the logs of one business application. Sidecar consumes more resources, but its flexibility and multi-tenant isolation are strong. This method is recommended for large K8s clusters, or for clusters that serve multiple business parties as a PaaS platform.
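
A minimal sketch of the Sidecar pattern, assuming the application writes its logs to files; the images are placeholders, and the two containers share an emptyDir volume:

```yaml
# Sidecar pattern: the app writes log files to a shared emptyDir volume,
# and a dedicated agent container in the same Pod collects only those files.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: example.com/my-app:latest       # placeholder business image
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app              # the app writes its log files here
  - name: log-agent
    image: example.com/log-agent:latest    # placeholder agent image
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: app-logs
    emptyDir: {}
```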

To sum up:

DockerEngine direct writing is generally not recommended.

Business direct writing is recommended for use in scenarios with a large number of logs.

DaemonSet is generally used in small and medium-sized clusters.

Sidecar is recommended for very large clusters.

The detailed comparison of various collection methods is as follows:

| | DockerEngine | Business direct write | DaemonSet mode | Sidecar mode |
| Collected log types | stdout | business logs | stdout + some files | files |
| Deployment & operations | low, native support | low, only the configuration file needs to be maintained | moderate, the DaemonSet must be maintained | high, every POD whose logs are collected needs a sidecar container |
| Log classification & storage | not possible | configured independently by the business | moderate, possible via container / path mapping | each POD can be configured separately, highly flexible |
| Multi-tenant isolation | weak | weak, direct writing competes with business logic for resources | moderate, isolation only between configurations | strong, isolated by container, resources can be allocated separately |
| Supported cluster size | unlimited with local storage; a single-point limit if syslog / fluentd is used | unlimited | depends on the number of configurations | unlimited |
| Resource usage | low, provided by DockerEngine | lowest overall, no collection overhead | relatively low, one container per node | relatively high, one container per POD |
| Query convenience | low, only grep over raw logs | high, can be customized to the business | relatively high, supports custom queries and statistics | relatively high, can be customized to the business |
| Customizability | low | high, freely extensible | low | high, each POD configured separately |
| Coupling | high, strongly bound to DockerEngine; changes require restarting DockerEngine | high, modifying / upgrading the collection module requires re-releasing the business | low, the Agent can be upgraded independently | moderate, by default an Agent upgrade restarts the corresponding Sidecar business (some extension packages support Sidecar hot upgrade) |
| Applicable scenarios | testing, POC and other non-production scenarios | scenarios with extremely high performance requirements | clusters with clear log classification and a single purpose | large, hybrid, or PaaS clusters |

Log output: Stdout or file

In K8s, unlike on virtual machines / physical machines, containers offer both standard output and files. With standard output, the container writes logs directly to stdout or stderr; DockerEngine takes over these file descriptors and processes the logs according to the configured LogDriver rules. Printing logs to files works largely as it does on virtual machines / physical machines, except that the files can be stored in different ways, such as the container's default storage, EmptyDir, HostVolume, NFS, etc.

Although Docker officially recommends printing logs to Stdout, note that this recommendation assumes containers are used only as simple applications. In real business scenarios, we recommend using files as much as possible, mainly for the following reasons:

Stdout has performance problems. From the application writing to stdout to the log reaching the server, there are several steps (for example with the commonly used JSON LogDriver): application stdout -> DockerEngine -> LogDriver -> serialize to JSON -> write to file -> Agent collects file -> parse JSON -> upload to server. The whole pipeline is much more expensive than collecting a file directly: in stress tests, outputting 100,000 log lines per second consumed an extra CPU core in DockerEngine.

Stdout does not support classification: all output is mixed into a single stream and cannot be separated the way files can. An application usually has an AccessLog, ErrorLog, InterfaceLog (logs of calls to external interfaces), TraceLog, etc., and these logs differ in format and purpose, so collecting and analyzing them is difficult if they are mixed in the same stream.

Stdout only supports the output of the main program of the container. Programs running in daemon/fork mode will not be able to use stdout.

Writing logs to files supports a variety of strategies, such as synchronous / asynchronous writes, cache size, file rotation, compression, and purge policies, making it comparatively more flexible.

Therefore, we recommend that online applications output logs to files, and that Stdout be used only for single-purpose applications or some K8s system / operations components.

CICD Integration: Logging Operator

Kubernetes provides a standardized way to deploy services. Through yaml (the K8s API), you can declare routing rules, expose services, mount storage, run the business, define scaling rules, and so on, which makes Kubernetes easy to integrate with a CICD system. Log collection is also an important part of operations and monitoring, and all logs should be collected in real time once the business goes live.

The original approach was to deploy the log collection logic manually after release, which requires human intervention and runs counter to the goal of CICD automation. To automate this, some teams build an automatic deployment service on top of the log collection API/SDK and trigger it through a CICD webhook after release, but this approach is costly to develop.

In Kubernetes, the most standard way to integrate log collection is to register a new resource type with the Kubernetes system and manage it through an Operator (CRD). With this approach the CICD system needs no additional development: it simply attaches the log-related configuration when deploying to Kubernetes.
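
For example, a CICD pipeline can apply the collection rule together with the workload as ordinary Kubernetes manifests. The sketch below is illustrative only: names are placeholders, and the collection resource is stubbed out here (a concrete AliyunLogConfig example appears later in this article):

```yaml
# The CICD pipeline applies the workload and its log collection rule together,
# e.g. with a single "kubectl apply -f" on this file. Names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example.com/my-app:latest   # placeholder business image
---
# Log collection declared as a custom resource handled by the log Operator.
# Only a stub is shown here; see the AliyunLogConfig example later on.
apiVersion: log.alibabacloud.com/v1alpha1
kind: AliyunLogConfig
metadata:
  name: my-app-logs
spec:
  logstore: my-app-logs
  # ...logtailConfig omitted; see the stdout collection example below
```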

Kubernetes log collection scheme

Long before Kubernetes emerged, we had begun developing a log collection solution for container environments. As K8s matured, we migrated a lot of business onto the K8s platform and, building on that earlier work, developed a log collection solution for K8s. Its main capabilities are:

Supports real-time collection of all kinds of data, including container files, container Stdout, host files, Journal, Event, etc.

Multiple collection and deployment methods are supported, including DaemonSet, Sidecar, DockerEngine LogDriver, etc.

Supports enriching log data, including additional information such as Namespace, Pod, Container, Image, Node, etc.

Stable and highly reliable, based on Alibaba's self-developed Logtail collection Agent, with millions of deployed instances across the network.

Based on CRD, log collection rules can be deployed using Kubernetes deployment and release, which is perfectly integrated with CICD.

Install the log collection component

This collection solution is currently available to the public. We provide a Helm installation package that includes the Logtail DaemonSet, the AliyunLogConfig CRD definition, and the CRD Controller. After installation, DaemonSet collection and CRD configuration can be used directly. Installation works as follows:

On an Alibaba Cloud Kubernetes cluster, the components above can be installed automatically by enabling the option when the cluster is created. If the option was not enabled at creation time, they can be installed manually.

For a self-built Kubernetes cluster, whether it runs on Alibaba Cloud, another cloud, or on-premises, this collection method can also be used.

After the above components are installed, Logtail and the corresponding Controller run in the cluster, but they do not collect any logs by default. You need to configure collection rules to collect the various logs of the desired Pods.

Collection rule configuration: environment variable or CRD

In addition to manual configuration on the log service console, two additional configuration methods are supported for Kubernetes: environment variables and CRD.

Environment-variable configuration has been available since the swarm era: you simply declare the address of the data to collect in the container's environment variables, and Logtail automatically collects that data and sends it to the server.
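
A sketch of this style of configuration; the `aliyun_logs_{logstore}` variable naming follows the Logtail convention, and the image and paths are illustrative placeholders:

```yaml
# Declare what to collect via environment variables on the business container.
# Variable names (aliyun_logs_{logstore}) follow the Logtail convention;
# the image and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: env-config-demo
spec:
  containers:
  - name: my-app
    image: example.com/my-app:latest     # placeholder business image
    env:
    - name: aliyun_logs_app-stdout       # collect this container's stdout
      value: stdout
    - name: aliyun_logs_app-access       # collect a log file by path
      value: /var/log/app/access.log
```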

This method is simple to deploy, has a low learning cost, and is easy to use. However, it supports only a small set of configuration rules, and many advanced options (parsing methods, filtering, blacklists and whitelists, etc.) are not available. This declaration style also does not support modification or deletion: each change actually creates a new collection configuration, and the old configurations must be cleaned up manually, otherwise resources are wasted.

CRD configuration follows the standard extension mechanism recommended by Kubernetes: collection configuration is managed as a K8s resource, and the data to be collected is declared by deploying a special CRD resource, AliyunLogConfig, to Kubernetes.

For example, the configuration below deploys collection of a container's standard output: the definition collects both Stdout and Stderr, and excludes containers whose environment variables contain COLLEXT_STDOUT_FLAG:false.
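
A sketch of such a configuration; the field layout follows the AliyunLogConfig structure but should be treated as approximate and verified against the official reference:

```yaml
# Sketch of the stdout collection CRD described above: collect both Stdout
# and Stderr, and skip containers whose environment contains
# COLLEXT_STDOUT_FLAG=false. Field names are approximate; verify against
# the AliyunLogConfig documentation.
apiVersion: log.alibabacloud.com/v1alpha1
kind: AliyunLogConfig
metadata:
  name: stdout-all
spec:
  logstore: stdout-all
  logtailConfig:
    inputType: plugin
    configName: stdout-all
    inputDetail:
      plugin:
        inputs:
        - type: service_docker_stdout
          detail:
            Stdout: true
            Stderr: true
            ExcludeEnv:
              COLLEXT_STDOUT_FLAG: "false"
```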

CRD-based configuration is managed through Kubernetes' standard extension resources, supports full create / delete / update / query semantics for configurations, and supports a variety of advanced options. It is the highly recommended way to configure collection.

Configuration recommended by collection rules

In practice, DaemonSet alone or a mix of DaemonSet and Sidecar is generally used. The advantage of DaemonSet is efficient resource usage, but all Logtail instances in the DaemonSet share the global configuration, and a single Logtail has an upper limit on the number of configurations it can support, so it cannot handle clusters with a very large number of applications.

The recommended configuration follows these core ideas:

Use one configuration to collect as much data of the same kind as possible, reducing the number of configurations and the pressure on the DaemonSet.

Give collection for core applications sufficient resources; the Sidecar mode can be used for them.

Use the CRD configuration method as much as possible.

With Sidecar, each Logtail has its own separate configuration, so there is no limit on the number of configurations; this is suitable for very large clusters.

Practice 1-small and medium-sized clusters

Most Kubernetes clusters are small or medium-sized. There is no strict definition of "small and medium", but generally it means fewer than 500 applications, fewer than 1,000 nodes, and no dedicated Kubernetes platform operations team. In this scenario the number of applications is not very large, and DaemonSet can handle all the collection configurations:

The data of most business applications are collected by DaemonSet.

Core applications (such as order / transaction system) are collected separately using Sidecar.

Practice 2-large clusters

For large / super-large clusters used as a PaaS platform, there are generally more than 1,000 applications, more than 1,000 nodes, and dedicated Kubernetes platform operations staff. The number of applications in this scenario is unbounded, which DaemonSet cannot support, so Sidecar must be used. The overall plan is as follows:

The system component logs and kernel logs of the Kubernetes platform are relatively fixed. These logs are collected by DaemonSet, which mainly provides services for the operation and maintenance personnel of the platform.

The logs of each service are collected by Sidecar, and the collection destination address of Sidecar can be set independently for each service, which provides enough flexibility for the DevOps personnel of the business.

That concludes this look at best practices for K8s log collection. I hope the content above is helpful and helps you learn more. If you found the article useful, feel free to share it so more people can see it.
