
Detailed description of logs in Kubernetes


The importance of logs in Kubernetes

Usually, the most basic function of a log is to record the execution trajectory of a program, and from that record many capabilities are derived, such as online monitoring, alerting, operational analysis, and security analysis (for details, see the first article in this series, "6 typical problems in building a K8s log system: how many have you encountered?"). These capabilities in turn place requirements on the logs themselves, so we need to standardize logs as much as possible to reduce the cost of collection, parsing and analysis.

In Kubernetes, the environment is highly dynamic and logs are essentially ephemeral, so they must be collected into central storage in real time. To support this collection, there are additional requirements on how logs are output and gathered.

Below we list the common considerations for log output in Kubernetes (items marked with (*) are specific to Kubernetes):

- How to select the log level
- Log content specification
- Reasonably control log output
- Select multiple log output targets
- Control log performance consumption
- How to select a log library
- Log form selection (*)
- Whether logs are written to disk, and on what media (*)
- How to ensure the log storage cycle (*)

How to select the log level

The log level describes the severity of the event recorded by a log entry, and it is a field that every log must have. Logs are usually divided into six levels:

- FATAL: very serious or unanticipated errors that should trigger an immediate alert and be handled manually;
- ERROR: unexpected errors that may cause some system anomalies but will not affect the core business or the normal operation of the system;
- WARN: potentially dangerous or noteworthy information (relative to the core path);
- INFO: details of application execution, through which you can see the main execution path of each request;
- DEBUG: log information for offline debugging, used to analyze application logic; should not be enabled for online applications;
- TRACE: the most detailed execution trace, which may include the actual data involved.
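
As a rough illustration of how leveled, structured logging looks in code, here is a minimal Go sketch using the standard library's log/slog package (an assumption on my part; the article does not prescribe a specific library, and slog only defines four named levels rather than six). It also shows how the minimum level can be changed at runtime, which becomes relevant in the advice below:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// A LevelVar lets the minimum level be changed while the program runs,
	// e.g. temporarily enabling DEBUG when a problem needs to be analyzed.
	var lvl slog.LevelVar
	lvl.Set(slog.LevelInfo) // typical online default: INFO and above

	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: &lvl}))

	logger.Debug("not printed at the INFO level")
	logger.Info("request handled", "path", "/api/v1/items", "status", 200)
	logger.Warn("cache miss ratio is high", "ratio", 0.42)
	logger.Error("failed to call external system", "err", "connection refused")

	// Temporarily lower the threshold, e.g. while observing a new feature.
	lvl.Set(slog.LevelDebug)
	logger.Debug("now visible after the level change")
}
```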

As a programmer, you must set log levels sensibly. I have summed up the following experience during development:

- FATAL logs must only be printed for very serious errors and for scenarios that require manual handling;
- The distinction between ERROR and WARNING is hard for many programmers to make; it can be approached from the alerting perspective: ERROR generally requires an alert, WARNING does not;
- The log level indicates severity on the one hand and controls the application's log output on the other. Usually only INFO and WARN logs are enabled online, while DEBUG logs can be printed more liberally to make problem analysis easier;
- All user request logs must be recorded;
- For calls to external systems whose behavior is uncertain, log as comprehensively as possible;
- The log library used by the program needs to support changing the log level at runtime, so that the level can be changed temporarily when a problem needs to be analyzed;
- When a new feature is launched, the level of the related logs can be raised appropriately to make real-time observation and monitoring easier, and lowered back to normal after things stabilize (remember to add a comment so it is easy to change back).

Log content specification

Without any constraints, programmers write logs however they like, and all kinds of log content appear; logs that only their authors can understand are hard to analyze and alert on. So a project needs a top-down log specification to constrain developers, so that all logs look as if they were printed by one person and are easy to analyze.

Fields of the log

The fields that are usually required in a log are: Time, Level, Location. For a specific module / flow / business, some common fields are also required, for example:

- If a tracing system is used, the TraceID can be attached to the log;
- A fixed flow needs its corresponding fields attached, for example the order number and user ID throughout an order's life cycle; these can be attached to the flow's logger instance through a Context;
- HTTP requests need to record: URL, Method, Status, Latency, Inflow, OutFlow, ClientIP, UserAgent, etc.; for details, see the Nginx log format;
- If logs from multiple modules are printed to the same stream / file, there must be a field identifying the module name (see the sketch below).
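
To make these common fields concrete, here is a hedged Go sketch (again assuming log/slog; field names such as trace_id, order_id and module are illustrative, not a fixed standard) that attaches flow-level fields once, so every subsequent log line in that flow carries them:

```go
package main

import (
	"log/slog"
	"os"
	"time"
)

// accessLog mirrors the HTTP fields listed above; the exact key names are
// illustrative and should follow your own platform's specification.
func accessLog(logger *slog.Logger, method, url string, status int, latency time.Duration,
	inflow, outflow int64, clientIP, userAgent string) {
	logger.Info("access",
		"method", method, "url", url, "status", status,
		"latency_ms", latency.Milliseconds(),
		"inflow", inflow, "outflow", outflow,
		"client_ip", clientIP, "user_agent", userAgent,
	)
}

func main() {
	base := slog.New(slog.NewJSONHandler(os.Stdout, nil)).
		With("module", "order-service") // identifies the module when streams are shared

	// Fields that follow one business flow (e.g. an order's life cycle)
	// are attached once and inherited by every log line below.
	reqLogger := base.With("trace_id", "9f86d081", "order_id", "20191230-0001", "user_id", "u-1024")

	reqLogger.Info("order created")
	accessLog(reqLogger, "POST", "/orders", 200, 35*time.Millisecond, 512, 1024, "10.0.0.12", "curl/7.64")
}
```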

The log field specification is best driven top-down by the operations / middleware platform, constraining the programmers of each module / flow to print logs according to the rules.

Log format

Generally speaking, we recommend logs in the form of key=value pairs; for example, our Alibaba Apsara (Feitian) log library uses this form:

[2019-12-30 21:45:30.611992] [WARNING] [958] [block_writer.cpp:671] path:pangu://localcluster/index/3/prom/7/1577711464522767696_0_1577711517 min_time:1577712000000000 max_time:1577715600000000 normal_count:27595 config:prom start_line:57315569 end_line:57343195 latency(ms):42 type:AddBlock

Logs made of key=value pairs are fully self-describing and easy to read, and they are also convenient for automatic parsing during collection.
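
A minimal sketch of producing key=value (logfmt-style) entries in Go, assuming the go-kit log package that this article recommends later for Golang; the specific fields are illustrative:

```go
package main

import (
	"os"

	"github.com/go-kit/log"
)

func main() {
	// Logfmt output: space-separated key=value pairs, one entry per line.
	logger := log.NewLogfmtLogger(os.Stdout)
	logger = log.With(logger,
		"ts", log.DefaultTimestampUTC, // timestamp field
		"caller", log.DefaultCaller,   // file:line, i.e. the Location field
	)

	// Example output (values are illustrative):
	// ts=2019-12-30T13:45:30Z caller=main.go:21 level=WARNING msg=AddBlock latency_ms=42 config=prom
	logger.Log("level", "WARNING", "msg", "AddBlock", "latency_ms", 42, "config", "prom")
}
```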

JSON is another recommended log format; many log libraries can output JSON directly, and most log collection Agents support collecting and parsing JSON logs.

{"addr": "tcp://0.0.0.0:10010", "caller": "main.go:98", "err": "listen tcp: address tcp://0.0.0.0:10010: too many colons in address", "level": "error", "msg": "Failed to listen", "ts": "2019-03-08T10:02:47.469421Z"}

Note: formats that are not human-readable (such as ProtoBuf, Binlog, etc.) are not recommended in most scenarios.

The problem of multi-line log entries

Unless it is really necessary, try not to spread one log entry across multiple lines; multi-line entries are expensive to collect, parse and index.
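
As a small, hedged illustration of one way to keep entries on a single line: a JSON handler escapes embedded newlines, so even a multi-line payload such as a stack trace is emitted as one log line (the sketch again assumes log/slog):

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// The stack trace contains newlines, but the JSON handler escapes them
	// as \n, so the whole record is still emitted as a single line.
	stack := "goroutine 1 [running]:\nmain.work()\n\t/app/main.go:42"
	logger.Error("panic recovered", "stack", stack)
}
```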

Reasonable control of log output

The amount of log output directly affects disk usage and the application's performance overhead. Too many logs are hard to view, collect and analyze; too few logs make monitoring difficult and leave nothing to investigate when problems occur.

In general, online applications need to reasonably control the amount of log data:

- Unless there is a special reason, the request and response logs at the service entry should all be output and collected, and the collected fields can be adjusted according to demand;
- Error logs should generally all be printed; if there are too many, sampling can be used;
- Reduce useless log output, and in particular minimize log printing inside loops;
- Request logs (such as Ingress and Nginx access logs) should generally not exceed 5 MB/s (about 500 bytes each, no more than 10,000 entries per second), and application logs should not exceed 200 KB/s (about 2 KB each, no more than 100 entries per second).

Select multiple log output targets

It is recommended to output different types of logs from an application to different targets (files), which makes it easier to collect, view and monitor them by category. For example:

- Put access logs in a separate file; if there are not many domains, one file per domain also works;
- Put error-class logs in a separate file, and configure monitoring and alerting on it separately;
- Put logs of calls to external systems in a separate file, to make later reconciliation and auditing easier;
- Middleware is usually provided by a unified platform, and its logs are usually printed to a separate file.

Control log performance consumption

As an auxiliary module of the business system, logging must never affect the normal operation of the business, so the performance overhead of the logging module deserves separate attention. Generally, when selecting or developing a log library, you need to benchmark it and ensure that logging consumes no more than 5% of overall CPU.

Note: make sure that log printing is asynchronous and does not block the operation of the business system.
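
A minimal sketch of the asynchronous pattern described in this note, using only the Go standard library (the buffer size and the drop-on-full policy are illustrative choices, not a prescription): log entries are handed to a buffered channel and written by a background goroutine, so a slow writer cannot block the business path.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// asyncLogger is an illustrative sketch, not a production library.
type asyncLogger struct {
	ch chan string
	wg sync.WaitGroup
}

func newAsyncLogger(buffer int) *asyncLogger {
	l := &asyncLogger{ch: make(chan string, buffer)}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		for line := range l.ch {
			fmt.Fprintln(os.Stdout, line) // the actual write happens off the request path
		}
	}()
	return l
}

// Log never blocks: if the buffer is full, the entry is dropped
// (a real library would also count dropped entries).
func (l *asyncLogger) Log(line string) {
	select {
	case l.ch <- line:
	default:
	}
}

func (l *asyncLogger) Close() {
	close(l.ch)
	l.wg.Wait()
}

func main() {
	logger := newAsyncLogger(1024)
	defer logger.Close()
	logger.Log(`{"level":"info","msg":"request handled"}`)
}
```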

How to select a log library

There are many open-source log libraries; almost every language has dozens of them. Choosing one that fits the company / business requires care. A simple guideline is to use a stable version of a popular log library, which makes it much less likely that you will run into pitfalls. For example:

- Java: use Log4J or LogBack;
- Golang: use go-kit;
- Python: the built-in logging library covers most scenarios; it is recommended to read the logging Cookbook in the official documentation;
- C++: spdlog is recommended; it is high-performance and cross-platform.

Log form selection

In virtual machine / physical machine scenarios, almost all applications output logs to files (only a few system applications write to syslog / journal). In container scenarios, there is one more option: standard output. If an application writes its logs to stdout or stderr, they automatically go to Docker's logging module and can be viewed directly with docker logs or kubectl logs.

A container's standard output is only suitable for relatively simple applications, such as some system components in K8s. Online service applications usually involve multiple layers (middleware) and interact with various services, and their logs generally fall into several categories; if all of them are printed to the container's standard output, they become hard to distinguish and process. At the same time, writing to the container's standard output consumes DockerEngine performance: in our tests, a log volume of 100,000 entries per second cost DockerEngine an extra CPU core (100% of one core).
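
A hedged sketch of the split discussed above (the file names and the use of log/slog are my assumptions): per-category logs go to separate files, while only coarse status messages go to the container's standard output.

```go
package main

import (
	"log/slog"
	"os"
)

func mustOpen(path string) *os.File {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		panic(err)
	}
	return f
}

func main() {
	// Separate files per log category, so they can be collected, monitored and
	// retained independently. In a container these would typically live under a
	// volume-mounted directory (paths here are illustrative, relative paths).
	accessLog := slog.New(slog.NewJSONHandler(mustOpen("access.log"), nil))
	errorLog := slog.New(slog.NewJSONHandler(mustOpen("error.log"), nil))

	// Only coarse status messages go to the container's standard output.
	stdoutLog := slog.New(slog.NewTextHandler(os.Stdout, nil))

	stdoutLog.Info("service started", "port", 10010)
	accessLog.Info("access", "method", "GET", "url", "/healthz", "status", 200)
	errorLog.Error("call to external system failed", "system", "payment", "err", "timeout")
}
```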

Whether logs are written to disk, and on what media

In Kubernetes, you can also connect the log library directly to the log system: logs are sent straight to the log system's backend as they are printed, without ever touching the disk. This skips local file storage and Agent collection entirely, so overall performance is much higher.

We generally recommend this approach only for scenarios with very large log volumes; in general, writing to disk is preferred. Compared with sending logs directly to the backend, writing to disk adds a layer of file cache, which can buffer a certain amount of data during a network failure; and when the log system is unavailable, our development and operations colleagues can still read the log files directly, which improves overall reliability.
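
A very rough sketch of the "send straight to the backend, no local file" approach; the endpoint URL, batch size and flush interval are hypothetical, and in practice a real log system SDK would provide this:

```go
package main

import (
	"bytes"
	"net/http"
	"time"
)

// shipper batches log lines in memory and POSTs them to a log backend,
// skipping the local file and the collection Agent entirely.
type shipper struct {
	endpoint string
	ch       chan []byte
}

func newShipper(endpoint string) *shipper {
	s := &shipper{endpoint: endpoint, ch: make(chan []byte, 4096)}
	go s.loop()
	return s
}

func (s *shipper) loop() {
	ticker := time.NewTicker(time.Second) // flush interval (illustrative)
	defer ticker.Stop()
	var batch bytes.Buffer
	flush := func() {
		if batch.Len() == 0 {
			return
		}
		// No file cache here: if the network or backend is down, this batch
		// is lost, which is why writing to disk is usually preferred.
		resp, err := http.Post(s.endpoint, "application/x-ndjson", bytes.NewReader(batch.Bytes()))
		if err == nil {
			resp.Body.Close()
		}
		batch.Reset()
	}
	for {
		select {
		case line := <-s.ch:
			batch.Write(line)
			batch.WriteByte('\n')
		case <-ticker.C:
			flush()
		}
	}
}

// Log never blocks: if the buffer is full, the entry is dropped.
func (s *shipper) Log(line []byte) {
	select {
	case s.ch <- line:
	default:
	}
}

func main() {
	// "http://log-backend.example:8080/ingest" is a placeholder endpoint.
	s := newShipper("http://log-backend.example:8080/ingest")
	s.Log([]byte(`{"level":"info","msg":"hello"}`))
	time.Sleep(2 * time.Second) // give the background flush a chance to run
}
```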

Kubernetes provides a variety of storage options, such as local storage, remote file storage, and object storage. Because log writes have a very high QPS and are directly tied to the application, using remote storage adds an extra 2-3x of network communication overhead. It is generally recommended to use local storage, either HostVolume or EmptyDir, so that the performance impact on writing and collection is as small as possible.

How to ensure the storage cycle of logs

Compared with traditional virtual machine / physical machine scenarios, Kubernetes provides powerful scheduling, fault tolerance, and scaling capabilities at the node and application layers, which make it easy to run applications with high reliability and extreme elasticity. A side effect of these advantages is that nodes and containers are created and deleted dynamically, so logs can be destroyed at any time, and there is no guarantee that the log retention period will satisfy DevOps, audit, and other requirements.

In such a dynamic environment, long-term log retention can only be achieved through centralized log storage. With real-time collection, the logs of every node and container are gathered into the central log system within seconds, so even if a node or container dies, the scene can still be reconstructed from its logs.

Summary

Log output is a very important part of building a log system. The company / product line must follow a unified log specification so that subsequent log collection, analysis, monitoring and visualization can proceed smoothly.
