Typical problems in building a K8s logging system

Updated: 2025-02-24. Source: SLTechnology News&Howtos (Servers)

Shulou (Shulou.com), 05/31

This article looks at the typical problems that come up when building a logging system on Kubernetes (K8s), analyzed from a practitioner's perspective.

Why we need a logging system

Locating a problem in production generally follows three steps: detect the problem through metrics, identify the faulty module through traces, and find the root cause in that module's logs. Logs contain errors, key variables, code execution paths and other details that are central to troubleshooting, so they are an unavoidable step in diagnosing production problems.
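The metric, trace, log drill-down described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Span` and `LogLine` record types, the `diagnose` function and the 10% error-rate threshold are all hypothetical names and values chosen for this example, not part of any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    module: str
    failed: bool


@dataclass
class LogLine:
    trace_id: str
    module: str
    level: str
    message: str


def error_rate(spans):
    """Metric step: fraction of spans that failed (0.0 if there are none)."""
    return sum(s.failed for s in spans) / len(spans) if spans else 0.0


def failing_modules(spans, trace_id):
    """Trace step: modules that reported a failure within one trace."""
    return sorted({s.module for s in spans if s.trace_id == trace_id and s.failed})


def cause_logs(logs, trace_id, modules):
    """Log step: ERROR lines from the suspect modules for that trace."""
    return [
        line for line in logs
        if line.trace_id == trace_id
        and line.module in modules
        and line.level == "ERROR"
    ]


def diagnose(spans, logs, threshold=0.1):
    """Walk metric -> trace -> log; return {} if the metric looks healthy."""
    if error_rate(spans) <= threshold:
        return {}
    findings = {}
    for trace_id in sorted({s.trace_id for s in spans if s.failed}):
        modules = failing_modules(spans, trace_id)
        findings[trace_id] = cause_logs(logs, trace_id, modules)
    return findings
```

The sketch only works because every log line carries the same `trace_id` as its spans; without that shared key, the last step falls back to searching logs by time window and module name.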

Over the past decade at Alibaba, the logging system has evolved along with computing paradigms, in roughly three main stages:

In the standalone era, almost all applications were deployed on a single machine, and when load increased the only option was to switch to higher-specification IBM minicomputers. Logs were part of the application system and were mainly used for debugging, usually analyzed with grep and other common Linux text commands;

As standalone systems became the bottleneck for Alibaba's business growth, the Feitian project was started to achieve true scale-out, and the Feitian 5K project officially launched in 2013. At this stage, business systems were rebuilt as distributed systems, and calls between services changed from local to distributed. To better manage, debug and analyze distributed applications, we developed the Trace (distributed tracing) system and various monitoring systems. What these systems have in common is that they store all logs (including metrics) centrally;

To support faster development and iteration, in recent years we have started containerizing our applications, embracing the Kubernetes ecosystem, moving fully to the cloud and adopting Serverless. At this stage, logs have grown explosively in both volume and variety, and the demand for digital and intelligent log analysis keeps rising, so a unified logging platform has emerged.

The ultimate meaning of observability

In the CNCF's view, the main role of observability is problem diagnosis. At the level of a whole company, however, observability covers not only DevOps but also business, operations, BI, audit, security and other areas. Its ultimate goal is to bring digitalization and intelligence to every part of the company.

At Alibaba, almost every business role deals with some kind of log data. To support these scenarios, we have built many tools and capabilities: real-time log analysis, distributed tracing, monitoring, data processing, stream computing, offline computing, BI systems, audit systems and more. The logging system itself focuses on real-time data collection, cleaning, intelligent analysis and monitoring, and on integrating with various stream computing and offline systems.

Difficulties in building a logging system on Kubernetes

Simple logging solutions are plentiful and relatively mature, so we will not go into them here. This article only discusses building a logging system on Kubernetes, which differs greatly from earlier logging schemes designed for physical machines and virtual machines. For example:

Log forms become more complex. Besides logs on physical or virtual machines, we also need to collect container standard output, files inside containers, container events, Kubernetes events and more;

The environment is more dynamic. In Kubernetes, node failures, nodes going offline or online, Pod destruction, and scaling out or in are all normal. Logs are therefore transient (for example, a Pod's logs are no longer visible after the Pod is destroyed), so log data must be collected to the server side in real time, and log collection must be able to adapt to this highly dynamic environment;

There are more types of logs. The figure above shows a typical Kubernetes architecture: a request from the client passes through multiple components such as CDN, Ingress, Service Mesh and Pod, and involves a variety of infrastructure. Many new log types appear, such as logs of the various K8s system components, audit logs, Service Mesh logs and Ingress logs;

The business architecture changes. More and more companies are adopting microservice architectures on Kubernetes. In a microservice system, service development is more complex, and services depend more and more on each other and on underlying products. Troubleshooting becomes more complex as a result, and correlating logs across all these dimensions is a hard problem;

Logging solutions are hard to integrate. We usually build a CI/CD system on Kubernetes that integrates and deploys services as automatically as possible. Log collection, storage and cleaning also need to be integrated into this system and should stay as consistent as possible with K8s's declarative deployment style. Existing logging systems, however, are usually fairly independent, and integrating them into CI/CD is costly;

Log scale. In the early stage of a system, teams usually choose to build their own logging system from open source components. This works well during testing and validation, or while the company is still small. But as the business grows and log volume reaches a certain scale, a self-built open source system often runs into problems such as tenant isolation, query latency, data reliability and system availability. Although the logging system is not on the most critical path in IT, these problems have a severe impact when they occur at a critical moment. For example, during an emergency, several engineers query the logging system concurrently while troubleshooting; if the system cannot handle that load, failure recovery takes longer.
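To make the first two difficulties in the list more concrete, here is a minimal sketch of what a collector has to do with container standard output. It assumes the node uses a CRI runtime that writes lines in the CRI logging format (`<timestamp> <stream> <P|F> <message>`) under the kubelet's `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log` layout. The `LogRecord` type and both function names are hypothetical, and real collectors also handle file rotation, checkpoints and interleaved streams, which this sketch ignores.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class LogRecord:
    namespace: str
    pod: str
    container: str
    timestamp: str
    stream: str
    message: str


def parse_pod_log_path(path):
    """Recover namespace, Pod name and container name from the file path."""
    parts = PurePosixPath(path).parts
    pod_dir, container = parts[-3], parts[-2]
    # Directory name is <namespace>_<pod>_<uid>; Kubernetes names cannot
    # contain underscores, so splitting on "_" is unambiguous.
    namespace, pod, _uid = pod_dir.split("_", 2)
    return namespace, pod, container


def parse_cri_lines(path, lines):
    """Turn raw CRI log lines into records tagged with Pod metadata."""
    namespace, pod, container = parse_pod_log_path(path)
    records = []
    pending = ""  # accumulates fragments of a line split by the runtime
    for raw in lines:
        fields = raw.rstrip("\n").split(" ", 3)
        if len(fields) < 3:
            continue  # not a CRI-formatted line; skipped in this sketch
        timestamp, stream, tag = fields[:3]
        pending += fields[3] if len(fields) == 4 else ""
        if tag == "P":
            continue  # partial line: wait for the closing "F" fragment
        # Simplification: the record keeps the timestamp of its last fragment.
        records.append(
            LogRecord(namespace, pod, container, timestamp, stream, pending)
        )
        pending = ""
    return records
```

Because these files live on the node and are removed along with the Pod, a collector built this way has to read and ship records continuously; attaching the namespace, Pod and container names at collection time is what makes the logs searchable after the Pod is gone.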

These are the typical problems in building a logging system on Kubernetes. If you are facing similar questions, the analysis above may serve as a reference.
