How to use KubeEye, an automatic inspection tool for Kubernetes clusters in KubeSphere


2025-07-15 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 Report

This article explains how to use KubeEye, an automatic inspection tool for Kubernetes clusters in KubeSphere. Many people may not be familiar with it, so the following summary covers why it exists, what it checks, and how to run it.

Why open source KubeEye?

Kubernetes is the de facto standard for container orchestration, and its architecture is elegant and powerful. Even so, day-to-day operation always turns up problems, some of them hidden, that give cluster administrators and YAML engineers a real headache:

Infrastructure daemon problems: NTP service outage

Hardware problems: CPU, memory, or disk failures

Kernel problems: kernel deadlock, file system corruption

Container runtime problems: the runtime daemon stops responding

There are many more problems of this kind. Because they are invisible to the cluster's control plane, Kubernetes keeps scheduling Pods onto the faulty nodes, which puts the security and stability of the cluster and of the applications running on it at risk.

What is KubeEye?

KubeEye is an open source automatic cluster inspection tool designed to find a variety of problems on Kubernetes, such as application configuration errors, unhealthy cluster components and node problems. KubeEye is developed using the Go language based on Polaris and Node-Problem-Detector, with a series of anomaly detection rules built into it. In addition to predefined rules, it also supports custom rules.

What can KubeEye do?

Detect problems with the control plane of a Kubernetes cluster, including kube-apiserver, kube-controller-manager, etcd, and so on.

Detect various node problems in Kubernetes, including memory, CPU, and disk pressure, unexpected kernel error logs, and so on.

Validate your workload YAML specifications against industry best practices to help keep your cluster stable.

Architecture diagram

KubeEye obtains cluster diagnostic data in three ways: by calling the Kubernetes API, by matching key error messages in logs against regular expressions, and by matching container specifications against rules. For more information, see the architecture diagram.

Built-in check items

Each line below reads: status (√ = implemented), check item: description.

√ ETCDHealthStatus: whether etcd is up and running
√ ControllerManagerHealthStatus: whether the Kubernetes kube-controller-manager is up and running
√ SchedulerHealthStatus: whether the Kubernetes kube-scheduler is up and running
√ NodeMemory: whether node memory usage exceeds the threshold
√ DockerHealthStatus: whether Docker is running normally
√ NodeDisk: whether node disk usage exceeds the threshold
√ KubeletHealthStatus: whether kubelet is active and running normally
√ NodeCPU: whether node CPU usage exceeds the threshold
√ NodeCorruptOverlay2: Overlay2 is unavailable
√ NodeKernelNULLPointer: the node shows NotReady
√ NodeDeadlock: a deadlock, in which two or more processes wait for each other while competing for resources
√ NodeOOM: monitors processes that consume too much memory, especially those that consume a lot of memory very quickly; the kernel kills them to prevent them from exhausting memory
√ NodeExt4Error: Ext4 mount failed
√ NodeTaskHung: checks whether any process has been in state D for longer than 120s
√ NodeUnregisterNetDevice: check the corresponding network
√ NodeCorruptDockerImage: check the Docker image
√ NodeAUFSUmountHung: check the storage
√ NodeDockerHung: Docker hangs; check the Docker log
√ PodSetLivenessProbe: whether a livenessProbe is set for each container in the Pod
√ PodSetTagNotSpecified: the image address declares no tag, or the tag is latest
√ PodSetRunAsPrivileged: running a Pod in privileged mode means the Pod can access the host's resources and kernel capabilities
√ PodSetImagePullBackOff: the Pod cannot pull the image correctly; you can pull the image manually on the corresponding node
√ PodSetImageRegistry: checks whether the image comes from the corresponding registry
√ PodSetCpuLimitsMissing: no CPU resource limit is declared
√ PodNoSuchFileOrDirectory: enter the container to see whether the corresponding file exists
√ PodIOError: usually caused by a file IO performance bottleneck
√ PodNoSuchDeviceOrAddress: check the corresponding network
√ PodInvalidArgument: check the corresponding storage
√ PodDeviceOrResourceBusy: check the corresponding directory and PID
√ PodFileExists: check for existing files
√ PodTooManyOpenFiles: the number of files or socket connections opened by the program exceeds the system limit
√ PodNoSpaceLeftOnDevice: check disk and inode usage
√ NodeApiServerExpiredPeriod: checks whether the ApiServer certificate expires in less than 30 days
√ PodSetCpuRequestsMissing: no CPU resource request is declared
√ PodSetHostIPCSet: host IPC is set
√ PodSetHostNetworkSet: host network is set
√ PodHostPIDSet: host PID is set
√ PodMemoryRequestsMiss: no memory resource request is declared
√ PodSetHostPort: a host port is set
√ PodSetMemoryLimitsMissing: no memory resource limit is declared
√ PodNotReadOnlyRootFiles: the file system is not set to read-only
√ PodSetPullPolicyNotAlways: the image pull policy is not Always
√ PodSetRunAsRootAllowed: runs as the root user
√ PodDangerousCapabilities: dangerous options are present in capabilities, such as ALL, SYS_ADMIN, or NET_ADMIN
√ PodlivenessProbeMissing: ReadinessProbe is not declared
√ privilegeEscalationAllowed: privilege escalation is allowed
NodeNotReadyAndUseOfClosedNetworkConnection: http2-max-streams-per-connection
NodeNotReady: failed to start ContainerManager; cannot set property TasksAccounting, or unknown property

Note: unmarked items are under development.
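To make one of these checks concrete, here is a minimal Python sketch of the idea behind PodSetTagNotSpecified (no tag declared, or the tag is latest). It is not KubeEye's implementation, which is written in Go; the function name tag_not_specified and the treatment of digest-pinned images are assumptions made for illustration.

```python
def tag_not_specified(image: str) -> bool:
    """Return True when an image reference has no tag or uses the 'latest' tag."""
    # Assumption: a digest-pinned reference (name@sha256:...) counts as specified.
    if "@" in image:
        return False
    # Only the last path segment can carry a tag; a colon earlier in the
    # reference belongs to a registry port such as registry.local:5000.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True
    tag = last_segment.rsplit(":", 1)[1]
    return tag in ("", "latest")


if __name__ == "__main__":
    for ref in ("nginx", "nginx:latest", "nginx:1.19", "registry.local:5000/nginx"):
        print(ref, tag_not_specified(ref))
```

Splitting on the last path segment first is the design point here: a naive search for a colon would mistake a registry port for a tag.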

How to use it

Install KubeEye on your machine

Download the pre-built executable file from Releases.

Or build it from the source code:

    git clone https://github.com/kubesphere/kubeeye.git
    cd kubeeye
    make install

[Optional] Install Node-problem-Detector

Note: this command installs npd on your cluster. It is only needed if you want a detailed report.

    ke install npd

Run KubeEye to perform an automatic inspection:

    root@node1:# ke diag
    NODENAME  SEVERITY  HEARTBEATTIME              REASON             MESSAGE
    node18    Fatal     2020-11-19T10:32:03+08:00  NodeStatusUnknown  Kubelet stopped posting node status.
    node19    Fatal     2020-11-19T10:31:37+08:00  NodeStatusUnknown  Kubelet stopped posting node status.
    node2     Fatal     2020-11-19T10:31:14+08:00  NodeStatusUnknown  Kubelet stopped posting node status.
    node3     Fatal     2020-11-27T17:36:53+08:00  KubeletNotReady    Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

    NAME       SEVERITY  TIME                       MESSAGE
    scheduler  Fatal     2020-11-27T17:09:59+08:00  Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
    etcd-0     Fatal     2020-11-27T17:56:37+08:00  Get https://192.168.13.8:2379/health: dial tcp 192.168.13.8:2379: connection refused

    NAMESPACE       SEVERITY  PODNAME                                 EVENTTIME                  REASON               MESSAGE
    default         Warning   node3.164b53d23ea79fc7                  2020-11-27T17:37:34+08:00  ContainerGCFailed    rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
    default         Warning   node3.164b553ca5740aae                  2020-11-27T18:03:31+08:00  FreeDiskSpaceFailed  failed to garbage collect required amount of images. Wanted to free 5399374233 bytes, but freed 416077545 bytes
    default         Warning   nginx-b8ffcf679-q4n9v.16491643e6b68cd7  20-11-27T17:09:24+08:00    Failed               Error: ImagePullBackOff
    default         Warning   node3.164b5861e041a60e                  20-11-27T19:01:09+08:00    SystemOOM            SystemOOM encountered, victim process: stress pid: 16713
    default         Warning   node3.164b58660f8d4590                  2020-11-27T19:01:27+08:00  OOMKilling           Out of memory: Kill process 16711 (stress) score 205 or sacrifice child Killed process 16711 (stress), UID 0, total-vm:826516kB, anon-rss:819296kB, file-rss:0kB shmem-rss:0kB
    insights-agent  Warning   workloads-1606467120.164b519ca8c67416   2020-11-27T16:57:05+08:00  DeadlineExceeded     Job was active longer than specified deadline
    kube-system     Warning   calico-node-zvl9t.164b3dc50580845d      2020-11-27T17:09:35+08:00  DNSConfigForming     Nameserver limits were exceeded, some nameservers have been omitted The applied nameserver line is: 100.64.11.3 114.114.114 119.29.29.29
    kube-system     Warning   kube-proxy-4bnn7.164b3dc4f4c4125d       2020-11-27T17:09:09+08:00  DNSConfigForming     Nameserver limits were exceeded, some nameservers have been omitted The applied nameserver line is: 100.64.11.3 114.114.114 119.29.29.29
    kube-system     Warning   nodelocaldns-2zbhh.164b3dc4f42d358b     2020-11-27T17:09:14+08:00  DNSConfigForming     Nameserver limits were exceeded, some nameservers have been omitted The applied nameserver line is: 100.64.11.3 114.114.114 119.29.29.29

    NAMESPACE       SEVERITY  NAME                     KIND        TIME                       MESSAGE
    kube-system     Warning   node-problem-detector    DaemonSet   2020-11-27T17:09:59+08:00  [livenessProbeMissing runAsPrivileged]
    kube-system     Warning   calico-node              DaemonSet   2020-11-27T17:
    default         Warning   nginx                    Deployment  2020-11-27T17:09:59+08:00  [cpuLimitsMissing livenessProbeMissing tagNotSpecified]
    insights-agent  Warning   workloads                CronJob     2020-11-27T17:09:59+08:00  [livenessProbeMissing]
    insights-agent  Warning   cronjob-executor         Job         2020-11-27T17:09:59+08:00  [livenessProbeMissing]
    kube-system     Warning   calico-kube-controllers  Deployment  2020-11-27T17:09:59+08:00  [cpuLimitsMissing livenessProbeMissing]
    kube-system     Warning   coredns                  Deployment  2020-11-27T17:09:59+08:00  [cpuLimitsMissing]
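A report like this gets long on a real cluster, so it can help to summarize it in a script. The following Python sketch parses one section of the report and counts rows per column value. It assumes the report is whitespace-separated with a header row and a free-text final column, as in the sample above; parse_section and count_by are hypothetical helper names, not part of KubeEye.

```python
from collections import Counter

# Three rows copied from the node section of the report above.
SAMPLE = """\
NODENAME SEVERITY HEARTBEATTIME REASON MESSAGE
node18 Fatal 2020-11-19T10:32:03+08:00 NodeStatusUnknown Kubelet stopped posting node status.
node19 Fatal 2020-11-19T10:31:37+08:00 NodeStatusUnknown Kubelet stopped posting node status.
node2 Fatal 2020-11-19T10:31:14+08:00 NodeStatusUnknown Kubelet stopped posting node status.
"""


def parse_section(text):
    """Turn one header-plus-rows section into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Limit the split so the free-text last column (MESSAGE) stays in one piece.
        cells = ln.split(None, len(header) - 1)
        rows.append(dict(zip(header, cells)))
    return rows


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    return Counter(r[column] for r in rows if column in r)


if __name__ == "__main__":
    rows = parse_section(SAMPLE)
    print(count_by(rows, "SEVERITY"))
    print(count_by(rows, "REASON"))
```

Each section of the report has its own header, so a full report would need to be split on blank lines or header rows before calling parse_section on each part.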

You can refer to the FAQ to optimize your cluster.

Add a custom inspection rule

In addition to the above preset inspection items and rules, KubeEye also supports custom inspection rules. Let's take an example:

Add npd custom inspection rules

Install NPD with the following command:

    ke install npd

Edit the ConfigMap kube-system/node-problem-detector-config with kubectl:

    kubectl edit cm -n kube-system node-problem-detector-config

Add the abnormal log patterns under the rules of the ConfigMap. The rules follow regular expression syntax.
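Before pasting a new pattern into the ConfigMap, it can be worth checking it against a real log line. The Python sketch below does that for a hypothetical rule. The field names type, reason, and pattern are an assumption about the rule shape used by node-problem-detector's log monitor configuration and should be verified against your installed config; Python's re module is also not the same regex engine as the Go one NPD uses, so treat a match here as a sanity check rather than a guarantee.

```python
import json
import re

# Hypothetical custom rule. The keys are assumed to mirror the entries
# already present under "rules" in the ConfigMap; verify before use.
custom_rule = {
    "type": "temporary",
    "reason": "OOMKilling",
    "pattern": r"Kill process \d+ \(.+\) score \d+ or sacrifice child.*",
}


def rule_matches(rule, log_line):
    """Return True if the rule's regular expression is found in the log line."""
    return re.search(rule["pattern"], log_line) is not None


if __name__ == "__main__":
    # Log line taken from the OOMKilling event in the ke diag output above.
    sample = "Out of memory: Kill process 16711 (stress) score 205 or sacrifice child"
    print(rule_matches(custom_rule, sample))
    # json.dumps shows how the backslashes must be escaped in a JSON config.
    print(json.dumps(custom_rule, indent=2))
```

Printing the rule through json.dumps is deliberate: backslashes in the pattern have to be doubled in JSON, which is an easy mistake when editing the ConfigMap by hand.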

Customize best practice rules

Prepare a rule YAML file. For example, the following rule validates your Pod specification to ensure that images come only from an authorized registry.

    checks:
      imageFromUnauthorizedRegistry: warning

    customChecks:
      imageFromUnauthorizedRegistry:
        promptMessage: When the corresponding rule does not match. Show that image from an unauthorized registry.
        category: Images
        target: Container
        schema:
          '$schema': http://json-schema.org/draft-07/schema
          type: object
          properties:
            image:
              type: string
              not:
                pattern: ^quay.io

Save the above rule as yaml, for example, rule.yaml.
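The schema part of the rule is plain JSON Schema, so its logic can be tried out in isolation. The Python sketch below hand-evaluates only the keywords the rule uses (type, properties, not, pattern) against a container spec; it is not a general JSON Schema validator and not KubeEye code, and satisfies_schema is a hypothetical name. Read literally, the schema is satisfied when image is a string that does not match ^quay.io. How KubeEye maps a satisfied or violated schema to the imageFromUnauthorizedRegistry warning should be confirmed against the tool itself.

```python
import re

# The "schema" section of the rule, transcribed as a Python dict.
SCHEMA = {
    "type": "object",
    "properties": {
        "image": {"type": "string", "not": {"pattern": "^quay.io"}},
    },
}


def satisfies_schema(container, schema=SCHEMA):
    """Evaluate only the keywords used in the rule: type, properties, not, pattern."""
    if not isinstance(container, dict):
        return False
    for name, sub in schema["properties"].items():
        if name not in container:
            continue  # "properties" alone does not make a field required
        value = container[name]
        if sub.get("type") == "string" and not isinstance(value, str):
            return False
        forbidden = sub.get("not", {}).get("pattern")
        # JSON Schema "pattern" is an unanchored search, hence re.search;
        # it only constrains string values.
        if forbidden is not None and isinstance(value, str) and re.search(forbidden, value):
            return False
    return True


if __name__ == "__main__":
    print(satisfies_schema({"image": "nginx:1.19"}))
    print(satisfies_schema({"image": "quay.io/calico/node:v3"}))
```

Note that the dot in ^quay.io is a regex wildcard, so the pattern also matches strings such as quayXio; escaping it as ^quay\.io would be stricter.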

Run KubeEye with rule.yaml:

    root:# ke diag -f rule.yaml --kubeconfig ~/.kube/config
    NAMESPACE    SEVERITY  NAME                     KIND        TIME                       MESSAGE
    default      Warning   nginx                    Deployment  2020-11-27T17:18:31+08:00  [imageFromUnauthorizedRegistry]
    kube-system  Warning   node-problem-detector    DaemonSet   2020-11-27T17:18:31+08:00  [livenessProbeMissing runAsPrivileged]
    kube-system  Warning   calico-node              DaemonSet   2020-11-27T17:18:31+08:00  [cpuLimitsMissing runAsPrivileged]
    kube-system  Warning   calico-kube-controllers  Deployment  2020-11-27T17:18:31+08:00  [cpuLimitsMissing livenessProbeMissing]
    kube-system  Warning   nodelocaldns             DaemonSet   2020-11-27T17:18:31+08:00  [runAsPrivileged cpuLimitsMissing]
    default      Warning   nginx                    Deployment  2020-11-27T17:18:31+08:00  [livenessProbeMissing cpuLimitsMissing]
    kube-system  Warning   coredns                  Deployment  2020-11-27T17:18:31+08:00  [cpuLimitsMissing]

Roadmap

Support more fine-grained inspection items, such as slow cluster response

Support generating cluster inspection reports from inspection results

Support exporting cluster inspection reports to CSV or HTML files

