Default Monitoring key and Chinese interpretation

2026-02-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Alert key | Interpretation | Failure duration

AlertmanagerConfigInconsistent | The instance configurations of the AlertManager cluster xxxx are out of sync. | 5m
AlertmanagerDown | AlertManager has disappeared from Prometheus target discovery. | 15m
AlertmanagerFailedReload | Reloading the AlertManager configuration failed. | 10m
AlertmanagerMembersInconsistent | AlertManager could not find all other members of the cluster. | 5m
CPUThrottlingHigh | CPU throttling is high and CPU limits are low, even though the cluster as a whole still has plenty of idle resources during peak periods. | 15m
etcdGRPCRequestsSlow | etcd GRPC requests are slow. | 10m
etcdHighCommitDurations | etcd commit time is too long. | 10m
etcdHighFsyncDurations | etcd synchronization time is too long; too many GRPC requests. | 10m
etcdHighNumberOfFailedHTTPRequests | etcd has too many failed HTTP requests. | 10m
etcdHighNumberOfFailedProposals | etcd has too many failed proposals. | 15m
etcdHighNumberOfLeaderChanges | etcd has too many leader changes. | 15m
etcdHTTPRequestsSlow | etcd HTTP requests are slow. | 10m
etcdInsufficientMembers | etcd has insufficient members. | 3m
etcdMemberCommunicationSlow | etcd member communication is slow. | 10m
etcdNoLeader | etcd has no leader. | 1m
KubeAPIDown | KubeAPI is down or missing. | 15m
KubeAPIErrorsHigh | The API server is returning errors for requests. | 10m
KubeAPILatencyHigh | The API server's 99th-percentile latency is too high. | 10m
KubeClientCertificateExpiration | The client certificate used to authenticate to the API server expires within 7 days. | 5m
KubeClientErrors | Client errors when connecting to the API. | 15m
KubeControllerManagerDown | KubeControllerManager is down. | 15m
KubeCPUOvercommit | Cluster CPU exceeds the resource limit. | 5m
KubeCronJobRunning | A CronJob has been running for more than 1 hour. | 1h
KubeDaemonSetMisScheduled | DaemonSet scheduling error: pods are not running on the correct machines. | 10m
KubeDaemonSetNotScheduled | DaemonSet pods are not scheduled to run (allocation error). | 10m
KubeDaemonSetRolloutStuck | The DaemonSet is stuck while starting or during a rolling update. | 15m
KubeDeploymentGenerationMismatch | Deployment generation does not match; the deployment failed. | 15m
KubeDeploymentReplicasMismatch | The Deployment has not matched the expected number of replicas for more than an hour. | 1h
KubeJobCompletion | A Job is taking more than an hour to complete. | 1h
KubeJobFailed | A Job failed. | 15m
KubeletTooManyPods | The number of pods is too large and exceeds the limit (110). | 15m
KubeMemOvercommit | Cluster memory resources are overcommitted and can no longer tolerate a node failure. | 5m
KubeNodeNotReady | A node has been failing for more than 1 hour. | 1h
KubePersistentVolumeErrors | Persistent volume exception. | 5m
KubePersistentVolumeFullInFourDays | Based on recent sampling, a volume will fill up within 4 days. | 5m
KubePersistentVolumeUsageCritical | The persistent volume is running out of usable space; only xxx is left. | 1m
KubePodCrashLooping | A pod keeps restarting and has been in the CrashLoopBackOff state for more than 5 minutes. | 1h
KubePodNotReady | A pod has not been ready for more than one hour. | 1h
KubeQuotaExceeded | Kube usage is over quota. | 15m
KubeSchedulerDown | KubeScheduler is down. | 15m
KubeStatefulSetGenerationMismatch | StatefulSet error that has not been rolled back. | 15m
KubeStatefulSetReplicasMismatch | StatefulSet replica count mismatch. | 15m
KubeStatefulSetUpdateNotRolledOut | A StatefulSet update has not finished for more than 15 minutes (update timeout). | 15m
KubeStateMetricsDown | KubeStateMetrics is down. | 15m
KubeVersionMismatch | Kube versions do not match. | 1h
NodeDiskRunningFull | Node disk usage exceeds 85%. | 10m
NodeExporterDown | NodeExporter is down. | 15m
PrometheusConfigReloadFailed | Prometheus failed to reload its configuration. | 10m
PrometheusDown | Prometheus is down. | 15m
PrometheusErrorSendingAlerts | Errors while sending alerts from Prometheus to AlertManager. | 10m
PrometheusNotConnectedToAlertmanagers | Prometheus cannot connect to AlertManager. | 10m
PrometheusNotificationQueueRunningFull | The Prometheus alert notification queue is full. | 10m
PrometheusNotIngestingSamples | Sample storage (opentsdb) exception. | 10m
PrometheusOperatorDown | PrometheusOperator is down. | 15m
PrometheusOperatorNodeLookupErrors | PrometheusOperator node lookup errors. | 10m
PrometheusOperatorReconcileErrors | PrometheusOperator has errors in its log. | 10m
PrometheusTargetScrapesDuplicate | A large amount of scraped data was rejected because of duplicate timestamps with different values. | 10m
PrometheusTSDBCompactionsFailing | Block compaction has been failing for more than 4 hours. | 12h
PrometheusTSDBReloadsFailing | Reloading data blocks from disk has been failing for more than 4 hours. | 12h
PrometheusTSDBWALCorruptions | The TSDB WAL (write-ahead log) has been corrupted. | 4h
TargetDown | Overall availability of labels.job targets has dropped by 10%. | 10m
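
The failure durations listed above use compact strings such as 5m, 1h and 12h. As a minimal sketch, assuming the duration is how long a condition must persist before the alert fires (as with the `for` field of a Prometheus alerting rule), the following Python converts those strings into `timedelta` values and orders a few sample alerts by how quickly they fire. The helper names `parse_duration` and `sort_by_duration` and the `ALERTS` sample are illustrative, not part of any monitoring tool.

```python
import re
from datetime import timedelta

# Map the single-letter unit suffix to the matching timedelta keyword.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Convert a simple duration string such as '5m' or '12h' to a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


# A handful of alert keys and their durations (illustrative sample).
ALERTS = {
    "KubeNodeNotReady": "1h",
    "etcdInsufficientMembers": "3m",
    "PrometheusTSDBCompactionsFailing": "12h",
    "etcdNoLeader": "1m",
    "PrometheusTSDBWALCorruptions": "4h",
}


def sort_by_duration(alerts: dict) -> list:
    """Return alert keys ordered from shortest to longest failure duration."""
    return sorted(alerts, key=lambda name: parse_duration(alerts[name]))


if __name__ == "__main__":
    for name in sort_by_duration(ALERTS):
        print(f"{name}: fires after {parse_duration(ALERTS[name])}")
```

Sorting this way makes it easy to see which conditions are treated as most urgent: the etcd leader and membership alerts fire within minutes, while the TSDB alerts tolerate hours of failure.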
