2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Many engineers with little Puppet experience do not know how to quickly locate the cause of a problem that monitoring reveals, or how to fix it. This article summarizes the typical causes and solutions, in the hope that it helps you resolve such problems.
Puppet is a centralized configuration management system based on a C/S (client/server) architecture. Using its own declarative language, it manages configuration files, users, scheduled tasks, software packages, and system services, and ensures the consistency of the basic configuration of large-scale clusters.
We use Puppet to manage thousands of servers. After several rounds of optimization, monitoring and automated grayscale publishing ensure the consistency of the basic configuration across all clusters. This article discusses how to monitor the Puppet system and shares typical problems and their solutions.
Monitoring tool selection
Foreman provides comprehensive interaction facilities, including a Web front end, a CLI, and a RESTful API. On this basis, we can build a monitoring and management system and implement alerting and other functions.
Core business process
The workflow of Puppet can be abstracted into four phases:
Request phase: Agent sends its own information to Server over SSL
Response phase: Server compiles the corresponding configuration based on the client information and sends the result (the catalog) back to Agent
Execution phase: Agent receives the catalog and executes commands or updates files
Reporting phase: Agent reports the results to Server.
Figure 1 Puppet workflow
Monitoring Overview
The core monitoring of Puppet mainly covers the following aspects:
Whether communication between Agent and Master is normal
Whether the policies executed by Agent take effect
The time and scope in which the policies issued by Puppet take effect
The running status of Master and the clusters it manages.
Black box monitoring
When a Puppet black-box monitoring indicator does not meet expectations, the cluster is not working properly or is in an abnormal state. The black-box indicators are: whether all policies take effect, whether the effective scope of each policy is in line with expectations, and whether the effective result of each policy is in line with expectations.
Are all policies effective?
Description: add a batch of test nodes to the online Puppet cluster and run check scripts regularly to verify that all policies take effect.
Effective scope of the policy
Description: after a policy is launched, you need to confirm that its effective scope is in line with expectations, that is, that the policy takes effect only on the specified nodes.
Implementation: run a periodic check task through Puppet's MCollective component to verify that the list of machines on which the policy is in effect is consistent with the list of machines in the service tree. As shown in the figure below, 98% of the machines in the cluster hn-xdata meet expectations and 2% do not.
Figure 2 Puppet policy effective scope monitoring
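The scope check described above comes down to comparing two host lists. The following is a minimal sketch of that comparison, not the authors' actual check task; the function name and the way the two lists are obtained (from MCollective and from the service tree) are assumptions made for illustration.

```python
def scope_report(in_effect, expected):
    """Compare hosts where a policy is in effect with the hosts expected to have it."""
    in_effect, expected = set(in_effect), set(expected)
    unexpected = in_effect - expected   # policy reached hosts outside the target group
    missing = expected - in_effect      # target hosts where the policy did not take effect
    matched = in_effect & expected
    total = len(in_effect | expected)
    ratio = len(matched) / total if total else 1.0
    return {
        "matched_ratio": ratio,
        "unexpected": sorted(unexpected),
        "missing": sorted(missing),
    }
```

A matched ratio below 1.0 means the effective scope differs from the service tree, and the two lists show which hosts to inspect.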
Is the effective result of the policy in line with expectations?
Description: after a policy goes online, you need to confirm that it has taken effect on all machines with the expected result.
Implementation: run a periodic check task through Puppet's MCollective component (checking whether the list of machines in effect is consistent with the list of machines in the service tree). As shown in the figure below, each policy has its own pie chart.
Figure 3 Puppet policy result monitoring
White box monitoring
White-box monitoring supplements black-box monitoring and serves fault location. It is organized around four aspects: cluster capacity, traffic, latency, and errors.
Data collection methods:
Foreman API
Master log analysis
Table 1 Overview of white-box metrics acquired through the Foreman API
Metric | Description
No reports | Hosts that have not reported
Error | Connected, but an error occurred while executing the policy
Out of sync | Policy execution timed out, the hostname is duplicated, or the host cannot be connected
Active | Agent pulled the policy normally
Pending | Capacity metric: requests that Master cannot handle
No changes | Agent pulled the policy normally, but nothing changed
Puppet_report_time_total | Total time for Agent to execute policies
Pv | Visits per minute
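Once these counts have been collected, alerting on them is a simple ratio check. The sketch below assumes the counts are already available as a dictionary; the key names and the limits are hypothetical and do not reflect Foreman's actual response fields or the authors' thresholds.

```python
# Hypothetical alert limits, expressed as a fraction of all managed hosts.
LIMITS = {"no_reports": 0.01, "error": 0.01, "out_of_sync": 0.02}


def evaluate(counts, total_hosts, limits=LIMITS):
    """Return one alert message for every metric whose host ratio exceeds its limit."""
    alerts = []
    for name, limit in limits.items():
        ratio = counts.get(name, 0) / total_hosts if total_hosts else 0.0
        if ratio > limit:
            alerts.append(f"{name}: {ratio:.1%} of hosts exceeds limit {limit:.1%}")
    return alerts
```

Expressing limits as ratios rather than absolute counts keeps the same rule usable for clusters of different sizes.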
Capacity
CPU, number of network connections, and NIC metrics of the instance where Master resides
Traffic
Agent PV, calculated from Puppet Master's access log puppetserver-access.log
Figure 4 Agent PV traffic diagram
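Counting PV from the access log can be sketched as grouping log lines by minute. The example below assumes each line carries a bracketed timestamp in the common access-log style, such as [29/Mar/2025:10:15:32 +0800]; the actual layout depends on how puppetserver-access.log is configured, so the pattern may need adjusting.

```python
import re
from collections import Counter

# Assumed timestamp layout: [dd/Mon/yyyy:HH:MM:SS +zzzz]. The capture group
# keeps everything up to the minute, which becomes the grouping key.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")


def pv_per_minute(lines):
    """Count access-log lines per minute; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file object as `lines` processes the log line by line without loading it into memory.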
Latency
Time required for a single Agent to update its policies: puppet_report_time_total
Description: puppet_report_time_total is the total time from Agent connecting to Master to Agent sending the report to Master. 50% of reports complete within 3s, 90% within 11s, and 99% within 15s.
Figure 5 Agent latency
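Percentile figures like these can be derived from the collected puppet_report_time_total samples. The sketch below uses the nearest-rank method, which is one common definition and an assumption here; the source does not say how its percentiles were computed.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Tracking the 50th, 90th, and 99th percentiles together shows both the typical run time and the slow tail that usually points to overloaded Masters or large file transfers.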
Error
No reports: number of unreported instances
Error agent: the number of instances with errors in policy enforcement
Out of sync: the number of instances for which policy execution timed out, the hostname is duplicated, or the host is not connected to the Master.
Figure 6 Foreman error monitoring metrics
Problems found by Puppet monitoring
Agent coverage of all machines
Problem: there is no guarantee that the Agent works properly on every machine.
Solution: add Agent process monitoring for all machines, based on the service tree or CMDB-related systems.
Agent policy execution timed out
Problem: timeout alarms occur when large files are downloaded concurrently.
Troubleshooting method: run the command "puppet agent -t --debug" on the Agent; the output shows that the timeout occurs while pulling the file. Because the file is large and many Agents pull it from the Master at the same time, the requests time out.
Solution: store large files on cloud storage to improve download speed.
Grouping beyond the built-in Facter attributes
Problem: the built-in Facter attributes are not sufficient for policy grouping and grayscale publishing grouping.
Reason: as more and more services are onboarded, the number of service groups keeps growing.
Solution: define custom facts in Facter.
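One way to add a grouping attribute is a Facter external fact: an executable script placed in Facter's external facts directory (facts.d) that prints key=value pairs. The sketch below takes that route in Python rather than a Ruby custom fact, so it is a swapped-in technique and not necessarily what the authors used. The fact name service_group and the hostname convention are hypothetical.

```python
#!/usr/bin/env python3
import socket


def service_group(hostname):
    """Derive a group from a hostname such as 'hn-xdata-web-01' (hypothetical naming convention)."""
    parts = hostname.split(".")[0].split("-")
    return "-".join(parts[:2]) if len(parts) >= 2 else "default"


def fact_lines(hostname):
    """Build the key=value lines that Facter reads from an executable external fact."""
    return [f"service_group={service_group(hostname)}"]


if __name__ == "__main__":
    print("\n".join(fact_lines(socket.gethostname())))
```

With such a fact in place, policy grouping and grayscale groups can branch on the new attribute instead of on built-in facts alone.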
Agent out of sync (Out of Sync)
Problem: Agent report is out of sync.
Reasons and solutions:
Table 2
Reason | Solution
Duplicate hostname | Re-authenticate after modifying the Agent hostname
Host renamed after authentication | Delete the machine authenticated under the original name directly in the Foreman console
Agent service exception | Restart the Puppet service on Agent
Agent disk is full | After the disk is cleaned, Agent starts and recovers by itself
Agent-side certificate error | Delete the /etc/puppetlabs/puppet/ssl folder on Agent, then run puppet agent -t to re-authenticate
Agent-side puppet.conf file is empty | Write the corresponding [agent] configuration to the puppet.conf file to restore it
Master-side puppet.conf file is empty | Write the corresponding [master] configuration to the puppet.conf file to restore it
Foreman service is down | Run service httpd restart and service foreman restart on the Foreman machine
Could not request certificate | 1) Agent and Master clocks are out of sync: synchronize the time with ntpdate master-IP; 2) Agent cannot reach Master over the network; 3) Port 8140 on Master is not available
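For the "Could not request certificate" case, network reachability and port 8140 can be checked from the Agent with a plain TCP connection attempt. This is a minimal sketch; the function name and the timeout are assumptions, and a successful connection only shows that the port accepts connections, not that the certificate exchange will succeed.

```python
import socket


def port_open(host, port=8140, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If the check fails, causes 2) and 3) are the likely candidates; if it succeeds, clock skew between Agent and Master is worth checking next.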
Policy released to an unexpected cluster
Problem: the policy takes effect in the wrong scope.
Reason: the Puppet Master uses a single entry file, site.pp. Because there are many policy groups, the grayscale release stage adds many corresponding branches to it, so operations engineers are prone to mistakes.
Solution: manage site.pp as a policy module that contains the default groupings and the groups to be published in grayscale. The site.pp under the manifests folder then only needs to include that module.
Figure 7 Default grouping strategy after site.pp optimization
Figure 8 Grouping for the grayscale stages of policy release
Functional monitoring finds unexpected synchronized files
Problem: Master is deployed as a cluster, and during a policy change the data on the Masters may not yet be synchronized. In this case, the files pulled by the same Agent may be inconsistent.
Reason: one of the Masters has not updated the file, and the LB forwards requests using a round-robin policy. An Agent's initial request may reach Master A while its file request reaches Master B, and the data on the two Masters differ.
Solution: change the LB policy to source IP hash.
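The difference between the two LB policies can be illustrated with a small sketch. Round-robin hands consecutive requests to different backends, while a source IP hash maps each client address to one fixed backend. The backend names and the use of SHA-256 are assumptions for illustration; real load balancers use their own hash functions.

```python
import hashlib
from itertools import cycle

MASTERS = ["master-a", "master-b", "master-c"]  # hypothetical backend names


def pick_by_source_ip(client_ip, backends=MASTERS):
    """Map a client IP to one backend; the same IP always yields the same backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def round_robin(backends=MASTERS):
    """Return an iterator that hands out backends in turn, regardless of the client."""
    return cycle(backends)
```

With source IP hashing, all requests from one Agent during a run reach the same Master, so the Agent sees a consistent view even while the Masters are still synchronizing. The mapping changes only when the backend list changes.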