2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article presents the "nagios rookie fault example analysis": a real Nagios failure and the wrong turns a beginner took before finding the cause.
Our Nagios had been running for more than a year when it suddenly broke. The web page still opened, but no host or service status was shown, and the page suggested that nagios might not be running.
My first reaction was that nagios had died.
Restart the nagios service
[root@nagios-server ~]# /etc/init.d/nagios stop
Stopping nagios: done.
[root@nagios-server ~]# /etc/init.d/nagios start
Starting nagios: done.
Refreshing the page changed nothing. Clearing the browser cache and reopening the browser did not help either.
Was something wrong with the configuration file? I hadn't changed anything recently.
Check the configuration file
[root@nagios-server ~]# /etc/init.d/nagios checkconfig
No error.
Server restart
[root@nagios-server ~]# reboot
Still no luck.
Check the nagios error log.
[root@nagios-server ~]# cat /usr/local/nagios/nagios.log
The log is empty!
That's weird. Next, look at the system log.
[root@nagios-server ~]# cat /var/log/messages
Records like this filled the whole screen:
The check of service 'MEM' on host 'xxx' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service...
It gave me a headache, so I made a wrong decision, one that cost me the first chance to find the root of the problem!
Yes, I emptied the system log.
[root@nagios-server ~]# > /var/log/messages
With the log clean, restart the nagios service and see which errors get recorded.
[root@nagios-server ~]# /etc/init.d/nagios stop
[root@nagios-server ~]# /etc/init.d/nagios start
[root@nagios-server etc]# tail -30 /var/log/messages
Dec 30 17:37:38 nagios-server nagios: Nagios 3.5.1 starting... (PID=50752)
Dec 30 17:37:38 nagios-server nagios: Local time is Fri Dec 30 17:37:38 CST 2016
Dec 30 17:37:38 nagios-server nagios: LOG VERSION: 2.0
Dec 30 17:37:38 nagios-server nagios: Warning: Host 'xxx' has no services associated with it!
Dec 30 17:37:38 nagios-server nagios: Finished daemonizing... (New PID=50753)
Dec 30 17:39:21 nagios-server nagios: Nagios 3.5.1 starting... (PID=51146)
Dec 30 17:39:21 nagios-server nagios: Local time is Fri Dec 30 17:39:21 CST 2016
Dec 30 17:39:21 nagios-server nagios: LOG VERSION: 2.0
Dec 30 17:39:21 nagios-server nagios: Warning: Host 'xxx' has no services associated with it!
Dec 30 17:39:21 nagios-server nagios: Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 50753). Bailing out...
Now an anomaly showed up: according to the log, nagios.lock was already held by another Nagios instance.
4. I searched Baidu for this error message. After a long search I found a post describing a problem very similar to mine. Its advice: delete nagios.log, objects.cache and retention.dat, and then restart the nagios service.
[root@nagios-server ~]# cd /usr/local/nagios/var
[root@nagios-server var]# ll
total 36
drwxrwxr-x. 2 nagios nagios 24576 Dec 30 00:00 archives
-rw-r--r--  1 nagios nagios     0 Dec 30 17:27 nagios.lock
-rw-rw-r--  1 nagios nagios     0 Dec 30 17:27 nagios.log
-rw-r--r--. 1 nagios nagios     0 Dec 30 17:27 objects.cache
-rw-------  1 nagios nagios     0 Dec 30 17:27 retention.dat
drwxrwsr-x. 2 nagios nagcmd  4096 Dec 30 17:27 rw
drwxr-xr-x. 4 root   root    4096 Oct 22  2015 spool
-rw-rw-r--  1 nagios nagios     0 Dec 30 17:27 status.dat
How can nagios.log, objects.cache and retention.dat all have a size of 0? I ignored that and simply followed the post.
At this point a veteran would already have worked out what the problem was. I overlooked this important detail and once again missed the chance to find the root of the problem!
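The detail missed here can be checked mechanically. The following is only a minimal sketch, assuming a `find` that supports `-maxdepth` (GNU and BSD do) and the /usr/local/nagios/var path from the listing above; the helper name `list_empty_files` is made up for illustration.

```shell
#!/bin/sh
# list_empty_files DIR: print regular files directly under DIR whose size is 0.
# Hypothetical helper; DIR defaults to the var directory used in this article.
list_empty_files() {
    dir="${1:-/usr/local/nagios/var}"
    # -size 0 matches only files that occupy zero blocks, i.e. empty files.
    find "$dir" -maxdepth 1 -type f -size 0 | sort
}

# Example: several empty state files at once is a hint to check free space next.
# list_empty_files /usr/local/nagios/var
# df -h /usr/local/nagios/var
```

When a daemon's log, cache and state files are all empty at the same time, a full filesystem is one of the first things worth ruling out.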
[root@nagios-server var]# rm -f objects.cache retention.dat status.dat
[root@nagios-server var]# /etc/init.d/nagios stop
[root@nagios-server var]# /etc/init.d/nagios start
But it was no use; the fault remained the same. By now I was starting to panic and did not know how to investigate further. I checked nagios.cfg and nrpe.cfg, and both were normal.
5. While restarting nagios repeatedly I noticed another anomaly: the nagios service "did not seem to be started". Note the quotation marks.
[root@nagios-server etc]# /etc/init.d/nagios start
Starting nagios: done.
[root@nagios-server etc]# /etc/init.d/nagios status
nagios is not running
In other words, the start command reported success, yet the status query said nagios was not running, which was strange. Was something wrong with the nagios program itself?
Keep reading the log. It shows that nagios has been started repeatedly, so there may be multiple instances running.
[root@nagios-server etc]# tail -30 /var/log/messages
Dec 30 17:37:38 nagios-server nagios: Nagios 3.5.1 starting... (PID=50752)
Dec 30 17:37:38 nagios-server nagios: Local time is Fri Dec 30 17:37:38 CST 2016
Dec 30 17:37:38 nagios-server nagios: LOG VERSION: 2.0
Dec 30 17:37:38 nagios-server nagios: Warning: Host 'xxx' has no services associated with it!
Dec 30 17:37:38 nagios-server nagios: Finished daemonizing... (New PID=50753)
Dec 30 17:39:21 nagios-server nagios: Nagios 3.5.1 starting... (PID=51146)
Dec 30 17:39:21 nagios-server nagios: Local time is Fri Dec 30 17:39:21 CST 2016
Dec 30 17:39:21 nagios-server nagios: LOG VERSION: 2.0
Dec 30 17:39:21 nagios-server nagios: Warning: Host 'xxx' has no services associated with it!
Dec 30 17:39:21 nagios-server nagios: Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 50753). Bailing out...
Take a look at the contents of nagios.lock.
[root@nagios-server etc]# cat /usr/local/nagios/var/nagios.lock
The file is empty.
Kill the nagios process
[root@nagios-server]# kill -9 50753
[root@nagios-server]# kill -9 50753
-bash: kill: (50753) - No such process
I can no longer find the earlier log output, but after restarting N times I spotted the problem: the nagios process had in fact started, but at startup it did not write its PID into /usr/local/nagios/var/nagios.lock!
Write the PID in manually
[root@nagios-server ~]# echo 44336 > /usr/local/nagios/var/nagios.lock
[root@nagios-server ~]# /etc/init.d/nagios status
nagios (pid 44336) is running...
It turns out that the nagios init script checks the contents of nagios.lock. If the file is empty, it reports "nagios is not running", even though the nagios process has actually started normally.
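The mismatch between the lock file and the real process can be illustrated with a small check. This is only a sketch of the idea and not the actual logic of the nagios init script; the function name `check_lockfile` and its four result words are assumptions made for this example.

```shell
#!/bin/sh
# check_lockfile LOCKFILE: classify the state of a PID/lock file.
# Prints one of: missing, empty, stale, running. (Hypothetical helper.)
check_lockfile() {
    lockfile="$1"
    [ -e "$lockfile" ] || { echo "missing"; return 0; }
    # A zero-byte lock file is the situation described in this article.
    [ -s "$lockfile" ] || { echo "empty"; return 0; }
    pid=$(head -n 1 "$lockfile" | tr -d '[:space:]')
    # Anything that is not a plain number is treated like an empty file.
    case "$pid" in
        ''|*[!0-9]*) echo "empty"; return 0 ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Run as root or as the process owner, otherwise a live process owned by
    # another user is also reported as stale.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Example:
# check_lockfile /usr/local/nagios/var/nagios.lock
```

An "empty" result while a nagios process is visible in `ps` output means the daemon started but could not record its PID, which is worth investigating before starting yet another instance.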
That is why, after repeated restarts, the system log contained the message "nagios-server nagios: Lockfile '/usr/local/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 50753). Bailing out...".
Having found this, I killed all the nagios PIDs and restarted nagios. Sure enough, the nagios.lock error no longer appeared. In its place, the screen filled with "The check of service 'MEM' on host 'xxx' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service..."
I switched to Google and searched for "nagios The check of service looks like it was orphaned (results never came back)". After reading more than N posts in the results, I finally saw a line suggesting to check whether the disk was out of space. So, look at the disk.
[root@nagios-server spool]# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   8.7G  8.3G  1.1M 100% /
tmpfs       935M     0  935M   0% /dev/shm
/dev/sda1   194M   34M  151M  19% /boot
[root@nagios-server spool]# df -i
Filesystem  Inodes  IUsed  IFree IUse% Mounted on
/dev/sda3   577088 104974 472114   19% /
tmpfs       239320      1 239319    1% /dev/shm
/dev/sda1    51200     39  51161    1% /boot
The cause has been found! Only 1.1M of disk space was free, which gives a reasonable explanation for every earlier symptom: why nagios.log was 0 bytes, and why nagios.lock was empty. All of it came down to there being no space left on the disk!
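A check like this can be run long before the disk fills up. The sketch below assumes the POSIX output format of `df -Pk` (available space in column 4, mount point in column 6) and mount points without spaces; the function name `low_space` and the 100 MB threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# low_space MIN_KB: read `df -Pk` output on stdin and print every mount point
# whose available space is below MIN_KB kilobytes. (Hypothetical helper.)
low_space() {
    # NR > 1 skips the header line; "+ 0" forces a numeric comparison.
    awk -v min="$1" 'NR > 1 && $4 + 0 < min + 0 { print $6 " " $4 "K free" }'
}

# Example: warn when any filesystem has less than 100 MB available.
# df -Pk | low_space 102400
```

Run from cron, or wrapped as a Nagios check on the monitoring server itself, such a check would have flagged the root filesystem well before nagios stopped being able to write its own files.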
[root@nagios-server nagios]# du -a /var | sort -n -r | head -n 10
6044444 /var
5946296 /var/log
5882592 /var/log/httpd
3127308 /var/log/httpd/error_log-20161225
2659808 /var/log/httpd/error_log
66488   /var/lib
54616   /var/lib/rpm
45952   /var/log/httpd/access_log-20161225
43136   /var/lib/rpm/Packages
31512   /var/log/httpd/access_log
[root@nagios-server nagios]# du -sh /var
5.8G /var
[root@nagios-server nagios]# cd /var/log/httpd/
[root@nagios-server httpd]# du -hsx * | sort -rh | head -10
3.0G error_log-20161225
2.6G error_log
45M  access_log-20161225
31M  access_log
7.2M access_log-20161211
7.1M access_log-20161218
2.0M access_log-20161204
532K ssl_request_log-20160630
460K ssl_access_log-20160630
140K ssl_request_log-20161024
Lesson:
Looking back at the earlier /var/log/messages, very important information was actually hidden in it!
Dec 28 12:30:01 nagios-server auditd[1032]: Audit daemon is low on disk space for logging
Dec 28 13:30:01 nagios-server auditd[1032]: Audit daemon is suspending logging due to low disk space.
Dec 28 14:54:30 nagios-server nagios: Warning: The check of host 'xxx' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
Dec 28 14:54:30 nagios-server nagios: Warning: The check of host 'xxx' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
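Instead of emptying a noisy log, the repeated lines can be filtered out so that the rare messages become visible. A minimal sketch, assuming the log lives at /var/log/messages as in this article; the function name `hide_orphan_noise` is made up for illustration.

```shell
#!/bin/sh
# hide_orphan_noise: read syslog lines on stdin and drop the repeated
# "looks like it was orphaned" warnings so that other messages stand out.
hide_orphan_noise() {
    grep -v 'looks like it was orphaned'
}

# Example: review what else was logged before deciding to truncate anything.
# hide_orphan_noise < /var/log/messages | tail -n 50
# grep -i 'disk space' /var/log/messages
```

Filtering leaves the original file untouched, so warnings such as the auditd lines above remain available for later review.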