Monitoring weapon Nagios II: detailed introduction of Nagios and monitoring private information of external servers 07/07 Update SLTechnology News&Howtos

2025-07-07 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

In Nagios, the definition of what to monitor is kept separate from the commands that actually run the checks on the monitored device: the former lives in the main package and the latter in the plug-in package. First, let's go over what the Nagios main program, the nagios-plugins package and the nrpe software are each used for.

Nagios main program:

The Nagios main program provides only a monitoring platform; the real work is done by its plug-ins. After Nagios is installed, the default configuration files are in the /usr/local/nagios/etc directory. They define the commands that Nagios can call, the monitoring of printers and switches, and the templates and how they are tied to alerts.

Usually, when you want to monitor a service, you define it in the configuration of the Nagios main program. localhost.cfg is the default monitoring definition file of the Nagios system, and everything in it was written by the authors of Nagios. In practice, however, we do not put our own definitions there. Instead we write additional configuration files to keep them separate, which makes management easier. The hosts.cfg and services.cfg files are usually written in the objects directory to define host and service information.

Nagios-plugins plug-in package:

The nagios-plugins package provides the check commands that are run against monitored hosts, such as the default check_tcp and check_load. When monitoring services on external hosts, there are often service-specific check commands for which no script exists in the default plug-in directory. We have to define these ourselves, and they can be used only after they are defined. Where does Nagios use these plug-ins? Nagios has many cfg files that define a wide variety of information. Among them, hosts.cfg and services.cfg (usually these two, although other files can also define hosts and services) define host and service information, and that is where the plug-ins are used.

Nrpe package:

The nrpe package is an extension of the plug-ins. When monitoring the private information of an external host, the client has to collect its local private information in response to requests sent by the server. Collecting that information requires various check commands that are not available on the client by default, so nrpe has to be installed there. The nrpe package is typically used to monitor private information, such as the hard disk information of an external host.

Detailed description of the installation directory of Nagios

Nagios is installed in /usr/local/nagios/ by default. After installation, the five directories etc, bin, sbin, share and var are created under it.

The purpose of each directory of Nagios is described as follows:

ls /usr/local/nagios

Directory | Purpose
bin | Directory where the Nagios executables are located
etc | Directory where the Nagios configuration files are located
sbin | Directory where the Nagios CGI files are located, that is, the files required to execute external commands
share | Directory where the Nagios web page files are located
libexec | Directory where the external plug-ins for Nagios are located

1. etc directory

tree /usr/local/nagios/etc/
├── cgi.cfg
├── htpasswd.users
├── nagios.cfg
├── nagiosgraph.cfg
├── objects
│   ├── commands.cfg
│   ├── contacts.cfg
│   ├── hosts.cfg
│   ├── localhost.cfg
│   ├── printer.cfg
│   ├── service.cfg
│   ├── switch.cfg
│   ├── templates.cfg
│   ├── timeperiods.cfg
│   └── windows.cfg
└── resource.cfg

The etc directory holds nagios.cfg and the objects directory. Together they contain the configuration files that define Nagios: the main configuration file, the variable definition file, the command definition file, and the host and service definition files.

nagios.cfg file

The default path of nagios.cfg is /usr/local/nagios/etc/nagios.cfg. It is the core configuration file of Nagios. An object configuration file takes effect only if it is referenced in this file, so all you need to do here is reference each object configuration file in nagios.cfg.
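As a rough sketch of how this referencing works: nagios.cfg points at individual object files with cfg_file= lines (and at whole directories with cfg_dir=). The snippet below builds a small sample nagios.cfg in a temporary directory and lists what it references. The sample content and the helper name list_object_refs are illustrative assumptions, not taken from the setup in this article.

```shell
#!/bin/sh
# Sketch: show which object files a nagios.cfg pulls in.
# list_object_refs is a hypothetical helper; the sample file below is
# illustrative and only mimics the cfg_file=/cfg_dir= lines of a real one.

list_object_refs() {
    # $1: path to a nagios.cfg; prints the path of each referenced file or dir
    sed -n -E 's/^[[:space:]]*cfg_(file|dir)=(.*)$/\2/p' "$1"
}

demo_dir=$(mktemp -d)
cat > "$demo_dir/nagios.cfg" <<'EOF'
# sample main configuration (illustrative)
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
cfg_file=/usr/local/nagios/etc/objects/service.cfg
EOF

# Commented-out entries (such as windows.cfg here) are not listed.
list_object_refs "$demo_dir/nagios.cfg"
```

Running the same helper against the real /usr/local/nagios/etc/nagios.cfg is a quick way to confirm that a newly written hosts.cfg or service.cfg is actually referenced.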

ls /usr/local/nagios/etc/

File or directory | Purpose
cgi.cfg | Configuration file that controls CGI access
nagios.cfg | Nagios main configuration file
resource.cfg | Variable definition file, also known as the resource file; variables defined here can be referenced by other configuration files
objects | A directory containing many configuration file templates that define Nagios objects
objects/commands.cfg | Command definition file; the commands defined here can be referenced by other configuration files
objects/contacts.cfg | Configuration file that defines contacts and contact groups
objects/localhost.cfg | Configuration file that defines monitoring of the local host
objects/templates.cfg | Template file that defines hosts and services and can be referenced in other configuration files
objects/timeperiods.cfg | Configuration file that defines the Nagios monitoring periods
objects/windows.cfg | Configuration file template for monitoring Windows hosts, not enabled by default

Explanation:

Configuring Nagios involves several kinds of definitions: hosts, host groups, services, service groups, contacts, contact groups, monitoring periods, monitoring commands and so on. These definitions show that the Nagios configuration files are related to and reference one another. To configure a Nagios monitoring system successfully, you need to work out which configuration files depend on which.

Looking at the directory above, the default layout of the service definitions is rather messy. We therefore usually define our own files instead of using the shipped local configuration file, which keeps things clearer. For ease of maintenance, we usually create a separate configuration file for each kind of Nagios object, as shown below:

Create a hosts.cfg file to define hosts and host groups

Create a services.cfg file to define services

Use the default contacts.cfg file to define contacts and contact groups

Use the default commands.cfg file to define commands

Use the default timeperiods.cfg file to define the monitoring periods

Use the default templates.cfg file as the resource reference file

Key configuration files:

(1) hosts.cfg file

hosts.cfg is mainly used to specify the address of each monitored host and its related attributes.

(2) services.cfg file

services.cfg is mainly used to define the monitored services and host resources, such as the http service, the ftp service, host disk space, host system load, and so on. The services related to Nagios-Server and Nagios-Windows are already defined in their corresponding configuration files, so only the services related to Nagios-Linux need to be defined. Here, only one service is defined to verify that the configuration file is correct. The definitions of other services will be discussed later.

(3) commands.cfg file

The commands.cfg file is mainly used to define commands. Some basic check commands are provided by the nagios-plugins package, and some have to be defined by us. A command must be defined in this file before it can be used.

2. libexec directory

tree /usr/local/nagios/libexec/
├── check_apt
├── check_pop -> check_tcp
├── process_perfdata.pl
...
├── utils.pm
└── utils.sh

The libexec directory stores the various check commands. Before the nagios-plugins package is installed, there are only two commands in this directory. After it is installed, the check commands provided by the package appear here. All check commands are defined in the commands.cfg file mentioned above, and when you define a new command you also need to write a script file with the same name in the libexec directory. The command does not take effect until that script exists.
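To make the "script file in libexec" requirement concrete, here is a minimal sketch of what such a plug-in can look like. Nagios plug-ins report their result through one line of output and a status code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The plug-in name check_dir_entries, its arguments and its thresholds are hypothetical and chosen only for illustration. It is written as a function so that it can be tried in a shell before being saved as a script.

```shell
#!/bin/sh
# Hypothetical plug-in sketch: warn when a directory holds too many entries.
# Status codes follow the Nagios plug-in convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN

check_dir_entries() {
    # $1: directory, $2: warning threshold, $3: critical threshold
    dir=$1; warn=$2; crit=$3
    if [ ! -d "$dir" ]; then
        echo "UNKNOWN - $dir is not a directory"
        return 3
    fi
    # count the non-hidden entries in the directory
    count=$(ls "$dir" | wc -l | tr -d ' ')
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count entries in $dir"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count entries in $dir"
        return 1
    fi
    echo "OK - $count entries in $dir"
    return 0
}

# Try it against a scratch directory that holds three files.
scratch=$(mktemp -d)
touch "$scratch/a" "$scratch/b" "$scratch/c"
status=0
check_dir_entries "$scratch" 2 5 || status=$?
echo "status code: $status"
```

Saved as a standalone script in libexec, the body would end with exit instead of return, and a matching command definition in commands.cfg would point at it.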

The plug-in installed by the nrpe package is also placed in the libexec directory. Commonly used check commands in this directory and their thresholds are explained as follows:

Monitoring object | Monitoring threshold

Host resources

Host survival: check_ping | -w 3000.0,80% -c 5000.0,100% -p 5 (warning if the response time exceeds 3000 ms or packet loss reaches 80%; critical if the response time exceeds 5000 ms or packet loss reaches 100%; 5 packets are sent in total)
 | -w 5 -c 10 (w for warning, c for critical)
System load: check_load | -w 15,10,5 -c 30,25,20 (warning or critical if the 1-minute, 5-minute and 15-minute values exceed the corresponding number of waiting processes)
Disk usage: check_disk | -w 20% -c 10% -p / (warning when the free space on the root partition drops to 20% of its total size, critical at 10%; -p / selects the root partition)
Disk I/O (script): check_iostat |
Zombie processes: check_zombie_procs | -w 5 -c 10 -s Z (warning at 5 zombie processes, critical at 10)
Total processes: check_total_procs | -w 150 -c 200 (warning at 150 processes in total, critical at 200)
Remaining memory (script): check_mem | -w 90% -c 95% (memory idle rate above 90% warning, above 95% critical)
Swap partition usage: check_swap | -w 20% -c 10% (warning when the free space in the swap partition drops to 20% of its total size, critical at 10%)

Application services and traffic monitoring

Service port: check_tcp | -H localhost2 -p 80 (host and corresponding port number)
Page response time: check_http | -H localhost2 -u http://localhost2/test.jsp -w 5 -c 10 (checks the page; warning above 5 s, critical above 10 s)
Number of IP connections (script): check_ips | -w 200 -c 250 (warning above 200 IP connections, critical above 250)
Server traffic: Check_traffi | -V 2c -C public -H localhost2 -I 2 -w 12,30 -c 15,35 -M -b (SNMP version, community, host, corresponding NIC, warning threshold, critical threshold)
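The check_disk and check_swap thresholds above are expressed as free space, so a lower value is worse and the critical number is smaller than the warning number. A rough sketch of that comparison follows. classify_free_pct is a hypothetical helper written for illustration, not part of nagios-plugins, and it only handles whole-number percentages.

```shell
#!/bin/sh
# Sketch of the "-w 20% -c 10%" logic used with check_disk and check_swap:
# the thresholds are minimum FREE percentages, so lower is worse.
# classify_free_pct is a hypothetical helper, not a real plug-in.

classify_free_pct() {
    # $1: free percent (integer), $2: warning threshold, $3: critical threshold
    free=$1; warn=$2; crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo CRITICAL
    elif [ "$free" -lt "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}

for free in 25 15 5; do
    echo "free=${free}% -> $(classify_free_pct "$free" 20 10)"
done
# prints:
#   free=25% -> OK
#   free=15% -> WARNING
#   free=5% -> CRITICAL
```

Counters such as check_total_procs work the other way round: there a higher value is worse, so the warning number is the smaller of the two.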

One: experimental objectives

1. Monitor the external server NFS, and the server acts as the client

2. Monitor the MySQL service of the external server

3. Monitor external server httpd

4. Monitor the private information of external servers

Two: experimental environment

VMware | Role | Hostname | IP address | Installed software
RHEL-6.5 | Server | yu61 | 192.168.1.61 | Nagios, Nagios plug-ins, nrpe, LAMP environment, NFS
RHEL-6.5 | Client | yu62 | 192.168.1.62 | Nagios plug-ins, nrpe, mysql-server, IO
RHEL-6.5 | Client | yu63 | 192.168.1.63 | Nagios plug-ins, nrpe, NFS, HTTP

# the firewall must be turned off on all servers

Three: experimental steps

Hands-on: monitoring private information of external servers

1. Modify the configuration file

[root@yu61 objects]# vim hosts.cfg
## add at the end
define host {
    use                  linux-server
    host_name            IO63
    alias                IO Service
    address              192.168.1.63
    icon_image           switch.gif
    statusmap_image      switch.gd2
    2d_coords            100,200
    3d_coords            100,200,100
}

[root@yu61 objects]# cat service.cfg
## add at the end
## check_server_IO-63 ##
define service {
    use                  local-service
    host_name            IO63
    service_description  Root Partition
    check_command        check_nrpe!check_sda2
}
define service {
    use                  local-service
    host_name            IO63
    service_description  Total Processes
    check_command        check_nrpe!check_total_procs
}
define service {
    use                  local-service
    host_name            IO63
    service_description  Current Load
    check_command        check_nrpe!check_load
}
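In a check_command value, the text before the first ! names a command defined in commands.cfg, and each further !-separated field is handed to that command as $ARG1$, $ARG2$ and so on. The sketch below only mimics that splitting to make the service definitions above easier to read. split_check_command is a hypothetical helper, not something Nagios ships.

```shell
#!/bin/sh
# Sketch: how Nagios reads a value such as "check_nrpe!check_sda2".
# The part before the first "!" is the command name; the remaining
# "!"-separated fields are passed to it as $ARG1$, $ARG2$, ...
# split_check_command is a hypothetical helper for illustration only.

split_check_command() {
    # $1: a check_command value such as "check_nrpe!check_sda2"
    old_ifs=$IFS
    IFS='!'
    set -- $1          # split on "!" into positional parameters
    IFS=$old_ifs
    echo "command=$1"
    shift
    n=1
    for arg in "$@"; do
        echo "ARG$n=$arg"
        n=$((n + 1))
    done
}

split_check_command 'check_nrpe!check_sda2'
# prints:
#   command=check_nrpe
#   ARG1=check_sda2
```

Assuming check_nrpe is defined in commands.cfg in the usual way, with $ARG1$ passed to the plug-in's -c option, the server asks the nrpe daemon on 192.168.1.63 to run whatever that host's nrpe.cfg lists under the name check_sda2.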

2. Check configuration and restart services

[root@yu61 ~]# /etc/init.d/nagios checkconfig
Total Warnings: 0
Total Errors: 0
[root@yu62 ~]# service httpd restart

3. Test and view hosts and services

4. Generate nrpe.cfg

[root@yu63 nrpe-2.12]# make install-daemon-config
[root@yu63 nrpe-2.12]# ls /usr/local/nagios/etc/nrpe.cfg
/usr/local/nagios/etc/nrpe.cfg

5. Install xinetd to manage the nrpe service

[root@yu63 nrpe-2.12]# rpm -ivh /mnt/Packages/xinetd-2.3.14-39.el6_4.x86_64.rpm
[root@yu63 nrpe-2.12]# cat /etc/xinetd.d/nrpe
service nrpe
{
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = 127.0.0.1 192.168.1.61
}

[root@yu63 nrpe-2.12]# vim /etc/services    ## add the service port at the end
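The line added to /etc/services is not shown here. A common form, assuming the service name nrpe from the xinetd file and the port 5666 that xinetd is later shown listening on, is "nrpe 5666/tcp". The sketch below adds such an entry only when it is missing. It works on a scratch file rather than the real /etc/services, and add_nrpe_service is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: register the nrpe service port if it is not already present.
# Assumption: the entry is "nrpe 5666/tcp" (5666 is the port xinetd is
# shown listening on). add_nrpe_service is a hypothetical helper.

add_nrpe_service() {
    # $1: path to a services file
    if ! grep -q -E '^nrpe[[:space:]]' "$1"; then
        printf 'nrpe\t\t5666/tcp\t\t# NRPE\n' >> "$1"
    fi
}

# Demonstrate on a scratch file instead of the real /etc/services.
services_copy=$(mktemp)
printf 'ssh\t\t22/tcp\n' > "$services_copy"
add_nrpe_service "$services_copy"
add_nrpe_service "$services_copy"   # the second call changes nothing
grep -c '^nrpe' "$services_copy"    # prints 1
```

Checking before appending keeps the file clean if the step is repeated, for example when the client is set up from a script.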

[root@yu63 nrpe-2.12]# /etc/init.d/xinetd restart
[root@yu63 nrpe-2.12]# netstat -antup | grep 5666
tcp        0      0 :::5666        :::*        LISTEN      62841/xinetd

6. Modify the file and specify the monitoring thresholds

[root@yu63 nrpe-2.12]# vim /usr/local/nagios/etc/nrpe.cfg
## add the following content
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 38% -c 35% -p /dev/sda1
command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 42% -c 43% -p /dev/sda2
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

[root@yu61 objects]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.63
NRPE v2.12
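Each name that follows check_nrpe! on the server has to match a command[...] entry in the client's nrpe.cfg. A small sketch for listing those names follows. The sample file reuses the three entries added in this step, while list_nrpe_commands and has_nrpe_command are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: list the command names an nrpe.cfg defines, so they can be
# compared with the check_nrpe!<name> values used on the Nagios server.
# list_nrpe_commands and has_nrpe_command are hypothetical helpers.

list_nrpe_commands() {
    # $1: path to nrpe.cfg; prints one command name per line
    sed -n -E 's/^[[:space:]]*command\[([^]]+)\]=.*/\1/p' "$1"
}

has_nrpe_command() {
    # $1: path to nrpe.cfg, $2: command name; succeeds if it is defined
    list_nrpe_commands "$1" | grep -q -x -F "$2"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 38% -c 35% -p /dev/sda1
command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 42% -c 43% -p /dev/sda2
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
EOF

list_nrpe_commands "$sample"
if has_nrpe_command "$sample" check_sda2; then
    echo "check_sda2 is defined"
fi
```

Run against the real /usr/local/nagios/etc/nrpe.cfg on the client, the same helpers can confirm that names such as check_total_procs and check_load, which the service definitions on the server rely on, are present there as well.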

7. Check configuration and restart services and test

[root@yu61 ~]# /etc/init.d/nagios checkconfig
Total Warnings: 0
Total Errors: 0
[root@yu62 ~]# service httpd restart

8. View the effect of changing disk utilization

## when disk utilization reaches the configured threshold, the service goes into the critical state

[root@yu63 nrpe-2.12]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        20G  4.8G   14G  26% /
tmpfs           750M  224K  750M   1% /dev/shm
/dev/sda1       4.9G  162M  4.5G   4% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt
[root@yu63 ~]# dd if=/dev/zero of=a.txt count=100 bs=40M
[root@yu63 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        20G  6.3G   12G  35% /
tmpfs           750M  224K  750M   1% /dev/shm
/dev/sda1       4.9G  162M  4.5G   4% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt

[root@yu63 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        20G  7.2G   12G  40% /
tmpfs           750M  224K  750M   1% /dev/shm
/dev/sda1       4.9G  162M  4.5G   4% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt
