Zabbix 3.0-Chapter 8 Management alarm

2025-01-19 Update From: SLTechnology News&Howtos

Chapter VIII Management alarm

In this chapter, you will learn how to configure Triggers and Actions, with a detailed look at trigger expressions, alarms, alarm escalation and so on.

As a monitoring solution, Zabbix treats alarming as an indispensable function. When the value of a monitoring item collected from a monitored object crosses the threshold set in the system, an alarm event is generated. Depending on the type of event, the corresponding alarm action is then carried out: alarm information is sent to the user, commands are executed, and so on. The alarm process in Zabbix is shown in figure 8-1 below.

Figure 8-1

8.1 trigger

In Zabbix, monitoring data is collected through monitoring items and then stored in the database. When a monitoring index has a problem, we want the system to notify us in time, which requires telling the system under what circumstances the data is considered problematic, that is, the threshold. A trigger evaluates a logical expression over the data of the monitoring item and compares the result with the defined threshold to judge whether the state of the monitoring item is normal.

In Zabbix, no matter how complex the expression of the trigger is, the final result is either True or False. This result is directly related to the state of the trigger. When the result is False, the state of the trigger is OK, which means that the data is normal. When the result is True, the status of the trigger is PROBLEM, which means that there is a problem with the data, and this correspondence should be kept in mind.

If the time-based functions nodata(), date(), dayofmonth(), dayofweek(), time(), and now() are used in the trigger, Zabbix server recalculates the trigger every 30 seconds. If both time-based functions and other functions are used, the trigger is recalculated every 30 seconds and also whenever a new value is received.

8.1.1 understanding the expression of a trigger

The expressions used in triggers are very flexible, and you can also create complex logical operations through them. Let's look at the format of a simple expression.

{<server>:<key>.<function>(<parameter>)}<operator><constant>

For example: {Testsrv:vfs.fs.size[/var,pused].delta}>3

You can see that a trigger expression consists of two main parts:

The function applied to the monitoring item data.

Perform arithmetic and logical operations on the results of the function.

In the above example, the complete monitor item, Testsrv:vfs.fs.size[/var,pused], and the applied function, delta, are specified, and the result is compared with the constant 3 using the greater-than operator (>).

We can reference multiple monitor items in the trigger expression, and each monitor item applies a function. You can also use the same monitor item twice in an expression, but specify the complete monitor item explicitly, for example:

{test:log[/tmp/operations.log,10,skip].nodata}=1 or
{test:log[/tmp/operations.log,10,skip].str(error)}=1

If operations.log does not receive a new line for at least 10 minutes, or if the string error is found in the log file, the state of the trigger changes to PROBLEM.

In a trigger expression, you can reference monitoring items not only from the same host, but also from different hosts or proxy servers (if you can access them), for example:

{Proxy1:server1:agent.ping.last(0)}=0 and
{Proxy2:server2:agent.ping.last(0)}=0

If the hosts server1 and server2 both fail to respond to ping at the same time, the state of the trigger changes to PROBLEM. This works even if the hosts are monitored by different proxy servers.

All functions that can be applied in calculated items can also be applied in trigger expressions. For more detailed definitions of the available functions, please refer to the official Zabbix website (https://www.zabbix.com/documentation/3.0/manual/appendix/triggers/functions).

Many trigger functions accept seconds and #num parameters, which have the following meanings:

seconds: specifies a time period in seconds. The function is evaluated over the values of the monitoring item collected within that period. For example, sum(600) is the sum of the values of the monitoring item in the last 600 seconds. 600 seconds can also be written as 10m (minutes), and 86400 can be written as 1d (days).

#num: specifies a number of values. The function is evaluated over the most recent num values of the monitoring item. For example, sum(#5) is the sum of the last 5 collected values.

Note that #num has a different meaning in the last function, where it selects the num-th most recent value. For example, if the last five collected values of a monitoring item are 3 (latest), 7, 2, 6, 5, then last(#2) returns 7 and last(#5) returns 5.
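The difference between the seconds and #num parameters can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Zabbix code: the function names, the in-memory history list, and the timestamps are all made up for the example.

```python
from typing import List, Tuple

# (unix_timestamp, value) pairs, newest first. Timestamps are invented for the demo.
Sample = Tuple[int, float]


def sum_seconds(history: List[Sample], now: int, seconds: int) -> float:
    """sum(seconds): sum of the values collected in the last `seconds` seconds."""
    return sum(value for ts, value in history if now - ts < seconds)


def sum_count(history: List[Sample], count: int) -> float:
    """sum(#count): sum of the `count` most recent values."""
    return sum(value for _, value in history[:count])


def last_nth(history: List[Sample], n: int) -> float:
    """last(#n): the n-th most recent value (1 is the latest)."""
    return history[n - 1][1]


# The five values from the text, collected once a minute (assumed interval).
now = 1000
history = [(1000, 3), (940, 7), (880, 2), (820, 6), (760, 5)]

print(last_nth(history, 2))           # second most recent value
print(sum_count(history, 5))          # always the last five values
print(sum_seconds(history, now, 150)) # only the values inside the time window
```

With a one-minute interval, a 150-second window covers three values; if the interval were changed, sum_seconds would cover a different number of values while sum_count would cover a different time span.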

Which parameter should we use in a trigger? In passive monitoring mode, the data of the monitoring items is collected by Zabbix server at regular intervals, and in this case seconds is the better choice. Changing the monitoring interval of the relevant items changes the time span covered by a #num parameter and therefore affects the trigger, whereas a seconds parameter stays closer to what is actually being monitored and makes the trigger definition easier to understand in later maintenance. For monitoring items that use active monitoring, especially trapper items and log files, a stable and reliable monitoring interval cannot be guaranteed, so #num is usually the best choice.

The avg, count, last, min, and max functions also support a second parameter, time_shift, which allows data to be referenced from a period in the past. For example, avg(1h,1d) returns the average value over the one-hour period one day ago. Note that triggers only use historical data, so the history for the specified period must still be available when the time_shift parameter is used.
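As a rough illustration of time_shift, the following Python sketch averages the values that fall into a window that ends time_shift seconds in the past. The function name, the history list, and the timestamps are hypothetical; the sketch only mirrors the behaviour described above and returns None when no history is available for the window.

```python
from typing import List, Optional, Tuple


def avg_shifted(history: List[Tuple[int, float]], now: int,
                seconds: int, time_shift: int = 0) -> Optional[float]:
    """Average of the values in a `seconds`-long window ending `time_shift` seconds ago."""
    end = now - time_shift
    start = end - seconds
    window = [value for ts, value in history if start < ts <= end]
    if not window:
        return None  # no history kept for that period
    return sum(window) / len(window)


now = 200000
history = [
    (now, 4.0), (now - 1800, 2.0),                   # the last hour
    (now - 86400, 1.0), (now - 86400 - 1800, 2.0),   # the same hour one day ago
]

today = avg_shifted(history, now, 3600)             # like avg(1h)
yesterday = avg_shifted(history, now, 3600, 86400)  # like avg(1h,1d)
print(today, yesterday)
```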

Zabbix supports the following operators in triggers (priority from highest to lowest) as shown in Table 8-1.

Table 8-1

Priority  Operator  Definition
1         -         Unary minus (a minus sign with no operand on its left)
2         not       Logical NOT
3         *         Multiplication
          /         Division
4         +         Addition
          -         Subtraction
5         <         Less than
          <=        Less than or equal to
          >         Greater than
          >=        Greater than or equal to
6         =         Equal to
          <>        Not equal to
7         and       Logical AND
8         or        Logical OR

Note that the not, and, and or operators must be lowercase, and all operators except unary minus and not are left-to-right associative.

Let's combine some examples to better understand the use of triggers.

{www.zabbix.com:system.cpu.load[all,avg1].last()}>5

www.zabbix.com:system.cpu.load[all,avg1] specifies the complete monitor item, that is, the item system.cpu.load[all,avg1] on the host www.zabbix.com. The function last() returns the most recent value, which is compared with the constant 5 using the > operator. If the latest value is greater than 5, the state of the trigger becomes PROBLEM.

{www.zabbix.com:system.cpu.load[all,avg1].last()}>5 or {www.zabbix.com:system.cpu.load[all,avg1].min(10m)}>2

The state of the trigger changes to PROBLEM when the latest CPU load is greater than 5 or when the minimum CPU load over the last 10 minutes is greater than 2.

{www.zabbix.com:vfs.file.cksum[/etc/passwd].diff()}=1

The state of the trigger changes to PROBLEM when the previous checksum of the /etc/passwd file differs from the most recent one.

{www.zabbix.com:net.if.in[eth0,bytes].min(5m)}>100K

The state of the trigger changes to PROBLEM when the number of bytes received on the network interface eth0 stays above 100KB for the last 5 minutes.

{smtp1.zabbix.com:net.tcp.service[smtp].last()}=0 and {smtp2.zabbix.com:net.tcp.service[smtp].last()}=0

The state of the trigger changes to PROBLEM when the SMTP service is down on both smtp1.zabbix.com and smtp2.zabbix.com.

{zabbix.zabbix.com:icmpping.count(30m,0)}>5

The state of the trigger changes to PROBLEM when the host fails to respond to ping more than 5 times in the past 30 minutes.

{zabbix.zabbix.com:tick.nodata(3m)}=1

In this example, tick must be a Zabbix trapper item. The host periodically sends the value of tick to the server through zabbix_sender. If no data is received for 3 minutes (180 seconds), the state of the trigger changes to PROBLEM.

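{zabbix:system.cpu.load[all,avg1].min(5m)}>2 and {zabbix:system.cpu.load[all,avg1].time()}>000000 and {zabbix:system.cpu.load[all,avg1].time()}<060000

The state of the trigger can change to PROBLEM only at night (00:00-06:00), when the minimum CPU load over the last 5 minutes is greater than 2.

{server:system.cpu.load.avg(1h)}/{server:system.cpu.load.avg(1h,1d)}>2

The second parameter, time_shift, is used in this example: the state of the trigger becomes PROBLEM if the average CPU load over the last hour is more than twice the average for the same hour yesterday.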

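{TemplatePfSense:hrStorageFree[{#SNMPVALUE}].last()}<{TemplatePfSense:hrStorageSize[{#SNMPVALUE}].last()}*0.1

The value of another monitoring item is used as the threshold here: the state of the trigger changes to PROBLEM when the free storage drops below 10 percent of the total.

({server1:system.cpu.load[all,avg1].last()}>5) + ({server2:system.cpu.load[all,avg1].last()}>5) + ({server3:system.cpu.load[all,avg1].last()}>5)>=2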

When the latest CPU load average of at least two of the three hosts exceeds 5, the state of the trigger changes to PROBLEM.
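The "at least two of three hosts" condition works because each comparison yields 1 or 0, so the comparisons can be added up. A minimal Python sketch of that idea follows; the host names come from the text, while the load values and function names are invented for the example.

```python
from typing import Dict


def overloaded_count(loads: Dict[str, float], threshold: float = 5.0) -> int:
    """Add up the per-host comparisons, each of which is 1 (true) or 0 (false)."""
    return sum(1 if load > threshold else 0 for load in loads.values())


def trigger_state(loads: Dict[str, float]) -> str:
    """PROBLEM when at least two hosts exceed the threshold, otherwise OK."""
    return "PROBLEM" if overloaded_count(loads) >= 2 else "OK"


print(trigger_state({"server1": 6.1, "server2": 4.0, "server3": 7.5}))
print(trigger_state({"server1": 6.1, "server2": 4.0, "server3": 3.2}))
```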

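{zbserver:grpsum["cluster","proc.num[listener]",last,0].last(0)}/
{zbserver:grpsum["cluster","agent.ping",last,0].last(0)}<0.5

Aggregated and calculated items are also very useful when defining triggers. In this example, the state of the trigger changes to PROBLEM when the ratio of servers that are providing the service normally to servers that are available falls below 0.5.

Sometimes a trigger must use different conditions in different situations. For example, the trigger state changes to PROBLEM when the temperature in the server room exceeds 20℃ and then stays in the PROBLEM state until the temperature drops below 15℃, at which point it returns to OK. This can be achieved by defining the following trigger:

({TRIGGER.VALUE}=0 and {server:temp.last()}>20) or
({TRIGGER.VALUE}=1 and {server:temp.last()}>15)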

This trigger definition uses the {TRIGGER.VALUE} macro, which returns the current value of the trigger: 0 means the OK state and 1 means the PROBLEM state. When the state of the trigger is OK, {TRIGGER.VALUE}=0 evaluates to 1 and {TRIGGER.VALUE}=1 evaluates to 0, so only the first condition applies and the trigger switches to 1, the PROBLEM state, once the temperature exceeds 20℃. While the trigger is in the PROBLEM state, only the second condition applies, so the trigger stays in PROBLEM until the temperature is no longer above 15℃.
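The hysteresis behaviour of the temperature trigger can be replayed with a small Python sketch. It evaluates the same two-branch condition for a series of made-up temperature readings; the function names and the readings are assumptions for illustration only.

```python
from typing import List


def next_trigger_value(current: int, temp: float,
                       high: float = 20.0, low: float = 15.0) -> int:
    """Evaluate (VALUE=0 and temp>high) or (VALUE=1 and temp>low); 1 is PROBLEM, 0 is OK."""
    fired = (current == 0 and temp > high) or (current == 1 and temp > low)
    return 1 if fired else 0


def replay(temps: List[float], initial: int = 0) -> List[int]:
    """Feed readings one by one, carrying the trigger value forward."""
    states = []
    value = initial
    for temp in temps:
        value = next_trigger_value(value, temp)
        states.append(value)
    return states


# 19 and 16 keep the trigger in PROBLEM because it fired at 21;
# the final 18 does not fire it again because it is already back to OK.
print(replay([18, 21, 19, 16, 15, 14, 18]))
```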

To deepen our understanding, let's take a look at this example. When the maximum value of disk free space in the last 5 minutes is less than 10GB, the state of the trigger changes to PROBLEM, and then the trigger remains in the state of PROBLEM until the minimum value of disk free space in the last 10 minutes is greater than 40GB and returns to OK state.

({TRIGGER.VALUE}=0 and {server:vfs.fs.size[/,free].max(5m)}<10G) or
({TRIGGER.VALUE}=1 and {server:vfs.fs.size[/,free].min(10m)}<40G)

On the Administration -> General -> Trigger severities page, you can customize the name of each severity level, for example, Important. If you need to translate the custom name into the local language, edit the /locale/zh_CN/LC_MESSAGES/frontend.po file and add 2 lines to it:

msgid "Important"

msgstr "major failure"

After editing, save the file and execute the following command to generate frontend.mo.

# msgfmt -v frontend.po -o frontend.mo

8.1.3 create a trigger

Before creating a trigger, let's take a look at the meaning of the parameters on the trigger configuration page. Go to the Configuration -> Hosts/Templates -> Triggers page and click the Create trigger button in the upper right corner to enter the trigger configuration page, as shown in figure 8-2 below.

Figure 8-2

The parameters in the Trigger tab have the following meanings:

Name: trigger name. Macro variables are supported in the name, including {HOST.HOST}, {HOST.NAME}, {HOST.CONN}, {HOST.DNS}, {HOST.IP}, {ITEM.VALUE}, {ITEM.LASTVALUE} and {$MACRO}. $1, $2, … $9 can be used to refer to the first, second, … ninth constant in the expression and are resolved to the value of that constant. For example, if the name of the trigger is Processor load above $1 on {HOST.NAME} and the defined expression is {New host:system.cpu.load[percpu,avg1].last()}>5, the name of the trigger automatically becomes Processor load above 5 on New host when it fires.

Expression: the logical expression used to evaluate the state of the trigger.

Multiple PROBLEM events generation: when selected, an event is generated each time the trigger is evaluated as PROBLEM. Use this when alarm notifications must keep being sent for as long as the monitoring item is failing.

Description: description of the trigger. It can include instructions for solving specific problems, contact information of the person in charge, and so on. Like the trigger name, it can contain macro variables.

URL: if a URL is defined, a link to it is shown when you click the trigger name on the Monitoring -> Triggers page. You can use {TRIGGER.ID}, several {HOST.*} macro variables, and user-defined macro variables. This is shown in figure 8-3 below.

Figure 8-3

Severity: sets the trigger alarm level.

Enabled: check this to enable triggers.
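How $1 in the Name field resolves to a constant of the expression can be approximated with the Python sketch below. It is a simplified illustration and not the parser Zabbix uses: it assumes item references contain no nested braces, treats every number outside the {...} references as a constant, and only expands $1-$9 and {HOST.NAME}. All function names are hypothetical.

```python
import re
from typing import List


def expression_constants(expression: str) -> List[str]:
    """Collect the numeric constants that appear outside {host:key.function()} references."""
    outside = re.sub(r"\{[^{}]*\}", " ", expression)  # drop item references
    return re.findall(r"\d+(?:\.\d+)?[KMGTsmhdw]?", outside)


def expand_name(name: str, expression: str, host_name: str) -> str:
    """Replace $1..$9 with the matching constant and {HOST.NAME} with the host name."""
    constants = expression_constants(expression)

    def positional(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1
        return constants[index] if index < len(constants) else ""

    expanded = re.sub(r"\$([1-9])", positional, name)
    return expanded.replace("{HOST.NAME}", host_name)


print(expand_name(
    "Processor load above $1 on {HOST.NAME}",
    "{New host:system.cpu.load[percpu,avg1].last()}>5",
    "New host",
))
```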

The Dependencies tab is shown in figure 8-4 below.

Figure 8-4

Click the Add link in the Dependencies field, select the trigger that this trigger should depend on in the pop-up page, and click the Select button.

After introducing the meaning of the parameters on the trigger configuration page, let's take a look at the steps of creating a trigger through the process of creating a trigger for the agent.ping monitoring item defined in the Zabbix server host.

1. Enter the name of the trigger in the Name field in the Trigger tab, such as Zabbix agent on {HOST.NAME} is unreachable for 5 minutes. We use the macro variable {HOST.NAME} in the name so that the notification shows which host triggered the alarm.

2. Fill in the expression in the Expression field. If you are familiar with expressions, you can enter one manually. Alternatively, click the Add button on the right and set the parameters in the pop-up page, as shown in figure 8-5 below.

Figure 8-5

In this example, select the agent.ping item defined on the Zabbix server host as the monitoring item, choose the function in Function, set the time parameter in Last of (T), and set the constant in N. Click the Insert button to return. The Expression field now contains the expression, as shown in figure 8-6 below.

Figure 8-6

3. Click Expression constructor under the Expression field to test the expression; complex expressions can also be built there. This is shown in figure 8-7 below.

Figure 8-7

Check the expression to be tested and click Test to enter the test page. Set the value that the expression should return to 1 and click the Test button to see the result of the expression: TRUE is 1 and FALSE is 0. This is shown in figure 8-8 below.

Figure 8-8

4. Select Multiple PROBLEM events generation if an alarm event should be generated every time.

5. Fill in the description information of the trigger.

6. Optionally enter the relevant URL of the trigger.

7. Select the alarm level of the trigger.

8. Check Enabled to enable the trigger.

9. If the trigger depends on other triggers, add them in the Dependencies tab.

Click the Add button to save.

8.2 event

The Event (event) in Zabbix is based on a timestamp to record what happened at a certain time. The event itself is meaningless, but it has a very important position in the Zabbix alarm system, and the event is the basis for the system to generate actions, such as sending alarm messages.

The main types of events generated in Zabbix are as follows:

Trigger events (trigger event): whenever the state of a trigger changes (OK -> PROBLEM -> OK), an event is generated. This is the most frequently used and most important event source in Zabbix.

Discovery events (Discovery event): an event is generated when a host or service is detected, for example every time Zabbix detects Service Up or Service Down, Host Up or Host Down, Host Discovered or Host Lost.

Auto registration event (auto-registration event): generates an event when the active agent is automatically registered by the server.

Internal events (internal event): an event is generated when a monitor item or low-level discovery rule becomes unsupported (not supported) / normal (normal) and the state of the trigger becomes unknown / normal.

Click on the date and time of the event on the Monitoring-> Events page to view the details of each event.

8.3 actions

When an event is generated in the system, a notification often has to be sent or a command executed. Zabbix handles this with a separate component, Actions, which responds to the various types of events generated in the system. Actions are completely independent of hosts and templates, and each action is defined globally.

Each action consists of three parts, which are:

Definition of action

Conditions for triggering an action

The operation performed by an action.

8.3.1 create an action

In the upper right corner of the Configuration -> Actions page, click the Event source drop-down box to select the event source, and then click the Create action button to enter the configuration page. This is shown in figure 8-9 below.

Figure 8-9

The parameters in the Action tab have the following meanings:

Name: the unique name of the action.

Default subject: default title, which can contain macro variables.

Default message: default information content, which can contain macro variables.

Recovery message: when checked, a recovery message is sent when the fault returns to normal (PROBLEM -> OK). Note that to receive a recovery message, Trigger value = PROBLEM must be set in the action conditions; Trigger value = OK must not be set, otherwise the recovery message will not be sent. The recovery message inherits the acknowledgment state and history from the PROBLEM event (when the {EVENT.ACK.HISTORY} and {EVENT.ACK.STATUS} macro variables are used). {EVENT.*} macro variables used in the recovery message refer to the PROBLEM event. {EVENT.RECOVERY.*} macro variables are expanded only in recovery messages and refer to the recovery/OK event.

Recovery subject: title of the recovery message, which can contain macro variables.

Recovery message: content of the recovery message, which can contain macro variables.

Enabled: check this to enable this action.

On this configuration page, we can configure the action with a unique name and define a default information in which data related to a specific event can be referenced, such as the name of the host, monitor item or trigger, the value of the monitor item and trigger, and URL.

When we created the action, the system already used some macro variables in the default information. In fact, because the action is global, all macro variables defined in Zabbix can be used in the action's information. In addition, triggers can monitor multiple monitoring items from multiple hosts, and you can reference all the hosts and monitoring items involved (up to 9 different hosts or monitoring items). By using these macro variables, you can provide a wealth of information content, and you can learn more about the fault when you see the message sent.

The default information can be sent in a variety of ways, such as email, SMS, chat, etc. You can define different alarm sending methods in the action.

8.3.2 configuration of action conditions

An action is executed only when the event matches the defined conditions. In the Conditions tab, we can define conditions based on the host, trigger, and trigger value of the event, and combine individual conditions with AND/OR to form the condition we need. This is shown in figure 8-10 below.

Figure 8-10

The parameters in the Conditions tab have the following meanings:

Type of calculation: operation type (logical operator between condition). The options are:

AND: all conditions must be met simultaneously.

OR: just satisfy one of the conditions.

AND/OR: a combination of conditions. Conditions of different types are joined with AND, and conditions of the same type are joined with OR. For example:

Host group = Oracle servers

Host group = MySQL servers

Trigger name like 'Database is down'

Trigger name like 'Database is unavailable'

The result is:

(Host group = Oracle servers or Host group = MySQL servers) and (Trigger name like 'Database is down' or Trigger name like 'Database is unavailable')

Custom expression: a user-defined calculation formula that must include all conditions (represented by the uppercase letters A, B, C, and so on) and may contain and, or (lowercase), spaces, tabs, and parentheses. The AND/OR example above is written as (A or B) and (C or D). Custom expressions can take many forms, such as (A and B) and (C or D), (A and B) or (C and D), ((A or B) and C) or D, and so on.

Conditions: the conditions that have been added.

New condition: select a condition to add. The available options vary depending on the event source. The supported operators are:

= equal to

<> not equal to

like contains

not like does not contain
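The AND/OR calculation type described above can be sketched in Python as follows: conditions are grouped by type, conditions within a group are joined with OR, and the groups are joined with AND. The tuple layout, the event dictionary, and the function name are assumptions made for this illustration, and only the = and like operators are handled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Condition = Tuple[str, str, str]  # (condition type, operator, value)


def and_or_matches(conditions: List[Condition], event: Dict[str, str]) -> bool:
    """OR within the same condition type, AND across different types."""
    groups = defaultdict(list)
    for cond_type, operator, value in conditions:
        actual = event.get(cond_type, "")
        if operator == "=":
            groups[cond_type].append(actual == value)
        elif operator == "like":
            groups[cond_type].append(value in actual)
        else:
            raise ValueError(f"unsupported operator: {operator}")
    return all(any(results) for results in groups.values())


conditions = [
    ("Host group", "=", "Oracle servers"),
    ("Host group", "=", "MySQL servers"),
    ("Trigger name", "like", "Database is down"),
    ("Trigger name", "like", "Database is unavailable"),
]

event = {"Host group": "MySQL servers", "Trigger name": "Database is down on db1"}
print(and_or_matches(conditions, event))
```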

Each time an action is created, two conditions are added automatically (they can be deleted):

Trigger value = PROBLEM: messages are sent only for the PROBLEM state. This condition must be set if you want to receive recovery messages.

Maintenance status = not in maintenance: no messages are sent while the host is in maintenance.

If the state of the trigger changes from OK to PROBLEM, the current state of the trigger is PROBLEM. If the state of the trigger changes from PROBLEM to OK, the current state of the trigger is OK.

8.3.3 configuration for performing actions

The settings in the Operations tab define what the action actually does. They consist of two parts: the steps of the operation, and the actual operation performed in each step.

The simplest scenario defines a single step in which the default message is sent to a set of defined recipients. In the real world, however, specific requirements make the scenario more complex, and multiple steps and operations have to be defined.

Click the Operations tab and click the New link in Action operations. The configuration page is shown in figure 8-11 below.

Figure 8-11

The parameters in the Operations tab have the following meanings:

Default operation step duration: the default time interval for each operation step, with a minimum of 60 seconds.

Action operations: displays the actions defined in the action.

Operation details: detailed configuration of each operation.

Step: defines the range of steps for the operation. In the figure above, it is from 1 to 1. Note that a To value of 0 means that the step is repeated until the state of the trigger changes.

Step duration: the interval used in the step. The minimum is 60 seconds. If set to 0, the value defined in Default operation step duration is used. Multiple operations can be assigned to the same step, using the shortest time interval defined if they use different Step duration.

Operation type: the operation type is send message.

Send to user groups: click Add to select the user group that receives the message. This user group must have at least read permission on the host.

Send to users: click Add to select the user to receive the message. This user must at least have read permission on the host.

Send only to: send messages to all defined media (alarm methods) or selected users.

Default message: if selected, the default information of the definition is sent.

Subject: the title of the custom message.

Message: customize the content of the information, in which macro variables can be used.

Operation type: operation type is remote command

Target list: choose to execute the command on the current host (current host), other host, or host group.

Type: the type of selected command: IPMI, Custom script, SSH, Telnet, or Global script.

Execute on: execute on Zabbix agent or Zabbix server.

Commands: the command to be executed.

Conditions: the condition for performing the operation. With Not Ack, the operation is performed only while the event has not been acknowledged; with Ack, it is performed only after the event has been acknowledged.

Operations available for different events

The operations that can be defined for all events are:

Send message

Remote command

The operations that can be defined for Discovery events are:

Add host

Remove host

Enable host

Disable host

Add to group

Remove from group

Link to template

Unlink from template

Set host inventory mode

The operations that can be defined for auto-registration events are:

Add host

Disable host

Add to group

Link to template

Set host inventory mode

Configuration for sending messages

Sending alarm messages is the most simple and effective way to notify managers when a fault occurs, and it is also the most widely used method in Zabbix.

To send and receive messages properly, you first need to define the media (alarm method) and the recipients (users) who use the media, and then configure the action. The recipient must have read access to the hosts that generated the event and be able to access the expression of the trigger; otherwise the message will not be sent.

The messages sent can be viewed on the Monitoring -> Events page. The Actions column of the page shows the status of the actions: a green number indicates messages that were sent, a red In progress indicates that the action is in progress, and Failed indicates that the action failed. If you click the date and time link of an event, the Message actions section of the event details page shows whether each message was sent successfully. The details of all actions are available on the Reports -> Action log page.

In the Operations tab, click the New link to add a step and select Send message as the Operation type, as shown in figure 8-12 below.

Figure 8-12

In figure 8-12 above, steps 1 to 5 are configured to send messages every 10 minutes.

8.3.3.1 configuration of remote commands

When the conditions are met, you can execute pre-defined commands on the monitored host to complete some specific tasks, such as:

Automatically restart an application (web server, middleware, CRM, etc.) when it does not respond.

Restart the server using IPMI when the server is not responding.

Clear useless files to free up space when the disk is full.

Migrate virtual machines from one physical machine to another according to the CPU load.

Add or remove nodes in a cloud infrastructure according to resource usage (CPU, memory, disk, etc.).

Configuring an action for a remote command is similar to configuring one for sending messages, except that a different operation type is selected. Some restrictions apply to remote commands: a remote command cannot exceed 255 characters; active agent mode is not supported; and remote commands are not supported on a Zabbix agent monitored through Zabbix proxy (if necessary, connect to the agent directly from Zabbix server). Remote commands can contain macro variables and can consist of multiple lines. To allow commands (custom scripts) to run on Zabbix agent, set the EnableRemoteCommands parameter to 1 in zabbix_agentd.conf and restart the zabbix agent service for the configuration to take effect.

Let's take a look at the steps to configure remote commands through an example.

In the upper right corner of the Configuration -> Actions page, click the Event source drop-down box to select the event source, and then click the Create action button to enter the configuration page. Configure the name and default information in the Action tab and the conditions in the Conditions tab, as shown in figure 8-13 below.

Figure 8-13

Add an operation in the Operations tab and select Remote command as the Operation type, as shown in figure 8-14 below.

Figure 8-14

In this example, Zabbix restarts the Apache service with the command sudo /etc/init.d/apache restart. For the command to run on Zabbix agent, the zabbix user must be allowed to execute sudo.

# visudo

zabbix ALL=NOPASSWD: ALL # allows the zabbix user to run all commands without a password

Or allow the zabbix user to restart only the apache service:

zabbix ALL=NOPASSWD: /etc/init.d/apache restart # only allows restarting the apache service

8.3.3.2 configuration of alarm escalation

Even if an action is tied to a single event, that does not mean it can only perform a single operation. In fact, it can perform any number of operations, even indefinitely or until the conditions for performing the action change. To do this, we configure Escalation (alarm escalation). Alarm messages can be sent and commands executed automatically in multiple escalation steps, alarm messages can be sent to different user groups or users, and the same message can be sent repeatedly at regular intervals for as long as the fault is not resolved or the event has not been acknowledged.

Flexible alarms can be achieved through Escalation configuration, and we can achieve:

Notify the user as soon as the failure occurs.

Alarm notifications can be sent repeatedly until the fault is resolved.

Delay sending alarm notification.

Alarm messages can be sent step by step, and can be set to be sent to department managers or more advanced users according to the situation.

Remote commands can be executed immediately after a failure, or after a failure has not been resolved for a period of time.

Failure recovery information can be sent.

Let's take a look at two examples. The configuration of example 1 is shown in figure 8-15.

Figure 8-15

In figure 8-15 above, step 1 is executed immediately and sends a message to a user group; its duration delays the next step by 1 minute. Step 2 starts after one minute and executes a remote command on the host. Step 2 uses the default interval of 3600 seconds, so step 3 is executed an hour later. Steps 3, 4, and 5 have the same configuration and send messages to the user group every 10 minutes. Step 6 starts 30 minutes later. In this step we add the condition Event acknowledged = Not Ack, so a message is sent to the admin user only if the event has not been acknowledged. You may notice that the step after step 6 is step 0, which means that the step is repeated indefinitely, at the Duration (sec) interval set in the step, until the state of the trigger changes. However, if the condition Trigger value = PROBLEM is not used in the action, step 0 keeps being executed even after the trigger state changes to OK, so be careful when setting step 0.

Example 2 configuration is shown in figure 8-16 below.

Figure 8-16

The default step duration in the action is 1800 seconds. Steps 1-4 send messages to MySQL administrators at 0:00, 0:30, 1:00 and 1:30, respectively. Steps 5-6 send a message to Database manager at 2:00 and 2:10 (step 6 does not send its message at 3:00, because the shorter 600-second step duration set in the following operation for steps 5-7 overrides the duration set for steps 5-6). Steps 5-7 send messages at 2:00, 2:10 and 2:20 (the step duration is set to 600 seconds). Step 11 sends a message at 4:00 (steps 8-11 use the default interval of 1800 seconds).
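The timeline of example 2 can be reproduced with a short Python sketch that applies the two rules mentioned earlier: a step duration of 0 falls back to the default, and when several operations cover the same step, the shortest duration wins. The operation list below is reconstructed from the description; the 3600-second duration for steps 5-6 is an assumption (the text only implies it through the 3:00 remark), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Operation = Tuple[int, int, int]  # (first step, last step, step duration; 0 = default)


def step_start_times(operations: List[Operation], default_duration: int,
                     last_step: int) -> Dict[int, int]:
    """Return the start offset in seconds of every step from 1 to last_step."""
    starts = {}
    elapsed = 0
    for step in range(1, last_step + 1):
        starts[step] = elapsed
        durations = [duration if duration else default_duration
                     for first, last, duration in operations
                     if first <= step <= last]
        # The shortest duration among overlapping operations is used.
        elapsed += min(durations) if durations else default_duration
    return starts


def hhmm(seconds: int) -> str:
    hours, minutes = divmod(seconds // 60, 60)
    return f"{hours}:{minutes:02d}"


operations = [
    (1, 4, 0),      # MySQL administrators, default duration
    (5, 6, 3600),   # Database manager, assumed 3600 s
    (5, 7, 600),    # shorter duration that overrides the one above
    (11, 11, 0),    # final message, default duration
]
starts = step_start_times(operations, default_duration=1800, last_step=11)
for step in (1, 4, 5, 6, 7, 11):
    print(step, hhmm(starts[step]))
```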

Pay attention to the following when using alarm escalation in a real environment:

When a host enters the maintenance state, operations of escalations that are already in progress continue to be performed. The maintenance state only affects whether an action starts and has no effect on operations that are already running.

When the Time period defined in the action conditions ends, operations of escalations that are already in progress continue. The Time period only affects whether an action starts and has no effect on operations that are already running.

When a fault occurs during maintenance and is still unresolved when the maintenance ends, the escalation steps defined in the action start after the maintenance ends.

When a fault occurs during maintenance without data collection and is still unresolved when the maintenance ends, the defined escalation steps do not start until the trigger fires.

When several escalations overlap, each new escalation supersedes the previous one, but at least one step of the previous escalation is always executed, no matter how many steps it has.

When an escalation is in progress and the action is disabled, the trigger-based event is deleted, the trigger is disabled or deleted, or the host or monitoring item related to the trigger is disabled or deleted, the message currently being sent is delivered and one more message of the escalation is sent. The later message has (NOTE: Escalation cancelled) added to it; for example, when the action is disabled, NOTE: Escalation cancelled: action '<Action name>' disabled is added to notify the user that the escalation has been cancelled. The reason for the cancellation can also be seen in the log file when DebugLevel=3 is set.

If the action is deleted during an escalation, no further messages are sent. With DebugLevel=3, the deletion can be seen in the log file, for example escalation cancelled: action id:334 deleted.

This article is from http://ustogether.blog.51cto.com/8236854/1929384. If you need to reprint it, please contact the author.
