The original author of this article is Bao Guangya, a software development engineer in the basic platform department of JD.com (Jingdong Mall). It is published here with the author's consent; please obtain consent before reprinting.
I. Introduction
As monitoring volume grows rapidly, Zabbix administrators will one day find that disk IOPS has reached tens of thousands and is approaching the limit of the disk's I/O capacity, leaving no headroom for additional monitoring data. This article proposes a horizontal scaling scheme that increases the data I/O capacity of a Zabbix system with minimal changes.
Since Zabbix database I/O consists mainly of reads and writes to the history and trends tables, this scheme distributes the I/O of those tables across other hosts without increasing the number of Zabbix servers. The advantage of this approach is that it keeps a single Zabbix server, so there is no need to coordinate multiple servers, and the separated-database model remains compatible with the original centralized model. However, because I/O is spread across multiple hosts, several database instances must be accessed whenever data is read or written. The parts of the code that read from or write to the database, in both the Zabbix server and the web API, therefore need to be rewritten, although most of the changes can follow existing code.
This solution is based on Zabbix version 3.0.10. This article covers only the changes to the Zabbix server; the modifications to the web API will be discussed separately and are not covered here.
II. Zabbix data read and write mechanism
Since the I/O of configuration data is much smaller than that of history and trends data, this scheme does not change how configuration data is handled.
cache and vc_cache are two variables in the Zabbix source code. The former stores the raw data received from agents and proxies; the latter stores data loaded from the database (when cached values expire, new values are copied directly from the former to the latter) and is used for trigger calculations and similar tasks.
1. Writing history and trends data
The poller and trapper processes (including pinger) receive history data from agents and proxies, flush it into cache, and update the trends data in cache. Updates to cache are implemented mainly in the function process_hist_data.
The dbsyncer processes write data from cache to the history and trends tables in the database. Because there can be multiple dbsyncer processes, they are coordinated through locks to avoid conflicts. Flushing cache data into the database is implemented mainly in two functions: DCsync_history and DCsync_trends.
2. Reading history and trends data
Space for vc_cache is allocated at program startup, but no data is loaded. At this point the poller and trapper processes have not yet started receiving data, so nothing is written to vc_cache.
After the program starts, whenever values are needed for a calculation, Zabbix first tries to read them from vc_cache; if they are not there, it loads them from the history tables into vc_cache. Three functions in the source read values from the database and load them into vc_cache: vc_db_read_values_by_time, vc_db_read_values_by_count, and vc_db_read_values_by_time_and_count.
3. History and trends data deletion
The housekeeper process removes stale data from the history and trends tables. It is also responsible for deleting expired events, alerts, sessions, and so on.
4. Database connection
Each Zabbix process accesses the database through a single connection. The query execution functions do not take a connection parameter; instead, the connection is maintained through the global conn variable. To access multiple databases, you must either add more connection variables or modify conn dynamically.
5. Watchdog
The watchdog process monitors the database status and sends alert messages when a connection failure is detected.
III. Specific scheme and implementation
In the database, history data is split into five tables by data type: history, history_uint, history_str, history_text, and history_log. Trends data is split into two tables: trends and trends_uint. Following the idea of distributing I/O, two schemes can be considered: the first distributes history and trends into two separate databases by category; the second stores each table in its own database, split by both category and data type. The discussion below mainly follows the first scheme.
1. Modify the configuration file
Add the required database connection parameters to the configuration file, along with a switch for toggling between the centralized and separated modes. The configuration file is parsed at program startup, so the startup code must also be modified to add the array elements that store the new connection parameters and the switch variable.
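As a rough illustration of the extra state involved (this is a self-contained sketch, not actual Zabbix code; the mode enum, the zbx_db_params_t structure, and the CONFIG_DB_MODE/CONFIG_HISTORY_DB/CONFIG_TRENDS_DB variables are all hypothetical names), the startup code would end up holding something like:

/* Hypothetical sketch: extra configuration state for the separated-database mode.
 * In the real server these values would be filled in by the existing
 * configuration parser at startup, next to DBHost/DBName/DBUser/DBPassword. */
typedef enum
{
	DB_MODE_CENTRALIZED = 0,	/* original behaviour: one database            */
	DB_MODE_SEPARATED = 1		/* history and trends in dedicated databases   */
}
zbx_db_mode_t;

typedef struct
{
	char	host[128];
	char	name[128];
	char	user[64];
	char	password[64];
	int	port;
}
zbx_db_params_t;

/* mode switch read from the configuration file (hypothetical parameter "DBMode") */
static zbx_db_mode_t	CONFIG_DB_MODE = DB_MODE_CENTRALIZED;

/* connection parameters for the dedicated history and trends databases
 * (hypothetical parameters such as "HistoryDBHost", "TrendsDBHost", ...) */
static zbx_db_params_t	CONFIG_HISTORY_DB;
static zbx_db_params_t	CONFIG_TRENDS_DB;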
2. Modify the database connect function
While keeping the original connect function, add a new connect function that takes parameters so that different connections can be established as needed, and add global variables to hold the multiple connections.
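Continuing the hypothetical sketch above, a connect variant that takes the target database as a parameter, together with a small array of handles in place of the single global connection, might look like this (db_open and zbx_db_handle_t stand in for the driver-specific connect call and handle type; zbx_db_params_t is reused from the previous sketch):

/* Hypothetical sketch: one handle per target database instead of the single
 * global connection used by the stock server. */
typedef struct zbx_db_handle zbx_db_handle_t;

extern zbx_db_handle_t	*db_open(const char *host, int port, const char *name,
		const char *user, const char *password);

typedef enum
{
	ZBX_DB_MAIN = 0,	/* configuration and all other tables       */
	ZBX_DB_HISTORY,		/* history, history_uint, history_str, ...  */
	ZBX_DB_TRENDS,		/* trends, trends_uint                      */
	ZBX_DB_COUNT
}
zbx_db_target_t;

/* in centralized mode only conns[ZBX_DB_MAIN] is opened;
 * in separated mode all three handles are opened at startup */
static zbx_db_handle_t	*conns[ZBX_DB_COUNT];

/* open the connection for one target using its configured parameters */
static int	db_connect_target(zbx_db_target_t target, const zbx_db_params_t *params)
{
	if (NULL == (conns[target] = db_open(params->host, params->port,
			params->name, params->user, params->password)))
		return -1;	/* connection failed */

	return 0;		/* connected */
}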
3. Modify the database query functions
While keeping the original query functions, add query functions that take a connection parameter so that the connection used by a query can be changed dynamically. Zabbix has several query functions for different kinds of queries, and all of them need this treatment.
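A parameterized query wrapper could then route SQL text to the chosen handle. The following sketch reuses the types from the sketches above; db_execute_raw stands in for the driver-specific call, and the existing wrappers that use the single global connection remain untouched for centralized mode:

#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sketch: a query wrapper that targets an explicit connection. */
extern int	db_execute_raw(zbx_db_handle_t *conn, const char *sql);

static int	db_execute_on(zbx_db_target_t target, const char *fmt, ...)
{
	char	sql[4096];
	va_list	args;

	va_start(args, fmt);
	vsnprintf(sql, sizeof(sql), fmt, args);
	va_end(args);

	/* in centralized mode every target maps to the single main connection */
	if (DB_MODE_CENTRALIZED == CONFIG_DB_MODE)
		target = ZBX_DB_MAIN;

	return db_execute_raw(conns[target], sql);
}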
4. Modify the call sites
In the functions listed above that read and write history and trends data, the database access code must be modified to check the mode switch and call the appropriate functions. The switching logic should ensure that the storage mode can be changed between centralized and separated simply by restarting the service.
If you adopt the scheme that also separates databases by monitoring data type, the SQL text construction must be modified as well.
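To tie the call-site changes together, a small helper can pick the target database from the mode switch and the table name, so that the seven history and trends tables named earlier are routed to their dedicated databases while everything else stays on the main database. Again, this is only a sketch built on the hypothetical types above:

#include <string.h>

/* Hypothetical sketch: choose the target database for a given table.
 * In separated mode the history* and trends* tables go to their dedicated
 * databases; every other table (configuration, events, ...) stays on the
 * main database. In centralized mode everything stays on the main database. */
static zbx_db_target_t	db_target_for_table(const char *table)
{
	if (DB_MODE_CENTRALIZED == CONFIG_DB_MODE)
		return ZBX_DB_MAIN;

	if (0 == strncmp(table, "history", 7))	/* history, history_uint, _str, _text, _log */
		return ZBX_DB_HISTORY;

	if (0 == strncmp(table, "trends", 6))	/* trends, trends_uint */
		return ZBX_DB_TRENDS;

	return ZBX_DB_MAIN;
}

/* example call site:
 *	db_execute_on(db_target_for_table("trends_uint"),
 *			"delete from trends_uint where clock<%d", cutoff);
 */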
5. Modify the watchdog logic
Change the original single-instance status monitoring to simultaneous monitoring of multiple instances, and raise an alert when any instance's connection fails.
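A sketch of the multi-instance check might look like the following; db_ping and alert stand in for whatever liveness test and alarm path the real watchdog uses, and only the handles that are actually opened in the current mode are checked:

#include <stdio.h>

/* Hypothetical sketch: watchdog loop over all configured connections.
 * Returns the number of instances that failed the check. */
extern int	db_ping(zbx_db_handle_t *conn);
extern void	alert(const char *message);

static int	watchdog_check_all(void)
{
	static const char	*names[ZBX_DB_COUNT] = {"main", "history", "trends"};
	int			i, last, failed = 0;

	/* in centralized mode only the main connection exists */
	last = (DB_MODE_CENTRALIZED == CONFIG_DB_MODE ? ZBX_DB_MAIN : ZBX_DB_COUNT - 1);

	for (i = ZBX_DB_MAIN; i <= last; i++)
	{
		if (0 != db_ping(conns[i]))
		{
			char	msg[128];

			snprintf(msg, sizeof(msg), "database instance \"%s\" is unavailable", names[i]);
			alert(msg);
			failed++;
		}
	}

	return failed;
}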
IV. Data consistency issues
One risk of the separated model is data consistency. In centralized mode, Zabbix coordinates cache access through mutex locks to keep the cache consistent, and database writes are protected by transactions. Because the cache locking mechanism is unchanged, database separation does not affect cache consistency; any consistency problem can only arise at the database level.
With the category-based scheme, where history and trends data live in two separate databases, you need to consider consistency between the history tables, the trends tables, and the remaining tables. With the category-plus-data-type scheme, you must additionally consider consistency among the history tables themselves and among the trends tables themselves.
Analysis of the transaction logic in the source code shows that updates to the history and trends tables do not need to be transactionally consistent with the other tables at the database level, so the two sides can write to their databases independently as long as the program permits it.
V. Further schemes
Following the idea of database separation further, a more radical solution is to split each history and trends table itself and distribute the data across even more databases using a hash function keyed on itemid or clock.
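As a rough illustration of that further step (purely hypothetical; no such code exists in Zabbix), the shard for a value could be chosen by hashing itemid, which keeps all values of one item on the same database, or by hashing clock, which spreads each item over time:

#include <stdint.h>

/* Hypothetical sketch: pick one of N history shards for a value. */
#define HISTORY_SHARD_COUNT	4

/* hashing by itemid keeps per-item reads (graphs, trigger evaluation)
 * on a single database */
static unsigned int	history_shard_by_itemid(uint64_t itemid)
{
	return (unsigned int)(itemid % HISTORY_SHARD_COUNT);
}

/* hashing by clock rotates shards by time window, e.g. one shard per day */
static unsigned int	history_shard_by_clock(int clock, int seconds_per_shard)
{
	return (unsigned int)((clock / seconds_per_shard) % HISTORY_SHARD_COUNT);
}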