Introduction to the principle of Open-falcon 07/15 Update SLTechnology News&Howtos


2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Open-falcon is an open-source monitoring system from Xiaomi. It can be installed in three ways. The first is a stand-alone installation of the back end and front end, recommended on a single server. The second is a Docker installation. The third is a distributed installation across multiple machines.

Key points: this case covers the first method, the stand-alone installation. In practice it is split across two servers: one runs the back-end services and the other runs the front-end service.

Distributed installation is also simple. Fetch the open-falcon binary package. On each server, keep only the module folders that server needs plus the open-falcon control script. Edit the configuration file in each module folder, then start the modules. Distributed deployment is generally recommended for production environments. See https://book.open-falcon.org/zh_0_2/distributed_install/ for more information.

Open-falcon monitoring usually relies on a variety of plug-ins.

Architecture diagram: (figure: structure diagram from the Open-falcon official website)

Component description table:

| Component | Use | Service port | Remarks |
|---|---|---|---|
| Agent | Deployed on each server to be monitored | HTTP: 1988 | e.g. http://192.168.153.134:1988/ |
| Transfer | Data receiver; forwards data to the back-end Graph and Judge | HTTP: 6060; RPC: 8433; Socket: 4444 | |
| Graph | Operates on rrd files to store monitoring data | HTTP: 6071; RPC: 6070 | |
| Query | Queries data from each Graph and provides a unified HTTP query interface | HTTP: 9966 | |
| Dashboard | Web UI for viewing historical monitoring trend charts | HTTP: 8081 | Requires a Python environment; must connect to the dashboard database and the graph component |
| Task | Runs scheduled tasks: full index updates, cleanup of junk indexes, self-monitoring of components, etc. | HTTP: 8082 | Must connect to the graph database |
| Aggregator | Cluster aggregation module | HTTP: 6055 | |
| Alarm | | HTTP: 9912 | |
| Api | API | HTTP: 8080 | |
| Gateway | | HTTP: 16060 | |
| Hbs | Heartbeat server | 6030 | |
| Judge | Alarm judgment | HTTP: 6081; RPC: 6080 | |
| Nodata | Alarm exception handling | HTTP: 6090 | |
| Mysql | Database | 3306 | |
| Redis | Cache server | 6379 | |

How it works:

Falcon-agent (client):

falcon-agent is installed on every server. It is a daemon written in Go that discovers and collects host data and metrics by itself. The metrics cover more than 200 items in total, including but not limited to the following areas:

CPU-related

Disk-related

Load

Memory-related

Network-related

Port liveness, process liveness

NTP offset (plug-in)

Resource consumption of a given process (plug-in)

Statistics from netstat, ss and similar tools

Kernel configuration parameters of the machine

As soon as falcon-agent is installed on a machine, it starts collecting metrics and reporting them actively. Users do not need to configure anything on the server side, which is a major difference from Zabbix. This makes the system easy to maintain and gives high coverage. It does put heavy load on the server side. However, Open-falcon's server components perform well on a single machine and also scale horizontally, so collecting plenty of data automatically is worthwhile. For SRE and DEV teams, tracing problems after the fact is no longer a problem.
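To make "automatically collects and reports" concrete, here is a minimal Go sketch of what a single collector could look like: it reads /proc/loadavg on Linux and turns the three load averages into metric items. This is not falcon-agent's actual code. The metric names (load.1min, load.5min, load.15min), the 60-second step and the item fields are assumptions modeled on Open-falcon's documented push format.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// MetricValue is a simplified item shaped like the JSON Open-falcon
// accepts (endpoint, metric, value, step, counterType, tags, timestamp).
type MetricValue struct {
	Endpoint    string  `json:"endpoint"`
	Metric      string  `json:"metric"`
	Value       float64 `json:"value"`
	Step        int64   `json:"step"`
	CounterType string  `json:"counterType"`
	Tags        string  `json:"tags"`
	Timestamp   int64   `json:"timestamp"`
}

// parseLoadAvg turns the first three fields of /proc/loadavg
// (e.g. "0.42 0.35 0.30 1/123 4567") into three GAUGE items.
func parseLoadAvg(content, endpoint string, now int64) ([]MetricValue, error) {
	fields := strings.Fields(content)
	if len(fields) < 3 {
		return nil, fmt.Errorf("unexpected /proc/loadavg format: %q", content)
	}
	names := []string{"load.1min", "load.5min", "load.15min"} // assumed names
	items := make([]MetricValue, 0, len(names))
	for i, name := range names {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return nil, err
		}
		items = append(items, MetricValue{
			Endpoint: endpoint, Metric: name, Value: v,
			Step: 60, CounterType: "GAUGE", Timestamp: now,
		})
	}
	return items, nil
}

func main() {
	raw, err := os.ReadFile("/proc/loadavg") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/loadavg:", err)
		return
	}
	host, _ := os.Hostname()
	items, err := parseLoadAvg(string(raw), host, time.Now().Unix())
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, it := range items {
		fmt.Printf("%s %s=%.2f\n", it.Endpoint, it.Metric, it.Value)
	}
}
```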

falcon-agent can be found on our GitHub: https://github.com/open-falcon/falcon-plus

Transfer (Transporter):

falcon-agent reports data to transfer over a long-lived connection established between them.

Transfer receives the data sent by clients, tidies and validates it, and forwards it to several back-end systems for processing. When forwarding to each back-end system, transfer shards the data with a consistent hashing algorithm, which lets the back-end systems scale horizontally.

Transfer provides both a JSON-RPC API and a telnet API. Transfer itself is stateless, so killing one or more instances does no harm. Transfer is also fast: it can forward more than 5 million data points per minute.

Transfer supports three types of back ends: judge, graph and opentsdb. Judge is our high-performance alarm-judgment component. Graph is our high-performance component for storing, archiving and querying data. Opentsdb is an open-source time-series database. Each back end can be enabled in transfer's configuration file.

There are generally three data sources for transfer:

Basic monitoring data collected by falcon-agent

Data returned by user-defined plug-ins that falcon-agent executes

Client library: online business systems embed a unified perfcounter.jar, which actively collects and reports the QPS and latency of each RPC interface in the business system.

Note: all three kinds of data are first sent to the local proxy-gateway, which then forwards them to transfer.
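For custom data, the Open-falcon documentation describes pushing a JSON array of items to the local agent at http://127.0.0.1:1988/v1/push. The agent then relays the data upstream. The following minimal Go sketch builds such a payload and posts it. The metric name app.qps and the tag values are hypothetical. Verify the endpoint and fields against the version you run.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// MetricValue mirrors the JSON item accepted by falcon-agent's push endpoint.
type MetricValue struct {
	Endpoint    string  `json:"endpoint"`
	Metric      string  `json:"metric"`
	Value       float64 `json:"value"`
	Step        int64   `json:"step"`
	CounterType string  `json:"counterType"` // GAUGE or COUNTER
	Tags        string  `json:"tags"`        // "k1=v1,k2=v2"
	Timestamp   int64   `json:"timestamp"`   // Unix seconds
}

// buildPayload encodes the items as a JSON array.
func buildPayload(items []MetricValue) ([]byte, error) {
	return json.Marshal(items)
}

// push posts the items to the local agent, which relays them upstream.
func push(url string, items []MetricValue) error {
	body, err := buildPayload(items)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("push failed: %s", resp.Status)
	}
	return nil
}

func main() {
	item := MetricValue{
		Endpoint: "host-a", Metric: "app.qps", Value: 123, Step: 60,
		CounterType: "GAUGE", Tags: "idc=lg,project=demo", Timestamp: time.Now().Unix(),
	}
	if err := push("http://127.0.0.1:1988/v1/push", []MetricValue{item}); err != nil {
		fmt.Println("push error:", err)
	}
}
```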

Judge cluster (alarm judgment):

After falcon-agent reports data to transfer, transfer forwards it to the Judge cluster. It shards the data with consistent hashing, so each instance processes only part of the data.
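Open-falcon alarm strategies are written with functions such as all(#3). Our reading of all(#3) is that the alarm condition holds when the latest 3 points all satisfy the comparison. A minimal Go sketch of that evaluation over one series' recent history follows. The function names are hypothetical and this is not Judge's code. Thanks to sharding, each Judge instance only needs to keep the recent history of the series it owns.

```go
package main

import "fmt"

// compare applies one of the comparison operators a strategy may use.
func compare(v float64, op string, threshold float64) (bool, error) {
	switch op {
	case ">":
		return v > threshold, nil
	case ">=":
		return v >= threshold, nil
	case "<":
		return v < threshold, nil
	case "<=":
		return v <= threshold, nil
	case "==":
		return v == threshold, nil
	case "!=":
		return v != threshold, nil
	}
	return false, fmt.Errorf("unsupported operator %q", op)
}

// allN evaluates an all(#n) style condition: true only when each of the
// latest n points (history is oldest-first) satisfies the comparison.
// With fewer than n points there is not enough data, so no alarm fires.
func allN(history []float64, n int, op string, threshold float64) (bool, error) {
	if n <= 0 || len(history) < n {
		return false, nil
	}
	for _, v := range history[len(history)-n:] {
		ok, err := compare(v, op, threshold)
		if err != nil || !ok {
			return false, err
		}
	}
	return true, nil
}

func main() {
	cpuBusy := []float64{40, 92, 95, 97}
	fire, err := allN(cpuBusy, 3, ">", 90)
	fmt.Println("all(#3)>90:", fire, err)
}
```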

Graph cluster (data storage, normalization, query interface):

After falcon-agent reports data to transfer, transfer forwards it to the Graph cluster. It shards the data with consistent hashing, so each instance processes only part of the data. Graph stores data with rrdtool-style archiving and provides an RPC interface.

Alarm (alarm):

After Judge evaluates the data, alarm events are put into a Redis queue. Alarm reads the events from the queue and handles them: it sends SMS messages and emails and calls the callback APIs. Alarm merging is also done in alarm. Sending is handled by the dedicated sender module, and alarm merging depends on the links module.
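The queue-then-merge flow can be sketched in Go as follows. A buffered channel stands in for the Redis list, and queued events that share a status and metric are folded into one message. Grouping by status and metric is our assumption for illustration. Open-falcon's actual merge rules, implemented in alarm and the links module, may differ. The Event type is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Event is a simplified alarm event as it might sit in the queue.
type Event struct {
	Endpoint string
	Metric   string
	Status   string // "PROBLEM" or "OK"
}

// drain pops everything currently queued without blocking; the channel
// stands in for the Redis list that Judge writes to.
func drain(q <-chan Event) []Event {
	var out []Event
	for {
		select {
		case e, ok := <-q:
			if !ok {
				return out
			}
			out = append(out, e)
		default:
			return out
		}
	}
}

// merge folds events with the same status and metric into one message,
// so ten hosts failing the same check produce one notification, not ten.
func merge(events []Event) []string {
	groups := make(map[string][]string)
	for _, e := range events {
		key := e.Status + " " + e.Metric
		groups[key] = append(groups[key], e.Endpoint)
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	msgs := make([]string, 0, len(keys))
	for _, k := range keys {
		hosts := groups[k]
		sort.Strings(hosts)
		msgs = append(msgs, fmt.Sprintf("[%s] %d host(s): %s", k, len(hosts), strings.Join(hosts, ",")))
	}
	return msgs
}

func main() {
	q := make(chan Event, 10)
	q <- Event{"host-b", "cpu.idle", "PROBLEM"}
	q <- Event{"host-a", "cpu.idle", "PROBLEM"}
	q <- Event{"host-c", "df.bytes.free.percent", "OK"}
	for _, msg := range merge(drain(q)) {
		fmt.Println(msg) // a real sender would turn this into SMS/email
	}
}
```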

Query:

Because Graph data is sharded, Query uses the same consistent-hash sharding as transfer to find each series on the right Graph instance. It exposes an HTTP interface to the outside. Query is a back-end module written in Go.

Dashboard:

Dashboard is a web UI written in Python for querying monitoring data.

Portal:

Portal is a web UI written in Python for configuring monitoring strategies, which it then writes to the database.

Heartbeat server:

Every minute, falcon-agent sends a heartbeat to the heartbeat server, reporting its own version, hostname, IP and so on. It also pulls from the heartbeat server the plug-ins and special collection items it should run. The heartbeat server gets this information from Portal's database. To make alarm judgments, Judge also needs to read alarm strategies from the Portal database. Because there are many Judge instances, reading the database directly would put heavy pressure on it. The heartbeat server therefore acts as a database cache: it loads the data from the database into memory, and Judge calls the heartbeat server's RPC API to obtain the alarm strategies.
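The "database cache" role can be sketched in Go as an in-memory copy that is refreshed with one query per interval. Any number of Judge lookups are then served from memory. The Strategy fields, the loader function and the refresh interval are hypothetical. In the real system the lookup would sit behind the heartbeat server's RPC API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Strategy is a simplified alarm strategy row from the Portal database.
type Strategy struct {
	Metric    string
	Func      string // e.g. "all(#3)"
	Operator  string
	Threshold float64
}

// StrategyCache holds the latest strategies in memory, indexed by metric.
type StrategyCache struct {
	mu    sync.RWMutex
	byKey map[string][]Strategy
	load  func() ([]Strategy, error) // one database query
}

func NewStrategyCache(load func() ([]Strategy, error)) *StrategyCache {
	return &StrategyCache{byKey: map[string][]Strategy{}, load: load}
}

// Refresh reloads everything with a single query; on failure the previous
// copy keeps being served.
func (c *StrategyCache) Refresh() error {
	list, err := c.load()
	if err != nil {
		return err
	}
	m := make(map[string][]Strategy, len(list))
	for _, s := range list {
		m[s.Metric] = append(m[s.Metric], s)
	}
	c.mu.Lock()
	c.byKey = m
	c.mu.Unlock()
	return nil
}

// Get is what a Judge-facing RPC handler would call; it never touches the DB.
func (c *StrategyCache) Get(metric string) []Strategy {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.byKey[metric]
}

// RefreshEvery reloads periodically until stop is closed.
func (c *StrategyCache) RefreshEvery(d time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(d)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			if err := c.Refresh(); err != nil {
				fmt.Println("refresh failed, serving previous copy:", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	cache := NewStrategyCache(func() ([]Strategy, error) {
		// hypothetical stand-in for a query against the Portal database
		return []Strategy{{Metric: "cpu.idle", Func: "all(#3)", Operator: "<", Threshold: 10}}, nil
	})
	if err := cache.Refresh(); err != nil {
		fmt.Println(err)
		return
	}
	stop := make(chan struct{})
	go cache.RefreshEvery(time.Minute, stop)
	defer close(stop)
	fmt.Println(cache.Get("cpu.idle"))
}
```

The read-write lock lets many concurrent lookups proceed while a refresh swaps in the new map in one step.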

Data storage:

For a monitoring system, storing historical data and querying it efficiently is always a very hard problem!

Large data volume: in our monitoring system today, about 20 million data points are reported per cycle. The reporting periods are 1 minute and 5 minutes, each accounting for 50%. There is no off-peak period at any time of day or night; every cycle brings the same volume of updates.
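As a rough worked estimate, assume the 20 million figure counts distinct series. Following the 50% split above, half report every 60 s and half every 300 s. The sustained write rate $R$ is then:

```latex
\[
R \approx \frac{0.5 \times 2\times10^{7}}{60\ \mathrm{s}} + \frac{0.5 \times 2\times10^{7}}{300\ \mathrm{s}}
\approx 1.67\times10^{5}\ \mathrm{s}^{-1} + 0.33\times10^{5}\ \mathrm{s}^{-1}
= 2.0\times10^{5}\ \text{updates/s}
\]
```

Under that assumption, the storage layer must absorb roughly 200,000 updates per second around the clock.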

Write-heavy workload: typical business systems read far more than they write. They can easily use all kinds of caching, and databases handle their queries much more efficiently than writes. A monitoring system is the opposite: writes far outnumber reads. Commonly used databases (MySQL, PostgreSQL, MongoDB) cannot handle tens of millions of updates per cycle.

Efficient queries: when we say a monitoring system has few reads, that is only relative to its writes. The reads themselves are demanding. Users often query hundreds of metrics over the past day, week, month or year. Returning that data and plotting it within 1 second is a big challenge.

Open-falcon has invested a lot of effort in this area. We divide data into two categories by purpose: data for plotting, and raw data for users' data mining.

For plotting data, fast queries are the key, and information must not be lost. If a user queries 100 metrics over the past year, the data volume is too large to return in one second. Even if it were returned, the front end could not render that many points and would have to sample them, wasting a lot of work. Following the rrdtool approach, data is automatically sampled and archived every time it is stored. Our archiving strategy keeps historical data for 5 years. To avoid losing information, each archive level keeps three copies: the average, the maximum and the minimum of each sampling interval.

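    // intervals below assume a 60 s base step
    // 1 point per step, 720 points: 12 hours
    c.RRA("AVERAGE", 0.5, 1, 720)

    // 5 min per point, 576 points: 2 days
    c.RRA("AVERAGE", 0.5, 5, 576)
    c.RRA("MAX", 0.5, 5, 576)
    c.RRA("MIN", 0.5, 5, 576)

    // 20 min per point, 504 points: 7 days
    c.RRA("AVERAGE", 0.5, 20, 504)
    c.RRA("MAX", 0.5, 20, 504)
    c.RRA("MIN", 0.5, 20, 504)

    // 3 hours per point, 766 points: about 3 months
    c.RRA("AVERAGE", 0.5, 180, 766)
    c.RRA("MAX", 0.5, 180, 766)
    c.RRA("MIN", 0.5, 180, 766)

    // 12 hours per point, 730 points: 1 year
    c.RRA("AVERAGE", 0.5, 720, 730)
    c.RRA("MAX", 0.5, 720, 730)
    c.RRA("MIN", 0.5, 720, 730)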

For raw data, transfer writes a copy to HBase. Alternatively, you can use OpenTSDB directly, since transfer supports writing data to opentsdb.
