Open-Falcon is Xiaomi's open-source monitoring system. It can be installed in three ways: stand-alone installation (back end and front end together; a single server is recommended), Docker installation, and distributed installation across multiple machines.
Key point: this article covers the first option, the stand-alone installation (in practice split across two servers here: one running the back-end services and one running the front end).
Distributed installation is also straightforward: fetch the open-falcon binary package from git, keep only the required module folders and the open-falcon launcher script on each server, edit the configuration file under each module folder, and start the services. Distributed deployment is what is generally recommended for production environments; see https://book.open-falcon.org/zh_0_2/distributed_install/ for details.
Open-Falcon monitoring is usually extended with a variety of plug-ins.
Architecture diagram: (the official Open-Falcon architecture diagram; image not reproduced here)
Component description table:

| Component | Purpose | Service ports | Remarks |
|---|---|---|---|
| Agent | Deployed on every server that needs to be monitored | HTTP: 1988 | e.g. http://192.168.153.134:1988/ |
| Transfer | Data receiver; forwards data to the Graph and Judge back ends | HTTP: 6060, RPC: 8433, Socket: 4444 | |
| Graph | Operates on RRD files to store monitoring data | HTTP: 6071, RPC: 6070 | |
| Query | Queries every Graph instance and provides a unified HTTP query interface | HTTP: 9966 | |
| Dashboard | Web UI for historical monitoring trend charts | HTTP: 8081 | Requires a Python environment; must connect to the dashboard database instance and the Graph component |
| Task | Runs scheduled jobs: full index updates, stale index cleanup, self-monitoring of components, etc. | HTTP: 8082 | Must connect to the graph database instance |
| Aggregator | Cluster aggregation module | HTTP: 6055 | |
| Alarm | Alarming | HTTP: 9912 | |
| Api | API | HTTP: 8080 | |
| Gateway | Gateway | HTTP: 16060 | |
| Hbs | Heartbeat server | 6030 | |
| Judge | Alarm judgment | HTTP: 6081, RPC: 6080 | |
| Nodata | Handles missing-data (nodata) alarm cases | HTTP: 6090 | |
| Mysql | Database | 3306 | |
| Redis | Cache server | 6379 | |
How it works:
Falcon-agent (client):
Every monitored server runs falcon-agent, a daemon written in Go that self-discovers and collects machine-level data and metrics. These metrics (more than 200 in total) are not limited to the following:
CPU-related
Disk-related
IO
Load
Memory-related
Network-related
Port liveness, process liveness
NTP offset (plug-in)
Resource consumption of a given process (plug-in)
netstat, ss, and other related statistics
Kernel configuration parameters
Any machine with falcon-agent installed automatically starts collecting these metrics and actively reports them; users do not need to configure anything on the server side (a major difference from Zabbix). This makes the system easy to maintain and gives high coverage. It does put significant load on the server side, but the Open-Falcon server components have high single-node performance and can scale horizontally, so automatically collecting plenty of data is a net win: for SRE and DEV teams, investigating problems after the fact is no longer an issue.
Falcon-agent can be found in the project's GitHub repository: https://github.com/open-falcon/falcon-plus
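As a concrete, hedged illustration of the agent's push path, the sketch below posts a custom metric to the agent's documented HTTP push endpoint (POST /v1/push on port 1988, listed in the table above). The metric name and tags are made up for illustration; the JSON field names follow the push API's documented format.

```go
// Minimal sketch: pushing a custom metric through the local falcon-agent.
// Assumes the agent's documented HTTP push endpoint (POST /v1/push on :1988);
// the metric name and tags are hypothetical.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// MetricValue mirrors the JSON body the agent push API expects.
type MetricValue struct {
	Endpoint    string      `json:"endpoint"`
	Metric      string      `json:"metric"`
	Timestamp   int64       `json:"timestamp"`
	Step        int64       `json:"step"`
	Value       interface{} `json:"value"`
	CounterType string      `json:"counterType"`
	Tags        string      `json:"tags"`
}

func main() {
	payload := []MetricValue{{
		Endpoint:    "host01",
		Metric:      "app.demo.qps", // hypothetical metric name
		Timestamp:   time.Now().Unix(),
		Step:        60, // report period in seconds
		Value:       42,
		CounterType: "GAUGE",
		Tags:        "project=demo",
	}}
	body, _ := json.Marshal(payload)
	resp, err := http.Post("http://127.0.0.1:1988/v1/push", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("push failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("agent replied:", resp.Status)
}
```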
Transfer (forwarder):
Falcon-agent reports its data to transfer over a long-lived connection established between them.
Transfer receives the data sent by clients, normalizes and validates it, and forwards it to multiple back-end systems for processing. When forwarding to each back-end system, transfer shards the data with a consistent-hash algorithm, which lets the back ends scale horizontally.
Transfer provides both a jsonRpc API and a telnet API. Transfer itself is stateless, so killing one or more instances has no effect on the others; it is also fast, forwarding more than 5 million data points per minute.
Transfer supports three kinds of business back ends: judge, graph, and opentsdb. Judge is a high-performance alarm-judgment component and graph is a high-performance data storage, archiving, and query component, both developed in-house; opentsdb is an open-source time-series storage service. Each back end is enabled through transfer's configuration file.
Transfer generally has three data sources:
Basic monitoring data collected by falcon-agent
Data returned by user-defined plug-ins that falcon-agent executes
Client library: online business systems embed a unified perfcounter.jar, which actively collects and reports the QPS and latency of every RPC interface in the business system
Note: all three kinds of data are first sent to a local proxy-gateway, which then forwards them to transfer.
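To make the reporting path concrete, here is a hedged sketch of a client pushing a metric straight to transfer's RPC port. It assumes transfer speaks Go's JSON-RPC codec on port 8433 and exposes a Transfer.Update method taking a slice of metric values, as in falcon-plus; the struct fields mirror its common model but should be verified against your installed version.

```go
// Hedged sketch: reporting a metric directly to transfer's RPC port, the way
// falcon-agent does. Method name and field shapes follow falcon-plus's common
// model; treat them as assumptions and verify against your version.
package main

import (
	"fmt"
	"net/rpc/jsonrpc"
	"time"
)

type MetricValue struct {
	Endpoint  string      `json:"endpoint"`
	Metric    string      `json:"metric"`
	Value     interface{} `json:"value"`
	Step      int64       `json:"step"`
	Type      string      `json:"counterType"`
	Tags      string      `json:"tags"`
	Timestamp int64       `json:"timestamp"`
}

// TransferResponse mirrors the reply transfer sends back.
type TransferResponse struct {
	Message string
	Total   int
	Invalid int
	Latency int64
}

func main() {
	client, err := jsonrpc.Dial("tcp", "127.0.0.1:8433") // transfer RPC port from the table above
	if err != nil {
		fmt.Println("dial transfer:", err)
		return
	}
	defer client.Close()

	metrics := []*MetricValue{{
		Endpoint:  "host01",
		Metric:    "app.demo.latency", // hypothetical metric
		Value:     12.5,
		Step:      60,
		Type:      "GAUGE",
		Tags:      "project=demo",
		Timestamp: time.Now().Unix(),
	}}
	var resp TransferResponse
	if err := client.Call("Transfer.Update", metrics, &resp); err != nil {
		fmt.Println("rpc call:", err)
		return
	}
	fmt.Printf("transfer accepted %d points (%d invalid)\n", resp.Total, resp.Invalid)
}
```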
Judge cluster (alarm judgment):
After falcon-agent reports data to transfer, transfer forwards it to the Judge cluster, sharding it with a consistent hash so that each instance processes only part of the data.
Graph cluster (data storage, archiving, query interface):
Transfer likewise forwards the data to the Graph cluster with consistent-hash sharding, so each instance handles only part of the data. Graph archives and stores the data in rrdtool format and provides an RPC interface for queries.
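Since both clusters above rely on the same consistent-hash sharding, a minimal self-contained hash ring in Go may help make the idea concrete. This is an illustrative sketch, not Open-Falcon's actual implementation (falcon-plus uses a shared consistent-hash library), and the series-key format is an assumption.

```go
// Illustrative sketch of consistent-hash sharding, the technique transfer uses
// to split series across judge/graph instances. Not open-falcon's actual code.
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// Ring maps hashed virtual nodes to back-end instance addresses.
type Ring struct {
	hashes []uint32          // sorted virtual-node hashes
	nodes  map[uint32]string // virtual-node hash -> instance
}

func NewRing(instances []string, replicas int) *Ring {
	r := &Ring{nodes: make(map[uint32]string)}
	for _, inst := range instances {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(inst + "#" + strconv.Itoa(i)))
			r.hashes = append(r.hashes, h)
			r.nodes[h] = inst
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// Pick returns the instance responsible for a series key, i.e. the first
// virtual node clockwise from the key's hash on the ring.
func (r *Ring) Pick(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.nodes[r.hashes[i]]
}

func main() {
	ring := NewRing([]string{"graph-01:6070", "graph-02:6070"}, 100)
	// Every point of the same series always lands on the same graph instance.
	fmt.Println(ring.Pick("host01/cpu.idle"))
}
```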
Alarm (alarming):
When Judge decides an alarm condition is met, it pushes an event onto a redis queue. Alarm reads alarm events from the redis queue, processes them, sends SMS and email, and invokes callback APIs. Alarm merging also happens here: the sender module specializes in dispatching alarms, and merging depends on the links module.
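The queue handoff can be sketched as follows. It is a minimal illustration, assuming a per-priority queue name like event:p0 (the pattern seen in falcon-plus alarm configs, but treat it as an assumption) and a simplified event payload.

```go
// Minimal sketch of the alarm side of the redis queue: block-pop an event and
// hand it off for sending. Queue name "event:p0" is an assumed per-priority
// name; the event payload handling is simplified.
package main

import (
	"fmt"

	"github.com/gomodule/redigo/redis"
)

func main() {
	conn, err := redis.Dial("tcp", "127.0.0.1:6379")
	if err != nil {
		fmt.Println("dial redis:", err)
		return
	}
	defer conn.Close()

	for {
		// BRPOP blocks until judge pushes an event; reply is [queue, payload].
		reply, err := redis.Strings(conn.Do("BRPOP", "event:p0", 0))
		if err != nil {
			fmt.Println("brpop:", err)
			return
		}
		fmt.Println("got alarm event:", reply[1]) // hand off to the sender here
	}
}
```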
Query:
Because Graph shards the data, query uses the same consistent-hash scheme as transfer to locate each series across the Graph instances, and exposes a unified HTTP interface. Query is a back-end module written in Go.
Dashboard:
Monitoring data is browsed through dashboard, a web application written in Python.
Portal:
Portal is a web application written in Python; it is used to configure monitoring policies, which it writes to the database.
Heartbeat server (HBS):
Falcon-agent sends a heartbeat to the heartbeat server every minute, reporting its version, hostname, IP, and so on, and pulls from it the plug-ins and special collection items it should execute; HBS gets this information by reading Portal's database. Judge also needs the alarm policies from the portal database to make its judgments, but there are many Judge instances, and having each of them read the database directly would put heavy pressure on it. HBS therefore doubles as a database cache: it loads the data from the database into memory, and Judge fetches alarm policies through HBS's RPC API.
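The caching idea can be sketched as below. This is illustrative only, not falcon-plus's actual code: strategies are bulk-loaded from the database into memory on a timer, and Judge instances read the in-memory copy (via HBS's RPC in the real system) instead of hitting MySQL. The Strategy fields are simplified stand-ins.

```go
// Illustrative sketch of the HBS caching idea (not falcon-plus code):
// strategies are loaded from the database into memory on a timer, and many
// Judge instances read the in-memory copy instead of querying MySQL directly.
package main

import (
	"fmt"
	"sync"
	"time"
)

// Strategy is a simplified stand-in for an alarm policy row.
type Strategy struct {
	Metric string
	Func   string // e.g. all(#3)
	Op     string // e.g. <
	Right  float64
}

type StrategyCache struct {
	mu   sync.RWMutex
	data map[string][]Strategy // hostname -> strategies
}

// refresh would run one bulk query against portal's MySQL database; stubbed here.
func (c *StrategyCache) refresh() {
	fresh := map[string][]Strategy{
		"host01": {{Metric: "cpu.idle", Func: "all(#3)", Op: "<", Right: 10}},
	}
	c.mu.Lock()
	c.data = fresh
	c.mu.Unlock()
}

// Get is what an RPC handler would call on behalf of a Judge instance.
func (c *StrategyCache) Get(host string) []Strategy {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.data[host]
}

func main() {
	cache := &StrategyCache{data: map[string][]Strategy{}}
	cache.refresh()
	go func() { // reload periodically so Judges see policy changes
		for range time.Tick(time.Minute) {
			cache.refresh()
		}
	}()
	fmt.Println(cache.Get("host01"))
}
```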
Data storage:
For a monitoring system, storing and efficiently querying historical data is always a very hard problem!
Large data volume: our monitoring system currently receives about 20 million data points per reporting cycle (reporting periods of 1 minute and 5 minutes, each accounting for 50%; that works out to roughly 10M points per minute plus 10M points per 5 minutes, or about 200,000 writes per second). There is no off-peak period in a 24-hour day; every cycle, day or night, brings the same volume of updates.
Write-heavy workload: typical business systems read far more than they write, so caching works well and most databases handle queries much more efficiently than writes. A monitoring system is the opposite: writes vastly outnumber reads, and tens of millions of updates per cycle are beyond what commonly used databases (MySQL, PostgreSQL, MongoDB) can sustain.
Efficient queries: reads are few only relative to writes; the absolute demands are still high. Users routinely query hundreds of metrics over the past day, week, month, or year, and returning the result and drawing it within one second is a serious challenge.
Open-Falcon has invested a lot of effort here. We divide the data into two categories by purpose: one for drawing charts, the other for data mining.
For chart data, fast queries are the key, and the information content must not be lost. If a user queries 100 metrics over a full year, the raw data volume makes a one-second response difficult; and even if it could be returned, the front end could not render that many points without sampling them down, so much of the work would be wasted. Borrowing rrdtool's approach, data is automatically sampled and archived as it is stored. Our archiving policy keeps historical data for 5 years, and, to avoid losing information, each archive level stores three copies, sampled by average, maximum, and minimum. The RRA definitions are:
```go
// 1 minute per point, kept for 12 hours
c.RRA("AVERAGE", 0.5, 1, 720)

// 5 minutes per point, kept for 2 days
c.RRA("AVERAGE", 0.5, 5, 576)
c.RRA("MAX", 0.5, 5, 576)
c.RRA("MIN", 0.5, 5, 576)

// 20 minutes per point, kept for 7 days
c.RRA("AVERAGE", 0.5, 20, 504)
c.RRA("MAX", 0.5, 20, 504)
c.RRA("MIN", 0.5, 20, 504)

// 3 hours per point, kept for 3 months
c.RRA("AVERAGE", 0.5, 180, 766)
c.RRA("MAX", 0.5, 180, 766)
c.RRA("MIN", 0.5, 180, 766)

// 12 hours per point, kept for 1 year
c.RRA("AVERAGE", 0.5, 720, 730)
c.RRA("MAX", 0.5, 720, 730)
c.RRA("MIN", 0.5, 720, 730)
```
For raw data, transfer writes a copy to HBase; alternatively, you can use OpenTSDB directly, since transfer supports writing data to OpenTSDB.