This article introduces how OpenFalcon copes with high-concurrency scenarios. Many people run into these problems in real deployments, so the following walks through the approaches one by one; I hope you read it carefully and get something out of it.
Seven ways OpenFalcon copes with high-concurrency scenarios
Data collection: choosing between push and pull
A monitoring system includes several core functions: data collection, historical data storage, alerting, and finally graphing and data mining.
Data collection
We want to unify data collection, which means covering a large number of metrics: Linux system metrics such as CPU, memory, NIC, and IO; hardware metrics such as fan speed and temperature; plus metrics from open-source software such as MySQL, Nginx, OpenStack, and HBase.
There are many product lines inside the company, and we want to collect the running status of each service. For example, mid-tier services expose RPC ports, and we want to know the latency and QPS of every RPC port.
So we need to cover a huge number of metrics, and at the start the monitoring team had only two people; collecting every metric ourselves was impossible, since we had neither the manpower nor the domain expertise.
So we build the monitoring data together, letting specialists handle what they know best.
DBAs know MySQL, so they collect MySQL metrics; the distributed-systems team collects HBase and Hadoop metrics; the network team collects switch and router metrics.
What the monitoring team does is define a specification and a mechanism for getting data in, and then the company's developers and operators push data according to that specification.
Data collection does not use the pull model. With so many data sources, acting as a client that connects to each source, collects the data, and closes the connection would leave a large number of connections in TIME_WAIT, and throughput would inevitably be low. So instead we act as the server and have the sources push data to us. There are now more than 100 million data points per reporting cycle.
A quick explanation of the cycle: monitoring data is reported continuously. For example, basic Linux metrics are collected once a minute and reported again the next minute, while some business metrics report every 3 or 5 minutes.
So the cycle here counts data whose reporting period is no more than 5 minutes and that is actively being reported.
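As a rough illustration of the push model, here is a minimal Go sketch of a source pushing one data point. The URL, port, and JSON field names are assumptions modeled on OpenFalcon's commonly documented push format, not details stated in this article.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// MetricValue mirrors the kind of JSON payload a source pushes each cycle.
// Field names are illustrative assumptions, not a spec quoted from the text.
type MetricValue struct {
	Endpoint    string  `json:"endpoint"`
	Metric      string  `json:"metric"`
	Timestamp   int64   `json:"timestamp"`
	Step        int64   `json:"step"` // reporting period in seconds, e.g. 60
	Value       float64 `json:"value"`
	CounterType string  `json:"counterType"`
	Tags        string  `json:"tags"`
}

func main() {
	// One data point: cpu.idle sampled once per 60-second cycle.
	points := []MetricValue{{
		Endpoint:    "host-001",
		Metric:      "cpu.idle",
		Timestamp:   time.Now().Unix(),
		Step:        60,
		Value:       93.5,
		CounterType: "GAUGE",
	}}

	body, _ := json.Marshal(points)
	// The push URL is a placeholder; a real deployment would push to a local
	// agent listener or directly to a Transfer instance.
	resp, err := http.Post("http://127.0.0.1:1988/v1/push", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("push status:", resp.Status)
}
```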
Fast forwarding and fault tolerance at intermediate nodes
A huge amount of data is pushed to the server side. The front-end component on the server side is Transfer; Transfer receives the data and forwards it to two downstream components: Graph (for drawing) and Judge (for alarm judgment).
First, a Queue is created for each backend instance. For example, with 20 Graph instances and 60 Judge instances, each instance gets its own Queue, so each Transfer holds 80 Queues in memory.
When a monitoring data point arrives, Transfer determines which instance it should go to and puts it into that instance's Queue. The RPC logic for receiving data is therefore very simple: take the data, drop it into the Queue, and return immediately.
The alarm path from Agent to Transfer, Transfer to Judge, Judge to Redis, and Redis to Alarm is fairly long. To trigger alarms as quickly as possible, every hop on the path needs to be fast. With the Queue in place, Transfer's throughput is very high and it does not slow down the whole path.
Putting data in a Queue introduces a problem: if the data is not forwarded in time, for example because the backend is overloaded or down, the Queue keeps growing and the process crashes once memory is exhausted. So a simple protective measure is taken: the Queue has a fixed length, and once it is full, new data is dropped rather than enqueued.
After data is pushed into a Queue, dedicated workers read from that Queue and forward the data. Forwarding is done by a pool of write workers, where a worker is a goroutine; a group of goroutines working together raises forwarding throughput. A sketch of this pattern follows.
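Here is a minimal Go sketch of that pattern under stated assumptions: a fixed-length per-instance queue with a non-blocking push, drained by a pool of goroutine workers. The names, queue size, and worker count are illustrative, not OpenFalcon's actual values.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// boundedQueue is a fixed-length queue for one backend instance; when it is
// full, new points are dropped instead of growing memory without bound.
type boundedQueue struct {
	ch chan string // a real Transfer would carry a metric struct, not a string
}

func newBoundedQueue(size int) *boundedQueue {
	return &boundedQueue{ch: make(chan string, size)}
}

// push is non-blocking: if the queue is full the point is dropped, so the
// RPC handler can still return immediately.
func (q *boundedQueue) push(item string) bool {
	select {
	case q.ch <- item:
		return true
	default:
		return false // queue full: drop rather than crash on out-of-memory
	}
}

func main() {
	q := newBoundedQueue(1024)

	// A pool of write workers (goroutines) drains the queue and forwards
	// points to the backend instance the queue belongs to.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for item := range q.ch {
				// Stand-in for the real forwarding RPC call.
				fmt.Printf("worker %d forwarded %s\n", id, item)
			}
		}(w)
	}

	for i := 0; i < 10; i++ {
		if !q.push(fmt.Sprintf("cpu.idle point %d", i)) {
			fmt.Println("dropped point", i)
		}
	}
	time.Sleep(100 * time.Millisecond)
	close(q.ch)
	wg.Wait()
}
```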
Consistent hash sharding to improve throughput
Consistent hash sharding
It is impossible for one machine to process this much data, so the data is sharded: each machine handles only part of it. Alarm data is sharded across Judge instances and drawing data across Graph instances, and both use consistent hashing.
Consistent hash sharding has one drawback: scaling up or down is troublesome. Once a metric's data has been routed to a particular Judge, it keeps going to that same Judge.
When the instance list changes, the consistent hashing means that data which originally hit one Judge instance may be routed to a different one. Therefore Judge must be as stateless as possible, so that scaling up and down stays easy.
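For reference, here is a minimal consistent hash ring in Go showing how a metric key maps to an instance. The replica count, crc32 hash, and key format are illustrative choices, not necessarily the ones OpenFalcon uses.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// hashRing places each Judge/Graph instance on a ring via several virtual
// nodes; a metric key is routed to the first instance clockwise from its hash.
type hashRing struct {
	keys  []uint32          // sorted hashes of virtual nodes
	nodes map[uint32]string // virtual-node hash -> instance address
}

func newHashRing(instances []string, replicas int) *hashRing {
	r := &hashRing{nodes: make(map[uint32]string)}
	for _, inst := range instances {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", inst, i)))
			r.keys = append(r.keys, h)
			r.nodes[h] = inst
		}
	}
	sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
	return r
}

// pick returns the instance responsible for a metric key such as "endpoint/metric".
func (r *hashRing) pick(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0 // wrap around the ring
	}
	return r.nodes[r.keys[i]]
}

func main() {
	ring := newHashRing([]string{"judge-01:6080", "judge-02:6080", "judge-03:6080"}, 100)
	fmt.Println(ring.pick("host-001/cpu.idle"))
	fmt.Println(ring.pick("host-002/mem.free"))
}
```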
Status marker
Strictly speaking it is not completely stateless: Judge keeps a few data points in memory. For example, after cpu.idle data for a machine arrives, an alarm is only triggered once the value has hit a specific threshold 3 to 5 times in a row, not the first time it crosses the threshold. For a metric like CPU, we only alarm when it stays busy. So when Judge evaluates a rule, it looks at multiple recent points.
After an alarm is generated there is some follow-up processing, such as judging the alarm event: what the state of the previous event was, how many times this alarm has fired, and whether it has reached the maximum number of notifications after which it should no longer be sent, so that repeated alarms do not overwhelm the people handling them. Typically the maximum is set to three.
Therefore an alarm event needs to record a status: whether the item is healthy and, if not, how many times it has currently fired. This Judge status is stored in the database. Although 100 million data points come in per cycle, there is not actually much alarm data, perhaps on the order of 100,000 records, so it fits in the database; in this way Judge offloads the alarm event state.
Although Judge memory still holds some of the data state mentioned above, that is not a big problem: monitoring data keeps arriving, so even if some state is lost, a new Judge instance is quickly repopulated with data.
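The following Go sketch shows the behavior described above under stated assumptions: check the last N points against a threshold, then consult the alarm-event state to cap repeated notifications. The struct names, window size, and max-notification count are illustrative, not the real Judge data model.

```go
package main

import "fmt"

// judgeItem holds the recent points Judge keeps in memory for one series;
// eventState mirrors the alarm-event record kept in the database.
type judgeItem struct {
	recent []float64 // most recent values, newest last
}

type eventState struct {
	problem     bool
	currentStep int // how many times this alarm has already been sent
}

// allBelow reports whether the last n points are all below the threshold,
// i.e. "hit the threshold several times in a row before alarming".
func (j *judgeItem) allBelow(threshold float64, n int) bool {
	if len(j.recent) < n {
		return false
	}
	for _, v := range j.recent[len(j.recent)-n:] {
		if v >= threshold {
			return false
		}
	}
	return true
}

func main() {
	item := &judgeItem{recent: []float64{40, 8, 7, 5}} // cpu.idle samples
	state := &eventState{}
	const maxStep = 3 // stop notifying after three alarms

	if item.allBelow(10, 3) { // idle below 10% for 3 consecutive points
		state.problem = true
		if state.currentStep < maxStep {
			state.currentStep++
			fmt.Println("send alarm, step", state.currentStep)
		} else {
			fmt.Println("suppressed: max alarm count reached")
		}
	}
}
```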
Capacity expansion
Consistent hashing is not very friendly to capacity expansion. For example, if 20 Graph machines become 40, part of the data that used to hit the old Graph instances now lands on the new ones, so we built automatic data migration.
This automatic migration is forced on us by consistent hashing. If sharding were instead done with a mapping table that maintains the correspondence between data and Graph instances, expansion would be much easier.
Fast matching of data and policies
When data comes in, we have to determine whether it triggers an alarm policy, and there are many policies; our system currently has tens of thousands.
After a data point arrives, we have to determine which policies it is related to, then check the thresholds and decide whether to trigger an alarm.
We choose to synchronize the full list of alarm policies from the database, because the company-wide policy list is not very large: although the data volume is huge, there are only a few thousand or tens of thousands of policies.
A simple index is built over the policy data so that when a data point arrives, the relevant policies can be located quickly; the thresholds are then checked to see whether alarm processing is needed.
To speed up policy evaluation, some recent historical data is needed; it is kept in Judge memory, so it is fast to retrieve.
The first version did not keep this data in memory. In testing at the time we used 56 Redis instances, each handling about 3,000 QPS; the volume was still small then and the QPS was already that high. Given the level of concurrency, keeping the data in Redis was not a good solution, so it was later moved into Judge memory, which is much faster.
The alarm state lives both in Judge memory and in the DB. Accessing the DB on every alarm judgment would be too slow, so the state is loaded into Judge memory, which acts as a cache; once in memory it is no longer read from the DB. When Judge restarts, memory is empty, so the alarm state is loaded from the DB back into memory.
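A minimal sketch of the indexing idea, in Go: policies synced from the database are indexed by metric name so an incoming point only looks at its own candidate policies rather than all tens of thousands. The struct fields and key choice are illustrative assumptions; the real policy model has more dimensions (tags, operators, priorities).

```go
package main

import "fmt"

// strategy is a simplified alarm policy.
type strategy struct {
	ID        int
	Metric    string
	Threshold float64
	MaxStep   int // consecutive points required before alarming
}

// strategyIndex maps metric name -> policies referring to it. It is rebuilt
// periodically from the full policy list synced out of the database.
type strategyIndex map[string][]strategy

func buildIndex(all []strategy) strategyIndex {
	idx := make(strategyIndex)
	for _, s := range all {
		idx[s.Metric] = append(idx[s.Metric], s)
	}
	return idx
}

func main() {
	all := []strategy{
		{ID: 1, Metric: "cpu.idle", Threshold: 10, MaxStep: 3},
		{ID: 2, Metric: "mem.free", Threshold: 1 << 30, MaxStep: 3},
	}
	idx := buildIndex(all)

	// An incoming cpu.idle point only needs to check its own policies.
	for _, s := range idx["cpu.idle"] {
		fmt.Printf("check policy %d: alarm if below %.0f for %d points\n",
			s.ID, s.Threshold, s.MaxStep)
	}
}
```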
Delayed writes to reduce the number of RRD file opens
Automatic archiving of time series data
RRD is well known; it is what we use to store the time-series data, and a great deal of open-source monitoring software stores time-series data in RRD.
RRD's biggest advantage is automatic archiving. Monitoring data has a characteristic: you usually do not care about the exact historical values, you only need to know the trend at the time.
For the last 6 hours you may still need the original values, but the need for raw values over the last 3 days or the last month is very small, and there are a lot of points: one point per minute is 1,440 points a day, so loading a whole month of raw points would likely overwhelm the browser.
Because for historical points you only need to see the trend, archiving is particularly useful for monitoring data: for example, an hour of data is archived into a single point, and RRD does this for us.
RRD optimization: delayed merged writes
RRD's default write performance is relatively poor. Its logic is to open the RRD file, then seek, write, seek, write, and finally close the file handle. One monitoring metric corresponds to one RRD file; for example, a Graph machine handling about 2 million metrics actually holds 2 million RRD files.
Each write opens the RRD file and reads the header, which contains meta information such as the archiving policy and data type, and only then performs the write.
Doing this for every incoming data point makes RRD reads and writes generate a very large amount of IO. Even though the disks are SSDs, the IO was still too high, so we made an optimization: delayed writes.
Data is now reported every minute, and some every 10 or 30 seconds. Instead of opening the RRD file and writing immediately on every report, we cache the data in memory for a while, say 10 or 30 minutes, accumulating 30 or 60 points, then open the file once, write them all, and close it. This reduces the number of times RRD files are opened and reduces IO.
We also spread the flushes out over the cache window. For example, with a half-hour cache, the 1,800 seconds are divided into 1,800 slots and the metrics are distributed evenly across them, so writes happen slot by slot; this scattering avoids IO spikes.
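A rough Go sketch of the batching-and-scattering idea follows. Flushing is deferred until a series has buffered a batch of points, so each RRD file is opened once per batch instead of once per point, and a slot derived from the metric key spreads flushes across the window. All names, sizes, and the hash are illustrative assumptions, not the actual Graph implementation.

```go
package main

import "fmt"

// seriesCache buffers points for one metric series before they are written.
type seriesCache struct {
	points []float64
}

const (
	slots     = 1800 // half-hour window split into one slot per second
	batchSize = 30   // flush after this many cached points
)

// slotFor scatters metrics across the window, e.g. by hashing the metric key.
func slotFor(metric string) int {
	h := 0
	for _, c := range metric {
		h = h*31 + int(c)
	}
	if h < 0 {
		h = -h
	}
	return h % slots
}

func (c *seriesCache) add(metric string, v float64) {
	c.points = append(c.points, v)
	if len(c.points) >= batchSize {
		// Stand-in for: open the RRD file once, write the whole batch, close it.
		fmt.Printf("flush %d points of %s in slot %d\n", len(c.points), metric, slotFor(metric))
		c.points = c.points[:0]
	}
}

func main() {
	c := &seriesCache{}
	for i := 0; i < 60; i++ {
		c.add("host-001/cpu.idle", float64(90-i%5))
	}
}
```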
Alarm events are queued according to priority
The number of alarm events is usually not large, but in certain cases it can spike: when something triggers a large-area alarm, for example a core switch going down, a very large number of alarms are generated at once.
For example, a service depends on several upstream services; if an upstream dies, all downstream services alarm. If a core switch goes down, many core services are affected, and once the core services are affected, many downstream services generate alarms, producing a large-area alarm. Normally, though, the alarm volume is quite stable.
When a large-area alarm appears, we again apply the Queue mechanism, using a Queue to smooth out the peak.
Tiered handling of alarms
We grade alarms from P0 and P1 down to P5, with P0 the highest.
Priority-based grading strategy
For P0 and P1, text messages and e-mails are sent immediately when the alarm arrives; P2 does not send text messages immediately but goes through alarm merging first; lower levels such as P3 and P4 are also merged and do not send text messages at all, only e-mail.
Each alarm level corresponds to its own Queue in Redis. Using Redis BRPOP, alarm events are handled strictly by priority: P0 first, then P1, then P2, and so on.
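A minimal sketch of that consumer loop in Go, assuming the go-redis client and illustrative queue names: BRPOP checks the keys in the order given and pops from the first non-empty list, which is what gives P0 priority over P1 and so on.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	// One Redis list per alarm level; names are illustrative assumptions.
	queues := []string{"event:p0", "event:p1", "event:p2", "event:p3", "event:p4", "event:p5"}

	rdb := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})
	ctx := context.Background()

	for {
		// Pops from the highest-priority non-empty queue, blocking up to 5s.
		res, err := rdb.BRPop(ctx, 5*time.Second, queues...).Result()
		if err == redis.Nil {
			continue // nothing to process within the timeout
		}
		if err != nil {
			fmt.Println("redis error:", err)
			return
		}
		queue, event := res[0], res[1]
		fmt.Printf("handle %s event: %s\n", queue, event)
	}
}
```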
Rate-limit the sending interfaces by limiting the number of Workers
The system itself can have long call chains, and we want every link to stand up to high concurrency. In practice, though, we depend on other infrastructure, such as internal SMTP servers to send e-mail and SMS channels to send text messages, and these interfaces cannot handle much concurrency.
When your system has to call interfaces with poor concurrency tolerance, it is best to put rate limiting in front of them.
A dedicated module called Sender sends the alarm e-mails and text messages. The number of Workers it uses is configurable, which can be understood as the maximum number of threads calling the sending interface. This protects the backend sending interface from being overwhelmed.
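A minimal sketch of that pattern in Go: a fixed pool of worker goroutines drains a job channel, so at most the configured number of calls ever hits the external interface at once. The worker count and the stand-in send function are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sendSMS stands in for the real SMS or SMTP gateway call; the gateway is
// assumed to tolerate only a small number of concurrent requests.
func sendSMS(msg string) {
	time.Sleep(200 * time.Millisecond) // simulate a slow external interface
	fmt.Println("sent:", msg)
}

func main() {
	const workers = 4 // configurable Worker count = max concurrent calls
	jobs := make(chan string, 100)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for msg := range jobs {
				sendSMS(msg) // at most `workers` calls are in flight at once
			}
		}()
	}

	for i := 0; i < 20; i++ {
		jobs <- fmt.Sprintf("P0 alarm #%d: core switch down", i)
	}
	close(jobs)
	wg.Wait()
}
```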
That concludes the discussion of how OpenFalcon copes with high-concurrency scenarios. Thank you for reading; if you want to learn more about the field, follow this site for more practical articles.