Real-time monitoring and early warning system without code 04/28 Update SLTechnology News&Howtos


2025-04-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article describes how to build a real-time monitoring and early warning system without writing code, relying on the configuration of existing tools. It is shared here as a reference.

Why monitor?

When a service is released to production, how do you know everything is working? For example, after releasing to 5 servers, how can you see at a glance whether requests are arriving and being served normally?

In one incident, the production database connection was mistakenly configured to point at the Beta database. Tracking down this low-level mistake took more than a dozen people an entire night.

When a core service fails and produces a flood of errors, how do you determine what went wrong?

With SOA, calls to a particular service can become very slow. Can that latency be measured?

Because we run many business systems, large volumes of system logs and business logs are generated every day. A single server in the streaming business can produce 400 MB of logs a day; just opening such a file takes several minutes, so browsing its contents directly is practically impossible. This makes development and operations much harder. Now that the business is distributed, logs are scattered across many servers, which makes viewing them and computing statistics even less efficient. Collecting logs from different nodes and machines in real time, so they can be accessed and analyzed offline or online, has therefore become urgent. Against this background, the company produced a preliminary architecture design for a unified log platform.

In the information age, logs are enormously valuable: effective monitoring, maintenance, optimization, and improvement of a system all depend on collecting and analyzing them. The rest of this article describes a unified log platform, built in the Internet spirit of "short and fast", that fits our existing business systems. It is divided into a business log monitoring platform and a software and hardware service monitoring platform.

Overall Design of the Business Log Platform

The architecture above is the final target plan. The unified log monitoring system centralizes all system logs and business logs and uploads them through Flume or Logstash to the log center (a Kafka cluster). From there, logs are consumed by Storm, Spark, and similar systems for real-time analysis and processing, stored directly in HDFS for offline analysis, or written to Elasticsearch to support queries. They can also trigger exception alarms directly or feed metric monitoring queries.

For our current business volume, the architecture above is somewhat "heavy" and is better treated as a future goal. For now, the following framework can serve as a reference:

This setup is mainly configuration-based and has no impact on existing business systems. In a Windows environment, FileBeat can watch local log files and upload both their full existing content and new increments. Stable logs, such as system logs or framework logs (for example HAProxy access logs and system exception logs), are written by rsyslog under the local0 facility, and Logstash then uploads the new local0 entries to the log center according to its configuration. In a Java environment, logs can be sent directly to Logstash through log4j.
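The "incremental" upload that a shipper such as FileBeat performs comes down to remembering how far each file has already been read. The Python sketch below illustrates only that idea. It is a simplified assumption, not FileBeat's implementation: `IncrementalReader` is a hypothetical name, and log rotation and inode tracking, which real shippers handle, are ignored.

```python
import os
import tempfile


class IncrementalReader:
    """Return only the complete lines appended to a file since the last call.

    Hypothetical helper illustrating offset tracking; no rotation handling.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0  # byte position of the first unread byte

    def read_new_lines(self) -> list[str]:
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0  # file was truncated: start again from the beginning
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        end = data.rfind(b"\n")
        if end == -1:
            return []  # no complete line yet; keep the partial one for later
        chunk = data[: end + 1]
        self.offset += len(chunk)
        return chunk.decode("utf-8").splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")
        with open(path, "wb") as f:
            f.write(b"line 1\nline 2\n")
        reader = IncrementalReader(path)
        print(reader.read_new_lines())  # ['line 1', 'line 2']
        with open(path, "ab") as f:
            f.write(b"line 3\npart")
        print(reader.read_new_lines())  # ['line 3']
```

A trailing partial line is held back until its newline arrives, so a half-written log entry is never shipped.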

Log processing layer

In Logstash, logs can be classified and lightly processed before being forwarded.

Logs can be aggregated, with a separate index built for each business and stored in Elasticsearch for querying. When an exception log is found, it is forwarded to the monitoring center, which alerts the responsible business team, so problems are discovered and reported sooner. Metrics from access logs and call logs can be counted and sent to the monitoring center to track call trends. Once call-chain tracing is in place, system performance bottlenecks become clear at a glance.
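The three routing decisions described above (one index per business, errors forwarded for alerting, call counts for trend monitoring) can be sketched in plain Python. This is only an illustration of the logic, not Logstash configuration: the event fields (`ts`, `business`, `level`, `endpoint`) and the `log-<business>-YYYY.MM.dd` index naming are hypothetical assumptions.

```python
from collections import Counter
from datetime import datetime, timezone


def index_name(event: dict) -> str:
    """Daily index per business, e.g. "log-order-2025.04.28" (hypothetical naming scheme)."""
    day = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"log-{event['business']}-{day:%Y.%m.%d}"


def process(events: list[dict]):
    indexed: dict[str, list[dict]] = {}  # index name -> documents (stands in for Elasticsearch)
    alerts: list[dict] = []              # error events forwarded to the monitoring center
    calls: Counter = Counter()           # per-endpoint call counts for trend monitoring
    for event in events:
        indexed.setdefault(index_name(event), []).append(event)
        if event["level"] == "ERROR":
            alerts.append(event)
        if "endpoint" in event:
            calls[event["endpoint"]] += 1
    return indexed, alerts, calls


if __name__ == "__main__":
    events = [
        {"ts": 1745798400, "business": "order", "level": "INFO", "endpoint": "/pay"},
        {"ts": 1745798460, "business": "order", "level": "ERROR", "endpoint": "/pay"},
        {"ts": 1745798400, "business": "user", "level": "INFO"},
    ]
    indexed, alerts, calls = process(events)
    print(sorted(indexed))  # ['log-order-2025.04.28', 'log-user-2025.04.28']
    print(len(alerts), calls["/pay"])  # 1 2
```

Date-suffixed daily indices make it easy to query one business at a time and to retire old data index by index.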

Log storage layer

In Elasticsearch, an index (analogous to a database) is created for each business, and types (analogous to tables) are defined within it as the business requires; note that mapping types were removed in later Elasticsearch versions. Historical data that is no longer needed can be moved to HDFS to reduce the load on ES.
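If indices carry a date suffix, deciding which ones to move to HDFS reduces to a date comparison. The sketch below assumes the hypothetical `<prefix>-YYYY.MM.dd` naming and a retention period chosen by you; it only selects candidates and does not call Elasticsearch or HDFS.

```python
from datetime import date, datetime, timedelta


def indices_to_archive(index_names: list[str], today: date, keep_days: int) -> list[str]:
    """Return date-suffixed indices older than keep_days (candidates to move to HDFS)."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        try:
            day = datetime.strptime(name.rsplit("-", 1)[-1], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index (e.g. ".kibana"); leave it alone
        if day < cutoff:
            old.append(name)
    return sorted(old)


if __name__ == "__main__":
    names = ["log-order-2025.04.01", "log-order-2025.04.27", "log-user-2025.03.15", ".kibana"]
    print(indices_to_archive(names, today=date(2025, 4, 28), keep_days=14))
    # ['log-order-2025.04.01', 'log-user-2025.03.15']
```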

Presentation layer: Kibana

Kibana, part of the ELK stack, is an open-source analytics and visualization platform for Elasticsearch. It is used to interactively search and view data stored in Elasticsearch indices, and its many chart types support advanced data analysis and presentation.

Kibana makes large volumes of data easier to understand. It is easy to use, and its browser-based interface makes it quick to build dashboards that show the results of Elasticsearch queries in real time.

Data from Logstash, ES-Hadoop, Beats, or third-party tools such as Apache Flume and Fluentd can easily be loaded into Elasticsearch and then explored in Kibana.

With this setup you can:

Monitor the overall health of ES

Query the contents of ES indices directly

Filter log data with simple queries

Display real-time graphical statistics in dashboard windows

Implement log monitoring and alerting with ElastAlert
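ElastAlert's `frequency` rule type fires when at least `num_events` matching events occur within a `timeframe`. The Python sketch below mimics that sliding-window idea so the behavior is easy to reason about; it is not ElastAlert's code, it assumes events arrive in time order, and `FrequencyRule` is a hypothetical name.

```python
from collections import deque


class FrequencyRule:
    """Fire when at least num_events events fall inside a sliding timeframe (seconds)."""

    def __init__(self, num_events: int, timeframe_s: float) -> None:
        self.num_events = num_events
        self.timeframe_s = timeframe_s
        self.times: deque = deque()  # timestamps of matching events still in the window

    def add(self, ts: float) -> bool:
        """Record one matching event; return True if the rule fires."""
        self.times.append(ts)
        while self.times and self.times[0] <= ts - self.timeframe_s:
            self.times.popleft()  # drop events that slid out of the window
        return len(self.times) >= self.num_events


if __name__ == "__main__":
    rule = FrequencyRule(num_events=3, timeframe_s=60)
    print([rule.add(t) for t in (0, 10, 100, 110, 120)])
    # [False, False, False, False, True]
```

Counting within a window, rather than alerting on every error line, keeps a single transient error from paging anyone.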

What the platform still lacks are alarms for conditions such as the number of MySQL connections, or anomalies in specific services such as streaming-service data. When such an exception is triggered, the people responsible should be notified promptly by SMS, email, and so on.

An example of fault information:

The "logs" mentioned above are not limited to log messages; they can also be business data.
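A MySQL connection alarm of the kind described above can be reduced to comparing `Threads_connected` (from `SHOW GLOBAL STATUS LIKE 'Threads_connected'`) with `max_connections` (from `SHOW VARIABLES LIKE 'max_connections'`). The sketch below hard-codes those values instead of querying a server; the 80% threshold and the `notify` senders are hypothetical placeholders for real SMS or email gateways.

```python
from typing import Callable, Optional


def connection_alert(threads_connected: int, max_connections: int,
                     warn_ratio: float = 0.8) -> Optional[str]:
    """Return an alert message when connection usage reaches warn_ratio, else None."""
    usage = threads_connected / max_connections
    if usage >= warn_ratio:
        return f"MySQL connections at {usage:.0%} ({threads_connected}/{max_connections})"
    return None


def notify(message: str, senders: list[Callable[[str], None]]) -> None:
    """Fan the message out to every channel (SMS, email, ...); senders are hypothetical."""
    for send in senders:
        send(message)


if __name__ == "__main__":
    # Values would come from SHOW GLOBAL STATUS / SHOW VARIABLES; hard-coded here.
    msg = connection_alert(threads_connected=180, max_connections=200)
    if msg is not None:
        notify(msg, [print])  # prints: MySQL connections at 90% (180/200)
```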

Design of the Software and Hardware Service Monitoring Platform

Problems often show up first in the business-layer logs. For example, when saving data to MySQL we frequently saw database connection timeouts, but the problem was only discovered some time later, when operations staff noticed it and told us. By then the production conditions could no longer be reproduced, so we were left to guess from experience whether the cause was a network problem on the server, a genuine database connection problem, or a bug in the program. It is therefore necessary to record software and hardware monitoring data from the production environment so the state at the time of a failure can be reviewed.

After consulting several people and comparing the monitoring solutions used by major companies, we chose Zabbix.

An overview of recent problems across all services

Access performance of web servers and APIs: HAProxy, IIS, Tomcat

Real-time graphs of the number of TCP connections on all ports of each server, the number of MySQL database connections, and Redis performance

Custom aggregated views of the recent status of each server: CPU, memory, traffic

A health overview of all servers, clear at a glance

Automatic registration and monitoring of new servers

Alerting via email, WeChat, SMS, and more

Other features

It can monitor the status of Linux and Windows platform services, printers, file systems, network interfaces, SNMP OIDs, databases, and more.

Problem thresholds, called triggers in Zabbix, can be customized flexibly and are stored in the back-end database.

Advanced alerting configuration: you can customize alert escalation, recipients, and delivery methods.

A built-in housekeeping mechanism can be configured to clean up the historical data stored in the database.

The web front end is written in PHP and can be accessed easily from a browser.

The Zabbix API provides a programmatic interface so that third-party programs can integrate quickly.

Flexible permission system.
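Two of the features above, triggers and escalation, can be illustrated with a small Python model. This is an assumption-laden sketch of the concepts, not Zabbix code: `Trigger` averages the last few samples and uses a separate, lower recovery threshold (Zabbix also lets a trigger have its own recovery expression), and `steps_to_fire` returns escalation steps that become due between two checks. All names, thresholds, and recipients are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean


class Trigger:
    """Averaged threshold with a separate, lower recovery threshold (hysteresis)."""

    def __init__(self, window: int, problem_above: float, recover_below: float) -> None:
        self.values: deque = deque(maxlen=window)  # last `window` samples of one item
        self.problem_above = problem_above
        self.recover_below = recover_below
        self.in_problem = False

    def update(self, value: float) -> bool:
        """Add a sample and return True while the trigger is in the PROBLEM state."""
        self.values.append(value)
        avg = mean(self.values)
        if not self.in_problem and avg > self.problem_above:
            self.in_problem = True
        elif self.in_problem and avg < self.recover_below:
            self.in_problem = False
        return self.in_problem


@dataclass(frozen=True)
class Step:
    after_min: float             # minutes after the problem started
    recipients: tuple[str, ...]  # hypothetical user names
    channel: str                 # e.g. "wechat", "sms", "email"


def steps_to_fire(steps: list[Step], prev_min: float, now_min: float) -> list[Step]:
    """Escalation steps whose delay falls in (prev_min, now_min]; call once per check."""
    return [s for s in steps if prev_min < s.after_min <= now_min]


if __name__ == "__main__":
    cpu = Trigger(window=3, problem_above=80, recover_below=60)
    print([cpu.update(v) for v in (70, 90, 95, 50, 40, 30)])
    # [False, False, True, True, True, False]

    policy = [Step(0, ("oncall",), "wechat"),
              Step(15, ("oncall", "lead"), "sms"),
              Step(60, ("manager",), "email")]
    print(steps_to_fire(policy, -1, 0))  # first check: the WeChat step
    print(steps_to_fire(policy, 0, 20))  # 20 minutes in: the SMS step
```

The gap between the problem and recovery thresholds stops a value hovering near the limit from flapping between PROBLEM and OK and flooding the escalation chain.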

Combining the business logs above with software and hardware monitoring data helps development and operations find problems in real time and resolve them more efficiently. In the early stage, monitoring and report display can be achieved through configuration alone, with zero code.

Extensibility

Spark can be used to analyze data in real time, automatically catch abnormal data, and send alerts directly.
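The article does not say how "abnormal data" would be detected. As one simple, swapped-in technique, a rolling z-score flags a value that lies more than a few standard deviations from the preceding window. The sketch below is plain Python rather than Spark code; the window size and threshold are hypothetical defaults, and in a Spark job the same per-window logic would run over a stream.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values: list[float], window: int = 10, threshold: float = 3.0) -> list[int]:
    """Indices whose value deviates more than `threshold` std devs from the previous window."""
    history: deque = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # With sigma == 0 this flags any value that differs from a flat history.
            if abs(v - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(v)
    return anomalies


if __name__ == "__main__":
    requests_per_min = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 50, 11]
    print(detect_anomalies(requests_per_min))  # [10]
```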

On top of Zabbix, an application-level early warning and monitoring system can be developed to match your own business needs.
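Building on Zabbix usually means talking to its JSON-RPC 2.0 API at `api_jsonrpc.php`. The sketch below builds a `host.get` request body and shows one way to POST it. It assumes a Zabbix version that accepts an API token in an `Authorization: Bearer` header; older versions expect an `auth` field in the request body instead, so check the documentation for your version. The server URL and token are hypothetical.

```python
import json
import urllib.request


def jsonrpc_request(method: str, params: dict, request_id: int = 1) -> bytes:
    """Build a JSON-RPC 2.0 request body in the shape the Zabbix API expects."""
    body = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return json.dumps(body).encode("utf-8")


def call_zabbix(url: str, token: str, body: bytes) -> dict:
    """POST the request; assumes token auth via an Authorization: Bearer header."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json-rpc",
                 "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = jsonrpc_request("host.get", {"output": ["hostid", "host"]})
    print(body.decode())
    # Hypothetical server address and token; uncomment to run against a real server.
    # print(call_zabbix("http://zabbix.example.com/api_jsonrpc.php", "TOKEN", body))
```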

Later, Kafka can be added to centralize logs. The main reasons for choosing a Kafka cluster as the log center are:

1. A distributed architecture that supports horizontal scaling.

2. High throughput: an ordinary server can process hundreds of thousands of messages per second, far above our peak of 15,000 messages per second.

3. Messages are persisted in topic partitions, so they can be consumed repeatedly.

4. Expired data can be deleted periodically according to the broker configuration.
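Points 3 and 4 are easiest to see in a toy model. The in-memory Python class below is not Kafka and has none of its durability or distribution; it only shows why a persisted, partitioned log allows repeatable consumption: each consumer group keeps its own offset, so Elasticsearch indexing and HDFS archiving can read the same messages independently, and a group can rewind to replay. Retention is time-based here; real Kafka brokers delete whole log segments according to settings such as `log.retention.hours`. Group names are hypothetical, and crc32 stands in for Kafka's murmur2-based key partitioning.

```python
import zlib


class MiniTopic:
    """Toy, in-memory model of a partitioned topic; not Kafka's implementation."""

    def __init__(self, num_partitions: int, retention_s: float) -> None:
        self.logs: list[list[tuple[int, float, str]]] = [[] for _ in range(num_partitions)]
        self.next_offset = [0] * num_partitions
        self.retention_s = retention_s
        self.committed: dict[tuple[str, int], int] = {}  # (group, partition) -> next offset

    def produce(self, key: str, msg: str, ts: float) -> int:
        # Same key -> same partition (crc32 stands in for Kafka's murmur2 partitioner).
        p = zlib.crc32(key.encode()) % len(self.logs)
        self.logs[p].append((self.next_offset[p], ts, msg))
        self.next_offset[p] += 1
        return p

    def poll(self, group: str, partition: int) -> list[str]:
        """Return unread messages for this consumer group and advance its offset."""
        start = self.committed.get((group, partition), 0)
        batch = [msg for off, _, msg in self.logs[partition] if off >= start]
        self.committed[(group, partition)] = self.next_offset[partition]
        return batch

    def seek(self, group: str, partition: int, offset: int) -> None:
        """Rewind a group so it re-reads retained messages (repeatable consumption)."""
        self.committed[(group, partition)] = offset

    def delete_expired(self, now: float) -> None:
        """Drop messages older than the retention period."""
        for log in self.logs:
            log[:] = [rec for rec in log if now - rec[1] < self.retention_s]


if __name__ == "__main__":
    topic = MiniTopic(num_partitions=1, retention_s=3600)
    topic.produce("host-1", "m1", ts=0)
    topic.produce("host-2", "m2", ts=10)
    print(topic.poll("es-indexer", 0))     # ['m1', 'm2']
    print(topic.poll("hdfs-archiver", 0))  # ['m1', 'm2'] (independent group)
    print(topic.poll("es-indexer", 0))     # []
    topic.seek("es-indexer", 0, 0)
    print(topic.poll("es-indexer", 0))     # ['m1', 'm2'] (replayed)
```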

These are the main steps for building a real-time monitoring and early warning system without code. Many of the points covered here come up regularly in day-to-day work, and I hope this article proves useful.
