What is the hierarchical service monitoring strategy for distributed systems in big data development?


2025-04-25 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article shares the hierarchical monitoring strategy for services in a distributed architecture, as used in big data development. The editor finds it very practical, so it is shared here for your reference. Follow along below.

I. Distributed failures

With sound ideas and well-specified design documents, designing a distributed architecture and developing its business logic are relatively manageable. What is hard by comparison is dealing with sudden failures in the production environment of a distributed architecture.

In practice there is a maddening pattern: the more complex the core business, the more you worry about problems, and the more likely things are to go wrong.

So when a link in a core service fails, locating the problem quickly is a real headache. This is especially true in special cases where the problem is vague and hard to reproduce while customers or managers keep pressing for a fix. Most developers have painful memories of this kind of scenario. Worse, one person may be responsible for the component where the problem surfaces, while the actual fault lies in a request to another service on the same link. When this happens often, finger-pointing between teams rises sharply.

The more complex the system, and the more experienced the developers or operations staff, the more they rely on a monitoring system. They especially value full-link monitoring, which covers the underlying infrastructure, the network, middleware, service links, log observation, and early warning. It lets them locate problems quickly and saves time and trouble.

II. Full-link monitoring

1. Monitoring levels

In a distributed system, the architecture and levels that must be monitored are extremely complex. Taken as a whole, they are usually divided into three levels: application services, software services, and hardware services.

In general, operations staff manage hardware services, while developers manage application services and software services.
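The split of responsibilities above can be expressed as a small routing table that sends each alert to the team that owns its layer. This is only a minimal Python illustration. The `Layer` enum, the `OWNERS` mapping, and the team names "dev" and "ops" are hypothetical and do not come from any particular monitoring product.

```python
from enum import Enum
from typing import Dict, Tuple


class Layer(Enum):
    """The three monitoring levels described above."""
    APPLICATION = "application"
    SOFTWARE = "software"  # middleware: database, cache, message queue
    HARDWARE = "hardware"  # CPU, memory, network


# Hypothetical ownership table mirroring the dev/ops split described above.
OWNERS: Dict[Layer, str] = {
    Layer.APPLICATION: "dev",
    Layer.SOFTWARE: "dev",
    Layer.HARDWARE: "ops",
}


def route_alert(layer: Layer, message: str) -> Tuple[str, str]:
    """Return (team, message) so an alert reaches the team that owns the layer."""
    return OWNERS[layer], f"[{layer.value}] {message}"


print(route_alert(Layer.HARDWARE, "CPU usage above threshold"))
# ('ops', '[hardware] CPU usage above threshold')
```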

2. Application service

The application layer hosts the business logic that developers write, and it is where sudden problems are most likely to appear. If you stay at a company for a long time and build many business lines, you may start to feel less like a developer and more like a handyman who spends a large part of every day dealing with problems. Application-layer monitoring involves the following core modules:

Request traffic

In any service, high-concurrency traffic exposes all kinds of problems, so traffic on the core interfaces is a key focus of monitoring.
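Watching traffic on core interfaces usually comes down to counting requests per interface over a recent time window. The sketch below uses a sliding window of timestamps. It is one simple approach, not a prescribed one, and the class name and the interface path "/order/create" are hypothetical. The `now` parameter lets the example run deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class TrafficCounter:
    """Counts requests per interface over a sliding time window."""

    def __init__(self, window_seconds: float = 1.0) -> None:
        self.window = window_seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, interface: str, now: Optional[float] = None) -> None:
        """Record one request; `now` can be injected for testing."""
        self.hits[interface].append(time.monotonic() if now is None else now)

    def rate(self, interface: str, now: Optional[float] = None) -> float:
        """Requests per second on `interface` over the last window."""
        now = time.monotonic() if now is None else now
        q = self.hits[interface]
        # Evict timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q) / self.window


counter = TrafficCounter(window_seconds=2.0)
for t in (0.0, 0.5, 1.0, 1.5):
    counter.record("/order/create", now=t)
print(counter.rate("/order/create", now=2.2))  # 1.5 (three requests in the last 2 s)
```

A monitoring job could compare this rate against a per-interface limit and raise a warning when traffic on a core interface surges.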

Service link

When a request fails, you need to determine quickly which service the problem lies in, or between which services, in order to resolve it quickly.
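A common way to answer "which service, or between which services" is to give every request a trace ID that travels with it across hops, and to record one span per hop. The sketch below uses an in-memory list as the span collector and hypothetical service names. A real deployment would use a dedicated tracing backend. The example reports failed spans that have no failed child, which is where the error most likely originated.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every hop of one request
    span_id: str              # unique per hop
    parent_id: Optional[str]  # the caller's span, None at the entry point
    service: str
    ok: bool


SPANS: List[Span] = []  # stand-in for a trace collector


def record_span(service: str, trace_id: str, parent_id: Optional[str], ok: bool = True) -> str:
    """Record one hop of a request and return its span id for downstream calls."""
    span_id = uuid.uuid4().hex
    SPANS.append(Span(trace_id, span_id, parent_id, service, ok))
    return span_id


def root_cause(trace_id: str) -> List[str]:
    """Failed services on this trace whose downstream calls did not fail."""
    spans = [s for s in SPANS if s.trace_id == trace_id]
    failed_parents = {s.parent_id for s in spans if not s.ok}
    return [s.service for s in spans if not s.ok and s.span_id not in failed_parents]


# gateway -> order-service -> {inventory-service (fails), payment-service}
tid = uuid.uuid4().hex
gw = record_span("gateway", tid, None, ok=False)
order = record_span("order-service", tid, gw, ok=False)
record_span("inventory-service", tid, order, ok=False)
record_span("payment-service", tid, order)
print(root_cause(tid))  # ['inventory-service']
```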

Log system

Logging for core interfaces is also essential. The analysis results from the log system usually reveal the system's anomalies, which can then become the focus of optimization.
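As a small illustration of finding anomalies in core-interface logs, the sketch below parses JSON log lines and flags requests far slower than the median. The median-times-factor rule is a simple heuristic chosen for this example, not the article's method. The field names `api` and `latency_ms`, the factor of 5, and the sample values are assumptions.

```python
import json
import statistics
from typing import Dict, List


def slow_requests(log_lines: List[str], factor: float = 5.0) -> List[Dict]:
    """Return log entries whose latency exceeds `factor` x the median latency."""
    entries = [json.loads(line) for line in log_lines]
    median = statistics.median([e["latency_ms"] for e in entries])
    return [e for e in entries if e["latency_ms"] > factor * median]


# Illustrative JSON log lines for core interfaces (made-up values).
logs = [
    '{"api": "/order/create", "latency_ms": 42}',
    '{"api": "/order/create", "latency_ms": 38}',
    '{"api": "/order/query", "latency_ms": 12}',
    '{"api": "/order/create", "latency_ms": 950}',
    '{"api": "/order/query", "latency_ms": 15}',
]
print(slow_requests(logs))  # [{'api': '/order/create', 'latency_ms': 950}]
```

The median is used instead of the mean so that a single extreme value does not drag the baseline up and hide itself.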

3. Software service

To handle the many complex business scenarios of a distributed system, various middleware is usually introduced, such as databases, caches, and message queues (MQ). These middleware components usually provide their own monitoring and management interfaces.

Database: Druid is widely used for monitoring and analysis.

Message queue: RocketMQ is common and is monitored through its console.

Redis cache: Redis provides commands, such as INFO, that return monitoring data.
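Redis's INFO command returns plain text made of `# Section` header lines and `key:value` lines. A minimal parser for that format is shown below. In practice the raw text would come from `redis-cli INFO` or a client library. Here it is a hard-coded sample with made-up values, so the example runs without a server. The helper names are hypothetical. `used_memory`, `maxmemory`, and `connected_clients` are standard INFO fields.

```python
from typing import Dict, Optional


def parse_redis_info(raw: str) -> Dict[str, str]:
    """Turn INFO text (section headers plus key:value lines) into a dict."""
    metrics: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and "# Section" headers
            continue
        key, _, value = line.partition(":")
        metrics[key] = value
    return metrics


def memory_usage(metrics: Dict[str, str]) -> Optional[float]:
    """Fraction of maxmemory in use, or None when no limit is configured."""
    limit = int(metrics.get("maxmemory", "0"))
    if limit == 0:  # maxmemory 0 means no memory limit is set
        return None
    return int(metrics["used_memory"]) / limit


# Made-up sample in the shape returned by INFO (CRLF-separated lines).
SAMPLE = (
    "# Memory\r\nused_memory:1048576\r\nmaxmemory:4194304\r\n"
    "# Clients\r\nconnected_clients:12\r\n"
)
info = parse_redis_info(SAMPLE)
print(info["connected_clients"], memory_usage(info))  # 12 0.25
```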

Some companies even build an aggregation platform at the middleware layer that combines management, operations, and monitoring. This makes it easier to analyze problems as a whole.

4. Hardware service

At the hardware level, operations staff focus most on three core resources: CPU, memory, and network. Failures of these underlying resources are more often triggered by upper-layer application services or middleware services than by the hardware itself.

There are many mature frameworks for hardware-level monitoring, such as Zabbix and Grafana. These components are feature-rich, and their use is not limited to the hardware layer.
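On Linux, the kernel exposes memory figures in /proc/meminfo, which is one of the sources a host agent can read. The sketch below computes a memory-usage ratio from text in that format and applies a threshold. It is an illustration only, not how any particular tool works internally. The 0.7 threshold and the sample values are hypothetical.

```python
from typing import Dict


def memory_used_ratio(meminfo: str) -> float:
    """Share of RAM in use, computed from /proc/meminfo-style text."""
    values: Dict[str, int] = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # figures are in kB
    # MemAvailable estimates memory usable without swapping.
    return 1 - values["MemAvailable"] / values["MemTotal"]


SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    4000000 kB"""

ratio = memory_used_ratio(SAMPLE)
if ratio > 0.7:  # hypothetical alert threshold
    print(f"memory usage high: {ratio:.0%}")  # memory usage high: 75%
```

On a Linux host, `open("/proc/meminfo").read()` could be passed in place of the sample.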

5. Avalanche effect

Some faults paralyze services across a large area, which is known as the avalanche effect. If the fault source is not handled quickly and there is no circuit-breaker mechanism, the whole service link can collapse, and this is a common problem. So when dealing with faults, we should learn to identify the core fault points from full-stack monitoring data and global correlation, quickly cut off the failing single point of service, and keep the system as a whole available.
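The circuit breaker mentioned above can be modeled as a small state machine. After a number of consecutive failures it opens and answers from a fallback immediately, so callers stop piling requests onto the failing service. After a cool-down it lets a trial call through. This is a minimal, single-threaded Python sketch. The thresholds, the `inventory_lookup` function, and the fallback value are hypothetical. Production circuit-breaker libraries also handle concurrency and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # let a trial request through
        return "open"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # fail fast: do not touch the failing service
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the breaker again
        return result


def inventory_lookup() -> str:
    # Hypothetical downstream call that is currently failing.
    raise ConnectionError("inventory-service unreachable")


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=10.0)
for _ in range(5):
    print(breaker.call(inventory_lookup, fallback=lambda: "stock unknown"), breaker.state)
```

The injectable `clock` keeps the cool-down testable without sleeping.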

III. Points to note

Although a monitoring system is very valuable, it is still hard to build. It calls for the right mindset: unlike business development, it does not need to satisfy every requirement in every area. The basic strategies for a monitoring system are as follows.

1. Selectivity

Not every service in every environment, and not every interface, needs monitoring. Monitoring usually focuses on core links, core middleware, and the service environment.

Examples include transaction links, transaction databases, and deployment environments, as well as high-concurrency services for major customers that must be handled promptly when something goes wrong. To put it bluntly, revenue-generating services deserve the most attention.

Problems in non-critical services leave some buffer time, so there is no need to spend effort adding monitoring to them. People who build monitoring systems have a saying: simple link monitoring is error-prone, and complex link monitoring is even more complex and more error-prone. Even so, monitoring is worth doing because it helps resolve faults better.
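Selectivity can be enforced in code by instrumenting only an allowlist of core interfaces and leaving everything else untouched. The decorator below is a minimal sketch of that idea. `CORE_INTERFACES`, `LATENCIES`, and the example functions are hypothetical names. A real system would ship the timings to a metrics backend rather than keep them in a dict.

```python
import functools
import time
from typing import Callable, Dict, List

# Hypothetical allowlist: only revenue-critical interfaces are instrumented.
CORE_INTERFACES = {"create_order", "pay_order"}
LATENCIES: Dict[str, List[float]] = {}


def monitored(fn: Callable) -> Callable:
    """Time calls to core interfaces; return other functions unchanged."""
    if fn.__name__ not in CORE_INTERFACES:
        return fn  # non-critical path: zero monitoring overhead

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            LATENCIES.setdefault(fn.__name__, []).append(time.perf_counter() - start)

    return wrapper


@monitored
def create_order(item: str) -> str:
    return f"order for {item}"


@monitored
def list_banners() -> List[str]:
    return ["spring-sale"]


create_order("book")
list_banners()
print(sorted(LATENCIES))  # ['create_order']
```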

2. Independence

A failure of the monitoring system itself must not affect normal business processes. Even if no monitoring information is available in some situation, the monitoring service must never disrupt normal business services.
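One way to honor this rule in code is to make metric reporting non-blocking and exception-safe. Samples go into a bounded in-memory buffer, and when the buffer is full or anything else goes wrong, the sample is dropped instead of the business call. This sketch assumes a background shipper that drains the buffer to the monitoring backend, which is not shown. `SafeReporter` and `pay_order` are hypothetical names.

```python
import logging
import queue

log = logging.getLogger("monitoring")


class SafeReporter:
    """Buffers metrics without ever blocking or raising into business code."""

    def __init__(self, maxsize: int = 1000) -> None:
        self.buffer: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def report(self, name: str, value: float) -> None:
        try:
            self.buffer.put_nowait((name, value))
        except queue.Full:
            self.dropped += 1  # lose the sample, never the request
        except Exception:
            log.exception("metric reporting failed")  # contain monitoring bugs


reporter = SafeReporter(maxsize=2)


def pay_order(order_id: str) -> str:
    reporter.report("pay_order.calls", 1)  # monitoring side effect
    return f"paid {order_id}"  # business result is unaffected


for oid in ("A1", "A2", "A3"):
    print(pay_order(oid))
print("dropped samples:", reporter.dropped)  # dropped samples: 1
```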

3. Integrity

An aggregated monitoring system shows the global status of the monitored links. This makes it possible to pinpoint where a fault is and analyze its cause quickly.
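With every layer's status in one place, a simple correlation heuristic can narrow down where a fault starts. Among the unhealthy components, the ones whose own dependencies are all healthy are the likeliest sources, and the rest are probably victims of a failure further down the call chain. The sketch assumes a hand-written dependency map with hypothetical component names. It is only a heuristic, because a component can also fail for its own reasons while one of its dependencies is down.

```python
from typing import Dict, List, Set

# Hypothetical topology spanning application and middleware layers:
# each node maps to the nodes it calls or depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "gateway": ["order-service", "user-service"],
    "order-service": ["inventory-service", "mysql"],
    "user-service": ["redis"],
    "inventory-service": ["mysql"],
    "mysql": [],
    "redis": [],
}


def likely_root_causes(unhealthy: Set[str]) -> List[str]:
    """Unhealthy nodes whose own dependencies are all healthy."""
    return sorted(
        node for node in unhealthy
        if not any(dep in unhealthy for dep in DEPENDS_ON.get(node, []))
    )


# Alarms fire all along the chain, but they share one source.
print(likely_root_causes({"gateway", "order-service", "inventory-service", "mysql"}))
# ['mysql']
```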

4. Early warning

For example, when CPU usage suddenly spikes, a middleware service suddenly stops, or memory consumption grows too high, the monitoring system can raise an early warning. It then notifies the person in charge by email or message so that they can respond quickly. Most developers know this scenario well and have unpleasant memories of it.
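An early-warning rule of this kind can be written as a threshold that must be exceeded for several consecutive samples before anyone is notified. This keeps single spikes from paging people. The metric name, threshold, and `notify` callback are hypothetical. The callback stands in for the email or message channel described above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AlertRule:
    metric: str
    threshold: float
    consecutive: int = 3  # breaches in a row required, to filter out brief spikes
    notify: Callable[[str], None] = print  # stand-in for email or IM delivery
    _breaches: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Feed one sample; return True when a notification is sent."""
        if value <= self.threshold:
            self._breaches = 0
            return False
        self._breaches += 1
        if self._breaches == self.consecutive:  # fire once per episode
            self.notify(f"ALERT {self.metric}={value} above {self.threshold} "
                        f"for {self.consecutive} samples")
            return True
        return False


rule = AlertRule(metric="cpu_percent", threshold=90.0)
for sample in (70, 95, 96, 97, 98, 60):
    rule.observe(sample)
# prints once: ALERT cpu_percent=97 above 90.0 for 3 samples
```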

