Structure and orientation of three-dimensional operation and maintenance

2025-04-14 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Preface

As more enterprises move their applications to the cloud, the scale and complexity of cloud applications keep growing, which poses new challenges for operating and maintaining them. Huawei Cloud AOM (Application Operations Management) targets the operation and maintenance of large-scale enterprise applications. Through practice, it has evolved into a complete three-dimensional operation and maintenance system for cloud applications.

1. Architecture of common cloud applications

In the early days of cloud applications, most enterprises purchased IaaS-layer cloud resources (mostly infrastructure such as hosts and other computing resources) to build their own clusters, and operations staff focused mainly on host monitoring. At the same time, enterprises built their own application, database, and other monitoring systems to operate the application layer and the business layer. As container technology became popular, more and more enterprises turned to CaaS and PaaS to manage applications. With development based on microservice frameworks, business implementations now use more cloud services, such as distributed middleware, function services, and AI services. Operation and maintenance has likewise shifted to cloud operation and maintenance services.

A typical modern cloud application architecture:

After domain name resolution, a request for a static resource is served directly from the CDN on a cache hit; on a miss, the CDN fetches the resource from the origin. Dynamic requests go directly to the web service. Before requests reach the layer 4 or layer 7 ELB, most enterprise applications also route them through a WAF to filter out abnormal traffic.

After the ELB, requests arrive at the business application servers. Most business instances follow a distributed architecture in which microservices call one another. In general, enterprise operations staff pay the most attention to the application instance layer, which consists mostly of services developed by the enterprise itself.

In the persistence layer, the middleware offered by each cloud service provider (CSP) differs. Huawei Cloud users commonly use middleware such as distributed cache and distributed database services. Because these services provide dynamic scaling and higher-level SLAs, more and more enterprises can use them without a professional DBA and can develop with greater agility.

A problem in any of these cloud services or resources can cause abnormal application KPIs and degrade the user experience, which in turn affects business operations. For an enterprise that uses public cloud services, investing a lot of manpower to build its own operations system and correlate the entire request path is very costly. Through its practice of helping enterprises operate their applications, Huawei Cloud AOM has therefore built a three-dimensional operation and maintenance system that supports one-stop operation and maintenance. The following sections introduce the positioning and structure of three-dimensional operation and maintenance.

2. Positioning and structure of three-dimensional operation and maintenance

Positioning of three-dimensional operation and maintenance:

Three-dimensional operation and maintenance centers on monitoring user applications and covers user experience monitoring, application performance monitoring, and infrastructure monitoring in one place.

With reference to the typical cloud application architecture above, the resources along the business request path are divided into layers, and a specialized operation and maintenance subsystem is designed for each layer. The data from the different subsystems is then correlated and analyzed together to build a cloud operation and maintenance platform that maximizes the value of the data, gives operations staff a unified operations center, and achieves one-stop three-dimensional operation and maintenance. The layering of three-dimensional operation and maintenance is as follows:

Layering of three-dimensional operation and maintenance

Building three-dimensional operation and maintenance means more than covering the application's resources end to end. It also requires data analysis across the various kinds of operation and maintenance data, and a friendly interface built with a variety of visualization techniques. Building the system therefore includes the following work:

Resource modeling

In essence, resource modeling means registering the resources an application depends on in a CMDB. The CMDB for cloud services, however, differs from the one used to operate a self-built data center. The latter mostly corresponds to a CMDB at the SRE (site reliability engineer) level, whereas the CMDB needed for application operation and maintenance management is tailored to cloud resources. It has the following main characteristics:

Separates the business model from the inventory resource model (explained in detail in a later article)

The inventory model can describe the different cloud resources of different cloud services

Supports mapping between cloud resources within a cloud service

Supports mapping between resources across cloud services

Supports tag management for cloud resources (adding tags, synchronizing tags, querying by tag)

Supports historical resource snapshots

Resource modeling is the foundation of all data correlation and of the operation and maintenance platform as a whole. Once different resources are associated through a unified model, the relationships help users quickly find the root cause of a fault, analyze large numbers of alarms, suppress duplicate alarms, and so on.
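The characteristics listed above (tagging, querying by tag, and mapping resources within and across cloud services) can be illustrated with a minimal in-memory sketch. The `Resource` and `Inventory` names, their fields, and the resource IDs are hypothetical and chosen only for illustration; they are not AOM's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One cloud resource record (hypothetical shape, not AOM's schema)."""
    resource_id: str
    service: str                      # owning cloud service, e.g. "ECS" or "ELB"
    tags: dict = field(default_factory=dict)


class Inventory:
    """Tiny CMDB-like store: resources, tags, and undirected associations."""

    def __init__(self):
        self.resources = {}           # resource_id -> Resource
        self.links = {}               # resource_id -> set of associated ids

    def add(self, resource):
        self.resources[resource.resource_id] = resource
        self.links.setdefault(resource.resource_id, set())

    def link(self, a, b):
        # The two resources may belong to the same or to different services.
        self.links[a].add(b)
        self.links[b].add(a)

    def find_by_tag(self, key, value):
        return sorted(
            r.resource_id
            for r in self.resources.values()
            if r.tags.get(key) == value
        )

    def related(self, resource_id):
        return sorted(self.links[resource_id])
```

With such a store, an alarm raised on one resource could be looked up together with its associated resources, which is the kind of correlation the unified model is meant to enable.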

Data visualization

A good visual interface not only improves the efficiency of operations staff but also displays resource consumption trends intuitively, helping enterprises analyze operating trends, predict future resource usage, and so on. Data visualization for application operation and maintenance management is designed according to the following principles:

Build a resource topology diagram that extends in both directions

Resource topology refers to the relationships between a resource and other resources, such as the relationships between a CVM and an ELB, VPC, or CDN, shown in a resource topology diagram as follows:

Extending in both directions means that the topology diagram is centered on one resource and shows the associated resources in the layers above and below it. This keeps the diagram from growing too large while still letting users reach upstream or downstream resources from any resource.

Associated resource drilldown

Once the topology is established, the resource links on the diagram let users jump to the topology diagram of another selected resource, and the new diagram is centered on that resource. Users can thus keep drilling down through associated resources, which makes it easier for operators to locate problems.
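A minimal sketch of a centered topology view, assuming the relationships are stored as directed `(upstream, downstream)` pairs. The function names, the `depth` parameter, and the sample resource IDs are hypothetical. Drilling down then amounts to calling the same function again with the selected resource as the new center.

```python
from collections import deque


def neighbors_within(adjacency, center, depth):
    """Breadth-first search that stops `depth` hops away from `center`."""
    seen = {center}
    queue = deque([(center, 0)])
    found = []
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, dist + 1))
    return found


def centered_view(edges, center, depth=1):
    """Return the resources above and below `center`, limited to `depth` layers."""
    downstream, upstream = {}, {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
        upstream.setdefault(dst, []).append(src)
    return {
        "center": center,
        "upstream": neighbors_within(upstream, center, depth),
        "downstream": neighbors_within(downstream, center, depth),
    }


# Hypothetical request path: CDN -> ELB -> two ECS instances -> database.
EDGES = [
    ("cdn-1", "elb-1"),
    ("elb-1", "ecs-1"),
    ("elb-1", "ecs-2"),
    ("ecs-1", "rds-1"),
]

view = centered_view(EDGES, "elb-1")          # diagram centered on the ELB
drilled = centered_view(EDGES, "ecs-1")       # drill down into one ECS instance
```

Limiting the view to a fixed number of layers is what keeps each diagram small, while repeated re-centering still reaches every resource on the path.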

Quick jump to cloud resources

A cloud resource may involve multiple cloud services, such as the ELB instance, the ELB service itself, VPC, CDN, and ECS, and the entry point of each cloud service is in a different place. A hyperlink should therefore be added to the resource name so that users can jump quickly to the corresponding cloud service console.

View templates

For displaying the monitoring data of each resource, AOM provides a default template, but it should also support user-defined templates. Because operations staff focus on different metrics and other data, templates should let them view the same resource from different perspectives.

Wizard-based functions

Complex functions should be set up or configured quickly through wizards, reducing the time users spend studying documentation or videos.

Platform-based services

The goal of the platform approach is to let users automate operation and maintenance through the open APIs of each subsystem. Metrics, logs, events, alarms, and other data should be available for subscription through interfaces and forwarded to external systems, where the user's own operation and maintenance platform can analyze them. The analysis results are fed back into the three-dimensional operation and maintenance platform through APIs, and the event-driven platform supports continuous business analysis.

In other words, the data flow connects the platform with the user's operation and maintenance system and makes process automation possible.

Automation helps users achieve automatic fault recovery. If data analysis shows that capacity must be expanded, instances can be added by triggering an event or by calling the auto scaling subsystem through its API. Instances can also be scaled in when resources are idle, saving the enterprise operating costs.
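The scale-out and scale-in decision can be sketched as a small function. This is only an illustration under stated assumptions: the proportional rule, the 60% CPU target, and the instance limits are hypothetical and are not AOM's auto scaling policy. A real setup would pass the result to the auto scaling subsystem rather than just return it.

```python
def desired_instances(current, cpu_percent, target_percent=60,
                      min_instances=1, max_instances=10):
    """Proportional scaling rule (an assumption, not AOM's algorithm).

    Chooses the instance count that would bring average CPU utilization
    back to `target_percent`, clamped to the allowed range.
    All arguments are integers so the ceiling division is exact.
    """
    # Integer ceiling of current * cpu_percent / target_percent.
    wanted = -(-current * cpu_percent // target_percent)
    return max(min_instances, min(max_instances, wanted))


scale_out = desired_instances(current=4, cpu_percent=90)                   # busy
scale_in = desired_instances(current=4, cpu_percent=15, min_instances=2)   # idle
```

Clamping to a minimum keeps some capacity available when the system is idle, and clamping to a maximum bounds the cost of a runaway scale-out.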

Intelligent analysis

Dynamic threshold calculation is provided for metric data, so users do not need to set thresholds themselves; anomalies are detected through machine learning. For the operation and maintenance of large-scale systems, this effectively reduces the cost of manual configuration. It also avoids the repeated work of adjusting static thresholds that were set unreasonably.
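To make the idea of a dynamic threshold concrete, the sketch below derives the threshold for each point from the recent history of the metric instead of from a fixed value. It uses a simple rolling mean and standard deviation as a stand-in; the article does not describe AOM's actual machine-learning method, and the function name, `window`, and `k` are assumptions.

```python
from statistics import mean, pstdev


def dynamic_anomalies(values, window=5, k=3.0):
    """Return indexes whose value falls outside mean +/- k * stdev
    of the preceding `window` points (a simple statistical stand-in)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if values[i] > mu + k * sigma or values[i] < mu - k * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical metric samples with one spike at index 6.
SAMPLES = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
spikes = dynamic_anomalies(SAMPLES)
```

Because the band is recomputed from the data, no per-metric threshold has to be configured or retuned by hand, which is the benefit the paragraph above describes.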

For log data, templates are extracted intelligently by separating variable parameters from static text, and log keyword monitoring gives a real-time view of application anomalies.
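Separating variable parameters from static text can be approximated with regular expressions, as in the sketch below. This is a deliberately simple assumption-based example: the placeholder names, the patterns, and the sample log lines are hypothetical, and a production system would use a more robust template-mining method.

```python
import re
from collections import Counter

# Order matters: mask IP addresses before plain numbers.
_VARIABLE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line):
    """Replace variable parameters with placeholders, keeping static text."""
    for pattern, placeholder in _VARIABLE_PATTERNS:
        line = pattern.sub(placeholder, line)
    return line


def keyword_hits(lines, keyword):
    """Return the lines that contain `keyword` (case-sensitive)."""
    return [line for line in lines if keyword in line]


# Hypothetical log lines.
LINES = [
    "Connection from 10.0.0.5 timed out after 30 seconds",
    "Connection from 10.0.0.9 timed out after 45 seconds",
    "ERROR disk usage at 91 percent",
]

template_counts = Counter(to_template(line) for line in LINES)
errors = keyword_hits(LINES, "ERROR")
```

Counting lines per template shows which message shapes dominate the log, while the keyword check flags individual lines that need attention.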

Overall structure of application operation and maintenance management:

The following is the overall framework of application operation and maintenance management. It is divided into five subsystems, each of which provides its functions through multiple microservices to achieve three-dimensional operation and maintenance.

The ALM module is responsible for the management and correlation analysis of event alarms, and supports users to configure notification policies to send alarms to operators in a timely manner.

The ALS module is responsible for analyzing logs.

The INV module, namely the CMDB module, realizes the management of resources and the mapping and query of resources.

The AMS module is mainly responsible for managing metric data and provides threshold configuration.

The DPA module is mainly responsible for big data computing and intelligent capabilities, including online and offline data analysis and event-driven operation of each subsystem.

More information can be found at https://www.huaweicloud.com/product/aom.html

In addition, the base environment in the architecture diagram shows the scope of AOM operation and maintenance: from infrastructure to PaaS-layer applications, containers, and VM applications, covering every layer of resources that the application depends on.
