Microservice architecture: service degradation

2025-04-04 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com) 06/03

1. Introduction

What is service degradation? When server load rises sharply, some services and pages are deliberately left unprocessed or handled in a simplified way, depending on the actual business situation and traffic. This frees server resources so that core transactions keep running normally and efficiently.

An example makes this concrete. Suppose a lot of people want to pay me right now, but my server runs other services besides payment, such as search, scheduled tasks, and detail pages. These less important services take up a lot of memory and CPU in the JVM. To collect all the money (collecting money is the goal), I add a dynamic switch that rejects those unimportant services at the outermost layer. The back-end payment service then has more resources and can collect money faster. This is a simple service degradation scenario.
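The dynamic switch from this example can be sketched in a few lines of Python. This is only an illustration: the service names, the `handle` entry point, and the 503 response are assumptions made for the sketch, not part of the original design.

```python
# Hypothetical service names; only "payment" is treated as core here.
degraded = set()  # names of services that are currently switched off


def set_degraded(service, on):
    """Flip the dynamic switch for one service at runtime."""
    if on:
        degraded.add(service)
    else:
        degraded.discard(service)


def handle(service, request):
    """Outermost entry point: reject degraded services before doing any work."""
    if service in degraded:
        return {"status": 503, "body": "service temporarily unavailable"}
    return {"status": 200, "body": f"{service} handled {request}"}


set_degraded("search", True)
print(handle("search", "q=phone"))   # rejected at the gate
print(handle("payment", "order-1"))  # core service is still served
```

Because the check happens before any business logic runs, a rejected request costs almost nothing, which is what leaves more resources for the payment path.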

2. Usage scenarios

What scenarios is service degradation mainly used for? When the overall load of the microservice architecture exceeds a preset upper threshold, or upcoming traffic is expected to exceed it, we can delay or suspend unimportant or non-urgent services and tasks so that important or basic services keep running normally.

3. Core design

3.1 Distributed switch

Based on the requirements above, we can implement service degradation with a distributed switch and manage the switch configuration centrally. The specific scheme is as follows:

Figure: Service degradation - distributed switch
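A minimal sketch of the distributed switch idea follows. The `SwitchStore` class is an in-memory stand-in for whatever shared store a real deployment would use, and the `degrade.<service>` key format is an assumption for illustration only.

```python
class SwitchStore:
    """In-memory stand-in for the shared store that holds all switch values."""

    def __init__(self):
        self._values = {}
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def get_all(self):
        return dict(self._values)

    def set(self, key, value):
        self._values[key] = value
        for notify in self._listeners:  # push the change to every instance
            notify(key, value)


class ServiceInstance:
    """One application node; it keeps a local copy so checks need no network call."""

    def __init__(self, name, store):
        self.name = name
        self._local = store.get_all()
        store.subscribe(self._on_change)

    def _on_change(self, key, value):
        self._local[key] = value

    def is_degraded(self, service):
        return self._local.get(f"degrade.{service}", False)


store = SwitchStore()
nodes = [ServiceInstance(f"node-{i}", store) for i in range(3)]
store.set("degrade.search", True)  # one change in the central store
print([node.is_degraded("search") for node in nodes])
```

Each node answers `is_degraded` from its local copy, so a single centrally managed change reaches every instance without adding a remote call to the request path.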

3.2 Automatic degradation

Timeout degradation - mainly configure the timeout, the number of retries on timeout, and the retry mechanism, and use an asynchronous mechanism to detect recovery.

Failure-count degradation - mainly for unstable APIs. When the number of failed calls reaches a certain threshold, the call is degraded; an asynchronous mechanism is again used to detect recovery.

Fault degradation - if the remote service to be invoked is down (network failure, DNS failure, an HTTP service returning an error status code, or an RPC service throwing an exception), the call can be degraded directly.

Rate-limit degradation - when the rate limit is exceeded, the excess requests can be temporarily blocked.
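Failure-count degradation can be sketched as follows. The class name, the threshold of consecutive failures, and the `probe` method are assumptions for illustration; in a real system a background timer or thread would run the probe asynchronously, whereas here it is called by hand to keep the sketch short.

```python
class FailureCountDegrader:
    """Degrade a call after `threshold` consecutive failures; recover on a good probe."""

    def __init__(self, call, fallback, threshold=3):
        self.call = call
        self.fallback = fallback
        self.threshold = threshold
        self.failures = 0
        self.degraded = False

    def invoke(self, *args):
        if self.degraded:
            return self.fallback(*args)  # skip the unstable API entirely
        try:
            result = self.call(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.degraded = True
            return self.fallback(*args)
        self.failures = 0
        return result

    def probe(self, *args):
        """Recovery check; a background timer would normally run this."""
        try:
            self.call(*args)
        except Exception:
            return False
        self.failures = 0
        self.degraded = False
        return True


healthy = {"up": False}


def remote_price(item):
    if not healthy["up"]:
        raise ConnectionError("remote service down")
    return 42


guard = FailureCountDegrader(remote_price, fallback=lambda item: None, threshold=2)
guard.invoke("a")
guard.invoke("a")
print(guard.degraded)     # the threshold was reached
healthy["up"] = True
guard.probe("a")          # recovery detected
print(guard.invoke("a"))  # the real call is used again
```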

When users take part in a flash sale or rush to buy limited-stock goods, the system may crash under the sheer number of visits. Developers therefore use rate limiting to cap the number of requests; once the limit threshold is reached, subsequent requests are degraded. The degraded response can be a queue page (divert the user to a queue page and ask them to retry later), an out-of-stock notice (tell the user directly that the item is out of stock), or an error page (for example, "the event is too popular, please try again later").
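The flash-sale case can be sketched with a simple fixed-window counter. The class, the `buy` function, and the limit values are hypothetical, and a fixed window is only one of several possible limiting algorithms; the source does not say which one is used.

```python
import time


class RateLimitDegrader:
    """Fixed-window counter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit, window=1.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            self.window_start = now  # start a new window
            self.count = 0
        self.count += 1
        return self.count <= self.limit


def buy(limiter, user):
    if limiter.allow():
        return f"order placed for {user}"
    # Degraded response: send the user to the queue page instead of failing hard.
    return "queue page: too many requests, please try again later"


limiter = RateLimitDegrader(limit=2, window=1.0)
for user in ["u1", "u2", "u3"]:
    print(buy(limiter, user))
```

The `clock` parameter exists only so the window can be driven by a fake clock in tests; production code would keep the default.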

3.3 Configuration center

The degradation configuration of the microservices is managed centrally and operated through a friendly visual interface. Because the configuration center and the applications communicate over the network, pushed configuration can be lost, for example when an application reboots or the network restarts, and changes can arrive late. The degradation configuration center therefore needs the following features so that configuration changes take effect as promptly as possible:

Figure: Service degradation - configuration center

Active pull at startup - used to initialize the configuration (avoids waiting for the first scheduled pull cycle)

Publish/subscribe - used to apply configuration changes promptly (handles about 90% of configuration changes)

Scheduled pull - used to cover cases where a publish/subscribe message fails or is lost (handles about 9% of changes, those missed by publish/subscribe)

Offline file cache - used as a temporary fallback when the application cannot connect to the configuration center after a reboot

Editable configuration file - the configuration can be defined by editing the file directly

Telnet command for changing configuration - used when the configuration center itself fails and the configuration cannot be changed through it
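Two of the features listed above, the active pull and the offline file cache, can be sketched together. The function names, the JSON file format, and the cache path are assumptions for illustration; `center_up` and `center_down` merely simulate a reachable and an unreachable configuration center.

```python
import json
import os
import tempfile


def load_config(fetch_remote, cache_path):
    """Pull from the configuration center; fall back to the offline file cache."""
    try:
        config = fetch_remote()
    except OSError:
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)  # last known configuration
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(config, fh)  # refresh the cache for the next restart
    return config


def center_up():
    return {"degrade.search": True}


def center_down():
    raise ConnectionError("configuration center unreachable")


cache = os.path.join(tempfile.mkdtemp(), "degrade-config.json")
print(load_config(center_up, cache))    # pulled remotely and cached
print(load_config(center_down, cache))  # served from the offline cache
```

The same function could be called once at startup and then again from a timer to cover the scheduled pull; the publish/subscribe path would update the same cache file whenever a change is pushed.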

3.4 Processing strategy

When service degradation has been triggered and new requests keep arriving, how do we handle them? From the overall perspective of the microservice architecture, there are several commonly used degradation schemes:

Page degradation - the UI disables clickable buttons and switches to static pages

Delayed service - for example, postponing scheduled tasks, or putting messages into MQ and processing them later

Write degradation - directly reject service requests that perform the related write operations

Read degradation - directly reject service requests that perform the related read operations

Cache degradation - serve some frequently read service interfaces from cache

For degradation at the back-end code level, we usually use the following measures:

Throw an exception

Return null

Return mock data

Call fallback logic
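The four code-level measures can be sketched as one decorator. The decorator name, the strategy strings, and `ServiceDegradedError` are hypothetical names chosen for this illustration.

```python
import functools


class ServiceDegradedError(Exception):
    """Raised when the 'exception' strategy is selected."""


def degradable(is_degraded, strategy, mock=None, fallback=None):
    """While is_degraded() is true, replace the wrapped call with the chosen strategy."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not is_degraded():
                return func(*args, **kwargs)
            if strategy == "exception":
                raise ServiceDegradedError(func.__name__)
            if strategy == "null":
                return None
            if strategy == "mock":
                return mock
            if strategy == "fallback":
                return fallback(*args, **kwargs)
            raise ValueError(f"unknown strategy: {strategy}")

        return wrapper

    return decorator


switch = {"on": True}


@degradable(lambda: switch["on"], "fallback", fallback=lambda user_id: [])
def recommendations(user_id):
    return ["item-1", "item-2"]


print(recommendations(7))  # degraded: the fallback result is returned
switch["on"] = False
print(recommendations(7))  # normal: the real function runs
```

Which measure fits depends on the caller: an exception forces the caller to react, while null, mock data, or a fallback keep the calling page working with reduced content.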

4. Advanced features

We have added a degradation switch to every service and verified it online, and everything seems fine.

Scenario 1: one day, operations is running an event, and someone suddenly comes over and says: traffic has almost reached the upper limit, is there a way to degrade all unimportant services in one batch? The developers look confused: this is not a database operation, so where would a batch operation come from?

Scenario 2: another day, operations comes back and says an event is starting soon, so please degrade all unimportant services ahead of time. The developers are at a loss: how would they know which services to degrade?

Reflection: the degradation function exists, but the experience of the people operating it was not considered. There are too many services, nobody knows which ones to degrade, and degrading them one at a time is too slow.

4.1 Graded degradation

When failures of varying severity occur in the microservice architecture, we can compare the services and choose which ones to give up (the principle of sacrificing a lesser piece to protect the king), so as to further protect the normal operation of core services.

If we wait until online services fail and then decide one by one which services may be degraded and which may not, with hundreds of services online the system will be dragged down before the degradation can be completed. Sorting this out just before a big promotion or flash sale is also a lot of work. It is therefore recommended that the architect or core developers sort it out in advance during development and give each service an initial assessment, that is, a default value for whether it can be degraded.

To make batch degradation of services in the microservice architecture easier, we can build a global evaluation model of service importance. If conditions permit, the analytic hierarchy process (AHP), a mathematical modeling method, or another model can be used for combined qualitative and quantitative evaluation. This should be considerably better than having the architect decide directly, although it is also more difficult and complex and requires someone skilled in mathematical modeling. The basic idea of AHP is roughly the same as the way people think through and judge a complex decision problem.
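As a rough illustration of how AHP turns pairwise judgments into weights, the sketch below uses the row geometric mean method, a common approximation of the AHP priority vector. The three criteria, the comparison values, and the per-service scores are all invented for the example and are not taken from the author's model.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights using the row geometric mean method."""
    n = len(matrix)
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


# Hypothetical criteria: revenue impact, user impact, dependency count.
# pairwise[i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])  # [0.571, 0.286, 0.143]

# Hypothetical per-criterion scores (0 to 1) for two services.
scores = {"payment": [1.0, 0.9, 0.2], "search": [0.2, 0.5, 0.9]}
importance = {
    name: sum(w * s for w, s in zip(weights, values))
    for name, values in scores.items()
}
print(max(importance, key=importance.get))  # the service to protect first
```

Services with the lowest importance score would be the first candidates for degradation; a full AHP study would also check the consistency of the comparison matrix, which this sketch skips.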

The following is the evaluation model the author arrived at, which can serve as a reference model for evaluating service degradation:

Using mathematical modeling, or simply the architect's own judgment, combined with a priority for whether each service can be degraded, we model the design on typhoon warning levels (all of which are storm warnings). All services in the microservice architecture can be classified under the following four failure storm levels:

Evaluation model:

Blue storm - a small number of non-core services need to be degraded

Yellow storm - a medium number of non-core services need to be degraded

Orange storm - a large number of non-core services need to be degraded

Red storm - all non-core services must be degraded
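The four storm levels make batch degradation straightforward once every non-core service carries a grade. A minimal sketch, assuming a hypothetical registry that records the lowest level at which each service is switched off:

```python
from enum import IntEnum


class Storm(IntEnum):
    BLUE = 1
    YELLOW = 2
    ORANGE = 3
    RED = 4


# Hypothetical registry: the lowest storm level at which each non-core
# service is switched off. Core services are simply not listed.
DEGRADE_AT = {
    "recommendations": Storm.BLUE,
    "comments": Storm.YELLOW,
    "search": Storm.ORANGE,
    "order-history": Storm.RED,
}


def services_to_degrade(level):
    """Return every service that must be off at the given storm level."""
    return sorted(name for name, at in DEGRADE_AT.items() if at <= level)


print(services_to_degrade(Storm.YELLOW))
print(services_to_degrade(Storm.RED))
```

With such a registry prepared in advance, declaring a storm level answers both questions from the scenarios above: which services to degrade, and how to degrade them in one batch.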

Design description:

Fault severity: blue
