2025-02-24 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
This article explains how to automatically remove a single abnormal instance (a single point of failure) in microservice governance practice. Many readers may not be familiar with the topic, so the following sections walk through it step by step.
In a microservice architecture, stability and high availability are perennial concerns. In practice, we may encounter scenarios such as the following:
During a grayscale (canary) release, the new version is first deployed on a few machines; a bug in the code logic fills up the thread pool, and those machines start to malfunction.
In the server cluster, some machines are overloaded because their disks are full or they compete for host resources, and clients see call timeouts.
In the server cluster, a few machines fall into Full Garbage Collection (Full GC) because their thread pools are full.
In all three scenarios, the client cannot tell that the server has a problem, so it keeps sending requests to these machines. Business calls fail, and a brief failure of a single downstream machine can drag down the upstream machines, creating the risk of an application avalanche.
Degrading the whole service just to handle such a case does too much harm to the application. If, instead, we can detect the faulty machines in the service cluster and briefly isolate them, we can effectively preserve service availability and system stability, and give operations staff valuable time to locate and troubleshoot the problem.
The following sections describe how outlier instances are removed.
Microservice Outlier Ejection (microservice outlier instance removal)
What is outlier instance removal
When a single machine becomes abnormal, the consumer detects it on its own and temporarily removes the corresponding provider instance: it stops sending requests to that instance and resumes calling it after a certain interval. The mechanism can also detect global anomalies. When the number of abnormal provider instances grows too large and exceeds a configured ratio, the provider's overall service quality is low, and the mechanism removes no more than that ratio of instances.
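The mechanism just described can be sketched as a small consumer-side class. This is a minimal, hypothetical illustration, not the EDAS agent's code: the class name OutlierEjector, the "ip:port" endpoint strings, and the ejectionMillis and maxEjectionRatio parameters are assumptions, and the current time is passed in explicitly so the logic is easy to follow and test. The cap uses the rounding rule the article states in the policy-creation step.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch (not the EDAS implementation): a consumer-side ejector that
// temporarily removes abnormal provider endpoints ("ip:port") and re-admits them later.
class OutlierEjector {
    private final long ejectionMillis;     // assumed: how long an endpoint stays removed
    private final double maxEjectionRatio; // assumed: cap on the share of removed endpoints
    private final Map<String, Long> ejectedUntil = new HashMap<>(); // endpoint -> re-admission time

    OutlierEjector(long ejectionMillis, double maxEjectionRatio) {
        this.ejectionMillis = ejectionMillis;
        this.maxEjectionRatio = maxEjectionRatio;
    }

    /** Removes an endpoint for ejectionMillis; returns false if the ratio cap is reached. */
    synchronized boolean eject(String endpoint, int clusterSize, long nowMillis) {
        readmitExpired(nowMillis);
        if (ejectedUntil.containsKey(endpoint)) {
            return true; // already removed
        }
        // Global-anomaly guard: round the cap down, but allow one removal in a multi-instance cluster.
        int cap = clusterSize > 1 ? Math.max(1, (int) Math.floor(maxEjectionRatio * clusterSize)) : 0;
        if (ejectedUntil.size() >= cap) {
            return false; // too many endpoints are abnormal: keep the rest in rotation
        }
        ejectedUntil.put(endpoint, nowMillis + ejectionMillis);
        return true;
    }

    /** Endpoints the consumer may call right now. */
    synchronized List<String> available(List<String> endpoints, long nowMillis) {
        readmitExpired(nowMillis);
        return endpoints.stream()
                .filter(e -> !ejectedUntil.containsKey(e))
                .collect(Collectors.toList());
    }

    // Once the interval has passed, the consumer starts calling the endpoint again.
    private void readmitExpired(long nowMillis) {
        ejectedUntil.values().removeIf(until -> until <= nowMillis);
    }
}
```

For example, with two endpoints and maxEjectionRatio = 0.5, the first abnormal endpoint is removed, a second removal is refused, and both endpoints become callable again once ejectionMillis has elapsed.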
What outlier instance removal does
It strengthens business stability through fault tolerance at the service layer and effectively addresses single points of failure.
How it differs from circuit breaking and isolation
Circuit breaking prevents the avalanche effect that occurs when a service collapses quickly under a sudden surge in incoming load. A circuit breaker generally consists of modules such as a trip-decision algorithm, a recovery mechanism, and alerting. Isolation is an architectural approach that divides a system into cells so that a failure in a dependent service does not spread.
If the only problem is a single abnormal point in the server cluster, circuit breaking and degradation do too much harm to the application, whereas outlier instance removal solves the single-point problem effectively and preserves service quality. If the provider's overall service quality is low, outlier removal no longer helps much, and circuit breaking and degradation should be used instead.
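For contrast, a circuit breaker guards a whole dependency rather than a single instance of it. The following is a minimal, hypothetical sketch of the classic closed/open/half-open state machine; the consecutive-failure trip rule and the fixed open interval are assumed parameters, and the alerting module mentioned above is reduced to a comment.

```java
// Hypothetical minimal circuit breaker: it trips for the whole dependency,
// unlike outlier removal, which removes only the abnormal instance.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // assumed: consecutive failures that trip the breaker
    private final long openMillis;      // assumed: how long calls are rejected before a trial
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Trip decision: may this call go through? */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN; // recovery: let trial calls through until one succeeds or fails
        }
        return state != State.OPEN;
    }

    synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void onFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // a real breaker would also raise an alert here
            openedAt = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Once tripped, every call to the dependency is rejected until the open interval passes, which is why this mechanism fits the case where the provider as a whole is unhealthy.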
Supported versions for outlier instance removal
As long as your application version is in the list, you can use the outlier instance removal feature without changing a single line of code.
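Currently supported versions, and versions in development (coming soon):
Dubbo: 2.5.0 to 2.7.3 supported; versions below 2.5.0 and dubbox in development
Spring Cloud: D, E, F, G, H supported; C in development
These versions already cover most microservice scenarios on the market, and we will keep adding support for the latest open-source Dubbo/Spring Cloud versions.
We provide outlier removal for both Dubbo and Spring Cloud scenarios. This article first introduces the practice and effect of Dubbo Microservice Outlier Ejection by demonstrating the Dubbo outlier removal feature and its effect on EDAS.
Enterprise Distributed Application Service (EDAS) is a PaaS platform for application hosting and microservice management. It provides full-stack solutions for application development, deployment, monitoring, and operations, and supports microservice runtimes such as Dubbo and Spring Cloud.
https://www.aliyun.com/product/edas
Preparation
Next, we use a microservice demo to demonstrate outlier removal. Readers can download it from GitHub and try it out: https://github.com/aliyun/alibabacloud-microservice-demo/tree/master/src
The microservice demo is a simple e-commerce project. The figure below shows the project structure: cartservice is the shopping cart service provider built on the Dubbo framework, productservice is the product details service provider built on Spring Cloud, and frontend is the web controller (the front-end display page), which can be thought of as the consumer.
We will take the cartservice service, that is, the Dubbo server, as an example to demonstrate the outlier instance removal function.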
Deploy the microservice demo on EDAS
First, run cd cartservice to enter the cartservice directory, build the package with mvn clean install, and then run cd cartservice-provider/target to switch to the target directory. There we find the newly generated cartservice-provider-1.0.0-SNAPSHOT.jar package, which we use to create a cartservice application on EDAS.
Then start the application. At this point one cartservice-provider instance is running. Click to scale out with the same specification as this instance, so that the service is deployed on two instances.
In this provider's com.alibabacloud.hipstershop.provider.CartServiceImpl class, we can see that it offers two shopping cart services, viewCart and addItemToCart, and we add some logic to viewCart to simulate runtime exceptions:
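import java.util.Collections;
import java.util.List;
import org.springframework.beans.factory.annotation.Value;

// Excerpt from CartServiceImpl; cartStore and getLocalIp() are class members not shown here.
@Value("${exception.ip}")
private String exceptionIp;

@Override
public List viewCart(String userID) {
    // Simulate a business exception only on the instance whose IP matches exception.ip
    if (exceptionIp != null && exceptionIp.equals(getLocalIp())) {
        throw new RuntimeException("Runtime exception");
    }
    return cartStore.getOrDefault(userID, Collections.emptyList());
}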
exceptionIp is bound to the exception.ip configuration item in the ACM configuration center. If this item equals the local IP, the service throws a RuntimeException to simulate a business exception.
You may have guessed why cartservice was scaled out to two instances: at runtime, we set exception.ip in the ACM configuration center to the IP of one of the instances to simulate an abnormal instance.
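The viewCart excerpt calls getLocalIp(), whose body the article does not show. One plausible way to implement it is sketched below; the class name LocalIp, the "first non-loopback IPv4 address of an interface that is up" rule, and the 127.0.0.1 fallback are all assumptions, not the demo's actual code.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical stand-in for the demo's getLocalIp(), which the article does not show.
class LocalIp {
    /** Returns the first non-loopback IPv4 address of an interface that is up, else 127.0.0.1. */
    static String getLocalIp() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics != null) {
                for (NetworkInterface nic : Collections.list(nics)) {
                    if (!nic.isUp() || nic.isLoopback()) {
                        continue;
                    }
                    for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                        if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                            return addr.getHostAddress();
                        }
                    }
                }
            }
        } catch (SocketException e) {
            // Interface lookup failed: fall back to the loopback address below.
        }
        return "127.0.0.1";
    }

    /** Same condition as viewCart: fail only on the instance named by exception.ip. */
    static boolean shouldFail(String exceptionIp, String localIp) {
        return exceptionIp != null && exceptionIp.equals(localIp);
    }
}
```

On a host with several network interfaces, the "first" address depends on enumeration order, so a real service would usually pin the interface or read the registered address from its framework instead.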
Next, deploy the frontend and productservice services in the same way, uploading frontend/target/frontend-1.0.0-SNAPSHOT.jar and productservice/productservice-provider/target/productservice-provider-1.0.0-SNAPSHOT.jar respectively.
As you can see from the figure below, our microservice Demo is deployed on EDAS.
Simulated business exception
Opening the frontend application, we see that the instance's public IP is 47.99.150.33.
Click View Cart to open http://47.99.150.33:8080/cart.
Keep visiting http://47.99.150.33:8080/cart, and an error page appears about 50% of the time.
Next, we write a script that repeatedly requests http://47.99.150.33:8080/cart to simulate a large volume of traffic:
#!/bin/sh
# Usage: sh curlservice.sh <url>
while :
do
    result=$(curl -s "$1")
    case "$result" in
        *500*) echo "$(date +%F-%T) $result" ;;
        *) echo "$(date +%F-%T) success" ;;
    esac
    sleep 0.1
done
Save it as curlservice.sh and run sh curlservice.sh http://47.99.150.33:8080/cart.
With about 10 calls per second, the success rate stays at around 50%.
In other words, a fault on a single provider machine sharply degrades the caller's quality of service, and abnormal behavior (system or business) on a few provider machines may even drag down the calling service.
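Why roughly 50%? A simple model: Dubbo's default load-balancing strategy is random, so each request lands on either cartservice instance with about equal probability, and every request that hits the instance named by exception.ip fails. The hypothetical simulation below (plain Java, not the demo code) models only that, comparing both instances in rotation against the case where the failing one has been removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of the demo: two cartservice instances, one of which
// (the exception.ip instance) fails every call; the consumer picks at random.
class SuccessRateSimulation {
    static double successRate(int requests, boolean removeFailingInstance, long seed) {
        List<Boolean> instanceHealthy = new ArrayList<>(List.of(true, false));
        if (removeFailingInstance) {
            instanceHealthy.removeIf(healthy -> !healthy); // the effect outlier removal aims for
        }
        Random random = new Random(seed);
        int ok = 0;
        for (int i = 0; i < requests; i++) {
            // Random load balancing: each remaining instance is equally likely.
            if (instanceHealthy.get(random.nextInt(instanceHealthy.size()))) {
                ok++;
            }
        }
        return (double) ok / requests;
    }
}
```

Calling successRate(10_000, false, 42) yields a value close to 0.5, while successRate(10_000, true, 42) yields exactly 1.0, which is the change the demo later shows once the abnormal instance is removed.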
Enable an outlier removal policy
Below, I demonstrate how to enable an outlier removal policy and what effect it has.
Create
In the left navigation pane of EDAS, go to "Outlier Instance Removal" under "Microservice Management" and select "Create Outlier Instance Removal Policy".
As shown in the figure above, you can select a namespace, fill in the policy name, and select the framework type supported by the policy (Dubbo/Spring Cloud).
Select the applications the policy applies to
These parameters come with default values; adjust them to the values that best suit your application. Because the RuntimeException we want to guard against is a business exception, select network exception + business exception. (Note: even if the upper limit of the removal ratio is so low that it rounds down to less than one instance, when the cluster has more than one instance and one of them is abnormal, one instance is still removed.)
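The rounding note is easiest to see with a worked example: with two instances and a 20% cap, 2 * 0.2 = 0.4 rounds down to 0, yet one abnormal instance is still removed. The hypothetical helper below encodes that rule. Treating a single-instance cluster as never removable is an assumption, since the article only describes clusters with more than one instance.

```java
// Hypothetical helper: how many instances a policy may remove at the same time.
class EjectionCap {
    static int maxEjectable(int instances, double maxRatio) {
        if (instances <= 1) {
            return 0; // assumption: the only instance of a service is never removed
        }
        int byRatio = (int) Math.floor(instances * maxRatio); // round the ratio down
        return Math.max(1, byRatio); // but always allow one removal when instances > 1
    }
}
```

For instance, maxEjectable(10, 0.5) allows 5 removals, while maxEjectable(3, 0.5) rounds 1.5 down to 1.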
Creation completed
You can see the details of the newly created policy.
Policy
We see the outlier removal policy we created, which targets the Dubbo framework with the exception types network exception + business exception.
Verify the effect of outlier removal
Once the abnormal instance is detected, outlier removal takes effect, and after a short while requests return correct results.
When the client detects that a server is abnormal, it proactively removes that server and calls only the Provider instances whose business is behaving normally. We can also use ARMS (the EDAS monitoring system) to watch the quality of service improve and traffic move away from the abnormal Provider.
For the Dubbo framework, you can find the series of outlier-removal event logs by searching for the keyword "OutlierRouter" in the logs under the /home/admin/.opt/ArmsAgent/logs directory.
Modify / close the outlier removal policy
For EDAS applications, we support dynamic modification and deletion of outlier removal policies through the console.
Modify the corresponding policy rules
Click Modify to change the applications the policy applies to, or to edit the policy.
Changes made in the console take effect in the application's configuration in real time. If the policy is deleted, outlier removal is disabled by default.
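One common way to make console changes take effect without a restart is to keep the active policy in an atomically swapped reference that a configuration listener updates. The sketch below is hypothetical: OutlierPolicy, PolicyHolder, and the listener-style method names are illustrative and are not the EDAS agent's API. Deleting the policy falls back to a disabled default, matching the behavior described above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable policy snapshot (only two fields shown).
final class OutlierPolicy {
    static final OutlierPolicy DISABLED = new OutlierPolicy(false, 0.0);

    final boolean enabled;
    final double maxEjectionRatio;

    OutlierPolicy(boolean enabled, double maxEjectionRatio) {
        this.enabled = enabled;
        this.maxEjectionRatio = maxEjectionRatio;
    }
}

// Hypothetical holder: request threads read the current snapshot,
// and a config listener replaces it when the console changes the policy.
class PolicyHolder {
    private final AtomicReference<OutlierPolicy> current =
            new AtomicReference<>(OutlierPolicy.DISABLED);

    void onPolicyUpdated(OutlierPolicy policy) {
        current.set(policy); // "Modify" in the console
    }

    void onPolicyDeleted() {
        current.set(OutlierPolicy.DISABLED); // deleted policy means disabled by default
    }

    OutlierPolicy get() {
        return current.get();
    }
}
```

Because each update swaps in a whole immutable snapshot, a request never observes a half-applied policy.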
We can also open ARMS monitoring to observe the calls in detail.
ARMS monitoring
With monitoring enabled, we can directly see traffic, request errors, and other information.
Before activating outlier removal
Enable it as shown below, then go to the ARMS (EDAS monitoring system) application monitoring page. Advanced monitoring must be enabled for all three applications.
The topology diagram below shows that traffic keeps flowing to the cartservice service.
After outlier removal is enabled, the error rate drops significantly from 50%.
Technical architecture of Dubbo Agent solution
Users gain this stability enhancement without changing a single line of code or configuration.
Outlier Detection: the detection technology behind outlier instance removal
Detection is based entirely on statistics collected over a time window.
There are two implementations:
1. For Dubbo 2.7, a MetricsFilter is embedded in the call chain. It intercepts each request/response and records the response time (rt), whether the call succeeded, and the exception type, stored with the endpoint (ip+port) as the key.
2. The agent counts the HTTP requests passing through it and aggregates data for the latest time window from the URL, rt, status code, exception type, and so on. (The window is currently hard-coded to 10 seconds and is not exposed as a setting for now.)
Call information from the last N seconds is aggregated in real time and serves as the basis for removing outlier instances.
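The time-window statistics can be sketched as a ring of per-second buckets for each endpoint. This is a hypothetical illustration, not the MetricsFilter or agent code: the class name, the 10 one-second buckets (matching the 10-second window mentioned above), and the tracked fields (call count, error count, total rt) are assumptions, and per-exception-type counts are left out for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: per-endpoint ("ip:port") call statistics kept in
// 10 one-second buckets, so only the most recent ~10 seconds count.
class WindowStats {
    private static final int BUCKETS = 10;           // assumed window: 10 s
    private static final long BUCKET_MILLIS = 1_000; // one bucket per second

    private static final class Bucket {
        long second = -1; // the second this bucket currently holds (-1 = never used)
        long calls;
        long errors;
        long rtSumMillis;
    }

    private final Map<String, Bucket[]> byEndpoint = new ConcurrentHashMap<>();

    /** Records one call's outcome and response time (rt). */
    void record(String endpoint, long nowMillis, long rtMillis, boolean success) {
        Bucket[] ring = byEndpoint.computeIfAbsent(endpoint, k -> newRing());
        long second = nowMillis / BUCKET_MILLIS;
        Bucket b = ring[(int) (second % BUCKETS)];
        synchronized (b) {
            if (b.second != second) { // slot still holds an older second: start it fresh
                b.second = second;
                b.calls = 0;
                b.errors = 0;
                b.rtSumMillis = 0;
            }
            b.calls++;
            b.rtSumMillis += rtMillis;
            if (!success) {
                b.errors++;
            }
        }
    }

    double errorRate(String endpoint, long nowMillis) {
        long[] t = totals(endpoint, nowMillis);
        return t[0] == 0 ? 0.0 : (double) t[1] / t[0];
    }

    double averageRtMillis(String endpoint, long nowMillis) {
        long[] t = totals(endpoint, nowMillis);
        return t[0] == 0 ? 0.0 : (double) t[2] / t[0];
    }

    // Sums calls, errors, and rt over the buckets still inside the window.
    private long[] totals(String endpoint, long nowMillis) {
        long[] t = new long[3];
        Bucket[] ring = byEndpoint.get(endpoint);
        if (ring == null) {
            return t;
        }
        long nowSecond = nowMillis / BUCKET_MILLIS;
        for (Bucket b : ring) {
            synchronized (b) {
                if (b.second >= 0 && nowSecond - b.second < BUCKETS) {
                    t[0] += b.calls;
                    t[1] += b.errors;
                    t[2] += b.rtSumMillis;
                }
            }
        }
        return t;
    }

    private static Bucket[] newRing() {
        Bucket[] ring = new Bucket[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            ring[i] = new Bucket();
        }
        return ring;
    }
}
```

Reusing a fixed ring of buckets keeps memory constant per endpoint, and old data expires simply because its bucket falls outside the window.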
Outlier Ejection: outlier removal
For Dubbo, removal is implemented as a Dubbo Router: among all the invokers of the called upstream service, the "unhealthy" nodes are blocked and the blocking information is recorded.
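Conceptually, the router step is a filter over the candidate providers. The sketch below is hypothetical: it models invokers as "ip:port" strings rather than using Dubbo's Router and Invoker types, records when each endpoint was blocked, and the all-blocked fallback is an assumption added so that a call always has somewhere to go.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical router-style filter; it does not implement Dubbo's real Router
// interface, and endpoints are plain "ip:port" strings instead of invokers.
class OutlierFilter {
    private final Map<String, Long> blockedSince = new ConcurrentHashMap<>(); // blocking record

    void block(String endpoint, long nowMillis) {
        blockedSince.putIfAbsent(endpoint, nowMillis);
    }

    void unblock(String endpoint) {
        blockedSince.remove(endpoint);
    }

    /** Returns the candidates that are not blocked. */
    List<String> route(List<String> candidates) {
        List<String> healthy = candidates.stream()
                .filter(e -> !blockedSince.containsKey(e))
                .collect(Collectors.toList());
        // Assumption: if every candidate is blocked, return them all rather than none.
        return healthy.isEmpty() ? candidates : healthy;
    }
}
```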
Dubbo-Router control logic
The request path only checks and marks the status of each request. Two dedicated background threads decide whether marked endpoints enter or leave the isolation list and handle time-consuming work such as updating the blocking information, which keeps request handling as close to real time as possible.
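This split between a cheap request path and background bookkeeping can be sketched with lock-free counters and a two-thread scheduler. This is a hypothetical illustration, not the agent's code: the class name, the error-rate threshold, and the fixed isolation interval are assumptions, and resetting the two counters one after the other is approximate (a call recorded between the two resets may be miscounted).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: requests only mark outcomes; two background
// tasks move endpoints into and out of the isolation list.
class OutlierController {
    private static final class Marks {
        final LongAdder calls = new LongAdder();
        final LongAdder errors = new LongAdder();
    }

    private final Map<String, Marks> marks = new ConcurrentHashMap<>();
    private final Map<String, Long> isolatedUntil = new ConcurrentHashMap<>(); // isolation list
    private final double errorRateThreshold; // assumed parameter
    private final long isolationMillis;      // assumed parameter

    OutlierController(double errorRateThreshold, long isolationMillis) {
        this.errorRateThreshold = errorRateThreshold;
        this.isolationMillis = isolationMillis;
    }

    /** Request path: only increments counters; no locks and no list updates. */
    void mark(String endpoint, boolean success) {
        Marks m = marks.computeIfAbsent(endpoint, k -> new Marks());
        m.calls.increment();
        if (!success) {
            m.errors.increment();
        }
    }

    /** Request path: a map lookup kept up to date by the background tasks. */
    boolean isIsolated(String endpoint) {
        return isolatedUntil.containsKey(endpoint);
    }

    /** Background task 1: isolate endpoints whose error rate reached the threshold. */
    void ejectPass(long nowMillis) {
        marks.forEach((endpoint, m) -> {
            long calls = m.calls.sumThenReset();
            long errors = m.errors.sumThenReset();
            if (calls > 0 && (double) errors / calls >= errorRateThreshold) {
                isolatedUntil.putIfAbsent(endpoint, nowMillis + isolationMillis);
            }
        });
    }

    /** Background task 2: release endpoints whose isolation interval has passed. */
    void recoverPass(long nowMillis) {
        isolatedUntil.values().removeIf(until -> until <= nowMillis);
    }

    /** Runs the two passes on a pool of two background threads. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        pool.scheduleAtFixedRate(() -> ejectPass(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        pool.scheduleAtFixedRate(() -> recoverPass(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return pool;
    }
}
```

The passes take the time as a parameter so they can also be driven directly, which makes the isolate-then-release cycle easy to exercise without waiting on real timers.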
For Spring Cloud, removal is implemented by extending the load balancer (LoadBalancer), and the principle is similar.
We hope this walkthrough has given you a clearer picture of how to automatically remove single-point exceptions in microservice governance practice.