2025-02-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use the circuit breaker (fuse) design pattern to protect software. It analyzes the problem and describes a solution in detail, in the hope of helping readers who face this problem find a simpler and more practical approach.
As software developers, we live at a fast pace. We use agile methods, develop features iteratively, submit them for testing, pass QA, and deploy to production. Then terrible things happen in production: the load exceeds what we designed for. This overload often occurs when calling remote services. Without overload protection, requests block and wait on the server, which exhausts system or server resources. Very often the failure starts out local and small, but for various reasons it keeps spreading until the whole system is affected. Murphy's law holds particularly well in software: "anything that can go wrong will go wrong". How can we solve this problem? A design pattern called the circuit breaker provides overload protection.
We see the same phenomenon in daily life. If the electrical load at home is too high, for example because many appliances are on at once, the circuit "trips" automatically and is disconnected. The older mechanism was a fuse. When the load is too high, or the circuit malfunctions, the current keeps rising, and the rising current can damage important or valuable components, burn out the circuit, or even start a fire. The fuse melts and cuts off the current once it rises abnormally and heats the fuse past a certain point, which protects the circuit. The automatic tripping device is a circuit breaker, which usually uses an electromagnet to cut the circuit instead of burning out, so it can be reused. The software component that imitates it is the CircuitBreaker.
Large-scale distributed systems usually need to call remote services or resources, and these calls can fail for reasons outside the caller's control, such as slow network connections, resources being in use, or temporary unavailability. Such errors usually clear up after a short time. In some cases, however, the failure has an unforeseen cause, and the remote service or resource may take a long time to repair. Such a failure can be severe enough that part of the system becomes unresponsive, or even that the entire service becomes unavailable. In this situation, retrying over and over is unlikely to help. Instead, the application should return immediately and report an error.
In general, if a system is very busy, a failure in one part of it can lead to cascading failures. For example, an operation may call a service in the cloud with a timeout set, and throw an exception if the response takes longer than that. With this strategy, concurrent requests to the same operation all block until the timeout expires. The blocked requests hold valuable system resources, such as memory, threads, and database connections. These resources eventually run out, and the whole system is dragged down. In this situation it is better for the operation to return an error immediately than to wait for the timeout. The call should be retried only when it is likely to succeed.
Circuit breaker design pattern
The Circuit Breaker pattern, described by Martin Fowler (http://martinfowler.com/bliki/CircuitBreaker.html), prevents an application from repeatedly attempting an operation that is likely to fail. The application can carry on without waiting for the fault to be fixed and without wasting CPU time on long timeouts. The pattern also lets the application detect whether the fault has been fixed. If it has, the application tries the operation again.
The circuit breaker acts as a proxy for operations that are likely to fail. The proxy records the number of failures in recent calls and uses it to decide whether to let the operation proceed or to return an error immediately.
A circuit breaker can be implemented as a state machine with the following states.
Closed state: requests from the application are passed straight through to the operation. The proxy keeps a count of recent failures and increments it whenever a call fails. If the number of recent failures exceeds the allowed threshold within a given period, the proxy switches to the Open state. At that point the proxy starts a timeout timer, and when the timer expires it switches to the Half-Open state. The timeout gives the system a chance to fix the error that caused the calls to fail.
Open state: a request from the application fails immediately and an error is returned.
Half-Open state: a limited number of requests from the application are allowed through to the service. If these calls succeed, the fault that caused the earlier failures is assumed to be fixed, and the breaker switches to the Closed state (and the failure counter is reset). If any of these calls fails, the fault is assumed to still exist. The breaker switches back to the Open state and restarts the timeout timer to give the system more time to recover. The Half-Open state keeps a recovering service from being overwhelmed again by a sudden flood of requests.
In the Closed state, the failure counter is time-based: it resets automatically at regular intervals. This keeps occasional errors from tripping the breaker. The breaker enters the Open state only when the number of failures reaches the threshold within the specified interval. In the Half-Open state, a consecutive-success counter records successful calls. When the number of consecutive successes reaches a specified value, the breaker switches to the Closed state. If any call fails, the breaker switches back to the Open state immediately, and the success counter is reset to zero the next time the breaker enters the Half-Open state.
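The three states and their counters can be sketched in a few lines of Python. This is a minimal illustration and not taken from any library: the names `CircuitBreaker`, `CircuitOpenError`, and all parameters are assumptions. It uses a sliding window of failure timestamps as one way to make the failure count time-based, it does not limit how many trial requests pass in Half-Open, and it is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is Open."""


class CircuitBreaker:
    # All names and defaults here are illustrative assumptions.
    def __init__(self, failure_threshold=3, failure_window=10.0,
                 open_timeout=30.0, success_threshold=2,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.failure_window = failure_window
        self.open_timeout = open_timeout
        self.success_threshold = success_threshold
        self._clock = clock          # injectable so tests can control time
        self.state = State.CLOSED
        self._failures = []          # failure timestamps (Closed state)
        self._successes = 0          # consecutive successes (Half-Open state)
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        now = self._clock()
        if self.state is State.OPEN:
            if now - self._opened_at < self.open_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout expired: allow trial requests through.
            self.state = State.HALF_OPEN
            self._successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure(now)
            raise
        self._on_success()
        return result

    def _on_failure(self, now):
        if self.state is State.HALF_OPEN:
            self._trip(now)          # any failure in Half-Open reopens
            return
        # Keep only failures that fall inside the time window.
        self._failures = [t for t in self._failures
                          if now - t < self.failure_window]
        self._failures.append(now)
        if len(self._failures) >= self.failure_threshold:
            self._trip(now)

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED
                self._failures = []

    def _trip(self, now):
        self.state = State.OPEN
        self._opened_at = now        # restarts the timeout timer
        self._failures = []
```

A caller would wrap each remote call as `breaker.call(fetch_profile, user_id)` (a hypothetical function) and treat `CircuitOpenError` as "do not bother the service right now".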
The circuit breaker pattern makes a system more stable and resilient. It provides stability while the system recovers from a failure and reduces the impact of the failure on performance. It keeps response times down by quickly rejecting requests for an operation that is likely to fail, rather than waiting for the operation to time out or never return. If the circuit breaker raises an event each time it changes state, that information can be used to monitor the health of the protected service and to alert an administrator when the breaker switches to the Open state.
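One way to expose state changes for monitoring is a simple listener list. The sketch below is an assumption about how this could look, not a description of any library; `BreakerEvents`, `subscribe`, `transition`, and `alert_on_open` are hypothetical names, and the alert is just appended to a list where a real system would page or email someone.

```python
class BreakerEvents:
    """Tracks breaker state and notifies listeners on every transition."""

    def __init__(self):
        self.state = "closed"
        self._listeners = []

    def subscribe(self, listener):
        # A listener is called as listener(old_state, new_state).
        self._listeners.append(listener)

    def transition(self, new_state):
        if new_state == self.state:
            return  # no change, so no event
        old_state, self.state = self.state, new_state
        for listener in self._listeners:
            listener(old_state, new_state)


alerts = []


def alert_on_open(old_state, new_state):
    # Stand-in for notifying an administrator.
    if new_state == "open":
        alerts.append(f"breaker opened (was {old_state})")


events = BreakerEvents()
events.subscribe(alert_on_open)
events.transition("open")       # triggers the alert
events.transition("half-open")  # reported to listeners, but no alert
```

Listeners run synchronously here, so a slow listener would delay the request that caused the transition. A production design would likely hand events to a queue instead.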
The circuit breaker pattern can be customized for the kinds of failure a particular remote service is likely to have. For example, the breaker can use an increasing timeout. When the breaker first enters the Open state, the timeout can be a few seconds. If the fault has not been resolved by then, the timeout is raised to a few minutes, and so on. In some cases, instead of throwing an exception in the Open state, the breaker can return a default value that is meaningful to the application.
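Both customizations can be sketched briefly. The function names, the base of 5 seconds, the growth factor of 4, and the 10-minute cap are illustrative assumptions, and `CircuitOpenError` stands for whatever exception a breaker raises when it rejects a call.

```python
class CircuitOpenError(Exception):
    """Hypothetical exception raised by a breaker in the Open state."""


def open_timeout(consecutive_trips, base=5.0, factor=4.0, cap=600.0):
    """Seconds to stay Open after the Nth consecutive trip (1-based).

    Grows geometrically from `base` and never exceeds `cap`.
    """
    if consecutive_trips < 1:
        raise ValueError("consecutive_trips must be >= 1")
    return min(cap, base * factor ** (consecutive_trips - 1))


def call_with_default(func, default):
    """Return `default` instead of raising when the breaker rejects the call."""
    try:
        return func()
    except CircuitOpenError:
        return default
```

With these defaults the breaker would stay open for 5 seconds after the first trip, 20 after the second, 80 after the third, and at most 600 from the fifth onward. The counter of consecutive trips would be reset once the breaker closes again.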
Consider the following points when implementing the circuit breaker pattern:
Exception handling: an application that calls a service through a circuit breaker must handle the exceptions raised when the service is unavailable. How they are handled depends on the business case. For example, the application can degrade temporarily, switch to an alternative service that performs the same task or returns the same data, or report the error to the user and ask them to try again later.
Types of exception: a request can fail for many reasons, some more serious than others. For example, a request may fail because the remote service has crashed and needs several minutes to recover, or because the server is temporarily overloaded and the request timed out. The breaker should be able to examine the type of error and adjust its policy accordingly. For example, it may take many timeout exceptions before the breaker switches to the Open state, but only a few errors indicating that the service is unavailable.
Logging: the breaker should log all failed requests, and possibly successful ones, so that administrators can monitor the health of the service it protects.
Testing whether the service is available: in the Open state, instead of using a timer to decide when to switch to Half-Open, the breaker can periodically ping the remote service or resource to find out whether it has recovered. The ping can repeat a previously failed request, or it can call a method that the remote service provides for checking its availability.
Manual reset: it is hard to predict how long a failed operation will take to recover. A manual reset lets an administrator force the breaker into the Closed state. Likewise, if the protected service is temporarily unavailable, the administrator can force the breaker into the Open state.
Concurrency: the same breaker may be accessed by a large number of concurrent requests. The implementation should not block concurrent requests or add significant overhead to each call.
Differences between resources: take care when using a single breaker for a resource that is spread across several places. For example, data may be stored across several shards, where one shard is fully accessible while another has a temporary problem. If the error responses from these shards are lumped together, the application may keep trying shards where failure is very likely, while access to healthy shards may be blocked.
Tripping faster: sometimes the error returned by the service is enough for the breaker to trip immediately and stay open for a while. For example, if a shared resource responds that it is overloaded, an immediate retry is not advisable, and the application should wait a few minutes before trying again. (HTTP defines "503 Service Unavailable" to indicate that the requested service is currently unavailable. The response can carry additional information, such as how long to wait.)
Replaying failed requests: in the Open state, the breaker can record the details of each request instead of only returning a failure, so that the requests can be replayed once the remote service has recovered.
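Two of the points above, weighting failures by type and tripping immediately on an explicit overload response, can be sketched as follows. The weights, the threshold, and the names `FAILURE_WEIGHTS`, `WeightedFailureCounter`, and `forced_open_seconds` are assumptions made for illustration. `Retry-After` is the standard HTTP header that may accompany a 503 response. This sketch only handles its delta-seconds form and expects a plain dict with that exact header spelling.

```python
# Hypothetical weights: a refused connection counts five times as much
# as a timeout toward tripping the breaker.
FAILURE_WEIGHTS = {"timeout": 1, "connection_refused": 5}


class WeightedFailureCounter:
    """Accumulates weighted failures; reports when the threshold is reached."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.score = 0

    def record(self, kind):
        # Unknown failure kinds count as 1.
        self.score += FAILURE_WEIGHTS.get(kind, 1)
        return self.score >= self.threshold  # True means "trip now"

    def reset(self):
        self.score = 0


def forced_open_seconds(status, headers, default=60.0):
    """How long to hold the breaker Open based on the response alone.

    Returns None when the response does not justify tripping at once.
    """
    if status != 503:
        return None
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        return float(value)       # server told us how long to wait
    return default                # absent or HTTP-date form: fall back
```

With these weights, two refused connections trip the breaker as fast as ten timeouts would, and a 503 with `Retry-After: 120` would hold it open for two minutes without counting anything.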
When to use the circuit breaker pattern
Use this pattern to:
Prevent an application from calling a remote service or accessing a shared resource when the operation is very likely to fail.
This pattern is not suitable:
For direct access to local private resources in the application, such as in-memory data structures. A circuit breaker here only adds overhead to the system.
As a substitute for exception handling in the application's business logic.
Many libraries implement the circuit breaker pattern. Here we introduce a project called Polly. It is a neat library that covers most common fault-handling policies, such as retry, wait-and-retry, and circuit breaker, and it is easy to use. Here is how to use Polly:
using System;
using Polly;

// Break the circuit after the specified number of exceptions
// and keep circuit broken for the specified duration
var policy = Policy
    .Handle<DivideByZeroException>()
    .CircuitBreaker(2, TimeSpan.FromMinutes(1));

var result = policy.Execute(() => DoSomething());
If DoSomething() throws DivideByZeroException twice in a row, the circuit breaks and stays open for one minute.
Applications usually call remote services or resources, often provided by a third party. These calls can fail, or hang without a response until a timeout occurs. In extreme cases, a large number of requests block on the failing remote service, critical system resources run out, and the resulting cascading failure drags down the whole system. The circuit breaker pattern uses an internal state machine to wrap the remote calls that may fail. When the remote service fails, the breaker returns an error response to incoming requests immediately and notifies the system administrator. This confines the failure to a local area and improves the stability and reliability of the system.