What is Hystrix

2025-04-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

This article explains what Hystrix is: the service avalanche problem it addresses, its design goals, its workflow, its core configuration, its fallback strategies, and its monitoring dashboard.

What is a service avalanche?

In a distributed architecture, it is very common for a single request to invoke multiple services.

For example, the client calls the user service, the user service calls the order service, and the order service calls the goods service. If the order service or the goods service cannot respond in time, whether because of network problems or problems of its own, the user service blocks until the order service and the goods service respond.

If a large number of requests pour in at this point, the container's thread resources are exhausted and the service is paralyzed.

Because services depend on one another, the fault spreads and sets off a chain reaction with disastrous consequences for the whole microservice system. This is the "avalanche" effect of service failure.

1. As shown in the figure, the whole system is initially running normally.

2. Suddenly, the network of the goods service node fails. The node goes down and the goods service becomes unavailable.

3. Because the goods service is down, the requests that the order service sends to it never return and stay blocked. Meanwhile, the user service keeps sending requests to the order service, so the order service node runs out of resources, goes down, and the order service becomes unavailable.

4. Now the requests that the user service sends to the order service never return either, yet the client keeps sending requests to the user service node. In the end the user service node goes the same way as the order service node: it is paralyzed by resource exhaustion, and the user service becomes unavailable.

As described above, the failure of one service node brings down every service node on the call chain. This is called a service avalanche.

Why is there a service avalanche?

Traffic surge: for example, abnormal traffic and user retries increase the system load.

Cache penetration: suppose A is the client and B is the server. If the requests from system A that flow through to system B exceed the carrying capacity of system B, system B crashes.

Program bugs: logic errors such as cyclic calls, memory leaks caused by unreleased resources, and so on.

Hardware failures: machine downtime, power outages in the data center, fiber-optic cables being dug up, and so on.

Serious database bottlenecks: long transactions, slow SQL, and so on.

Synchronous thread waiting: systems often call each other synchronously, and core and non-core services share one thread pool and one message queue. Suppose a core business thread calls a non-core thread, and the non-core thread hands the work to a third-party system. When the third-party system has a problem, the core thread blocks and keeps waiting. Calls between processes have a timeout limit, so eventually the call chain breaks, which can also cause an avalanche.

Is there any way to solve the service avalanche?

For a sudden traffic surge, we can use automatic scaling, or add rate limiting to the load balancer.

A service avalanche caused by cache penetration can be addressed by cache preloading, asynchronous cache loading, and so on.

Program bugs, well... can only be solved by fixing the bug.

A service avalanche caused by slow database queries can be addressed by SQL optimization, hardware upgrades, and so on.

A service avalanche caused by hardware failure can be addressed by cross-data-center routing, multi-site active-active deployment, and so on.

Each scenario that causes a service avalanche has its own solutions, but there is no general solution to all of them.

Extensive practice has shown that synchronous thread waiting is the most common avalanche scenario. This is where Hystrix, the protagonist of this chapter, comes in. We will describe in detail how to use Hystrix for fault isolation and circuit breaking to handle dependent services that are unavailable.

Overview of Hystrix

Hystrix is an open source library for handling invocation failures and fault tolerance between services in a microservice architecture.

In a microservice architecture, calls between services may fail because of timeouts, exceptions, and so on.

Hystrix ensures that a problem in one dependency does not bring down the entire service chain, which improves the availability of the microservice architecture.

Hystrix, also known as a "circuit breaker", is itself a switching device. By monitoring faults (much like an electrical fuse), it returns an expected and manageable alternative response (fallback) to the caller, rather than making the caller wait for a long time or throwing an exception that the caller cannot handle. This ensures that the caller's threads are not occupied unnecessarily for a long time, and so prevents faults from spreading, or even avalanching, through the distributed system.

Hystrix design goals and implementation

Design goals

(1) Control and protect against the latency and failure of calls to dependent services.

(2) Prevent the failure of one dependent service from spreading through the whole system. For example, with service A -> service B -> service C, if service C fails, service B fails too, then service A fails, and the whole system goes down.

(3) Provide support for fail-fast (quick failure) and quick recovery.

(4) Provide support for graceful degradation through fallback.

(5) Support near-real-time monitoring, alerting, and operations.

Implementation approach

Wrap each access request to an external dependency in a HystrixCommand or HystrixObservableCommand, which typically runs in a separate thread.

For service calls that exceed the threshold we set, time out and return directly; blocking for a long time is not allowed.

Isolate resources for each dependent service through a thread pool or a semaphore.

Count the successes, failures, rejections, and timeouts of each dependent service.

If the failed calls to a dependent service exceed a certain threshold, Hystrix automatically trips the circuit breaker, degrades calls to that service for a period of time, and then automatically attempts to recover.

When a call to a service fails, is rejected, times out, or is short-circuited, the fallback degradation mechanism is invoked automatically.

Support near-real-time modification of properties and configuration.

Hystrix workflow

First, let's take a look at the Hystrix workflow chart from the official website:

Let's walk through the Hystrix workflow in this chart in detail.

1. Create a HystrixCommand or HystrixObservableCommand object for each call.

2. Call execute() / observe() or queue() / toObservable() to make a synchronous or asynchronous call.

3. Check whether the result of the request is cached; if so, return the cached result directly.

4. Check whether the circuit breaker is open; if so, skip directly to step 8.

5. Check whether the thread pool / semaphore is full; if so, go to step 8.

6. Execute HystrixObservableCommand.construct() or HystrixCommand.run(); if execution throws an exception or the call times out, skip to step 8.

7. Calculate the state of the circuit breaker: every outcome (success, failure, rejection, timeout) is reported to the circuit breaker, which uses the statistics to determine its state.

8. Invoke the fallback degradation mechanism. From the steps above, four situations lead to degradation: circuit breaker open, thread pool / semaphore full, call timeout, and call failure.

9. Return the real result of the dependency request.

Detailed explanation of workflow

Step 1, create a HystrixCommand or HystrixObservableCommand object

HystrixCommand and HystrixObservableCommand wrap the logic of accessing an external dependency.

The first step in the entire process is to instantiate a HystrixCommand or HystrixObservableCommand object.

When you construct either Command object, you can pass any parameters needed during execution through the constructor.

If the external dependency call returns a single result value, instantiate a HystrixCommand object.

HystrixCommand command = new HystrixCommand(arg1, arg2);

If the external dependency call needs to return multiple result values, instantiate a HystrixObservableCommand object.

HystrixObservableCommand command = new HystrixObservableCommand(arg1, arg2);

Step 2, call execute() / observe() or queue() / toObservable() to make a synchronous or asynchronous call

HystrixCommand mainly uses the following two methods

execute(): executes synchronously, returning a single result object from the dependent service or throwing an exception when an error occurs.

queue(): executes asynchronously and directly returns a Future object, which holds the single result object returned when the service call finishes.

HystrixObservableCommand uses the following two methods

observe(): returns an Observable object and issues the request immediately. When the dependent service responds (or throws an exception or times out), the result is delivered through the registered Subscriber. This is a Hot Observable.

toObservable(): returns an Observable object, but the request is issued only when the object is subscribed to. When the dependent service responds (or throws an exception or times out), the result is delivered through the registered Subscriber. This is a Cold Observable.

Step 3, check whether the result of the request is cached; if so, return the cached result directly

If the request cache for the current command is enabled and the command cache is hit, the cached result is immediately returned as an Observable object.

The benefits of this result cache are:

1. In the same request context, you can reduce the overhead of requesting the original service with the same parameters.

2. The request cache takes effect before the execution of step 5, so it can effectively reduce unnecessary thread overhead.

Step 4, check whether the circuit breaker is open

When the cache is not hit, Hystrix checks whether the circuit breaker is open before performing step 5. If it is open, Hystrix does not execute the command and jumps to step 8.

The circuit breaker opens under the following conditions:

1. The number of calls to the external dependency reaches the configured threshold.

2. The error rate of the calls to the external dependency reaches the configured threshold.

When both conditions are met, the circuit breaker opens, and all subsequent calls to the external dependency are short-circuited directly.

After the breaker has been open for longer than the trial window, it lets some calls to the external dependency through as a trial.

Depending on the result of the trial, the breaker is either closed again or kept open.
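The open / trial / close cycle described in this step can be sketched in plain Java. This is only an illustration, not Hystrix's actual implementation: the class name SimpleCircuitBreaker, its constructor parameters, and the explicit nowMillis argument are all assumptions made for the example, and the counters are cumulative rather than kept in a rolling window.

```java
import java.util.function.Supplier;

/** Minimal circuit-breaker sketch (hypothetical names; not the Hystrix implementation). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;  // minimum number of calls before the error rate counts
    private final double errorRateThreshold;   // e.g. 0.5 means 50% of calls failed
    private final long sleepWindowMillis;      // how long the breaker stays open before a trial call

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, double errorRateThreshold, long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorRateThreshold = errorRateThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the command if the breaker allows it; otherwise, or on failure, returns the fallback. */
    <T> T call(Supplier<T> command, Supplier<T> fallback, long nowMillis) {
        if (!allowRequest(nowMillis)) {
            return fallback.get();               // short-circuited: the command is never invoked
        }
        try {
            T result = command.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure(nowMillis);
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest(long nowMillis) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && nowMillis - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN;             // let exactly one trial call through
            return true;
        }
        return false;                            // still open, or a trial is already in flight
    }

    private synchronized void recordSuccess() {
        if (state == State.HALF_OPEN) {          // trial succeeded: close and start counting afresh
            state = State.CLOSED;
            total = 0;
            failures = 0;
        } else {
            total++;
        }
    }

    private synchronized void recordFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {          // trial failed: open again for another sleep window
            state = State.OPEN;
            openedAt = nowMillis;
            return;
        }
        total++;
        failures++;
        if (total >= requestVolumeThreshold && (double) failures / total >= errorRateThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real breaker would read the clock itself and keep its statistics in a rolling window.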

Step 5, check whether the thread pool / semaphore is full

As we know, Hystrix uses thread pools and semaphores to implement resource isolation. If the thread pool, its queue, or the semaphore corresponding to the command is full, Hystrix jumps directly to step 8.
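To make the "full means fallback" rule concrete, here is a minimal sketch of semaphore-style isolation using only java.util.concurrent.Semaphore. The class name SemaphoreIsolation and its methods are hypothetical; Hystrix's own semaphore handling is more involved.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Semaphore isolation sketch (hypothetical names; not the Hystrix implementation). */
class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Runs the command on the caller's thread if a permit is free; otherwise falls back at once. */
    <T> T execute(Supplier<T> command, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // limit reached: reject immediately instead of waiting
        }
        try {
            return command.get();
        } finally {
            permits.release();       // always hand the permit back, even if the command throws
        }
    }
}
```

The key design point is tryAcquire() rather than acquire(): a caller that cannot get a permit is rejected immediately, so a slow dependency can never tie up more than maxConcurrentRequests caller threads.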

Step 6, execute HystrixObservableCommand.construct() or HystrixCommand.run()

Hystrix decides how to request the dependent service based on the method we wrote.

1. HystrixCommand.run(): returns a single response or throws an exception.

2. HystrixObservableCommand.construct(): returns an Observable that emits multiple results, or sends an error notification via onError.

If the run() or construct() method takes longer than the command's timeout threshold, its thread throws a TimeoutException (or a separate timer thread does, if the command is not running on its own thread).

In this case, Hystrix moves on to the fallback logic (step 8).

If the run() or construct() method is not cancelled or interrupted, Hystrix discards the value it eventually returns.

If the command does not throw an exception and returns a response, Hystrix returns the result to the caller after performing some logging and metrics reporting.

If the command ran through run(), Hystrix returns an Observable that emits the single result and then sends an onCompleted notification; if it ran through construct(), Hystrix directly returns the Observable produced by that method.
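The timeout behaviour in this step can be approximated with a plain ExecutorService and Future.get with a timeout. The sketch below is an assumption-laden illustration, not how Hystrix is implemented: TimeoutWithFallback and its run method are invented names, and the timeout is enforced by the caller rather than by a timer thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Timeout-then-fallback sketch (hypothetical names; not the Hystrix implementation). */
class TimeoutWithFallback {

    static <T> T run(ExecutorService pool, Callable<T> command, long timeoutMillis, Supplier<T> fallback) {
        Future<T> future = pool.submit(command);   // the command runs on a pool thread
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                   // interrupt the worker; its late result is discarded
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();                 // the command itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // preserve the caller's interrupt status
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String slow = run(pool, () -> { Thread.sleep(500); return "real"; }, 100, () -> "fallback");
            String fast = run(pool, () -> "real", 100, () -> "fallback");
            System.out.println(slow + " / " + fast);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that cancel(true) can only interrupt the worker. If the wrapped code ignores interruption, the thread keeps working even though the caller has already received the fallback.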

Step 7, calculate the state of the circuit breaker

Hystrix reports every success, failure, rejection, timeout, and other event of a dependent service call to the circuit breaker.

HystrixCircuitBreaker records the execution of external dependency requests by maintaining a series of counters.

Based on this information, the circuit breaker opens when the trigger conditions are met and closes again once the dependency has recovered.

If the circuit breaker is open, calls are short-circuited for a period of time; if the first trial call after that period succeeds, the circuit breaker closes.

Step 8, invoke the fallback degradation mechanism

From the detailed walk-through above, we can see that the fallback degradation mechanism is invoked in the following situations.

1. The circuit breaker is open.

2. The thread pool or semaphore is full.

3. Command execution throws an exception.

4. Execution times out.

The degradation logic should produce a generic response, obtained from a cache or from static logic rather than from a network request.

If the degradation logic must include a network request, that request must also be wrapped in a HystrixCommand or HystrixObservableCommand, forming a cascading degradation policy.

The final degradation logic must not depend on network requests; it must be logic that can return a result reliably.

1. In HystrixCommand, custom fallback logic is provided in the HystrixCommand.getFallback() method, which returns a single fallback value.

2. In HystrixObservableCommand, custom fallback logic is provided in the HystrixObservableCommand.resumeWithFallback() method, which returns an Observable that emits one or more degraded results.

If the fallback returns a result, Hystrix returns that result to the caller.

For HystrixCommand, an Observable is returned that emits the corresponding result.

For HystrixObservableCommand, the original Observable is returned.

If the fallback is not implemented, or if the fallback throws an exception, Hystrix still returns an Observable, but it emits no data and terminates immediately with an onError notification.

What the caller sees when the fallback is missing or throws depends on how the command was invoked:

1. execute(): throws the exception directly.

2. queue(): returns a Future; the exception is thrown when get() is called.

3. observe(): returns an Observable; when you subscribe to it, it terminates immediately by calling the subscriber's onError method.

4. toObservable(): returns an Observable; when you subscribe to it, it terminates immediately by calling the subscriber's onError method.

Step 9, return the real result of the dependency request

If the Hystrix command executes successfully, it returns the response to the caller in the form of an Observable. Depending on how you invoked the command in step 2, this Observable may be transformed before it is returned.

execute(): obtains a Future in the same way as queue(), then calls get() on it to obtain the value the Future holds.

queue(): converts the Observable into a BlockingObservable, converts the BlockingObservable into a Future, and returns the Future.

observe(): subscribes to the Observable immediately, which starts executing the logic of the command, and returns an Observable that replays the results when you subscribe to it.

toObservable(): returns the Observable unchanged; you must subscribe to it before the logic of the command starts executing.

That is the whole Hystrix workflow. The explanation here is not very deep, but I still suggest reading it several times. I have been asked more than once in interviews to briefly describe the Hystrix workflow, and if you keep it in mind you will not panic when it comes up.

Building a project that uses Hystrix

Hystrix can also be integrated with Feign and Zuul. That is not covered here; it will be described in more detail in later articles on Feign and Zuul.

This article mainly introduces the use of the @HystrixCommand annotation.

First, let's build a HystrixClient project.

Add the configuration file application.properties

Create a new RestConfiguration class to configure RestTemplate globally.

Create a new HystrixController class

Then add the @EnableHystrix annotation to the startup class.

Modify the OrderService from the previous article, on the Ribbon architecture and its high-frequency interview questions, so that the code sleeps for 5 seconds before returning.

Each of the three methods in HystrixController is configured with a timeout of 2000ms. If no result is returned within that time, the fallback we specified is called directly.

OrderService

With the three steps above, the basic Hystrix setup is complete. Next, start the Eureka-Server from the previous article on the Ribbon architecture. Following the startup procedure of that article, start OrderService on three ports: 7777, 8888, and 9999. Opening http://localhost:8761/ then shows that three instances named order-service are registered, on ports 7777, 8888, and 9999.

Finally, start the HystrixClient startup class and access localhost:8088/test2, whose timeout is set to 10000ms. Because the sleep time in OrderService is set to 3000ms, the request returns within the timeout, so the fallback is not called.

Detailed explanation of Hystrix core configuration

Execution-related properties

hystrix.command.default.execution.isolation.strategy

Isolation strategy. The default is THREAD; the options are THREAD and SEMAPHORE.

hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds

Timeout for command execution. The default is 1000ms.

hystrix.command.default.execution.timeout.enabled

Whether the timeout is enabled. The default is true.

hystrix.command.default.execution.isolation.thread.interruptOnTimeout

Whether execution is interrupted when a timeout occurs. The default is true.

hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests

Maximum number of concurrent requests when semaphore isolation is used. In theory, the semaphore size is chosen on the same principle as the thread pool size, but with a semaphore each execution unit should be small and fast (millisecond level); otherwise a thread pool should be used. The semaphore should be a small fraction of the thread pool of the whole container (Tomcat).

Fallback-related properties

hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests

Maximum number of concurrent fallback calls. Once concurrency reaches this value, further requests are rejected and an exception is thrown without calling the fallback. The default is 10.

hystrix.command.default.fallback.enabled

Whether to attempt to call HystrixCommand.getFallback() when execution fails or the request is rejected. The default is true.

Metrics-related properties

hystrix.command.default.metrics.rollingStats.timeInMilliseconds

Length of the statistical rolling window, in milliseconds. Whether the circuit breaker opens is computed from the statistics of one rolling window. The rolling window is divided into n buckets, and each bucket holds the counts of success, failure, timeout, and rejection. The default is 10000.

hystrix.command.default.metrics.rollingStats.numBuckets

Number of buckets the rolling window is divided into. If numBuckets = 10 and the rolling window is 10000ms, each bucket covers 1 second. The values must satisfy rolling window % numBuckets == 0. The default is 10.
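As an illustration of how a rolling window split into buckets can work, the following sketch keeps one counter per bucket and reuses expired slots. The class RollingCounter is hypothetical and counts a single event type; Hystrix's own metrics classes are considerably more elaborate. Timestamps are passed in explicitly and assumed to be non-negative.

```java
import java.util.Arrays;

/** Rolling-window counter sketch (hypothetical class; not the Hystrix implementation). */
class RollingCounter {
    private final long bucketMillis;    // time span covered by one bucket
    private final long[] bucketStart;   // start time of the bucket currently stored in each slot
    private final int[] counts;         // event count of the bucket in each slot

    RollingCounter(long windowMillis, int numBuckets) {
        if (windowMillis % numBuckets != 0) {
            throw new IllegalArgumentException("windowMillis % numBuckets must be 0");
        }
        this.bucketMillis = windowMillis / numBuckets;
        this.bucketStart = new long[numBuckets];
        this.counts = new int[numBuckets];
        Arrays.fill(bucketStart, -1L);  // -1 marks a slot that has never been used
    }

    /** Records one event at the given time (milliseconds, non-negative). */
    synchronized void increment(long nowMillis) {
        long start = nowMillis - (nowMillis % bucketMillis);
        int slot = (int) ((nowMillis / bucketMillis) % counts.length);
        if (bucketStart[slot] != start) {   // the slot holds an expired bucket: reuse it
            bucketStart[slot] = start;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Sums the buckets that still fall inside the window ending at nowMillis. */
    synchronized int sum(long nowMillis) {
        long currentStart = nowMillis - (nowMillis % bucketMillis);
        long oldestStart = currentStart - bucketMillis * (counts.length - 1);
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] >= oldestStart && bucketStart[i] <= currentStart) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

With a 10000ms window and 10 buckets, an event recorded at time 0 stops being counted once the current bucket starts at 10000ms, because the oldest bucket still in the window then starts at 1000ms.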

hystrix.command.default.metrics.rollingPercentile.enabled

Whether execution latencies are tracked and calculated as percentiles. The default is true.

hystrix.command.default.metrics.rollingPercentile.timeInMilliseconds

Length of the rolling percentile window, in milliseconds. The default is 60000.

hystrix.command.default.metrics.rollingPercentile.numBuckets

Number of buckets in the rolling percentile window. The logic is the same as for rollingStats.numBuckets. The default is 6.

hystrix.command.default.metrics.rollingPercentile.bucketSize

Maximum number of executions recorded per bucket. For example, if bucketSize = 100 and a bucket covers 10 seconds, and 500 executions occur in those 10 seconds, only the last 100 executions are recorded in the bucket. Increasing this value increases memory overhead and sorting overhead. The default is 100.

hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds

Interval between health snapshots (used to compute the success and error percentages). The default is 500ms.

ThreadPool-related properties

hystrix.threadpool.default.coreSize

The maximum number of threads for concurrent execution. The default is 10.

hystrix.threadpool.default.maxQueueSize

Maximum size of the BlockingQueue. When set to -1, a SynchronousQueue is used; when the value is positive, a LinkedBlockingQueue is used. This setting only takes effect at initialization; afterwards the queue size of the thread pool cannot be changed without reinitializing the thread executor. The default is -1.

hystrix.threadpool.default.queueSizeRejectionThreshold

Even if maxQueueSize has not been reached, requests are rejected once the queue size reaches queueSizeRejectionThreshold. Because maxQueueSize cannot be modified dynamically, this parameter lets us adjust the effective limit dynamically. If maxQueueSize = -1, this property has no effect.
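The relationship between maxQueueSize, the queue type, and queueSizeRejectionThreshold can be sketched with a plain ThreadPoolExecutor. The class QueueSizing and its two methods are hypothetical helpers written for this example; they mirror the behaviour described above rather than reproduce Hystrix's code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Queue sizing sketch (hypothetical helpers; not the Hystrix implementation). */
class QueueSizing {

    /** Builds a fixed-size pool whose queue type depends on maxQueueSize. */
    static ThreadPoolExecutor newPool(int coreSize, int maxQueueSize) {
        BlockingQueue<Runnable> queue;
        if (maxQueueSize <= 0) {
            queue = new SynchronousQueue<>();               // no buffering: hand off or reject
        } else {
            queue = new LinkedBlockingQueue<>(maxQueueSize); // bounded buffer, fixed at creation
        }
        return new ThreadPoolExecutor(coreSize, coreSize, 1, TimeUnit.MINUTES, queue);
    }

    /** Soft limit checked before submitting; it can change at runtime, unlike the queue capacity. */
    static boolean hasQueueSpace(ThreadPoolExecutor pool, int maxQueueSize, int rejectionThreshold) {
        if (maxQueueSize <= 0) {
            return true;                                    // no queue: the threshold does not apply
        }
        return pool.getQueue().size() < rejectionThreshold;
    }
}
```

A caller would check hasQueueSpace before submitting and go to the fallback when it returns false. With a SynchronousQueue, a submission made while all threads are busy is rejected by the pool itself with a RejectedExecutionException.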

hystrix.threadpool.default.keepAliveTimeMinutes

If corePoolSize and maxPoolSize are set to the same value (the default), this setting has no effect.

hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds

Length of the rolling window for thread pool metrics, in milliseconds. The default is 10000.

hystrix.threadpool.default.metrics.rollingStats.numBuckets

Number of buckets the rolling window is divided into. The default is 10.

Hystrix fault tolerance: fallback degradation

1. What is degradation?

Degradation usually means that during peak traffic, to keep the core services running normally, some less important services are stopped; or that when a service is unavailable, standby logic is executed to fail fast or return quickly from the faulty service, so that the main business is not affected.

The degradation provided by Hystrix is mainly for fault tolerance: it ensures that the current service is not affected by failures of the services it depends on, which improves the robustness of the service.

To support fallback or degraded processing, override the getFallback method of HystrixCommand or the resumeWithFallback method of HystrixObservableCommand.

2. When does degradation happen?

From the Hystrix workflow chart, we can see that the following situations lead to the degradation logic.

1. The circuit breaker is open.

2. The thread pool or semaphore is full.

3. Command execution throws an exception.

4. Execution times out.

3. What are the ways of handling fallback and degradation?

Fail fast: the exception is thrown directly after a failure, with no further handling.

Fail silent: after a failure, an empty result such as null or an empty Map is returned, and the fault is masked.

Static fallback: if a failure occurs, a static default value is returned. For example, if the return value is a boolean, the default might be true.

Stubbed fallback: suitable when the return value is a composite object. On failure, an instance of the composite object is created manually, usually containing some default values or error information.

Cache-based fallback: when the underlying service fails, previously stored data is fetched from the cache and used.

Primary and secondary mode: this is a special use of fallback degradation.

Primary and secondary mode explained:

Sometimes a business operation can be handled in two ways. Scheme A is efficient, but it has not been tested at scale and its reliability cannot be guaranteed. Scheme B is conservative and less efficient, but it will not fail. In this case we can adopt the primary and secondary mode: the main flow runs on scheme A, and the fallback runs on scheme B. As long as no error occurs, scheme A is used all the time; if a problem does occur, the fallback degrades the call and quickly switches to scheme B, which keeps the problem from spreading out of control.
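Several of the fallback styles above, including the primary and secondary mode, come down to "try the command, and on failure return something else". The sketch below shows that shape in plain Java. The names FallbackStyles, withFallback, priceFromCache, and the lastGood map are all invented for this example, and the remote call is simulated with a Supplier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Fallback style sketch (hypothetical names; not the Hystrix API). */
class FallbackStyles {

    /** Primary and secondary mode: run the primary scheme, switch to the secondary if it throws. */
    static <T> T withFallback(Supplier<T> primary, Supplier<T> secondary) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return secondary.get();
        }
    }

    // Last successful answer per key, used by the cache-based fallback below.
    static final Map<String, String> lastGood = new ConcurrentHashMap<>();

    /** Cache-based fallback: remember fresh results and serve the old one when the call fails. */
    static String priceFromCache(String sku, Supplier<String> remoteCall) {
        return withFallback(
            () -> {
                String fresh = remoteCall.get();
                lastGood.put(sku, fresh);                 // refresh the cache on success
                return fresh;
            },
            () -> lastGood.getOrDefault(sku, "unknown")   // static default if nothing is cached
        );
    }
}
```

For example, withFallback(schemeA, schemeB) keeps using scheme A until it throws, while priceFromCache combines the cache-based and static styles: old data if there is any, a fixed default otherwise.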

Building the Hystrix monitoring system

What is Hystrix Dashboard?

Hystrix provides near-real-time call monitoring (Hystrix Dashboard). Hystrix continuously records the execution information of every request made through it and presents it to users as statistical reports and charts, including how many requests are executed per second, how many succeed, how many fail, and so on.

Netflix monitors these indicators through the hystrix-metrics-event-stream project, and Spring Cloud integrates Hystrix Dashboard to turn the monitoring data into a visual interface, so that users can see the status of services and clusters directly. In practice, it is often used together with Turbine.

Start building Hystrix Dashboard

Building Hystrix Dashboard is simple and takes three steps:

1. Create a Hystrix Dashboard monitoring project module

2. Configure application.yml

3. Configure the startup class

First, let's create a HystrixDashboard project and configure the pom.xml file as follows

Configure the port as 9001

server.port = 9001

Configure the startup class and add the @EnableHystrixDashboard annotation

After startup, visit http://localhost:9001/hystrix. If you can see the porcupine, the startup succeeded.

Thank you for reading. That concludes this introduction to what Hystrix is; how to use it in a specific project still needs to be verified in practice.
