What is the Rust asynchronous programming model?


This article explains the Rust asynchronous programming model by walking through how the Tokio scheduler works. The content is kept simple and clear, so it should be easy to learn and understand.

How does the scheduler work?

The scheduler, as its name implies, decides how program execution is scheduled. Generally speaking, a program is divided into many units of work, which we call tasks. A task is either runnable or suspended (idle or blocked). Tasks are independent of each other, so any number of runnable tasks can execute concurrently. The scheduler's job is to run tasks until they suspend; the essence of this process is allocating a global resource, CPU time, to tasks.

The following discussion only concerns "user space" schedulers. Readers with basic operating-system knowledge will recognize these as schedulers that run on top of operating system threads, which are in turn scheduled by the kernel scheduler.

The Tokio scheduler executes Rust futures. Just as we talk about the threading models of languages such as Java and Go, a Rust future can be thought of as Rust's "asynchronous green thread": this is the M:N model, where many user-space tasks are multiplexed onto a small number of system threads.

There are many design patterns for schedulers, each of which has its own advantages and disadvantages. But in essence, the scheduler can be abstracted as a queue and a processor that constantly consumes the tasks in the queue, which can be represented in pseudocode as follows:

while let Some(task) = self.queue.pop() {
    task.run();
}

When a task becomes runnable, it is pushed onto the queue:
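In the same pseudocode style as the loop above, scheduling a runnable task is just a push onto that queue (a minimal sketch; `schedule` and the queue field are illustrative names, not Tokio's actual API):

fn schedule(&self, task: Task) {
    self.queue.push(task);
}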

Although resources, tasks, and the processor could all live on a single thread, Tokio chooses a multithreaded model. Modern computers have multiple CPUs and multiple physical cores, and a single-threaded scheduler severely limits resource utilization. To squeeze as much as possible out of all CPUs and cores, there are two common designs:

A single global task queue shared by multiple processors

Multiple task queues, each with its own processor

Single queue + multiprocessor

In this model, there is one global task queue. When a task becomes runnable, it is inserted at the tail of the queue. Processors run on different threads, and each processor pops tasks from the head of the queue to "consume" them. If the queue is empty, all threads (and their processors) block.

The task queue must support multiple producers and multiple consumers. The common algorithm here is an intrusive linked list, where "intrusive" means that each queued task itself contains a pointer to the next task. This way, no memory allocation is needed during push and pop operations; pushes can be lock-free, but pops require a lock to coordinate the multiple consumers.
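To make "intrusive" concrete, here is a minimal single-threaded sketch: the link pointer lives inside the task itself, so pushing never allocates a separate list node. The names and layout are assumptions for illustration only; a real queue additionally wraps these operations in lock-free pushes and a locked pop.

struct Task {
    // The link is embedded in the task itself ("intrusive"), so the queue
    // needs no separately allocated node per task.
    next: *mut Task,
    // ... task state and the future to poll would live here ...
}

struct TaskQueue {
    head: *mut Task,
    tail: *mut Task,
}

impl TaskQueue {
    fn push(&mut self, task: *mut Task) {
        // Safety (sketch only): the caller passes a valid, exclusively
        // owned task pointer.
        unsafe {
            (*task).next = std::ptr::null_mut();
            if self.tail.is_null() {
                self.head = task;
            } else {
                (*self.tail).next = task;
            }
            self.tail = task;
        }
    }
}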

This approach is most often used to implement general-purpose thread pools, and it has the following advantages:

Tasks will be dispatched fairly.

The implementation is relatively simple and clear.

Fair scheduling here means that tasks are scheduled on a first-in, first-out basis. This is not efficient enough in some scenarios, such as fork-join parallel computation, where the only thing that matters is how quickly the final result is produced, not fairness between subtasks.

Of course, this scheduling model also has shortcomings. All processors (consumers) contend on the head of the queue, so the model only pays off when the time a processor spends actually executing a task is much longer than the time it takes to pop the task. That is fine for long-running tasks, because the queue contention is amortized. However, Rust asynchronous tasks are designed to be short-lived, so the cost of contending on the queue becomes high.

Concurrency and "mechanical empathy"

Readers have surely heard the phrase "specially optimized for the xxx platform": only by fully understanding the hardware architecture can you know how to make the most of hardware resources and design the highest-performing programs. This is so-called "mechanical sympathy", a term originally coined and popularized by Martin Thompson.

The details of how concurrency behaves on modern hardware are beyond the scope of this article; interested readers can consult further resources on the topic.

In general, hardware no longer gains performance by increasing clock speed but by providing programs with more CPU cores. Each core can perform a huge number of computations in a very short time, while operations such as accessing memory take comparatively much longer. Therefore, to make a program run fast, we must maximize the amount of CPU work done per memory access. Although the compiler helps a lot, as programmers we need to think carefully about how data is laid out in memory and about memory access patterns.

When threads run concurrently, the CPU's cache-coherence machinery kicks in to ensure that every CPU's cache stays up to date.

So obviously, we should avoid cross-thread synchronization as much as possible, because it is a performance killer.

Multiple processors + multiple task queues

Compared with the previous model, here we use multiple single-threaded schedulers: each processor has its own task queue, which completely avoids the synchronization problem. Because Rust's task model requires that any thread be able to submit a task, we still need a thread-safe path into each processor: either each processor's task queue supports thread-safe pushes (MPSC), or each processor has two queues, an unsynchronized local queue and a thread-safe one.
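A minimal sketch of the "two queues per processor" variant described above, using a plain VecDeque for the processor-local queue and a std mpsc channel for tasks submitted from other threads. The types and names are illustrative assumptions, not Seastar's or Tokio's actual internals.

use std::collections::VecDeque;
use std::sync::mpsc;

type Task = Box<dyn FnOnce() + Send>;

struct Processor {
    // Only ever touched by the thread that owns this processor.
    local: VecDeque<Task>,
    // Thread-safe queue (MPSC) for tasks submitted from other threads.
    remote: mpsc::Receiver<Task>,
}

impl Processor {
    fn next_task(&mut self) -> Option<Task> {
        // Drain remotely submitted tasks into the local queue first, then
        // run from the local queue; the local queue itself never needs
        // cross-thread synchronization.
        while let Ok(task) = self.remote.try_recv() {
            self.local.push_back(task);
        }
        self.local.pop_front()
    }
}

Other threads submit work through the matching mpsc::Sender, so only that hand-off point is synchronized.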

This is the strategy used by Seastar. Because synchronization is almost completely avoided, performance is very high. However, it is not a silver bullet: there is no guarantee that the load is evenly distributed, so a processor may end up seriously over- or under-loaded, leading to poor resource utilization. This typically happens when tasks end up stuck to one fixed, specific processor.

It is well known that real-world task loads are not uniform, so avoid using this model when designing a general scheduler.

"mission theft" scheduler

Generally speaking, the task-stealing scheduler builds on the sharded model above and mainly solves its resource-utilization problem. Each processor has its own task queue; runnable tasks are pushed onto the current processor's queue and normally consumed (executed) only by that processor. The clever part: when a processor becomes idle, it checks the task queues of its sibling processors to see whether it can "steal" some tasks to execute, which is where the model gets its name. A processor only goes to sleep if it cannot find tasks in any sibling's queue either.
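Here is the work-stealing idea in a deliberately simplified form. Mutex-protected deques keep the sketch short and runnable; the real scheduler uses lock-free queues, starts from a random peer, and steals half a queue at a time rather than a single task. All names here are illustrative.

use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

type Task = Box<dyn FnOnce() + Send>;

struct Worker {
    id: usize,
    // One run queue per worker, shared so that siblings can steal.
    queues: Arc<Vec<Mutex<VecDeque<Task>>>>,
}

impl Worker {
    fn find_task(&self) -> Option<Task> {
        // 1. Prefer our own run queue.
        if let Some(task) = self.queues[self.id].lock().unwrap().pop_front() {
            return Some(task);
        }
        // 2. Otherwise scan the siblings and "steal" from the first
        //    non-empty queue we find.
        for (i, peer) in self.queues.iter().enumerate() {
            if i == self.id {
                continue;
            }
            if let Some(task) = peer.lock().unwrap().pop_front() {
                return Some(task);
            }
        }
        // 3. Nothing anywhere: the worker can go to sleep.
        None
    }
}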

This is almost the best of both worlds. Processors run independently, avoiding synchronization overhead, and if the load becomes unevenly distributed, the scheduler can redistribute it. It is precisely because of this property that work-stealing schedulers are used by languages such as Go, Erlang, and Java.

Of course, it also has a drawback: complexity. The task queue must support a "steal" operation, which requires some cross-processor synchronization. If this is not done correctly, the cost of "stealing" can outweigh the benefits of the model.

Consider a scenario: processor A is currently executing a task and its queue is empty; processor B is idle, tries to "steal" a task, fails, and goes to sleep. Then the task running on processor A spawns 20 (sub)tasks. The goal is to wake processor B up, which in turn requires the scheduler to signal sleeping processors when new tasks appear in a queue. Obviously, this needs additional synchronization, which is exactly what we wanted to avoid.

To sum up:

It's always good to minimize synchronization.

"Task stealing" is the preferred algorithm for general schedulers

Processors are basically independent of each other, but "steal" operations inevitably require some synchronization.

The Tokio 0.1 scheduler

In March 2018, Tokio released its first scheduler based on the work-stealing algorithm. But that version had some flaws in its implementation:

First, I/O tasks were executed on the same threads that drive the I/O selector (epoll, kqueue, IOCP, etc.), while CPU-bound tasks went into a thread pool. In that setup, the number of active threads should be flexible and dynamic, so shutting down idle threads is reasonable. However, when executing all asynchronous tasks on a work-stealing scheduler, it makes more sense to keep a small number of threads active at all times.

Second, the queue was based on the Chase-Lev deque algorithm, which, as later became clear, is not well suited to scheduling independent asynchronous tasks.

Third, the implementation was too complex. The code made excessive use of atomics when, in most cases, a mutex would have been the better choice.

Finally, the code contained many small inefficiencies in design and implementation, but fixing them was deferred as technical debt in order to keep the API stable.

Of course, with the release of the new version of Tokio, we have learned a lot of lessons and repaid a lot of technical debt, which is really exciting!

Next Generation Tokio Scheduler

Now let's take a closer look at the changes to the new scheduler.

The new task system

First, an important highlight that is not actually part of Tokio but is crucial to what we achieved: std now includes a new task system designed by Taylor Cramer. The system provides the hooks a scheduler needs to execute Rust asynchronous tasks, and it does the job really well: it is lighter and more flexible than the previous design.

A Waker is held by a resource and is used to signal that a task is runnable and should be pushed onto the scheduler's run queue. In the new task system the Waker is only two pointers wide, whereas it used to be larger. Shrinking it matters: it minimizes the overhead of copying Waker values and takes up less space in other structures, so more useful data fits in a cache line. The custom vtable design enables many optimizations, which will be discussed later.
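For readers unfamiliar with the new std task system, the hook it gives schedulers looks roughly like this. The sketch below builds a do-nothing Waker with std's public RawWaker API just to show the shape: a Waker is a data pointer plus a vtable pointer (two pointers wide), and each scheduler supplies its own clone/wake/drop functions. Tokio's real vtable of course does useful work, including the reference-counting tricks described later.

use std::ptr;
use std::task::{RawWaker, RawWakerVTable, Waker};

// Each entry is a function the scheduler provides; here they do nothing.
unsafe fn clone_raw(data: *const ()) -> RawWaker {
    RawWaker::new(data, &VTABLE)
}
unsafe fn wake_raw(_data: *const ()) { /* push the task onto a run queue */ }
unsafe fn wake_by_ref_raw(_data: *const ()) { /* same, without consuming the waker */ }
unsafe fn drop_raw(_data: *const ()) { /* release the scheduler's handle */ }

static VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_raw, wake_raw, wake_by_ref_raw, drop_raw);

fn noop_waker() -> Waker {
    // Safety: the vtable functions above trivially uphold the contract.
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}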

Better task queue

The task queue is the heart of the scheduler and its most critical component. The original Tokio scheduler used crossbeam's deque implementation, a single-producer, multi-consumer deque. Tasks are pushed on one end and popped from the other. Most of the time the thread that pushed a task will also pop it, but other threads occasionally pop to "steal". The deque is backed by an array plus a set of indexes tracking head and tail. When the deque is full, pushing causes the storage to grow: a new, larger array is allocated and the values are moved into the new storage.

The ability of the deque to grow brings complexity and overhead. Push and pop operations must account for it. In addition, freeing the original array after the queue has grown brings extra difficulty. In a garbage-collected language, the GC frees it. Rust, however, has no GC, which means the program is responsible for freeing the array even though other threads may still be accessing that memory concurrently. Crossbeam's answer is an epoch-based reclamation strategy. While not hugely expensive, it does add noticeable overhead to the hot path: every time a critical section is entered or exited, an atomic RMW (read-modify-write) operation must be performed to signal other threads.

Since the costs associated with a growable local queue are not negligible, it is worth asking whether growing the queue is even necessary. That question ultimately led to a rewrite of the scheduler. The new scheduler's strategy is a fixed-size queue per processor. When the queue is full, instead of growing the local queue, the task is pushed to a global, multi-producer, multi-consumer queue. Processors do need to check this global queue, but much less frequently than the local one.

An early experiment replaced the crossbeam queue with a bounded MPMC queue. Because both push and pop perform a lot of synchronization, that did not bring much improvement. A key thing to remember about work stealing is that, under load, there is almost no contention on the queues, because each processor only accesses its own queue.

At this point I read the Go source code carefully and found that it uses a fixed-size single-producer, multi-consumer queue, which needs remarkably little synchronization to work correctly. I made some modifications to the algorithm to adapt it to the Tokio scheduler. Notably, the Go implementation uses sequentially consistent atomic operations (based on my limited knowledge of Go). The version used in the Tokio scheduler also reduces some copying in the colder code paths.

The queue implementation is a circular buffer that uses an array to store values. Atomic integers are used to track the position of the head and tail.

struct Queue {
    /// Concurrently updated by many threads.
    head: AtomicU32,

    /// Only updated by the producer thread but read by many threads.
    tail: AtomicU32,

    /// Masks the head / tail position value to obtain the index in the buffer.
    mask: usize,

    /// Stores the tasks.
    buffer: Box<[MaybeUninit<Task>]>,
}

Pushing onto the queue is done by a single thread:

loop {
    let head = self.head.load(Acquire);

    // safety: this is the **only** thread that updates this cell.
    let tail = self.tail.unsync_load();

    if tail.wrapping_sub(head) < self.buffer.len() as u32 {
        // Map the position to a slot index.
        let idx = tail as usize & self.mask;

        // Don't drop the previous value in `buffer[idx]` because
        // it is uninitialized memory.
        self.buffer[idx].as_mut_ptr().write(task);

        // Make the task available.
        self.tail.store(tail.wrapping_add(1), Release);
        return;
    }

    // The local buffer is full. Push a batch of work to the global queue.
    match self.push_overflow(task, head, tail, global) {
        Ok(_) => return,
        // Lost the race, try again.
        Err(v) => task = v,
    }
}

Note that in this push function, the only atomic operations are a load with Acquire ordering and a store with Release ordering. There are no read-modify-write operations (compare_and_swap, fetch_and, etc.) and no sequential consistency. On x86 chips all loads/stores are already "atomic", so at the CPU level this adds no synchronization. Using atomic operations affects the compiler, because they inhibit some optimizations, but that is all. The first load could probably use Relaxed ordering, but switching to Relaxed brings no measurable benefit.

push_overflow is called when the queue is full. This function moves half of the tasks in the local queue to the global queue. The global queue is an intrusive list protected by a mutex. The tasks to be moved are first linked together, then the mutex is acquired and all of them are appended by updating the global queue's tail pointer.
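A much-simplified sketch of that batching idea, using std collections instead of the real fixed-size ring buffer and intrusive list (the actual atomics and list splicing described above are omitted; names are illustrative):

use std::collections::VecDeque;
use std::sync::Mutex;

type Task = Box<dyn FnOnce() + Send>;

fn push_overflow(
    local: &mut VecDeque<Task>,
    task: Task,
    global: &Mutex<VecDeque<Task>>,
) {
    // Take half of the local queue (from the head, i.e. the oldest tasks)...
    let half = local.len() / 2;
    let mut batch: VecDeque<Task> = local.drain(..half).collect();
    // ...plus the task that did not fit...
    batch.push_back(task);
    // ...and append the whole batch to the global queue under one lock.
    global.lock().unwrap().append(&mut batch);
}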

If you are familiar with the details of atomic memory operations, you may notice that the push function shown above looks "problematic". An atomic load with Acquire ordering has fairly weak synchronization semantics: it may return a stale value (a concurrent steal may already have advanced self.head), so the pushing thread might read an old head value. This does not affect the correctness of the algorithm. On the push path we only care whether the local run queue is full, and since the current thread is the only one that can push, a stale head value can only make the queue look fuller than it actually is. The push might then mistakenly conclude that the queue is full and call push_overflow, but that function uses stronger atomic operations; if push_overflow determines that the queue is in fact not full, it returns Err and the push is retried. This is also another reason why push_overflow moves half of the queue to the global queue: by moving half, the false "run queue is full" case is hit much less often.

The local pop (from the thread that owns the queue) is also very lightweight:

loop {
    let head = self.head.load(Acquire);

    // safety: this is the **only** thread that updates this cell.
    let tail = self.tail.unsync_load();

    if head == tail {
        // queue is empty
        return None;
    }

    // Map the head position to a slot index.
    let idx = head as usize & self.mask;

    let task = self.buffer[idx].as_ptr().read();

    // Attempt to claim the task read above.
    let actual = self
        .head
        .compare_and_swap(head, head.wrapping_add(1), Release);

    if actual == head {
        return Some(task.assume_init());
    }
}

In this function there is a single atomic load and one compare_and_swap with Release ordering. The main overhead comes from the compare_and_swap.

The steal function is similar to pop, except that the load of self.tail must be atomic. Also, similar to push_overflow, the steal operation tries to take half of the queue rather than a single task. This is a nice property that we will come back to later.

The last piece is the global queue. It handles overflow from the local queues and receives tasks submitted to the scheduler from non-processor threads. If a processor is under load, i.e. its local queue has tasks, it will still attempt to pop from the global queue after executing roughly 60 tasks from the local queue, so the global queue is never starved. A processor also checks the global queue when it is in the "searching" state, described below.
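A sketch of that "check the global queue occasionally" rule. The interval, queue types, and function shape are illustrative assumptions; the real scheduler works on its fixed-size local queue and mutex-protected intrusive global list.

use std::collections::VecDeque;
use std::sync::Mutex;

type Task = Box<dyn FnOnce() + Send>;

fn next_task(
    tick: u64,
    local: &mut VecDeque<Task>,
    global: &Mutex<VecDeque<Task>>,
) -> Option<Task> {
    // Roughly every 60 ticks, look at the global queue first so a busy
    // local queue cannot starve it; otherwise prefer local work.
    if tick % 60 == 0 {
        global
            .lock()
            .unwrap()
            .pop_front()
            .or_else(|| local.pop_front())
    } else {
        local
            .pop_front()
            .or_else(|| global.lock().unwrap().pop_front())
    }
}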

Optimizing the message-passing pattern

Applications written with Tokio are usually modeled as many small, independent tasks that communicate with each other using messages. This pattern is similar to other languages such as Go and Erlang. Given how common the pattern is, it makes sense for the scheduler to try to optimize it.

Given task A and task B: task A is currently executing and sends a message to task B via a channel. Task B is currently blocked on that channel, so sending the message causes task B to transition to the runnable state and be pushed onto the current processor's run queue. The processor then pops the next task from the run queue, executes it, and keeps going until it eventually executes task B.

The problem is that there can be significant latency between the message being sent and task B being executed. Moreover, "hot" data such as the message is in the CPU cache when it is sent, but may well have been evicted from the cache by the time task B is scheduled.

To address this, the new Tokio scheduler implements a specific optimization (also found in the Go and Kotlin schedulers). When a task transitions to the runnable state, instead of being pushed to the back of the run queue it is stored in a special "next task" slot. The processor always checks this slot before checking the run queue. If the slot is already occupied when a new task is inserted, the old task is removed from the slot and pushed to the back of the run queue. In the message-passing case, this ensures that the receiver of the message is scheduled next.
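A sketch of that "next task" slot. Field and method names are illustrative, not Tokio's actual internals:

use std::collections::VecDeque;

type Task = Box<dyn FnOnce() + Send>;

struct Processor {
    // Checked before the run queue. A task woken by the currently running
    // task (the message-passing case) is placed here so it runs next,
    // while its data is still hot in the CPU cache.
    next_task: Option<Task>,
    run_queue: VecDeque<Task>,
}

impl Processor {
    fn schedule_local(&mut self, task: Task) {
        // If the slot is already occupied, the previous occupant falls
        // back to the end of the run queue.
        if let Some(prev) = self.next_task.replace(task) {
            self.run_queue.push_back(prev);
        }
    }

    fn pop_task(&mut self) -> Option<Task> {
        // The slot always wins over the run queue.
        self.next_task.take().or_else(|| self.run_queue.pop_front())
    }
}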

Task stealing

In a work-stealing scheduler, when a processor's run queue is empty, the processor tries to steal tasks from its peers. It picks a random peer as a starting point and attempts a steal; if no task is found there, it tries the next peer, and so on, until a task is found.

In practice, many processors often finish draining their run queues at around the same time. This happens when a batch of tasks arrives (for example, when epoll is polled for socket readiness): processors wake up, fetch and run tasks, and finish together. The result is that all processors try to steal at the same time, which means many threads trying to access the same queues. This creates contention. Choosing the starting peer at random helps reduce contention, but it is still bad.

The new scheduler therefore limits the number of processors that perform steal operations concurrently. We call the processor state in which it tries to steal "searching for work", or "searching" for short. The number of concurrent searchers is controlled with an atomic counter: a processor increments it before it starts searching and decrements it when it leaves the searching state. The maximum number of searchers is half the total number of processors. The limit is enforced rather sloppily, but it still works: we do not need a hard cap on the number of searchers, we only need to keep the cost down, trading precision for algorithmic efficiency.
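A sketch of the "at most half the processors may search" rule. The check-then-increment below is deliberately racy and approximate, matching the trade-off described above; the counter and struct are illustrative, not Tokio's real bookkeeping.

use std::sync::atomic::{AtomicUsize, Ordering};

struct Searchers {
    num_searching: AtomicUsize,
    num_processors: usize,
}

impl Searchers {
    fn try_start_searching(&self) -> bool {
        // Approximate limit: if half (or more) of the processors are
        // already searching, do not join in.
        let searching = self.num_searching.load(Ordering::SeqCst);
        if 2 * searching >= self.num_processors {
            return false;
        }
        self.num_searching.fetch_add(1, Ordering::SeqCst);
        true
    }

    fn stop_searching(&self) {
        self.num_searching.fetch_sub(1, Ordering::SeqCst);
    }
}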

While in the searching state, the processor attempts to steal tasks from its peers' queues and also checks the global queue.

Reduce cross-thread synchronization

Another key part of a work-stealing scheduler is sibling notification: a processor notifies a peer when it observes new tasks, so that if the peer is sleeping it wakes up and steals them. Notification has another important responsibility. Recall that the queue algorithm uses weak atomic orderings (Acquire/Release). Because of how atomic memory ordering works, without additional synchronization there is no guarantee that a sibling processor will ever see the tasks in the queue to steal. The notification action is therefore also responsible for establishing the synchronization the sibling needs in order to see the tasks and steal them. These requirements make notification expensive. The goal is to perform as few notifications as possible while still keeping the CPUs busy; notifying too eagerly leads to a thundering-herd problem.

The old Tokio scheduler took a naive approach to notification: a processor was notified whenever a new task was pushed to a run queue, and whenever a processor woke up and found a task it notified yet another processor. This logic caused all processors to wake up and contend; most of them then failed to find work and went back to sleep.

The new scheduler improves on this significantly by borrowing a technique used in the Go scheduler. Notification is attempted at the same points as before, but a peer is only notified if there is no worker currently in the searching state. When a worker is notified, it immediately transitions to the searching state. When a processor in the searching state finds a new task, it first exits the searching state and then notifies the next processor.

This approach throttles the rate at which processors wake up. If a whole batch of tasks is scheduled at once (for example, when epoll is polled for socket readiness), only the first task triggers a notification, and the notified processor then enters the searching state. That processor is not notified about the remaining tasks in the batch; instead it steals half of the batch and then notifies another processor. The third processor wakes up, finds tasks in one of the first two processors' queues, and steals half of them. In this way, processor load ramps up smoothly and tasks are load-balanced quickly.
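A sketch of the notification throttle: wake a sleeping peer only when no processor is currently searching, and start the woken peer in the searching state. The counter and the parked-thread list are illustrative assumptions, not Tokio's actual implementation.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::Thread;

fn notify_one(num_searching: &AtomicUsize, sleepers: &mut Vec<Thread>) {
    if num_searching.load(Ordering::SeqCst) > 0 {
        // A searching processor will find the new task on its own;
        // waking more threads would only add contention.
        return;
    }
    if let Some(thread) = sleepers.pop() {
        // The woken processor starts out in the searching state.
        num_searching.fetch_add(1, Ordering::SeqCst);
        thread.unpark();
    }
}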

Reduce memory allocation

The new Tokio scheduler only needs to allocate memory once for each task, while the old scheduler needs to allocate memory twice. Previously, the Task structure was as follows:

struct Task {
    /// All state needed to manage the task.
    state: TaskState,

    /// The logic to run is represented as a future trait object.
    future: Box<dyn Future<Output = ()>>,
}

The Task structure then also had to be placed in a Box. Since the old Tokio scheduler was released, two things have happened. First, std::alloc became stable. Second, the future task system switched to an explicit vtable strategy. With these two pieces in place, one of the allocations could be removed.

Now, the task is represented as:

struct Task<T> {
    header: Header,
    future: T,
    trailer: Trailer,
}

Both Header and Trailer hold state needed to run the task, split into "hot" data (Header) and "cold" data (Trailer), that is, data accessed frequently and data used rarely. The hot data is placed at the head of the structure and kept as small as possible. When the CPU dereferences a task pointer, it loads a cache line's worth of data at once (between 64 and 128 bytes), and we want that data to be as useful as possible.

Reduce atomic reference count

The final optimization is how the new scheduler reduces atomic reference counting. There are many outstanding references to a task structure: the scheduler and each waker hold a handle. The common way to manage such memory is atomic reference counting, which requires an atomic operation every time a reference is cloned and an opposite atomic operation every time a reference is dropped. When the reference count reaches zero, the memory is freed.

In the old Tokio scheduler, each waker held a reference-counted handle to the task:

struct Waker {
    task: Arc<Task>,
}

impl Waker {
    fn wake(&self) {
        let task = self.task.clone();
        task.scheduler.schedule(task);
    }
}

When the task is woken, the task's clone method is called (an atomic increment), and the reference is then pushed into the run queue. When the processor finishes executing the task, it drops the reference, causing an atomic decrement. Individually these atomic operations are cheap, but they add up.

The designers of the std::future task system identified this problem. They observed that when Waker::wake is called, the original waker reference is usually no longer needed, which allows the reference count to be reused when pushing the task into the run queue. The std::future task system therefore includes two wake APIs:

wake, which takes self by value

Wake_by_ref takes the & self parameter.

This API design nudges callers toward the wake method, which avoids the atomic increment. The implementation now becomes:

impl Waker {
    fn wake(self) {
        self.task.scheduler.schedule(self.task);
    }

    fn wake_by_ref(&self) {
        let task = self.task.clone();
        task.scheduler.schedule(task);
    }
}

This avoids the overhead of the extra reference count, but it only helps when ownership of the waker can be taken. In my experience, wake is almost always called through a borrow rather than by taking ownership: waking with self prevents reusing the waker, and thread-safe wake-up is harder to implement when taking self (the details are left for another article).

The new scheduler sidesteps the problem by avoiding the clone call in wake_by_ref, making it as efficient as wake(self). This is done by having the scheduler maintain a list of all tasks that are currently active (not yet completed); an entry in that list represents the reference needed to push the task into the run queue.

The difficulty with this optimization is ensuring that the scheduler does not remove a task from its list until the task has truly finished. The details of how this is managed are beyond the scope of this article; if you are interested, please refer to the source code.

Using Loom for fearless concurrency

It is well known that writing correct, concurrency-safe, lock-free code is not easy, and correctness matters above all, especially avoiding defects in how memory is allocated and freed. In this regard, a lot of effort went into the new scheduler, including heavy optimization and avoiding the use of most std types.

From a testing perspective, there are usually a couple of ways to verify the correctness of concurrent code. One is to rely entirely on users to validate it in their own usage scenarios; another is to run unit tests of various granularities in a loop, hoping to eventually hit the extremely low-probability edge-case concurrency bugs. In that case, how long do the loops need to run before you can be confident: 10 minutes, or 10 days?

That situation was unacceptable for our work: we want to ship releases with full confidence, and for Tokio users reliability is the most important thing.

So we built a "new wheel": Loom, a tool for testing concurrent code. Test cases are written in the most ordinary way, but when they are executed under Loom, Loom runs each case many times, permuting all the possible behaviors and outcomes the test could encounter in a multithreaded environment. In the process, Loom also verifies that memory accesses are correct, that memory is allocated and freed correctly, and so on.

Here is a real Loom test of the scheduler:

#[test]
fn multi_spawn() {
    loom::model(|| {
        let pool = ThreadPool::new();

        let c1 = Arc::new(AtomicUsize::new(0));

        let (tx, rx) = oneshot::channel();
        let tx1 = Arc::new(Mutex::new(Some(tx)));

        // Spawn a task
        let c2 = c1.clone();
        let tx2 = tx1.clone();
        pool.spawn(async move {
            spawn(async move {
                if 1 == c1.fetch_add(1, Relaxed) {
                    tx1.lock().unwrap().take().unwrap().send(());
                }
            });
        });

        // Spawn a second task
        pool.spawn(async move {
            spawn(async move {
                if 1 == c2.fetch_add(1, Relaxed) {
                    tx2.lock().unwrap().take().unwrap().send(());
                }
            });
        });

        rx.recv();
    });
}

The loom::model closure above is run thousands of times, each time with slightly different behavior, such as the order in which threads are switched; for every atomic operation, Loom tries all possible behaviors allowed by the C++11 memory model. As mentioned earlier, an atomic load with Acquire ordering gives fairly weak guarantees and may return a stale value, and Loom will try every value that could possibly be loaded.

Loom plays a very important role in the day-to-day development and testing of the scheduler; it helped us find and identify more than ten hidden bugs that the other testing methods (unit testing, manual testing, stress testing) had missed.

Some readers may question the claim above about "permuting all possible outcomes or behaviors": it is well known that naive permutation of possible behaviors leads to a factorial explosion. Of course, there are algorithms for avoiding this blow-up; the core algorithm used in Loom is based on dynamic partial-order reduction. It dynamically "prunes" permutations that would lead to identical executions, but note that even so, pruning efficiency can drop sharply when the state space is very large. Loom therefore uses a bounded dynamic partial-order reduction algorithm to limit the search space.

All in all, Loom has greatly helped us to release the new version of the scheduler with more confidence.

Test results

Let's take a look at how much performance the new Tokio scheduler actually gained.

First, the new scheduler shows significant improvements on microbenchmarks.

Old version

test chained_spawn ... bench:  2019796 ns/iter (+/- 302168)
test ping_pong     ... bench:  1279948 ns/iter (+/- 154365)
test spawn_many    ... bench: 10283608 ns/iter (+/- 1284275)
test yield_many    ... bench: 21450748 ns/iter (+/- 1201337)

New version

test chained_spawn ... bench:   168854 ns/iter (+/- 8339)
test ping_pong     ... bench:   562659 ns/iter (+/- 34410)
test spawn_many    ... bench:  7320737 ns/iter (+/- 264620)
test yield_many    ... bench: 14638563 ns/iter (+/- 1573678)

The tests include:

chained_spawn recursively spawns new subtasks.

ping_pong allocates a oneshot channel and spawns a subtask; the subtask sends a message on the channel and the original task receives it.

spawn_many spawns a large batch of tasks, injecting them into the scheduler from outside.

yield_many tests tasks that wake themselves.

To get closer to a real-world workload, let's run the Hyper benchmark again:

wrk -t1 -c50 -d10

Old version

Running 10s test @ http://127.0.0.1:3000
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   371.53us   99.05us   1.97ms   60.53%
    Req/Sec   114.61k     8.45k  133.85k    67.00%
  1139307 requests in 10.00s, 95.61MB read
Requests/sec: 113923.19
Transfer/sec:      9.56MB

New version

Running 10s test @ http://127.0.0.1:3000
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   275.05us   69.81us   1.09ms   73.57%
    Req/Sec   153.17k    10.68k  171.51k    71.00%
  1522671 requests in 10.00s, 127.79MB read
Requests/sec: 152258.70
Transfer/sec:     12.78MB

These days the Hyper benchmark carries more weight as a reference than TechEmpower, so judging from these results we are very excited that the Tokio scheduler can already reach this level of performance.

Another impressive result: Tonic, the popular gRPC client/server framework, saw more than a 10% performance improvement, even without any specific optimization in Tonic itself!

Thank you for reading. The above covers "what is the Rust asynchronous programming model". After studying this article, I believe you have a deeper understanding of Rust asynchronous programming; the specifics still need to be verified in practice. The editor will continue to share related articles, so feel free to follow along!
