How Go Scheduler handles thread blocking

2025-03-29 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article focuses on how the Go scheduler handles thread blocking. The approach described here is simple, fast, and practical, so interested readers may wish to follow along and learn how the Go scheduler handles thread blocking.

How to make our system faster

With the rapid development of information technology, the processing power of a single server has grown ever stronger, pushing programming models to evolve from the earlier serial model to concurrent models.

Concurrency models include I/O multiplexing, multi-process, and multi-threading, each with its own strengths and weaknesses. Modern high-concurrency architectures usually combine several of these models, applying different models to different scenarios to play to each one's strengths and extract maximum performance from the server.

Multithreading, being lightweight and easy to use, has become the most frequently used concurrency model in concurrent programming; later derivatives such as coroutines are also built on top of it.

Concurrency ≠ parallelism

Concurrency and parallelism are different.

On a single CPU core, threads switch between tasks via time slices or by relinquishing control, creating the impression of running multiple tasks "at the same time"; this is concurrency. In reality only one task executes at any given moment, while the other tasks wait in a queue ordered by some scheduling algorithm.

A multicore CPU allows multiple threads in the same process to truly run at the same time; this is parallelism.

Processes, threads, coroutines

Process: the process is the basic unit of resource allocation in the system; each process has its own independent memory space.

Thread: the thread is the basic unit of CPU scheduling and dispatch. Threads belong to a process, and all threads share the resources of their parent process.

Coroutine: a coroutine is a lightweight, user-mode thread. Coroutine scheduling is controlled entirely by the user, and switching between coroutines only requires saving the task's context, with no kernel overhead.
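Go's goroutines are exactly this kind of user-mode coroutine. As a minimal sketch (the `double` helper and its values are illustrative, not from the article), two goroutines hand a value over a channel, and the switch between them happens entirely in user space:

```go
package main

import "fmt"

// double runs a computation in a separate goroutine and receives
// the result over a channel; parking on the channel blocks only
// the goroutine, not the underlying OS thread.
func double(n int) int {
	ch := make(chan int)
	go func() {
		ch <- n * 2 // runs on whichever M the scheduler picks
	}()
	return <-ch // parks this goroutine until a value arrives
}

func main() {
	fmt.Println(double(21))
}
```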

Thread context switch

Due to interrupt handling, multitasking, user/kernel mode switches, and other causes, the CPU switches from one thread to another. The switch requires saving the state of the current thread and restoring the state of the other.

Context switching is expensive because swapping threads on a core takes a lot of time. The latency of a context switch depends on many factors, but typically falls between 50 and 100 nanoseconds. Given that hardware executes on average 12 instructions per nanosecond per core, a single context switch costs the equivalent of roughly 600 to 1,200 instructions. Context switching can therefore eat up a significant share of the time a program could spend executing its own instructions.

If a cross-core context switch (Cross-Core Context Switch) occurs, it may also invalidate the CPU cache (accessing data from cache costs roughly 3 to 40 clock cycles, while accessing main memory costs roughly 100 to 300 clock cycles), making the switch even more expensive.

Golang is created for concurrency

Since its official release in 2009, Golang has quickly gained market share thanks to its high execution speed and efficient development experience. Golang supports concurrency at the language level: programs run concurrently via lightweight coroutines called Goroutines.

Goroutines are extremely lightweight, mainly in the following two respects:

Small context-switching cost: a Goroutine context switch only involves modifying the values of three registers (PC / SP / DX), whereas a thread context switch involves a mode switch (from user mode to kernel mode) and refreshing around 16 registers, including PC and SP.

Low memory footprint: a thread's stack is typically 2 MB, while a Goroutine's stack starts at only 2 KB.

A Golang program can easily support Goroutines on the order of 100,000, whereas with only 1,000 threads the memory footprint already reaches 2 GB.
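To see how cheap goroutines are, the following sketch launches 100,000 of them and waits for all to finish (the `spawn` helper is illustrative); doing the same with OS threads would exhaust memory long before completing:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn launches n goroutines and waits for all of them to finish.
// With n = 100000 this runs comfortably on commodity hardware,
// which would be impractical with 100,000 OS threads.
func spawn(n int) int64 {
	var count int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(spawn(100000))
}
```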

How the Go scheduler is implemented

A Go program schedules Goroutines onto kernel threads through a scheduler, but a Goroutine is not bound directly to an OS thread (M, Machine) to run; instead, the P (Processor, a logical processor) in the Goroutine scheduler acts as an "intermediary" through which Goroutines obtain kernel thread resources.

The Go scheduler model is commonly called the G-P-M model. It includes four important structures: G, P, M, and Sched.

G: Goroutine. Each Goroutine corresponds to one G structure, which stores the Goroutine's running stack, state, and task function; G structures can be reused.

A G is not an executor; each G must be bound to a P before it can be scheduled for execution.

P: Processor, the logical processor. To a G, a P is the equivalent of a CPU core: a G can be scheduled only after binding to a P. To an M, a P provides the execution environment (Context), such as the memory allocation state (mcache) and the task queue of Gs.

The number of Ps determines the maximum number of Gs that can run in parallel in the system (provided the number of physical CPU cores >= the number of Ps).

The number of Ps is determined by the user-set GOMAXPROCS, but no matter how large GOMAXPROCS is set, the number of Ps is capped at 256.

M: Machine, an abstraction of an OS kernel thread; it represents real computing resources. After binding a valid P, an M enters the scheduling loop, which roughly consists of fetching Gs from the global queue, the P's local queue, and the wait queue.

The number of Ms is variable and is adjusted by the Go runtime. To prevent the system from becoming unschedulable because too many OS threads were created, the default upper limit is 10,000.

An M does not retain G state; this is what allows a G to be scheduled across different Ms.

Sched: the Go scheduler. It maintains the queues that store Ms and Gs, along with some scheduler state information.

The scheduling loop roughly works as follows: fetch a G from the various global queues or a P's local queue, switch to the G's execution stack and run the G's function, call Goexit to clean up and return to the M, and repeat.

To understand the relationship among M, P, and G, we can use the classic analogy of gophers carting bricks.

The gophers' task: there is a pile of bricks at a construction site, and the gophers use carts to haul them to the kiln to be fired. M is the gopher, P is the cart, and G is a brick in the cart.

Having sorted out the relationship among the three, let's focus on how the gophers haul the bricks.

Processor (P):

A batch of carts (P) is created according to the user-set GOMAXPROCS value.

Goroutine (G):

Using the go keyword creates a Goroutine, which is like making a brick (G) and putting it into the current cart (P).

Machine (M):

Gophers (M) cannot be created in advance from outside; rather, when there are too many bricks (G) and too few gophers (M) to keep up, and there happens to be an idle cart (P) going unused, more gophers (M) are borrowed from elsewhere until all the carts (P) are in use.

This process of borrowing a gopher (M) from elsewhere when gophers are insufficient corresponds to creating a kernel thread (M).

Note that a gopher (M) cannot haul bricks without a cart (P). The number of carts (P) determines the number of gophers (M) that can work, which in a Go program corresponds to the number of active threads.

In a Go program, the G-P-M model fits together as follows:

P represents a logical processor that can run "in parallel"; each P is assigned a system thread M, and G represents a Go coroutine.

There are two different run queues in the Go scheduler: the global run queue (GRQ) and the local run queue (LRQ).

Each P has an LRQ that manages the Goroutines assigned to execute in that P's context; these Goroutines take turns being context-switched onto the M bound to the P. The GRQ holds Goroutines that have not yet been assigned to a P.

In this model, the number of Gs can be far greater than the number of Ms; in other words, a Go program can use a small number of kernel-level threads to support a large number of concurrent Goroutines. Multiple Goroutines share an M's computing resources through user-level context switches, without the performance cost of OS thread context switches.

To make better use of threads' computing resources, the Go scheduler adopts the following scheduling strategies:

Work stealing (work-stealing)

In reality, some Goroutines run fast and some run slow, which inevitably leads to imbalance. Go will certainly not let any P sit idle; computing resources must be fully utilized.

To improve Go's parallel processing capability and overall efficiency, when the G workload across Ps is unbalanced, the scheduler allows a P to steal Gs to execute from the GRQ or from another P's LRQ.

Reducing blocking

What if the executing Goroutine blocks its thread M? Will the Goroutines in the LRQ on that P fail to get scheduled?

Blocking in Go is mainly divided into the following four scenarios:

Scenario 1: for Goroutine blocking caused by atomic, mutex, or channel operations, the scheduler switches out the currently blocked Goroutine and schedules other Goroutines on the LRQ.
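Scenario 1 can be sketched with an unbuffered channel: every send parks the sending goroutine, and the scheduler switches in the receiver without blocking any OS thread (the `pingPong` helper and its values are illustrative):

```go
package main

import "fmt"

// pingPong demonstrates scheduler-level blocking on a channel:
// each send on the unbuffered channel parks the sending goroutine,
// and the scheduler runs the receiving goroutine instead; the
// underlying M never blocks in the kernel.
func pingPong(rounds int) int {
	ch := make(chan int)
	done := make(chan int)
	go func() {
		total := 0
		for v := range ch {
			total += v
		}
		done <- total
	}()
	for i := 1; i <= rounds; i++ {
		ch <- i // parks this G until the receiver is ready
	}
	close(ch)
	return <-done
}

func main() {
	fmt.Println(pingPong(10))
}
```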

Scenario 2: when a Goroutine blocks on a network request or I/O operation, what happens to G and M?

Go provides a network poller (NetPoller) to handle network requests and I/O operations; under the hood it implements I/O multiplexing via kqueue (macOS), epoll (Linux), or IOCP (Windows).

By routing network system calls through the NetPoller, the scheduler prevents a Goroutine from blocking its M while making such calls. The M can keep executing other Goroutines in the P's LRQ without creating a new M, which helps reduce the scheduling load on the operating system.

Here is how it works: G1 is executing on M, and three Goroutines are waiting on the LRQ. The network poller is idle, with nothing to do.

Next, G1 wants to make a network system call, so it is moved to the network poller, which handles the asynchronous network call. M can then run another Goroutine from the LRQ; at this point G2 is context-switched onto M.

Finally, the asynchronous network call completes in the network poller, and G1 is moved back to the P's LRQ. Once G1 can be context-switched onto M again, the Go code it is responsible for resumes execution. The big advantage here is that no extra M is needed to perform network system calls: the network poller uses a system thread that runs an efficient event loop at all times.

This calling pattern sounds complicated, but fortunately the Go language hides this "complexity" inside the runtime: Go developers do not need to care whether a socket is non-blocking or register file-descriptor callbacks themselves; they simply handle each socket with "blocking I/O" in the Goroutine dedicated to that connection. This yields the simple goroutine-per-connection network programming model (although large numbers of Goroutines bring their own problems, such as increased stack memory and scheduler load).

From the user's point of view, the "blocking socket" in a Goroutine is actually "simulated" by the netpoller in the Go runtime via a non-blocking socket plus I/O multiplexing. Go's net library is implemented this way.
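The goroutine-per-connection model can be sketched as a tiny echo server (the `startEcho` and `echoRoundTrip` helpers are illustrative). Each connection gets its own goroutine, and the seemingly blocking Read/Write calls are multiplexed by the netpoller underneath:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// startEcho listens on an ephemeral loopback port and serves each
// connection in its own goroutine (goroutine-per-connection). The
// blocking-looking reads and writes park only the goroutine; the
// netpoller multiplexes the underlying file descriptors.
func startEcho() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				line, err := bufio.NewReader(c).ReadString('\n')
				if err != nil {
					return
				}
				c.Write([]byte(line)) // echo the line back
			}(conn)
		}
	}()
	return ln, nil
}

// echoRoundTrip dials the server and returns the echoed line.
func echoRoundTrip(addr, msg string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	fmt.Fprintln(conn, msg)
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	ln, err := startEcho()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	reply, err := echoRoundTrip(ln.Addr().String(), "hello")
	if err != nil {
		panic(err)
	}
	fmt.Print(reply)
}
```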

Scenario 3: when certain system calls block, the network poller (NetPoller) cannot be used, and the Goroutine making the system call blocks the current M.

Consider a synchronous system call (such as file I/O) that blocks an M: G1 makes a synchronous system call and blocks M1.

The scheduler then intervenes: it recognizes that G1 has blocked M1, detaches M1 from the P (taking G1 with it), and brings in a new M2 to serve the P. G2 can then be selected from the LRQ and context-switched onto M2.

When the blocking system call completes, G1 can be moved back to the LRQ and executed by the P again, while M1 is set aside for future reuse should this situation recur.

Scenario 4: if a Goroutine performs a sleep operation, the M is blocked.

The Go runtime has a background monitoring thread, sysmon, that watches for long-running G tasks and marks them as preemptible so that other Goroutines can run first.

The next time such a Goroutine makes a function call, it is preempted, its state is saved, and it is put back into the P's local queue to await its next turn.
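Beyond sysmon-driven preemption, a goroutine can also give up its P voluntarily with `runtime.Gosched()`, which puts the current G back on a run queue just as preemption does. A small sketch (the `runOrder` helper is illustrative; it assumes GOMAXPROCS(1) so the ordering is deterministic):

```go
package main

import (
	"fmt"
	"runtime"
)

// runOrder shows a voluntary yield: with a single P, calling
// runtime.Gosched() parks the main goroutine on a run queue and
// lets the background goroutine run first, much as a preempted G
// is put back into the P's local queue to wait its turn.
func runOrder() []string {
	runtime.GOMAXPROCS(1)
	order := make(chan string, 2)
	go func() {
		order <- "background"
	}()
	runtime.Gosched() // yield the P; the background goroutine runs
	order <- "main"
	return []string{<-order, <-order}
}

func main() {
	fmt.Println(runOrder())
}
```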

At this point, you should have a better understanding of how the Go scheduler handles thread blocking. The best way to cement it is to try it out in practice. Follow us and keep learning!
