2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
What are the common designs for high-concurrency network thread models, and how was the MongoDB thread model optimized in practice? This article analyzes the question in detail and walks through the corresponding solutions, in the hope of helping readers find a simple, workable approach.
A server usually needs to support highly concurrent business access. How the server's network IO worker thread/process model is designed plays a vital role in meeting those high-concurrency requirements.
1. Thread model 1: single-threaded network IO multiplexing model
Description:
All network IO events (accept, read, and write events) are registered with the epoll event set
In the main loop, a single epoll_wait call fetches the event information collected by the kernel, and the loop then executes the callback registered for each ready event
Event registration, epoll_wait event retrieval, and event-callback execution are all handled by one thread
1.1 composition of a complete request
A complete request processing process mainly consists of the following parts:
Step 1: fetch the pending network IO events in one epoll_wait call
Step 2: read the data and parse the protocol
Step 3: after the protocol is parsed successfully, run the business logic, then reply to the client
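The three steps above can be sketched with Python's selectors module (which uses epoll on Linux). This is a minimal illustration, not redis code: the socketpair stands in for a real client connection, and handle_read is a hypothetical callback.

```python
import selectors
import socket

# One selector plays the role of the epoll event set; a socketpair
# stands in for a real client connection.
sel = selectors.DefaultSelector()
client, server_side = socket.socketpair()
results = []

def handle_read(conn):
    data = conn.recv(1024)         # Step 2: read data (protocol parsing elided)
    results.append(data.decode())  # Step 3: "business logic"...
    conn.sendall(b"ok")            # ...then reply to the client

# Register the read event with the event set
sel.register(server_side, selectors.EVENT_READ, handle_read)

client.sendall(b"ping")

# Step 1: one epoll_wait-style call returns the ready events,
# and the SAME thread runs each event's callback.
for key, _ in sel.select(timeout=1):
    key.data(key.fileobj)

reply = client.recv(1024)
print(results, reply)  # ['ping'] b'ok'
sel.close(); client.close(); server_side.close()
```

Because one thread does everything, a slow callback here would stall every other registered connection, which is exactly the defect described next.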
1.2 defects in the network thread model
All work is performed by one thread, including epoll event retrieval and event handling (data reads and writes). If the event callback of any one request blocks, all other requests block too. For example, consider redis's hash structure: if a hash key contains too many fields (say, millions), then when that hash key expires, the entire redis instance blocks while it is removed.
In the single-threaded model, the CPU becomes the bottleneck: if QPS is too high, CPU load reaches 100% and latency jitters severely.
1.3 typical cases
Redis caching
Twitter's caching middleware twemproxy
1.4 main loop workflow

```c
while (1) {
    // epoll_wait blocks until network events arrive, or until the timeout expires
    size_t numevents = epoll_wait();
    // iterate over the events epoll returned and run the matching callback
    for (j = 0; j < numevents; j++) {
        if (/* read event */) {
            readData();       // read the data
            parseData();      // parse the protocol
            requestDeal();    // business logic after the data is read
        } else if (/* write event */) {
            writeEventDeal(); // write-event handling, write-data logic
        } else {
            errorDeal();      // error-event handling
        }
    }
}
```

Note: in the multi-threaded/multi-process models that follow, the main flow of each thread/process is identical to this while() loop.

1.5 redis source code analysis and a simplified async network IO multiplexing demo

Earlier work required secondary optimization of the redis kernel, so parts of the redis codebase were annotated, and redis's network module was extracted into a standalone demo. The demo helps in understanding epoll event handling and IO multiplexing, and the code is short. See the following projects:

Detailed annotated analysis of the redis source code
Simplified demo of the redis network module
Source code analysis of Twitter's caching middleware twemproxy

2. Thread model 2: single listener + fixed worker threads

The thread model diagram is shown below:

Description:
The listener thread is responsible for accepting all client connections
Each time the listener thread accepts a new client connection, a new fd is generated and handed to the corresponding worker thread through the dispatcher (by hash)
After a worker thread obtains the new connection's fd, all subsequent network IO reads and writes on that connection are handled by that thread
Assuming there are 32 connections, after all 32 are established, each thread handles on average 4 connections' reads and writes, message parsing, and business logic processing
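The dispatch described above can be sketched as follows. Plain integers stand in for connection fds, and the modulo rule stands in for the hash; with 32 connections and 8 workers, each worker ends up owning 4 connections.

```python
import threading
import queue

NUM_WORKERS = 8
worker_queues = [queue.Queue() for _ in range(NUM_WORKERS)]
assignments = {}                 # fd -> worker id, kept for inspection
lock = threading.Lock()

def worker(wid):
    # Each worker owns all processing for the fds dispatched to it
    while True:
        fd = worker_queues[wid].get()
        if fd is None:           # shutdown signal
            break
        with lock:
            assignments[fd] = wid  # stand-in for read/parse/business logic

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Listener side: dispatch each new "fd" to a worker by hash (fd % NUM_WORKERS)
for fd in range(32):
    worker_queues[fd % NUM_WORKERS].put(fd)

for q in worker_queues:
    q.put(None)
for t in threads:
    t.join()

per_worker = [sum(1 for w in assignments.values() if w == i)
              for i in range(NUM_WORKERS)]
print(per_worker)  # [4, 4, 4, 4, 4, 4, 4, 4]
```

Note that once a connection is hashed to a worker, it never migrates: a slow request on one worker delays only that worker's other connections.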
2.1 defects in the network threading model
There is only one listener thread doing accept processing, which can instantly become a bottleneck under high concurrency.
Each thread uses IO multiplexing to handle data reads/writes, message parsing, and subsequent business logic for several connection fds, so serious queuing can occur: if the internal processing of one connection's message takes too long after it is received and parsed, the requests of the other connections on that thread queue up behind it.
2.2 typical case
Memcache. This model suits caching scenarios and proxy scenarios whose internal processing is fast. A Chinese-annotated analysis of the memcache source code is available in: memcache source code implementation analysis
3. Thread model 3: fixed worker thread model (reuseport)
The prototype of the model is as follows:
Description:
Starting with Linux kernel 3.9, the reuseport feature is supported: the kernel protocol stack automatically distributes each new connection evenly among the user-mode worker threads.
This model solves the listener single-point bottleneck of model 2: multiple processes/threads can act as listeners at the same time, and all of them can accept client connections.
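A minimal sketch of the reuseport behavior (assumes Linux 3.9 or newer; the helper name is illustrative): two listening sockets bind the same port, which would fail with EADDRINUSE without SO_REUSEPORT.

```python
import socket

def make_listener(port=0):
    # Each worker process/thread would create its own listening socket like this
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s

first = make_listener()             # port 0: the kernel picks a free port
port = first.getsockname()[1]
second = make_listener(port)        # same port: only succeeds with reuseport
same_port = second.getsockname()[1] == port
print(same_port)  # True; the kernel now load-balances new connections
first.close(); second.close()
```

In a real server each worker would call accept on its own socket; the kernel decides which listener receives each incoming connection.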
3.1 defects in the network process / thread model
With reuseport support, the kernel load-balances new connections across multiple user-mode worker processes/threads, and each process/thread uses IO multiplexing to handle data reads/writes, message parsing, and post-parse business logic for several client connection fds. Since each worker handles several connections at once, if the internal processing of one connection's message takes too long, the requests of that worker's other connections queue up behind it.
So although this model removes the listener single-point bottleneck, it does not remove the queuing inside each worker thread.
Nginx, however, is well suited to this model: as a layer-7 forwarding proxy it processes everything in memory, so its internal processing time is short.
3.2 typical cases
Nginx (nginx uses processes, but the model principle is the same). This model suits scenarios with simple internal business logic, such as nginx proxying.
The performance improvement brought by reuseport support is covered in another article: the application of Nginx's multi-process, high-concurrency, low-latency, high-reliability mechanism in cache (redis, memcache) twemproxy proxies
See also the Chinese-annotated analysis of the nginx source code
4. Thread model 4: one connection, one thread model
The thread model diagram is shown below:
Description:
The listener thread is responsible for accepting all client connections
Every time the listener thread accepts a new client connection, it creates a dedicated thread that is solely responsible for data reads/writes, message parsing, and business logic processing on that connection.
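A minimal sketch of the one-connection-one-thread flow (the upper-casing echo handler is illustrative, not from mysql or apache):

```python
import socket
import threading

def handle(conn):
    # This dedicated thread does all reads/writes for its one connection
    data = conn.recv(1024)
    conn.sendall(data.upper())
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
listener.listen(8)

client = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()
t = threading.Thread(target=handle, args=(conn,))  # one new thread per accept
t.start()

client.sendall(b"hello")
reply = client.recv(1024)
t.join()
print(reply)  # b'HELLO'
client.close(); listener.close()
```

With 100,000 connections this spawns 100,000 threads, which is exactly the scalability defect described next.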
4.1 defects in the network threading model:
One thread is created per connection, so 100,000 connections require 100,000 threads. Too many threads overload the system and consume a great deal of memory.
When a connection closes, its thread must also be destroyed, and frequent thread creation and destruction further increases the load on the system.
4.2 typical case:
Mysql's default mode and mongodb's synchronous thread model configuration, suitable for scenarios where request processing is time-consuming, such as database services.
The Apache web server. This model limits apache's performance and makes nginx's advantages all the more obvious.
5. Thread model 5: single listener + dynamic worker threads (single queue)
The thread model diagram is shown in the following figure:
Description:
After the listener thread accepts a new connection fd, it hands the fd to the thread pool; all subsequent reads/writes, message parsing, and business processing on that connection are handled by threads in the pool.
The model converts one request into multiple tasks (network data read/write, message parsing, post-parse business logic processing) that are pushed to a global queue, from which the pool threads pull tasks to execute.
Because one request is split into several tasks, a single request may be processed by multiple threads.
When there are too many tasks and system pressure is high, the number of threads in the pool grows dynamically.
When tasks drop off and system pressure falls, the number of threads in the pool shrinks dynamically.
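The single-queue idea described above can be sketched as follows. A fixed pool size is used for brevity, and the stage names are illustrative; the dynamic grow/shrink logic is covered in the next subsections.

```python
import queue
import threading

global_queue = queue.Queue()   # the one queue all pool threads contend on
completed = []
lock = threading.Lock()

def worker():
    while True:
        task = global_queue.get()
        if task is None:       # shutdown signal
            break
        request_id, stage = task
        with lock:             # stand-in for executing the stage
            completed.append((request_id, stage))

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

# Each request is split into three tasks pushed to the SAME global queue;
# any pool thread may pick up any of them.
for rid in range(10):
    for stage in ("read", "parse", "handle"):
        global_queue.put((rid, stage))

for _ in workers:
    global_queue.put(None)
for w in workers:
    w.join()

print(len(completed))  # 30: three stages for each of ten requests
```

The shared queue is also where the lock contention noted in section 5.4 arises: every get and put synchronizes on the same queue.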
5.1 several statistics related to worker thread running time:
T1: time spent in the underlying asio library receiving a complete mongodb message
T2: time spent in all processing after the message is received (message parsing, authentication, engine-layer processing, sending data back to the client, etc.)
T3: time the thread spends waiting for data (for example, when there has been no traffic for a while and the thread is blocked waiting to read)
5.2 how does a single worker thread determine that it is in an "idle" state:
Total thread run time = T1 + T2 + T3, where T3 is unproductive wait time. If T3 accounts for a large proportion of the total, the thread is idle. At the end of each loop iteration, the worker thread computes the proportion of effective time (T1 + T2); if it falls below a configured threshold, the thread exits and is destroyed.
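The idle check reduces to simple arithmetic; a sketch follows, where the 30% threshold is an assumed value for illustration, not MongoDB's actual configuration:

```python
def is_idle(t1, t2, t3, threshold=0.30):
    # Effective time is T1 + T2; T3 is unproductive waiting.
    total = t1 + t2 + t3
    return (t1 + t2) / total < threshold

print(is_idle(t1=5, t2=10, t3=85))   # True: only 15% effective, thread exits
print(is_idle(t1=30, t2=40, t3=30))  # False: 70% effective, thread stays
```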
5.3 how to determine that the worker thread in the thread pool is "too busy":
A dedicated control thread judges the pressure on the worker threads in the pool and decides whether to create new worker threads to improve performance.
The control thread checks the pool's pressure at a fixed interval. The implementation simply tracks the real-time running state of the pool's threads with two counters: the total number of threads, _threadsRunning, and the number of threads currently executing tasks, _threadsInUse. If _threadsInUse == _threadsRunning, every worker thread is busy with a task, pressure on the pool is high, and the control thread starts adding threads to the pool.
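The control thread's decision can be sketched like this. The function name and the 128-thread cap are illustrative assumptions, not mongodb's actual code:

```python
def control_tick(threads_running, threads_in_use, max_threads=128):
    # If every worker is busy (_threadsInUse == _threadsRunning), grow the pool.
    if threads_in_use == threads_running and threads_running < max_threads:
        return threads_running + 1
    return threads_running

print(control_tick(threads_running=8, threads_in_use=8))  # 9: all busy, add one
print(control_tick(threads_running=8, threads_in_use=5))  # 8: spare capacity
```

Growth is driven by the control thread, while shrinkage is driven by each worker's own idle check from section 5.2.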
More details of the model's source-level implementation can be found in our previously published article: MongoDB network transmission processing source code implementation and performance tuning, experiencing extreme kernel performance design
5.4 defects in the network threading model:
Threads contend on a lock when fetching tasks from the single global queue, and this lock contention becomes a system bottleneck.
5.5 typical case:
Mongodb dynamic adaptive threading model is suitable for scenarios where request processing is time-consuming, such as database services.
For the detailed source-level optimization analysis of the model, see: Mongodb network transmission processing source code implementation and performance tuning, experiencing extreme kernel performance design
6. Thread model 6: single listener + dynamic worker threads (multi-queue), the mongodb network thread model optimization practice
The threading model is shown below:
Description:
Split the single global queue into multiple queues. When tasks are enqueued, they are hashed to their respective queues; when a worker thread fetches a task, it likewise uses the hash to fetch from its corresponding queue. This reduces lock contention and improves overall performance.
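The multi-queue optimization can be sketched as follows. Each worker pulls only from its own queue, so threads no longer contend on one global lock; the queue count and hash rule are illustrative.

```python
import queue
import threading

NUM_QUEUES = 4
queues = [queue.Queue() for _ in range(NUM_QUEUES)]   # one queue per worker
counts = [0] * NUM_QUEUES
lock = threading.Lock()

def worker(i):
    # This worker only ever touches queues[i], avoiding global contention
    while True:
        task = queues[i].get()
        if task is None:        # shutdown signal
            break
        with lock:
            counts[i] += 1      # stand-in for executing the task

workers = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_QUEUES)]
for w in workers:
    w.start()

# Producer side: hash each task to a queue instead of one shared queue
for task_id in range(100):
    queues[hash(task_id) % NUM_QUEUES].put(task_id)

for q in queues:
    q.put(None)
for w in workers:
    w.join()

print(sum(counts))  # all 100 tasks executed across the partitioned queues
```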
6.1 typical case:
OPPO's in-house mongodb kernel adopts this multi-queue adaptive thread model optimization, which yields a good performance improvement and suits scenarios where request processing is time-consuming, such as database services.
For the detailed source-level analysis of this optimization, again see: Mongodb network transmission processing source code implementation and performance tuning, experiencing extreme kernel performance design
6.2 Question: why don't mysql, mongodb, and other databases take advantage of the kernel's reuseport feature, i.e., multiple threads handling accept requests at the same time?
Answer: virtually all services can take advantage of this feature, including database services (mongodb, mysql, etc.). However, database access latency is generally at the millisecond level, and reuseport would save only tens of microseconds. Compared with the ms-level latency of the database's internal processing, a gain of tens of microseconds is negligible. This is why most database services do not support the feature.
Caches, proxies, and other middleware, by contrast, have small internal processing times, also at the microsecond level, so they should make full use of this feature.
This concludes the discussion of common high-concurrency network thread model designs and the MongoDB thread model optimization practice. I hope the content above is of some help; if you still have questions, follow the industry information channel for more related knowledge.