Example Analysis of the Redis Cache IO Model

This article takes a detailed look at the Redis cache IO model, walking through how Redis handles network IO and how that design has evolved, in the hope of giving readers a simple and practical mental model.

Preface

As one of the most widely used NoSQL databases, Redis has gone through many upgrades, large and small. Before version 4.0, a single thread combined with IO multiplexing was enough to push Redis performance to a very high level. Its author has explained that Redis was designed to be single-threaded because its bottleneck is not the CPU, and a single thread avoids the locking overhead that comes with multithreading.

Over time, however, a single thread became insufficient for some scenarios. For example, Redis 4.0 added asynchronous threads to keep large-key deletion from blocking the main thread.

And because a single thread cannot exploit multi-core CPUs to reach higher concurrency, Redis 6.0 introduced a multi-threaded mode. So calling Redis "single-threaded" is becoming less and less accurate.

Event model

Redis itself is event-driven: it works by listening for file events and time events. A file event is an abstraction of a socket; each socket operation is modeled as a file event, and Redis builds its own network event handler on top of the Reactor pattern. So what is the Reactor pattern?

Communication

Think about this question: how does a server receive our data? First, the two parties establish a TCP connection; once it is established, they can send and receive data. The sender writes data into its socket's buffer, and the system takes the data from that buffer and pushes it out through the network card. On the receiving side, the network card copies incoming data into the socket's buffer, where it waits for the application to pick it up. That is the general flow of sending and receiving data.
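To make that flow concrete, here is a minimal sketch in Go (the address and payload are hypothetical): the client writes into its socket's buffer, the kernel moves the bytes across, and the server's Read picks them up from its own socket's buffer.

package main

import (
    "fmt"
    "net"
)

func main() {
    done := make(chan struct{})

    // Server side: accept one connection and read from the socket buffer.
    ln, err := net.Listen("tcp", "127.0.0.1:9000") // hypothetical address
    if err != nil {
        panic(err)
    }
    go func() {
        defer close(done)
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        defer conn.Close()
        buf := make([]byte, 1024)
        // Read blocks until the kernel has copied data into this socket's
        // buffer and then up into our slice.
        n, _ := conn.Read(buf)
        fmt.Printf("server received: %q\n", buf[:n])
    }()

    // Client side: connect, then write; the bytes go into the client
    // socket's buffer and are pushed out through the network card.
    conn, err := net.Dial("tcp", "127.0.0.1:9000")
    if err != nil {
        panic(err)
    }
    conn.Write([]byte("set key value"))
    conn.Close()

    <-done
}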

The overhead of copying data

Because system calls are involved, you can see that a piece of data is first copied from user space into the kernel's socket buffer, and then from the kernel's socket buffer back into a user-space process on the other side. That is the cost of copying data.

How does the kernel know which socket to deliver the data to?

With so many sockets maintained by the kernel, how does data arriving from the network card get delivered to the right one?

The answer is the port: a socket is identified by a four-tuple:

ip (client) + port (client) + ip (server) + port (server)

Be careful not to claim that a machine's theoretical maximum concurrency is 65535. Besides the port, the tuple also contains the IP, so the theoretical limit is closer to the number of client ports × the number of client IPs.

This is also why one computer can run more than one networked program at the same time.
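As a quick illustration (a sketch with a hypothetical loopback address), two connections from the same client to the same server share three parts of the tuple and differ only in the client port, which is how the kernel tells them apart:

package main

import (
    "fmt"
    "net"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:9000") // hypothetical address
    if err != nil {
        panic(err)
    }
    go func() {
        for {
            if _, err := ln.Accept(); err != nil {
                return
            }
        }
    }()

    // Same client IP, same server IP and port - only the client port differs.
    c1, _ := net.Dial("tcp", "127.0.0.1:9000")
    c2, _ := net.Dial("tcp", "127.0.0.1:9000")
    fmt.Println("conn1:", c1.LocalAddr(), "->", c1.RemoteAddr())
    fmt.Println("conn2:", c2.LocalAddr(), "->", c2.RemoteAddr())
}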

How is the program told to fetch the socket's data?

Once data has been copied from the network card into the corresponding socket buffer, how is the program notified to come and get it? And what is the program doing while the data has not yet arrived? This comes down to how the CPU schedules processes. From the CPU's point of view, a process can be in the running, ready, or blocked state.

Ready: the process is waiting to execute; its resources are in place and it only needs the CPU to schedule it.

Running: the process currently being executed by the CPU.

Blocked: the process is waiting for some event to complete; it does not occupy the CPU.

When there are multiple runnable processes, the CPU's time-slicing lets each one execute for a short period, so they appear to run at the same time; this is what we call concurrency. When we create a socket connection, the code looks something like this:

sockfd = socket(AF_INET, SOCK_STREAM, 0)
connect(sockfd, ...)
recv(sockfd, ...)
doSomething()

The operating system creates an fd (file descriptor) handle for each socket. The fd points to the socket object we created, which contains the buffer and a wait queue for processes. If a process calls recv on a socket whose data has not arrived, it gets stuck at the recv and is hung on the socket object's wait queue. From the CPU's perspective the process is blocked: it does not actually occupy the CPU, it is just waiting for data.

When the data arrives, the network card notifies the CPU, the CPU runs an interrupt handler that copies the data from the network card into the corresponding socket's buffer, then wakes the process on the wait queue and puts it back into the run queue. When the CPU next runs the process, it can finally perform the read. There are two problems with this model:

recv works on a single fd. What if I want to receive from multiple fds?

Looping over them in a while loop is inefficient.

Besides reading the data, the process also has to run the logic that follows. While data has not arrived, the process sits blocked; even if a while loop is used to watch multiple fds, one blocking recv stalls the other sockets and therefore blocks the whole process.

To solve these problems, the Reactor pattern and IO multiplexing were introduced.

Reactor

Reactor is a high-performance IO handling pattern. In the Reactor pattern, the main program is only responsible for listening for events on file descriptors, and this is the crucial point: the main program does not read from or write to the file descriptors itself. So who handles readability and writability? The answer is the worker programs. When a socket has a readable or writable event, the main program notifies a worker, and it is the worker that actually reads and writes the data. The advantage of this model is that the main program can sustain concurrency without blocking and stays very lightweight, while events are queued up for the workers to execute. With the Reactor pattern, we only need to register the handler (callback function) for each event with the Reactor, for example:

type Reactor interface {
    RegisterHandler(event string, callback func())
}

// Usage: reactor.RegisterHandler("writeEvent", WriteCallback)
//        reactor.RegisterHandler("readEvent", ReadCallback)

When a client sends a command such as set key value to Redis, the request is written into the socket's buffer. When the Reactor observes that the socket's buffer contains data, the socket is readable, so the Reactor fires a read event, and the pre-registered ReadCallback parses and executes the command. When the socket's buffer has enough free space for writing, the Reactor fires a writable event, and the pre-registered WriteCallback runs. Once the set key value completes, the worker writes OK into the socket buffer, and the client finally takes that OK out of the buffer. In Redis, both ReadCallback and WriteCallback are executed by a single thread, so if they arrive at the same time they have to queue; this is the default mode in Redis 6.0 and the widely known "single-threaded" Redis.

Looking at the whole flow, the Reactor main program is very fast because it never performs the real reads and writes; everything else is the workers' job: the IO reads and writes, command parsing, command execution, returning the results... and this is the important point.
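Here is a minimal, hypothetical Go sketch of that division of labor (the event names and the toyReactor type are illustrative, not Redis source): the main loop only dispatches events, while the registered callbacks do the actual work.

package main

import "fmt"

// toyReactor is an illustrative implementation of the Reactor interface
// above: the main loop only watches for events and dispatches callbacks.
type toyReactor struct {
    handlers map[string]func()
    events   chan string
}

func newToyReactor() *toyReactor {
    return &toyReactor{
        handlers: make(map[string]func()),
        events:   make(chan string, 16),
    }
}

func (r *toyReactor) RegisterHandler(event string, callback func()) {
    r.handlers[event] = callback
}

// Run is the "main program": it never reads or writes sockets itself,
// it only hands events to the workers (here, the callbacks).
func (r *toyReactor) Run(n int) {
    for i := 0; i < n; i++ {
        if h, ok := r.handlers[<-r.events]; ok {
            h()
        }
    }
}

func main() {
    r := newToyReactor()
    r.RegisterHandler("readEvent", func() { fmt.Println("parse and execute command") })
    r.RegisterHandler("writeEvent", func() { fmt.Println("write OK back to socket buffer") })

    // Simulate a socket becoming readable, then writable.
    r.events <- "readEvent"
    r.events <- "writeEvent"
    r.Run(2)
}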

IO multiplexer

So far, Reactor is an abstract pattern; how do we implement it? How do we watch for socket events to arrive? The simplest way is to poll: since we don't know when a socket event will arrive, we keep asking the kernel. If there are 10,000 socket connections, each round requires asking the kernel 10,000 times, which is obviously very expensive.

Switching from user mode to kernel mode involves a context switch: the CPU has to save register state before entering the kernel and restore it after the kernel returns, which is significant overhead.

Because plain polling is too expensive, IO multiplexers appeared: select, poll, evport, kqueue and epoll. In the source of its IO multiplexing layer, Redis uses #include macros to pick, at compile time, the highest-performance IO multiplexing library available on the system as the underlying implementation:

// Include the best multiplexing layer supported by this system.
// The following should be ordered by performances, descending.
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

Here we focus on two classic multiplexers: select and epoll. select is the first generation of IO multiplexer; how does it avoid the constant polling back and forth between user mode and kernel mode?

Select

Since asking once per fd is wasteful, select hands the kernel a whole batch of socket fds at once; the kernel traverses the fds itself and determines the readable/writable state of each one. Once some fd's state is satisfied, it is up to the user program to fetch the data.

fds := []int{fd1, fd2, ...}
for {
    select(fds)
    for i := 0; i < len(fds); i++ {
        if isReady(fds[i]) {
            read(fds[i])
        }
    }
}

The disadvantages of select: when a process listens on multiple sockets, the kernel adds the process to the wait queue of every socket (many-to-one). When one of the sockets receives data, the CPU handles the interrupt, wakes the process out of the blocked state, removes it from all of the socket wait queues, and schedules it to run. But because the process handed over a whole batch of fds, it has no idea which fd actually has data, so it must traverse the entire batch, wasting effort on the fds where nothing arrived. And since select traverses the whole socket collection on every call, large collections drag down efficiency; on top of that, select uses a fixed-size fd set (FD_SETSIZE, typically 1024), which is why it supports at most 1024 fds.
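For reference, here is a hedged sketch of a real select call on Linux via the golang.org/x/sys/unix wrappers, watching stdin for readability; the fd-set bit layout assumed here is linux/amd64, and the hand-rolled fdSet/fdIsSet helpers are illustrative:

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

// Manual fd-set bit twiddling; assumes the linux/amd64 FdSet layout.
func fdSet(fd int, s *unix.FdSet)   { s.Bits[fd/64] |= 1 << (uint(fd) % 64) }
func fdIsSet(fd int, s *unix.FdSet) bool { return s.Bits[fd/64]&(1<<(uint(fd)%64)) != 0 }

func main() {
    fd := 0 // stdin
    var readSet unix.FdSet
    fdSet(fd, &readSet)

    // Hand the kernel the whole fd set; it traverses the fds for us and
    // returns when at least one is ready (nil timeout = block forever).
    n, err := unix.Select(fd+1, &readSet, nil, nil, nil)
    if err != nil {
        panic(err)
    }
    // After select returns, we still have to scan the set ourselves to
    // find out which fds are ready - the traversal cost described above.
    if n > 0 && fdIsSet(fd, &readSet) {
        fmt.Println("stdin is readable")
    }
}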

Epoll

If we could avoid traversing all the sockets, and instead have only the relevant socket fd surface when its data arrives, that would be far more efficient than blind polling. epoll appeared to solve exactly this problem:

epfd := epoll_create()
epoll_ctl(epfd, fd1, fd2, ...)
for {
    fds := epoll_wait(epfd)
    for _, fd := range fds {
        doSomething(fd)
    }
}

First, create an epoll object with epoll_create. It returns an fd handle that, like a socket's handle, is managed in the process's fd table.

Then use epoll_ctl to bind the socket fds you want to listen on to the epoll object.

Finally, call epoll_wait to get the socket fds that have data. If no socket has data, the call blocks; when data arrives, it returns the set of ready fds.

How does epoll do it?

First, kernel sockets are no longer bound to the user's process but to the epoll object. When a socket's data arrives, the interrupt handler appends that socket's fd to epoll's ready list, which holds exactly the sockets that have data, and then wakes the process associated with the epoll. When the CPU runs the process, it can take the ready sockets straight from epoll's ready list and read them. Throughout the flow, neither the user program nor the kernel has to traverse everything blindly; interrupts deliver the efficient behavior of "process whoever has data".
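As a concrete Linux-only illustration, here is a minimal sketch of an epoll-driven echo server in Go using golang.org/x/sys/unix; the address and buffer sizes are arbitrary, and most error handling is trimmed for brevity:

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // Listening socket, bound to a hypothetical loopback address.
    lfd, _ := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, 0)
    addr := &unix.SockaddrInet4{Port: 9000, Addr: [4]byte{127, 0, 0, 1}}
    unix.Bind(lfd, addr)
    unix.Listen(lfd, 128)

    // epoll_create: one epoll object for all our sockets.
    epfd, _ := unix.EpollCreate1(0)

    // epoll_ctl: register the listening socket for readable events.
    ev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(lfd)}
    unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, lfd, &ev)

    events := make([]unix.EpollEvent, 64)
    buf := make([]byte, 4096)
    for {
        // epoll_wait: block until some registered socket has data.
        n, err := unix.EpollWait(epfd, events, -1)
        if err != nil {
            continue // e.g. EINTR
        }
        for i := 0; i < n; i++ {
            fd := int(events[i].Fd)
            if fd == lfd {
                // New connection: accept it and register it with epoll too.
                cfd, _, err := unix.Accept(lfd)
                if err != nil {
                    continue
                }
                cev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(cfd)}
                unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, cfd, &cev)
                continue
            }
            // Ready client socket: read and echo back.
            m, err := unix.Read(fd, buf)
            if err != nil || m == 0 {
                unix.EpollCtl(epfd, unix.EPOLL_CTL_DEL, fd, nil)
                unix.Close(fd)
                continue
            }
            unix.Write(fd, buf[:m])
            fmt.Printf("echoed %d bytes on fd %d\n", m, fd)
        }
    }
}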

Evolution from single-thread to multi-thread

Combining the Reactor idea with the high-performance epoll IO model, Redis built its high-performance network IO architecture: a single-threaded IO multiplexer that receives network IO events and queues them for processing. This is the original single-threaded model. Why a single thread? Because single-threaded Redis can already sustain around 100,000 QPS (less if you run complex set operations), which satisfies most applications, and a single thread never has to worry about the locking problems of multithreading. If that is not enough, you can also configure sharding so that different nodes handle different key shards; the load capacity of your Redis deployment then grows roughly linearly with the number of nodes.

Asynchronous thread

Single-threaded mode has a problem: deleting a large collection or hash is time-consuming (its memory is not contiguous), and with a single thread that means all the other queued commands have to wait. As more and more commands queue up, bad things happen. So Redis 4.0 added an asynchronous thread for deleting large keys: use unlink instead of del, and on unlink Redis checks whether the deleted key should be handled by an asynchronous thread (for example, when the collection has more than 64 elements); if the value is large enough, it is freed on an asynchronous thread without touching the main thread. Similarly, flushall and flushdb both support an asynchronous mode (a short command sketch follows the option descriptions below). In addition, Redis has options that control whether certain scenarios should be handled by asynchronous threads (all off by default):

lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no

lazyfree-lazy-eviction: when Redis evicts keys under a maxmemory policy, the deletion is done asynchronously. The drawback in this scenario is that if the asynchronous deletion lags, memory is not released in time.

lazyfree-lazy-expire: keys with a TTL are not deleted synchronously when Redis cleans them up; an asynchronous thread deletes them instead.

replica-lazy-flush: when a replica (slave) node joins, it runs a flush to clear its own data. The longer the flush takes, the more data piles up in the replication buffer and the slower the subsequent synchronization. With replica-lazy-flush enabled, the replica's flush is processed asynchronously, which speeds up synchronization.

lazyfree-lazy-server-del: this option covers commands that delete values implicitly, such as RENAME key newkey. If newkey already exists, the rename deletes newkey's old value; if that old value is very large, the deletion blocks. With this option enabled, the deletion is likewise handled by an asynchronous thread, so the main thread is not blocked.
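For reference, the non-blocking deletion commands mentioned above look like this (the key name is hypothetical):

UNLINK bigkey    # like DEL, but a large value is freed on an async thread
FLUSHALL ASYNC   # clear all databases in the background
FLUSHDB ASYNC    # clear the current database in the background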

Multithreading

Redis's single thread + asynchronous threads + sharding already covers the vast majority of applications, but there is no "best", only "better": Redis 6.0 introduced a multi-threaded mode. By default, multithreaded mode is turned off.

# io-threads 4           # number of IO worker threads
# io-threads-do-reads no # whether the IO threads should also handle reads

As we saw above, reading from a socket copies data from the kernel into user space, and writing to a socket copies it from user space into the kernel. Redis's own computation is very fast; the slow part is mostly the socket IO. At very high QPS, single-threaded Redis cannot exploit a multi-core CPU, so spreading the IO work across multiple threads on multiple cores is a good trade.

If enabled, the official recommendation in redis.conf reads: "So for instance if you have a four cores boxes, try to use 2 or 3 I/O threads, if you have a 8 cores, try to use 6 threads." In other words, open 2-3 IO threads on a 4-core machine and 6 on an 8-core one.

The principle of multithreading

Note that Redis's multithreading applies only to the socket IO reads and writes; the actual execution of commands is still done by a single thread.

The Redis server listens for client requests through its event loop. When a request arrives, the main thread does not parse and execute it immediately; instead it puts the client into the global read queue clients_pending_read and marks it with the CLIENT_PENDING_READ flag.

The main thread then distributes all of these tasks across the IO threads and the main thread itself using a round-robin (RR) policy.

Each thread (the main thread included) then handles its assigned clients: guided by the CLIENT_PENDING_READ flag, it only reads and parses the request parameters; the command is not executed at this stage.

The main thread busy-polls, waiting for all IO threads to finish. Each IO thread has a local task queue (io_threads_list) and a per-thread atomic counter (io_threads_pending); the tasks are isolated between threads and never overlap. When an IO thread finishes its tasks, its io_threads_pending counter drops to 0; when every counter is 0, the batch is done.

Once all the reads have completed, the main thread traverses the clients_pending_read queue and performs the real command execution.

After reading, parsing, and executing a command, the result has to be sent back to the client, so the main thread adds each client that needs a response to the global clients_pending_write queue.

The main thread then traverses clients_pending_write and, again round-robin, assigns the write-back work to the IO threads and itself, so the responses get written out to the clients.

In multithreaded mode, each IO thread processes only its own queue, with no interference between threads. At any given moment, the IO threads are all reading or all writing, never both. The main thread does not hand out a new batch until the child threads have finished the current one, and the final command execution is done by the main thread alone, so the whole process needs no locks.
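To make the fan-out/fan-in rhythm concrete, here is a small hypothetical Go sketch of the pattern (not Redis source): the main goroutine distributes requests round-robin into per-thread queues, busy-polls until every atomic pending counter returns to zero, and only then executes the commands itself.

package main

import (
    "fmt"
    "sync/atomic"
)

const numIOThreads = 4

var (
    ioThreadsList    [numIOThreads]chan string  // per-thread local task queues
    ioThreadsPending [numIOThreads]atomic.Int64 // per-thread atomic counters
)

func main() {
    // IO "threads": each drains only its own queue (no shared state, no locks).
    for i := 0; i < numIOThreads; i++ {
        ioThreadsList[i] = make(chan string, 16)
        go func(id int) {
            for req := range ioThreadsList[id] {
                _ = req // read + parse would happen here; commands are NOT executed
                ioThreadsPending[id].Add(-1)
            }
        }(i)
    }

    // Main thread: distribute the pending clients round-robin.
    clientsPendingRead := []string{"set k1 v1", "get k1", "del k1", "ping", "incr c"}
    for i, req := range clientsPendingRead {
        id := i % numIOThreads
        ioThreadsPending[id].Add(1)
        ioThreadsList[id] <- req
    }

    // Busy-poll until every io_threads_pending counter is back to zero.
    for {
        sum := int64(0)
        for i := 0; i < numIOThreads; i++ {
            sum += ioThreadsPending[i].Load()
        }
        if sum == 0 {
            break
        }
    }

    // Only now does the main thread execute the commands, single-threaded.
    for _, req := range clientsPendingRead {
        fmt.Println("exec:", req)
    }
}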

This concludes the example analysis of the Redis cache IO model. I hope the content above is helpful to you.
