2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains why taking a snapshot of Redis does not block other requests. The mechanism is often misunderstood, so the key points are summarized below.
Why's THE Design is a series of articles about design decisions in computing. Each article raises one specific question and discusses the advantages and disadvantages of the design, and its impact on the implementation, from several perspectives. If there is a question you would like covered, you can leave a message at the bottom of the article.
Although we often think of Redis as a purely in-memory key-value store, we also rely on its persistence features. RDB and AOF are the two persistence mechanisms that Redis provides, and RDB is a snapshot of the Redis dataset. In this article we analyze why Redis uses a child process to persist a snapshot, rather than writing its in-memory data structures to disk directly from the main process.
Overview
Before analyzing the question in detail, we first need to understand what RDB, Redis's persistent storage mechanism, is. RDB snapshots the current dataset of the Redis service at regular intervals. Besides the snapshot interval that can be set in the Redis configuration file, Redis also provides two commands that generate an RDB file: SAVE and BGSAVE. The names alone hint at the difference between them.
The SAVE command blocks the current thread while it runs. Because Redis executes commands on a single thread, SAVE blocks every other client request until it finishes, which is unacceptable for a Redis service that has to stay highly available.
In practice we therefore usually use the BGSAVE command, which generates the RDB file for the whole dataset in the background. When BGSAVE is called, Redis immediately forks a child process. The child does the work of saving the in-memory data to disk in RDB format, while the Redis server keeps handling client requests.
rdbSaveBackground is the function that saves the data to disk in the background:
    int rdbSaveBackground(char *filename, rdbSaveInfo *rsi) {
        pid_t childpid;

        /* Refuse to start if another child process is already running. */
        if (hasActiveChildProcess()) return C_ERR;
        ...

        if ((childpid = redisFork()) == 0) {
            int retval;

            /* Child */
            redisSetProcTitle("redis-rdb-bgsave");
            retval = rdbSave(filename, rsi);
            if (retval == C_OK) {
                sendChildCOWInfo(CHILD_INFO_TYPE_RDB, "RDB");
            }
            /* The exit status tells the parent whether the save succeeded. */
            exitFromChild((retval == C_OK) ? 0 : 1);
        } else {
            /* Parent */
            ...
        }
        ...
    }
When BGSAVE is triggered, the Redis server calls redisFork to create the child process, and the child calls rdbSave to persist the data. Some parts of the function are omitted here, but the overall structure is still clear. Interested readers can read the full implementation in the Redis source code.
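The same structure can be reduced to a small, self-contained sketch. This is not Redis code: the `dataset` array, the function names and the snapshot path are hypothetical stand-ins chosen for illustration, and plain `fork` is used in place of `redisFork`. The child dumps its view of memory to a file while the parent keeps writing, and the parent then reaps the child and checks its exit status.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DATASET_SIZE 4

/* Hypothetical in-memory "dataset" standing in for the Redis keyspace. */
static int dataset[DATASET_SIZE] = {1, 2, 3, 4};

/* Child side: dump the dataset to `path`. Returns 0 on success. */
static int save_snapshot(const char *path) {
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(dataset, sizeof(dataset[0]), DATASET_SIZE, fp);
    if (fclose(fp) != 0) return -1;
    return written == DATASET_SIZE ? 0 : -1;
}

/* Fork a child that saves the snapshot. Returns the child's pid, or -1. */
static pid_t snapshot_in_background(const char *path) {
    assert(path != NULL);
    pid_t childpid = fork();
    if (childpid == 0) {
        /* Child: persist its own view of memory, then exit immediately. */
        _exit(save_snapshot(path) == 0 ? 0 : 1);
    }
    return childpid; /* Parent: child's pid, or -1 if fork failed. */
}

/* Run one background save while the parent keeps writing.
 * Returns the first value stored in the snapshot file, or -1 on error. */
static int demo_background_save(const char *path) {
    pid_t childpid = snapshot_in_background(path);
    if (childpid < 0) return -1;

    dataset[0] = 100; /* The parent keeps handling "writes" after the fork. */

    int status = 0;
    if (waitpid(childpid, &status, 0) != childpid) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;

    int first = -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    if (fread(&first, sizeof(first), 1, fp) != 1) first = -1;
    fclose(fp);
    return first;
}
```

Called once from `main` with a writable path, `demo_background_save` should return the value `dataset[0]` held at the moment of the fork, even though the parent overwrites it right afterwards. The reason is the subject of the rest of this article.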
The ultimate goal of using fork is to improve the availability of the Redis service by not blocking the main process, but this raises two questions:
Why can the child process after fork get the data in the memory of the parent process?
Does fork incur additional performance overhead, and how can we avoid that overhead?
Since Redis chose fork to solve the problem of snapshot persistence, both questions must have satisfactory answers. First, the child process created by fork can read the data in the parent's memory. Second, the extra overhead caused by fork is acceptable compared with blocking the main thread. Redis could only settle on this solution because both points hold.
Design
To answer the two questions raised in the previous section, we need the following two facts. They are the prerequisites for the Redis server to use fork and the main reasons it chose this implementation:
The parent and child processes created by fork start out sharing resources, including the physical memory behind their address spaces
fork does not incur significant performance overhead. In particular, it does not copy large amounts of memory up front: copy-on-write postpones copying memory until the copy is really needed.
Child process
In computer programming, especially on Unix and Unix-like systems, fork is an operation by which a process creates a copy of itself. It is usually a system call implemented by the operating system kernel, and it is the main way new processes are created on *nix systems.
After a program calls fork, the return value tells us whether we are in the parent or the child, so each side can do different work:
When fork returns 0, the current process is the child process
When fork returns a positive value, the current process is the parent process and the return value is the pid of the child process. A return value of -1 means that fork failed and no child was created.
    #include <unistd.h>

    int main(void) {
        if (fork() == 0) {
            // child process
        } else {
            // parent process (also reached if fork failed and returned -1)
        }
        return 0;
    }
The fork manual states that after the call the parent and child processes run in separate memory spaces. At the moment of the fork both memory spaces have exactly the same content, but later memory writes and file mappings are independent, so the two processes do not affect each other:
The child process and the parent process run in separate memory spaces. At the time of fork() both memory spaces have the same content. Memory writes, file mappings (mmap(2)), and unmappings (munmap(2)) performed by one of the processes do not affect the other.
In addition, the child process is almost an exact duplicate of the parent process, but the two differ slightly in the following respects:
The child process has its own unique process ID
The parent process ID of the child process is the same as the process ID of the parent process
The child process does not inherit the memory locks of the parent process
The resource utilization and CPU time counters of the child process are reset to zero
...
The most important point is that at the moment of the fork the memory of the parent and child processes is exactly the same, and writes made after the fork do not affect the other process. This is exactly what a snapshot needs: only the data in memory at one point in time is required, and the parent process can keep modifying its own memory without being blocked and without affecting the snapshot being generated.
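The isolation described above can be checked in both directions with a short sketch. The names `value` and `check_isolation` are hypothetical, and the pipe is only there to make sure the child looks at its memory after the parent has already written to its own copy.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical piece of state that exists in both processes after fork. */
static int value = 1;

/* Returns 0 if writes made after fork stay private to the writing process. */
static int check_isolation(void) {
    int go[2];
    if (pipe(go) != 0) return -1;

    int before = value;
    pid_t pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait until the parent reports that it has written. */
        char byte;
        close(go[1]);
        if (read(go[0], &byte, 1) != 1) _exit(2);
        int seen = value;   /* Still the value from the time of the fork. */
        value = before - 1; /* This write is private to the child. */
        _exit(seen == before ? 0 : 1);
    }

    /* Parent: modify its own copy, then let the child proceed. */
    close(go[0]);
    value = before + 1;
    ssize_t sent = write(go[1], "x", 1);
    close(go[1]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (sent != 1) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;

    /* The child's write must not be visible here either. */
    assert(value == before + 1);
    return 0;
}
```

If `check_isolation` returns 0, the child saw the pre-fork value even though the parent had already changed it, and the parent never saw the child's write.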
Copy-on-write
Since the parent and child processes have exactly the same memory contents and neither one's writes affect the other, does that mean the child has to make a full copy of the parent's memory at fork time? If it did, the result would be close to catastrophic for a Redis service, especially in the following two scenarios:
A large amount of data is stored in memory, so copying it during fork consumes a lot of time and resources and makes the service unavailable for a while.
Redis occupies 10 GB of memory while the physical or virtual machine has only 16 GB. A full copy would need 20 GB, so the data could not be persisted at all. In other words, Redis could never use more than 50% of the machine's memory.
If these two problems could not be solved, using fork to produce a memory snapshot would not be practical, and it could not be used in a real project.
Even outside the Redis scenario, copying all of memory at fork time is hard to justify. Suppose we want to run a command from the command line: we create a new process with fork and then run the program with exec. The large amount of memory copied by fork may be of no use to the child process at all, yet it introduces a huge amount of extra overhead.
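The fork-then-exec pattern mentioned above looks roughly like the following sketch. The helper name `run_command` is hypothetical. The point is that the child replaces its whole memory image with `execvp` almost immediately, so anything copied for it at fork time would have been wasted work.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] with arguments argv and wait for it.
 * Returns the program's exit status, or -1 if it could not be run or
 * did not exit normally. */
static int run_command(char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execvp(argv[0], argv);
        _exit(127); /* Only reached if execvp itself failed. */
    }

    /* Parent: wait for the child and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector `{"/bin/sh", "-c", "exit 3", NULL}` should make `run_command` return 3 on a system where `/bin/sh` exists.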
Copy-on-write exists to solve this problem. As mentioned at the beginning of this section, its main job is to postpone the copy until a write actually happens, which avoids a large number of pointless copies. On some early *nix systems the fork system call did copy the parent's memory immediately, but on most systems today fork does not trigger that copy right away:
When fork is called, the kernel gives the parent and child processes separate virtual address spaces, so the two appear to access different memory
The kernel initially maps both virtual address spaces to the same physical pages, so the parent and child actually share physical memory
When the parent or child writes to a shared page, that page is copied, one page at a time. The process that performed the write continues with its own private copy of the page, and the other process keeps seeing the original contents.
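One way to observe the steps above is to count page faults in a forked child. The sketch below is an experiment rather than a benchmark, and its function names are hypothetical: the child touches one byte per page of a buffer inherited from the parent, either reading or writing, and reports how many minor page faults `getrusage` recorded during the walk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that touches one byte per page of buf and reports, through
 * a pipe, how many minor page faults the walk caused. Returns -1 on error. */
static long child_minor_faults(unsigned char *buf, size_t len, int do_write) {
    assert(buf != NULL && len > 0);
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        volatile unsigned char *p = buf; /* volatile keeps every access */
        unsigned char sink = 0;
        struct rusage before, after;

        getrusage(RUSAGE_SELF, &before);
        for (size_t off = 0; off < len; off += page) {
            if (do_write) p[off] = 1; /* write: may trigger a page copy */
            else sink ^= p[off];      /* read: the shared page is enough */
        }
        getrusage(RUSAGE_SELF, &after);
        (void)sink;

        long faults = after.ru_minflt - before.ru_minflt;
        ssize_t n = write(fds[1], &faults, sizeof(faults));
        _exit(n == (ssize_t)sizeof(faults) ? 0 : 1);
    }

    close(fds[1]);
    long faults = -1;
    if (read(fds[0], &faults, sizeof(faults)) != (ssize_t)sizeof(faults))
        faults = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return faults;
}

/* Measure a read-only walk and a writing walk over the same 16 MiB buffer.
 * Returns 0 on success and stores the two fault counts. */
static int compare_read_and_write_faults(long *read_faults, long *write_faults) {
    size_t len = 16u * 1024u * 1024u;
    unsigned char *buf = malloc(len);
    if (buf == NULL) return -1;
    memset(buf, 0xAB, len); /* Make the parent actually own every page. */

    *read_faults = child_minor_faults(buf, len, 0);
    *write_faults = child_minor_faults(buf, len, 1);
    free(buf);
    return (*read_faults >= 0 && *write_faults >= 0) ? 0 : -1;
}
```

On a typical Linux machine we would expect the writing walk to report far more faults than the read-only walk, since each first write to a shared page has to be resolved by the kernel. The exact numbers depend on the kernel, the page size and features such as transparent huge pages, so they should be read as an illustration and not as a guarantee.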
In a Redis server, the snapshot child only reads the data in shared memory and does not write to it, so page copies are triggered only when the parent process handles writes. For most Redis services and databases, write requests are far less frequent than read requests, so fork combined with copy-on-write performs very well and keeps the implementation of BGSAVE simple.
Summary
Redis implements background snapshots in an ingenious way: it gets them almost for free from the fork and copy-on-write features that the operating system already provides, which shows how solid the author's knowledge of operating systems is. Faced with a similar problem, most people would probably think of implementing a copy-on-write mechanism by hand, but that not only increases the workload, it also increases the chance of bugs in the program.
To close, let's briefly summarize why Redis uses a child process when taking an RDB snapshot:
A child process created with fork gets exactly the same memory contents as the parent process. Later modifications made by the parent are invisible to the child, and the two do not affect each other.
Creating a child process with fork does not immediately copy large amounts of memory. Memory is copied one page at a time, and only when it is modified, which avoids the performance problems of a bulk copy.
Of these two reasons, the first lets the child process read the parent's data, and the second keeps the extra overhead low. Both are indispensable, and together they explain why Redis uses a child process for snapshot persistence. Finally, here are some more open-ended questions that interested readers can think about:
The Nginx main process forks a set of child processes at run time, each of which can handle requests on its own. Which other services use this feature?
Copy-on-write is a fairly common mechanism. Where else is it used outside of Redis?