2025-01-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article is a hands-on write-up by colleagues at Big U: a complete case of finding, analyzing, and solving a problem. I hope it is helpful to you.
Background
It started when a colleague posted a problem to the company's internal mailing list: after a business program built with go1.8.3 had been running for a while, some goroutines got stuck waiting for the lock ForkLock. The colleague believed this was a bug in go1.8.3, because the problem had not recurred since upgrading to go1.10. To get to the bottom of it, the colleague opened an issue on GitHub:
https://github.com/golang/go/issues/26836
and made many attempts to reproduce the problem in the meantime, without success.
I skimmed the business code where the problem occurred. Roughly, the parent process uses Command from os/exec to start a child process that runs a shell command. Running the command ultimately calls forkExec, Go's own wrapper around fork and exec, to create the child process and execute the command, and forkExec takes ForkLock.
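For reference, here is a minimal sketch of the calling pattern described above, assuming a Unix-like system with /bin/sh. The helper name runShell is hypothetical and not taken from the business code. Note that exec.Command only builds the command; the child process is created when Output calls Start, and that path is where ForkLock is taken.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runShell runs cmdline through /bin/sh -c and returns its standard output.
// exec.Command only builds the *exec.Cmd; the child process is created when
// Output calls Start, which on Unix goes through os.StartProcess and the
// unexported syscall.forkExec, the function that holds syscall.ForkLock.
func runShell(cmdline string) (string, error) {
	out, err := exec.Command("/bin/sh", "-c", cmdline).Output()
	return string(out), err
}

func main() {
	out, err := runShell("echo hello")
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // prints "hello" followed by a newline
}
```

Every such call forks, so a program that runs many shell commands concurrently contends on ForkLock each time.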
Problem analysis
ForkLock exists to prevent the following situation. When multiple goroutines fork and exec at the same time, each child process should inherit only the file descriptors it needs. To achieve this, the parent process sets the O_CLOEXEC flag when it creates file descriptors, so that they are closed in the child when it calls exec, and the child then sets up only the descriptors it is meant to inherit.
Since Linux 2.6.27, opening a file or pipe and setting O_CLOEXEC can be done atomically, so this is not much of a problem there. However, Go only requires kernel 2.6.23 or later, and on some Unix systems opening a descriptor and setting O_CLOEXEC are two separate operations. If a fork happens between the two, the child process may inherit a descriptor it does not need, so the sequence has to be protected by a lock. Let's look at the source of forkExec:
Judging from the symptoms, some goroutine must have been stuck in forkExecPipe or forkAndExecInChild while holding the lock, so other goroutines could not acquire it for a long time and were effectively starved. forkExecPipe ends in the pipe2 system call, and forkAndExecInChild ends in the clone and exec system calls.
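forkExec takes ForkLock for writing. On systems without an atomic close-on-exec flag, code that creates descriptors takes the same lock for reading, so a fork can never land between "create the descriptor" and "mark it close-on-exec". The sketch below is modeled on that fallback (Go's os.Pipe does essentially this on non-Linux Unix systems): syscall.Pipe followed by syscall.CloseOnExec under syscall.ForkLock.RLock. It is Linux-only because it reads the flag back with a raw fcntl via syscall.SYS_FCNTL, and the helper names pipeCloexec and hasCloexec are hypothetical. On Linux the standard library itself uses pipe2 with O_CLOEXEC instead, so the read lock is not needed there.

```go
package main

import (
	"fmt"
	"syscall"
)

// pipeCloexec creates a pipe and then marks both ends close-on-exec in a
// second, separate step. Holding syscall.ForkLock for reading across both
// steps means a concurrent forkExec (which takes the write lock) cannot fork
// between them and leak the new descriptors into a child.
func pipeCloexec() (r, w int, err error) {
	var p [2]int
	syscall.ForkLock.RLock()
	defer syscall.ForkLock.RUnlock()
	if e := syscall.Pipe(p[:]); e != nil {
		return -1, -1, e
	}
	syscall.CloseOnExec(p[0])
	syscall.CloseOnExec(p[1])
	return p[0], p[1], nil
}

// hasCloexec reports whether FD_CLOEXEC is set on fd (Linux fcntl F_GETFD).
func hasCloexec(fd int) bool {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	return errno == 0 && flags&syscall.FD_CLOEXEC != 0
}

func main() {
	r, w, err := pipeCloexec()
	if err != nil {
		panic(err)
	}
	defer syscall.Close(r)
	defer syscall.Close(w)
	fmt.Println(hasCloexec(r), hasCloexec(w))
}
```

The cost of this design is the one the article is about: every descriptor creation and every fork serialize on one process-wide lock, so a single slow fork delays all of them.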
Hypothesis
pipe2 is a fast system call, so the calls likely to block are clone and exec. Since the problem also did not recur on go1.10, let's compare forkAndExecInChild in go1.8 and go1.9:
[Figure: forkAndExecInChild, Go1.8 vs. Go1.9]
Go1.9 adds the CLONE_VFORK and CLONE_VM flags. A clone with only SIGCHLD can be regarded as essentially a fork (both end up calling the kernel's do_fork). The problem with fork is that the more memory the parent process uses, the slower it becomes. For more information, see:
https://bugzilla.redhat.com/show_bug.cgi?id=682922
This bug was reported in 2011 and was still receiving updates in July of this year. It describes how, even though the Linux kernel uses copy-on-write, fork still has to copy page table entries: the larger the process's virtual memory, the more entries must be copied and the slower fork becomes. Someone in the Go discussion group measured that with a 2 GB heap, fork can take milliseconds, versus tens of microseconds normally, a difference of orders of magnitude.
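A rough way to observe this effect is to time process creation while the program keeps a large, touched heap alive. The sketch below is a hypothetical micro-benchmark, not the colleague's test program: growHeap, avgSpawn and the heap sizes are assumptions, it needs a `true` executable on PATH, and its numbers will depend heavily on the Go version (with go1.9 and later the vfork-style path should keep them roughly flat) and on the kernel.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// ballast keeps a large heap reachable so its pages stay mapped.
var ballast []byte

// growHeap allocates heapMB MiB and writes to every page, so the memory is
// actually backed by page table entries rather than merely reserved.
func growHeap(heapMB int) {
	ballast = make([]byte, heapMB<<20)
	for i := 0; i < len(ballast); i += 4096 {
		ballast[i] = 1
	}
}

// avgSpawn returns the average wall-clock time of n runs of `true`,
// which covers fork/clone, exec and waiting for the child to exit.
func avgSpawn(n int) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := exec.Command("true").Run(); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}

func main() {
	// Raise the sizes (for example to 2048) to approach the 2 GB case above.
	for _, mb := range []int{0, 256} {
		growHeap(mb)
		d, err := avgSpawn(20)
		if err != nil {
			panic(err)
		}
		fmt.Printf("heap ~%d MiB: %v per spawn\n", mb, d)
	}
}
```

Building the same file with an older and a newer Go toolchain is the comparison that matters; a single run on one version says little on its own.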
Go1.9 adds these two flags so that the child process shares memory with the parent, which is equivalent to calling vfork. Page table entries no longer need to be copied, so process creation is faster; in tests it stays steady at a few dozen microseconds.
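The flag difference itself is small. The sketch below only composes the two flag sets for comparison and does not call clone. It assumes Linux, where the syscall package exports SIGCHLD, CLONE_VM and CLONE_VFORK; the function names are hypothetical, and the real forkAndExecInChild additionally ORs in SysProcAttr.Cloneflags.

```go
package main

import (
	"fmt"
	"syscall"
)

// forkStyleFlags is the fork-like request: only the exit signal, SIGCHLD,
// so the kernel copies the parent's address space (page tables included).
func forkStyleFlags() uintptr {
	return uintptr(syscall.SIGCHLD)
}

// vforkStyleFlags adds CLONE_VM (share the address space, so no page-table
// copy) and CLONE_VFORK (suspend the parent until the child execs or exits).
func vforkStyleFlags() uintptr {
	return uintptr(syscall.SIGCHLD) | syscall.CLONE_VM | syscall.CLONE_VFORK
}

func main() {
	fmt.Printf("fork-style:  %#x\n", forkStyleFlags())
	fmt.Printf("vfork-style: %#x\n", vforkStyleFlags())
}
```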
So a reasonable guess is that, for programs built with Go versions earlier than 1.9, once memory usage is large enough and child processes are created frequently enough, goroutines end up waiting a long time for ForkLock.
Experimental verification
I wrote a test program with go1.8.3 and ran it on a virtual machine with 2 cores and 4 GB of memory (kernel 3.10.0-693.17.1.el7.x86_64).
A SIGUSR1 signal was sent to the program from outside every 10 seconds, making it print its goroutine stacks. After the program had run for a while, some goroutines took longer and longer to acquire ForkLock, as the following two screenshots show:
The same behavior did not occur with go1.9 or later, and I think this result is enough to explain the problem: upgrading to go1.9 or later solves it.
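The test program itself is not included in the article. Below is a minimal sketch of the stack-dump part of the experiment, assuming a Unix system: on SIGUSR1 the program writes every goroutine's stack using runtime.Stack, so goroutines blocked waiting for ForkLock become visible, and once a goroutine has been blocked for over a minute the runtime includes the number of minutes in its header line. installStackDumper and dumpAllStacks are hypothetical helpers; in a real run you would start the exec-heavy goroutines in main and send `kill -USR1 <pid>` from outside.

```go
package main

import (
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// dumpAllStacks returns the stacks of all goroutines, growing the buffer
// until runtime.Stack no longer fills it completely.
func dumpAllStacks() []byte {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// installStackDumper writes all goroutine stacks to stderr every time the
// process receives SIGUSR1.
func installStackDumper() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			os.Stderr.Write(dumpAllStacks())
		}
	}()
}

func main() {
	installStackDumper()
	// For a self-contained demo, signal ourselves once; in the experiment the
	// signal was sent from outside every 10 seconds.
	syscall.Kill(os.Getpid(), syscall.SIGUSR1)
	time.Sleep(100 * time.Millisecond) // give the dumper goroutine a moment to write
}
```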
Closing thoughts
vfork exists to avoid the cost of copying page table entries during fork. In most cases fork is immediately followed by exec, and exec discards all the page tables and sets up new ones, so copying them is wasted work. However, because the parent and child share memory under vfork, it must be used with great care: if the child modifies a variable, the parent sees the change, and the kernel suspends the parent and lets the child run first until it calls exec or exits. These restrictions essentially limit vfork to the fork-then-exec scenario, so it is not as general-purpose as fork.
Precisely because vfork must be used so carefully, when vfork support was about to land in go1.9, someone pointed out that the code was not robust enough: after rawVforkSyscall returns, further instructions are executed on the parent's code path, which gives the process a chance to corrupt the stack the two processes share. A commit was therefore proposed so that rawVforkSyscall does nothing more on that path after returning, which removes the interference, as shown in the figure:
If you want to dig deeper, take a look at the review of this commit, in which Rob Pike and others weighed in:
https://go-review.googlesource.com/c/go/+/46173
For more technical articles, please follow "Cloud Computing Mobilization". Let's work together to change the future with cloud computing.