A solution to thread creation failures reported by SequoiaDB

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

1. Problem background

In distributed databases and distributed environments, thread creation failures and similar problems are common under high concurrency and heavy load. These situations test a database administrator's experience: the problem and the bottleneck have to be located quickly and resolved quickly. This article walks through a practical approach to locating the problem, ruling out causes, and removing the bottleneck under high concurrency.

2. Locating the problem

When SequoiaDB returns error code -10 in a cluster environment, a careful look at the node's diaglog shows that the cause is the operating system failing to create a thread.

For example, in our test environment, the diaglog of a SequoiaDB node contained key messages like the following:

Failed to create new agent: boost::thread_resource_error: Resource temporarily unavailable

Failed to create new agent, probe = 30

Failed to create subagent thread, rc = -10

Failed to start session EDU, rc = -10
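A quick way to check whether a node has hit this failure is to search its diaglog for the messages above. The following is a minimal sketch, assuming the log is a plain-text file whose path you pass in; the function name count_thread_errors is illustrative and not part of SequoiaDB.

```shell
# count_thread_errors FILE
# Prints how many lines of FILE contain one of the thread creation
# failure messages quoted above. FILE is whichever diaglog file you
# want to inspect; its location depends on your deployment.
count_thread_errors() {
    grep -c -E 'Failed to create new agent|Failed to create subagent thread|Failed to start session EDU' "$1"
}
```

For example, count_thread_errors /path/to/diaglog/sdbdiag.log, where both the directory and the file name are placeholders for your node's actual log file. A result of 0 suggests the -10 error has a different cause.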

So which limits does the operating system generally run into when creating a thread? There are three: the per-process file handle limit, the system-wide handle limit, and memory resources.

1) Number of file handles

Linux is said to treat everything as a file: processes, threads, sockets, and other resources are all ultimately handled by the operating system as file operations. Each time the operating system or a process requests a resource such as a thread or a socket, it opens a file, and that open file can be understood simply as a file handle. The handle limit is the maximum number of files that the operating system or a single process may have open.

With that concept in place, let's see how the operating system limits the number of file handles. The ulimit command can set many limits, and the number of file handles per process is one of them.

For example, in the ulimit output for the root user, the open files (-n) value of 1024 is the maximum number of file handles that a process run by root may open.

Note that because root is the administrator account on Linux, if root's open files limit is set to 1024, other users such as test or mysql cannot expect to set their open files limit above 1024.

So before changing an ordinary user's ulimit values, check root's ulimit values first and make sure the ordinary user's values do not exceed root's.
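To compare limits between users, it helps to print both the soft and the hard value, since ulimit -n alone shows only the soft one. The sketch below uses only the shell's built-in ulimit and reports the limits of the shell it runs in.

```shell
# Soft limit: the value currently enforced for this shell and the
# processes it starts.
soft=$(ulimit -Sn)
# Hard limit: the ceiling up to which an unprivileged user may raise
# the soft limit.
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"
```

Run it once as root and once as the user that runs the database to see whether the two sets of values are consistent.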

2) Number of operating system handles

Besides the per-process handle limit, the handle limit of the whole operating system also affects the database. An operating system cannot open handles without bound, so there is a second setting: the maximum number of handles that the entire operating system may have open.

On CentOS 7 this value is stored in the /proc/sys/fs/file-max file.

If the total number of handles open on the system has reached this limit, a process will run short of handles even if it has started only a few threads.

To change the system-wide maximum temporarily, write the new value directly: echo 2000000 > /proc/sys/fs/file-max

To change it permanently, edit the /etc/sysctl.conf file, add fs.file-max = 2000000, and then run sysctl -p as root.
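Before raising file-max, it is worth checking how close the system actually is to it. On Linux the kernel reports current usage in /proc/sys/fs/file-nr. The following is a minimal sketch; the function name handle_usage is illustrative, and the optional file argument exists only so the function can be tried on a sample file.

```shell
# handle_usage [FILE]
# Reads the three numbers the kernel exposes in /proc/sys/fs/file-nr:
# allocated handles, allocated-but-unused handles, and the system-wide
# maximum (the same value as file-max). FILE defaults to that path.
handle_usage() {
    read -r allocated unused max < "${1:-/proc/sys/fs/file-nr}"
    echo "allocated=$allocated max=$max used=$(( allocated * 100 / max ))%"
}
```

Calling handle_usage with no argument reports the live values. A used percentage close to 100 points at the system-wide limit rather than the per-process one.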

3) Memory resources

On Linux, each new thread needs memory allocated for it in advance, known as the stack size, to hold the data the thread works with.

Programmers know that a program's memory is divided mainly into two parts, the "heap" and the "stack". The heap typically holds dynamically allocated data, and the stack typically holds function call frames and local variables.

As noted above, a thread cannot be created if the system is out of memory. The reason is that the operating system must allocate a block of memory to every new thread, and the size of that block is the stack size (-s) value in ulimit. If the operating system cannot provide a block of that size, thread creation fails.

Why would an entire server run out of memory?

Look closely at the operating system and you will find many processes, each running many threads, and each thread requesting memory (the stack is reserved as address space up front and backed by physical memory as the thread uses it), so running short is not surprising. This may remind you of OOM errors in the JVM, but the two are not the same thing, so do not confuse them.

The fix is fairly simple, if blunt: lower the ulimit -s stack size a little, so that each thread requests less memory and the operating system has more to spare. After all, programs and threads release their memory once they finish; they do not hold it forever.
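The effect of the stack size is easy to estimate with simple arithmetic. The sketch below assumes every thread is given the full ulimit -s value, which is reported in kilobytes; the thread count of 1000 is an arbitrary illustration, 8192 KB is a common default, and the function name stack_reservation_mb is hypothetical. The result is an upper bound on the stack space reserved, not a measurement of memory in use.

```shell
# stack_reservation_mb THREADS STACK_KB
# Upper bound, in MB, on the stack space reserved by THREADS threads
# when each one gets STACK_KB kilobytes (the unit ulimit -s reports).
stack_reservation_mb() {
    echo $(( $1 * $2 / 1024 ))
}

stack_reservation_mb 1000 8192   # prints 8000
stack_reservation_mb 1000 1024   # prints 1000
```

To plug in the current setting, pass "$(ulimit -s)" as the second argument, provided it is a number and not "unlimited".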

3. Other points to pay attention to

If the fixes above still do not resolve the thread creation failure, check whether the settings have really taken effect.

The output of ulimit -a may look normal, but have the settings actually been applied? What matters is the ulimit values that the SequoiaDB process is really running with.

There are two ways to confirm:

Newer versions of SequoiaDB print the node's own ulimit values in the diaglog when the node starts, so you can look them up in the log.

The other way is more direct: read the record that Linux itself keeps. For example, if the PID of the 11910 node process is 123456, open the /proc/123456/limits file and read its contents. That leaves no room for doubt.
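The limits file lists every limit of the process, so it can be handy to pull out just the rows of interest. The following is a minimal sketch, assuming the usual layout of /proc/PID/limits on Linux (a row label followed by the soft value, the hard value, and the unit); the function name show_limit is illustrative.

```shell
# show_limit LIMIT_NAME FILE
# Prints the soft and hard values of one row of a limits file, for
# example "Max open files" or "Max stack size".
show_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)   # drop the row label
            split(rest, f, " ")                   # f[1] is soft, f[2] is hard
            print "soft=" f[1], "hard=" f[2]
        }' "$2"
}
```

With the PID from the example above, show_limit "Max open files" /proc/123456/limits prints the values the running process really has, whatever ulimit -a reports in your own shell.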

4. Remarks: commands about the number of handles and threads

To see how many threads a process has started, use any of the following:

cat /proc/$PID/status | grep Threads

pstree -p $PID, then add 1, because there is also the main process

top -Hp $PID, then read the "Threads" field in the header

ps hH p $PID | wc -l
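If none of these tools is installed, the same number can be read straight from /proc. The sketch below relies on the fact that Linux lists each thread of a process as a directory under /proc/PID/task; the function name count_threads is illustrative.

```shell
# count_threads PID
# Each thread of a process appears as one directory under
# /proc/PID/task, so counting the entries gives the thread count.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

count_threads $$   # thread count of the current shell
```

Replace $$ with the PID of the database node to check how many threads it is running.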

Check the total number of handles currently open on Linux:

lsof -n | awk '{print $2}' | sort | uniq -c | sort -nr | awk '{print $1}' | awk '{sum += $1}; END {print sum}'

View the total number of handles opened by a process:

lsof -n | awk '{print $2}' | sort | uniq -c | sort -nr | grep $PID
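The lsof output also lists entries that are not file descriptors, such as the working directory and memory-mapped files, so its totals tend to run higher than what the open files limit counts. As a cross-check, the descriptors of one process can be counted from /proc. This is a minimal sketch with an illustrative function name, count_fds; reading another user's process this way requires root.

```shell
# count_fds PID
# Every open file descriptor of a process is a symlink under
# /proc/PID/fd, so the number of entries is the number of descriptors
# that count against the process's open files limit.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

count_fds $$   # descriptors held by the current shell
```

Comparing this number with the "Max open files" row in /proc/$PID/limits shows how much headroom the process has left.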
