2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to track down a crash that occurs when a test script is run on the feature/query branch of the community repository. The walkthrough follows the debugging process step by step.
The problem
On the feature/query branch of the community repository, running the following script causes a crash.
./test.sh -f general/parser/col_arithmetic_operation.sim
Reproducing the problem
I logged in to the specified machine and checked the core dump; the crash was real. The call stack screenshot is as follows:
Step 1: find where the crash happens. It is at shash.c:250. Use the GDB command "f 1" to select stack frame 1 and print *pObj, and you will see that hashFp is NULL, which naturally leads to a crash. But why is it NULL? Look at the other fields: dataSize is 0, which must also be wrong. So we can conclude that this structure is invalid, and we need to check whether the caller passed the correct parameters.
Step 2: use GDB "f 2" to select stack frame 2, look at line 605, and print *pRpc. The fields in this structure look normal, and the pointer to the hash shows no problem either. So the call itself is fine, and taosGetStrHashData was given correct parameters.
Step 3: since the parameters are right, look at shash.c. The only explanation is that the SHashObj structure had already been released, so accessing it is naturally invalid. Reading the code, the only way that can happen is a call to taosCleanUpStrHash, so I immediately added a log line to that function. (Pay attention to TDengine's log output control: the asyncLog parameter in the system configuration file taos.cfg should be set to 0, which selects synchronous logging; otherwise the log may not be written out when the process crashes.) I reran the script, checked the log, and found that taosCleanUpStrHash had not been called. That leaves only one possibility: this piece of memory was overwritten by another thread.
Step 4: fortunately, there is a great runtime memory checking tool, valgrind, which we can run to look for clues. Valgrind has many options; I ran valgrind --leak-check=yes --track-origins=yes taosd -c test/dnode1/cfg, and an invalid write showed up at once. The screenshot is as follows:
Step 5: the valgrind output shows an invalid write at rpcMain.c:585. That line is a memcpy. From a coding point of view it should not be a problem, because it copies a fixed-size structure, SRpcConn, and it runs every time execution reaches this point. So the only possibility is that pConn points to an invalid memory area. How can pConn be invalid? Let's take a look at the program:
Look at line 584: pConn = pRpc->connList + sid. This sid is allocated by taosAllocateId. If sid exceeds pRpc->sessions, then pConn undoubtedly points to an invalid area. How can we be sure?
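The out-of-range hypothesis can be pictured with a small sketch. This is not the TDengine code: DemoConn, DemoRpc and the demo_ functions are hypothetical stand-ins, and the sketch assumes the connection table has exactly `sessions` slots, so valid indices are 0 to sessions-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SRpcConn; the real structure has other fields. */
typedef struct {
    int sid;
    char payload[60];
} DemoConn;

/* Hypothetical stand-in for the RPC object.
 * Assumption: connList holds exactly `sessions` slots. */
typedef struct {
    int sessions;
    DemoConn *connList;
} DemoRpc;

/* Valid indices into connList are 0 .. sessions-1. */
bool demo_sid_is_valid(const DemoRpc *rpc, int sid) {
    assert(rpc != NULL);
    return sid >= 0 && sid < rpc->sessions;
}

/* Returns the slot for sid, or NULL instead of an out-of-range pointer. */
DemoConn *demo_get_conn(DemoRpc *rpc, int sid) {
    if (!demo_sid_is_valid(rpc, sid)) return NULL;
    return rpc->connList + sid;
}
```

With a table of 100 slots, demo_get_conn accepts sid 99 and rejects sid 100, which is the index that would land one element past the end of the table.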
Step 6: add a log statement at line 578 that prints the allocated ID, compile, and rerun the test script.
Step 7: it crashes again. The log shows sid values up to 99 (max is 100) with everything fine, and then the crash. So we can assert that the crash happens because the allocated ID exceeds the range allowed by pRpc->sessions.
Step 8: look at tidpool.c, the code that allocates IDs, and the reason becomes clear. IDs are allocated from 1 to max, while the RPC module can only use 1 to max-1. When the allocated ID equals max, the RPC module naturally produces an invalid write.
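The off-by-one can be reproduced with a toy allocator. The sketch below is an assumption-laden illustration, not the tidpool.c implementation: DemoIdPool and its functions are hypothetical, and IDs are never recycled here.

```c
#include <assert.h>

/* Hypothetical ID pool, NOT the tidpool.c code: hands out 1 .. maxId. */
typedef struct {
    int maxId;
    int nextId;   /* next ID to hand out; no recycling in this sketch */
} DemoIdPool;

DemoIdPool demo_pool_init(int maxId) {
    assert(maxId >= 1);
    DemoIdPool pool = { maxId, 1 };
    return pool;
}

/* Returns the next free ID in 1 .. maxId, or -1 when the pool is exhausted. */
int demo_pool_alloc(DemoIdPool *pool) {
    if (pool->nextId > pool->maxId) return -1;
    return pool->nextId++;
}

/* Drains a fresh pool and reports the largest ID it ever returned. */
int demo_pool_largest_id(int maxId) {
    DemoIdPool pool = demo_pool_init(maxId);
    int largest = -1;
    int id;
    while ((id = demo_pool_alloc(&pool)) != -1) largest = id;
    return largest;
}
```

A pool created with max 100 eventually returns 100, which is one more than a consumer limited to 1 to max-1 can accept.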
Solution
Now that the cause is known, the fix is easy. There are two ways:
1. In taosInitIdPool in tidpool.c, reduce maxId by one, so that the allocated IDs only range from 1 to max-1.
2. In the rpcOpen() function of rpcMain.c, change
pRpc->idPool = taosInitIdPool(pRpc->sessions);
to
pRpc->idPool = taosInitIdPool(pRpc->sessions-1);
With this change alone, an application that asks for a maximum of 100 sessions would get at most 99 from RPC. To keep the maximum at 100, also change
pRpc->sessions = pInit->sessions;
to
pRpc->sessions = pInit->sessions+1;
Verification
With either method, after recompiling, the test script passes and the crash no longer occurs.
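The arithmetic behind the second fix can be checked in isolation. The sketch below only mirrors the two changed assignments; DemoRpcSizing and its helper functions are hypothetical names, not part of rpcMain.c.

```c
#include <assert.h>

/* Hypothetical summary of the two sizes involved in the second fix. */
typedef struct {
    int sessions;   /* value stored in the RPC object after the fix */
    int poolMaxId;  /* largest ID the ID pool may hand out */
} DemoRpcSizing;

/* Mirrors the second fix: sessions = requested + 1, pool max = sessions - 1. */
DemoRpcSizing demo_rpc_sizing(int requestedSessions) {
    assert(requestedSessions >= 1);
    DemoRpcSizing s;
    s.sessions = requestedSessions + 1;
    s.poolMaxId = s.sessions - 1;
    return s;
}

/* IDs run from 1 to poolMaxId, so this many sessions are usable. */
int demo_usable_sessions(DemoRpcSizing s) {
    return s.poolMaxId;
}

/* Nonzero when every ID the pool can return stays below sessions. */
int demo_ids_fit(DemoRpcSizing s) {
    return s.poolMaxId < s.sessions;
}
```

For a request of 100 sessions this gives sessions = 101 and a pool maximum of 100, so the application still gets 100 usable IDs and none of them reaches sessions.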
Lessons learned
When memory appears to have been overwritten, be sure to run the program under valgrind to see whether there is an invalid write. Because valgrind checks the program while it runs, the errors it reports should be real. Fix the invalid write first, and only then look at the crash itself.
How to avoid similar problems
The root of this bug is that tidpool.c allocates IDs from 1 to max, while the RPC module assumes IDs from 1 to max-1. The two modules did not agree on their contract.
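One way to make such an agreement explicit is to check it once at initialization. The sketch below is only an illustration of that idea under assumed names (DemoIdRange, DemoIndexRange, demo_check_id_contract); nothing like it is described in the original code.

```c
#include <assert.h>

typedef enum { DEMO_OK = 0, DEMO_ID_RANGE_MISMATCH = -1 } DemoStatus;

/* The ID producer states the inclusive range it hands out. */
typedef struct { int minId; int maxId; } DemoIdRange;

/* The ID consumer states the inclusive range it can store. */
typedef struct { int minIndex; int maxIndex; } DemoIndexRange;

/* Fails when the producer could return an ID the consumer cannot hold. */
DemoStatus demo_check_id_contract(DemoIdRange ids, DemoIndexRange slots) {
    assert(ids.minId <= ids.maxId);
    assert(slots.minIndex <= slots.maxIndex);
    if (ids.minId < slots.minIndex || ids.maxId > slots.maxIndex)
        return DEMO_ID_RANGE_MISMATCH;
    return DEMO_OK;
}
```

Called at startup with the producer range 1 to 100 and a consumer range of 1 to 99, the check reports a mismatch before any connection is created.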