How to use dynamic tracking technology SystemTap to monitor the performance of MySQL and Oracle 07/02 Update SLTechnology News&Howtos

How to use dynamic tracking technology SystemTap to monitor the performance of MySQL and Oracle

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces how to use dynamic tracking technology SystemTap to monitor the performance of MySQL and Oracle. It is very detailed and has a certain reference value. Interested friends must read it!

This Chinese database technology conference has many keywords, such as source code, open source, self-research, etc., especially at the time of the ZTE incident, the domestic basic technology research ushered in a small and the most wonderful part.

As we all know, the study of basic technology is not simple, so, a friend often told me that he wanted to study the MySQL source code, but the code was so large that he gave up, or he had already compiled it, but unfortunately there was no such thing anymore. Why does this always happen? I don't think a positive feedback mechanism has been established.

Why do you say that? When you spend a lot of energy studying the source code, but after a long time, you find that you haven't gained much, and it's easy to give up. But if every week or two, I will have some small gains, so continue to accumulate, set a small victory into a big victory, and finally make a breakthrough in technology. This is to establish a positive feedback mechanism.

How to establish this feedback mechanism? This will mention SystemTap, dynamic debugging or dynamic tracing, which we will talk about below.

What is dynamic tracking? Is to dynamically observe the running program without affecting the program. Dynamic tracking technology is more influential is Dtrace, although I am not sure whether Dtrace is the ancestor of dynamic tracking, but I can be sure that many dynamic tracking technologies such as SystemTap emerged later.

SystemTap is launched by Redhat, if the server is Redhat, it does not need to be installed, but if it is other systems, it needs to be installed, but the installation is also very simple. SystemTap also needs to have a certain basis for use, such as basic development variables, conditions, loops, branches and the ability to write a Mini Program, and so on. In addition, to be familiar with dynamic tracking targets, such as tracking MySQL, then the basic knowledge and usage of MySQL need to be mastered.

Dynamic tracking has a very important concept called probe, which is equivalent to the trigger in the database, which is the trigger in the operating system. We can use SystemTap to dynamically insert the script into memory, wait for certain conditions, that is, the probe, is triggered to execute.

Dynamic tracking technology usually uses scripts inserted into memory, which are usually not too complex and are used to display memory values, memory status, statistical time, and so on, because too complex will slow down the target process or cause other instability.

This is a relatively simple example of counting MySQL logic read time consumption. Suppose it is known that the MySQL logical read operation starts from buf_page_get_gen to the buf_page_release_latch function. We wrote a SystemTap script that takes time to read by statistical logic, which is very simple, with a total of 15 lines. If you remove blank lines and parentheses, then the number of lines that the script is really valid is even less.

The first line defines a script interpreter; the third line defines a global variable, because probes are similar to functions, and variables in each function cannot pass data to each other, so if you want to pass data across probes, you must define global variables; lines 5 to 7 are SystemTap begin start probes, which are triggered when the script runs. I put a prompt to output "begin" here; the ninth line is a probe for the MySQL process, the probe at the entry point of the buf_page_get_gen function, which is equivalent to the trigger of the buf_page_get_gen function, and the probe will be triggered first when the system executes this function; the tenth line is to record the time, using SystemTap's gettimeofday to record the time in microseconds. Line 13 is the buf_page_release_latch function, and when this function is triggered, the current time minus the time at the beginning of the logical read is a complete logical read time.

There are two screen windows above, one is the MySQL client, and the other is the shell window. First run the lgr1.stp script in the shell window, where 3164 is the process number and-x is the cmd command, which displays begin. Execute a select statement in the MySQL client and perform a logical read.

After the execution, we see that the result is 63 microseconds. Of course, this time is not fixed, different test machine time will be different. MySQL completes a logical read in about 60 microseconds, which is not very different, but if there are multiple database comparisons, we can learn more about database performance details.

Let's compare Oracle time consumption. There are many ways to read Oracle logically, and we use a more common way, logical reading in the way of rowid. This logical read is very different from MySQL in that it first gets CBC Latch, gets Buffer Pin Lock under the protection of CBC Latch, and then reads the row data in the block. This is a complete logical read from the time the CBC Latch is obtained to the time the Buffer Pin Lock is released.

Where the kcbgtcr function corresponds to the buf_page_get_gen function in MySQL, that is, the get function. Kcbrls corresponds to the buf_page_release_latch function of MySQL.

With these two functions, scripts that count Oracle time consumption are easy to write. The content of the script is similar to that above, except that the function name is replaced.

Let's take a look at the results of Oracle time statistics. Run the lgr_o.stp script to track 2630 processes, and the corresponding SQL for this process is select * from T1 where rowed='AAADs6AAEAAAACRAAB. When the begin appears, it proves that the script is running. After the logical read is finished, we see that the time is 31 microseconds.

This result is obviously much faster than MySQL. Why? Because MySQL logical reads include index comparisons, it is necessary to determine whether the row is a row to be read by the user, and Oracle does not make such a comparison, so there is less work and less time consumption.

Oracle actually has logical reads similar to MySQL, but because the test environment here is that the InnoDB table cannot be completed. But you can also index and organize tables in Oracle to do the same logical reading. It is important to note that even with the same logical reading, Oracle is still a little faster than MySQL.

How can I know the name of the function corresponding to an operation? For example, how to know the buf_page_get_gen function and the buf_page_release_latch function. This requires us to explore slowly in the original, spend some time studying the source code, or write a small script to count the time, performance, and so on, which is also in line with the positive feedback we mentioned earlier.

Oracle has 1367 waiting events and more than 1000 performance data, and a total of more than 3000 indicators are used to reflect the status and performance of the database. By contrast, open source databases like MySQL are much weaker in this respect, but they also leave a lot of room for dynamic tracking technologies like SystemTap.

To get to the point, let's first introduce the original intention of the technology. Some time ago, a customer needed to do a load operation, which was performed once a day, which took about ten minutes to load millions of rows of data. I read it and found that the ten-minute time is normal because they don't use parallelism and single-threaded throughput is always limited. But customers feel slow and want to have a more detailed time list, so I have what I share today.

Today, we mainly introduce methods. I mainly count the time consumption of the five stages: parsing, processing binlog, processing UNDO, processing Redo, and total execution time. Let's take binlog as an example to explain how to use SystemTap+ source code to count the time spent processing binlog during computer execution. Note that this is the statistical DML execution time.

How do we start?

The first step is to find the function that deals with binlog, and then write a simple script to count the time consumption as before. The key problem here is how to find the corresponding function of binlog. There is an important skill, to go against the current and follow the wind. Going up against the current means following the flow of memory.

In the case of binlog, binlog data is eventually written to disk. When writing to the disk, it must be written to the disk from a certain buffer. We call this buffer buffer z, so where does the buffer data come from? Assuming that the data of buffer z is copied from buffer y, then we call buffer y the live line of buffer z. So who is buffer y's online? Who is online?... With such a trace, we can find the handler of binlog.

What is the specific method? The first step is to find the function that writes binlog to the file and see what function it calls to trigger the write IO in MySQL. The SystemTap script in the figure above is different from the previous script in that the name of the function in the script is fixed, and here you use * write*, which means tracking all functions with the word write in their names.

In addition, there is a conditional statement where Ulong_arg (1) is the first argument to the function when the probe is triggered. Such a function usually takes the file number as the first argument, buffer as the second argument, and length as the third argument. 131is the file number of the binlog file, which we can get from proc.

If it is satisfied that the function name has a write and the first argument is the binlog file number, then it is most likely a function written to the file. What exactly do I need to do? First, output three parameters, and then we use SystemTap's built-in function pp to display the corresponding information of the probe, including the function name of the probe and so on. After that, output a stack, which refers to the upper layer of the current function, and so on.

Take a look at the execution result and run the script first, where 3164 is the process number of mysqld. Wait for the begin output, then execute an insert statement, and then we see that there is nothing output. This is because the Insert statement does not trigger IO until commit is triggered.

Execute one more commit and we can see the output. In the box is the pp function corresponding to the information of the probe. The my_write function is the function that is called by writing binlog to the file in MySQL, which is on line 40 of my_write.c. The last three parameters are file number 83, 38114B0 is the aforementioned buffer z, from this place the binlog data is written to disk, but this is not the exact buffer z, 17 is the length, which translates to 279 bytes in decimal.

Let's take a look at the contents of my_weite.c. There is a my_write function in line 40 of my_write.c, and line 63 is that it calls the library function write to complete the actual IO operation, where the first parameter is still the file number, the second is buffer, and this parameter may be related to the exact address of buffer z.

Let's first prepare a recognizable SQL. What is a recognizable SQL? That is, there is a large amount of duplicate data in SQL, which is easy to identify. For example, the second column has eight lowercase aline hexadecimal ASCII codes 616161616161616161, and you will see eight consecutive 61 in memory.

Then we look for the exact buffer z address based on the values of the second column. In the specific operation, we also need to introduce another debugging tool called GDB.

Among them, 3164 is the process number that is running mysqld, and a lot of information will be output after execution. When GDB runs, any operation in the target program needs to be approved by GDB. At this point, if you execute a SQL statement in MySQL, then the SQL statement will be held by HANG. The reason is simple: we didn't issue an order in CBD to keep it running.

First, I will set a breakpoint on line 63 of my_write.c, which is similar to a probe and triggers when a function or operation is executed. After setting up, I will execute the C command, which is equivalent to continue in GDB, and then execute the Insert statement to continue. The statement that has just been resided by Hang does not trigger the probe, because the Insert statement will not trigger the IO of binlog.

After that, the commit is executed and is held by Hang again, when the breakpoint is triggered.

The information at the breakpoint is the information of the write library function on line 63 just now. The first parameter is the file number, the second parameter is the cache, and the third parameter is 279 length.

Use the p command in GDB to verify that the output Filedes is compared with the binlog file number.

The p command shows that the address of buffer z is 3810A40, which is the same as the output of the script. But 3810A4 is not the exact address of buffer z, the exact buffer z address should be the eight an in the Insert statement.

How to find it? First, use the X command to display the memory value of 381040.

The 3810B20 memory address has a data of 61080000, and GDB does an automatic conversion to change it to 00000861, and the next byte is 0x08616161, which is the second column of the Insert statement, so what is its exact address? 0x3810B20 moves back two bytes of 0x3810B22.

After that, a watch point is set with watch, which is equivalent to a breakpoint in GDB, which is triggered when the process reads and writes to the specified memory. As mentioned earlier, 0x3810b22 is the buffer z we found, and after the monitoring point is set, as long as anyone passes data to buffer z, the monitoring point will be triggered.

Then run the C command, let MySQL run, and execute another Insert statement in the MySQL client. After execution, we found that the probe was not triggered.

Then we execute commit, which is once again held by Hang, and the probe is triggered, and we see two values Old Value and New Value here.

Because the probe is used to detect that buffer z is modified, Old Value is the original value before the modification, and New Value is the new value. The new value is actually 08616161, so who is the function corresponding to the probe? memcpy_ssse3_back (), which is actually the copy function. By this time we have found buffer y, but because it is triggered during commit, it is not the final big boss I am looking for, and we need to keep looking.

How to find it? Again, set the observation point at the address of buffer y as before.

Here we will use a GDB command called info registers, which displays the registers. Why display registers? Because in a 64-bit system, rdi, rsi, rdx, and so on are all parameters of the function, the first parameter of the memcpy function is the target, and the second parameter is the source, in other words, it is copied from the second parameter to the first parameter, and in this case, rdi is buffer zjingrsi and buffer y.

If we look closely, we can see that rdi is obviously no longer the rdi it used to be. It was 3810B2 and now it is 3810B4. Why is that? This is because the watch observation point is triggered after the observed memory location has been modified, and it does not stop at the entry point of the memcpy function.

Rdi and rsi are the first and second parameters of the entry point of the memcpy function, but by the time you start from the entry point, a lot of code has been executed and the memory has been modified. At this point, rdi and rsi have been modified so that they are no longer strictly the first and second parameters of memcpy.

Although rdi is not buffer z, it is near buffer z. Similarly, rsi is also near buffer y. The exact address of Buffer z is 3810B22, and the current rdi is 3810B45, with a difference of 23 bytes (hexadecimal), so whether rsi minus 0x23 is the exact address of buffer y.

Let's check it out. Use the X command to display memory, where the last four bits of f0b9 are the value of rsi minus 0x23, and its memory value 0x61616108, which is the second column in the Insert statement, then f0b9 is the exact address of buffer y.

After finding the exact address of buffer y, delete the original observation point and set a new one at f0b9.

After that, we continue to execute the C command, execute a similar Insert statement on the MySQL client, inserting 8 b, but this time it is held by Hang while executing Insert, so it is very likely that we have found the handler function of binlog during DML execution this time.

How to verify whether there is a function related to binlog after the new breakpoint is triggered? Quite simply, we use a BT command in GDB to display the call stack.

The output above, # 0 is the level of the current memcpy function, # 1 calls # 0 # # 2 calls # 1's function, and so on. Where # 8 and # 9 are MySQL Insert functions that call the binlog_query function in THD. As soon as we look at this function name, we should know that this is the binlog handler we are looking for during the execution of DML.

The whole process seems quite simple, following the trail, looking all the way up, and after finding it, use BT to check if it is our goal.

Of course, although we found the target, it may not be the only target, in addition to binlog_query, it may have called other functions to handle binlog during DML execution. So we need to keep looking in this way.

To sum up, the first step is to confirm buffer x, which starts with the rsi and rdi parameters, then sets the observation point at buffer x, runs an approximate SQL, waits for the observation point to be triggered, and uses the BT command to see if there is a handler with the word binlog in the call stack.

When will it be found? As we can see from the right, we have done a lot of searching. If we use the BT call stack, we see parse in it. Parsing is the first step for SQL to execute, which proves that we have found the source and do not need to look further.

In addition to binlog_query,binlog_log_row, it is also a function that handles binlog during DML. If you want to learn more about the code, you can set breakpoints in the code, step through, and see how it is done.

We also know that binlog data is written to disk with the my_write function at the time of submission, so by setting an endpoint at my_ write and displaying the stack with BT, we can know which function triggers IO when binlog writes the file.

It should be the flush_cache_to_file function that first triggers IO and calls my_write to complete the IO. Why is flush_cache_to_file the first function to trigger? Because you can see it in # 3 in the BT call stack, but there are basically no functions with IO in or after # 4, 5, 6 or later.

If you want to know more, you can set breakpoints in your code to track how the function is executed. Or you can do a script to count the time, performance consumption, and so on. This is the positive feedback mentioned above, which not only has practical use, but also allows you to grow at the same time.

The script for counting binlog time consumption is simple, basically the same as the script for logical read time consumption. Execute the script first, and after the begin appears, execute a statement of Insert, and the time output is 80 microseconds, which is much longer than the previous logical read time. This is normal, however, because the DML operation is a much more complex operation than a query, dealing with redo, undo, and so on.

Using the way we just found binlog handling functions, we can also find other functions such as parsing, rollback, and so on. When studying the code, I suggest that you step by step, if the amount of code is very large, you can first try to start with relatively small operations, supplemented by dynamic tracking technology.

Let's talk about the key parts of the script. The first is that many global variables are defined to record time across multiple probes; the second statistical parsing is time-consuming, and there is one more thing tid () = $1, where $1 is the passed parameter-thread number, tid is the function for SystemTap to obtain the current thread number, considering that the tracked mysqld may be a multithreaded system, and we only need to trace the target thread, so we need to filter out all other threads. The third time probe, the 1 in parentheses means to trigger once a second. The time indicator is output after each trigger. Here is the analog vmstat 1, which displays one line per second.

Before we start running, we need to know the thread number, filter the target thread using tid, and then get the thread number through performance_schema.threads, the current client corresponding thread number is 3659.

3047 is the thread number of mysqld, and 3659 is the thread number to be tracked. After running the script, output one line per second, execute the Insert statement that inserts more than 4000 lines, and you can see the time output.

The time calculated by the script is 500000 microseconds, which is consistent with the time calculated by MySQL itself, but my statistics are much more detailed than those of MySQL clients, including that redo for undo accounts for 3.1% of the total time, page redo accounts for 4.6% of the total time, total redo time accounts for 7.7%, and undo accounts for 17.7% and 8.3% of the total time. Adding up this time is the additional time that InnoDB needs to spend for transactions and higher reliability, which accounts for 33.7% of the total time.

These are all the contents of the article "how to use dynamic tracking technology SystemTap to monitor MySQL and Oracle performance". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.