What happens if there are too many MySQL data queries?

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains what happens when a MySQL query has to read and return a very large amount of data, working through a concrete case at both the server layer and the InnoDB engine layer.

Suppose the database host has only 100 GB of memory, and we now have to scan a large table of 200 GB. Will the scan use up all the memory on the DB host?

Consider logical backups: a logical backup scans the whole database. If such a scan exhausted memory, logical backups would have stopped working long ago.

So a full table scan of a large table appears to be safe. Why is that?

The impact of a full table scan on the server layer

Suppose we want to run a full table scan on db1.t, a 200 GB InnoDB table. To save the scan results on the client side, you would use a command like this:

mysql -h$host -P$port -u$user -p$pwd -e "select * from db1.t" > $target_file

InnoDB stores its data in the primary key index, so a full table scan is in fact a direct scan of the primary key index of table t. Since this query has no other filter conditions, each row that is read can be put straight into the result set and returned to the client.

So where is this "result set" stored?

The server does not need to save a complete result set. The process of fetching and sending data is as follows:

1. Get a row and write it to "net_buffer". The size of this block of memory is defined by the parameter "net_buffer_length", which defaults to 16k.

2. Keep getting rows until "net_buffer" is full, then call the network API to send it out.

3. If the send succeeds, clear "net_buffer", then get the next row and write it to "net_buffer".

4. If the send function returns "EAGAIN" or "WSAEWOULDBLOCK", the local network stack (the socket send buffer) is full, so the server waits until the network stack is writable again and then continues sending.

Figure: Query result sending process
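The fetch-and-send loop described above can be sketched in a few lines of Python. This is only an illustrative model, not MySQL's implementation: `stream_rows`, `fake_rows`, and the `send` callback are hypothetical names, the 100-byte row size is made up, and a blocking `send` call stands in for "wait until the network stack is writable again".

```python
NET_BUFFER_LENGTH = 16 * 1024  # mirrors the 16k default mentioned in the text


def stream_rows(rows, send, buffer_size=NET_BUFFER_LENGTH):
    """Send rows through a fixed-size buffer; return the peak buffer usage in bytes."""
    buffer = bytearray()
    peak = 0
    for row in rows:
        # Flush first if adding this row would overflow the buffer.
        if buffer and len(buffer) + len(row) > buffer_size:
            send(bytes(buffer))  # assumed to block until the data is accepted
            buffer.clear()
        buffer.extend(row)
        peak = max(peak, len(buffer))
    if buffer:
        send(bytes(buffer))  # flush whatever is left at the end
    return peak


def fake_rows(count, row_size=100):
    """Hypothetical row source: yields `count` rows of `row_size` bytes each."""
    for _ in range(count):
        yield b"x" * row_size


sent_chunks = []
peak_bytes = stream_rows(fake_rows(100_000), sent_chunks.append)
total_bytes = sum(len(chunk) for chunk in sent_chunks)
print(total_bytes, peak_bytes)
```

In this model the total amount sent grows with the table, while the peak buffer usage stays bounded by the buffer size, which is the point the text makes about the server side.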

From this we can see:

While a query result is being sent, the most memory MySQL uses internally for it is "net_buffer_length", which is nowhere near 200 GB.

The socket send buffer cannot reach 200 GB either (its default size is defined in /proc/sys/net/core/wmem_default). If the socket send buffer is full, the process of reading data is suspended.

So MySQL actually "sends while reading". This means that if the client receives slowly, the MySQL server cannot send the results out, and the transaction takes longer to execute.

For example, the following state is what show processlist displays on the server when the client does not read the contents of its "socket receive buffer".

Figure: Server send blocking

If you see that State stays at "Sending to client", the network stack on the server side is full.

If the client uses the --quick parameter, it uses the mysql_use_result method: read one row, process one row. If the business logic is complex and the processing done after each row is read is slow, the client takes a long time to fetch the next row, and the state above may appear.

Therefore, for normal online business, if a query does not return many results, it is recommended to use the "mysql_store_result" API to save the query results directly to local memory.

Of course, the premise is that the query does not return many results. If there are too many, for example when a large query causes the client to consume nearly 20 GB of memory, you need to use the "mysql_use_result" interface instead.

If you see many threads in the "Sending to client" state on a MySQL instance you maintain, you should ask the business developers to optimize the query results and evaluate whether returning so many results is reasonable.

To quickly reduce the number of threads in this state, you can set "net_buffer_length" to a larger value.
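The trade-off between the two interfaces can be illustrated with a small Python model. It does not call the MySQL C API; `server_rows`, `fetch_store_style`, and `fetch_use_style` are hypothetical names, and a generator stands in for rows arriving from the server. The sketch only contrasts how many rows the client holds at once under each approach.

```python
def server_rows(count):
    """Hypothetical stand-in for rows arriving from the server one at a time."""
    for i in range(count):
        yield (i, "payload-%d" % i)


def fetch_store_style(rows):
    """Like mysql_store_result: pull every row into client memory, then process."""
    cached = list(rows)  # the whole result set is held locally
    peak_rows_held = len(cached)
    total = sum(row[0] for row in cached)
    return total, peak_rows_held


def fetch_use_style(rows):
    """Like mysql_use_result: read one row, process it, then read the next."""
    total = 0
    peak_rows_held = 0
    for row in rows:  # only the current row is held locally
        peak_rows_held = max(peak_rows_held, 1)
        total += row[0]
    return total, peak_rows_held


store_total, store_peak = fetch_store_style(server_rows(10_000))
use_total, use_peak = fetch_use_style(server_rows(10_000))
print(store_peak, use_peak)
```

Both functions compute the same result. The store-style version frees the server quickly at the cost of client memory, while the use-style version keeps client memory flat but holds the server-side query open for as long as the per-row processing takes.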

Sometimes you will see many query statements in the "Sending data" state on an instance, yet the network looks fine. Why does "Sending data" take so long?

The state of a query statement changes as follows:

1. After the MySQL query statement enters the execution phase, the state is first set to "Sending data".

2. Then the column information of the execution result (metadata) is sent to the client.

3. Then the statement continues to execute.

4. When execution is complete, the state is set to an empty string.

That is, "Sending data" does not necessarily mean "data is being sent"; the statement may be at any stage of the executor process. For example, you can construct a lock-wait scenario to see the "Sending data" state.

Table: A full-table read blocked by a lock

Figure: Sending data status

Here session2 is waiting for a lock, yet its state is displayed as "Sending data".

"Sending to client" is displayed only when a thread is waiting for the client to receive the result.

If the state is displayed as "Sending data", it only means that the statement is executing.

In short, query results are sent to the client in segments, so scanning a whole table and returning a large amount of data will not blow up the server's memory.
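As a small aid for reading processlist output, the distinction between the two states can be written down as a lookup. This is only a sketch: `explain_state` is a hypothetical helper, and `sample_states` is made-up data standing in for State values copied from show processlist.

```python
from collections import Counter


def explain_state(state):
    """Map a processlist State value to the meaning described in this article."""
    if state == "Sending to client":
        return "waiting for the client to receive the result"
    if state == "Sending data":
        return "statement is executing (not necessarily sending)"
    return "other"


# Hypothetical sample of State values, as might be copied from show processlist.
sample_states = ["Sending data", "Sending to client", "Sending to client", ""]
summary = Counter(explain_state(state) for state in sample_states)
print(summary)
```

Counting threads per meaning this way makes it easier to tell a slow-client problem (many threads waiting on the client) apart from slow execution or lock waits.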

That covers the processing logic of the server layer. How is a full table scan handled inside the InnoDB engine?

The impact of a full table scan on InnoDB

One role of InnoDB's memory is to hold updated results and, together with the redo log, avoid random disk writes.

Data pages in memory are managed in the Buffer Pool (BP for short). Under WAL, the BP serves to speed up updates.

The BP can also speed up queries.

Because of WAL, the data page on disk is still the old version when a transaction commits. If a query then reads that data page right away, does the redo log have to be applied to the page immediately?

No. At that moment the data page in memory holds the latest result, so the query can read the memory page directly. The query does not need to read the disk at all; it fetches the result straight from memory, which is very fast. This is how the Buffer Pool speeds up queries.

The acceleration effect of BP on query depends on an important indicator, that is, memory hit rate.

You can check a system's current BP hit rate in the output of show engine innodb status. In general, for an online system that serves traffic stably and must keep response times within requirements, the memory hit rate should be above 99%.

When you execute show engine innodb status, the output contains the words "Buffer pool hit rate", which show the current hit rate. In the example output, the hit rate is 100%.

If all the data pages required by the query can be obtained directly from memory, that is the best, corresponding to a hit rate of 100%.
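To check the hit rate against the 99% guideline programmatically, one option is to extract it from the status text. The sketch below assumes the line has the form "Buffer pool hit rate N / M"; the `sample` string is a hypothetical excerpt rather than output captured from a real server, and `buffer_pool_hit_rate` is an illustrative helper name.

```python
import re


def buffer_pool_hit_rate(status_text):
    """Return the hit rate as a fraction, or None if the line is absent."""
    match = re.search(r"Buffer pool hit rate\s+(\d+)\s*/\s*(\d+)", status_text)
    if match is None:
        return None
    hits, total = int(match.group(1)), int(match.group(2))
    return hits / total


# Hypothetical excerpt; real output contains many more lines.
sample = "Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000"
rate = buffer_pool_hit_rate(sample)
print(rate is not None and rate >= 0.99)
```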

The size of the InnoDB Buffer Pool is determined by the parameter "innodb_buffer_pool_size". It is generally recommended to set it to 60% to 80% of available physical memory.

About a decade ago, a single machine held hundreds of gigabytes of data while its physical memory was only a few gigabytes. Today, although many servers have 128 GB of memory or more, the amount of data on a single machine has reached the terabyte level.

Therefore, it is common for "innodb_buffer_pool_size" to be smaller than the amount of data on disk. If the Buffer Pool is full and a data page has to be read from disk, an old data page must be evicted.
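As a quick worked example of the sizing guideline, the arithmetic for the 100 GB host used earlier in this article looks like this. `suggested_buffer_pool_range` is a hypothetical helper, and treating 1 GB as 1024^3 bytes is an assumption.

```python
GIB = 1024 ** 3  # assumption: 1 GB is taken as 1024^3 bytes


def suggested_buffer_pool_range(physical_memory_bytes):
    """Return (low, high) byte values for 60% and 80% of physical memory."""
    low = physical_memory_bytes * 60 // 100
    high = physical_memory_bytes * 80 // 100
    return low, high


low, high = suggested_buffer_pool_range(100 * GIB)
print(low // GIB, high // GIB)  # prints: 60 80
```

Either way, a 200 GB table cannot fit in a pool of this size, which is why the eviction policy matters.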

InnoDB memory management

InnoDB's memory management uses the Least Recently Used (LRU) algorithm, which evicts the data that has gone unused for the longest time.

Basic LRU algorithm

InnoDB manages the BP with an LRU algorithm that is implemented as a linked list:

State 1: the head of the linked list is P1, indicating that P1 is the most recently accessed data page.

State 2: a read request accesses P3, so P3 is moved to the head of the list.

State 3: the data page accessed this time is not in the linked list, so a new data page Px has to be allocated in the BP and added to the head of the list. However, memory is full and no new memory can be allocated. So the memory of data page Pm at the tail of the list is cleared, the contents of Px are stored in it, and it is placed at the head of the list.

In the end, Pm, the data page that has gone unaccessed for the longest time, is evicted.
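The three states above can be reproduced with a minimal LRU model. This is a sketch for illustration, not InnoDB's code: `BasicLRU` is a hypothetical class, the pool holds only 8 pages, and the page names are made up. It also shows what a long scan does to pages that were previously hot.

```python
from collections import OrderedDict


class BasicLRU:
    """Plain LRU: every accessed page moves to the head; the tail is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # last item = head (most recently used)

    def access(self, page_id):
        """Return True on a hit, False when the page had to be loaded."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # state 2: move to the head
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # state 3: evict the tail (Pm)
        self.pages[page_id] = True  # the new page (Px) goes to the head
        return False


pool = BasicLRU(capacity=8)
hot_pages = ["hot%d" % i for i in range(8)]
for page in hot_pages:
    pool.access(page)

# A scan touches 100 pages that no other query needs.
for i in range(100):
    pool.access("scan%d" % i)

surviving_hot = [page for page in hot_pages if page in pool.pages]
print(surviving_hot)  # prints: []
```

After the scan, none of the hot pages remain in the pool, so the next business query that needs them has to go back to disk.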

What happens if a full table scan runs at this point? Suppose the scan is over a 200 GB table of historical data that the business normally never accesses.

Under this algorithm, the scan evicts all the data currently in the BP and fills it with the data pages accessed during the scan. In other words, the BP ends up holding mostly data from the historical table.

That is bad for a database that is serving business traffic. You will see the BP hit rate drop sharply, disk pressure rise, and SQL statements respond more slowly.

Therefore, InnoDB cannot use the basic LRU algorithm as is. It uses an improved version.

Improved LRU algorithm

InnoDB divides the linked list into a New area and an Old area in a 5:3 ratio. In the figure, LRU_old points to the first position of the Old area, which is 5/8 of the way through the entire list. That is, the 5/8 nearest the head of the list is the New area, and the 3/8 nearest the tail is the Old area.

The execution flow of the improved LRU algorithm:

State 1: P3 is to be accessed. Because P3 is in the New area, it is moved to the head of the list just as in the unoptimized LRU, which gives state 2.

Next, a new data page that is not in the current list is accessed. Data page Pm is still evicted, but the newly inserted data page Px is placed at "LRU_old".

For a data page in the Old area, the following check is made every time it is accessed:

If the data page has been in the LRU list for more than 1 second, it is moved to the head of the list.

If the data page has been in the LRU list for less than 1 second, its position remains unchanged. The 1-second threshold is controlled by the parameter "innodb_old_blocks_time". The default value is 1000, in milliseconds.

This strategy is tailored to operations like full table scans. Take the scan of the 200 GB historical data table again:

1. During the scan, data pages that need to be newly inserted are placed in the Old area.

2. A data page contains multiple records, so the page is accessed many times. But because the scan is sequential, the interval between the first and last access to the page does not exceed 1 second, so the page stays in the Old area.

3. As the scan moves on to later data, the earlier data page is never accessed again, so it has no chance to move to the head of the list (the New area) and is soon evicted.

As you can see, the biggest benefit of this strategy is that although the scan of the large table still uses the BP, it has no impact at all on the New (young) area, which preserves the Buffer Pool's hit rate for normal business queries.
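The improved algorithm can be modelled in the same spirit. The sketch below is a simplification under stated assumptions, not InnoDB's implementation: `MidpointLRU` is a hypothetical class, the pool has only 8 pages (5 New, 3 Old), timestamps are passed in by hand instead of read from a clock, and the promotion rule follows the description in this article (time spent in the list compared with a 1-second threshold).

```python
from collections import deque


class MidpointLRU:
    """LRU list split into a New area (5/8) and an Old area (3/8)."""

    def __init__(self, capacity, old_blocks_time=1.0):
        self.capacity = capacity
        self.new_capacity = capacity * 5 // 8  # size of the New area
        self.old_blocks_time = old_blocks_time  # seconds, as in the text
        self.new = deque()  # index 0 = head of the whole list
        self.old = deque()  # index 0 = LRU_old, last item = tail
        self.loaded_at = {}  # page id -> time the page entered the list

    def access(self, page_id, now):
        """Return True on a hit, False when the page had to be loaded."""
        if page_id in self.loaded_at:
            if page_id in self.new:
                # New-area hit: move to the head, as in the basic LRU.
                self.new.remove(page_id)
                self.new.appendleft(page_id)
            elif now - self.loaded_at[page_id] > self.old_blocks_time:
                # Old-area hit after more than the threshold: promote.
                self.old.remove(page_id)
                self.new.appendleft(page_id)
                if len(self.new) > self.new_capacity:
                    # Keep the 5:3 split by demoting the New area's tail.
                    self.old.appendleft(self.new.pop())
            # Old-area hit within the threshold: position unchanged.
            return True
        if len(self.loaded_at) >= self.capacity:
            victim = self.old.pop()  # evict from the tail of the list
            del self.loaded_at[victim]
        self.old.appendleft(page_id)  # insert at LRU_old, not at the head
        self.loaded_at[page_id] = now
        return False


pool = MidpointLRU(capacity=8)
hot_pages = ["hot%d" % i for i in range(8)]
for page in hot_pages:
    pool.access(page, now=0.0)  # initial load
for page in hot_pages:
    pool.access(page, now=2.0)  # re-accessed after more than 1 second

clock = 3.0
for i in range(100):
    page = "scan%d" % i
    pool.access(page, now=clock)  # first row on the page
    pool.access(page, now=clock + 0.001)  # later rows, well within 1 second
    clock += 0.01

surviving_hot = [page for page in hot_pages if page in pool.loaded_at]
print(surviving_hot)
```

In this run, the five hot pages sitting in the New area survive the 100-page scan, and only the pages that were in the Old area are displaced. Running the same workload through a plain LRU would evict every hot page.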
