2025-02-24 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
You may not have looked closely at how MySQL JDBC StreamResult communication works, so today I will give you a brief walkthrough. If you are interested, read on; I believe you will gain something from understanding the principle behind it.
[Let's review the simple communication first]:
JDBC communicates with the database over a Socket, so we can treat the database as a SocketServer. When the server returns data (such as a SQL result set), the flow is: server program data (database) -> kernel socket buffer -> network -> client kernel socket buffer -> client program (the JVM memory where JDBC lives).
Every JDBC driver you have seen in the industry works this way: MySQL JDBC, SQL Server JDBC, PG JDBC, Oracle JDBC. Even NoSQL clients such as the Redis, MongoDB, and Memcached clients return data with essentially the same logic.
[Why does reading data directly with the default MySQL JDBC settings hang?]
(1) The MySQL server writes the entire SQL result set out through an OutputStream, that is, it writes the data into the local kernel's socket buffer. This is one memory copy (memory copies are not the focus of this article).
(2) Once the kernel's buffer has data, the kernel sends it back over the TCP link (the Socket connection initiated by JDBC). The data arrives at the machine where JDBC runs and first lands in that machine's kernel buffer.
(3) After JDBC issues the SQL, the Java code blocks on inputStream.read(). When data arrives in the buffer, the thread wakes up and reads the buffered data into Java memory. This is JDBC's memory copy.
(4) MySQL JDBC keeps reading buffer data into Java memory while the MySQL server keeps sending. Note that until the data is fully assembled, the SQL call made by the client does not return. It feels as if the MySQL server has not responded, when in fact the data has long been arriving locally; JDBC simply has not handed the first row of the result set back to the caller of execute(), because it is still busy draining the buffer.
(5) The key point is that the driver reads greedily: regardless of whether there is room "at home," it pulls the contents of the entire table into Java memory, triggering first a Full GC and then an out-of-memory error.
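As a concrete sketch of this default read path (the host, schema, credentials, and table name are placeholder assumptions, not values from the article), note that executeQuery() does not return until the driver has buffered the whole result set:

```java
// Sketch of the DEFAULT (non-streaming) read path.
// URL, credentials, and table name are hypothetical placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DefaultReadDemo {
    static final String URL = "jdbc:mysql://localhost:3306/test";

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection(URL, "user", "pass");
             Statement stmt = conn.createStatement();
             // executeQuery() blocks until the driver has read the ENTIRE
             // result set into JVM memory -- on a huge table this means
             // Full GC and then OutOfMemoryError before the first row is seen.
             ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // rows are served from the already-buffered data
            }
        } catch (SQLException e) {
            System.out.println("demo requires a reachable MySQL server: " + e.getMessage());
        }
    }
}
```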
[Setting the JDBC parameter useCursorFetch=true can solve this problem]
Combined with a FetchSize setting, this really can solve the problem. The idea is to tell the MySQL server how much data I want and how much to send per batch, so the communication process looks a bit like this:
It's like going to the supermarket and buying only what I need, as much as I need each time. However, unlike online shopping today, where goods are delivered to your door, every trip has to cross the network link, which costs network time. If the result set has 100 million rows and FetchSize is set to 1000, there will be 100,000 round trips. At 0.02 ms latency within the same machine room, 100,000 round trips add only 2 seconds, which is not much. But at 2 ms across machine rooms that becomes 200 seconds (3 minutes 20 seconds); at the 10~40 ms typical between cities in China, 1,000~4,000 seconds; and at 200~300 ms across countries, 20,000~30,000 seconds, that is, roughly five and a half to eight hours.
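The arithmetic above can be checked with a small helper (the row count, fetch size, and latencies are the article's example numbers):

```java
// Round-trip cost of cursor fetching: rows / fetchSize network trips,
// each paying one round-trip time (RTT).
public class RoundTripCost {
    /** Extra latency in seconds for fetching `rows` rows,
     *  `fetchSize` at a time, over a link with `rttMillis` RTT. */
    static double extraSeconds(long rows, int fetchSize, double rttMillis) {
        long roundTrips = (rows + fetchSize - 1) / fetchSize; // ceiling division
        return roundTrips * rttMillis / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(extraSeconds(100_000_000L, 1000, 0.02)); // same room: ~2 s
        System.out.println(extraSeconds(100_000_000L, 1000, 2));    // cross-DC: ~200 s
        System.out.println(extraSeconds(100_000_000L, 1000, 300));  // cross-country: ~30000 s
    }
}
```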
This calculation does not yet include the greatly increased number of system calls, the extra thread waits and context switches, or the effect of network packet retransmission on overall performance. So although the scheme looks reasonable, its performance is not great.
In addition, since MySQL does not know when the client will finish consuming the data, and the underlying table may receive DML writes in the meantime, MySQL needs to set up a temporary table space to hold the data to be fetched. So when you enable useCursorFetch to read a large table, you can observe several phenomena on the MySQL side:
(1) IOPS spikes because of the large volume of IO reads; on an ordinary hard disk this may cause jitter in business writes.
(2) Disk usage soars: this temporary space may be larger than the original table, and if the table takes up a considerable proportion of the whole instance, it may fill the database disk. MySQL reclaims the space after the result set has been fully read or when the client calls ResultSet.close().
(3) CPU and memory usage rise by some proportion, depending on the CPU's capability.
(4) After the client JDBC issues the SQL, it waits a long time for a response. During this time the server is preparing the data. This wait is different from the original no-parameter behavior: there, JDBC was continuously reading data from the network buffer while giving the business code no response; here, the MySQL database is preparing the temporary data space and gives JDBC no response at all.
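A connection configured for cursor fetching might look like the following sketch (the URL, credentials, and table name are hypothetical placeholders, not from the article):

```java
// Sketch of cursor-based fetching with useCursorFetch=true.
// Connection details are hypothetical placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CursorFetchDemo {
    static final String URL =
        "jdbc:mysql://localhost:3306/test?useCursorFetch=true";

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection(URL, "user", "pass");
             PreparedStatement ps = conn.prepareStatement("SELECT * FROM big_table")) {
            ps.setFetchSize(1000); // ask the server for 1000 rows per batch
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // each batch of 1000 rows costs one network round trip
                }
            }
        } catch (SQLException e) {
            System.out.println("demo requires a reachable MySQL server: " + e.getMessage());
        }
    }
}
```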
[Stream reading data]
We have seen that the first way can hang the Java process, and the second is inefficient, hits the MySQL database hard, and responds slowly to the client; it merely works around the problem. So now let's look at the Stream reading mode.
As mentioned earlier, when you use statement.setFetchSize(Integer.MIN_VALUE) or com.mysql.jdbc.StatementImpl.enableStreamingResults(), you enable Stream reading of the result set. The FetchSize cannot be set to anything else before executing, and the cursor must be FORWARD_ONLY.
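A minimal streaming-read sketch looks like this (again, the URL, credentials, and table name are hypothetical placeholders):

```java
// Sketch of STREAMING reads: FORWARD_ONLY cursor plus the magic
// fetch size Integer.MIN_VALUE. Connection details are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamReadDemo {
    static final String URL = "jdbc:mysql://localhost:3306/test";

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection(URL, "user", "pass");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // enables streaming in MySQL JDBC
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // rows are parsed packet by packet from the socket;
                    // the full table is never buffered in Java memory at once
                }
            }
        } catch (SQLException e) {
            System.out.println("demo requires a reachable MySQL server: " + e.getMessage());
        }
    }
}
```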
This way seems magical: memory no longer blows up, the response is faster, and the impact on MySQL is smaller, at least in that IOPS stays down and the extra disk usage is gone. I used to see only that JDBC takes a separate code path here and assumed it was a different communication protocol between MySQL and JDBC. It turns out to be purely "client behavior." Yes, you read that right: client behavior.
When enableStreamingResults() is called, almost no extra interaction with the server takes place; the server returns data exactly as in mode 1, pushing it out as fast as it can fill the buffers. So how does the client withstand the pressure?
In JDBC, when you turn on Stream result-set processing, the driver does not read all the data into Java memory at once. Unlike Figure 1, data is not pulled into the Java buffer in one shot; instead the driver reads one packet at a time (a packet can be understood as a byte[] array in Java), at most that much per read, then checks whether it needs to keep reading to ensure the data is complete. The business code parses the bytes into rows and consumes them.
The server starts stuffing data into the buffer as before, and that data also fills the client's kernel buffer. Once the buffers on both ends are full, the server's attempt to push more data over TCP finds the receiver's buffer still full, so the sending thread blocks and waits for the other side to consume. When the other side consumes a part, a bit more can be pushed. In other words, even if the JDBC Stream data is not consumed in time, once the buffered data fills up, the MySQL thread sending the data blocks, which keeps the two sides in balance (you can verify this with a pair of Java Sockets).
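Following the article's suggestion, here is a self-contained Java Socket experiment of that backpressure (pure java.net, no MySQL involved; buffer sizes and byte counts are arbitrary choices): the "server" tries to push about 80 MB while the "client" connects but never reads, and the sender stalls once the kernel buffers on both sides are full.

```java
// TCP backpressure demo: a sender blocks on write() once the
// receiver stops draining its socket. All sizes are arbitrary.
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

public class BackpressureDemo {
    static final long TARGET = 10_000L * 8 * 1024; // ~80 MB the sender wants to push

    public static long bytesWrittenWhileReceiverStalls() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            AtomicLong written = new AtomicLong();
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setSendBufferSize(16 * 1024); // keep the stall quick to reach
                    OutputStream out = s.getOutputStream();
                    byte[] chunk = new byte[8 * 1024];
                    for (int i = 0; i < 10_000; i++) {
                        out.write(chunk);           // blocks when both buffers are full
                        written.addAndGet(chunk.length);
                    }
                } catch (Exception ignored) { /* connection reset when client closes */ }
            });
            sender.start();
            Socket client = new Socket();
            client.setReceiveBufferSize(16 * 1024); // small advertised window
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()));
            Thread.sleep(1000);                     // client never reads a byte
            long stalled = written.get();
            client.close();                         // unblocks the sender's write
            sender.join(5000);
            return stalled;
        }
    }

    public static void main(String[] args) throws Exception {
        long n = bytesWrittenWhileReceiverStalls();
        System.out.println("sender blocked after " + n + " of " + TARGET + " bytes");
    }
}
```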
For the JDBC client, each fetch retrieves data from the local kernel buffer, just a short trip to the parcel locker in your own neighborhood, so naturally each round trip is much cheaper than going all the way to the supermarket. And because this data has been prepared in advance, there is no IO wait (unless the MySQL server cannot produce data as fast as the consumer processes it, which generally only happens in test code that does no business work and discards the data immediately). At this point it no longer matters whether you are across machine rooms, regions, or countries: once the server starts responding, data keeps flowing, and even the first method has to go through this same transfer anyway.
Compared with the first method, this one does not cause a memory overflow even when reading a large table, although the full read still takes a long time. However, it has a relatively large impact on the database: while the data is being streamed, the corresponding data rows are locked (to prevent modification). With InnoDB the affected segments are locked; with MyISAM the whole table is locked, which may block business operations.
[Theoretically, you can go further, as long as you are willing]
Theoretically this approach is already good, but for perfectionism's sake we can keep going. For lazy people, we don't even have the motivation to walk down to the parcel locker in the neighborhood. What we really want is for someone to carry it home and feed it to us, mouth already open.
Technically this can indeed be done, because JDBC spends time copying data from the kernel into Java memory. If someone else did that copy while I was busy with other things at home, the item would be waiting the moment I wanted it. Strictly speaking the total time is not saved; but for you, it is indeed saved. The question is: who does the delivering?
A thread would have to be added to the program to do this: copy the kernel data into application memory, perhaps even parse it into rows for the application to use directly. But is this perfect? In fact there is a coordination problem in the middle. Suppose you are cooking at home and are missing one packet of seasoning. You could buy it downstairs, but you have asked someone to deliver it. All the other ingredients are already in the pot, so you have no choice but to wait for that packet to arrive before the next step. So the ideal result is: a lot of memory-copy wait time is saved, at the cost of some coordination (locking) overhead.
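A generic producer-consumer sketch of this idea (not Connector/J internals; class and method names are made up for illustration): a helper thread "delivers" parsed rows into a bounded queue, and the consumer still has to wait whenever the exact row it needs has not arrived yet.

```java
// Prefetch-thread sketch: the bounded BlockingQueue is the
// "coordination" point the article describes. Names are hypothetical.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PrefetchDemo {
    public static List<String> run(int rows) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < rows; i++) {
                    queue.put("row-" + i); // stand-in for "copy + parse one row"
                }
            } catch (InterruptedException ignored) { }
        });
        reader.start();
        List<String> consumed = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            consumed.add(queue.take()); // blocks only if the row isn't "home" yet
        }
        reader.join();
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100).size());
    }
}
```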
[Can I read data directly from the kernel buffer?]
Theoretically, yes. Before explaining how, let's first look at what other memory copies exist besides this one:
After JDBC reads the bytes from the kernel buffer, it parses them further into structured data, because a ResultSet must hand structured rows back to the business code; so the data that produces the RowData must be copied once. And when JDBC returns object-typed data such as a byte[] array, some implementations do not want you to mutate the ResultSet's internal content through the returned value (e.g. byte[1] = 0xFF), so they may make yet another memory copy. Then in the business code itself there is string concatenation, network output, and so on, a whole pile of memory copies that are unavoidable at the business level. Compared with those, this one small copy is insignificant, so it is not optimized away; on the whole it barely matters, unless your program's bottleneck happens to be exactly here.
So overall, memory copies are unavoidable; one more is just another system-level call with slightly larger overhead. Technically we could read the data directly from kernel space, but then the Buffer's data would have to be handed out as raw bytes so that more remote data can keep flowing in, with no third place to stage the Buffer; otherwise we are back to copying from kernel to application memory.
Relatively speaking, the server could optimize by transferring data through direct IO (but this requires the wire protocol to match the on-disk data format, so it is obviously only theoretical). To truly implement a custom protocol that sends data directly from kernel space, you would have to modify the OS-level file-system protocol to achieve that kind of conversion.
That concludes this look at the MySQL JDBC StreamResult communication principle.