Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to query tens of millions of data in MyBatis

2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

In this issue, the editor will bring you about how to query tens of millions of data in MyBatis. The article is rich in content and analyzes and narrates it from a professional point of view. I hope you can get something after reading this article.

Regular query

By default, the complete set of search results is stored in memory. In most cases, this is the most efficient mode of operation and is easier to implement because of the design of the MySQL network protocol

Assuming that the amount of data in a single table is 500w, no one will load it into memory at once, and it will generally be paged.

@ SneakyThrows @ Override public void pageQuery () {@ Cleanup Connection conn = dataSource.getConnection (); @ Cleanup Statement stmt = conn.createStatement (); long start = System.currentTimeMillis (); long offset = 0; int size = 100; while (true) {String sql = String.format ("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE LIMIT% s,% s", offset, size); @ Cleanup ResultSet rs = stmt.executeQuery (sql) Long count = loopResultSet (rs); if (count = = 0) break; offset + = size;} log.info Paging query time:: {} ", System.currentTimeMillis ()-start);}

The above method is relatively simple, but without considering the LIMIT deep paging optimization, the online database server is cold, or you can wait a few days to retrieve the data.

Streaming query

If you are using ResultSet with a large number of data rows and cannot allocate the required memory heap space in JVM, you can tell the driver to return one row from the result stream

One thing to note about streaming queries is that all rows in the result set must be read (or closed) before any other query can be issued against the connection, otherwise an exception will be thrown

With streaming queries, concurrent access to the table referenced by the statement that produces the result set must be maintained as soon as possible because its query will monopolize the join

SneakyThrows public void streamQuery () {@ Cleanup Connection conn = dataSource.getConnection (); @ Cleanup Statement stmt = conn.createStatement (ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); stmt.setFetchSize (Integer.MIN_VALUE); long start = System.currentTimeMillis (); @ Cleanup ResultSet rs = stmt.executeQuery ("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE"); loopResultSet (rs); log.info ("? Streaming query time:: {} ", (System.currentTimeMillis ()-start) / 1000);}

Flow query database table data 500w single call time consumption: ≈ 6s

Cursor query

In SpringBoot 2.x, the default connection pool is HikariPool and the connection object is HikariProxyConnection, so the following cursor setting method is not feasible.

((JDBC4Connection) conn) .setUseCursorFetch (true)

Need to concatenate & useCursorFetch=true in the database connection information. Second, set the number of data Statement reads at a time, such as 1000 at a time.

SneakyThrows public void cursorQuery () {@ Cleanup Connection conn = dataSource.getConnection (); @ Cleanup Statement stmt = conn.createStatement (ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); stmt.setFetchSize (1000); long start = System.currentTimeMillis (); @ Cleanup ResultSet rs = stmt.executeQuery ("SELECT COLUMN_A, COLUMN_B, COLUMN_C FROM YOU_TABLE"); loopResultSet (rs); log.info ("? Cursor query time:: {} ", (System.currentTimeMillis ()-start) / 1000);}

Cursor query database table data 500w single call time consumption: ≈ 18s

JDBC RowData

The method loopResultSet is used above, but there is only a while loop inside the method. The core point of regular, streaming and cursor queries lies in the next method.

@ SneakyThrows private Long loopResultSet (ResultSet rs) {while (rs.next ()) {/ / Business Operation} return xx;}

The logic of ResultSet.next () is to implement the class ResultSetImpl to get the next row of data from RowData every time. RowData is an interface, and the implementation diagram is as follows

By default, ResultSet uses RowDataStatic instances. When generating a RowDataStatic object, it reads all the records in ResultSet into memory, and then reads them from memory one by one through next ().

The call to RowDataCursor is batch, and then cached internally. The process is as follows:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

First of all, I will check whether there is any data in my internal buffer that has not been returned, and if so, return the next line.

If all are read, a new request is triggered to MySQL Server to read the number of fetchSize results.

And buffer the returned result into the internal buffer, and then return the first row of data.

When streaming is used, ResultSet uses the RowDataDynamic object, which next () initiates each call to IO to read a single row of data.

To sum up, the default RowDataStatic reads all the data into the client memory, that is, our JVM;RowDataCursor reads fetchSize rows at a time, consumes and then initiates a request call; RowDataDynamic reads one piece of data per IO call

JDBC communication principle

General query

The interaction between JDBC and MySQL server is done through Socket. Corresponding to network programming, MySQL can be regarded as a SocketServer, so a complete request link should be:

JDBC client-> client Socket-> MySQL-> retrieve data return-> MySQL kernel Socket buffer-> Network-> client Socket Buffer-> JDBC client

When querying a large amount of data, the JVM may be cool for the following reasons:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

MySQL Server writes the retrieved SQL result set to the corresponding Socket Buffer of the kernel through the output stream.

The kernel buffer returns data through the TCP link initiated by JDBC. In this case, the data will first enter the kernel buffer where the JDBC client resides.

After JDBC initiates the SQL operation, the program will be blocked on the read operation of the input stream. When there is data in the buffer, the program will be awakened to read the buffer data into JVM memory.

MySQL Server will continue to send data, and JDBC will continue to read buffer data into Java memory. Although the data has arrived locally in the program where JDBC belongs, JDBC has not responded to the execute method call, because it will not be returned until the corresponding data is read.

The drawback is obvious. If you query too much data, you will continue to experience GC, followed by memory overflow.

Cursor query

As we know from the above, cursors can solve the memory overflow problem of ordinary queries with large amounts of data, but

Has the partner ever thought about the question that MySQL does not know when the client program consumption is completed, and what should be done if another connection causes DML writes to the table?

In fact, when we use cursor queries, MySQL needs to create a temporary space to store the data that needs to be read, so it will not conflict with DML write operations.

However, cursor queries can cause the following phenomena:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

IOPS soars because the data to be returned needs to be written to temporary space, and there are a large number of IO reads and writes, which may cause write jitter in other businesses.

Disk space soars because the data written to the temporary space is outside the original table, and if the table data is too large, it may cause the database disk to be full in extreme cases, and there is no change in network output. The data written to temporary space will be reclaimed by MySQL when the read is completed or when the client initiates a ResultSet#close operation.

The client JDBC initiates a SQL query and may wait for a long time for the SQL response, which is the time to prepare the data phase for the server. However, the ordinary query wait time is inconsistent with the cursor query wait time in principle, the former is consistent in reading the data of the network buffer and does not respond to the business level, while the latter is that MySQL is preparing the temporary data space and does not respond to JDBC

After the data preparation is completed, when it comes to the data transmission stage, the network response begins to soar, and the IOPS changes from "read and write" to "read".

The communication efficiency of using cursor query is relatively low, because the client needs to initiate a request to the server after consuming fetchSize rows of data, and the IOPS will be very high in the pre-preparation phase of the database, taking up a lot of disk space and performance.

Streaming query

When the client establishes a connection with the MySQL Server side and interactively queries, the MySQL Server will return the SQL result set to the output through the output stream, that is, write data to the Socket Buffer corresponding to the local kernel, and then return the data from the kernel to the server kernel buffer corresponding to the JDBC through the TCP link.

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

JDBC reads kernel buffer data through the input stream read method. Because streaming reading is enabled, there is only one piece of data received by the business program at a time.

The MySQL server will continuously send data to the client kernel represented by JDBC until the client requests that the Socket buffer be full, and then the MySQL server will block

For the JDBC client, each time the data is read from the kernel buffer of the machine, so the performance will be faster, so you don't have to worry about no data consumption in the local kernel (unless the data passed by the MySQL server does not do any business logic on the client, and the data is given up directly, the consumption of the client will be ahead of the server.)

It seems that streaming is better than cursors, but things are often not as simple as they seem

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

Compared with the cursor query, the impact of streaming on the database is longer.

In addition, streaming query depends on the network, which leads to the possibility of network congestion.

Streaming Vernier memory Analysis

Table data quantity: 500W

Memory View tool: JDK comes with Jvisualvm

Set the JVM parameter:-Xmx512m-Xms512m

Single call memory usage

The streaming query memory performance report is as follows

Figure 1 data is for reference only

The memory performance report of cursor query is as follows

Figure 2 data is for reference only

According to the memory consumption, both cursor query and streaming query can prevent OOM very well.

Concurrent invocation memory usage

Concurrent calls: Jmete calls concurrently with 10 threads in 1 second

The streaming query memory performance report is as follows

Figure 3 data is for reference only

Concurrent calls are also very OK for memory footprint, and there is no superimposed increase.

Average time consumption of concurrent calls for streaming queries: ≈ 55s

The memory performance report of cursor query is as follows

Figure 4 data is for reference only

Average consumption of concurrent call time of cursor query: ≈ 83s

Because equipment restrictions and some cases only occur in extreme conditions, there is no production and testing multi-environment verification, and partners can test themselves if they are interested.

How to use streaming query in MyBatis

The above describes how to use JDBC native API for queries, and the ORM framework Mybatis also encapsulates streaming queries

The ResultHandler interface contains only handleResult methods, and you can get the converted Java entity class.

@ Slf4j @ Service public class MyBatisStreamService {@ Resource private MyBatisStreamMapper myBatisStreamMapper; public void mybatisStreamQuery () {long start = System.currentTimeMillis (); myBatisStreamMapper.mybatisStreamQuery (new ResultHandler () {@ Override public void handleResult (ResultContext)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report