How can batch SQL inserts be optimized? This article analyzes the problem in detail and walks through several solutions, in the hope of helping readers facing it find a simpler, workable approach.
At work we sometimes need to persist a large amount of data to the database. When the volume is large, inserting the records one by one is very slow, and since a plain INSERT leaves little room for SQL-level tuning, the optimization has to come from the program code. This article therefore tries the insert operation in several different ways to see how far the total execution time can be reduced.
Taking the insertion of 1,000 records as an example, first prepare the test data:
private List<Order> prepareData() {
    List<Order> orderList = new ArrayList<>();
    for (int i = 1; i <= 1000; i++) {
        Order order = new Order();
        // populate the order fields (omitted here)
        orderList.add(order);
    }
    return orderList;
}

Instead of inserting these records one at a time, MyBatis-Plus offers a batch insert through saveBatch, which passes an insert consumer down to executeBatch (shown here as in MyBatis-Plus 3.4.x):

public boolean saveBatch(Collection<T> entityList, int batchSize) {
    String sqlStatement = getSqlStatement(SqlMethod.INSERT_ONE);
    return executeBatch(entityList, batchSize,
            (sqlSession, entity) -> sqlSession.insert(sqlStatement, entity));
}
saveBatch in turn calls the executeBatch method:
protected <E> boolean executeBatch(Collection<E> list, int batchSize, BiConsumer<SqlSession, E> consumer) {
    Assert.isFalse(batchSize < 1, "batchSize must not be less than one");
    return !CollectionUtils.isEmpty(list) && executeBatch(sqlSession -> {
        int size = list.size();
        int i = 1;
        for (E element : list) {
            consumer.accept(sqlSession, element);
            if ((i % batchSize == 0) || i == size) {
                sqlSession.flushStatements();
            }
            i++;
        }
    });
}
Inside the for loop, consumer.accept executes the insert on the sqlSession; at this stage the SQL statements are only being assembled. The batched statements are sent to the database by flushStatements(), once every batchSize records and again at the end of the list, so for our 1,000 records with the default batch size the flush happens only once, after the loop completes. In other words, where we previously sent 1,000 requests to the database server, the batch insert needs only one. If an exception is thrown, the operation is rolled back and no data is written. However, although the number of database round trips drops sharply, the reduction in total execution time is not dramatic.
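For context, here is a minimal sketch (not from the original) contrasting the one-record-at-a-time baseline with the batch call measured above; the orderMapper field matches the later examples, while the orderService name (an IService/ServiceImpl for Order) and the batch size of 1000 are assumptions:

public void insertOneByOne() {
    List<Order> orderList = prepareData();
    long startTime = System.currentTimeMillis();
    // 1,000 separate requests to the database server
    orderList.forEach(order -> orderMapper.insert(order));
    System.out.println("Total time: " + (System.currentTimeMillis() - startTime) / 1000.0 + "s");
}

public void saveBatchInsert() {
    List<Order> orderList = prepareData();
    long startTime = System.currentTimeMillis();
    // statements are assembled locally and flushed to the server in one batch
    orderService.saveBatch(orderList, 1000);
    System.out.println("Total time: " + (System.currentTimeMillis() - startTime) / 1000.0 + "s");
}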
Parallel streams
Stream, introduced in Java 8, is a key abstraction for working with collections and supports operations such as searching, filtering, and mapping. A parallel stream (parallelStream) goes further: it splits the data into multiple chunks and processes each chunk on a separate thread. Since the records in a bulk insert have no dependencies on one another, the work can be split up and inserted with a parallel stream. The test code is as follows:
public void stream() {
    List<Order> orderList = prepareData();
    long startTime = System.currentTimeMillis();
    orderList.parallelStream().forEach(order -> orderMapper.insert(order));
    System.out.println("Total time: " + (System.currentTimeMillis() - startTime) / 1000.0 + "s");
}
Again, run the test code above first:
It is noticeably faster than before, because parallel streams are built on the Fork/Join framework, that is, on the divide-and-conquer idea: the task is split into smaller pieces, executed on different threads, and the results are then combined (readers unfamiliar with Fork/Join can review the earlier article on merge sort and divide and conquer, which covers its basic use). Parallel streams are backed by the ForkJoinPool thread pool, and as the default constructor of ForkJoinPool shows, its default number of threads equals the number of logical processors on the machine:
public ForkJoinPool() {
    this(Math.min(MAX_CAP, Runtime.getRuntime().availableProcessors()),
         defaultForkJoinWorkerThreadFactory, null, false);
}
In other words, if the server has 8 logical cores, eight threads will be performing inserts at the same time, which greatly reduces execution time. To further improve parallelism and throughput, ForkJoinPool also uses a work-stealing mechanism, which shortens the execution time even more.
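As a quick sanity check on a given machine, the following small sketch (not from the article) prints the processor count and the parallelism of the common pool that parallelStream() actually uses; note that the common pool typically defaults to availableProcessors() - 1 workers, with the calling thread also pitching in:

import java.util.concurrent.ForkJoinPool;

public class ParallelismCheck {
    public static void main(String[] args) {
        // logical processors visible to the JVM
        System.out.println("Processors: " + Runtime.getRuntime().availableProcessors());
        // parallelism of the common ForkJoinPool backing parallel streams
        System.out.println("Common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
    }
}

The common pool's parallelism can be raised globally with the JVM flag -Djava.util.concurrent.ForkJoinPool.common.parallelism=N if needed.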
Fork/Join
In a parallel stream the number of threads in the ForkJoinPool is fixed, so can execution be made even faster by changing the pool size ourselves? In general, setting the thread count equal to the number of processors is the right call, because creating too many threads makes frequent context switching cost extra time and increases the overall execution time. But a batch SQL insert has no complex business logic; it just interacts with the database frequently, so it is an I/O-intensive rather than a CPU-intensive operation. For I/O-intensive work the program spends a lot of time waiting, which leaves the CPU under-utilized. So let's try increasing the number of threads and see whether the execution time can be shortened further.
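One commonly used trick, not the approach this article takes, is to run the parallel stream inside a custom ForkJoinPool so that its tasks execute on that pool's threads rather than the common pool; the 16-thread size and method name below are assumptions, and the behavior, while widely relied on, is not formally specified:

public void streamInCustomPool() throws Exception {
    List<Order> orderList = prepareData();
    // a larger pool than the default; tune to your hardware and database
    ForkJoinPool customPool = new ForkJoinPool(16);
    try {
        customPool.submit(() ->
                orderList.parallelStream().forEach(order -> orderMapper.insert(order))
        ).get(); // wait for all inserts to finish
    } finally {
        customPool.shutdown();
    }
}

The article instead defines an explicit Fork/Join task, described next.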
Define the insert task. Since it does not need to return a result, it extends the RecursiveAction parent class directly. size, passed in through the constructor, is the number of records each task should handle; if a task holds more records than size, it keeps splitting until the chunks are small enough:
public class BatchInsertTask<E> extends RecursiveAction {

    private List<E> list;
    private BaseMapper<E> mapper;
    private int size;

    public BatchInsertTask(List<E> list, BaseMapper<E> mapper, int size) {
        this.list = list;
        this.mapper = mapper;
        this.size = size;
    }

    @Override
    protected void compute() {
        if (list.size() <= size) {
            list.forEach(item -> mapper.insert(item));
        } else {
            int middle = list.size() / 2;
            List<E> left = list.subList(0, middle);
            List<E> right = list.subList(middle, list.size());
            BatchInsertTask<E> leftTask = new BatchInsertTask<>(left, mapper, size);
            BatchInsertTask<E> rightTask = new BatchInsertTask<>(right, mapper, size);
            invokeAll(leftTask, rightTask);
        }
    }
}
Use a ForkJoinPool to run the task defined above. The pool's thread count is set to twice the number of CPU threads, and the inserts are divided evenly among the threads' work queues:
public class BatchSqlUtil {

    public static <E> void runSave(List<E> list, BaseMapper<E> mapper) {
        int processors = getProcessors();
        ForkJoinPool forkJoinPool = new ForkJoinPool(processors);
        int size = (int) Math.ceil((double) list.size() / processors);
        BatchInsertTask<E> task = new BatchInsertTask<>(list, mapper, size);
        forkJoinPool.invoke(task);
    }

    private static int getProcessors() {
        int processors = Runtime.getRuntime().availableProcessors();
        // twice the number of CPU threads, as described above
        return processors << 1;
    }
}
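A sketch of how the utility might be exercised from a timing test, mirroring the earlier stream() example (the forkJoin() method name is an assumption):

public void forkJoin() {
    List<Order> orderList = prepareData();
    long startTime = System.currentTimeMillis();
    // splits orderList across the ForkJoinPool sized by BatchSqlUtil and inserts in parallel
    BatchSqlUtil.runSave(orderList, orderMapper);
    System.out.println("Total time: " + (System.currentTimeMillis() - startTime) / 1000.0 + "s");
}

Whether the larger pool actually helps depends on how many concurrent connections and inserts the database itself can absorb, so the thread count is worth measuring rather than assuming.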