2025-01-22 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article looks at how to insert large volumes of data into a database quickly and safely. The method described is simple, fast, and practical.
Preface
Recently I needed to parse an order file. The file can contain up to 10 million records, each with about 20 comma-separated fields, and the data should be loaded into the database within half an hour if possible.
Approach
1. Estimate file size
Because the file holds around ten million records of about 20 fields each, its overall size can be estimated roughly. A simple way is to use FileWriter to write 10 million records to a file and check the resulting size. In testing, it came to about 1.5 GB.
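The same estimate can be made without writing the whole file, by measuring one sample line and multiplying. The following is a minimal sketch, assuming records resemble the 18-value sample line used for the test data in this article and a two-byte line separator; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class FileSizeEstimate {
    // Estimated file size in bytes: (bytes per line + separator bytes) * number of records.
    static long estimateBytes(String sampleLine, int separatorBytes, long recordCount) {
        long lineBytes = sampleLine.getBytes(StandardCharsets.UTF_8).length;
        return (lineBytes + separatorBytes) * recordCount;
    }

    public static void main(String[] args) {
        // Build a sample line like "vaule1,vaule2,...,vaule18" (spelling as in the test data).
        StringBuilder line = new StringBuilder();
        for (int i = 1; i <= 18; i++) {
            if (i > 1) {
                line.append(',');
            }
            line.append("vaule").append(i);
        }
        long bytes = estimateBytes(line.toString(), 2, 10_000_000L);
        System.out.println(bytes + " bytes");
    }
}
```

Real order data will have different field lengths, so the result is only a rough order-of-magnitude check against the measured file size.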
2. How to insert in bulk
As shown above, the file is fairly large, so reading it into memory in one go is not feasible. Instead, read one portion of the order file at a time and insert that portion as a batch. Batches can be inserted with the multi-row form insert (...) values (...), (...), which testing showed to be quite efficient.
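To make the multi-row form concrete, here is a small sketch that builds such a statement with placeholders for use with a prepared statement. The class name, method name, and the column subset are illustrative assumptions, not the article's actual mapper.

```java
import java.util.Collections;

final class MultiRowInsertSql {
    // Builds: INSERT INTO t (c1, c2) VALUES (?, ?), (?, ?) ... with one tuple per row.
    static String build(String table, String[] columns, int rowCount) {
        if (columns.length == 0 || rowCount <= 0) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" tuple with a placeholder per column.
        String tuple = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, tuple));
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO file_order (file_id, field1, field2) VALUES (?, ?, ?), (?, ?, ?)
        System.out.println(build("file_order", new String[] {"file_id", "field1", "field2"}, 2));
    }
}
```

Sending many rows in one statement saves a network round trip and a statement parse per row, which is the main reason this form is faster than row-by-row inserts.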
3. Integrity of data
When splitting the file into portions, the integrity of the data must be preserved. Every record ends with a newline character. Using that marker, make sure each portion contains only whole records, so that no record is cut in half.
4. Does the database support batch data?
Because data is inserted in batches, the database must be able to accept large writes. With MySQL, which is used here, the max_allowed_packet setting controls how much data a single batch statement may carry.
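One way to reason about this limit: a multi-row insert has to fit inside max_allowed_packet, so the number of rows per statement is bounded by the limit divided by the bytes per row. The sketch below is a rough calculation only; the header and per-row byte figures are assumptions to be measured on real data, and the names are hypothetical.

```java
final class PacketBudget {
    // Upper bound on rows per multi-row INSERT so the statement stays under max_allowed_packet.
    // headerBytes covers the "INSERT INTO ... VALUES " prefix; perRowBytes is one "(...)," tuple.
    static long maxRowsPerStatement(long maxAllowedPacket, long headerBytes, long perRowBytes) {
        if (perRowBytes <= 0) {
            throw new IllegalArgumentException("perRowBytes must be positive");
        }
        long budget = maxAllowedPacket - headerBytes;
        return budget <= 0 ? 0 : budget / perRowBytes;
    }

    public static void main(String[] args) {
        // 1048576 bytes is a 1 MB limit; 256 and 200 are assumed sizes for illustration.
        System.out.println(maxRowsPerStatement(1_048_576L, 256L, 200L));
    }
}
```

If the chunk read from the file is larger than this budget allows, either raise max_allowed_packet or read smaller chunks.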
5. Handling errors midway
Because this is a large file, an error may occur partway through, for example the database connection may fail when about 9 million records have been inserted. Re-inserting everything from the start is not acceptable, so the position reached by each batch must be recorded, and the batch insert and the position update must be in the same transaction. After recovery, insertion can then continue from the recorded position.
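The resume step can be sketched as follows: seek to the recorded byte offset, read whole lines from there, and return the new offset to record. This is a simplified illustration using RandomAccessFile, not the article's implementation; the class and method names are hypothetical, and how the offset is stored is left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ResumeReader {
    // Reads up to maxLines lines starting at byte offset `position`.
    // newPosition[0] receives the offset to record for the next run.
    static List<String> readFrom(String path, long position, int maxLines, long[] newPosition)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(position);
            String raw;
            while (lines.size() < maxLines && (raw = file.readLine()) != null) {
                // readLine() maps each byte to one char; re-decode the bytes as UTF-8.
                lines.add(new String(raw.getBytes(StandardCharsets.ISO_8859_1),
                        StandardCharsets.UTF_8));
            }
            newPosition[0] = file.getFilePointer();
        }
        return lines;
    }
}
```

RandomAccessFile.readLine is unbuffered and therefore slow on large files; it is used here only to keep the example short. The important point is that the offset returned is the one that should be committed together with the inserted rows.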
Implementation
1. Prepare the data tables
Two tables are needed: a table that records the parsing status and position of each file (file_analysis), and the order table (file_order).
    CREATE TABLE `file_analysis` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `file_type` varchar(255) NOT NULL COMMENT 'file type 01: type 1, 02: type 2',
      `file_name` varchar(255) NOT NULL COMMENT 'file name',
      `file_path` varchar(255) NOT NULL COMMENT 'file path',
      `status` varchar(255) NOT NULL COMMENT 'file status 0: initialized, 1: succeeded, 2: failed, 3: processing',
      `position` bigint(20) NOT NULL COMMENT 'position where the last processing was completed',
      -- crt_time is an assumed name; the source lists `upd_time` for both time columns
      `crt_time` datetime NOT NULL COMMENT 'creation time',
      `upd_time` datetime NOT NULL COMMENT 'update time',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

    CREATE TABLE `file_order` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `file_id` bigint(20) DEFAULT NULL,
      `field1` varchar(255) DEFAULT NULL,
      `field2` varchar(255) DEFAULT NULL,
      `field3` varchar(255) DEFAULT NULL,
      `field4` varchar(255) DEFAULT NULL,
      `field5` varchar(255) DEFAULT NULL,
      `field6` varchar(255) DEFAULT NULL,
      `field7` varchar(255) DEFAULT NULL,
      `field8` varchar(255) DEFAULT NULL,
      `field9` varchar(255) DEFAULT NULL,
      `field10` varchar(255) DEFAULT NULL,
      `field11` varchar(255) DEFAULT NULL,
      `field12` varchar(255) DEFAULT NULL,
      `field13` varchar(255) DEFAULT NULL,
      `field14` varchar(255) DEFAULT NULL,
      `field15` varchar(255) DEFAULT NULL,
      `field16` varchar(255) DEFAULT NULL,
      `field17` varchar(255) DEFAULT NULL,
      `field18` varchar(255) DEFAULT NULL,
      -- crt_time is an assumed name; the source lists `upd_time` for both time columns
      `crt_time` datetime NOT NULL COMMENT 'creation time',
      `upd_time` datetime NOT NULL COMMENT 'update time',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=10000024 DEFAULT CHARSET=utf8;

2. Configure the database packet size

    mysql> show VARIABLES like '%max_allowed_packet%';
    +--------------------------+------------+
    | Variable_name            | Value      |
    +--------------------------+------------+
    | max_allowed_packet       | 1048576    |
    | slave_max_allowed_packet | 1073741824 |
    +--------------------------+------------+
    2 rows in set

    mysql> set global max_allowed_packet = 1024*1024*10;
    Query OK, 0 rows affected
Setting max_allowed_packet ensures that the database can accept a packet as large as one batch insert; otherwise, the following error occurs:
    Caused by: com.mysql.jdbc.PacketTooBigException: Packet for query is too large (4980577 > 1048576). You can change this value on the server by setting the max_allowed_packet' variable.
        at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3915)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2598)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)

3. Prepare test data

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    // The class name is illustrative; the source shows only the main method.
    public class PrepareTestData {
        public static void main(String[] args) throws IOException {
            // try-with-resources closes the writer even if a write fails
            try (FileWriter out = new FileWriter(new File("D://xxxxxxx//orders.txt"))) {
                // 10 million records, matching the volume described in the text
                for (int i = 0; i < 10000000; i++) {
                    out.write("vaule1,vaule2,vaule3,vaule4,vaule5,vaule6,vaule7,vaule8,vaule9,vaule10,vaule11,vaule12,vaule13,vaule14,vaule15,vaule16,vaule17,vaule18");
                    out.write(System.getProperty("line.separator")); // one record per line
                }
            }
        }
    }
Writing 10 million records to a file in a loop with FileWriter is quick. Don't forget to add a line separator (\n or \r\n) after each record.
4. Keep records intact when splitting
Besides the size of each chunk read from the file, one more parameter is needed: the size of a small buffer used to read a little data at a time and look for a newline character (\n or \r) in it. If no newline is found, keep reading and advancing until one is. This buffer should be roughly the size of one record. Part of the implementation is as follows:
    ByteBuffer byteBuffer = ByteBuffer.allocate(buffSize); // allocate a small scan buffer
    long endPosition = batchFileSize + startPosition - buffSize; // tentative end of the current chunk
    long startTime, endTime;
    for (int i = 0; i < count; i++) {
        startTime = System.currentTimeMillis();
        if (i + 1 != count) {
            int read = inputChannel.read(byteBuffer, endPosition); // read data
            readW: while (read != -1) {
                byteBuffer.flip(); // switch to read mode; limit is now the number of bytes read
                byte[] array = byteBuffer.array();
                // scan only the bytes actually read, not stale bytes left in the backing array
                for (int j = 0; j < byteBuffer.limit(); j++) {
                    byte b = array[j];
                    if (b == 10 || b == 13) { // found \n or \r
                        endPosition += j;
                        break readW;
                    }
                }
                endPosition += buffSize;
                byteBuffer.clear(); // reset the buffer position
                read = inputChannel.read(byteBuffer, endPosition);
            }
        } else {
            endPosition = fileSize; // the last chunk runs to the end of the file
        }
        // ... omitted; see the complete code on GitHub
    }
As the code shows, a buffer is allocated, sized at about 200 bytes based on the length of one row. The buffer is scanned for a newline character (\n or \r); when one is found, its offset within the buffer is added to the previous end position, so each chunk ends on a record boundary and the data stays intact.
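For readers who want to try the boundary search in isolation, here is a self-contained variant. It is a sketch rather than the article's code: it looks only for '\n' (so a \r\n pair stays together), returns the offset just past the newline, and opens the file itself. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkBoundary {
    // Returns the offset just past the first '\n' at or after tentativeEnd,
    // or the file size if no newline follows. Scans buffSize bytes at a time.
    static long alignToLineEnd(Path file, long tentativeEnd, int buffSize) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            ByteBuffer buffer = ByteBuffer.allocate(buffSize);
            long position = tentativeEnd;
            while (position < size) {
                buffer.clear();
                int read = channel.read(buffer, position); // positional read, channel state unchanged
                if (read <= 0) {
                    break;
                }
                for (int j = 0; j < read; j++) {
                    if (buffer.get(j) == '\n') {
                        return position + j + 1; // the next chunk starts right after the newline
                    }
                }
                position += read;
            }
            return size;
        }
    }
}
```

Calling alignToLineEnd(file, startPosition + batchFileSize, 200) would give the end of the current chunk and, at the same time, the start of the next one.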
5. Batch insert data
Data is inserted in batches with insert (...) values (...), (...). Part of the code is as follows:
    // Save the orders and the parse position in one transaction
    SqlSession session = sqlSessionFactory.openSession();
    try {
        long startTime = System.currentTimeMillis();
        FielAnalysisMapper fielAnalysisMapper = session.getMapper(FielAnalysisMapper.class);
        FileOrderMapper fileOrderMapper = session.getMapper(FileOrderMapper.class);
        fileOrderMapper.batchInsert(orderList);
        // Update the last parsed position and the update time
        fileAnalysis.setPosition(endPosition + 1);
        fileAnalysis.setStatus("3");
        fileAnalysis.setUpdTime(new Date());
        fielAnalysisMapper.updateFileAnalysis(fileAnalysis);
        session.commit();
        long endTime = System.currentTimeMillis();
        System.out.println("===insert data cost:" + (endTime - startTime) + "ms===");
    } catch (Exception e) {
        session.rollback();
    } finally {
        session.close();
    }
    // ... omitted; see the complete code on GitHub
As the code shows, the batch of order data and the file parsing position are saved in the same transaction. batchInsert uses the MyBatis foreach tag to iterate over the order list and generate the values rows.
Summary
The above shows only part of the code; the complete code is in the batchInsert module on GitHub. With each chunk set to 2 MB locally, inserting 10 million records (about 1.5 GB) into a MySQL database took about 20 minutes in testing. The chunk size can be adjusted, and the time taken will change accordingly.