This article explains in detail how to bulk import data into PostgreSQL. The content is shared by the editor as a reference; I hope you will have a good understanding of the relevant techniques after reading it.
Overview
When PostgreSQL is slow at bulk data imports, the main factors involved are: indexes, triggers, foreign keys, GUID primary keys, and possibly write-ahead logging (WAL). This article starts from these factors and shows how to optimize each of them. Of course, it is possible that none of the techniques mentioned here will fully solve a given problem; in that case, the specific situation has to be analyzed and handled on its own.
Turn off logging
For PostgreSQL 9.5 and later, you can first set the target table to UNLOGGED and then switch it back to LOGGED after the data has been loaded:
ALTER TABLE <target_table> SET UNLOGGED;
ALTER TABLE <target_table> SET LOGGED;
UNLOGGED mode greatly speeds up the import because PostgreSQL does not write changes to the table to the write-ahead log (WAL) while the data is being imported. However, because these writes are not logged, the data cannot be recovered if a crash or unclean server shutdown occurs during loading; any unlogged table is automatically truncated when PostgreSQL restarts after such a failure.
In addition, unlogged tables are not replicated to standby servers. In that case, existing replication has to be removed before loading and recreated after loading. Depending on the amount of data on the primary node and the number of standbys, rebuilding replication can take a long time, which may be unacceptable where high availability is required.
The following practices are recommended when bulk inserting data into unlogged tables:
Back up the table and its data before changing it to unlogged mode
After the data is loaded, recreate any replication to the standby servers
Use unlogged bulk inserts only for tables that can easily be repopulated (for example, large lookup tables or dimension tables)
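As a rough sketch of the whole workflow, assuming a hypothetical sales_staging table and file path (adjust the names to your own schema):
-- switch the target table to unlogged mode (PostgreSQL 9.5+)
ALTER TABLE sales_staging SET UNLOGGED;
-- bulk load the data (the COPY command is discussed later in this article)
COPY sales_staging FROM '/tmp/sales.csv' WITH (FORMAT csv, HEADER true);
-- switch the table back to logged mode so writes are WAL-logged again
ALTER TABLE sales_staging SET LOGGED;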
Delete index
Database indexes can cause severe delays during bulk data insertion, because every row that is added requires the corresponding index entries to be updated in real time.
It is recommended that you drop the indexes on the target table wherever possible before starting the bulk insert and recreate them after the import is complete. Creating an index on a large table is itself time-consuming, but it is still much faster than updating the indexes row by row during loading.
DROP INDEX <index_name1>, <index_name2>, ...;
CREATE INDEX <index_name> ON <target_table> (column1, ..., columnN);
It may help to temporarily increase the maintenance_work_mem configuration parameter before recreating the indexes; the additional working memory allows the indexes to be built faster.
Another, safer option is to make a copy of the target table in the same database with the existing data and indexes, benchmark the bulk import both with the indexes in place and with the indexes dropped, and then choose the faster approach based on the test results.
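A minimal sketch of the drop-and-recreate approach, assuming a hypothetical sales_staging table with an index on customer_id:
-- drop the index before the bulk load
DROP INDEX IF EXISTS idx_sales_staging_customer;
-- ... bulk load the data here ...
-- temporarily raise maintenance_work_mem for this session to speed up index creation
SET maintenance_work_mem = '1GB';
-- recreate the index after the import is complete
CREATE INDEX idx_sales_staging_customer ON sales_staging (customer_id);
-- return the session setting to its configured default
RESET maintenance_work_mem;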
Delete foreign key
Like indexes, foreign key constraints affect bulk import performance, because every foreign key in every inserted row must be checked against the corresponding primary key. During a bulk import, the trigger that performs this check fires for every row, which adds considerable overhead.
Unless restricted by business rules, it is recommended that you drop all foreign keys on the target table, load the data in a single transaction, and then recreate the foreign keys after the transaction is committed.
ALTER TABLE <target_table> DROP CONSTRAINT <foreign_key_constraint>;
BEGIN TRANSACTION;
-- bulk load the data
COMMIT;
ALTER TABLE <target_table> ADD CONSTRAINT <foreign_key_constraint>
    FOREIGN KEY (<column>) REFERENCES <parent_table> (<column>);
Similarly, increasing the maintenance_work_mem configuration parameter can also improve the performance of recreating foreign key constraints.
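For example, with a hypothetical orders table that references customers (all names are illustrative only):
ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;
BEGIN;
-- load the data in a single transaction
COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv);
COMMIT;
-- more working memory speeds up validation of the recreated constraint
SET maintenance_work_mem = '1GB';
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id);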
Pause trigger
INSERT or DELETE triggers (if the import process also deletes records from the target table) can delay bulk data imports, because the logic in each trigger has to be evaluated and completed immediately after every row is inserted or deleted.
It is recommended that you disable all triggers on the target table before the bulk import and re-enable them after the import is complete. Disabling ALL triggers also disables the system triggers that enforce foreign key constraint checks.
ALTER TABLE <target_table> DISABLE TRIGGER ALL;
ALTER TABLE <target_table> ENABLE TRIGGER ALL;
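As a brief usage sketch with a hypothetical sales_staging table and file path:
ALTER TABLE sales_staging DISABLE TRIGGER ALL;
-- bulk load while triggers are disabled
COPY sales_staging FROM '/tmp/sales.csv' WITH (FORMAT csv);
ALTER TABLE sales_staging ENABLE TRIGGER ALL;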
Use multi-valued INSERT
For bulk data loading, running thousands or hundreds of thousands of individual INSERT statements is a poor choice: the query optimizer must parse and prepare each INSERT separately, every statement goes through all of its constraint checks, and each runs as a separate transaction with its own logging. Using a single multi-valued INSERT statement avoids much of this overhead.
INSERT INTO <target_table> (column1, ..., columnN)
VALUES
    (value1, ..., valueN),
    (value1, ..., valueN),
    ...,
    (value1, ..., valueN);
The performance of a multi-valued INSERT is affected by existing indexes. It is recommended that you drop the indexes before running the command and recreate them afterwards.
Another thing to watch is the amount of memory PostgreSQL has available for running a multi-valued INSERT: all of the input values must fit in RAM, and unless enough memory is available the statement may fail.
It is recommended to set the effective_cache_size parameter to 50% of the machine's total memory and shared_buffers to 25%. To be safe, split the import into multiple multi-valued INSERT statements with no more than 1000 rows per statement.
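As a simple sketch, a hypothetical import into sales_staging could be split into batches of at most 1000 rows each:
-- first batch (up to 1000 rows per statement)
INSERT INTO sales_staging (sale_id, customer_id, amount)
VALUES
    (1, 101, 19.99),
    (2, 102, 5.50),
    (3, 103, 42.00);
-- second batch, and so on
INSERT INTO sales_staging (sale_id, customer_id, amount)
VALUES
    (1001, 104, 7.25),
    (1002, 105, 13.40);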
Use the COPY command
It is recommended that you use the PostgreSQL COPY command to import data from one or more files. COPY is specifically optimized for bulk data loading and is faster than running a large number of INSERT statements, or even multi-valued INSERTs.
COPY <target_table> [(column1, ..., columnN)] FROM '<file_name_and_path>' WITH (<option1>, ..., <optionN>);
There are many advantages to using COPY:
It supports text and binary file import
It is transactional in nature
It allows you to specify the structure of the input file
It can import data conditionally using the WHERE clause.
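A sketch of a typical invocation, again with hypothetical table, column, and file names (note that the WHERE clause on COPY ... FROM requires PostgreSQL 12 or later):
-- import a CSV file with a header row into selected columns, skipping non-positive amounts
COPY sales_staging (sale_id, customer_id, amount)
FROM '/tmp/sales.csv'
WITH (FORMAT csv, HEADER true)
WHERE amount > 0;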
Run ANALYZE
This does not improve the performance of the bulk import itself, but it is strongly recommended that you run the ANALYZE command on the target table immediately after the bulk import. A large number of newly imported rows significantly changes the data distribution in the table's columns and leaves the table statistics out of date. When the query optimizer works with stale statistics, query performance can be very poor. Running the ANALYZE command ensures that the statistics are brought up to date.
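For example, for a hypothetical sales_staging table:
-- refresh planner statistics for the newly loaded table
ANALYZE VERBOSE sales_staging;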
That covers how to bulk import data in PostgreSQL. I hope the content above is of some help to you and that you have learned something from it. If you found the article useful, feel free to share it so that more people can see it.