Contents
1 Paging queries
2 Lock concurrency control
3 Returning large result sets
4 Mass INSERT

1 Paging queries

Paging queries are very common in database applications. They generally require sorting; when no sort is needed, performance is usually not a problem, so the discussion below focuses on sorted paging. There are two common ways to write a paging query.

● The first, ROWNUM style:

    select *
      from (select a.*, rownum rn
              from (select * from tab1 u where status = 1 order by u.id) a
             where rownum <= 600)
     where rn > 500;

● The second, LIMIT style:

    select * from tab1 u where status = 1 order by u.id limit 500, 100;

Scenario analysis and recommendations:

● Scenario 1. Description: select * from tab1 u where status = 1 matches a lot of data, say more than 90% of the rows in the table. Recommendation: to avoid scanning and sorting the whole table, create an index on the sort field so the sort is avoided (index full scan). This performs especially well when most queries only visit the first page or the first few pages.

● Scenario 2. Description: select * from tab1 u where status = 1 matches very little data, say 5% of the rows in the table. Recommendation: use an index on the WHERE condition.

● Scenario 3. Description: pages are fetched one after another over a large table until all the data has been processed. Recommendation: ordinary paging SQL gets slower and slower as the offset grows, because rows that were already returned are re-scanned on every page. The better practice is keyset paging (a JDBC sketch follows this list): on each request, pass in the maximum ID returned by the previous page and query only for IDs greater than it:

    select a.*, rownum rn
      from (select * from tab1 u
             where status = 1 and id > :max_id
             order by u.id) a
     where rownum <= 500;

or

    select * from tab1
     where status = 1 and id > :max_id
     order by id
     limit 500;

Because each page starts after the previous page's maximum ID and walks the ID index, no sort is actually performed, so every page is efficient and the cost per page stays stable.

● Scenario 4. Description: the pager must show the total row count, and the data volume is large. Recommendation: counting the total of a large paged query performs very poorly. Cache the count, or do not show a total at all, e.g. only let users browse the first 10 pages.
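As a concrete illustration of scenario 3, the following is a minimal JDBC sketch of keyset paging. The table and columns (tab1, id, status) come from the examples above; the connection URL, credentials, and page size are illustrative assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class KeysetPaging {
        public static void main(String[] args) throws Exception {
            // Placeholder URL and credentials; substitute your own data source.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test", "user", "pass")) {
                long maxId = 0;               // max id seen so far; assumes positive ids
                final int pageSize = 500;
                while (true) {
                    // Seek past the last page instead of using an OFFSET,
                    // so already-processed rows are never re-scanned.
                    try (PreparedStatement ps = conn.prepareStatement(
                            "select id, status from tab1 "
                          + "where status = 1 and id > ? order by id limit ?")) {
                        ps.setLong(1, maxId);
                        ps.setInt(2, pageSize);
                        int rows = 0;
                        try (ResultSet rs = ps.executeQuery()) {
                            while (rs.next()) {
                                rows++;
                                maxId = rs.getLong("id"); // carry the page's max id forward
                                // ... process the row ...
                            }
                        }
                        if (rows < pageSize) {
                            break;            // a short page means the end was reached
                        }
                    }
                }
            }
        }
    }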
2 Lock concurrency control

Scenario analysis:

● Repeated insertion of data. Description: the application must check whether a row exists and insert it only if it does not. Under concurrency, if there is no lock control and no unique key on the table, duplicate rows can be inserted. For example, before subscribing a user to a service the application checks whether the subscription already exists, but (user ID, service ID) is not a unique or primary key. When two sessions check at the same time, both find that the subscription does not exist, and both then insert it, subscribing the user to the service twice. Solutions:

- Row lock: before checking whether the subscription exists, lock one row that belongs to that user, e.g. the user's row in the user information table. The row to be inserted cannot itself be locked, because it does not exist yet; and a shared, public row must not be used, because that would serialize all users. The best way to take the lock is SELECT ... FOR UPDATE, which is cheaper than an UPDATE. Drawbacks of this scheme: there must be a suitable row to lock, so it is not generic; and it only works inside a single transaction, because the lock is released at commit.

- Advisory lock: an advisory lock is a user-defined named lock. COMMIT/ROLLBACK does not release it; the user must release it explicitly, and it is released automatically if the session ends. Advisory locks solve this problem without locking any user data and work across transactions, so they are the recommended solution (a JDBC sketch follows below).

▪ GET_LOCK(name_expr [, timeout_expr]) acquires the named lock. Return values of GET_LOCK():
1: the lock was acquired successfully.
0: the lock could not be acquired (for example, the attempt timed out).
A lock acquired through GET_LOCK() can be released in two ways:
Explicitly, by calling RELEASE_LOCK().
Implicitly: locks held by a session are released automatically when the session ends, whether normally or abnormally.

▪ RELEASE_LOCK(name_expr) releases the lock that the session previously acquired under that name with GET_LOCK(). Return values of RELEASE_LOCK():
1: the specified lock was released successfully.
NULL: the current session does not hold the specified lock.

● Concurrent updates of the same row. Description: for example, every account-opening transaction also updates a single counter row holding the number of accounts opened that day. If many sessions update the same row at the same time, transaction lock waits become the performance bottleneck. In general the business should avoid such logic; where it cannot be avoided, minimize the impact of the lock. Suppose the account-opening transaction runs for 100 ms in total and the counter update inside it takes 5 ms; how can this still perform well?

Solution: the following three placements differ only in where the counter update sits inside the transaction, yet the difference in performance impact is very large. If the counter is updated at the beginning of the transaction, the counter row stays locked for almost the whole 100 ms; if it is updated at the end, it is locked for only a few milliseconds. Therefore, SQL that may block other sessions should be placed as late in the transaction as possible.

- update the counter at the beginning of the transaction
- update the counter in the middle of the transaction
- update the counter at the end of the transaction
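The following is a minimal JDBC sketch of the advisory-lock approach, assuming GET_LOCK()/RELEASE_LOCK() with the semantics described above. The subscription table, its columns, the lock-name scheme, and the 10-second timeout are illustrative assumptions.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class AdvisoryLockInsert {
        // Check-then-insert without duplicates, serialized per user by a named
        // advisory lock rather than by a lock on any user data.
        static void subscribe(Connection conn, long userId, long serviceId) throws Exception {
            String lockName = "subscribe_" + userId;   // one lock per user
            try (PreparedStatement get = conn.prepareStatement("select get_lock(?, 10)")) {
                get.setString(1, lockName);
                try (ResultSet rs = get.executeQuery()) {
                    rs.next();
                    if (rs.getInt(1) != 1) {           // 1 = acquired, 0 = timed out
                        throw new IllegalStateException("could not acquire " + lockName);
                    }
                }
            }
            try {
                // The check and the insert below are protected by the advisory
                // lock, even across transaction boundaries.
                try (PreparedStatement check = conn.prepareStatement(
                        "select 1 from subscription where user_id = ? and service_id = ?")) {
                    check.setLong(1, userId);
                    check.setLong(2, serviceId);
                    try (ResultSet rs = check.executeQuery()) {
                        if (rs.next()) {
                            return;                    // already subscribed
                        }
                    }
                }
                try (PreparedStatement ins = conn.prepareStatement(
                        "insert into subscription (user_id, service_id) values (?, ?)")) {
                    ins.setLong(1, userId);
                    ins.setLong(2, serviceId);
                    ins.executeUpdate();
                }
            } finally {
                // Release explicitly; the lock would otherwise outlive COMMIT/ROLLBACK.
                try (PreparedStatement rel = conn.prepareStatement("select release_lock(?)")) {
                    rel.setString(1, lockName);
                    rel.executeQuery();
                }
            }
        }
    }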
3 Returning large result sets

If a query returns a lot of data, setting fetch_size correctly is very important. A fetch_size that is too small causes too many client/server round trips and poor query efficiency; a fetch_size that is too large, or returning everything at once, can exhaust the client program's memory. The default fetch_size differs between databases; in GaussDB T the default is 100, controlled by the parameter _PREFETCH_ROWS. This matters because an application that consumes a very large result set should not split the query into batches itself: batched querying is cumbersome and its performance can be very poor. With a reasonable fetch_size, the application issues a single query for all the data and the database delivers it in batches. The JDBC API is java.sql.PreparedStatement.setFetchSize(int); a sketch follows section 4. Note, however, that if a single query runs for too long, "snapshot too old" errors become possible and must be taken into account.

4 Mass INSERT

Inserting data in bulk is a common business scenario. To improve performance, pay attention to the following points (the parse-once and commit-frequency points, together with the fetch_size setting from section 3, are shown in the sketch after this list):

● Use tablespaces with dynamically sized extents: the default 8K extents perform poorly.
● Provision enough redo: too little redo space causes the log writer to wrap around and stall.
● Use partitioned tables and partitioned indexes: inserting into an empty partition performs better than inserting into one large table, mainly because the cost of maintaining the indexes differs; likewise, concurrent inserts into different partitions are more efficient than concurrent inserts into the same partition, because buffer busy waits are reduced.
● Avoid too many indexes: a large number of indexes on a table hurts insert performance badly.
● Parse once, bind many times: this reduces the number of parses.
● Avoid committing row by row: per-row commits cause many log file sync waits.
● Avoid distributed transactions: a distributed transaction is much more expensive than a single-node one.
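To make sections 3 and 4 concrete, here is a minimal JDBC sketch that copies a large table, assuming hypothetical source and target tables src and dst and placeholder connection URLs. The read side sets a fetch size so the driver fetches the result set in batches (whether it truly streams depends on the driver); the write side prepares the INSERT once, binds it many times, and commits periodically instead of per row.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BulkCopy {
        public static void main(String[] args) throws Exception {
            // Separate source and target connections; URLs and credentials are placeholders.
            try (Connection src = DriverManager.getConnection(
                         "jdbc:mysql://src-host:3306/test", "user", "pass");
                 Connection dst = DriverManager.getConnection(
                         "jdbc:mysql://dst-host:3306/test", "user", "pass")) {
                dst.setAutoCommit(false);                  // no commit per row

                try (PreparedStatement read = src.prepareStatement(
                             "select id, payload from src");
                     PreparedStatement write = dst.prepareStatement(
                             "insert into dst (id, payload) values (?, ?)")) {

                    // Read side: fetch 1000 rows per round trip instead of the
                    // driver default, without holding the whole result set in memory.
                    read.setFetchSize(1000);

                    int pending = 0;
                    try (ResultSet rs = read.executeQuery()) {
                        while (rs.next()) {
                            // Write side: parse once, bind many times.
                            write.setLong(1, rs.getLong("id"));
                            write.setString(2, rs.getString("payload"));
                            write.addBatch();
                            if (++pending % 1000 == 0) {
                                write.executeBatch();      // ship 1000 rows at once
                                dst.commit();              // periodic commit, not per row
                            }
                        }
                    }
                    write.executeBatch();                  // flush the final partial batch
                    dst.commit();
                }
            }
        }
    }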