How to Improve the Performance of PostgreSQL

This article walks through a real case of diagnosing and fixing a PostgreSQL query performance problem. Situations like this come up often in practice, so I hope the steps below help you handle similar cases when you meet them.
Monitoring slow Postgres queries with Postgres
Earlier this week, a primary-key lookup on a small table (10GB, 15 million rows) caused a major query performance problem on one (or more) of the databases behind our graphical editor.
99.9% of these queries were fast and smooth, but queries that enumerated a large number of key values could take 20 seconds. Spending that much time in the database means the user is left waiting in the browser for the graphical editor to respond. Even though only the slowest 0.1% of queries were affected, the impact was clearly unacceptable.
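As an aside (not part of the original incident), Postgres can surface outliers like this itself if the pg_stat_statements extension is enabled. A minimal sketch, assuming PostgreSQL 13+ column names (older versions use total_time / mean_time / max_time instead):

-- Requires shared_preload_libraries = 'pg_stat_statements' and
-- CREATE EXTENSION pg_stat_statements; in the target database.
-- Top 10 statements by worst-case latency.
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY max_exec_time DESC
LIMIT 10;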
Query and query plan
Here is the query that went wrong:
SELECT c.key, c.x_key, c.tags, x.name
FROM context c
JOIN x ON c.x_key = x.key
WHERE c.key = ANY (ARRAY[15368196, /* 11,000 other keys */])
  AND c.x_key = 1
  AND c.tags @> ARRAY[E'blah'];
Table x has thousands of rows and table context has 15 million rows. Both tables have proper indexes on their primary key, key. This is a simple, straightforward primary-key lookup. Interestingly, though, when the number of key values grows large, to 11,000 values in this case, adding EXPLAIN (ANALYZE, BUFFERS) to the query produces the query plan shown below.
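Concretely, the plan was captured by prefixing the same query with EXPLAIN (the elided key list stays elided):

EXPLAIN (ANALYZE, BUFFERS)
SELECT c.key, c.x_key, c.tags, x.name
FROM context c
JOIN x ON c.x_key = x.key
WHERE c.key = ANY (ARRAY[15368196, /* 11,000 other keys */])
  AND c.x_key = 1
  AND c.tags @> ARRAY[E'blah'];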
Nested Loop  (cost=6923.33..11770.59 rows=1 width=362) (actual time=17128.188..22109.283 rows=10858 loops=1)
  Buffers: shared hit=83494
  ->  Bitmap Heap Scan on context c  (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
        Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_key = 1))
        Filter: (key = ANY ('{15368196,(a lot more keys here)}'::integer[]))
        Buffers: shared hit=50919
        ->  BitmapAnd  (cost=6923.33..6923.33 rows=269 width=0) (actual time=132.910..132.910 rows=0 loops=1)
              Buffers: shared hit=1342
              ->  Bitmap Index Scan on context_tags_idx  (cost=0.00..1149.61 rows=15891 width=0) (actual time=64.614..64.614 rows=264777 loops=1)
                    Index Cond: (tags @> '{blah}'::text[])
                    Buffers: shared hit=401
              ->  Bitmap Index Scan on context_x_id_source_type_id_idx  (cost=0.00..5773.47 rows=268667 width=0) (actual time=54.648..54.648 rows=267659 loops=1)
                    Index Cond: (x_id = 1)
                    Buffers: shared hit=941
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=10858)
        Index Cond: (x.key = 1)
        Buffers: shared hit=32575
Total runtime: 22117.417 ms
As the last line of the output shows, the query took 22 seconds in total. A chart of CPU utilization during those 22 seconds makes the cost very visible: almost all of the time is spent inside Postgres and the OS, and only a small fraction on I/O.
These queries show up as the spikes in that CPU utilization chart. CPU charts are rarely this informative, but in this case the chart confirms a key point: the database is not waiting on the disk to read data. It is busy sorting, hashing, and comparing rows.
The second interesting metric, whose trajectory closely tracks those spikes, is the number of rows Postgres "fetched" (fetched, not returned; returned rows were not the problem here, so we set them aside).
Clearly, something was scanning an enormous number of rows at regular intervals: our query.
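As a side note (standard Postgres statistics views, not specific to this incident), the fetched-versus-returned distinction is visible in pg_stat_database:

-- Cumulative row counters for the current database; see the PostgreSQL
-- documentation for the precise semantics of each counter.
SELECT datname, tup_returned, tup_fetched
FROM pg_stat_database
WHERE datname = current_database();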
The problem with Postgres: bitmap scanning
Here is the portion of the query plan that does the row matching:
Buffers: shared hit=83494
  ->  Bitmap Heap Scan on context c  (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
        Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_key = 1))
        Filter: (key = ANY ('{15368196,(a lot more keys here)}'::integer[]))
        Buffers: shared hit=50919
Postgres bitmap-scans table context. When the list of key values is small, Postgres can use the indexes to build a bitmap in memory very efficiently. When the bitmap gets too large, though, the planner's idea of the optimal query changes. In our case, because the key list is so large, the planner uses what it judges to be the optimal approach: it collects all candidate rows matching the tag and x_key conditions, pulls them into memory, and only then filters them against the 11,000-key list (the Recheck Cond and Filter steps in the plan). Loading all of those rows into memory and immediately rechecking them is what eats the time.
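For general background (standard Postgres behavior, not something stated in the original write-up): a bitmap that outgrows work_mem degrades to a "lossy" bitmap that tracks whole pages instead of individual rows, which forces the Recheck Cond to re-test every row on each matched page. Recent Postgres versions report this directly in EXPLAIN (ANALYZE) output. An illustrative way to observe it:

-- Illustrative only: shrink work_mem so the bitmap goes lossy, then look for
-- a "Heap Blocks: exact=... lossy=..." line under the Bitmap Heap Scan node.
SET work_mem = '64kB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM context
WHERE tags @> ARRAY[E'blah'] AND x_key = 1;
RESET work_mem;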
Fortunately, about 30% of the data was already in memory, so this was not as bad as reading everything from disk, but the effect on performance was still very significant. Remember, the query is very simple: it is a primary-key lookup, so there was no obvious way to restructure the database or the application to avoid it. The pgsql-performance mailing list was a great help to us.
Solution
This is another reason we love open source and the people who help its users. Tom Lane, one of the most prolific programmers in open source, suggested we try the following:
SELECT c.key, c.x_key, c.tags, x.name
FROM context c
JOIN x ON c.x_key = x.key
WHERE c.key = ANY (VALUES (15368196), /* 11,000 other keys */)
  AND c.x_key = 1
  AND c.tags @> ARRAY[E'blah'];
ARRAY has become VALUES. Can you spot the difference?
We had used ARRAY[...] to enumerate all the key values for the query, but that fools the query optimizer. VALUES (...), on the other hand, lets the optimizer make full use of the index on key. It is a change to a single line of code, and it makes no semantic difference.
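A rough side-by-side sketch (toy key list; this reflects general planner behavior rather than anything specific to our schema):

-- ARRAY form: the key list is a single array constant, so the planner treats
-- "key = ANY (ARRAY[...])" as one opaque filter condition.
SELECT * FROM context WHERE key = ANY (ARRAY[1, 2, 3]);

-- VALUES form: the key list is a row source, so the planner can de-duplicate
-- it (the HashAggregate node in the plan below) and drive a nested-loop
-- index scan on the primary key, one probe per key.
SELECT * FROM context WHERE key = ANY (VALUES (1), (2), (3));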
Here is the query plan for the new query. Note what has changed: the bitmap machinery is gone, replaced by a HashAggregate over a Values Scan that drives per-key index scans on context_pkey.
Nested Loop  (cost=168.22..2116.29 rows=148 width=362) (actual time=22.134..256.531 rows=10858 loops=1)
  Buffers: shared hit=44967
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.071..0.073 rows=1 loops=1)
        Index Cond: (id = 1)
        Buffers: shared hit=4
  ->  Nested Loop  (cost=168.22..2106.54 rows=148 width=329) (actual time=22.060..242.406 rows=10858 loops=1)
        Buffers: shared hit=44963
        ->  HashAggregate  (cost=168.22..170.22 rows=200 width=4) (actual time=21.529..32.820 rows=11215 loops=1)
              ->  Values Scan on "*VALUES*"  (cost=0.00..140.19 rows=11215 width=4) (actual time=0.005..9.527 rows=11215 loops=1)
        ->  Index Scan using context_pkey on context c  (cost=0.00..9.67 rows=1 width=329) (actual time=0.015..0.016 rows=1 loops=11215)
              Index Cond: (c.key = "*VALUES*".column1)
              Filter: ((c.tags @> '{blah}'::text[]) AND (c.x_id = 1))
              Buffers: shared hit=44963
Total runtime: 263.639 ms
The query time dropped from 22,000 ms to roughly 200 ms; changing a single line of code made it 100 times faster.
The new query in production

Once the one-line change was deployed, the database looked much happier and more relaxed.
That concludes "How to Improve the Performance of PostgreSQL". Thank you for reading!