What is the principle of creating parallelism by PG INDEX? 07/06 Update SLTechnology News&Howtos

What is the principle of creating parallelism by PG INDEX?

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article introduces the knowledge of "what is the principle of creating parallelism in PG INDEX". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

It's a long long story, which introduced Heap-Only-Tuple from PG 8.3.The main function is to reduce the number of Icano needed for updating, the update of the principle row based on postgreql is equal to inserting new tuple, based on multi-version control MVCC, the update in Postgres includes finding the row to be updated and inserting the new version of the row into the database, the problem is obvious, indexing, which requires more Icano. The data is reinserted into each index on the table. During the insertion process, each relevant index needs to be read first, and the physical location of the new version is different from that of the old version. The more indexes there are in that table and the greater the amount of data changed, the greater the consumption of the indexes involved.

As can be seen from the above figure, if there is no HOT, then the index page needs another pointer to the modified INDEX data after the update, and if there is a HOT, there is no need to use another pointer C to point to the modified index, only through the original pointer to point to the new index data through t_cid on the original index tuple1. See the following figure for what it looks like in the data structure

Although the efficiency of HOT is high, it needs to meet some requirements before it can be realized. From the above picture,

1 data must (tuple) must be on a page

2 updated data cannot contain data of INDEX itself

Let's start to do relevant experiments to see what HOT is like in practice.

Create table test (id int, name varchar, age float, datetime timestamp)

Insert into test select generate_series (1Jing 10), random () * 100, random () * 1, now ()

Create index idx_test_name on test (name)

Create index idx_test_age on test (age)

Create index idx_test_datetime on test (datetime)

Make a detailed observation of the current index page through pageinspect

Test 1 update the value of ID = 1 in name, but the value can be changed as before.

2 Test 2 update the value of ID = 1 in datetime to the new value. It can be noticed that as long as you update to the indexed field, no matter which one, all INDEX updates will be triggered. In the figure, only the datetime value of id = 1 has been updated, but other indexes have also been updated, and 10 rows have been changed to 11 rows.

We continue to vacuum the table test.

You can see from the following picture that the garbage version of index tuple has been recycled (some screenshots are not complete due to screen reasons)

Why the index will not be updated when changing the same value because recheck_on_update is introduced as the default value for creating the index when PG11, so that the index will detect whether the value at UPDATE is different from the existing value, and the same will not trigger the operation of updating the index tuple.

In fact, we haven't come to INDEX CONCURRENTLY yet, it's just a rough introduction to HOT.

In fact, indexes are built in parallel, requiring three phase

1 begins by declaring in the catalog of the system that we are going to start building a new index, where the index is invisible to the old transaction and visible to the new transaction.

2 start indexing, and start indexing through MVCC snapshots of the table and visible rows in the MVCC that produce snapshots of the table.

3 when the second step to create the index is successful, in fact, in the interval between now and the last MVCC, there are operations to insert UPDATE, DELETE, these are the data inconsistencies between the current new index and the current row, this time will start the second MVCC, in the secondary MVCC to do a snapshot, and repair the new index, the subsequent is to start to maintain the index, at this time the index can work normally.

However, the creation of the overall parallel index is not finished, because the new index is invisible to the old transaction, so the work cannot be rolled out. So you have to wait for the transactions older than this index to be finished before the overall index is visible and working properly.

Therefore, large transactions and long-running transactions are obstacles to parallel index creation and completion as soon as possible, and two MVCC are needed, so generally speaking, parallel index creation time is longer than that of non-parallel creation with exclusive lock.

This is the end of "how PG INDEX creates parallelism". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.