PostgreSQL's MVCC vs InnoDB's MVCC. 07/09 Update SLTechnology News&Howtos

PostgreSQL's MVCC vs InnoDB's MVCC.

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

One of the most important functions of any database is extensibility. If you do not delete each other, there is as little lock competition as possible to achieve this goal. Since read, write, update, and delete are the most important and frequent operations in the database, it is important not to block these operations concurrently. To achieve this, most databases use a concurrency model such as multi-version concurrency control (Multi-Version Concurrency Control). This model can reduce competition to a minimum.

What is MVCC?

Multi Version Concurrency Control (MVCC) is an algorithm that provides a good concurrency control technique by maintaining multiple versions of the same object so that READ and WRITE operations do not conflict. WRITE here refers to UPDATE and DELETE, and does not include Insert because newly inserted records can be protected by their respective isolation levels. Each WRITE operation produces a new version of the object, and each concurrent read operation depends on the isolation level to read a different version of the object. Because READ and WRITE operate on different versions of the same object, these operations do not need to be completely locked, so they can be performed concurrently. Of course, these lock contentions still exist when two concurrent transactions WRITE the same record.

Currently, most database systems support MVCC. The core of this algorithm is to maintain different versions of the same object, so different databases create and maintain multiple versions in different ways, and their implementation is also different. Accordingly, database operations and data storage also change.

The most common ways to implement MVCC are the methods used by PostgreSQL, the methods used by InnoDB and Oracle. We will discuss the implementation of PG and InnoDB in detail below.

MVCC in PostgreSQL

To support multiple versions, PG adds additional fields for each object (PG terminology: Tuple):

1. Xmin: transaction ID that performs insert or update operations. In UPDATE, assign the transactional ID to the new version of tuple.

2. Xmax: transaction ID that performs delete or update operations. In UPDATE, assign the transactional ID to the existing tuple. The newly created tuple, which defaults to null.

PostgreSQL stores all data in HEAP (default 8KB per page). The newly recorded xmin is the old version of the transaction ID; (doing update or delete) of the transaction that created the record, and its xmax is the ID of the transaction performing the operation. There will be a linked list that connects the old version with the new version. The old version record can be reused during the rollback; depending on the isolation level, the READ statement reads an old version record and returns it.

For example, the following two records, T1 (value 1) and T2 (value 2), demonstrate the creation of the record through the following three steps:

As you can see from the figure, there are two initial records in the database: 1 and 2.

Second, update 2 to 3. A new value is created and stored in the next location in the same storage area. The old version 2 assigns the ID of the transaction to its xmax and points to the latest version record.

Similarly, step 3, when T1 is deleted, the record is virtually deleted (its xmax is assigned the current transaction ID), and there is no creation of a new record version.

Next, an example is given to explain how to create multiple versions of each operation without locking how to achieve the isolation level of the transaction. The default isolation level "READ COMMITTED" is used in the following example.

INSERT

Each time you insert a record, a new tuple is created and stored in the page of the table file.

You can see:

1. Session-A starts a transaction with a transaction ID of 495

2. Session-B starts a transaction with a transaction ID of 496,

3. Session-An inserts a tuple and stores it in HEAP

4. The xmin of the new tuple is 495.But xmax is null.

5. Because the transaction of Session-An is not committed, session-B cannot see the value inserted in step 3

6. Session-A submission

7. You can see the newly inserted tuple

UPDATE

PostgreSQL's UPDATE is not a "IN-PLACE" update and does not replace an existing object update with a new value, but creates a new object. So UPDATE involves the following steps:

1. Mark the current object as deleted

2. A new version of the insert object

3. Point the old version of the object to the new version

Therefore, even if many records remain the same, HEAP takes up space, just like inserting another record.

As shown above:

1. Session-A starts a transaction with a transaction ID of 497

2. Session-B starts a transaction with a transaction ID of 498,

3. Session-A updates an existing record

4. Session-A can see the latest version of tuple and Session-B can see another old version. Session-A sees that the newly recorded xmin is 497 Magi XMAX for null;Session-B and the old version xmin is 495 Magi XMAX 497 i.e. Session-A transaction ID. Both versions of tuple are stored in HEAP, even on the same page if space allows.

5. Session-A commits the transaction and the old version disappears

6. All sessions can now see the same version of the record.

DELETE

DELETE operates like UPDATE, except that a new version is not added. UPDATE, for example, simply marks the current object as deleted.

1. Session-A starts a transaction with a transaction ID of 499

2. Session-B starts a transaction with a transaction ID of 500

3. Session-A deletes existing records

4. Session-A cannot see the deleted record of the current transaction; Session-B sees the deleted record of the old version whose xmax is 499499

5. Session-A commits the transaction and the record of the old version disappears

6. None of the sessions can see the previous version

As you can see, none of these operations directly delete existing records and add an additional version if necessary.

Let's take a look at how SELECT works in multiple versions: depending on the isolation level, SELECT needs to read all versions of tuple until the appropriate tuple is found. Suppose you have a tuple T1 that is updated to the new version T1, and then updated to T1variant:

1. When the SELECT operation enters the heap of this table, first check T1. If the xmax transaction of T1 has been committed, find the next version of the tuple.

2. T1' is also submitted to find the next version

3.] finally find the xmin of T1posts' see that xmax has not been submitted or is null, then the xmin of 'T1posts' is visible, and finally read the tuple of T1posts'.

You can see that you need to traverse the three versions of the tuple to find the appropriate visible version, until the VACUUM process reclaims the record tagged with delete.

MVCC in InnoDB

To support multiple versions, InnoDB maintains several additional fields for line records:

1. DB_TRX_ID: transaction ID for inserting or updating navigation records

2. DB_ROLL_PTR: roll back the pointer to the undo log record in the rollback segment

Compared to PostgreSQL, InnoDB also creates multiple versions of row records, but stores older versions differently.

InnoDB stores older versions of row records in a separate tablespace / storage space (rollback segment). Unlike PostgreSQL, InnoDB stores only the latest version of the row record in the tablespace of the table and the old version in the rollback segment. The role of undo log in the rollback segment: used for rollback operations; depending on the isolation level, perform multi-version reading and read the old version.

For example, two rows of records, T1 (with a value of 1) and T2 (with a value of 2), can explain the process of creating a new record in the following three steps:

As you can see from the above figure, initially, there are two records 1 and 2 in the table.

In the second stage, the row record T2 value 2 is updated to 3. At this point, the record creates a new version and replaces the old version. The old version is stored in the rollback segment (note that the data in the rollback segment contains only the changed value, that is, delta value), while the rollback pointer in the new line record points to the old version in the rollback segment. Unlike PostgreSQL, the InnoDB update is "IN-PLACE".

Similarly, the third step is to delete T1 and mark it as a virtual delete (delete only on one of the bit bits specified in the row record) and insert a corresponding new version in the rollback segment. Similarly, the rollback pointer points to the undo log in the rollback segment.

On the face of it, all operations look the same as PostgreSQL, except that multiple versions are stored in different ways internally.

MVCC:PostgreSQL vs InnoDB

The following is an analysis of the main differences in MVCC between PostgreSQL and InnoDB:

1. The size of the old version

PostgreSQL only updates the xmax of the older version of tuple, so the size of the old version is the same as the corresponding inserted record size. This means that if there are three versions of an older tuple, they are all the same size (if the update values are of different sizes, the actual size of each update will be different).

The older version of InnoDB is stored in the rollback segment and is smaller than the corresponding insert record, because InnoDB only writes the changed values to undo log.

2. INSERT operation

When INSERT, InnoDB writes additional records to the rollback segment, while PostgreSQL only creates a new version in UPDATE.

3. Restore the old version on rollback

When rolling back, PostgreSQL does not need any specific content, so note that the old version of xmax is equal to update the transaction ID of the record. Therefore, in the concurrent snapshot, the record is considered alive until the transaction of the transaction ID commits.

InnoDB, once rolled back, needs to reconstruct the old version of the object.

4.] recycle the space occupied by the old version

In PG, the space occupied by older versions can only be reclaimed when no concurrent snapshots are used, which is considered dead. Then VACCUM can reclaim the space. VACCUM can be triggered manually or in background tasks depending on the configuration.

The undo log of InnoDB is divided into INSERT UNDO and UPDATE UNDO. The INSERT UNDO is released immediately after the transaction is committed. UPDATE UNDO can be released only when there are no other concurrent snapshots in use. InnoDB does not show VACUUM operations but has a similar PURGE recycling undo log.

5. The influence of delayed vacuum

As shown earlier, PostgreSQL latency vacuum has a significant impact. Even if delete is performed frequently, it will cause the table to bloat and take up a lot of storage space. This also results in the need to perform a costly operation VACUUM FULL when a point is reached.

6. Sequential scanning when the table is expanding

Even if all records are in dead state, PostgreSQL's sequential scan scans all older versions of the object until vacuum deletes the dead's record. This is a common and frequently discussed problem in PG. The main PG stores all older versions of a tuple in the same storage area.

InnoDB, on the other hand, does not need to read undo log unless necessary. If all undo records have been invalidated, you only need to read the latest version of all objects.

7. Index

PostgreSQL stores the index independently and connects the index to the real data in HEAP. So even if the index is not changed, it is sometimes necessary to update the index. This problem is then resolved by HOT (Heap Only Tuple), but there are still limitations, and if there is not enough space on the same page, it goes back to the normal UPDATE operation.

InnoDB does not have this problem because of its use of clustered indexes.

Conclusion

PostgreSQL's MVCC has some disadvantages, especially when it has a frequent UPDATE/DELETE load, which can cause the table to swell. Therefore, when you decide to choose PG, you need to configure VACUUM carefully.

The PG community is aware of this problem and has begun to cover undo-based MVCC (tentatively named ZHEAP), which we will see in future releases.

Original text

Https://severalnines.com/blog/comparing-data-stores-postgresql-mvcc-vs-innodb

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.