The principle of MVCC implementation of PostgreSQL, Oracle/MySQL and SQL Server

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Reprinted from: http://www.bkjia.com/oracle/1068936.html

Relational database management systems use MVCC (Multiversion Concurrency Control) to avoid the concurrency problem of write operations blocking read operations. MVCC ensures that reads and writes do not conflict by keeping multiple versions of the data. Different databases implement it differently, which is a headache for anyone who works with several database systems. On the surface, relational databases look simple and convenient, and operating them through standard SQL statements is reassuring. However, as the system grows and the number of concurrent users increases, database performance degrades. At that point we may need to move from external fine-tuning to an in-depth study of the internals, and the concurrency mechanism of each database is different. If we run several different databases, each needs its own tuning approach, and the core database of the production system starts to feel less reassuring. This article describes the internal MVCC implementations of several popular databases.

There are two different approaches to implementing MVCC. The first keeps multiple versions of each record in the database itself, and a garbage collector reclaims the old versions once they are no longer needed. PostgreSQL and Firebird/Interbase take this approach. SQL Server uses a similar mechanism, except that the old versions are not kept in the main database but in a separate database, tempdb.

The second approach keeps only the latest version of the data in the database and reconstructs older versions on demand from undo information. Oracle and MySQL/InnoDB take this approach. Let's look at how each database implements it.

PostgreSQL's MVCC

In PostgreSQL, when a row is updated, a new version of the row (called a tuple) is created and inserted into the table. The previous version is marked as expired and given a pointer to the new version, but it remains in the database until the garbage collector reclaims it. To support multiple versions, each tuple carries the following additional fields:

xmin - the ID of the transaction that inserted the record or, for an update, created this tuple.
xmax - the ID of the transaction that deleted the record or created a newer version of this tuple. This field is initially null.

Transaction state is kept in the CLOG, stored in $Data/pg_clog. It holds two bits of status information for each transaction; the possible states are in-progress, committed, and aborted. When a transaction aborts, PostgreSQL does not undo its changes to the database records; it simply marks the transaction as aborted in the CLOG. A PostgreSQL table may therefore contain a lot of data left behind by aborted transactions.
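The versioning scheme described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the names `TupleVersion`, `Table`, and `next_version` are hypothetical, the CLOG is modeled as a plain dictionary, and nothing here reflects PostgreSQL's actual on-disk layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

IN_PROGRESS, COMMITTED, ABORTED = "in-progress", "committed", "aborted"


@dataclass
class TupleVersion:
    value: str
    xmin: int                           # transaction that created this version
    xmax: Optional[int] = None          # transaction that deleted/replaced it; starts as null
    next_version: Optional[int] = None  # hypothetical pointer to the newer version


class Table:
    def __init__(self) -> None:
        self.tuples: List[TupleVersion] = []  # every version lives in the table itself
        self.clog: Dict[int, str] = {}        # toy stand-in for the CLOG: xid -> status

    def begin(self, xid: int) -> None:
        self.clog[xid] = IN_PROGRESS

    def insert(self, xid: int, value: str) -> int:
        self.tuples.append(TupleVersion(value, xmin=xid))
        return len(self.tuples) - 1

    def update(self, xid: int, idx: int, value: str) -> int:
        # An update never overwrites: it adds a new version and expires the old one.
        new_idx = self.insert(xid, value)
        old = self.tuples[idx]
        old.xmax = xid
        old.next_version = new_idx
        return new_idx

    def commit(self, xid: int) -> None:
        self.clog[xid] = COMMITTED

    def abort(self, xid: int) -> None:
        # No undo work: only the transaction status changes.
        self.clog[xid] = ABORTED


table = Table()
table.begin(100)
row = table.insert(100, "v1")
table.commit(100)

table.begin(101)
new_row = table.update(101, row, "v2")
table.abort(101)

print(len(table.tuples), table.clog[101])
```

After the abort, both versions are still physically present in the table, and only the status entry for transaction 101 says that its work should be ignored.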
A process called Vacuum provides garbage collection for expired and aborted versions of records. Vacuum also removes the index entries that point to the tuples it has collected.

A tuple is visible when its xmin is valid and its xmax is invalid. "Valid" means "either committed or belonging to the current transaction". To avoid consulting the CLOG repeatedly, PostgreSQL keeps status flags in the tuple that indicate whether it is "known committed" or "known aborted".

Oracle's MVCC

Oracle keeps old versions in rollback segments (that is, the 'undo log'). A transaction ID is not a sequential number; it consists of a set of numbers that point to a transaction slot in the header of a rollback segment. Rollback segments let new transactions reuse the storage and the transaction slots used by older transactions that have committed or aborted. This automatic reuse lets Oracle manage a large number of transactions with a limited number of rollback segments.

The header block of a rollback segment serves as a transaction table, where the state of each transaction is kept (Oracle calls this the System Change Number, or SCN). Rather than storing a transaction ID with every record on a page, Oracle saves space by keeping a per-page array of the unique transaction IDs; each record stores only its offset into that array. Along with each transaction ID, Oracle stores a pointer to the last undo record that the transaction created for the page. Index records use the same technique as table records, which is one of the main differences between Oracle and PostgreSQL.

When an Oracle transaction starts, it notes the current SCN. When reading a table or index page, Oracle uses the SCN to determine whether the page contains the effects of transactions that should not be visible to the current transaction.
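The per-page transaction array and the SCN check can be sketched as follows. This is a simplified model, not Oracle's block format: `TxnEntry`, `Row`, `Page`, `commit_scn`, and `invisible_txns` are hypothetical names, the transaction ID strings are made up, and the sketch assumes that a change is visible only if its transaction committed at or before the reader's SCN.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TxnEntry:
    txn_id: str                # illustrative "segment.slot.sequence" style ID
    commit_scn: Optional[int]  # None while the transaction is still uncommitted
    last_undo: int             # pointer to the last undo record for this page


@dataclass
class Row:
    value: str
    txn_offset: int            # offset into the page's transaction array


@dataclass
class Page:
    txns: List[TxnEntry] = field(default_factory=list)
    rows: List[Row] = field(default_factory=list)


def invisible_txns(page: Page, reader_scn: int) -> List[int]:
    """Offsets of transactions whose effects the reader must not see."""
    return [
        offset
        for offset, txn in enumerate(page.txns)
        if txn.commit_scn is None or txn.commit_scn > reader_scn
    ]


page = Page(
    txns=[
        TxnEntry("5.12.301", commit_scn=900, last_undo=41),
        TxnEntry("7.3.118", commit_scn=1050, last_undo=77),
        TxnEntry("2.9.44", commit_scn=None, last_undo=80),
    ],
    rows=[Row("a", 0), Row("b", 1), Row("c", 0), Row("d", 2)],
)

hidden = invisible_txns(page, reader_scn=1000)
affected = [r.value for r in page.rows if r.txn_offset in hidden]
print(hidden, affected)
```

Each row costs only a small offset, while the transaction ID and undo pointer are stored once per transaction per page, which is the space saving the text describes.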
Oracle checks the status of a transaction by looking it up in the header of the associated rollback segment. To save time, only the first lookup of a transaction actually goes to the rollback segment; once the transaction is found to be complete, its status is recorded in the page itself to avoid further lookups. If the page is found to contain the effects of invisible transactions, Oracle recreates an older version of the page by undoing the effects of each such transaction. It scans the undo records associated with each of those transactions and applies them to the page until all of their effects have been removed. The new page created this way is then used to access the tuples in it.

Record header in Oracle: the record header never grows and always has a fixed size. For non-clustered tables the record header is 3 bytes: one byte stores flags, one byte indicates whether the record is locked (for example, because it has been updated but the update is not yet committed), and one byte holds the column count.

SQL Server's MVCC

SQL Server uses row versioning internally to implement snapshot isolation and read committed snapshot isolation. Row versioning has to be enabled only in the databases that need it, and only those databases incur the associated cost. When a record is modified or deleted, versioning takes effect through a copy-on-write mechanism, and row versioning-based transactions can effectively "view" a consistent earlier version of the data.

Row versions are stored in the version store, which resides in the tempdb database rather than in the main database. More specifically, when a record in a table or index is modified, the new record is stamped with the "sequence_number" of the transaction that performed the modification. The old version of the record is copied to the version store, and the new record contains a pointer to the old record in the version store.
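A minimal sketch of this copy-on-write scheme is shown below. It makes several assumptions that are not in the article: the version store is modeled as a plain list standing in for tempdb, `RowVersion` and `VersionedRow` are hypothetical names, every writer is treated as committed as soon as it writes, and a reader sees the newest version whose sequence number is not greater than its own snapshot number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    value: str
    seq: int                    # sequence number of the modifying transaction
    prev: Optional[int] = None  # pointer (index) into the version store


class VersionedRow:
    def __init__(self, value: str, seq: int, version_store: List[RowVersion]) -> None:
        self.store = version_store          # stand-in for the version store in tempdb
        self.current = RowVersion(value, seq)

    def update(self, value: str, seq: int) -> None:
        # Copy-on-write: move the old version to the store, then point back at it.
        self.store.append(self.current)
        self.current = RowVersion(value, seq, prev=len(self.store) - 1)

    def read(self, snapshot_seq: int) -> Optional[str]:
        # Follow the pointers until a version old enough for this snapshot is found.
        version = self.current
        while version.seq > snapshot_seq:
            if version.prev is None:
                return None  # the row did not exist yet for this snapshot
            version = self.store[version.prev]
        return version.value


version_store: List[RowVersion] = []
row = VersionedRow("v1", seq=10, version_store=version_store)
row.update("v2", seq=20)
row.update("v3", seq=30)

print(row.read(35), row.read(25), row.read(15), row.read(5))
```

The current record always stays in the main table; only superseded versions accumulate in the store, so readers that want the latest data never touch it.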
If several long-running transactions exist and multiple versions are required, a record in the version store may itself contain a pointer to an even earlier version of the record.

SQL Server version storage cleanup: SQL Server manages the size of the version store automatically and maintains a cleanup thread to make sure that no more versions are kept than are needed. For queries running under snapshot isolation, the version store retains a row version until the transaction that modified the data has completed and every transaction containing statements that reference the modified data has completed as well. For SELECT statements running under read committed snapshot isolation, a particular row version is no longer needed, and is removed, as soon as the SELECT statement has finished executing.

If tempdb runs out of free space, SQL Server calls the cleanup function and increases the size of the files, provided that the files are configured for auto-grow. If the disk is so full that the files cannot grow, SQL Server stops generating versions. When that happens, any snapshot query that needs to read a version that was not generated because of the space constraint will fail.

Record header in SQL Server: the first 4 bytes of a record consist of two bytes of record metadata (the record type) and two bytes that point forward to the NULL bitmap in the record, that is, the offset given by the actual size of the fixed-length portion of the record. Versioning tag: this is a 14-byte structure that contains a timestamp plus a pointer into the version store in tempdb. The timestamp here is the transaction_seq_number, and the tag is added to the record only when a versioning operation requires it.
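To make the 14-byte versioning tag more concrete, here is a small sketch that packs and unpacks such a tag. The article only gives the total size and the two parts; the 6-byte/8-byte split, the little-endian byte order, and the names `pack_version_tag` and `unpack_version_tag` are assumptions made for illustration, not a description of SQL Server's actual record format.

```python
from typing import Tuple

TAG_SIZE = 14
SEQ_BYTES = 6  # assumed width of the transaction sequence number
PTR_BYTES = 8  # assumed width of the pointer into the version store


def pack_version_tag(seq_number: int, store_pointer: int) -> bytes:
    """Build a 14-byte tag: sequence number followed by version-store pointer."""
    return (
        seq_number.to_bytes(SEQ_BYTES, "little")
        + store_pointer.to_bytes(PTR_BYTES, "little")
    )


def unpack_version_tag(tag: bytes) -> Tuple[int, int]:
    """Split a 14-byte tag back into (sequence number, version-store pointer)."""
    if len(tag) != TAG_SIZE:
        raise ValueError(f"expected {TAG_SIZE} bytes, got {len(tag)}")
    seq_number = int.from_bytes(tag[:SEQ_BYTES], "little")
    store_pointer = int.from_bytes(tag[SEQ_BYTES:], "little")
    return seq_number, store_pointer


tag = pack_version_tag(4711, 0x0000_0001_0000_00A8)
decoded = unpack_version_tag(tag)
print(len(tag), decoded)
```

Because the tag is appended only to rows that have actually been versioned, rows in databases without row versioning do not pay these 14 bytes.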
