
What is the underlying principle of InnoDB?

2025-04-04 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article explains the underlying principles of InnoDB. The content is straightforward and clearly organized; I hope it helps resolve your doubts. Let the editor lead you through "What is the underlying principle of InnoDB?"

InnoDB is one of MySQL's database engines and is now MySQL's default storage engine, as well as one of the standard binary distributions of MySQL AB. InnoDB was developed by Innobase Oy, which was acquired by Oracle in October 2005. Compared with the traditional ISAM and MyISAM engines, InnoDB's most important feature is its support for ACID-compliant transactions (Transaction), similar to PostgreSQL.

Storage engine

Many articles start directly with which storage engines are available, without introducing what a storage engine actually is. So what exactly is a storage engine? Have you ever wondered how MySQL stores the data we throw at it?

In fact, a storage engine is quite simple. I think of it as a storage solution that implements inserting data, updating data, indexing, and so on.

What are the existing storage engines for us to choose from?

InnoDB, MyISAM, Memory, CSV, Archive, Blackhole, Merge, Federated, Example

There are many storage engines, but at present only InnoDB and MyISAM are in common use, so I will focus on those two.

InnoDB is currently the most widely used MySQL storage engine, and it has been MySQL's default storage engine since version 5.5. Do you know why InnoDB is so widely used? Setting that question aside for now, let's take a look at the underlying principles of the InnoDB storage engine.

The memory architecture of InnoDB is mainly divided into three parts: the buffer pool (Buffer Pool), the redo log buffer (Redo Log Buffer), and an additional memory pool.

Buffer pool

InnoDB stores data on disk in order to persist it. However, in the face of a large number of requests, the gap between the processing speed of the CPU and the IO speed of the disk is too big. To improve overall efficiency, InnoDB introduces the buffer pool.

When a request queries data that is not in the buffer pool, InnoDB looks for it on disk and puts the matching data into the buffer pool. Similarly, when a request modifies data, MySQL does not modify the disk directly; instead it modifies the data already in the buffer pool's pages, and only later flushes that data back to disk. This is the role of the buffer pool: speed up reads, speed up writes, and reduce IO interactions with the disk.

To put it bluntly, the buffer pool throws data from the disk into memory, and since memory is finite, it can run out of free space. So the buffer pool uses the LRU algorithm to evict pages when there are no free pages left. But using this algorithm naively brings a problem called buffer pool pollution.

When you run a batch scan or even a full table scan, you may replace all of the hot pages in the buffer pool, which can cause MySQL's performance to drop off a cliff. So InnoDB made some optimizations to LRU to avoid this problem: a newly read page is inserted at the midpoint of the LRU list rather than at its head, and is promoted to the hot end only if it is accessed again.
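The midpoint-insertion idea behind InnoDB's LRU optimization can be sketched as a toy model in Python. This is a simplification for illustration only: the class name is made up, the 3/8 "old" sublist ratio merely echoes the default of InnoDB's innodb_old_blocks_pct, and real InnoDB also uses a time window (innodb_old_blocks_time) before promoting a page.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy model of InnoDB's split LRU list (a sketch, not the real policy)."""

    def __init__(self, capacity, old_ratio=3 / 8):
        self.capacity = capacity
        self.old_cap = max(1, int(capacity * old_ratio))  # "old" sublist size
        self.young = OrderedDict()  # hot pages, most recently used last
        self.old = OrderedDict()    # newly read pages land here first

    def access(self, page_id):
        if page_id in self.young:        # hot page: move to the MRU end
            self.young.move_to_end(page_id)
        elif page_id in self.old:        # second touch: promote to young
            del self.old[page_id]
            self.young[page_id] = True
        else:                            # miss: insert at the midpoint (old head)
            self.old[page_id] = True
        self._evict()

    def _evict(self):
        # Evict from the old sublist first, so a one-off table scan
        # cannot push the hot pages out of the young sublist.
        while len(self.old) > self.old_cap:
            self.old.popitem(last=False)
        while len(self.young) + len(self.old) > self.capacity:
            if self.old:
                self.old.popitem(last=False)
            else:
                self.young.popitem(last=False)
```

With this split, a full table scan only churns the old sublist: pages touched once never enter the young sublist, so the hot working set survives the scan.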

MySQL writes the log first. Before actually writing data, it records a log, called the Redo Log, and periodically uses the CheckPoint mechanism to flush changed pages to disk, which will be discussed later.

In addition to data pages, the buffer pool also holds index pages, Undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, and the data dictionary. Let's pick a few of the more important ones for a brief chat.

Insert buffer

The insert buffer applies to update or insert operations. Consider the worst case, where the data to be modified is not in the buffer pool. There are then two options: write each record directly to disk as it comes, or collect the changes in the insert buffer first and write them out together once the buffered data reaches a certain threshold (for example, 50 records).

Obviously the second scheme is better: it reduces the number of interactions with disk IO.
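The batching idea can be sketched in a few lines of Python. This is a toy model of the concept only: the class name is invented, the dict standing in for "disk" is obviously not real IO, and the threshold of 50 is just the example figure from the text, not InnoDB's actual merge policy.

```python
class ChangeBuffer:
    """Sketch of the insert-buffer idea: queue changes to pages that are
    not cached, and merge them to disk in one batch instead of paying
    one random disk write per change."""

    def __init__(self, disk, threshold=50):
        self.disk = disk          # dict page_id -> list of rows, stands in for disk
        self.pending = []         # buffered (page_id, row) changes
        self.threshold = threshold
        self.disk_writes = 0      # count of simulated disk IO operations

    def insert(self, page_id, row):
        self.pending.append((page_id, row))
        if len(self.pending) >= self.threshold:
            self.merge()

    def merge(self):
        # Group all buffered changes by page, so each touched page costs
        # a single write instead of one write per row.
        by_page = {}
        for page_id, row in self.pending:
            by_page.setdefault(page_id, []).append(row)
        for page_id, rows in by_page.items():
            self.disk.setdefault(page_id, []).extend(rows)
            self.disk_writes += 1
        self.pending.clear()
```

Fifty inserts spread over five pages cost five page writes instead of fifty, which is exactly the IO saving the second scheme buys.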

Doublewrite (write twice)

Since we have just covered the insert buffer, I have to mention doublewrite, because I think these two InnoDB features complement each other.

The insert buffer improves MySQL's performance, while doublewrite improves data reliability. We know that while the data is still in the buffer pool, if the machine goes down and a write is lost, the Redo Log can recover it. But what if the outage happens while data is being flushed back to disk from the buffer pool?

This situation is called a partial write failure, and replaying the redo log alone cannot solve it, because the half-written page on disk is itself corrupt.


When flushing dirty pages, InnoDB does not write them straight to their final location. Instead it first copies them to the Doublewrite Buffer in memory, then writes that buffer to the shared tablespace on disk (you can simply think of it as an area of the disk), 1MB at a time. Only after this copy completes does it write the pages in the Doublewrite Buffer to their real positions in the data files.

With the doublewrite mechanism, even if a crash occurs while flushing dirty pages, when the instance recovers it can find an intact copy of the page in the shared tablespace's doublewrite area and directly overwrite the corrupted data page with it.
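The recovery path can be sketched as a toy model (all names here are invented for illustration; the "disk" areas are plain dicts, and a torn page is simulated by truncating the written data):

```python
class DoublewriteDisk:
    """Sketch of the doublewrite idea: pages are first written whole to a
    shared doublewrite area, then written in place; if a crash tears the
    in-place write, recovery copies the intact page back."""

    def __init__(self):
        self.doublewrite = {}  # shared-tablespace doublewrite area
        self.datafile = {}     # final page locations

    def flush_page(self, page_id, data, crash_during_inplace_write=False):
        # Step 1: write the full page to the doublewrite area.
        self.doublewrite[page_id] = data
        # Step 2: write the page in place; a crash here leaves a torn page.
        if crash_during_inplace_write:
            self.datafile[page_id] = data[: len(data) // 2]  # partial write
            return
        self.datafile[page_id] = data

    def recover(self):
        # On restart, any torn page is overwritten with its doublewrite copy.
        for page_id, copy in self.doublewrite.items():
            if self.datafile.get(page_id) != copy:
                self.datafile[page_id] = copy
```

The key property is that the two writes never both fail halfway on the same page: if step 2 tears, step 1's copy is already complete, and vice versa.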

Adaptive hash indexing

The adaptive hash index works much like the JVM dynamically compiling hot code into Machine Code: InnoDB monitors the queries against all indexes and builds a hash index over the hot, frequently accessed pages, so as to improve access speed.
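The monitor-then-promote idea can be sketched as follows. This is a conceptual toy only: the class name and the hotness threshold of 3 are invented, and real InnoDB hashes on index key prefixes and page access patterns, not a simple per-key counter.

```python
class AdaptiveHashIndex:
    """Sketch of the adaptive hash index idea: count lookups per key and,
    past a hotness threshold, cache the result in a hash table so later
    lookups skip the (slower) B-tree traversal."""

    def __init__(self, btree_lookup, threshold=3):
        self.btree_lookup = btree_lookup  # fallback: full B-tree traversal
        self.threshold = threshold
        self.counts = {}                  # per-key access counters
        self.hash_index = {}              # hot keys with cached locations
        self.hash_hits = 0

    def find(self, key):
        if key in self.hash_index:        # hot key: O(1), no tree walk
            self.hash_hits += 1
            return self.hash_index[key]
        value = self.btree_lookup(key)    # cold key: walk the index
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.threshold:
            self.hash_index[key] = value  # promote to the hash index
        return value
```

Cold keys pay the tree walk every time; hot keys are promoted automatically, which is why the index is called "adaptive": nobody declares it, the workload shapes it.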

You may have noticed the keyword page appearing many times by now, so let's talk about what a page actually is.

Page

A page is the smallest unit of data management in InnoDB. When we query data, InnoDB loads it from disk into the buffer pool page by page. By the same token, updates are flushed back to disk on a page-by-page basis. The default size of each page is 16KB, and each page contains several rows of data; the structure of a page is described below.

Don't worry too much about what each region is for; we just need to understand the benefits of this design. Through the previous-page and next-page pointers in the FileHeader, the pages form a doubly linked list, because in actual physical storage the data is not stored contiguously. You can think of it like the distribution of G1 Regions in memory.

Within a page, the rows form a singly linked list. The row data we store ends up in User Records, although User Records takes up no storage space at first. As we store more and more data, User Records grows and Free Space shrinks, until the space runs out and a new data page is requested.

The data in User Records is sorted by primary key id, and when we search by primary key, we walk along this singly linked list.
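The two linked lists described above, pages chained both ways and rows chained one way inside each page, can be sketched as a toy structure (names are invented; this models the traversal, not InnoDB's on-disk record format):

```python
class Page:
    """Toy page: rows held in primary-key order, pages chained into a
    doubly linked list via previous/next pointers like the FileHeader's."""

    def __init__(self):
        self.rows = []    # sorted (pk, payload) pairs standing in for the row chain
        self.prev = None  # previous-page pointer
        self.next = None  # next-page pointer

def find(first_page, pk):
    """Walk the page list, then scan the row chain inside each page."""
    page = first_page
    while page is not None:
        for row_pk, payload in page.rows:  # follow the singly linked row list
            if row_pk == pk:
                return payload
            if row_pk > pk:                # rows are pk-sorted: stop early
                return None
        page = page.next                   # follow the next-page pointer
    return None
```

The early stop inside the loop works only because User Records keeps rows in primary-key order, which is exactly why primary-key searches along this chain stay cheap.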

Redo log buffering

As mentioned above, in InnoDB the page data in the buffer pool is updated before the data on disk, and InnoDB uses the write-ahead log (Write Ahead Log) strategy to flush data: when a transaction starts, its redo record is written to the Redo Log Buffer before the buffer pool's page data is updated.

The data in the Redo Log Buffer is written out to the redo log file at a certain frequency. Changed pages are marked as dirty pages, and InnoDB flushes the dirty pages to disk according to the CheckPoint mechanism.
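The write-ahead discipline can be sketched as a toy engine (all names are invented; the "log" is a Python list standing in for a durable, fsync'd file, and the checkpoint both flushes dirty pages and truncates the now-redundant log):

```python
class WalEngine:
    """Sketch of write-ahead logging: every change is appended to the redo
    log before the in-memory page is touched, so a crash after commit
    loses nothing; recovery replays the log over the on-disk image."""

    def __init__(self):
        self.redo_log = []  # durable redo log (assume each append is synced)
        self.memory = {}    # buffer pool pages
        self.disk = {}      # data files, updated only at checkpoints

    def update(self, key, value):
        self.redo_log.append((key, value))  # 1. log first (WAL)
        self.memory[key] = value            # 2. then modify the cached page

    def checkpoint(self):
        # Flush dirty pages to disk; redo records before this point are
        # no longer needed for recovery, so the log can be truncated.
        self.disk.update(self.memory)
        self.redo_log.clear()

    def crash_and_recover(self):
        self.memory = dict(self.disk)       # buffer pool contents are lost
        for key, value in self.redo_log:    # replay redo since the checkpoint
            self.memory[key] = value
```

Note that the disk image may lag behind (it only advances at checkpoints), yet no committed change is lost: everything after the last checkpoint is reconstructed from the log.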

Logs

The Redo Log was mentioned above; this section is devoted to logs, which can be divided along the following two dimensions.

MySQL level

InnoDB level

MySQL log

MySQL-level logs can be divided into the error log, the binary log, the general query log, and the slow query log.

The error log is easy to understand: it records serious errors that occur while the service is running. When our database fails to start, we can look here to see the specific reason why. The binary log has another name you should be familiar with, Binlog; it records all changes made to the database. The general query log records all statements received from clients. The slow query log records all SQL statements whose response time exceeds a threshold; we can set this threshold ourselves via the parameter long_query_time, whose default value is 10s. The slow query log is turned off by default and needs to be enabled manually.

InnoDB log

There are only two kinds of InnoDB logs: the Redo Log and the Undo Log.

The Redo Log (redo log) records the changes made by transaction operations, that is, the modified values, and it is recorded whether or not the transaction commits. For example, when updating data, the updated record is first written to the Redo Log, then the data in the cached page is updated, and then, according to the configured flush policy, the data in memory is flushed back to disk. The Undo Log records the version of a record from before the transaction started and can be used for rollback after a transaction fails. The Redo Log records changes on specific data pages and can only be used by InnoDB itself, while the Binlog is written at the server level and works across storage engines. This is also an important role of the Binlog: master-slave replication; its other role is data recovery.

As mentioned above, all changes to the database are recorded in the Binlog, and there are three formats for recording them: Statement, Row, and MixedLevel.

Statement records every SQL statement that modifies data. It records only the SQL itself, not all the rows affected by it, which reduces log volume and improves performance. However, because only the executed statement is recorded, there is no guarantee it will execute with the same result on a Slave node, so some extra context information has to be recorded as well. Row records only the modified rows. Compared with Statement, which records just the SQL, Row generates a much larger volume of logs, but it needs no context information; it only cares what each row was changed to. MixedLevel is a mixture of Statement and Row: which format to use is decided according to the actual statement. For example, for an UPDATE statement that touches many rows, Statement saves much more space, but Row is the more reliable choice.
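The size tradeoff between the Statement and Row formats can be illustrated with a tiny sketch (the event shapes here are invented strings purely for counting; real binlog events are binary, and the example table and UPDATE are hypothetical):

```python
def statement_log(sql):
    """Statement-based logging: record only the SQL text, one event."""
    return [sql]

def row_log(affected_rows):
    """Row-based logging: record an after-image per changed row
    (a sketch; real row events are binary, not strings)."""
    return [f"row {pk} -> {new}" for pk, new in affected_rows]

# A wide UPDATE: one statement, many affected rows.
sql = "UPDATE users SET status = 'inactive' WHERE last_login < '2020-01-01'"
affected = [(pk, "inactive") for pk in range(10_000)]

stmt_events = statement_log(sql)   # 1 event regardless of rows touched
row_events = row_log(affected)     # 10,000 events, one per row
```

One statement versus ten thousand row images is exactly the gap MixedLevel arbitrates: it picks Statement when the statement is safe to replay, and falls back to Row when it is not.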

The difference between InnoDB and MyISAM

Since MyISAM is not commonly used, I will not delve into its underlying principles and implementation. Instead, let's simply compare the differences between the two storage engines, item by item.

Transactions: InnoDB supports transactions, rollback, transaction safety, and crash recovery; MyISAM does not, but its query speed is faster than InnoDB's.

Primary key: InnoDB stipulates that if no primary key is set, a 6-byte primary key is generated automatically, while MyISAM allows a table with no index or primary key at all; its indexes store the address of the row.

Foreign keys: InnoDB supports foreign keys; MyISAM does not.

Locks: InnoDB supports both row locks and table locks, while MyISAM supports only table locks.

Full-text index: InnoDB does not support full-text indexing, though plug-ins can provide the equivalent functionality, while MyISAM supports it natively.

Row count: when InnoDB needs the number of rows, it must scan the table, whereas MyISAM stores the current table's total row count and can read it directly.

To sum up, MyISAM is only suitable for scenarios where queries greatly outnumber updates. If queries account for the vast majority of your workload (such as a reporting system), you can use MyISAM; otherwise, InnoDB is recommended.

That is all the content of this article, "What is the underlying principle of InnoDB?" Thank you for reading! I hope what was shared here helps you.
