Shulou (Shulou.com), SLTechnology News & Howtos — Database. Reported 05/31; updated 2025-04-06.
This article introduces the MySQL architecture and the InnoDB storage engine.
MySQL basic architecture diagram
Generally speaking, MySQL can be divided into two parts: Server layer and storage engine layer.
The Server layer includes connectors, query caching, parsers, optimizers, executors, etc., covering most of MySQL's core service functions, as well as all built-in functions (such as date, time, math, and encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, and so on.
Connector
The connector is what you use when connecting to the database; it establishes the connection with the client, checks permissions, and maintains and manages the connection.
Command: mysql -h$ip -P$port -u$user -p, then type the password at the prompt. You can also put the password directly after -p, but that risks leaking it.
SHOW PROCESSLIST lists the current connections; Sleep in the Command column indicates an idle connection.
Idle connections are disconnected after 8 hours by default, configurable via the wait_timeout parameter.
In database terms, a persistent (long) connection means that after connecting successfully, the client keeps using the same connection for subsequent requests. A short connection means the connection is closed after a few queries and a new one is established next time.
Since establishing a connection is relatively expensive, persistent connections are recommended. However, with persistent connections MySQL's memory usage can grow quickly, because memory allocated temporarily during execution is tracked per connection and is not released until the connection is closed. If long connections accumulate, memory usage may grow until the process is forcibly killed by the system (OOM); from the outside this looks like MySQL restarting unexpectedly.
Solution:
Disconnect long connections periodically: after using one for a while, or after the program detects that a memory-heavy query has run, disconnect, and reconnect before the next query.
If you are using MySQL 5.7 or later, you can instead execute mysql_reset_connection after each large operation to reinitialize the connection's resources. This does not require reconnecting or redoing permission checks, but it restores the connection to the state it was in when it was first created.
Query cache
The query cache caches the previously executed statements and their results in memory in the form of key-value pairs. Key is the statement of the query and value is the result of the query. If your query can find key directly in this cache, then the value will be returned directly to the client.
The query cache was removed entirely in MySQL 8.0, because its entries were invalidated so frequently that the hit rate was low.
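To see why the hit rate was so low, here is a minimal sketch (all names are illustrative, not MySQL internals) of a query cache keyed by SQL text, where any write to a table invalidates every cached entry touching that table:

```python
# Toy model of a query cache: results cached per SQL text, and ANY write
# to a table drops every cached entry that references that table.

class QueryCache:
    def __init__(self):
        self._cache = {}                      # sql text -> (tables, result)

    def get(self, sql):
        entry = self._cache.get(sql)
        return entry[1] if entry else None

    def put(self, sql, tables, result):
        self._cache[sql] = (set(tables), result)

    def invalidate_table(self, table):
        # A single UPDATE/INSERT/DELETE on `table` empties all entries
        # referencing it -- the reason the hit rate was so low.
        self._cache = {sql: e for sql, e in self._cache.items()
                       if table not in e[0]}

cache = QueryCache()
cache.put("SELECT * FROM users WHERE id=1", ["users"], [(1, "alice")])
assert cache.get("SELECT * FROM users WHERE id=1") == [(1, "alice")]
cache.invalidate_table("users")   # any write to users drops the entry
assert cache.get("SELECT * FROM users WHERE id=1") is None
```

On a frequently updated table, cached results rarely survive long enough to be reused, which is the behavior that led to the feature's removal.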
Analyzer
The analyzer first performs lexical analysis, identifying what each token in the SQL string is and what it represents, and then parsing, which determines whether the statement you typed satisfies MySQL syntax.
Optimizer
The optimizer decides how the statement will be executed: for example, which index to use and the join order of tables, producing an execution plan.
Executor
The executor checks execution privileges, then calls the storage engine's interfaces according to the plan and returns the result to the client.
The storage engine layer is responsible for data storage and extraction. Its architecture mode is plug-in, supporting multiple storage engines such as InnoDB, MyISAM, Memory and so on. The most commonly used storage engine today is InnoDB, which has been the default storage engine since MySQL version 5.5.5.
A Select statement execution flow
The above figure takes the InnoDB storage engine as an example, and the processing process is as follows:
The user sends a request to Tomcat, which obtains a database connection through its connection pool (paired with MySQL's connection handling) and then sends the SQL statement to MySQL over that connection
MySQL has a separate listening thread that reads the request data and gets the SQL statement requested in the connection
Send the acquired SQL data to the SQL interface to execute
The SQL interface sends SQL to the SQL parser for parsing
The parsed SQL is sent to the query optimizer, which chooses the optimal query plan and hands it to the executor.
Following the optimized execution plan, the executor calls the storage engine's interfaces in the prescribed order and steps.
For example, the executor might first call a storage engine interface to fetch the first row of the "users" table and check whether its "id" field equals the value we expect; if not, it calls the interface again to fetch the next row. In this way the executor works through the execution plan generated by the optimizer, repeatedly calling storage engine interfaces to carry out the SQL statement, which by and large means updating or fetching data.
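The loop just described can be sketched as follows. This is a simplified illustration, not InnoDB's real interface: the engine exposes "first row / next row" calls and the executor applies the WHERE condition itself, as happens when no usable index exists:

```python
# Toy storage engine exposing a row-at-a-time interface, and an executor
# loop that filters rows against a predicate (the WHERE condition).

class StorageEngine:
    def __init__(self, rows):
        self._rows = rows

    def first_row(self):
        return self._rows[0] if self._rows else None

    def next_row(self, pos):
        return self._rows[pos + 1] if pos + 1 < len(self._rows) else None

def execute_select(engine, predicate):
    result, pos = [], 0
    row = engine.first_row()
    while row is not None:
        if predicate(row):              # executor checks the condition
            result.append(row)
        row = engine.next_row(pos)      # ask the engine for the next row
        pos += 1
    return result

users = StorageEngine([{"id": 1}, {"id": 2}, {"id": 3}])
assert execute_select(users, lambda r: r["id"] == 2) == [{"id": 2}]
```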
Several issues are involved here:
What on earth is a MySQL driver?
Take java as an example. If we want to access a MySQL database in a Java system, we must add a MySQL driver to the dependency of the system, for example, in Maven.
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.46</version>
</dependency>
So what exactly is this MySQL driver? At the bottom layer, the driver establishes a network connection with the database; once that connection exists, requests can be sent to the database server. The driver is what lets a system written in that language access the database, as shown in the figure below
What exactly is database connection pooling for?
Suppose a web service written in Java is deployed on Tomcat. Tomcat processes requests concurrently with multiple threads, so creating only one database connection is out of the question (multiple requests fighting over one connection would be hopelessly inefficient).
What if each request created its own database connection? That is also very bad: establishing a database connection is time-consuming, so creating a connection, executing the SQL statement, and destroying the connection again, over and over, is a serious performance problem.
So a database connection pool is generally used: several database connections are kept in a pool, multiple threads use different connections to execute SQL statements, and after a statement finishes the connection is returned to the pool rather than destroyed, ready for later reuse. Such a pool lets multiple threads use multiple database connections concurrently while avoiding the cost of tearing connections down after each use.
What is the connection pool of the MySQL database used for?
The function of MySQL database connection pool is the same as that of java application connection pool, which plays the role of multiplexing connections.
InnoDB storage engine
A brief Analysis of InnoDB Architecture
As can be seen from the figure, the InnoDB storage engine consists of memory pools, background threads and disk files.
Here's another picture that highlights the point:
InnoDB storage engine part 1: memory structure
Buffer Pool buffer pool
The InnoDB storage engine is disk-based and manages its records in pages, but because of the gap between CPU speed and disk speed, disk-based database systems usually use a buffer pool to improve overall performance.
On a read, the page read from disk is placed in the buffer pool. The next time the same page is read, the engine first checks whether it is in the buffer pool: if it is (a buffer pool hit), the page is read directly from there; otherwise it is read from disk.
For modifications, pages in the buffer pool are changed first and flushed to disk at a certain frequency. The flush is not triggered on every update; instead pages are written back through a mechanism called Checkpoint. The size of the buffer pool therefore directly affects overall database performance. It can be set with the configuration parameter innodb_buffer_pool_size and defaults to 128MB, which is on the small side; on, say, a 16-core 32GB machine you could give the buffer pool 2GB.
Because the buffer pool is not infinite, as data pages are continually loaded into it, it eventually fills up, and some cache pages must be evicted. The eviction method is the least recently used (LRU) algorithm: an LRU linked list is introduced that records which cache pages were least recently used, so when a cache page must be freed and flushed to disk, the least recently used one can be chosen from the list.
The data page types cached in the buffer pool are: index page, data page, undo page, insert buffer, adaptive hash index, lock information stored by InnoDB and data dictionary information.
Data page and index page
Page is the most basic structure of Innodb storage and the smallest unit of Innodb disk management. All the content related to the database is stored in the Page structure. There are several types of Page, and data pages and index pages are the two most important types.
Insert buffer (Insert Buffer)
With the InnoDB engine, inserts generally need to follow primary key order to achieve the best insert performance. When a table has a non-clustered, non-unique index, the data pages are still stored in primary key order on insert, but inserts into the leaf nodes of the non-clustered index are no longer sequential. Discrete access to the non-clustered index pages is then required, and insert performance degrades because of these random reads.
So the InnoDB storage engine pioneered the Insert Buffer design: a non-clustered index insert or update is not always applied directly to the index page. Instead, the engine first checks whether the target non-clustered index page is in the buffer pool; if it is, the entry is inserted directly, and if not, the entry is first placed into an Insert Buffer object. This is a bit of a trick: the non-clustered index entry appears to have been inserted into its leaf node, but it is actually stored in another location. Insert Buffer entries are then merged with the secondary index pages at a certain frequency and under certain conditions, and multiple inserts can usually be merged into one operation (since they target a single index page), which greatly improves non-clustered index insert performance.
However, the use of Insert Buffer needs to meet the following two conditions:
The index is a secondary index (secondary index)
The index is not unique.
When the above two conditions are met, the InnoDB storage engine uses Insert Buffer, which improves the performance of the insert operation. However, consider a situation where the application does a lot of insertions, which involve non-unique nonclustered indexes, that is, using Insert Buffer. If the MySQL database is down at this time, a large number of Insert Buffer will not be merged into the actual nonclustered index.
So recovery can take a long time, even several hours in extreme cases. The secondary index cannot be unique because, when an entry is placed in the Insert Buffer, the database does not look up the index page to check the uniqueness of the inserted record; doing so would again cause discrete reads and the Insert Buffer would lose its purpose.
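The merging benefit can be sketched in a few lines (this is a conceptual model, not InnoDB's actual implementation): entries destined for uncached index pages are parked, and a later merge performs one page write no matter how many entries accumulated.

```python
# Toy Insert Buffer: buffer secondary-index entries for pages that are
# not in the buffer pool, then merge them page by page.

from collections import defaultdict

class InsertBuffer:
    def __init__(self, pages_in_pool):
        self.pages_in_pool = pages_in_pool      # ids of cached index pages
        self.pending = defaultdict(list)        # page id -> buffered entries
        self.page_writes = 0                    # discrete page IOs performed

    def insert(self, page_id, entry):
        if page_id in self.pages_in_pool:
            self.page_writes += 1               # page is cached: apply now
        else:
            self.pending[page_id].append(entry) # park it, no random read

    def merge(self):
        # One write per index page, however many entries were buffered.
        self.page_writes += len(self.pending)
        self.pending.clear()

ib = InsertBuffer(pages_in_pool={1})
for i in range(9):
    ib.insert(page_id=7, entry=i)   # 9 inserts aimed at one uncached page
ib.merge()
assert ib.page_writes == 1          # merged into a single page write
```

Without buffering, the same 9 inserts would have cost up to 9 discrete page accesses; the model also shows why a crash with a large `pending` backlog makes recovery slow.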
You can view information about insertion buffering by using the command SHOW ENGINE INNODB STATUS
Seg size shows the current size of the Insert Buffer: 11336 × 16KB, about 177MB. free list len is the length of the free list, and size is the number of pages already merged. The second line of the bold section is what really concerns users, because it shows the improvement in insert performance. Inserts is the number of records inserted; merged recs is the number of inserted records merged; merges is the number of merge operations, i.e. the number of pages actually read. merges : merged recs is about 1:3, which means the Insert Buffer reduced the discrete logical IO requests against non-clustered index pages by roughly two-thirds.
As mentioned earlier, one problem with the Insert Buffer is that in write-intensive workloads it takes up too much buffer pool memory: by default it can occupy up to 1/2 of the buffer pool. The following is the initialization of the Insert Buffer in the InnoDB storage engine source code:
Change Buffer
InnoDB introduced the Change Buffer in version 1.0.x; it can be regarded as an upgraded version of the Insert Buffer. With it, the InnoDB storage engine can buffer all DML operations - INSERT, DELETE, and UPDATE - via the Insert Buffer, Delete Buffer, and Purge Buffer respectively. As before, the objects to which the Change Buffer applies are still non-unique secondary indexes.
UPDATE a record can be done in two processes:
Mark the record as deleted
Actually delete the record
The Delete Buffer corresponds to the first process of the UPDATE operation, marking the record as deleted; the Purge Buffer corresponds to the second process, actually deleting the record. The InnoDB storage engine also provides the parameter innodb_change_buffering to enable the various buffer options. Its possible values are inserts, deletes, purges, changes, all, and none. inserts, deletes, and purges are the three cases discussed above; changes enables inserts and deletes; all enables everything; none disables buffering. The default value is all.
Starting with the InnoDB1.2.x version, the maximum amount of memory used by Change Buffer can be controlled by the parameter innodb_change_buffer_max_size:
mysql> SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| innodb_change_buffer_max_size | 25    |
+-------------------------------+-------+
1 row in set (0.05 sec)
innodb_change_buffer_max_size defaults to 25, indicating that at most 1/4 of the buffer pool memory may be used.
It is important to note that the maximum valid value of this parameter is 50. In the MySQL5.5 version, you can observe something similar to the following through the command SHOW ENGINE INNODB STATUS:
Here you can see merged operations and discarded operations, with the counts of each operation in the Change Buffer shown beneath them. insert refers to the Insert Buffer; delete mark to the Delete Buffer; delete to the Purge Buffer. discarded operations counts merges where the table had already been dropped, so the records no longer needed to be merged into the secondary index.
Adaptive hash indexing
According to the frequency and pattern of access, InnoDB will establish a hash index for hot pages to improve query efficiency. The InnoDB storage engine monitors queries on each index page on the table, and builds a hash index if it is observed that the establishment of a hash index can lead to a speed increase, so it is called an adaptive hash index.
Adaptive hash indexing is built from the B+ tree pages of the buffer pool, so it is fast to build, and there is no need to hash the entire data table. It has a requirement that the continuous access mode for this page must be the same, that is, the conditions of its query must be exactly the same, and must be continuous.
Lock Information (lock info)
We all know that the InnoDB storage engine locks table data at the row level, but when InnoDB opens a table, it adds a corresponding object to the data dictionary.
Data dictionary
A collection of meta-information about data, library objects, table objects, and so on in a database. In MySQL, the content of data dictionary information includes table structure, database name or table name, data type of field, view, index, table field information, stored procedure, trigger and so on. MySQL INFORMATION_SCHEMA library provides metadata, statistical information, and access information about MySQL Server (for example, database name or table name, data type and access rights of fields, etc.). The information stored in this library can also be called MySQL's data dictionary.
Pre-reading mechanism
MySQL's read-ahead mechanism means that when a data page is loaded from disk, adjacent data pages may be loaded into the cache along with it.
For example, suppose there are two free cache pages. When one data page is loaded, an adjacent data page is loaded along with it, and each data page is placed into a free cache page.
Under what circumstances will the pre-reading mechanism of MySQL be triggered?
One parameter is innodb_read_ahead_threshold, whose default value is 56: if the number of data pages of an extent accessed sequentially exceeds this threshold, the read-ahead mechanism is triggered and all data pages of the next adjacent extent are loaded into the cache.
If 13 consecutive data pages of an extent are already cached in the Buffer Pool and are accessed frequently, read-ahead is triggered directly and all the remaining data pages of that extent are loaded into the cache. This random read-ahead is controlled by the parameter innodb_random_read_ahead, which defaults to OFF, i.e. the rule is disabled.
So by default, the first rule may trigger the read-ahead mechanism to load data pages from many adjacent regions into the cache at once.
The benefit of read-ahead is performance. Suppose you read data page 01 into a cache page and are likely to go on to read the adjacent data page 02; with read-ahead, reading page 02 no longer requires another disk IO.
So to optimize performance, MySQL designed the read-ahead mechanism: if, within an extent, you read many data pages sequentially, say pages 01 through 56, MySQL judges that you will probably continue reading sequentially and loads a batch of subsequent data pages (for example pages 57 through 72) into the Buffer Pool in advance.
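The linear read-ahead rule above can be sketched as a small function. This is a simplification of InnoDB's actual logic; the 56-of-64 numbers mirror the defaults mentioned in the text, and the function name is illustrative:

```python
# Decide whether a sequential run of page reads should trigger
# prefetching of the next extent (64 pages per extent, threshold 56).

def pages_to_prefetch(read_pages, threshold=56, extent=64):
    """If the tail of `read_pages` is a sequential run longer than
    `threshold`, return the page numbers of the next extent; else []."""
    run = 1
    for a, b in zip(read_pages, read_pages[1:]):
        run = run + 1 if b == a + 1 else 1   # length of the trailing run
    if run > threshold:
        start = (read_pages[-1] // extent + 1) * extent
        return list(range(start, start + extent))
    return []

# Reading pages 0..56 sequentially (57 pages > 56) triggers prefetch of
# the next extent, pages 64..127; scattered reads trigger nothing.
assert pages_to_prefetch(list(range(57))) == list(range(64, 128))
assert pages_to_prefetch([3, 9, 11]) == []
```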
Buffer pool memory management
Here you need to know three linked lists (Free List, Flush List, LRU List)
Free List: data pages on disk and cache pages correspond one to one; both are 16KB, one data page per cache page. The database maintains a free list for the Buffer Pool, a doubly linked list in which each node is the address of the control block describing a free cache page; whenever a cache page is free, its control block is on the free list. When the database starts, all cache pages may be free (an empty database has no data yet), so the control blocks of all cache pages are on the free list. The free list also has a base node that points to the head and tail of the list and records how many control blocks, i.e. how many free cache pages, the list contains.
Flush List: similar in structure to the free list, the flush list uses two pointers in each cache page's control block to link the control blocks of modified cache pages into a doubly linked list. Any modified cache page has its control block added to the flush list; these are the dirty pages, which will later be flushed to disk.
LRU List: since the buffer pool has a fixed size, the number of free cache pages on the free list is fixed too. As data pages on disk keep being loaded into free cache pages, the free list keeps shrinking, and sooner or later no free cache pages remain, so some cache pages must be evicted. Which ones? That depends on cache hit rate: frequently used pages stay, rarely used pages can be evicted. The LRU list is introduced to determine which cache pages are rarely used.
What is the elimination strategy of LRU linked list?
Suppose we load a data page from disk into a cache page and put that cache page's control block at the head of the LRU list. As long as a cache page holds data it stays on the LRU list: newly loaded pages are placed at the head, older pages drift toward the tail, any query on a page moves it back to the head, and when eviction is needed the page at the tail is the one evicted.
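This naive strategy can be expressed compactly with an ordered dictionary (a sketch of the textbook LRU algorithm, not InnoDB code):

```python
# Naive LRU: new and freshly accessed pages go to the head,
# eviction always takes the page at the tail.

from collections import OrderedDict

class NaiveLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()      # leftmost = most recently used

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page, last=False)   # hit: move to head
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=True)              # evict the tail
        self.pages[page] = True
        self.pages.move_to_end(page, last=False)       # insert at head

    def order(self):
        return list(self.pages)         # head first, tail last

lru = NaiveLRU(3)
for p in ["a", "b", "c"]:
    lru.access(p)
lru.access("a")          # "a" moves back to the head
lru.access("d")          # pool is full: tail page "b" is evicted
assert lru.order() == ["d", "a", "c"]
```

The two failure modes discussed next (read-ahead pollution and full table scans) both stem from this model treating every newly loaded page as if it were hot.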
But is this really all right?
In the first case, the pre-reading mechanism is destroyed.
The read-ahead mechanism loads adjacent data pages that were never requested: only one of the cache pages is actually accessed, yet both end up near the front of the LRU list, as shown in the figure below.
If at that point there are no free cache pages and a new data page must be loaded, the so-called "least recently used" cache page at the tail of the LRU list gets flushed to disk and freed, even though it may be a page that was genuinely in use, while the never-accessed read-ahead pages sit safely at the front. That is obviously unreasonable.
The second scenario may lead to the elimination of frequently accessed cache pages
The full table scan led him to load all the data pages in the table directly from disk to Buffer Pool. At this time, all the data pages of this table may be loaded into each cache page one by one! At this point, it is possible that a large list of cache pages in front of the LRU linked list are all cache pages loaded by a full table scan! What if the data in the table is hardly used after the full table scan? At this point, the tail of the LRU linked list may be full of cache pages that have been accessed frequently before! Then when you want to eliminate some cache pages to make room, you will eliminate the cache pages that have been accessed frequently at the end of the LRU linked list, leaving a large number of infrequently accessed cache pages loaded in the previous full table scan!
Optimizing LRU algorithm: designing LRU linked list based on the idea of separating hot and cold data
In designing the LRU list, MySQL actually adopts the idea of separating hot and cold data. The LRU list is divided into two parts, a hot data area and a cold data area, with the ratio controlled by the innodb_old_blocks_pct parameter, which defaults to 37, i.e. the cold area takes 37%. When a data page is first loaded into the cache, its cache page is placed at the head of the cold data area.
MySQL then applies a rule governed by the innodb_old_blocks_time parameter, whose default is 1000 (milliseconds): only if you access a cache page again at least 1 second after it was loaded is it moved to the head of the hot data area. The reasoning is that if you come back to a page a full second after loading it, you are likely to access it frequently later; pages touched only within that first second stay in the cold area.
In this way, the pre-read and full table scan data will only be in the cold data header and will not go into the hot data area at the beginning.
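A compact sketch of this hot/cold split follows (a conceptual model, not InnoDB code; the 1-second window mirrors the innodb_old_blocks_time default described above):

```python
# Midpoint LRU: new pages enter at the head of the cold region; only a
# second access at least 1 s after loading promotes a page to the hot head.

class MidpointLRU:
    def __init__(self):
        self.hot, self.cold = [], []     # index 0 = head of each region
        self.load_time = {}

    def load(self, page, now):
        self.cold.insert(0, page)        # first touch: cold head, not hot
        self.load_time[page] = now

    def access(self, page, now):
        if page in self.cold and now - self.load_time[page] >= 1.0:
            self.cold.remove(page)
            self.hot.insert(0, page)     # promoted after the 1 s window
        elif page in self.hot:
            self.hot.remove(page)
            self.hot.insert(0, page)     # normal LRU move within hot area

lru = MidpointLRU()
lru.load("prefetched", now=0.0)
lru.access("prefetched", now=0.5)        # too soon: stays in the cold area
lru.load("hot_page", now=0.0)
lru.access("hot_page", now=2.0)          # promoted to the hot area
assert lru.cold == ["prefetched"]
assert lru.hot == ["hot_page"]
```

Eviction from the cold tail then removes read-ahead and scan pages first, which is exactly the protection the two problem scenarios above call for.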
Extreme optimization of LRU algorithm
The access rule for the hot data area of the LRU list is also optimized: only when a cache page in the rear 3/4 of the hot data area is accessed is it moved to the head of the list; accessing a cache page in the front 1/4 of the hot data area does not move it.
For example, suppose the hot data area holds 100 cache pages. The first 25 are not moved to the head even when accessed, but any of the remaining 75 moves to the head as soon as it is accessed. This minimizes node movement within the list.
Time for LRU linked list to eliminate cache pages
When MySQL executes CRUD statements, it operates chiefly on cache pages and the corresponding linked lists. When the cache pages are full, some of them must be flushed to disk and emptied so that the required data pages can be loaded into the freed cache pages.
We already know that he eliminated cache pages according to the LRU linked list, so when did he flush the cache pages in the cold data area of the LRU linked list to disk? In fact, he has the following three opportunities:
Regularly flush some of the cache pages at the end of the LRU into the disk
The background thread runs a scheduled task that periodically flushes some cache pages at the tail of the cold data area of the LRU list to disk, empties them, and adds them back to the free list.
Regularly flush some cache pages from the flush linked list to disk.
It is not enough to simply flush the cache pages from the cold data area of the LRU linked list to disk, because many cache pages in the hot data area of the linked list may also be frequently modified, will they never be flushed to disk?
So this background thread also flushes the cache pages on the flush list to disk when MySQL is not too busy; one way or another, the data you have modified will be flushed to disk sooner or later.
As long as a wave of cache pages in the flush list is flushed to disk, the cache pages will also be removed from the flush list and lru list and added to the free list!
So the overall effect is to keep loading data into the cache page, constantly querying and modifying the cache data, then the cache page in the free linked list is constantly decreasing, the cache page in the flush linked list is constantly increasing, and the cache page in the lru linked list is constantly increasing and moving.
On the other hand, the background thread keeps flushing the cache pages in the cold data area of the LRU list and the cache pages on the flush list to disk and emptying them, so the pages on the flush list and LRU list decrease while the pages on the free list increase.
When the free list has no free cache pages
If every page on the free list has been used and a data page must be loaded from disk into a free cache page, a cache page is taken from the tail of the cold data area of the LRU list; it is bound to be the least frequently used one. It is flushed to disk, emptied, and the new data page is loaded into it.
To sum up the use of the three lists: as the Buffer Pool is used, data pages are constantly loaded from disk into its cache pages, and the free list, flush list, and LRU list all come into play. When data is loaded into a cache page, that page is removed from the free list and placed at the head of the cold data area of the LRU list.
If you then modify a cache page, the dirty page is recorded on the flush list, and the LRU list may move it from the cold data area to the head of the hot data area.
If you query a cache page, it is moved into the hot data area of the LRU list, or to the head of the hot data area.
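The interplay summarized above can be modeled in a few lines (a deliberately tiny toy: slot identities and the hot/cold split are omitted to keep the three-list flow visible):

```python
# Toy model of the free / flush / LRU interplay: loading consumes a free
# slot and enters the LRU; modifying marks a page dirty on the flush
# list; a background flush writes dirty pages out and recycles slots.

class BufferPool:
    def __init__(self, n):
        self.free = list(range(n))       # free cache-page slots
        self.lru = []                    # pages currently cached
        self.flush = []                  # dirty pages awaiting writeback

    def load(self, page):
        self.free.pop()                  # take one free slot
        self.lru.insert(0, page)

    def modify(self, page):
        if page not in self.flush:
            self.flush.append(page)      # mark dirty

    def background_flush(self):
        for page in self.flush:          # write back, then recycle slots
            self.lru.remove(page)
            self.free.append(0)          # slot identity irrelevant in toy
        self.flush = []

bp = BufferPool(3)
bp.load("p1"); bp.load("p2")
bp.modify("p1")
assert len(bp.free) == 1 and bp.flush == ["p1"]
bp.background_flush()
assert len(bp.free) == 2 and bp.flush == [] and bp.lru == ["p2"]
```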
Redo log Buffer redo log buffering
InnoDB has a buffer pool (bp for short), which is the cache for database pages. Any modification in InnoDB is first applied to the page in the bp; the page is then marked dirty and placed on a special flush list, and such pages are periodically written to disk (HDD or SSD) by the master thread or a dedicated flush thread.
The advantage is that this avoids the large amount of random IO that each individual write would otherwise cause: periodic flushing can merge multiple modifications to a page into one IO operation, and asynchronous writes also reduce access latency. However, if the server shuts down abnormally before dirty pages are flushed to disk, those modifications are lost; and if a write was in progress at the time, the data files may be corrupted and the database rendered unusable.
To avoid these problems, InnoDB writes all changes to pages into a special file and recovers from this file when the database starts; this file is the redo log file. The technique lets the flushing of bp pages be deferred, improving database throughput and effectively reducing access latency.
The costs are the extra overhead of writing the redo log (sequential IO, so quite fast) and the time needed to replay it when the database starts.
The redo log consists of two parts: the redo log buffer and the redo log file (described in the disk file section). InnoDB is a transactional storage engine: when a transaction commits, all of its log records must first be written to the redo log file; only when the commit completes is the whole transaction considered complete. Each write of the redo log buffer to the redo log file requires an fsync call, because the buffer contents are first written only to the operating system's buffer cache, with no guarantee they have reached disk. Disk performance therefore determines transaction commit performance to some extent (the redo log flush-to-disk mechanisms are described later).
The InnoDB storage engine first puts redo log records into the redo log buffer and then flushes them to the redo log file at a certain frequency. The redo log buffer generally does not need to be very large, because it is typically flushed to the log file every second. It is controlled by the configuration parameter innodb_log_buffer_size and defaults to 8MB.
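The buffer-then-fsync flow can be sketched as follows (a conceptual model of the commit path described above, with one fsync per commit; real InnoDB offers several flush policies):

```python
# Toy redo log: changes append records to an in-memory buffer; COMMIT
# writes the buffer to the log file and fsyncs, making them durable.

class RedoLog:
    def __init__(self):
        self.buffer = []                    # in-memory redo log buffer
        self.log_file = []                  # durable redo log file
        self.fsyncs = 0

    def append(self, record):
        self.buffer.append(record)          # cheap: memory only

    def commit(self):
        self.log_file.extend(self.buffer)   # write buffer to the file
        self.buffer.clear()
        self.fsyncs += 1                    # fsync: OS cache -> disk

log = RedoLog()
log.append("page 5: set id=1")
log.append("page 9: set name='a'")
log.commit()
assert log.log_file == ["page 5: set id=1", "page 9: set name='a'"]
assert log.buffer == [] and log.fsyncs == 1
```

Batching many record appends behind a single fsync at commit is what keeps redo logging cheap relative to flushing the modified 16KB data pages themselves.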
Double Write double write
If the Insert Buffer brings performance to the InnoDB storage engine, then Double Write brings reliability of data pages.
InnoDB's page size is generally 16KB, and its data checksums are computed over the whole 16KB; writes to disk are performed in units of a page. Since the file system in most cases does not write a large data page (such as InnoDB's 16KB) atomically, a server crash can leave a partial write: of the 16K of data, only 4K has been written when the power failure or OS crash occurs. This is the partial page write problem.
An experienced DBA might think that if a write failure occurs, MySQL can recover based on the redo log. This is an approach, but it must be clearly recognized that the redo log records physical changes to pages, such as "write 'aaaa' at offset 800". If the page itself is already damaged, there is no point in redoing it. During recovery, MySQL checks each page's checksum, which verifies the last transaction number recorded in the page. When a partial page write occurs, the page is corrupted and the transaction number in the page cannot be found; in InnoDB's view, such a data page cannot pass checksum validation and cannot be recovered. Even if we forced it past verification, we still could not recover from the crash, because some of the log record types in InnoDB are logical operations, which are not idempotent.
To solve this problem, InnoDB implements the doublewrite buffer. Simply put, before writing a data page to its final location, InnoDB first writes it to a separate physical area (in ibdata), and only then writes it to the data file. This way, if a data page is corrupted when the server crashes and restarts, the page is first restored from its copy before applying the redo log, and the redo log is then replayed on top of it. This is doublewrite: what it brings to the InnoDB storage engine is reliability of data pages. Let's analyze the technique.
As shown in the figure above, doublewrite consists of two parts: an in-memory doublewrite buffer, 2MB in size, and a contiguous area of 128 pages in the shared tablespace on disk, also 2MB in size. When flushing dirty pages from the buffer pool, InnoDB does not write them directly to the data files; it first copies the dirty pages into this memory area via the memcpy function, then writes them sequentially from the doublewrite buffer to the shared tablespace on disk in two chunks, and immediately calls fsync to synchronize to disk, avoiding the problems caused by operating-system buffered writes. After the doublewrite pages have been written, the pages in the doublewrite buffer are written to their respective tablespace files.
In this process the doublewrite write is sequential, so its overhead is small. After the doublewrite completes, writing the pages from the doublewrite buffer to their tablespace files is discrete (random) I/O.
If the operating system crashes while a page is being written to disk, then during recovery the InnoDB storage engine can find a copy of the page in the doublewrite area of the shared tablespace, copy it to the tablespace file, and then apply the redo log.
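Doublewrite is enabled by default; its state and activity can be observed with standard MySQL variables (a sketch for verification):

```sql
-- Confirm doublewrite is enabled
SHOW VARIABLES LIKE 'innodb_doublewrite';

-- Observe its activity: pages written through doublewrite vs. write operations
SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';
```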
InnoDB storage engine part 2: background threads
IO thread
InnoDB makes heavy use of AIO (Async IO) for read and write processing, which can greatly improve database performance. Before InnoDB 1.0 there were four IO threads: write, read, insert buffer, and log. Later versions increased the number of read threads and write threads to four each, for a total of 10:
- read thread: responsible for read operations, loading pages from disk into the buffer pool (4 threads)
- write thread: responsible for write operations, flushing dirty cache pages to disk (4 threads)
- log thread: responsible for flushing the contents of the log buffer to disk (1 thread)
- insert buffer thread: responsible for flushing the insert buffer to disk (1 thread)
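The read and write thread counts are configurable startup parameters; a quick way to confirm them:

```sql
-- Defaults are 4 each; set at startup (e.g. innodb_read_io_threads = 8 in my.cnf)
SHOW VARIABLES LIKE 'innodb_read_io_threads';
SHOW VARIABLES LIKE 'innodb_write_io_threads';
```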
Purge thread
After a transaction commits, the undo log it used is no longer needed, so the Purge Thread reclaims the undo pages that were allocated. The thread count can be checked with: show variables like '%innodb_purge_threads%';
Page Cleaner thread
Its function is to flush dirty pages to disk; once the dirty pages are flushed, the corresponding redo log can be overwritten, which both persists the data and allows redo log space to be recycled. It invokes the write threads to perform the actual writes. The thread count can be checked with: show variables like '%innodb_page_cleaners%';
InnoDB storage engine part 3: disk files
The main disk files of InnoDB are divided into three parts: the system tablespace, the user tablespaces, and the redo log files and archive files.
Files such as the binary logs (binlog) are maintained by the MySQL Server layer, so they are not counted among InnoDB's disk files.
System and user tablespaces
The system tablespace contains InnoDB data dictionaries (metadata and related objects) and storage areas for double write buffer, change buffer, and undo logs.
The system tablespace also contains table and index data created by any user in the system tablespace by default.
A system tablespace is a shared tablespace because it is shared by multiple tables.
The system tablespace consists of one or more data files. By default, a system data file named ibdata1 with an initial size of 10MB is created in the data directory of MySQL. Users can use innodb_data_file_path to configure the size and number of data files.
The format of innodb_data_file_path is as follows:

innodb_data_file_path = datafile1[,datafile2]...

Users can use multiple files to form a tablespace and define the attributes of each file at the same time:

innodb_data_file_path = /db/ibdata1:1000M;/dr2/db/ibdata2:1000M:autoextend
Here, the files /db/ibdata1 and /dr2/db/ibdata2 make up the system tablespace. If the two files reside on different disks, the disk load may be evenly distributed, thereby improving overall database performance. The file names are followed by attributes indicating that ibdata1 is 1000MB in size, and that ibdata2 is 1000MB in size and can grow automatically (autoextend) when the space runs out.
Once innodb_data_file_path is set, the data of all tables based on the InnoDB storage engine is recorded in that system tablespace; if the parameter innodb_file_per_table is set, a separate user tablespace is generated for each InnoDB table.
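Both settings can be inspected with SHOW VARIABLES; a sketch, assuming a modern MySQL where innodb_file_per_table is dynamic (5.6+):

```sql
-- Inspect the shared tablespace layout and the per-table setting
SHOW VARIABLES LIKE 'innodb_data_file_path';
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- innodb_file_per_table can be toggled at runtime;
-- innodb_data_file_path can only be set at startup in my.cnf
SET GLOBAL innodb_file_per_table = ON;
```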
The naming convention for user tablespaces is tablename.ibd. In this way, users do not have to store all data in the default system tablespace; however, a user tablespace stores only the table's data, indexes, and insert buffer bitmap, while the rest of the information is still stored in the default system tablespace.
The following figure shows how the InnoDB storage engine stores files, where frm files are table structure definition files that record table structure definitions for each table.
Redo log files (redo log file) and archive files
By default, there are two files named ib_logfile0 and ib_logfile1 under the data directory of the InnoDB storage engine. These are InnoDB's redo log files, which record the transaction log for the InnoDB storage engine.
Redo log files come in handy when errors occur in InnoDB's data storage files. The InnoDB storage engine can use redo log files to restore the data to the correct state to ensure the correctness and integrity of the data.
Each InnoDB storage engine has at least one redo log file group, and each file group contains at least two redo log files, such as the default ib_logfile0 and ib_logfile1.
In order to achieve higher reliability, users can set up multiple mirror log groups and put different filegroups on different disks to improve the high availability of redo logs.
Each redo log file in the log group is of the same size and runs as a circular write. The InnoDB storage engine first writes redo log file 1, switches to redo log file 2 when the file is full, and then switches to redo log 1 when redo log file 2 is also full.
Users can use innodb_log_file_size to set the size of each redo log file, which has a great impact on the performance of the InnoDB storage engine.
If the redo log files are set too large, recovery after a crash may take a long time; on the other hand, if they are set too small, dirty pages are frequently flushed to disk by checkpoint checks, resulting in performance jitter.
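The relevant sizing parameters can be reviewed as follows (both are startup-only settings):

```sql
-- Size of each redo log file, and how many files make up the group
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';
```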
The flushing mechanism of the redo log
InnoDB follows the WAL (Write-Ahead Logging) and Force-log-at-commit rules when flushing data files and log files; together they ensure the durability of transactions. WAL requires that before a data change is written to disk, the corresponding log in memory must be written to disk first; Force-log-at-commit requires that when a transaction commits, all logs it generated must be flushed to disk. If the log has been flushed successfully but the database crashes before the data in the buffer pool is flushed to disk, the database can recover the data from the log when it restarts.
As shown in the figure above, when InnoDB changes data in the buffer pool, it will first write the relevant changes to the redo log buffer, and then write to disk on time (such as refresh per second) or when the transaction commits, which is in line with the Force-log-at-commit principle; when the redo log is written to disk, the change data in the buffer pool will be written to disk according to the checkpoint mechanism, which conforms to the WAL principle.
In the checkpoint timing mechanism, there is a judgment that the redo log file is full, so, as mentioned earlier, if the redo log file is too small and often full, it will frequently cause checkpoint to write the changed data to disk, resulting in performance jitter.
The operating system's file system has its own cache: when InnoDB writes data to disk, the write may land only in the file system cache, with no guarantee that it is truly durable.
InnoDB's innodb_flush_log_at_trx_commit attribute controls InnoDB's behavior each time a transaction commits. When the value is 0, the commit does not write the redo log to the file; it waits for the master thread's periodic write instead. When the value is 1, the commit writes the redo log to the file system cache and calls the file system's fsync to actually write the data from the file system buffer to disk storage, ensuring no data is lost. When the value is 2, the commit also writes the log to the file system cache, but fsync is not called; the file system decides when to write its cache to disk.
The flushing mechanism of the log is shown in the following figure:
innodb_flush_log_at_trx_commit is a basic InnoDB performance-tuning parameter, involving InnoDB's write efficiency and data safety. When the value is 0, write efficiency is highest but data safety is lowest; when the value is 1, write efficiency is lowest but data safety is highest; when the value is 2, both are at a medium level. It is generally recommended to set the value to 1 for higher safety; only with a value of 1 is the durability of transactions guaranteed.
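The parameter is dynamic, so it can be verified and adjusted on a running server:

```sql
-- Check the current setting (default is 1, the safest)
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Switch to full durability at every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```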
Use a UPDATE statement to learn about the InnoDB storage engine
With the above introduction to the architecture of the InnoDB storage engine in hand, let's analyze the specific process of an UPDATE statement.
Let's divide this picture into two parts, the upper part is the MySQL Server layer processing flow, and the lower part is the MySQL InnoDB storage engine processing flow.
MySQL Server layer processing flow
This part of the processing flow is independent of the storage engine; it is handled by the Server layer. The specific steps are as follows:
User actions in the application trigger SQL execution on the backend. Through the database connection pool included in the web project (such as dbcp, c3p0, or druid), a network connection is established with the connection pool of the database server.
After a thread in the database connection pool receives the request, it passes the received SQL statement through the SQL interface to the query parser. The query parser parses, according to SQL syntax, which fields of which table the SQL queries and what the query conditions are.
Then, through the query optimizer, the optimal execution plan for the SQL is selected.
Then the executor is responsible for calling a series of interfaces of the storage engine to execute the plan and complete the execution of the entire sql statement.
This part of the process is basically consistent with the SELECT request processing flow analyzed above.
InnoDB Storage engine processing flow
The actual statement execution is completed by the storage engine, as shown in the figure above:
To update the row with id=10 in the users table: if this data is not in the buffer pool, the original row to be updated must first be loaded from disk into the buffer pool.
At the same time, to keep concurrent updates safe, this row is first locked to prevent other transactions from updating it.
Then the pre-update values are backed up and written to the undo log (to facilitate rolling back to the old data if the transaction is rolled back). For an update statement, the values of the fields before the update are stored.
The cached data in the buffer pool is then updated to the latest values, at which point the data in memory becomes dirty (the data in memory is inconsistent with the data on disk).
This completes the execution process in the buffer pool (as shown in the figure above).
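As a concrete running example for the steps above (the table and values are hypothetical, echoing the id=10 row used in this walkthrough):

```sql
-- Hypothetical transaction driving the flow described in this section
BEGIN;
UPDATE users SET name = 'xxx' WHERE id = 10;  -- old value goes to the undo log,
                                              -- new value dirties the buffer pool page
COMMIT;                                       -- triggers the redo log flush policy below
```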
After updating the data in the buffer pool, the update information needs to be written, in order, to the Redo Log Buffer, because the data in memory has been modified while the data on disk has not. If the machine running MySQL went down at this point, the modifications in memory would inevitably be lost. The redo log records what changes you made to the data, for example "change the value of the name field to xxx for the row with id=10"; it is the log used to recover your updated data if MySQL crashes suddenly. Note, however, that at this point the redo log has not yet been flushed to the log file.
At this point, consider a question: what if the MySQL goes down when the transaction has not been committed?
So far we have modified the in-memory data and recorded the change in the Redo Log Buffer. If MySQL crashed now, both the in-memory data and the Redo Log Buffer data would be lost, but this loss does not matter: since the update statement did not commit its transaction, it was never considered successful. MySQL going down at this point loses all the in-memory changes, while the data on disk remains unchanged.
Next, when the transaction is committed, the redo log is flushed from the redo log buffer to the disk file according to a certain policy, configured through innodb_flush_log_at_trx_commit.
innodb_flush_log_at_trx_commit=0 means that committing does not flush the data in the redo log buffer into the disk file. In this case you may have committed the transaction, yet a mysql crash would lose all the data in memory, so this approach is not advisable.
innodb_flush_log_at_trx_commit=1: the redo log is flushed from memory into the disk file; as long as the transaction commits successfully, the redo log is guaranteed to be on disk, so if MySQL crashes at this point, the data can be recovered from the redo log.
innodb_flush_log_at_trx_commit=2: when committing a transaction, the redo log is written to the os cache corresponding to the disk file rather than directly into the disk file; the data in the os cache may only reach the disk file about a second later.
When a transaction commits, the binlog is also written, and it too has different flushing strategies. The sync_binlog parameter controls binlog flushing; its default value is 0, which means a binlog write actually goes into the os cache rather than directly into the disk file. To ensure no data is lost, the "double 1" policy is usually configured: set the flushing parameter for both the redo log and the binlog to 1.
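The double-1 policy maps to two dynamic parameters:

```sql
-- "Double 1": fsync both logs at every transaction commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- redo log durability
SET GLOBAL sync_binlog = 1;                     -- binlog durability
```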
After the binlog is flushed, the binlog file name, its path information, and a commit mark are written to the redo log in sequence. The point of this step is to keep the redo log consistent with the binlog. The commit mark is the key criterion for judging whether a transaction committed successfully: for example, if MySQL crashes right after step 5 or step 6 succeeds, the transaction is judged unsuccessful, because the final commit mark was never written to the redo log. It will never happen that the redo log file contains this update's log while the binlog file does not, so no data inconsistency arises.
After all the preceding work, the in-memory data has been modified, the transaction has been committed, and the logs have been flushed, but the data on disk has not yet been modified accordingly. An IO thread in the background of the InnoDB storage engine will, when database load is low, flush to disk the buffer pool data that transactions have updated but that has not yet been written to disk (dirty pages, because the disk data and the in-memory data are no longer consistent), completing the durability of the transaction.
So the InnoDB write-processing flow can be represented by the following figure.
At this point, the study on "what is the knowledge of mysql architecture and InnoDB storage engine" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!