
Relearning MySQL Database 2: "Simple" MySQL and InnoDB

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article is from the Internet.

This series of articles is collected in my Java Interview Guide repository on GitHub; please check out the repository for more content.

https://github.com/h3pl/Java-Tutorial

If you like it, please give it a Star.

The article was first posted on my personal blog:

www.how2playlife.com

This article is part of the "Relearning MySQL Database" series from the WeChat official account [Java Technology Jianghu]. Part of it comes from the Internet: to explain the topic clearly and thoroughly, it integrates a number of technical blog posts that I consider good and quotes from some of them. If there is any infringement, please contact the author.

This series of posts takes you from basics to advanced topics: from basic SQL usage, through MySQL execution engines, indexes, transactions and related knowledge, to the implementation principles behind MySQL. Building on that, you will learn how to optimize SQL, reduce execution time, analyze performance through execution plans, and finally handle master-slave replication and deployment, so that you gain a complete view of the MySQL technical system and form your own knowledge framework.

If you have any suggestions or questions about this series, you can contact the author through the official account [Java Technology Jianghu]. You are welcome to take part in creating and revising this series of posts.

As a developer, I inevitably work with databases day to day, whether the file-based SQLite or MySQL and PostgreSQL, which are widely used in engineering. Yet I never had a clear, systematic understanding of databases, so over the last two months I read several database books along with the official MySQL documentation. I hope this article helps those who use databases but do not yet understand them.

This article introduces and examines databases in terms of MySQL; where the content or implementation of other databases is involved, it is pointed out separately.

Definition of database

At the beginning, many developers have only a vague idea of what a database is, thinking of it as a collection of data, but it is actually more complicated than that. There are two terms in the database field that are very easy to confuse: the database and the instance.

Database: a collection of physical operating system files or other forms of file types.
Instance: the MySQL background threads together with a shared memory area.

The definitions of database and instance come from the book MySQL Technical Insider: The InnoDB Storage Engine; readers who want to learn more about the InnoDB storage engine can read it.

Database and instance

In MySQL, instances and databases usually correspond one to one. We cannot operate on the database files directly; we must go through a database instance, which can be understood as the interface the database provides to the upper layers.

On Unix, starting a MySQL instance usually produces two processes: mysqld is the real database server daemon, while mysqld_safe is a control program that checks and sets up the startup of mysqld and monitors the running MySQL process. When mysqld encounters an error, mysqld_safe checks its state and restarts it under appropriate conditions.

Architecture of MySQL

More than 20 years have passed since the first version of MySQL was released, and over those years of development and evolution the architecture of the whole application has grown more and more complex:

The top layer, which handles connections and threading, was not invented by MySQL; many services have similar components. The second layer contains most of MySQL's core services, including the parsing, analysis, optimization and caching of SQL; stored procedures, triggers and views are implemented here. The third layer is the storage engine, the part of MySQL actually responsible for storing and extracting data, such as InnoDB and MyISAM; the storage engine discussed in this article is InnoDB.

Storage of data

Throughout the database architecture, we can use different storage engines to store data, and most storage engines store data in binary form; this section describes how data is stored in InnoDB.

In the InnoDB storage engine, all data is logically stored in tablespaces. The tablespace (tablespace) is the highest logical storage unit in the storage engine; beneath it are segments (segment), extents (extent), and pages (page):

All tablespaces of the same database instance have the same page size; by default the page size is 16KB, though you can change the default with the innodb_page_size option. It is important to note that different page sizes lead to extents containing different numbers of pages:

As you can see from the figure, in the InnoDB storage engine, the minimum size of an extent is 1MB, and the number of pages is at least 64.
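The rule above can be sketched numerically. The helper below is a hypothetical illustration of the "an extent is at least 1MB and holds at least 64 pages" constraint described in the text, not InnoDB source code:

```python
# Sketch: pages per extent for different innodb_page_size values.
# Hypothetical helper based on the rule above: an extent is at least
# 1 MB and holds at least 64 pages.
EXTENT_MIN_BYTES = 1 * 1024 * 1024
MIN_PAGES_PER_EXTENT = 64

def pages_per_extent(page_size_bytes):
    # Enough pages to reach 1 MB, but never fewer than 64 pages.
    return max(EXTENT_MIN_BYTES // page_size_bytes, MIN_PAGES_PER_EXTENT)

print(pages_per_extent(16 * 1024))  # default 16KB pages -> 64
print(pages_per_extent(8 * 1024))   # 8KB pages -> 128
print(pages_per_extent(4 * 1024))   # 4KB pages -> 256
```

With the default 16KB pages an extent holds exactly 64 pages (1MB); halving the page size doubles the page count, while pages larger than 16KB are still grouped 64 to an extent, producing extents larger than 1MB.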

How to store tables

When MySQL uses InnoDB to store tables, information such as table definitions and data indexes are stored separately, with the former in .frm files and the latter in .ibd files, which are described in this section.

.frm file

No matter which storage engine is selected in MySQL, every table creates a .frm file on disk describing the table's format or definition; the format of the .frm file is the same across platforms.

CREATE TABLE test_frm (
    column1 CHAR(5),
    column2 INTEGER
);

When we use the above code to create a table, we generate a test_frm.frm file in the datadir folder on disk that contains information about the table structure:

The article 11.1 MySQL .frm File Format in the official MySQL documentation has a very detailed description of the binary content in the .frm file format, so I won't cover it here.

.ibd file

InnoDB stores data in two kinds of files. The first is the system tablespace, consisting of files such as ibdata1 and ibdata2, which holds InnoDB system information plus user database table data and indexes; it is shared by all tables.

When the innodb_file_per_table option is turned on, the .ibd file is the tablespace unique to each table, and the file stores the data for the current table and related index data.

How to store records

Like most existing storage engines, InnoDB uses the page as the minimum unit of disk management. Data in the InnoDB storage engine is stored by row, and each 16KB page is allowed to hold at most 16KB/2 − 200 rows, that is, 7992 row records.

When InnoDB stores data, it can be stored in different row formats; MySQL version 5.7 supports row storage in the following formats:

Antelope is the original InnoDB file format; it contains the two row formats Compact and Redundant, and it had no name at first. The name Antelope appeared only after the newer file format Barracuda emerged; Barracuda introduced two new row formats, Compressed and Dynamic. InnoDB file formats are forward compatible, and the official documentation pre-defines names for future file formats: Cheetah, Dragon, Elk and so on.

The two row record formats, Compact and Redundant, are stored on disk as follows:

The biggest difference between the Compact and Redundant formats lies in the first part of the record: in Compact, a row record begins with the lengths (Length) of the row's columns stored in reverse order, while Redundant stores the offset (Offset) of each column. Overall, the Compact row format reduces storage space by about 20% compared with Redundant.
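The reverse-order length list can be sketched as follows; encode_compact is a hypothetical, much-simplified illustration of the idea, not the real on-disk Compact format:

```python
# Sketch of the idea behind Compact's column-length list: the byte
# lengths of the columns are stored in reverse column order at the
# start of the record. Hypothetical encoding for illustration only.
def encode_compact(values):
    lengths = [len(v) for v in values]
    header = bytes(reversed(lengths))   # column lengths, reversed
    body = b"".join(v for v in values)  # column data in order
    return header + body

record = encode_compact([b"alice", b"smith", b"30"])
print(list(record[:3]))  # [2, 5, 5]: lengths of the columns, reversed
```

Reading the header backwards recovers each column's length, which lets the engine slice out any column without per-column offsets.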

Row overflow data

When InnoDB stores very long VARCHAR or BLOB large objects in the Compact or Redundant format, it does not store the entire content in the data page node; instead it stores the first 768 bytes of the row data in the data page and points to the overflow page with an offset.

But when we use the new row record format Compressed or Dynamic, only 20-byte pointers are stored in the row records, and the actual data is stored in the overflow page.

Of course, in actual storage, TEXT and BLOB columns of different lengths may be optimized, but this is not the focus of this article.

To learn more about the data formats recorded in the InnoDB storage engine, read InnoDB Record Structure

Data page structure

The page is the smallest disk unit managed by the InnoDB storage engine, and the B-Tree node is the page that actually stores the data in the table. Here we will describe how the page organizes and stores records. First, an InnoDB page has the following seven parts:

Each page contains two pairs of header/trailer: the inner pair, Page Header/Page Directory, is concerned with the page's status information, while the outer pair, Fil Header/Fil Trailer, records the page's file header information.

Between the head and tail of the page lie the user records and the free space. Each data page contains two virtual records, Infimum and Supremum (which can be understood as placeholders): Infimum is smaller than any primary key value on the page, and Supremum is larger than any primary key value on the page:

User Records is the part of the page that actually stores row records, and Free Space is the free space; the records form a linked-list structure. To keep insertion and deletion efficient, the page does not keep all records physically sorted by primary key; a new record simply takes the first blank slot found from left to right, so row records are not ordered in physical storage. Their logical order is maintained by the next_record pointer.

When looking up a record, the B+ tree does not take you straight to the row; it only locates the page containing the record. The database loads the whole page into memory and then finds the record through the sparse index stored in the Page Directory together with the n_owned and next_record attributes. Because this last step happens in memory, its cost is usually ignored.
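That in-memory search can be sketched as below; the slot layout and the names directory and page_lookup are hypothetical simplifications of the Page Directory mechanism:

```python
# Sketch of an in-page lookup: binary-search a sparse "page directory"
# of slots first, then follow next_record pointers to the row,
# mirroring the description above. Names are hypothetical.
import bisect

records = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e")]  # logical (key, row) order
directory = [0, 2, 4]  # sparse slots pointing at every other record

def page_lookup(key):
    keys = [records[i][0] for i in directory]
    slot = bisect.bisect_right(keys, key) - 1  # binary search the directory
    i = directory[max(slot, 0)]
    while i < len(records):                    # walk next_record pointers
        if records[i][0] == key:
            return records[i][1]
        i += 1
    return None

print(page_lookup(30))  # c
```

The directory keeps the binary search short, and the linked-list walk covers only the few records owned by the chosen slot.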

The storage of data in the InnoDB storage engine is a very complex topic, and this section only analyzes and introduces the storage of tables, row records and pages. Although the author believes that this knowledge is enough for most developers, it takes a lot of effort and practice to really digest this part of the content.

Indexes

The index is a very important concept in databases: it is the secret weapon that lets the storage engine locate records quickly, and it plays a major role in improving database performance and reducing the load on the database server. Index optimization is the most effective means of improving query performance and can easily speed queries up by several orders of magnitude.

Data structure of index

In the previous section we covered the storage of row records and pages; here we look at how data in InnoDB is organized at a higher level. In most cases the InnoDB storage engine builds indexes with B+ trees, the most common and effective index in relational databases. However, a B+ tree index cannot return the specific row for a given key; it can only find the page containing the row, after which, as described in the previous section, the database reads the whole page into memory and searches for the row there.

The B+ tree is a balanced tree: looking up any key takes the same number of comparisons, equal to the height of the tree. We will not analyze or implement a B+ tree in depth here, only briefly introduce its characteristics.
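The "comparisons equal the height" property is what keeps lookups cheap even for huge tables; a rough sketch, with an assumed illustrative fan-out per page rather than InnoDB's actual figure:

```python
# Sketch: the number of comparisons for any lookup in a balanced B+
# tree equals its height, which grows logarithmically with the number
# of records. The fan-out value is illustrative, not InnoDB's.
import math

def btree_height(n_records, fanout):
    # Height needed so that fanout**height >= n_records.
    return max(1, math.ceil(math.log(n_records, fanout)))

# With an assumed ~1200 keys per internal page, even 100 million rows
# need only a few levels:
print(btree_height(100_000_000, 1200))  # 3
```

This is why a B+ tree index typically costs only a handful of page reads per lookup regardless of table size.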

Clustered index and secondary index

The B+ tree index in the database can be divided into clustered index (clustered index) and secondary index (secondary index). The biggest difference between them is that the clustered index stores all the information of a row record, while the secondary index contains only the index column and a "bookmark" used to find the corresponding row record.

Clustered index

The tables in the InnoDB storage engine are index-organized, that is, stored in the order of their keys; the clustered index builds a B+ tree over the table's primary key and stores the table's row record data in its leaf nodes.

CREATE TABLE users (
    id INT NOT NULL,
    first_name VARCHAR(20) NOT NULL,
    last_name VARCHAR(20) NOT NULL,
    age INT NOT NULL,
    PRIMARY KEY (id),
    KEY (last_name, first_name, age),
    KEY (first_name)
);

If you use the SQL above to create a table in the database, the B+ tree uses id as the key for the index and stores all the information in a record in the leaf node.

The depiction of the B+ tree in the figure differs somewhat from a real B+ tree's data structure; the point here is that the clustered index leaf node stores the whole row record, not part of it.

Clustered indexes are closely tied to the physical storage of the table. Every normal table has one and only one clustered index (in most cases the primary key), and all row record data in the table is stored in clustered index order.

When we use the clustered index to retrieve the data in the table, we can directly get the page where the whole row record data corresponding to the clustered index is located, and there is no need for a second operation.

Auxiliary index

The database classifies all non-clustered indexes as secondary indexes, but the label alone does not explain them. A secondary index is also implemented as a B+ tree, but its leaf nodes do not contain all the data of the row record; they contain only the index key columns plus a "bookmark" used to find the corresponding row record, which in InnoDB is the record's primary key.

The existence of secondary indexes does not affect the clustered index, because the B+ tree formed by the clustered index is the actual form in which the data is stored, while a secondary index only accelerates lookups; a table therefore often carries multiple secondary indexes to improve performance.

A table always contains the B+ tree built from its clustered index, and possibly several B+ trees built from secondary indexes.

If the users table has the secondary index (first_name, age), the B+ tree it forms looks roughly like this: the data is sorted alphabetically by (first_name, age), and once the primary key is found, the whole row record is fetched through the clustered index.

The figure above shows the process of finding a record through a secondary index: locate the corresponding primary key via the secondary index, then use the primary key to fetch the row record from the clustered index. This is the usual way row records are found.
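The two-step lookup can be sketched with two dictionaries standing in for the two B+ trees; the data and names are hypothetical:

```python
# Sketch of the two-step lookup described above: a secondary index
# maps an index key to the primary key ("bookmark"), and the
# clustered index maps the primary key to the full row.
clustered = {  # primary key -> whole row record
    4: {"last_name": "stark", "first_name": "tony", "age": 21},
    1: {"last_name": "tom", "first_name": "hiddleston", "age": 30},
}
secondary = {  # (first_name, age) -> primary key
    ("tony", 21): 4,
    ("hiddleston", 30): 1,
}

def lookup(first_name, age):
    pk = secondary[(first_name, age)]  # step 1: secondary index -> primary key
    return clustered[pk]               # step 2: clustered index -> row record

print(lookup("tony", 21)["last_name"])  # stark
```

The extra hop through the clustered index is why a secondary-index lookup costs one more tree traversal than a primary-key lookup.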

Design of index

Index design is both very important and very complex. Designing and creating the right indexes matters greatly for query performance, but it is not what this article sets out to cover; you can read the book Database Index Design and Optimization, which offers a scientific and reasonable method for building the most suitable indexes. The author may also briefly introduce and analyze index design in later articles.

Lock

We all know that locks are generally divided into optimistic and pessimistic. The InnoDB storage engine uses pessimistic locks, which can further be divided into row locks and table locks by granularity.

Concurrency control mechanism

Optimistic lock and pessimistic lock are both mechanisms of concurrency control, and they are essentially different in principle.

An optimistic lock is really an idea rather than an actual lock: it modifies the resource first, checks at write-back time whether the resource has changed, writes back if it has not, and retries otherwise; the database is never actually locked during execution. A pessimistic lock is a real lock: the resource is locked before it is used, guaranteeing that only a limited number of threads can access it at a time; any other attempt to acquire the resource waits until the holding thread finishes its operation and releases the lock.

Although optimistic lock and pessimistic lock are not the same thing in essence, one is an idea, the other is a real lock, but they are both a concurrency control mechanism.

Optimistic locks do not suffer from deadlock, but because validation happens only after the update, pessimistic locks are preferable when conflicts are frequent and retries are expensive. Conversely, when very fast response and high concurrency are required and pessimistic locking would cause serious performance problems, optimistic locks solve the problem better. When choosing a concurrency control mechanism, weigh all four factors: conflict frequency, retry cost, response time and concurrency.
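The optimistic strategy described above can be sketched as a compare-and-swap retry loop; the row layout and helper names are hypothetical, for illustration only:

```python
# Sketch of optimistic concurrency control: read a version, compute
# the change, and write back only if the version is unchanged,
# retrying otherwise. A plain dict stands in for the database row.
row = {"value": 100, "version": 0}

def compare_and_swap(new_value, expected_version):
    if row["version"] != expected_version:  # someone else changed it: fail
        return False
    row["value"] = new_value
    row["version"] += 1
    return True

def optimistic_update(delta):
    while True:  # retry until the write-back validates
        snapshot = dict(row)
        if compare_and_swap(snapshot["value"] + delta, snapshot["version"]):
            return

optimistic_update(50)
print(row["value"], row["version"])  # 150 1
```

No lock is ever held: conflicting writers simply fail validation and retry, which is cheap when conflicts are rare and wasteful when they are frequent.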

Types of locks

There are really only two kinds of operations on data, reads and writes, and the database uses different locks for them. InnoDB implements standard row-level locks: shared locks (Shared Lock) and exclusive locks (Exclusive Lock). Their functions are easy to understand:

Shared lock (read lock, S): allows a transaction to read a row of data.
Exclusive lock (write lock, X): allows a transaction to delete or update a row of data.

Their names also imply another property: shared locks are compatible with each other, while an exclusive lock conflicts with every other lock:

If you think about their use for a moment, the design is obvious: the shared lock represents reads and the exclusive lock represents writes, so the database can read in parallel but must write serially. Only then can thread contention be avoided and safety guaranteed.

Lock granularity

Both shared and exclusive locks as described so far lock a single data row, but InnoDB supports locks of multiple granularities, namely row locks and table locks. To support multi-granularity locking, the InnoDB storage engine introduces intention locks (Intention Lock), which are table-level locks.

Similar to the two types of locks mentioned in the previous section, there are also two types of intention locks:

Intention shared lock (IS): a transaction that wants shared locks on some records in a table must first take an intention shared lock on the table.
Intention exclusive lock (IX): a transaction that wants exclusive locks on some records in a table must first take an intention exclusive lock on the table.

With the addition of intention locks, the compatibility matrix between lock types becomes more complex:

Intention locks do not actually block anything except whole-table requests; their main purpose is to signal that someone is locking, or intends to lock, a row in the table.

Some people may not see the point of intention locks, so consider an example. Without intention locks, when one transaction has row-locked a row in the table and another request wants to modify the whole table, the second request must scan every row to check whether it is locked, which is very inefficient. With intention locks, the row-locking transaction first takes an intention exclusive lock (IX) on the table and then the exclusive lock (X) on the row record; a request to modify the whole table no longer needs to examine every row of the table, it simply waits for the intention exclusive lock to be released.
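The enlarged compatibility matrix can be written out explicitly; the values below follow the standard InnoDB table-level lock compatibility rules:

```python
# InnoDB table-level lock compatibility: X conflicts with everything,
# IS and IX are compatible with each other, S is compatible with IS
# and S only.
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
}

# An IX does not block another transaction's IX, so two transactions
# can row-lock different rows of the same table concurrently:
print(COMPATIBLE[("IX", "IX")])  # True
# But a whole-table S request conflicts with an existing IX:
print(COMPATIBLE[("S", "IX")])   # False
```

Reading a row of the matrix answers the example above: the full-table request only needs to test one table-level entry instead of scanning every row lock.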

Algorithm of lock

So far, we have some understanding of the granularity of locks in InnoDB, and it is clear that different locks will be acquired when reading and writing to the database. In this section, we will describe how locks are added to the corresponding data rows. We will introduce the algorithms of three kinds of locks: Record Lock, Gap Lock and Next-Key Lock.

Record Lock

A record lock (Record Lock) is a lock added to an index record. Suppose we have the following table users:

CREATE TABLE users (
    id INT NOT NULL AUTO_INCREMENT,
    last_name VARCHAR(255) NOT NULL,
    first_name VARCHAR(255),
    age INT,
    PRIMARY KEY (id),
    KEY (last_name),
    KEY (age)
);

If we use id or last_name as the WHERE filter in SQL, InnoDB can locate the row records through the B+ tree built on the index and lock them. But if we filter on first_name, which has no index here, InnoDB does not know the location of the records to be modified and cannot judge in advance which records will change, so the whole table is locked.

Gap Lock

Record locks are the most common locks in the storage engine. Besides them, InnoDB also has the gap lock (Gap Lock), a lock on the gap between index records. When a statement such as SELECT * FROM users WHERE id BETWEEN 10 AND 20 FOR UPDATE; runs, it prevents other transactions from inserting a record with id = 15, because the whole range is covered by gap locks.

Gap locks are performance and concurrency tradeoffs made by the storage engine and are used only for certain transaction isolation levels.

Although gap locks are also divided into shared and exclusive flavors, the two do not conflict: different transactions can hold a shared and an exclusive gap lock over the same range at the same time. The only thing a gap lock prevents is other transactions inserting new records into that range.

Next-Key Lock

The Next-Key lock is slightly more complex than the previous two: it is the combination of a record lock and the gap lock on the gap before that record. Suppose the users table contains the following records:

+----+-----------+------------+-----+
| id | last_name | first_name | age |
+----+-----------+------------+-----+
|  4 | stark     | tony       |  21 |
|  1 | tom       | hiddleston |  30 |
|  3 | morgan    | freeman    |  40 |
|  5 | jeff      | dean       |  50 |
|  2 | donald    | trump      |  80 |
+----+-----------+------------+-----+

If Next-Key locks are used, Next-Key locks can lock the following ranges when needed:

(- ∞, 21] (21, 30] (30, 40] (40, 50] (50, 80] (80, ∞)

Given the name, you might expect a Next-Key lock to lock the current value and the range after it, but in fact it locks the current value and the range before it.

When we update a record, for example with SELECT * FROM users WHERE age = 30 FOR UPDATE;, InnoDB not only takes a Next-Key lock on the range (21, 30] but also a gap lock on the following range (30, 40), so inserting any record in the (21, 40] range is blocked.
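The list of ranges above can be derived mechanically from the index values on age; next_key_ranges below is a hypothetical helper sketching that derivation:

```python
# Sketch: deriving the Next-Key lock ranges shown above from the
# sorted index values. Each range is (previous value, current value],
# plus a trailing open gap after the largest value.
def next_key_ranges(keys):
    keys = sorted(keys)
    ranges = []
    prev = float("-inf")
    for k in keys:
        ranges.append((prev, k))  # interpreted as the interval (prev, k]
        prev = k
    ranges.append((prev, float("inf")))  # trailing gap (max, +inf)
    return ranges

print(next_key_ranges([21, 30, 40, 50, 80]))
# [(-inf, 21), (21, 30), (30, 40), (40, 50), (50, 80), (80, inf)]
```

Each index value anchors the range ending at it, which is exactly why locking age = 30 takes the (21, 30] range rather than (30, 40].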

The role of Next-Key locks is actually to solve the problem of phantom reading, which we will cover in detail in the next section when we talk about transactions.

Occurrence of deadlock

Because the locks implemented in InnoDB are pessimistic, different transactions may wait for each other's locks and deadlock, causing transaction errors. Producing a deadlock in MySQL is actually very easy:

Both sessions hold a lock and try to acquire the other's, at which point a deadlock occurs. The good news is that when a deadlock arises, MySQL detects the problem promptly and guarantees that one of the transactions can proceed normally.

Transaction and isolation level

Having covered locks, let us talk about another crucial database concept: the transaction. Any qualified software engineer knows something about transaction properties; the one most often mentioned is atomicity: when committing work, either all changes are committed or all are rolled back.

Transactions follow the four ACID properties: Atomicity, Consistency, Isolation and Durability. This article will not walk through all four; you can easily find the concepts through a search or database books. The last part of this article introduces the four transaction isolation levels.

Several isolation levels

The isolation of transactions is one of the bases for a database to process data, and the isolation level is actually a configuration item that is provided to the user to make choices and tradeoffs between performance and reliability.

The ISO and ANSI SQL standards define four transaction isolation levels, and InnoDB follows the four levels in the SQL:1992 standard: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Each isolation level solves one more problem than the previous one:

READ UNCOMMITTED: queries run without locks and may read uncommitted rows (Dirty Read).
READ COMMITTED: only record locks are taken on records, with no gap locks between them, so new records may be inserted next to locked ones; running the same query repeatedly may return different results (Non-Repeatable Read).
REPEATABLE READ: reading the same range multiple times returns the snapshot of the first query rather than different rows, but Phantom Reads may still occur.
SERIALIZABLE: InnoDB implicitly adds a shared lock to every query statement, which solves the phantom read problem.

The default transaction isolation level in MySQL is REPEATABLE READ, which, through Next-Key locks, can also solve the phantom read problem to some extent.
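The snapshot behaviour behind REPEATABLE READ can be sketched with a toy model; this is a deliberate simplification for illustration, not MySQL's MVCC implementation:

```python
# Toy model of REPEATABLE READ: the first read inside a transaction
# takes a snapshot, and later reads return that same snapshot even
# after another session commits new rows.
class Table:
    def __init__(self):
        self.committed = []

class Transaction:
    def __init__(self, table):
        self.table = table
        self.snapshot = None

    def select_all(self):
        if self.snapshot is None:            # first read: take the snapshot
            self.snapshot = list(self.table.committed)
        return self.snapshot

table = Table()
t1 = Transaction(table)
print(t1.select_all())     # [] : first read of an empty table
table.committed.append(1)  # another session inserts and commits
print(t1.select_all())     # [] : repeatable read, still the old snapshot
```

A transaction started after the commit would see the new row, which is why an insert inside the first transaction can fail even though its own queries keep returning an empty set.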

Next, we create the following table in the database and use an example to show what happens under different transaction isolation levels:

CREATE TABLE test (
    id INT NOT NULL,
    UNIQUE (id)
);

Dirty read

In one transaction, uncommitted data from other transactions is read.

When the transaction isolation level is READ UNCOMMITTED, the uncommitted data we inserted in SESSION 2 is visible in SESSION 1.

Non-repeatable read

In one transaction, the same row is read twice and returns different results.

When the transaction isolation level is READ COMMITTED, dirty reads are solved, but if SESSION 1 queries a row, SESSION 2 then modifies that row and commits, and SESSION 1 runs the same query again, the two queries return different results.

The read is not repeatable because, under the READ COMMITTED isolation level, the storage engine does not keep a row lock on the queried record (here, the record with id = 3) while reading it.

Phantom read

In one transaction, while records in a range are being read, another transaction inserts new records into that range.

Open two sessions again, SESSION 1 and SESSION 2. In SESSION 1 we query the whole table and get no records; in SESSION 2 we insert a row into the table and commit. Because of REPEATABLE READ, querying the whole table again in SESSION 1 still returns an empty set, yet inserting the same row into the table raises an error.

This phenomenon is called a phantom read: although the query returns an empty set, inserting the data fails, as if the earlier query were an illusion.

In the standard isolation levels, phantom reads are resolved by the stricter SERIALIZABLE level, but they can also be resolved by the Next-Key locks MySQL provides:

Repeatable reads and always seeing the latest committed data are contradictory goals: guaranteeing the former means not seeing transactions committed after the first read, while guaranteeing the latter means two identical queries can return different results. MySQL offers a compromise: under REPEATABLE READ a transaction reads a consistent snapshot of committed data. This by itself does not solve phantom reads, which are handled by the Next-Key locks mentioned earlier in the article.

Summary

Most of the content in this article comes from High Performance MySQL, MySQL Technical Insider: The InnoDB Storage Engine, Database Index Design and Optimization, and the official MySQL documentation.

Due to limited space, we could only briefly introduce and summarize some important topics in databases, so omissions are inevitable. If you have any questions about the content, you can comment below the blog.

The differences between the InnoDB and MyISAM engines and their application scenarios

1. Difference:

(1) transaction processing:

MyISAM is not transaction-safe, while InnoDB is transaction-safe (it supports transactions and other advanced processing)

(2) the locking mechanism is different:

MyISAM uses table-level locks and InnoDB uses row-level locks

(3) select, update, insert, delete operations:

MyISAM: better suited to workloads that perform a large number of SELECTs.

InnoDB: if your data does a lot of INSERT or UPDATE, you should use the InnoDB table for performance reasons

(4) the number of rows in the query table is different:

MyISAM: for SELECT COUNT(*) FROM table, MyISAM simply reads out the stored row count. Note that when the COUNT(*) statement includes a WHERE condition, both engines behave the same way.

InnoDB: InnoDB does not store the exact row count of a table, so executing SELECT COUNT(*) FROM table makes InnoDB scan the entire table to count the rows.
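The difference can be sketched with two toy table classes; these are hypothetical illustrations of the two counting strategies, not the engines' actual implementations:

```python
# Sketch of the COUNT(*) difference described above: a MyISAM-style
# table keeps a running row count, while an InnoDB-style table scans.
class MyISAMTable:
    def __init__(self):
        self.rows, self.row_count = [], 0
    def insert(self, row):
        self.rows.append(row)
        self.row_count += 1               # counter maintained on write
    def count(self):
        return self.row_count             # O(1): read the stored count

class InnoDBTable:
    def __init__(self):
        self.rows = []
    def insert(self, row):
        self.rows.append(row)
    def count(self):
        return sum(1 for _ in self.rows)  # O(n): scan the table

m, i = MyISAMTable(), InnoDBTable()
for r in range(3):
    m.insert(r)
    i.insert(r)
print(m.count(), i.count())  # 3 3
```

Both answers are equal; the difference is purely in cost, which is why an unfiltered COUNT(*) is cheap on MyISAM and a full scan on InnoDB.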

(5) Foreign key support:

MyISAM tables do not support foreign keys, while InnoDB does

2. Why is MyISAM faster than InnoDB for reads?

When performing a SELECT, InnoDB has much more to maintain than the MyISAM engine:

1) Data blocks: InnoDB must cache data blocks as well, while MyISAM caches only index blocks, so InnoDB does more swapping in and out.

2) Addressing: InnoDB must map to a block and then to a row, while MyISAM records the file OFFSET directly, so it locates records faster.

3) InnoDB must also maintain MVCC consistency; even if your scenario does not need it, the checks and maintenance still happen.

MVCC (Multi-Version Concurrency Control): multi-version concurrency control

3. Application scenario

MyISAM is suitable for: (1) heavy count calculations; (2) infrequent inserts with very frequent queries; (3) no transactions.

InnoDB is suitable for: (1) high reliability requirements, or transactions; (2) table updates and queries are quite frequent, and the chances of row locking are high.
