
What is the core knowledge of the MySQL database?


This article presents the core knowledge of the MySQL database in a way that is easy to understand and well organized. I hope it helps resolve your doubts; now let me walk you through "What is the core knowledge of the MySQL database?"

I. MySQL logical architecture

The MySQL logical architecture is divided into three layers:

Layer 1: responsible for connection processing, authorization, security, etc.

1. Each client connection corresponds to a thread in the server process. The server maintains a thread pool, so threads do not have to be created and destroyed for every new connection.

2. When a client connects to the MySQL server, the server authenticates it, either by user name and password or by SSL certificate.

3. Once the client has connected successfully, the server continues to verify that it has permission to execute each particular query.
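As a small illustration of this layer's work, the following sketch creates an account and grants it limited privileges (the account name and schema are hypothetical):

-- Authentication: an account identified by user name and password
CREATE USER 'app_user'@'%' IDENTIFIED BY 'a_strong_password';
-- Authorization: this account may only run SELECT against one schema
GRANT SELECT ON app_db.* TO 'app_user'@'%';
-- The privileges the server will check on every query the client sends
SHOW GRANTS FOR 'app_user'@'%';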

Layer 2: responsible for parsing and optimizing SQL

1. This layer includes query parsing, analysis, optimization, caching, and all built-in functions.

2. For SELECT statements, the server checks the query cache before parsing the query; if the corresponding result is found there, it is returned directly, with no need for query parsing and optimization (see the sketch after this list).

3. All functionality that cuts across storage engines is implemented in this layer: stored procedures, triggers, and views.
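The query cache mentioned above can be inspected as follows. Note that the query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0, so this applies only to older versions:

-- Query cache configuration and hit statistics (MySQL 5.7 and earlier)
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';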

Layer 3: storage engine

1. The storage engine is responsible for storing and retrieving data in MySQL.

2. The storage engine communicates with the upper layers through APIs. These APIs hide the differences between storage engines, making those differences transparent to the upper query process.

3. Storage engines do not parse SQL, and different storage engines do not communicate with each other; they simply respond to requests from the upper server layer.

II. Master-slave replication

The principle of master-slave replication is, in short, a three-step process:

1. The master database has a binary log file (binlog) that records all SQL statements that insert, delete, or modify data. (binlog dump thread)

2. The slave copies the SQL statements from the master's binlog file into its own relay log. (I/O thread)

3. The slave replays these SQL statements from the relay log. (SQL thread)
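A minimal sketch of wiring up a slave, using the classic (pre-8.0) replication commands; the host, credentials, and binlog coordinates below are placeholders:

-- On the master: note the current binlog file and position
SHOW MASTER STATUS;

-- On the slave: point the I/O thread at the master
CHANGE MASTER TO
  MASTER_HOST = '192.0.2.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;
START SLAVE;

-- Verify that both the I/O thread and the SQL thread are running
SHOW SLAVE STATUS\G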

III. InnoDB file storage structure

Physically, InnoDB tables consist of a shared tablespace file (ibdata1), per-table tablespace files (.ibd), table structure files (.frm), and log files (redo log files, etc.).

IV. Table structure file

Whenever a table is created in MySQL, a corresponding .frm file is created in that database's directory under the data directory. The .frm file stores the table's metadata, including the table structure definition. The .frm file is independent of the storage engine: a table using any storage engine has a .frm file.

V. Tablespace structure

1. Tablespace (tablespace)

All data is placed in tablespaces. If the innodb_file_per_table option is enabled, InnoDB creates a separate tablespace for each table. Note, however, that such a per-table tablespace stores only the table's data, indexes, and insert buffer bitmap pages; other data, such as undo information, insert buffer index pages, system transaction information, and the doublewrite buffer, still lives in the original shared tablespace.

The shared tablespace does not shrink automatically after a rollback. Instead, InnoDB determines whether the rolled-back space is still needed (undo space, for example); if not, the space is marked as free for reuse by subsequent undo operations.
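A quick way to see which mode a server is in (these are standard server variables):

-- ON means each new table gets its own .ibd tablespace file
SHOW VARIABLES LIKE 'innodb_file_per_table';
-- Applies to tables created after this point
SET GLOBAL innodb_file_per_table = ON;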

2. Segment (segment)

The tablespace consists of segments, such as data segments, index segments, rollback segments, and so on.

3. Extent (extent)

An extent consists of consecutive pages, and its size is always 1MB. The InnoDB storage engine requests 4-5 extents (4-5MB) from disk at a time. By default the page size is 16KB, so an extent contains 64 consecutive pages (64 × 16KB = 1MB).
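The 64-page figure follows directly from the defaults, which can be confirmed on a live server:

-- 16384 bytes by default; a 1MB extent / 16KB page = 64 pages per extent
SHOW VARIABLES LIKE 'innodb_page_size';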

4. Page (page)

The page is the smallest unit of InnoDB disk management. A B+ tree node corresponds to one physical page (16KB): data is divided into 16KB pages and numbered, and a page number maps to a physical file offset (16KB × N). The leaf nodes of the B+ tree are linked to their predecessors and successors, forming a doubly linked list. There are several page types, of which data pages and index pages are the two most important.

VI. Buffer pool

The InnoDB storage engine is disk-based and manages its records in pages. Because of the gap between CPU speed and disk speed, disk-based database systems usually use a buffer pool to improve the overall performance of the database.

For a read, the page read from disk is first placed in the buffer pool. The next time the same page is read, the engine first checks whether the page is in the buffer pool: if so, the page is said to be hit in the buffer pool and is read directly; otherwise, the page is read from disk.
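A rough way to observe this behavior is to compare logical read requests with the reads that actually had to go to disk (standard InnoDB status counters):

-- Total size of the buffer pool
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Logical reads served from memory vs. reads that missed and went to disk
SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW STATUS LIKE 'Innodb_buffer_pool_reads';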

VII. Redo log

By default, there are two files in the database data directory, ib_logfile0 and ib_logfile1. These are the redo log files, which record the transaction log for the InnoDB storage engine.

Each InnoDB storage engine has at least one redo log group, and each group contains at least two redo log files (ib_logfile0, ib_logfile1).

You can improve the availability of the redo log by setting up multiple mirrored log groups and placing different groups on different disks.

The files in a log group are the same size and are used in a circular fashion: when file 1 is full, InnoDB switches to file 2, and when file 2 is full, it switches back to file 1 and writes from the beginning.

To ensure data safety, redo log records are generated continuously while a transaction runs, and a flush to disk is performed when the transaction commits. The redo log is written sequentially, and sequential disk I/O is much faster than random I/O. When the database or host fails and restarts, data is recovered from the redo log: if the redo log contains a committed transaction, its modifications are reapplied. In this way the atomicity, consistency, and durability of transactions are realized.

Writes to the redo log file are not performed directly: records are first written to a redo log buffer and then written to the log file when certain conditions are met.

When the dirty pages of the corresponding transaction have been written to disk, the redo log's mission is complete, and the space it occupies can be reused (overwritten).

The size of the redo log buffer can be configured with innodb_log_buffer_size.

There are two points at which the log buffer is written to disk:

1. The master thread writes the redo log buffer to the redo log file on disk once per second, regardless of whether the transaction has been committed.

2. The other is controlled by the parameter innodb_flush_log_at_trx_commit, which determines how the redo log is handled when a transaction commits.

The parameter innodb_flush_log_at_trx_commit can be set to 0, 1, or 2. 0 means that when a transaction commits, its redo log is not written to the log file on disk; it waits for the master thread's once-per-second flush instead. The difference between 1 and 2 is that 1 writes the redo log buffer to disk synchronously at commit time, while 2 writes it asynchronously, i.e. there is no guarantee that the redo log actually reaches the disk file at commit.
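These settings can be inspected and changed at runtime; a small sketch:

-- Flush-at-commit policy (0, 1, or 2) and redo log buffer size
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_log_buffer_size';
-- 1 is the only value that guarantees committed transactions survive a crash
SET GLOBAL innodb_flush_log_at_trx_commit = 1;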

VIII. Undo log

When data is modified, in addition to the redo log, an undo log is also recorded. The undo log records the reverse of the modification: an insert pairs with a delete, and an update pairs with a change back to the original data. Through the undo log, a transaction can be rolled back, and data can be traced back to a specific version; this is what enables MVCC, i.e. non-locking reads.

Before a transaction modifies data, an undo log of the current version is generated; writing the undo log also generates redo, to guarantee the undo log's reliability. After the transaction commits, its undo log cannot be deleted immediately; it is put on a linked list awaiting cleanup. The purge thread uses the version information in the undo segment to determine whether any other transaction still needs the undo log before reclaiming its space.

By default, undo information is kept in the shared tablespace, i.e. the ibdata file. When large transactions run in the database, they generate a large amount of undo information, all of it saved in the shared tablespace, so the shared tablespace can grow very large. By default, a shared tablespace that has been "enlarged" by undo logs will not, and cannot, be automatically shrunk.

IX. ACID

ACID refers to the four key properties of a transaction.

1. Atomicity (Atomicity)

A transaction must be regarded as an indivisible minimal unit of work: all of the operations in the transaction either commit successfully or are rolled back as a whole. A transaction cannot perform only some of its operations. This is the atomicity of the transaction.
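A minimal sketch of atomicity, assuming a hypothetical accounts table:

-- Both updates take effect together, or neither does
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- or ROLLBACK; to undo both updates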

2. Consistency (Consistency)

A database always transitions from one consistent state to another.

3. Isolation (Isolation)

Changes made by one transaction are not visible to other transactions until they are committed.

4. Durability (Durability)

Once a transaction commits, its changes are saved permanently to the database.

X. Isolation of transactions

If you do not consider the isolation of transactions, the following problems occur:

1. Dirty read

A transaction updates a record and, before it commits, a second transaction reads the updated record. The second transaction has then read dirty data and depends on the first transaction's uncommitted data. If the first transaction rolls back, the data read by the second transaction is wrong, dirty data.

2. Phantom read

A transaction re-runs a query with the same conditions as before and finds that the number of rows in the result set has increased or decreased (rows were inserted or deleted by other transactions), as if it were hallucinating.

3. Non-repeatable read (virtual read)

A transaction reads some data and, when it reads the same data again some time later, finds that the content has changed; the read cannot be repeated.

The difference between a phantom read and a non-repeatable read is that a phantom read is about the number of rows changing (inserts and deletes), while a non-repeatable read is about rows being updated, so that reads before and after return inconsistent results.
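A minimal two-session sketch of a non-repeatable read, reusing the hypothetical accounts table from above:

-- Session A:
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- reads 100

-- Session B, meanwhile:
UPDATE accounts SET balance = 50 WHERE id = 1;
COMMIT;

-- Session A again:
SELECT balance FROM accounts WHERE id = 1;   -- under READ COMMITTED: now reads 50
COMMIT;                                      -- the same query returned different data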

XI. Transaction isolation levels

Dirty reads, non-repeatable reads, and phantom reads are essentially database consistency problems, which must be solved by a transaction isolation mechanism. InnoDB provides four transaction isolation levels:

1. Read uncommitted (Read Uncommitted)

A transaction can read another transaction's uncommitted modifications. This leads to dirty reads, phantom reads, and non-repeatable reads. (Basically never used.)

2. Read committed (RC, Read Committed)

A transaction can only read modifications that another transaction has already committed. This avoids dirty reads, but non-repeatable reads and phantom reads remain possible.

3. Repeatable read (RR, Repeatable Read)

Reading the same data multiple times within the same transaction always returns the same result. This avoids dirty reads and non-repeatable reads, but phantom reads can still occur.

4. Serializable (Serializable)

Transactions are executed serially, which avoids all of the problems above. MySQL's default level is Repeatable Read. The higher the level, the safer the data, but the lower the performance.
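The isolation level can be checked and changed per session (standard syntax; note the variable is named tx_isolation before MySQL 8.0):

-- Current level, then switch this session to READ COMMITTED
SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;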

XII. MVCC

MVCC (Multi-Version Concurrency Control), multi-version concurrency control, lets a query see the pre-update values of rows that are being updated by another transaction. This is a powerful technique for increasing concurrency: a query does not have to wait for another transaction to release its locks.

MVCC is implemented by saving a snapshot of the data as of a certain point in time (via the undo log). This means that no matter how long a transaction runs, it sees a consistent view of the data within that transaction. It also means that, depending on when each transaction started, different transactions can see different data in the same table at the same moment.

Two additional hidden columns are saved with each row of data: the version number when the row was created and the version number when it was deleted (which may be empty). The version number here is not an actual time value but a system version number, which is automatically incremented each time a new transaction starts. The system version number at the start of a transaction serves as the transaction's version number and is compared against the version numbers of the rows it queries.

Each transaction has its own version number, so when CRUD operations are performed within a transaction, version control of the data is achieved by comparing version numbers.

Under the default isolation level (REPEATABLE READ), SELECT, INSERT, UPDATE, and DELETE behave as follows:

SELECT: reads only rows whose creation version number is less than or equal to the current transaction's version number, and whose deletion version number is either empty or greater than the current transaction's version number. This ensures the row existed, and had not been deleted, when the transaction started.

INSERT: saves the current transaction's version number as the creation version number of the new row.

UPDATE: inserts a new row whose creation version number is the current transaction's version number, and sets the deletion version number of the original row to the current transaction's version number.

DELETE: saves the current transaction's version number as the deletion version number of the row.
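Conceptually, the SELECT rule behaves like the pseudo-SQL below. The hidden columns are not actually queryable; create_version and delete_version are purely illustrative names:

-- Visibility rule under REPEATABLE READ (illustrative only)
SELECT * FROM t
WHERE  create_version <= current_txn_version
  AND (delete_version IS NULL OR delete_version > current_txn_version);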

XIII. InnoDB index structure

MySQL indexes use the B+ tree as their data structure. MySQL's B+Tree optimizes the classic B+Tree by adding sequential access pointers: each leaf node of the B+Tree holds a pointer to the adjacent leaf node, forming a B+Tree with sequential access pointers. This improves range query performance: to query all records with keys from 18 to 49, once 18 is found, all matching data nodes can be reached simply by traversing the leaf nodes and their pointers, greatly improving range query efficiency (there is no need to return to the parent node and traverse again, which reduces I/O operations).

Both MyISAM and InnoDB use the B+Tree index structure, but the underlying index storage differs: MyISAM uses a non-clustered index, while InnoDB uses a clustered index.

Clustered index: the index and the data are stored in the same file.

Non-clustered index: the index is stored separately from the data file.

MyISAM indexing principle: non-clustered index. MyISAM's .MYI index file is separate from its .MYD data file, and the index file stores only pointers to the data records: the leaf node's data field holds the address of the data record.

A MyISAM index search walks the B+Tree; if the specified key exists, it takes the value of the data field (a pointer address) and then reads the corresponding data record at that address. There is no structural difference between the secondary indexes and the primary index, except that the primary index requires the key to be unique, while secondary index keys may repeat.

The InnoDB index uses a clustered index: InnoDB's data and index are stored in one .ibd file, the table data file itself is the primary index, and records with adjacent key values are stored near each other. The leaf node's data field holds the complete data record, with the table's primary key id as the index key; that is, leaf nodes store the data records directly. (Underlying storage: .frm for the table definition, .ibd for InnoDB data and index.)

Because InnoDB data is stored in a clustered index structure, the data file must be clustered by the primary key, so InnoDB requires every table to have a primary key (MyISAM does not). If none is specified, MySQL automatically chooses a column that uniquely identifies the records as the primary key; if no such column exists, MySQL generates an implicit field (a 6-byte integer) as the InnoDB table's primary key. All InnoDB secondary indexes store the record's primary key value in their data field.
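A small sketch of how this plays out (table and query are hypothetical):

-- Rows are stored clustered on the primary key id; the secondary index
-- idx_name stores (name, id) and uses id to find the full row
CREATE TABLE user (
  id   BIGINT PRIMARY KEY,
  name VARCHAR(50),
  age  INT,
  KEY idx_name (name)
) ENGINE = InnoDB;

-- Walks idx_name to find the id, then walks the clustered index to fetch the row
EXPLAIN SELECT age FROM user WHERE name = 'alice';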

XIV. InnoDB lock types

1. Locking mechanism

Optimistic locking and pessimistic locking are two concurrency control approaches; both can be used to solve the lost update problem.

2. Optimistic lock

Optimistic locking assumes that, whenever data is fetched, no concurrency problem will occur, so data is read and processed without taking a lock. At update time, a version number or timestamp is used to check for conflicts: if there is a conflict, it is handled; if not, the transaction is committed.
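A common way to implement this is a version column, as in the hypothetical sketch below:

-- Read the row and remember its version, without locking
SELECT balance, version FROM accounts WHERE id = 1;   -- say version = 7
-- Update only if nobody changed the row in the meantime
UPDATE accounts
SET    balance = 150, version = version + 1
WHERE  id = 1 AND version = 7;
-- If another session got there first, version is no longer 7, 0 rows are
-- affected, and the application retries or reports a conflict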

3. Pessimistic lock

Pessimistic locking assumes that, whenever data is fetched, someone else will modify it and concurrency problems will occur, so an exclusive lock is taken before the data is accessed and processed. The data stays locked throughout processing, and the lock is released only when the transaction commits or rolls back.
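A minimal pessimistic-locking sketch on the same hypothetical table:

START TRANSACTION;
-- Takes an exclusive lock; other writers block until we finish
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;   -- the lock is released here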

4. Lock granularity

Table lock: low overhead, fast to acquire; large lock granularity, the highest probability of lock conflicts, and the lowest concurrency; no deadlocks.

Row lock: high overhead, slow to acquire; deadlocks can occur; small lock granularity, low probability of lock conflicts, and high concurrency.

Page lock: overhead and locking speed between those of table locks and row locks; deadlocks can occur; lock granularity between table locks and row locks, with moderate concurrency.

5. Compatibility

01. Shared lock

Also known as a read lock (S lock). When one transaction holds a shared lock, other transactions can also acquire shared locks but not exclusive locks: they can read but not write. SELECT ... LOCK IN SHARE MODE explicitly takes a shared lock.

02. Exclusive lock

Also known as a write lock (X lock). If transaction T holds an exclusive lock on data A, no other transaction can place any kind of lock on A. A transaction holding an exclusive lock can both read and modify the data. SELECT ... FOR UPDATE explicitly takes an exclusive lock.

6. Lock mode

Record lock: a lock on the index record corresponding to a row; it locks a single row record.

Gap lock: a lock on the gap between index records; it locks an interval.

Next-key lock: a combination of a record lock and a lock on the gap before the index record; it locks the row record plus the interval.

Intention locks: exist to allow locks of multiple granularities to coexist.
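A small two-session sketch of gap locking under REPEATABLE READ (hypothetical table; assumes a secondary, non-unique index on age):

-- Session A: next-key locks the matching index entries and the adjacent gaps
START TRANSACTION;
SELECT * FROM user WHERE age = 25 FOR UPDATE;

-- Session B: blocks, because the new row falls into a locked gap
INSERT INTO user (id, name, age) VALUES (100, 'bob', 25);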

That is all the content of this article, "What is the core knowledge of the MySQL database?" Thank you for reading! I hope the content shared here has been helpful; if you want to learn more, welcome to follow the industry information channel!
