Author: Snailclimb collation, editor: SegmentFault
This article was originally published in the SegmentFault column JavaGuide; if there is any infringement, please contact us for removal.
The title sounds a bit like clickbait, but I hope you won't think so after reading the article: it is the product of the author's earlier summary of MySQL knowledge, and it can be used to review MySQL fundamentals and prepare for common MySQL interview questions.
What is MySQL?
MySQL is a relational database that is commonly used in Java enterprise development, because MySQL is open source, free, and easy to extend. Alibaba's database systems also make heavy use of MySQL, so its stability is proven. Because MySQL is open source, anyone can download it under the GPL (General Public License) and modify it to fit their own needs. MySQL's default port number is 3306.
Transactions
What is a transaction?
A transaction is a logical group of operations that either all execute or none execute.
The classic example of a transaction is a money transfer. Suppose Xiao Ming wants to transfer 1000 yuan to Xiao Hong. The transfer involves two key operations: decrease Xiao Ming's balance by 1000 yuan and increase Xiao Hong's balance by 1000 yuan. If an error suddenly occurs between these two operations, such as the banking system crashing, Xiao Ming's balance could decrease while Xiao Hong's balance never increases, which would be wrong. A transaction ensures that these two key operations either both succeed or both fail.
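A minimal SQL sketch of this transfer, assuming a hypothetical accounts(name, balance) table (the table and column names are illustrative, not from the original):

START TRANSACTION;
UPDATE accounts SET balance = balance - 1000 WHERE name = 'Xiao Ming';
UPDATE accounts SET balance = balance + 1000 WHERE name = 'Xiao Hong';
COMMIT;
-- If anything fails between the two updates, ROLLBACK undoes both,
-- so the balances never end up in the half-transferred state described above.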
What are the four properties of transactions (ACID)?
Atomicity: a transaction is the smallest unit of execution and cannot be split. Atomicity ensures that the actions either all complete or have no effect at all.
Consistency: the database remains in a consistent state before and after a transaction; multiple transactions reading the same data see the same result.
Isolation: when the database is accessed concurrently, one user's transaction is not interfered with by other transactions; concurrent transactions are independent of each other.
Durability: once a transaction is committed, its changes to the data in the database are permanent; even if the database fails, the committed changes are not lost.
What are the problems caused by concurrent transactions?
In a typical application, multiple transactions run concurrently, often operating on the same data to complete their respective tasks (multiple users operating on the same data). Concurrency is necessary, but it can cause the following problems:
Dirty read: while one transaction is accessing data and has modified it without yet committing, another transaction accesses and uses that data. Because the data has not been committed, what the second transaction reads is "dirty data", and operations based on dirty data may be incorrect.
Lost update (lost modification): while one transaction reads data, another transaction accesses the same data; after the first transaction modifies the data, the second transaction also modifies it, so the first transaction's modification is lost. For example, transaction 1 reads A=20 from a table and transaction 2 also reads A=20; transaction 1 sets A=A-1 and transaction 2 also sets A=A-1; the final result is A=19, and transaction 1's modification is lost.
Non-repeatable read: reading the same data multiple times within one transaction. Before that transaction ends, another transaction also accesses and modifies the data, so between the two reads of the first transaction the data may differ. Because the two reads within one transaction return different data, this is called a non-repeatable read.
Phantom read: similar to a non-repeatable read. It occurs when one transaction (T1) reads several rows, and then another concurrent transaction (T2) inserts some rows. In a subsequent query, T1 finds additional records that did not exist before, as if seeing phantoms, hence the name.
The difference between non-repeatable reads and phantom reads:
Non-repeatable reads focus on modification; phantom reads focus on insertion or deletion.
Example 1 (same condition: you read the data, read it again, and the value differs): after transaction 1 reads A's salary as 1000 but before transaction 1 ends, transaction 2 changes A's salary to 2000. When transaction 1 reads A's salary again it gets 2000; this is a non-repeatable read.
Example 2 (same condition: the number of records read the first and second time differs): suppose a payroll table has 4 people with a salary greater than 3000. Transaction 1 reads all people with a salary greater than 3000 and finds 4 records; transaction 2 then inserts a record with a salary greater than 3000; when transaction 1 reads again it finds 5 records. This is a phantom read.
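A sketch of Example 1 as two interleaved sessions, assuming a hypothetical staff(id, salary) table and the READ COMMITTED isolation level:

-- Session 1:
START TRANSACTION;
SELECT salary FROM staff WHERE id = 1;  -- returns 1000
-- Session 2, in between:
--   UPDATE staff SET salary = 2000 WHERE id = 1; COMMIT;
-- Session 1, same transaction:
SELECT salary FROM staff WHERE id = 1;  -- now returns 2000: a non-repeatable read
COMMIT;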
What are the transaction isolation levels? What is MySQL's default isolation level?
The SQL standard defines four isolation levels:
READ-UNCOMMITTED (read uncommitted): the lowest isolation level; allows reading uncommitted data changes, and may lead to dirty reads, phantom reads, or non-repeatable reads.
READ-COMMITTED (read committed): allows reading data already committed by concurrent transactions; prevents dirty reads, but phantom reads and non-repeatable reads can still occur.
REPEATABLE-READ (repeatable read): multiple reads of the same field within a transaction return consistent results unless the data is modified by the transaction itself; prevents dirty reads and non-repeatable reads, but phantom reads can still occur.
SERIALIZABLE (serializable): the highest isolation level, fully compliant with ACID's isolation requirement. All transactions execute one after another, so transactions cannot interfere with each other at all; this level prevents dirty reads, non-repeatable reads, and phantom reads.
Isolation level    | Dirty read | Non-repeatable read | Phantom read
READ-UNCOMMITTED   | √          | √                   | √
READ-COMMITTED     | ×          | √                   | √
REPEATABLE-READ    | ×          | ×                   | √
SERIALIZABLE       | ×          | ×                   | ×
The default isolation level of MySQL's InnoDB storage engine is REPEATABLE-READ. We can check it with the SELECT @@tx_isolation; command:
mysql> SELECT @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
Note that, unlike the SQL standard, the InnoDB storage engine uses the Next-Key Lock algorithm at the REPEATABLE-READ transaction isolation level, so it avoids phantom reads; this differs from other database systems such as SQL Server. InnoDB's default REPEATABLE-READ level can therefore fully guarantee transaction isolation requirements, effectively reaching the SQL standard's SERIALIZABLE isolation level.
Because a lower isolation level requires fewer locks per transaction, most database systems default to READ-COMMITTED; but be aware that the InnoDB storage engine uses REPEATABLE-READ by default with no performance penalty.
The InnoDB storage engine generally uses the SERIALIZABLE (serializable) isolation level in the case of distributed transactions.
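For reference, the isolation level can also be changed per session; a small sketch (READ COMMITTED chosen only as an example):

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Applies to subsequent transactions in this session only.
SELECT @@tx_isolation;  -- READ-COMMITTED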
Indexes
Why can an index improve query speed?
The following content is adapted from "Two Artifacts of the Database (Indexes and Locks)" by Java3y.
Let's start with the basic storage structure of MySQL.
The basic storage structure of MySQL is the page (records are stored in pages):
The data pages form a doubly linked list.
The records within each data page form a singly linked list.
Each data page generates a page directory for the records it stores. When searching for a record by primary key, binary search over the page directory quickly locates the record's slot, and the record is then found by traversing the records in that slot's group. When searching by other columns (non-primary keys), there is no page directory to use: each record in the singly linked list must be traversed, starting from the smallest record.
So if we write an unoptimized SQL statement such as select * from user where indexname = 'xxx', it proceeds by default as follows:
Locate the page containing the record: the doubly linked list of pages must be traversed to find the page.
Find the record within the page: since the query is not by primary key, the page's singly linked list must be traversed.
Obviously, with a large amount of data this search is very slow! The time complexity is O(n).
What does an index do to speed up our queries? In essence, it turns unordered data into (relatively) ordered data:
To find the record with id 8, the brief steps are:
The key point: without an index we must traverse the doubly linked list to locate the page, whereas with an index we can locate the page quickly through the "directory" (binary search, time complexity approximately O(log n)).
In fact, the underlying structure is a B+ tree, which, as a tree implementation, lets us find the corresponding record quickly.
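A quick way to see the index at work is EXPLAIN; a sketch with a hypothetical user table (table and index names are illustrative):

CREATE TABLE user (
  id   BIGINT PRIMARY KEY,
  name VARCHAR(50),
  city VARCHAR(50),
  KEY idx_name (name)
);
EXPLAIN SELECT * FROM user WHERE name = 'xxx';
-- key: idx_name, type: ref -> the B+ tree index is searched
-- Without idx_name: type: ALL -> full table scan, the O(n) case above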
The following content is adapted from "Java Engineer's Way of Practice".
What is the leftmost prefix principle?
An index in MySQL can cover multiple columns in a given order; this is called a composite (joint) index. For example, a composite index on the User table's name and city columns is (name, city). The leftmost prefix principle means the index can be used when the query condition matches one or more of the leftmost columns of the index. For example:
select * from user where name=xx and city=xx; -- hits the index
select * from user where name=xx;             -- hits the index
select * from user where city=xx;             -- does not hit the index
Note that if a query uses both conditions but in a different order, such as city = xx and name = xx, the query optimizer will rewrite the condition order to match the composite index, so the index is still hit.
Because of the leftmost prefix principle, when creating a composite index the order of the index columns should take selectivity into account: columns with more distinct values should come first. The ORDER BY clause also follows this rule.
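A sketch of the ORDER BY point, reusing the hypothetical user table above with a composite (name, city) index:

ALTER TABLE user ADD INDEX idx_name_city (name, city);
EXPLAIN SELECT * FROM user WHERE name = 'xxx' ORDER BY city;
-- The rows matching name are already stored in city order inside the index,
-- so no extra "Using filesort" step should be needed.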
Be careful to avoid redundant indexes
A redundant index duplicates the function of another index: for example, (name, city) and (name) are redundant, because any query that can hit (name) can certainly hit (name, city). In most cases you should try to extend an existing index rather than create a new one.
Since MySQL 5.7, you can find redundant indexes by querying the schema_redundant_indexes view in the sys schema.
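A minimal query against that view (the columns shown are those provided by the sys schema in MySQL 5.7+):

SELECT table_schema, table_name,
       redundant_index_name, dominant_index_name
FROM sys.schema_redundant_indexes;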
How does MySQL add an index to a table column?
1. Add a PRIMARY KEY (primary key index)
ALTER TABLE `table_name` ADD PRIMARY KEY (`column`)
2. Add a UNIQUE (unique index)
ALTER TABLE `table_name` ADD UNIQUE (`column`)
3. Add an INDEX (normal index)
ALTER TABLE `table_name` ADD INDEX index_name (`column`)
4. Add a FULLTEXT (full-text index)
ALTER TABLE `table_name` ADD FULLTEXT (`column`)
5. Add a multi-column index
ALTER TABLE `table_name` ADD INDEX index_name (`column1`, `column2`, `column3`)
Storage engines
Some common commands
View all storage engines provided by MySQL
mysql> SHOW ENGINES;
From the output we can see that MySQL's current default storage engine is InnoDB, and among all the storage engines in version 5.7 only InnoDB is transactional; that is, only InnoDB supports transactions.
View MySQL's current default storage engine
We can also view the default storage engine with the following command.
mysql> SHOW VARIABLES LIKE '%storage_engine%';
View the storage engine of the table
mysql> SHOW TABLE STATUS LIKE 'table_name';
The difference between MyISAM and InnoDB
MyISAM was MySQL's default storage engine prior to version 5.5. Although it performs well and provides many features, including full-text indexing, compression, and spatial functions, MyISAM supports neither transactions nor row-level locks, and its biggest flaw is that it cannot recover safely after a crash. Since version 5.5, MySQL's default storage engine has been InnoDB, a transactional storage engine.
Most of the time we use the InnoDB storage engine, but in some cases MyISAM is appropriate, for example in read-intensive scenarios (provided you don't mind that MyISAM cannot recover safely after a crash).
The comparison between the two:
Row-level locks: MyISAM supports only table-level locking, while InnoDB supports both row-level locking and table-level locking; InnoDB's default is row-level locks.
Transactions and safe crash recovery: MyISAM emphasizes performance; each query is atomic and executes faster than InnoDB, but it provides no transaction support. InnoDB provides advanced database features such as transactions and foreign keys, with transaction-safe (ACID compliant) tables supporting commit, rollback, and crash recovery.
Foreign keys: MyISAM does not support them; InnoDB does.
MVCC: only InnoDB supports it. MVCC handles highly concurrent transactions more efficiently than simple locking; MVCC works only under the READ COMMITTED and REPEATABLE READ isolation levels; MVCC can be implemented with optimistic or pessimistic locking; and MVCC implementations differ across databases.
.
There is a passage in High Performance MySQL that reads:
Do not easily trust rules of thumb such as "MyISAM is faster than InnoDB". That conclusion is often not absolute. In many known scenarios InnoDB's speed is unmatched by MyISAM, especially in applications that use clustered indexes or where the accessed data fits in memory.
In general, choosing InnoDB is fine, but if you don't care about scalability and concurrency, don't need transaction support, and don't care about safe recovery after a crash, MyISAM can also be a good choice. Usually, however, these concerns do matter.
The difference between optimistic locks and pessimistic locks
Pessimistic lock
Always assume the worst case: every time you fetch the data, you assume someone else will modify it, so you lock the data every time you fetch it, and anyone else who wants the data blocks until the lock is released (a shared resource is used by only one thread at a time; other threads block and obtain the resource afterwards). Many such locking mechanisms are used in traditional relational databases, such as row locks, table locks, read locks, and write locks, all applied before the operation. Exclusive locks such as synchronized and ReentrantLock in Java are also implementations of pessimistic locking.
Optimistic lock
Always assume the best case: every time you fetch the data, you assume others will not modify it, so you do not lock; but when updating, you check whether anyone else updated the data in the meantime, typically using the version number mechanism or the CAS algorithm. Optimistic locks suit read-heavy applications and can improve throughput; mechanisms like the write_condition provided by databases are in fact optimistic locks. In Java, the atomic variable classes under the java.util.concurrent.atomic package are implemented with CAS, an implementation of optimistic locking.
Usage scenarios of two kinds of locks
From the introductions above, both kinds of locks have advantages and disadvantages; neither is simply better than the other. Optimistic locks suit scenarios with few writes (read-heavy scenarios): when conflicts are rare, skipping the locking overhead increases the system's overall throughput. In write-heavy scenarios, conflicts are frequent, causing the upper-layer application to retry continually, which actually degrades performance; pessimistic locks are therefore more appropriate when writes dominate.
Two common implementation methods of optimistic lock
Optimistic locks are generally implemented using the version number mechanism or the CAS algorithm.
1. Version number mechanism
Typically a version field is added to the data table to indicate how many times the data has been modified; when the data is modified, the version value is incremented by one. When thread A wants to update the data, it reads the version value along with the data. When submitting the update, the update succeeds only if the version value read earlier equals the current version value in the database; otherwise the update is retried until it succeeds.
To take a simple example: suppose there is a version field in the account information table in the database, and the current value is 1, while the current account balance field (balance) is $100.
Operator A reads the row (version=1) and deducts $50 from the account balance ($100 - $50).
While operator A is working, operator B also reads this row (version=1) and deducts $20 from the account balance ($100 - $20).
Operator A finishes the modification, increments the version number (version=2), and submits the update (balance=$50) to the database. Because the submitted version is greater than the current version of the database record, the data is updated and the record's version becomes 2.
Operator B finishes and also increments the version number (version=2), attempting to submit the update (balance=$80). But on comparing versions, the version submitted by operator B is 2 while the current version of the database record is also 2, which violates the optimistic locking policy that "the submitted version must be greater than the record's current version for the update to execute". Operator B's submission is therefore rejected.
This prevents operator B from overwriting operator A's result with a modification based on the old version=1 data.
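A minimal SQL sketch of this version-number pattern, assuming a hypothetical account(id, balance, version) table:

-- Read the row together with its current version:
SELECT balance, version FROM account WHERE id = 1;  -- e.g. balance=100, version=1
-- Update only if no one has bumped the version since the read:
UPDATE account
SET balance = balance - 50, version = version + 1
WHERE id = 1 AND version = 1;
-- 1 row affected: the update won. 0 rows affected: someone committed first
-- (as operator B experienced above); re-read and retry.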
2. CAS algorithm
Compare and swap (CAS) is a well-known lock-free algorithm. Lock-free programming synchronizes variables between multiple threads without using locks, that is, without blocking threads, so it is also called non-blocking synchronization (Non-blocking Synchronization). The CAS algorithm involves three operands:
The memory value V to be read and written
The value A to compare against
The new value B to write
If and only if the value of V equals A, CAS atomically updates V to the new value B; otherwise it does nothing (the comparison and the swap form a single atomic operation). It is generally a spin operation, that is, it keeps retrying.
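The same compare-and-swap idea can be sketched in SQL with a hypothetical counter(id, val) table: the write takes effect only if the current value still equals the expected one.

-- Behaves like CAS(V = counter.val, A = 10, B = 11):
UPDATE counter SET val = 11 WHERE id = 1 AND val = 10;
-- 1 row affected  -> the swap succeeded
-- 0 rows affected -> val changed in the meantime; re-read and retry (spin)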
The shortcomings of optimistic locks
The ABA problem is a common problem of optimistic locking.
1. ABA problem
If a variable V was read as value A the first time, and it is still value A when we check before the assignment, does that prove its value was never modified by another thread? Obviously not: in the meantime its value may have been changed to something else and then back to A, in which case the CAS operation would wrongly conclude it was never modified. This is called the "ABA" problem of CAS operations.
The AtomicStampedReference class, available since JDK 1.5, addresses this: its compareAndSet method first checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; only if both are equal does it atomically set the reference and the stamp to the given new values.
2. Long spin time, high overhead
Spinning CAS (looping until success if it does not succeed) imposes a large execution overhead on the CPU if it fails for a long time. If the JVM supports the processor's pause instruction, efficiency improves somewhat. The pause instruction does two things: first, it delays the pipelined execution of instructions (de-pipeline) so the CPU does not consume too many execution resources; the delay depends on the implementation version, and on some processors it is zero. Second, it avoids the CPU pipeline flush caused by a memory order violation when exiting the loop, improving CPU execution efficiency.
3. Can only guarantee atomic operations on one shared variable
CAS works only for a single shared variable; it does not apply when an operation spans multiple shared variables. Since JDK 1.5, however, the AtomicReference class can guarantee atomicity between reference objects, so multiple variables can be placed in one object for a CAS operation. So we can either use locks, or use AtomicReference to merge multiple shared variables into a single shared variable.
Locking mechanisms and InnoDB's locking algorithms
Locks used by the MyISAM and InnoDB storage engines:
MyISAM uses table-level locks (table-level locking).
InnoDB supports row-level locks (row-level locking) and table-level locks. The default is row-level locks.
Comparison of table-level locks and row-level locks:
Table-level lock: the lock with the largest granularity in MySQL; it locks the entire table being operated on. It is simple to implement, consumes few resources, and locks quickly without deadlocks. Its granularity is the largest, so the probability of lock conflicts is highest and concurrency is lowest. Both the MyISAM and InnoDB engines support table-level locks.
Row-level lock: the lock with the smallest granularity in MySQL; it locks only the rows being operated on. Row-level locks greatly reduce conflicts in database operations. The granularity is the smallest and concurrency is high, but locking overhead is the largest, locking is slow, and deadlocks can occur.
The InnoDB storage engine has three row-lock algorithms:
Record lock: a lock on a single row record
Gap lock: locks a range but excludes the record itself
Next-key lock: record + gap; locks a range including the record itself
Related knowledge points:
InnoDB uses next-key locks for row queries.
Next-key locking is designed to solve the phantom read problem (Phantom Problem).
When the queried index has a unique attribute, InnoDB downgrades the next-key lock to a record lock.
Gap locks are designed to stop multiple transactions from inserting records into the same range, which would otherwise cause phantom reads.
There are two ways to explicitly disable gap locking (after which only record locks are used, except for foreign key constraint checking and duplicate key checking): A. set the transaction isolation level to READ COMMITTED (RC); B. set the parameter innodb_locks_unsafe_for_binlog to 1.
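A sketch of next-key locking under REPEATABLE READ, assuming a hypothetical user table with primary key id:

-- Session 1:
START TRANSACTION;
SELECT * FROM user WHERE id BETWEEN 10 AND 20 FOR UPDATE;
-- Session 2: an insert into the locked range now blocks until session 1 ends,
-- which is how InnoDB prevents phantom rows from appearing:
--   INSERT INTO user (id, name) VALUES (15, 'xxx');  -- blocks
-- Session 1:
COMMIT;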
Large table optimization
When a single MySQL table holds too many records, the database's CRUD performance degrades significantly. Some common optimization measures:
1. Limit the scope of data
Be sure to prohibit query statements that place no restriction on the data range. For example, when users query their order history, we can limit the range to one month.
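A sketch of such a bounded query, assuming a hypothetical orders(user_id, created_at, ...) table:

SELECT * FROM orders
WHERE user_id = 123
  AND created_at >= NOW() - INTERVAL 1 MONTH;  -- never query the unbounded history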
2. Read/write splitting
In the classic database split scheme, the master database is responsible for writing and the slave database is responsible for reading.
3. Vertical partition
Split tables according to the correlation of the data columns. For example, if the user table contains both the user's login information and the user's basic information, it can be split into two separate tables, or the two parts can even be placed in separate databases.
To put it simply, vertical splitting splits a table by columns, dividing a table with many columns into multiple tables (a sketch follows below).
Advantages of vertical splitting: row data becomes smaller, reducing the number of blocks read per query and hence the number of I/O operations. Vertical partitioning also simplifies table structure and is easier to maintain.
Disadvantages of vertical splitting: the primary key becomes redundant and redundant columns must be managed, and JOIN operations are introduced (these can instead be performed at the application layer). Vertical partitioning also makes transactions more complex.
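A sketch of the user-table split described above (the column choices are hypothetical):

-- Basic information:
CREATE TABLE user_basic (
  user_id  BIGINT PRIMARY KEY,
  nickname VARCHAR(50),
  avatar   VARCHAR(255)
);
-- Login information; note the primary key is duplicated in both tables,
-- and fetching both halves now requires a JOIN (or two application-side reads):
CREATE TABLE user_login (
  user_id       BIGINT PRIMARY KEY,
  username      VARCHAR(50),
  password_hash CHAR(60),
  last_login_at DATETIME
);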
4. Horizontal partitioning
Keep the table structure unchanged and store shards of the data according to some strategy, so each shard is distributed to a different table or database, achieving distribution. Horizontal splitting can support a very large amount of data.
Horizontal splitting splits a table by rows. When a table exceeds 2 million rows it slows down, and at that point its data can be split across multiple tables. For example, we can split the user information table into several user information tables to avoid the performance impact of an oversized single table.
Horizontal splitting can support a very large amount of data. Note, however, that splitting tables alone only solves the problem of one table's data being too large; if the shards stay on the same machine, it does little to improve MySQL's concurrency, so horizontal splitting is best done across databases.
Horizontal splitting supports very large data volumes with few application changes, but sharded transactions are hard to handle, cross-node JOIN performance is poor, and the logic is complex. The author of "Java Engineer's Way of Practice" recommends avoiding sharding where possible, because it brings complexity in logic, deployment, and operations; with proper optimization, an ordinary table can generally support fewer than 10 million rows without major problems. If you really must shard, prefer a client-side sharding architecture, which reduces the network I/O between the application and middleware.
Here are two common solutions for database sharding:
Client proxy: the sharding logic lives on the application side, encapsulated in a jar package, and is implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Ali's TDDL are two commonly used implementations.
Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, and NetEase's DDB are implementations of this architecture.
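For completeness, a sketch of the routing idea behind client-side sharding, with four hypothetical shard tables user_0 .. user_3 keyed by user_id % 4 (real implementations such as Sharding-JDBC perform this rewriting transparently):

-- The application computes the target table: user_id 10 -> 10 % 4 = 2 -> user_2
INSERT INTO user_2 (user_id, name) VALUES (10, 'Xiao Ming');
SELECT * FROM user_2 WHERE user_id = 10;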