What are the basic problems of MySQL

2025-07-04 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article looks at the basic problems of MySQL. The answers below are short, quick to read and practical, so let's go through them one by one.

General

1. What are the three normal forms of a database?

First normal form: every field is atomic. Second normal form: every row is unique and has a primary key column. Third normal form: every column is directly related to the primary key column.

In practice, a small number of fields are often stored redundantly to reduce table joins and so improve query efficiency.

2. A query returns only one row but still runs very slowly. What are the usual causes?

The MySQL database itself is blocked, for example by insufficient system or network resources.

The SQL statement is blocked, for example by a table lock or row lock, so the storage engine cannot execute it.

The index is used improperly, so the query does not use the index at all.

Because of the characteristics of the data in the table, the query uses the index but still needs a huge number of back-to-table lookups.

3. What is the difference between count(*), count(0) and count(id)?

For count functions of the form count(*), count(constant) and count(primary key), the optimizer can pick the index with the lowest scanning cost to execute the query, which improves efficiency; their execution process is the same.

For count(non-indexed column), the optimizer chooses a full table scan, which means it can only scan the leaf nodes of the clustered index sequentially.

4. What should I do if I delete data by mistake?

1) If the amount of data is large, use the physical backup tool xtrabackup. Take a full backup of the database regularly; incremental backups are also possible.

2) If the amount of data is small, use mysqldump or mydumper, then restore with binlog or set up a master-slave pair to recover the data. Recovery can be approached from the following points:

DML mistakes: use flashback to parse the binlog events and then reverse them.

DDL mistakes: the data can only be restored from a full backup plus applying the binlog. If the amount of data is large, recovery takes a very long time.

rm deletion: keep backups across data centers, or preferably across cities.

5. The difference between drop, truncate and delete

The DELETE statement removes rows from the table one at a time and records each row deletion in the log as a transaction record, so that it can be rolled back.

TRUNCATE TABLE, on the other hand, deletes all the data in the table at once and does not log the individual row deletions, so the deleted rows cannot be recovered. The table's delete triggers are not fired during the deletion either, which is why it executes quickly.

The DROP statement frees all the space occupied by the table.

6. Why doesn't a query on a large MySQL table exhaust memory?

MySQL "sends while reading", which means that if the client receives slowly, the MySQL server cannot send the results and the transaction takes longer to execute.

The server does not need to hold the complete result set. Fetching and sending data both go through a net_buffer.

Data pages in memory are managed in the Buffer Pool (BP).

InnoDB manages the Buffer Pool with an improved LRU algorithm implemented as a linked list. In InnoDB's implementation, the whole LRU list is divided into a young area and an old area in a 5:3 ratio, so that hot data is not flushed out when a large amount of cold data is loaded.
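The young/old split can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not InnoDB's implementation: the class name `MidpointLRU`, the fixed sublist sizes and the "promote on second access" rule are all illustrative, and the time window InnoDB applies before promoting a page (`innodb_old_blocks_time`) is left out.

```python
from collections import OrderedDict


class MidpointLRU:
    """Toy LRU list split into a young and an old sublist (5:3 by default)."""

    def __init__(self, capacity=8, old_ratio=3 / 8):
        self.capacity = capacity
        self.old_capacity = max(1, int(capacity * old_ratio))
        self.young = OrderedDict()  # most recently used page is last
        self.old = OrderedDict()    # most recently inserted page is last

    def access(self, page_id):
        if page_id in self.young:
            self.young.move_to_end(page_id)
        elif page_id in self.old:
            # Second touch: promote the page from the old to the young sublist.
            del self.old[page_id]
            self.young[page_id] = True
            self._rebalance()
        else:
            # First touch: a new page enters at the head of the old sublist.
            self.old[page_id] = True
            self._evict()

    def _rebalance(self):
        young_capacity = self.capacity - self.old_capacity
        while len(self.young) > young_capacity:
            # Demote the coldest young page to the head of the old sublist.
            page_id, _ = self.young.popitem(last=False)
            self.old[page_id] = True
        self._evict()

    def _evict(self):
        while len(self.old) > self.old_capacity:
            self.old.popitem(last=False)  # drop the coldest old page


lru = MidpointLRU()
for page in range(1, 6):      # hot pages, each touched twice
    lru.access(page)
    lru.access(page)
for page in range(100, 200):  # a large scan of cold pages, touched once
    lru.access(page)
print(sorted(lru.young), list(lru.old))
```

In this model the scan only churns through the old sublist, so the five hot pages stay in the young sublist.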

7. How do you deal with deep paging (very large page offsets)?

Optimize with id: first find the maximum id of the previous page, then query through the index on id, for example select * from user where id > 1000000 limit 100.

Optimize with a covering index: when a MySQL query can be answered entirely from an index, that index is called a covering index. It is very fast, because the query only has to look in the index and can return directly without going back to the table for the data. So we can first find the ids through the index and then fetch the rows by id.

Limit the number of pages, as far as the business allows.
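The "optimize with id" approach can be tried out with the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so that the example runs without a server); the table contents and the helper name `fetch_page_after` are illustrative.

```python
import sqlite3


def fetch_page_after(conn, last_id, page_size):
    """Return the next page of rows whose id is greater than last_id."""
    cur = conn.execute(
        "SELECT id, name FROM user WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)

# Instead of OFFSET 900, remember the last id of the previous page.
page = fetch_page_after(conn, last_id=900, page_size=3)
print(page)
# The last id of this page becomes the starting point of the next one.
next_page = fetch_page_after(conn, last_id=page[-1][0], page_size=3)
print(next_page)
```

The caller has to carry the last id from page to page, so this style suits "next page" navigation better than jumping to an arbitrary page number.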

8. How do you optimize SQL in daily development?

Add appropriate indexes: index the fields used in query conditions and in order by, and consider a composite index when several fields are queried together. Pay attention to the column order of a composite index: put the column most commonly used as a condition on the far left, then the others in decreasing order of use. Do not create too many indexes; generally keep it under 5.

Optimize the table structure: numeric fields are better than string types, smaller data types are usually better, and use NOT NULL where possible.

Optimize the query statement: analyze the SQL execution plan to see whether it hits an index. If the SQL is very complex, simplify its structure; if the table holds too much data, consider splitting the table.

9. What is the difference between concurrent connections and concurrent queries in MySQL?

In the output of show processlist, the thousands of connections you see are the concurrent connections.

The statements that are "currently executing" are the concurrent queries.

Memory is mostly affected by the number of concurrent connections.

A high number of concurrent queries is bad for the CPU. When a machine has a limited number of CPU cores and all the threads rush in, the cost of context switching becomes too high.

Note that once a thread enters a lock wait, the concurrent thread count is reduced by one, so threads waiting for a row lock or gap lock are not counted. In other words, a thread waiting on a lock does not consume CPU, which keeps the whole system from locking up.

10. What does MySQL do internally when an update sets a field to its original value?

When the new value is the same as the old one, no update is performed.

However, the log is handled differently depending on the binlog format:

1) In row format, the server layer matches the record to be updated, finds that the new value is the same as the old value, and returns directly without updating and without writing to the binlog.

2) In statement or mixed format, MySQL executes the update statement and records it in the binlog.

11. What is the difference between datetime and timestamp?

The date range of datetime is from year 1000 to year 9999; the range of timestamp is from 1970 to 2038.

The time stored in a datetime is independent of the time zone; the time stored in a timestamp is tied to the time zone, and the value displayed also depends on the time zone.

datetime uses 8 bytes of storage (5 bytes plus fractional-seconds storage since MySQL 5.6.4); timestamp uses 4 bytes.

The default value of datetime is null. A timestamp field is by default not null, and its default value is the current time (current_timestamp).
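The 1970 to 2038 range follows from the 4-byte storage. The short Python check below assumes the value is a signed 32-bit count of seconds since 1970-01-01 UTC, and it also shows the same stored instant displayed in two time zones (UTC+8 is just an example zone).

```python
from datetime import datetime, timedelta, timezone

MAX_INT32 = 2**31 - 1  # largest value of a signed 4-byte integer

utc_end = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
utc_zero = datetime.fromtimestamp(0, tz=timezone.utc)
# The same stored instant, displayed in a UTC+8 session time zone.
cst_zero = datetime.fromtimestamp(0, tz=timezone(timedelta(hours=8)))

print(utc_end.isoformat())   # 2038-01-19T03:14:07+00:00
print(utc_zero.isoformat())  # 1970-01-01T00:00:00+00:00
print(cst_zero.isoformat())  # 1970-01-01T08:00:00+08:00
```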

12. What are the isolation levels of transactions?

"Read Uncommitted": the lowest level; nothing is guaranteed.

"Read Committed": avoids dirty reads.

"Repeatable Read": avoids dirty reads and non-repeatable reads.

"Serializable": avoids dirty reads, non-repeatable reads and phantom reads.

The default transaction isolation level of MySQL is Repeatable Read.

13. The two kill commands in MySQL

kill query + thread id: terminates the statement currently executing in that thread.

kill connection + thread id (the word connection can be omitted): disconnects that thread's connection.

Index

1. What are the index classifications?

According to the content of the leaf node, the index type is divided into primary key index and non-primary key index.

The leaf nodes of a primary key index store the entire row of data. In InnoDB, the primary key index is also known as the clustered index.

The leaf nodes of a non-primary key index store the value of the primary key. In InnoDB, a non-primary key index is also called a secondary index.

2. What is the difference between a clustered index and a nonclustered index?

Clustered index: an index created on the primary key. Its leaf nodes store the table's row data.

Nonclustered index: an index created on non-primary key columns. Its leaf nodes store the primary key and the index columns. When a nonclustered index is used to query data, the primary key is taken from the leaf node and then used to find the row you want. This process of taking the primary key and looking up the row is called going back to the table.

Covering index: if the columns being queried are exactly those contained in the index, so there is no need to go back to the table, the index is called a covering index.

3. Why does MySQL use a B+ tree for indexes rather than a hash, a B-tree, a binary tree or a red-black tree?

A hash index can handle inserting, deleting, updating and querying a single row in O(1) time, but it falls back to a full table scan for range queries and sorting.

A B-tree can store data in non-leaf nodes. Because any node may contain the target data, we always have to traverse the subtree from the root node down to find the rows that meet the conditions. This causes a large amount of random I/O and degrades performance.

In a B+ tree, all the data rows are stored in the leaf nodes, and the leaf nodes are linked in order through pointers. When traversing data in a B+ tree, we can jump directly from one leaf node to the next, which saves a lot of disk I/O time.

Binary tree: the height of the tree is uneven and it cannot balance itself, so search efficiency depends on the data (the height of the tree), and the IO cost is high.

Red-black tree: the height of the tree grows as the amount of data grows, and the IO cost is high.

4. What about clustered and non-clustered indexes in InnoDB?

In InnoDB, an index whose B+Tree leaf nodes store the whole row of data is the primary key index, also known as the clustered index. The data and the index are stored together, so finding the index entry means finding the data.

An index whose B+Tree leaf nodes store the primary key value is a non-primary key index, also called a non-clustered index or secondary index.

The index lookup itself is usually sequential IO, while going back to the table is random IO. The more back-to-table lookups a query needs, that is, the more random IO, the more the optimizer tends to choose a full table scan.

5. Does a query on a non-clustered index always have to go back to the table?

Not necessarily. It depends on whether all the fields required by the query are in the index; if they are, there is no need to go back to the table. An index that contains (covers) the values of all the fields to be queried is called a covering index.

6. What is the leftmost prefix principle in MySQL?

The leftmost prefix principle means that matching starts from the leftmost column. When creating a multi-column index, place the column used most frequently in where clauses on the far left, according to business requirements.

MySQL keeps matching to the right until it encounters a range query (>, <, between, like) and then stops. For example, with a = 1 and b = 2 and c > 3 and d = 4, an index built in the order (a, b, c, d) cannot use d, but an index built as (a, b, d, c) can use all four columns, and the order of a, b and d in it can be adjusted at will.

= and in conditions can appear in any order. For example, for a = 1 and b = 2 and c = 3, an (a, b, c) index can be built in any column order, and MySQL's query optimizer will rewrite the conditions into a form the index can use.
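The matching rule can be written down as a small Python function. This is a simplified model for reasoning about composite indexes, not how the optimizer is implemented; the function name `usable_index_columns` and the "eq"/"range" labels are illustrative, and the sample query is a = 1 and b = 2 and c > 3 and d = 4.

```python
def usable_index_columns(index_columns, predicates):
    """Return the leading index columns a query can use for the lookup.

    predicates maps a column name to "eq" (= or in) or "range" (>, <, between, like).
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break  # gap in the prefix: matching stops here
        used.append(column)
        if kind == "range":
            break  # the range column is used, but nothing after it
    return used


# a = 1 and b = 2 and c > 3 and d = 4
query = {"a": "eq", "b": "eq", "c": "range", "d": "eq"}
print(usable_index_columns(["a", "b", "c", "d"], query))  # ['a', 'b', 'c']
print(usable_index_columns(["a", "b", "d", "c"], query))  # ['a', 'b', 'd', 'c']
```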

7. What is index push-down?

When the leftmost prefix principle is satisfied, the leftmost prefix can be used to locate the record in the index.

Before MySQL 5.6, the server could only go back to the table one id at a time: look up the data row in the primary key index, then compare the field values.

The index condition pushdown optimization introduced in MySQL 5.6 can check the fields contained in the index while traversing the index, filter out the records that do not meet the conditions right there, and so reduce the number of back-to-table lookups.
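The effect on back-to-table lookups can be counted in a toy Python model. Everything here is an illustrative assumption: a four-row table, a secondary index on (name, age), and a query of the form name like 'Zhang%' and age = 10.

```python
ROWS = {
    1: {"name": "Zhang San", "age": 10},
    2: {"name": "Zhang Si", "age": 20},
    3: {"name": "Zhang Wu", "age": 10},
    4: {"name": "Li Liu", "age": 10},
}
# Secondary index on (name, age): sorted entries of (name, age, primary key).
INDEX = sorted((r["name"], r["age"], pk) for pk, r in ROWS.items())


def query(prefix, age, use_icp):
    """Return (matching primary keys, number of back-to-table lookups)."""
    lookups = 0
    result = []
    for name, idx_age, pk in INDEX:
        if not name.startswith(prefix):
            continue  # a real engine would seek to the prefix range instead
        if use_icp and idx_age != age:
            continue  # filtered inside the index, no back-to-table lookup
        lookups += 1
        row = ROWS[pk]  # back-to-table lookup by primary key
        if row["age"] == age:
            result.append(pk)
    return result, lookups


print(query("Zhang", 10, use_icp=False))  # ([1, 3], 3)
print(query("Zhang", 10, use_icp=True))   # ([1, 3], 2)
```

Both runs return the same rows; with pushdown, the row with age 20 is rejected from the index entry alone and is never fetched.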

8. Why does InnoDB use an auto-increment id as the primary key?

If the table uses an auto-increment primary key, each new record is appended right after the current last record of the index, and a new page is opened automatically when a page is full. If you use a non-auto-increment primary key (such as an ID card number or a student number), the primary key value of each insert is close to random, so each new record has to go somewhere in the middle of an existing index page. The frequent record moves and page splits create a lot of fragmentation and leave the index structure less compact, and later you have to run OPTIMIZE TABLE to rebuild the table and fill the pages again.
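The difference between appending and inserting into the middle can be seen in a deliberately simplified Python model. The assumptions are loud ones: pages hold only four keys, a full page is split in half, and the helper name `insert_keys` is hypothetical; real InnoDB pages and split heuristics are more involved.

```python
import bisect


def insert_keys(keys, page_capacity=4):
    """Insert keys into sorted fixed-size pages; return (pages, split_count)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Find the first page whose largest key is not smaller than this key.
        idx = 0
        while idx < len(pages) - 1 and key > pages[idx][-1]:
            idx += 1
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # append at the end: open a new page, no split
        else:
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            bisect.insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]  # page split
            splits += 1
    return pages, splits


sequential = list(range(1, 101))
scattered = [(i * 37) % 101 for i in range(1, 101)]  # same keys, scattered order

seq_pages, seq_splits = insert_keys(sequential)
sca_pages, sca_splits = insert_keys(scattered)
print(len(seq_pages), seq_splits)
print(len(sca_pages), sca_splits)
```

With sequential keys every page fills completely and no split happens; with the scattered order the same keys cause page splits and leave pages partly empty.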

9. How are the ACID properties of transactions implemented?

Atomicity: implemented with the undo log. If an error occurs during transaction execution or the user executes rollback, the system uses the undo log to return to the state at the start of the transaction.

Durability: implemented with the redo log. As long as the redo log has been persisted, the data can be recovered from it after a system crash.

Isolation: transactions are isolated from each other through locks and MVCC.

Consistency: achieved through rollback, recovery, and isolation under concurrency.

10. What is the difference between MyISAM and InnoDB in implementing B-tree indexes?

InnoDB storage engine: the leaf nodes of the B+ tree index store the data itself.

MyISAM storage engine: the leaf nodes of the B+ tree index store the physical address of the data.

In InnoDB, the data file itself is the index file, whereas in MyISAM the index file and the data file are separate. An InnoDB table data file is itself an index structure organized as a B+Tree, and the data field of its leaf nodes stores the complete data records. The key of this index is the table's primary key, so the InnoDB table data file itself is the primary index, which is called the clustered index. All other indexes are secondary indexes, and the data field of a secondary index stores the primary key value of the corresponding record rather than its address, which is different from MyISAM.

11. What are the categories of the index?

According to the content of the leaf node, the index type is divided into primary key index and non-primary key index.

The leaf nodes of a primary key index store the entire row of data. In InnoDB, the primary key index is also known as the clustered index.

The leaf nodes of a non-primary key index store the value of the primary key. In InnoDB, a non-primary key index is also called a secondary index.

12. What are the scenarios that lead to index invalidation?

Background: the fast lookup ability of a B+ tree comes from the ordering of sibling nodes on the same level, so if that ordering cannot be used, the index will most likely be ineffective. This happens in the following situations:

Left or left-and-right fuzzy matching on the index: both like '%xx' and like '%xx%' make the index ineffective. The result of the query could be "Chen Lin, Zhang Lin, Zhou Lin" and so on, so there is no way to know which index value to start comparing from, and the query can only be done with a full table scan.

Applying a function or an expression calculation to the indexed column: the index holds the original value of the indexed field, not the value computed by the function, so the index cannot be used.

Implicit type conversion on the indexed column: this is equivalent to applying a function.

OR in the WHERE clause: OR means that only one of the two conditions needs to be satisfied, so having an index on only one of the condition columns is not enough. If any condition column is not indexed, a full table scan is performed.

Solutions

1. A system currently has no database or table sharding. How should it be designed so that it can switch to sharding dynamically?

Expansion with downtime (not recommended).

Double-write migration: design the table structure for after the expansion, then write to both the single database and the sharded databases. After observing for a week, turn off read traffic to the single database and observe for a while longer. Once things are stable, turn off write traffic to the single database and switch over smoothly to the sharded databases and tables.

2. How do you design a sharding scheme that can dynamically expand and shrink capacity?

Principles

1. What are the execution steps of a MySQL statement?

The Server layer executes SQL in the following order:

Client request -> connector (verifies the user's identity and grants permissions) -> query cache (returns the result directly if it is cached, otherwise continues with the following steps) -> analyzer (lexical and syntax analysis of the SQL) -> optimizer (chooses the best execution plan for the SQL) -> executor (first checks whether the user has permission to execute, then calls the interface provided by the storage engine) -> the engine layer fetches the data and returns it (if query caching is enabled, the query result is cached). Note that the query cache was removed in MySQL 8.0.

2. What is the internal principle of order by sorting?

MySQL allocates each thread a memory area for sorting (sort_buffer), whose size is sort_buffer_size.

If the amount of data sorted is less than sort_buffer_size, the sorting will be done in memory.

If the amount of sorting data is so large that so much data cannot be stored in memory, temporary disk files are used to assist sorting, also known as external sorting.

When using external sorting, MySQL splits the data into several separate temporary files, each holding sorted data, and then merges these files into one large sorted file.
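The split-then-merge idea can be sketched in a few lines of Python. This is a generic external merge sort under simplifying assumptions, not MySQL's filesort code: in-memory lists stand in for the temporary files, and `buffer_size` plays the role of sort_buffer_size measured in rows.

```python
import heapq


def external_sort(values, buffer_size):
    """Sort values using sorted runs no larger than buffer_size, then merge."""
    runs = []
    for start in range(0, len(values), buffer_size):
        chunk = sorted(values[start:start + buffer_size])  # fits in the buffer
        runs.append(chunk)  # stands in for one temporary file
    # Merge all sorted runs into one sorted result.
    return list(heapq.merge(*runs)), len(runs)


result, run_count = external_sort([5, 3, 8, 1, 9, 2, 7], buffer_size=3)
print(result, run_count)  # [1, 2, 3, 5, 7, 8, 9] 3
```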

3. The principle of MVCC implementation?

MVCC (multiversion concurrency control) keeps multiple versions of the same data in order to achieve concurrency control. A query uses the read view and the version chain to find the version of the data it should see.

Function: it improves concurrency performance. In high-concurrency scenarios, MVCC has less overhead than row-level locking.

The implementation of MVCC depends on the version chain, which is built from three hidden fields of the table.

1) DB_TRX_ID: the id of the transaction that last modified the row. The time order of transactions is determined by comparing transaction ids.

2) DB_ROLL_PTR: the rollback pointer, which points to the previous version of the current row record. These pointers link the versions of the data together to form the undo log version chain.

3) DB_ROW_ID: the row id. If the table has no primary key, InnoDB automatically generates this hidden column as the primary key.
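How a read view and the version chain work together can be modelled as below. This is a sketch under assumptions rather than InnoDB's code: the class and field names (`RowVersion`, `ReadView`, `active_ids`, `low_limit_id`, `creator_id`) are illustrative, with `trx_id` standing for the transaction id field and `prev` standing for the rollback pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    trx_id: int                   # id of the transaction that wrote this version
    prev: Optional["RowVersion"]  # older version, reached via the rollback pointer


@dataclass
class ReadView:
    active_ids: frozenset  # transactions still uncommitted at snapshot time
    low_limit_id: int      # next transaction id to be assigned at snapshot time
    creator_id: int        # the transaction that owns this read view

    def is_visible(self, trx_id):
        if trx_id == self.creator_id:
            return True    # a transaction always sees its own changes
        if trx_id >= self.low_limit_id:
            return False   # written by a transaction that started later
        return trx_id not in self.active_ids


def read(version, view):
    """Walk the version chain until a visible version is found."""
    while version is not None:
        if view.is_visible(version.trx_id):
            return version.value
        version = version.prev
    return None


v1 = RowVersion("a", 10, None)
v2 = RowVersion("b", 20, v1)
v3 = RowVersion("c", 30, v2)

# Snapshot taken by transaction 22 while transaction 20 was still active.
print(read(v3, ReadView(frozenset({20}), 25, 22)))  # a
```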

4. What is the change buffer and what is its function?

5. How does MySQL ensure that data is not lost?

As long as the redo log and the binlog are persisted to disk, the data can be recovered after an abnormal restart of MySQL.

The redo log ensures that lost data can be redone after a system failure, and the binlog archives the data so that lost data can be restored.

The redo log is written while the transaction executes, before it commits. For the binlog, the log is first written to the binlog cache during transaction execution, and when the transaction commits, the binlog cache is written to the binlog file.

6. Why does the table file stay the same size after table data is deleted?

After a data item is deleted, InnoDB only marks the space in page A as reusable.

What about a delete command that removes the data of the entire table? All the data pages are marked as reusable, but the file on disk does not get any smaller.

After a large number of inserts, deletes and updates, a table may contain holes. These holes also take up space, so removing them achieves the goal of shrinking the table space.

This can be done by rebuilding the table, for example with the command alter table A engine=InnoDB.

7. Comparison of three formats of binlog

Row: the binlog records the primary key id of each affected row and the actual value of every field, so the master and the standby cannot end up with inconsistent data.

Statement: the binlog records the original SQL statement.

Mixed: a combination of the first two. Why is the mixed format needed? Some binlog entries in statement format can cause inconsistency between master and standby, which is why the row format exists, but the drawback of the row format is that it takes up a lot of space. MySQL therefore takes a compromise: it determines whether an SQL statement might cause inconsistency between master and standby, uses the row format if so, and uses the statement format otherwise.

8. MySQL locking rules

Principle 1: the basic unit of locking is the next-key lock, which is a left-open, right-closed interval.

Principle 2: only objects accessed during the lookup are locked.

Optimization 1: for an equality query on an index, when a unique index is locked, the next-key lock degrades to a row lock.

Optimization 2: for an equality query on an index, when traversing to the right and the last value does not satisfy the equality condition, the next-key lock degrades to a gap lock.

One bug: a range query on a unique index accesses up to the first value that does not satisfy the condition.

9. What are dirty reads, non-repeatable reads and phantom reads?

Dirty read: reading uncommitted data from another transaction. Uncommitted means the data may be rolled back, so it may never actually be stored in the database. Reading data that may not exist in the end is a dirty read.

Non-repeatable read: the data read at the beginning of a transaction is inconsistent with the same batch of data read at some later point before the transaction ends.

Phantom read: a phantom read is not simply that two reads return different result sets. The point is that the state of the data returned by a select cannot support the subsequent business operation. More specifically: you select to check whether a record exists, it does not, so you prepare to insert it, but when you execute the insert you find that the record already exists and cannot be inserted. That is a phantom read.

10. What locks does MySQL have? Isn't locking like the above a bit of a hindrance to concurrency efficiency?

In terms of the types of locks, there are shared locks and exclusive locks.

1) Shared lock: also known as a read lock. When a user wants to read data, a shared lock is added to the data. Multiple shared locks can be held at the same time.

2) Exclusive lock: also called a write lock. When a user wants to write data, an exclusive lock is added to the data. Only one exclusive lock can be held, and it is mutually exclusive with other exclusive locks and with shared locks.
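The compatibility rules for the two lock types fit in a small table. The Python sketch below only encodes the rules stated above; the names `COMPATIBLE` and `can_grant` are illustrative, with "S" for a shared lock and "X" for an exclusive lock.

```python
# (held mode, requested mode) -> can both be held at the same time?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held):
    """Return True if a lock of mode requested can coexist with all held modes."""
    return all(COMPATIBLE[(h, requested)] for h in held)


print(can_grant("S", ["S", "S"]))  # True: shared locks can stack
print(can_grant("X", ["S"]))       # False: a writer must wait for readers
print(can_grant("S", ["X"]))       # False: readers must wait for the writer
```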

The granularity of locks depends on the specific storage engine. InnoDB implements row-level locks, page-level locks, and table-level locks.

In that order (row, page, table), the locking overhead goes from high to low, and the concurrency they allow also goes from high to low.

Architecture

1. What is the principle of MySQL master-slave replication?

Update events on the Master (update, insert, delete) are written sequentially to the binlog. When a Slave connects to the Master, the Master starts a binlog dump thread for that Slave, which reads the binlog.

After the Slave connects to the Master, an I/O thread on the Slave requests the binlog from the binlog dump thread and writes it to the Slave's relay log.

The Slave also has a SQL thread that watches the relay log for updates in real time, parses the SQL statements in it, and executes them in the Slave database.

2. What are the synchronization modes of MySQL master-slave replication?

Asynchronous replication: MySQL master-slave synchronization is asynchronous by default. Of the three steps above, only the first is synchronous (the Master writing the binlog). Once the master database has written the binlog, it can return success to the client without waiting for the binlog to be passed to the slave database.

Synchronous replication: for synchronous replication, the Master host sends an event to the Slave host and triggers a wait until all Slave nodes (if there is more than one Slave) return information about successful data replication to the Master.

Semi-synchronous replication: for semi-synchronous replication, the Master host sends an event to the Slave host and triggers a wait until one of the Slave nodes (if there is more than one Slave) returns information about the success of the data replication to the Master.

3. What causes MySQL master-slave synchronization delay?

A large transaction executed on the master node has a big impact on master-slave delay.

Network latency, large logs, and too many slaves.

The master writes with multiple threads while the slave node synchronizes with only a single thread.

Machine performance: the slave node may be running on a weaker machine.

Lock conflicts can also make the SQL thread on the slave execute slowly.

4. How can MySQL master-slave synchronization delay be optimized?

Large transactions: split large transactions into small ones and update data in batches.

Reduce the number of slaves to no more than 5, and reduce the size of individual transactions.

From MySQL 5.7 on, you can use multithreaded replication, or use an MGR replication architecture.

If there are problems with the disk, the RAID card or the scheduling policy, a single IO may show high latency. Use the iostat command to check the IO of the DB data disk, and then investigate further.

For lock problems, capture the processlist and look at the lock- and transaction-related tables under information_schema.

6. What are the bin log, redo log and undo log?

The bin log is a file at the MySQL database level. It records all modifications made to the MySQL database and does not record select or show statements.

The redo log records the data to be updated. For example, when a piece of data has been committed successfully, it is not written to disk immediately; it is recorded in the redo log first and flushed to disk at a suitable time. This is how transaction durability is achieved.

The undo log is used to undo data changes. It keeps the content of a record as it was before modification. Transaction rollback is implemented through the undo log, and MVCC is implemented by tracing back to a specific version of the data through the undo log.

At this point, you should have a deeper understanding of the basic problems of MySQL, and it is worth trying them out in practice.
