The Underlying Principles of Database Indexes and Locks

What are the underlying principles behind indexes and locks in a database? Many newcomers are unclear on this, so this article explains both in detail. Anyone who needs it can read along; I hope you get something out of it.
I. Index
My earlier understanding of indexes amounted to the following:
An index speeds up data retrieval.
Don't index a table that is frequently hit by INSERT/UPDATE/DELETE; in other words, indexes slow down maintenance operations such as inserts, deletes and updates.
Indexes take up physical storage space.
Understand the leftmost matching principle of indexes.
Know the classification of indexes: clustered and nonclustered.
MySQL supports both Hash indexes and B-tree indexes.
That looks like a complete picture, but in an interview you may still stumble when asked to explain:
Why can the use of indexes speed up the retrieval of the database?
Why do indexes slow down maintenance tasks such as inserts, deletions, modifications, etc.
What does the leftmost matching principle of the index refer to?
What is the difference between a Hash index and a B+ tree index? Which is mainstream? Does the InnoDB storage engine support hash indexes?
What is the difference between a clustered index and a nonclustered index?
1. The basics of indexes
First of all, MySQL's basic unit of storage is the page (records are stored inside pages):
Data pages are linked to each other in a doubly linked list.
The records inside each data page form a singly linked list.
Each data page maintains a page directory for the records it stores. When looking up a record by primary key, binary search over the page directory quickly locates the right slot, and the records in that slot's group are then traversed to find the target record.
Searching by a column other than the primary key: there is no such shortcut, so the singly linked list must be traversed in order, starting from the smallest record.
So if we write an unoptimized SQL statement such as select * from user where username = 'Java3y', by default it executes like this:
Locate the page containing the record: this requires traversing the doubly linked list of pages to find the right one.
Find the record inside that page: since the query is not on the primary key, the page's singly linked list of records has to be traversed one by one.
Obviously, with a large amount of data, this kind of search is very slow!
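To make this concrete, here is a minimal sketch. The user table definition is a guess based on the query above (id, username, age are from the article; email is added here only so a later example has a non-indexed column), showing the full scan and the index that eliminates it:

-- Hypothetical table matching the query in the article:
CREATE TABLE user (
    id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50),
    age INT,
    email VARCHAR(50)   -- extra column, assumed for later examples
) ENGINE=InnoDB;

-- Without an index on username, EXPLAIN shows a full scan (type = ALL):
EXPLAIN SELECT * FROM user WHERE username = 'Java3y';

-- After adding a secondary index, the same query jumps straight to the matching rows:
CREATE INDEX idx_username ON user (username);
EXPLAIN SELECT * FROM user WHERE username = 'Java3y';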
2. Indexes speed up retrieval
What does an index do to speed up our queries?
Essentially, it turns unordered data into (relatively) ordered data.
Take finding the record with id = 8: without an index, we would have to traverse the doubly linked list of pages to locate the right page; with an index, the "directory" built over the pages lets us locate that page immediately!
Under the hood, this directory is a B+ tree, and it is this tree structure that lets us find the corresponding records quickly.
3. Indexes slow down inserts, deletes and updates
An ordinary tree can degenerate into a linked list in extreme cases, at which point the advantages of a tree no longer exist.
A B+ tree, by contrast, is a balanced tree: it cannot degenerate into a linked list, and the height of the tree stays low (a short, fat, balanced shape), so lookups cost O(log n). Building an index, in fact, means building a B+ tree.
Precisely because the B+ tree is balanced, inserting into it or deleting from it disturbs its structure, and extra work must be done to keep the tree balanced. That extra overhead is what slows down inserts, deletes and updates.
4. Hash indexes
Besides the B+ tree, another common index type is the hash index.
A hash index applies a hash algorithm to the key, converting it into a hash value; instead of walking level by level from the root node down to a leaf node as a B+ tree does, it can locate the target in a single step, which is very fast.
In essence, it converts the key into a hash value and locates the row by that hash value.
Hash indexes look very powerful, but their underlying principle imposes several limitations:
A hash index cannot be used to satisfy sorting.
It does not support the leftmost matching principle.
With many duplicate key values, its efficiency drops badly (the hash collision problem).
It does not support range queries.
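These limits are easy to see in MySQL itself: the MEMORY engine lets you create explicit hash indexes, so a small sketch (table and data hypothetical) shows what a hash index can and cannot do:

CREATE TABLE lookup (
    k INT,
    v VARCHAR(20),
    KEY idx_k (k) USING HASH   -- explicit hash index (MEMORY engine)
) ENGINE=MEMORY;

SELECT * FROM lookup WHERE k = 5;    -- equality: the hash index applies
SELECT * FROM lookup WHERE k > 5;    -- range: the hash index cannot help
SELECT * FROM lookup ORDER BY k;     -- sorting: the hash index cannot help either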
5. Does InnoDB support hash indexing?
B+ tree indexes remain the mainstream choice. As for hash indexes, InnoDB provides only the adaptive hash index: the InnoDB storage engine creates these hash indexes automatically as an internal optimization, and we cannot intervene!
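The adaptive hash index is controlled only by a global server variable; a quick sketch of what we can (and cannot) do:

-- Check whether the adaptive hash index is enabled:
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- The whole feature can be toggled globally, but individual hash indexes
-- cannot be created by hand on InnoDB tables:
SET GLOBAL innodb_adaptive_hash_index = OFF;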
6. Clustered and nonclustered indexes
A brief summary:
A clustered index is the index built on the primary key.
A nonclustered index is an index built on a non-primary-key column.
Difference:
A clustered index stores the table's row data in its leaf nodes.
A nonclustered index stores the primary key and the indexed column values in its leaf nodes.
When you query data through a nonclustered index, you first obtain the primary key at the leaf node and then use it to look up the row you actually want. (This second lookup through the primary key is called "going back to the table".)
A nonclustered index is also called a secondary index. Don't get tangled up in the terminology; just treat the two names as equivalent.
A nonclustered index need not be built on a single column; an index can be created over multiple columns.
That raises the question of which columns can actually use the index and which cannot (the leftmost matching principle, covered below).
Creating multiple single-column (nonclustered) indexes produces multiple index trees (which is why creating too many indexes eats disk space).
Building multi-column indexes also brings in a special kind of index: the covering index.
As we saw earlier, a nonclustered index's leaf nodes store primary key + column values, so in the end you still have to "go back to the table", i.e. look the row up again by primary key, which is slower.
A covering index means the queried columns correspond exactly to columns in the index, so no back-to-table operation is needed!
For example:
Suppose we have created an index on (username, age) and run the query: select username, age from user where username = 'Java3y' and age = 20.
Clearly the query uses the index, and every column it selects already exists in the index's leaf nodes, so there is no need to go back to the table.
So whenever a covering index can be used, use it.
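As a sketch (reusing the hypothetical user table from earlier), the covering index and the EXPLAIN evidence look like this:

-- Composite index covering both queried columns:
CREATE INDEX idx_username_age ON user (username, age);

-- Covered query: EXPLAIN's Extra column shows "Using index",
-- meaning no back-to-table lookup is needed:
EXPLAIN SELECT username, age FROM user
WHERE username = 'Java3y' AND age = 20;

-- Selecting a column that is not in the index (e.g. the assumed email column)
-- forces the extra lookup back to the clustered index:
EXPLAIN SELECT username, age, email FROM user
WHERE username = 'Java3y' AND age = 20;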
7. The leftmost matching principle of indexes
Leftmost matching principle:
An index can be as simple as a single column (a) or as complex as multiple columns (a, b, c, d); the latter is a composite (federated) index.
With a composite index, the key consists of multiple columns, and the index can only be matched column by column, by equality, from the left. As soon as a range condition (>, <, between, like) is encountered, matching stops at that column. For example, with a = 1 and b = 2 and c > 3 and d = 4, the index is used for a, b and c, but cannot be used for d. (Put simply: the index matches equalities from the left, and a range match ends the chain.)
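A sketch with a hypothetical table t makes the rule concrete:

CREATE TABLE t (
    a INT, b INT, c INT, d INT,
    KEY idx_abcd (a, b, c, d)   -- composite index
);

SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;            -- uses a, b, c
SELECT * FROM t WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;  -- uses a, b, c; matching stops at the range on c, d is filtered afterwards
SELECT * FROM t WHERE b = 2 AND c = 3;                      -- cannot use the index: the leftmost column a is missing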
8. = and in conditions are reordered automatically
No matter what order =, in, and similar conditions are written in, MySQL automatically reorders them to match as many index columns as possible.
Example:
With an index on (a, b, c, d), the query conditions c > 3 and b = 2 and a = 1 and d < 4, or a = 1 and c > 3 and b = 2 and d < 4, and so on are all acceptable: MySQL automatically rewrites them as a = 1 and b = 2 and c > 3 and d < 4 and matches a, b and c in turn (d cannot use the index, because matching stops at the range condition on c).
9. Index summary
Indexes are a very important topic in databases! What we covered above is only the most basic material; building good indexes requires attention to many aspects:
1. The leftmost prefix matching principle. This is a very, very, very important principle (important things are said three times): MySQL keeps matching columns to the right until it encounters a range query (>, <, between, like), and then stops matching.
II. Lock
When we start learning about locks, we run into a pile of terms: X lock, S lock, IS lock, IX lock, MVCC...
Lock knowledge is intertwined with the storage engine, indexes, and transaction isolation levels, which causes beginners a lot of trouble. So here is a simple organization of the key points about database locks; I hope it helps.
1. Why learn about database locks?
During development, most of us rarely notice locking problems and rarely take locks in our programs (except in scenarios like inventory, where quantities must be exactly right).
Generally speaking, we have all heard of so-called optimistic locks and pessimistic locks, but the understanding stops at the basic definitions.
The reassurance: even if we know nothing about locking, our programs usually run fine, because the database adds these locks implicitly for us:
For UPDATE, DELETE and INSERT statements, InnoDB automatically places exclusive (X) locks on the rows involved.
Before executing a SELECT, MyISAM automatically places read locks on all tables involved; before executing updates (UPDATE, DELETE, INSERT, etc.) it automatically places write locks on the tables. None of this requires user intervention.
Manual locking is only needed in particular scenarios, so the point of learning about database locks is:
to be able to put them to use when those scenarios come up;
to have better control over the programs we write;
to be able to hold a conversation about database technology with others;
to build up our own knowledge system (it really does come up in interviews!).
2. A brief introduction to table locks
First, by lock granularity, locks fall into two categories:
Table locks: low overhead, fast to acquire, and free of deadlocks; but the granularity is large, so the probability of lock conflicts is highest and concurrency is lowest.
Row locks: high overhead, slower to acquire, and deadlocks can occur; but the granularity is small, so the probability of lock conflicts is lowest and concurrency is highest.
Different storage engines support different locking granularity:
Both InnoDB row locks and table locks are supported!
MyISAM only supports table locks!
InnoDB uses row-level locks only when data is retrieved through an index condition; otherwise it falls back to table locks.
In other words, InnoDB's row locks are index-based!
There are two modes under the table lock:
Table read lock (Table Read Lock)
Table write lock (Table Write Lock)
In an environment of table read locks and table write locks, the rules are: read does not block read, read blocks write, and write blocks write.
Read does not block read: while the current user is reading data, other users can also read it; nobody is locked out.
Read blocks write: while the current user is reading data, other users cannot modify the data being read; they are locked out.
Write blocks write: while the current user is modifying data, other users cannot modify the same data; they are locked out.
As you can see, read locks and write locks are mutually exclusive, and read and write operations are serialized.
If one session wants a read lock while another wants a write lock, which wins? In MySQL, write locks have priority over read locks!
The relative priority of write and read locks can be tuned with the parameters max_write_lock_count and low_priority_updates.
It is worth noting that MyISAM supports concurrent queries and inserts. The behavior is controlled by the system variable concurrent_insert; under MyISAM's default setting, if the table has no holes (that is, no rows deleted from the middle of the table), MyISAM allows one session to insert records at the end of the table while another session reads from it.
(The InnoDB storage engine does not support this option.)
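The read/write blocking rules above can be reproduced by hand with explicit table locks; a sketch with two sessions and a hypothetical MyISAM table emp (the ename column is assumed):

-- Session 1: take a table read lock
LOCK TABLES emp READ;
SELECT * FROM emp;     -- fine: reads never block other readers

-- Session 2, while session 1 still holds the read lock:
--   SELECT * FROM emp;            -- returns immediately
--   UPDATE emp SET ename = 'x';   -- blocks until session 1 unlocks

-- Session 1: release the lock, unblocking session 2's write
UNLOCK TABLES;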
3. Optimistic lock and pessimistic lock
Both the Read Committed and Repeatable Read isolation levels are designed to resolve conflicts between reads and writes.
But consider one problem under the Repeatable Read level: Zhang San and Li Si both read the same record, and each then writes an update back; the later write overwrites the earlier one, and user Li Si's change is lost.
Lost update: one transaction's update overwrites another transaction's update result.
(ps: I haven't yet thought of a better example to illustrate lost updates. The example above is a lost update, but one that is acceptable to some degree; I wonder whether anyone can think of an unacceptable one.)
The solutions:
Use the Serializable isolation level, so that transactions execute serially.
Optimistic locking.
Pessimistic locking.
Optimistic locking is an idea rather than a database mechanism. A common implementation keeps a version field in the table: its value is fetched when the row is first read, and when the business logic later performs the update, it checks whether the field still holds the value read the first time; if it does, the update proceeds, otherwise it is rejected. It is called optimistic because no database lock is taken until the update itself, at which point we decide whether the update may go through.
Pessimistic locking takes locks at the database level, blocking until the lock is acquired.
3.1. Pessimistic lock
Going back to the example above: with a pessimistic lock, the fix is very simple (just take the row lock manually):
Select * from xxxx for update
Appending for update to a select statement takes an exclusive (write) lock on the selected rows. While the write lock is held, other transactions cannot modify those rows; they must wait until the current transaction finishes.
In other words, once Zhang San runs select ... for update, Li Si cannot modify that record.
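A minimal two-session sketch of the pessimistic flow (the account table and values are hypothetical):

-- Session 1 (Zhang San): read the row with an exclusive lock
BEGIN;
SELECT * FROM account WHERE id = 1 FOR UPDATE;
-- ... business logic computes the new value ...
UPDATE account SET balance = 50 WHERE id = 1;
COMMIT;   -- the row lock is released only here

-- Session 2 (Li Si), while session 1's transaction is still open:
--   UPDATE account SET balance = 80 WHERE id = 1;   -- blocks until session 1 commits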
3.2. Optimistic lock
Optimistic locks are not database-level locks; we add them by hand. The usual approach adds a version field to the table, and the flow goes like this:
Zhang San runs select * from table: the record comes back together with its version field.
Li Si runs select * from table: the record comes back together with the same version field.
Li Si updates the record: update A set Name=lisi, version=version+1 where ID=#{id} and version=#{version}. The where clause compares the version read earlier against the row's current version, and the update bumps the version field at the same time.
At this point the database row holds Li Si's change, with the version incremented by one.
Zhang San now attempts the same update: update A set Name=lisi, version=version+1 where ID=#{id} and version=#{version}, but it fails! The version now in the database no longer matches the version he originally read.
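Spelled out as plain SQL (a sketch; the table definition, names and values are hypothetical):

CREATE TABLE A (
    ID INT PRIMARY KEY,
    Name VARCHAR(20),
    version INT NOT NULL DEFAULT 0
);

-- Both users read the row; suppose it currently has version = 1:
SELECT ID, Name, version FROM A WHERE ID = 1;

-- Li Si updates first: the version check passes, one row is affected,
-- and version becomes 2:
UPDATE A SET Name = 'lisi', version = version + 1
WHERE ID = 1 AND version = 1;

-- Zhang San then tries his update with his stale version = 1:
-- zero rows are affected, so the application treats the update as rejected
-- and must re-read the row and retry:
UPDATE A SET Name = 'zhangsan', version = version + 1
WHERE ID = 1 AND version = 1;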
4. Gap locks (GAP)
When we retrieve data with a range condition rather than an equality condition and request shared or exclusive locks, InnoDB locks the index entries of the existing records that satisfy the condition; key values that fall within the range but have no corresponding record are called "gaps". InnoDB locks these gaps as well, and this mechanism is called gap locking.
Note that gap locks are only taken at the Repeatable Read isolation level.
Example: suppose the emp table holds only 101 records, with empid values 1, 2, ..., 100, 101.
Select * from emp where empid > 100 for update
This is a range query: InnoDB locks not only the record with empid = 101 but also the "gap" covering empid values greater than 101, even though no such records exist.
InnoDB uses gap locks for two purposes:
To prevent phantom reads (as mentioned above, it is the gap locks under Repeatable Read that prevent phantoms).
To satisfy recovery and replication: MySQL's recovery mechanism requires that, before a transaction commits, no other concurrent transaction may insert records that would satisfy its lock condition; that is, phantom reads must not be allowed here.
5. Deadlock
Wherever there is concurrency there can be deadlocks, and MySQL is no exception.
Generally speaking, MySQL resolves many deadlocks for us by rolling a transaction back, but deadlocks cannot be avoided entirely. The following rules of thumb help minimize them:
1) Access tables and rows in a fixed order. For example, for two jobs that each batch-update a set of rows, the simple fix is to sort the id list before executing, which avoids crossed lock waits; making two transactions issue their SQL in the same order likewise avoids deadlock.
2) Split big transactions into small ones. Large transactions are more prone to deadlock; if the business allows, break them up.
3) In a single transaction, try to lock all the resources you need at once, reducing the chance of deadlock.
4) Lower the isolation level. If the business allows it, lowering the isolation level is a good option; for example, moving from RR to RC avoids many deadlocks caused by gap locks (see the sketch after this list).
5) Add sensible indexes to your tables. If a statement does not go through an index, a lock is placed on every row of the table, which greatly increases the chance of deadlock.
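For tip 4, the isolation level can be lowered per session; a sketch:

-- Switch the current session from REPEATABLE READ (RR) to READ COMMITTED (RC);
-- RC does not take gap locks for ordinary statements:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Verify (the variable is transaction_isolation in MySQL 5.7.20+/8.0,
-- tx_isolation in older versions):
SELECT @@transaction_isolation;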
6. Lock summary
Having said so much about MySQL's locks, let's summarize briefly.
In practice, we programmers rarely need to care about table locks:
In the MyISAM storage engine, they are added automatically as SQL statements execute.
In the InnoDB storage engine, table locks are added automatically when no index is used.
Most of us now use InnoDB, and InnoDB supports row locks:
Shared lock, also called a read lock or S lock.
Exclusive lock, also called a write lock or X lock.
By default, a plain select takes no row locks. A transaction can explicitly place shared or exclusive locks on a record set with the following statements:
Shared lock (S): SELECT * FROM table_name WHERE ... LOCK IN SHARE MODE;
Exclusive lock (X): SELECT * FROM table_name WHERE ... FOR UPDATE;
On top of row locks, InnoDB also implements MVCC (multi-version concurrency control), which operates at the Read Committed and Repeatable Read isolation levels and allows reads and writes to proceed without blocking each other.
InnoDB's Repeatable Read implementation, together with GAP gap locks, already avoids phantom reads!
An optimistic lock is really an idea, as its name suggests: update the data without taking a lock, and refuse the update (roll back) if something is found to have changed. It is usually implemented with a version field in the table.
A pessimistic lock uses the database's row locks: it assumes concurrency conflicts will happen, so it locks the data outright, and other transactions cannot modify it until the current transaction commits.