Introduction to InnoDB Storage Index and algorithm of MySQL 07/09 Update SLTechnology News&Howtos

Introduction to InnoDB Storage Index and algorithm of MySQL

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Today, I will talk to you about the InnoDB storage index and algorithm of MySQL, which may not be well understood by many people. In order to make you understand better, the editor has summarized the following contents for you. I hope you can get something according to this article.

InnoDB definition

InnoDB is the preferred engine for transactional databases, supporting ACID transactions and row-level locking. InnoDB is designed for maximum performance when dealing with large amounts of data. The InnoDB storage engine is fully integrated with the MySQL server, and the InnoDB storage engine maintains its own buffer pool to cache data and indexes in main memory. InnoDB stores its tables & indexes in one tablespace, which can contain several files (or raw disk partitions).

InnoDB storage index

In a database, if there are too many indexes, the performance of the application may be affected; if there are too few indexes, it will affect the query performance. Therefore, we have to pursue a balance between the two, enough indexes to improve query performance, but not because there are too many indexes lead to overload when modifying data and other operations.

InnoDB supports three common indexes:

● hash index

● B+ tree index

● full-text index

What we will explain in detail next is the B+ tree index and the full-text index.

Hash indexing

Before we learn the hash index, let's learn some basic knowledge: the hash algorithm. Hash algorithm is a commonly used algorithm with a time complexity of O (1). It is used not only in indexes, but also in various database applications.

Hash table

Hash tables (Hash Table), also known as hash tables, are improved from direct addressing tables.

In this table, U represents a complete set of keywords, K represents actual keywords, and the array on the right (hash table) represents contiguous space that can be addressed directly in memory. The real address of the actual data is stored in the one-way linked list associated with each slot in the hash table.

If the array on the right uses the direct addressing table directly, then there will be an hk for each keyword K and will not repeat, so there are some problems, if the amount of U data is too large, then it is a bit impractical for the available capacity of the computer. If the proportion of set K to U is too small, most of the allocated space will be wasted.

So we use hash table, we use some functions h (k) to determine the mapping relationship, so that the discrete data as evenly distributed as possible using slots in the array, but there will be a problem, multiple keywords are mapped to the same slot, this situation is called collision, the database uses the simplest solution: chaining. That is, each slot stores a single linked list, and all colliding elements form a node in the linked list in turn. If it does not exist, the linked list points to NULL.

The function h (k) used becomes a hash function, and it must be able to hash well. It is best to avoid collisions or achieve minimum collisions. In general, in order to better deal with hash keywords, we will convert them to natural numbers, and then do so through division hash, multiplication hash, or global hash. The database generally uses division hashing, that is, when there are m slots, we modulo m for each keyword k: h (k) = k% m.

Hash algorithm in InnoDB Storage engine

The InnoDB storage engine uses the hash algorithm to find the dictionary, the conflict mechanism uses the linked list, and the hash function uses the division hash. For the hash table of the buffer pool, each page in the cache pool has a chain pointer to a page with the same hash value. For the division hash, the value of m is a prime number slightly more than 2 times the number of buffer pool pages. If the current innodb_buffer_pool_size size is 10m, there are 640 16KB pages that need to be allocated 1280 slots, while the prime number is slightly larger than 1399, so a hash table of 1399 slots is allocated to query pages in the buffer pool.

For converting each page to a natural number, each tablespace has a space_id, and the user wants to query the page of a contiguous 16KB in the space, that is, the offset. InnoDB moves the space_id 20 bits to the left, plus space_id and offset, that is, K=space_id show create table dimensionsConf. | | Table | Create Table | dimensionsConf | CREATE TABLE `dimensionsConf` (`id` int (11) NOT NULL AUTO_INCREMENT, `name` varchar (20) DEFAULT NULL, `remark` varchar (1024) NOT NULL, PRIMARY KEY (`id`), FULLTEXT KEY `fullindex_ remark` (`remark`) ENGINE=InnoDB AUTO_INCREMENT=178 DEFAULT CHARSET=utf8 | 1 row in set (0.00 sec) # first test the name attribute sorting of a non-primary key and find it. You can see that no index is used and filesort (file sorting) is required. The rows here is the estimated number of output lines mysql > explain select * from dimensionsConf order by name limit 10\ G * * 1. Row * * id: 1 select_type: SIMPLE table: dimensionsConf type: ALLpossible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 57 Extra: Using filesort1 row in set (0.00 sec) # then test the sorting of the primary key id and find At this point, the primary key index is used, and there is no filesort operation in the execution plan. This is the optimization mysql > explain select * from dimensionsConf order by id limit 10\ G brought about by clustered indexes. * * 1. Row * * id: 1 select_type: SIMPLE table: dimensionsConf type: indexpossible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 10 Extra: NULL1 row in set (0.00 sec) # search again according to the range of the primary key id At this point, you can quickly get the range according to the upper node of the leaf node, and then read the data mysql > explain select * from dimensionsConf where id > 10 and id select * from information_schema.INNODB_FT_DEFAULT_STOPWORD. +-+ | value | +-+ | a | about | | an | | are | | at | | be | | by | | com | | de | | en | | for | | from | how | I | in | | is | it | la | | of | on | on | or | that | the | this | to | was | what | when | where | who | will | with | und | the | www | + | -+ 36 rows in set (0.00 sec)

Other restrictions include:

● can only have one full-text search index per table

The full-text retrieval index of ● multi-column combination must use the same character set and character order. If you do not understand, you can refer to the causes of MySQL garbled and set the UTF8 data format.

● does not support languages without word delimiters (delimiter), such as Chinese, Japanese, Korean, etc.

Full-text retrieval

We create a full-text index:

There are two methods for full-text retrieval:

● Natural language (Natural Language), default method, can be omitted: (IN NATURAL LANGUAE MODE)

● Boolean pattern (Boolean Mode): (IN BOOLEAN MODE)

Natural languages also support an extension mode, followed by: (WITH QUERY EXPANSION).

The syntax is MATCH ()... AGAINST (), MATCH specifies the column to be queried, and AGAINST specifies which method to query.

Natural language retrieval

We can also check the correlation of each row of data, which is a non-negative floating-point number, and 0 means there is no correlation:

Mysql > select id,remark,match (remark) against ('baby') as relevance from dimensionsConf +-+-- +-+ | id | remark | relevance | +-+-- +-+ | | c | 0 | 0 | 111 | operator | 0 | 115 | a baby like panda | 2.1165735721588135 | 1165735721588135 | a baby like panda | 2.1165735721588135 | +-+-- +-+ 4 rows in set (0.01sec)

Boolean schema retrieval

MySQL also allows full-text search with modifiers, where special characters have special meanings:

+: the word must exist -: the word must be excluded (no operator): the word is optional, if it appears, the correlation is higher @ distance: multiple words of the query must be within the specified range >: increase the correlation when the word appears

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.