Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

An example Analysis of the implementation principle of Mysql Index

2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly shows you the "sample analysis of the principle of Mysql index implementation", the content is simple and clear, hoping to help you solve your doubts, let the editor lead you to study and study the "sample analysis of the principle of Mysql index implementation" this article.

MySQL index type 1. Introduction

MySQL currently has the following main index types:

1. General index

two。 Unique index

3. Primary key index

4. Combinatorial index

5. Full-text index

The statement CREATE TABLE table_ name [col _ name data type] [unique | fulltext] [index | key] [index_name] (col_ name [length]) [asc | desc]

1.unique | fulltext is an optional parameter, indicating unique index and full-text index, respectively.

2.index and key are synonyms for the same purpose and are used to specify the creation of an index

3.col_name is the field column that needs to be indexed, which must be selected from multiple columns defined in the data table

4.index_name specifies the name of the index, which is an optional parameter. If not specified, the default col_name is the index value.

5.length is an optional parameter, indicating the length of the index. Only fields of string type can specify the length of the index.

6.asc or desc specifies index value storage for ascending or descending order

III. Index type

1. General index

Is the most basic index, it does not have any restrictions. It can be created in the following ways:

(1) create an index directly

CREATE INDEX index_name ON table (column (length))

(2) add an index by modifying the table structure

ALTER TABLE table_name ADD INDEX index_name ON (column (length))

(3) create the index when you create the table

CREATE TABLE `table` (`id` int (11) NOT NULL AUTO_INCREMENT, `title` char (255) CHARACTER NOT NULL, `content` text CHARACTER NULL, `time` int (10) NULL DEFAULT NULL, PRIMARY KEY (`id`), INDEX index_name (title (length)

(4) Delete the index

DROP INDEX index_name ON table

two。 Unique index

Similar to the previous normal index, except that the value of the index column must be unique, but null values are allowed. If it is a combined index, the combination of column values must be unique. It can be created in the following ways:

(1) create a unique index

CREATE UNIQUE INDEX indexName ON table (column (length))

(2) modify the table structure

ALTER TABLE table_name ADD UNIQUE indexName ON (column (length))

(3) specify directly when creating the table

CREATE TABLE `table` (`id` int (11) NOT NULL AUTO_INCREMENT, `title` char (255) CHARACTER NOT NULL, `content` text CHARACTER NULL, `time` int (10) NULL DEFAULT NULL, UNIQUE indexName (title (length)

3. Primary key index

Is a special unique index where a table can have only one primary key and no null values are allowed. Typically, the primary key index is created at the same time as the table is created:

CREATE TABLE `table` (`id` int (11) NOT NULL AUTO_INCREMENT, `title` char (255) NOT NULL, PRIMARY KEY (`id`))

4. Combinatorial index

An index created on multiple fields, which is used only if the first field when the index was created is used in the query condition. Follow the leftmost prefix set when using a combined index

ALTER TABLE `table` ADD INDEX name_city_age (name,city,age)

5. Full-text index

It is mainly used to find keywords in the text, rather than comparing them directly to the values in the index. The fulltext index is very different from other indexes in that it is more like a search engine than a simple parameter match for a where statement. Fulltext indexes are used with match against operations, rather than normal where statements plus like. It can be used in create table,alter table and create index, but currently only full-text indexes can be created on char and varchar,text columns. It is worth mentioning that when the amount of data is large, it is much faster to put the data into a table without a global index, and then create a fulltext index with CREATE index than to create a fulltext for a table and then write the data.

(1) it is suitable to add a full-text index to create a table

CREATE TABLE `table` (`id` int (11) NOT NULL AUTO_INCREMENT, `title` char (255) CHARACTER NOT NULL, `content` text CHARACTER NULL, `time` int (10) NULL DEFAULT NULL, PRIMARY KEY (`id`), FULLTEXT (content))

(2) modify the table structure to add a full-text index

ALTER TABLE article ADD FULLTEXT index_content (content)

(3) create the index directly.

CREATE FULLTEXT INDEX index_content ON article (content) IV. Disadvantages

1. Although indexing greatly improves query speed, it also slows down the speed of updating tables, such as insert, update, and delete on tables. Because when updating the table, you should not only save the data, but also save the index file.

two。 Create an index file that takes up disk space. In general, this problem is not too serious, but if you create multiple combined indexes on a large table, the index file will grow very fast.

Indexes are only one factor in improving efficiency, and if you have tables with a large amount of data, you need to take the time to study how to build the best indexes, or to optimize query statements.

V. matters needing attention

There are some tips and considerations when using indexes:

1. The index will not contain columns with null values

As long as the column contains a null value, it will not be included in the index, and as long as one column in the composite index contains a null value, the column is invalid for the composite index. So let's not let the default value of the field be null when designing the database.

two。 Use short index

Index the string column and specify a prefix length if possible. For example, if you have a column of char, and if most of the values are unique within the first 10 or 20 characters, do not index the entire column. Short indexes can not only improve the query speed, but also save disk space and Imax O operations.

3. Index column sorting

The query uses only one index, so if the index is already used in the where clause, the columns in order by will not use the index. Therefore, do not use the sort operation when the database default sorting can meet the requirements; try not to include multiple column sorting, and it is best to create a composite index on these columns if necessary.

4.like statement operation

Generally speaking, like operation is not recommended. If you have to use it, how to use it is also a problem. Like "% aaa%" does not use an index while like "aaa%" can use an index.

5. Do not operate on columns

This will cause the index to fail and perform a full table scan, such as

SELECT * FROM table_name WHERE YEAR (column_name) node)

}

Data = BTree_Search (root, my_key)

There are a series of interesting properties about B-Tree. For example, if a B-Tree of degree d is indexed with N key, then the upper limit of its tree height is logd ((nasty 1) / 2) logd ((nasty 1) / 2), and the search of a key has an asymptotic complexity of O (logdN) O (logdN). From this we can see that B-Tree is a very efficient indexing data structure.

In addition, because inserting and deleting new data records will destroy the nature of B-Tree, it is necessary to split, merge, transfer and other operations on the tree to maintain the B-Tree property when inserting and deleting. This article does not intend to discuss these contents completely, because there are already a lot of materials detailing the mathematical properties of B-Tree and the insertion and deletion algorithm. Interested friends can find the corresponding materials to read in the references column at the end of this article.

B+Tree

There are many variants of B-Tree, the most common of which is B+Tree. For example, MySQL generally uses B+Tree to implement its index structure.

Compared to B-Tree, B+Tree has the following differences:

The upper limit of the pointer per node is 2d instead of 2d+1.

The inner node does not store data, only the key; leaf node does not store pointers.

Figure 3 is a simple B+Tree diagram.

Figure 3

Because not all nodes have the same domain, the leaf nodes and inner nodes in B+Tree are generally different in size. This is different from B-Tree, although the number of key and pointers stored in different nodes in B-Tree may be different, but the domain and upper limit of each node are the same, so in the implementation, B-Tree often applies for the same amount of space for each node.

Generally speaking, B+Tree is more suitable for implementing external storage index structure than B-Tree, and the specific reasons are related to the principle of external memory and computer access, which will be discussed below.

B+Tree with sequential access pointers

The B+Tree structures commonly used in database systems or file systems are optimized on the basis of classical B+Tree, adding sequential access pointers.

Figure 4

As shown in figure 4, adding a pointer to an adjacent leaf node in each leaf node of the B+Tree forms a B+Tree with sequential access pointers. The purpose of this optimization is to improve the performance of interval access. For example, in figure 4, if you want to query all data records whose key is from 18 to 49, when 18 is found, you can access all data nodes at once simply by traversing the node and pointer order, which greatly mentions the efficiency of interval query.

This section gives a brief introduction to B-Tree and B+Tree, and the next section introduces why B+Tree is the preferred data structure for database systems to implement indexes with the principle of memory access.

Why use B-Tree (B+Tree)

As mentioned above, data structures such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B-/+Tree as the index structure. This section will discuss the theoretical basis of B-/+Tree as an index combined with the relevant knowledge of computer composition principles.

Generally speaking, the index itself is too large to be stored in memory, so the index is often stored on disk in the form of an index file. In this way, the disk Imax O consumption will be generated in the index search process, which is several orders of magnitude higher than the memory access. Therefore, the most important index to evaluate the quality of a data structure as an index is the progressive complexity of the number of disk Imax O operations in the search process. In other words, the structure of the index should minimize the number of access to disk Imando O during the lookup process. The following first introduces the principles of memory and disk access, and then combines these principles to analyze the efficiency of B-/+Tree as an index.

Why does B-tree need B-tree?

There are two mechanical motion parts in the disk, namely the disk rotation and the movement of the magnetic arm. Disk rotation is the number of revolutions per minute mentioned in the market, while disk movement is after the disk rotates to a specified position, after moving the magnetic arm to start reading and writing data. Then there is a process of locating the block in the disk, and positioning takes a lot of time to access the disk, after all, mechanical movement takes much more time than electronic movement. When large-scale data is stored on disk, positioning is obviously a very time-consuming process, but we can optimize it through B-tree to improve the efficiency of positioning when reading disk.

Why can class B trees be optimized? According to the characteristics of the class B tree, we can construct a multi-order class B tree, and then store the relevant information on the nodes as much as possible to ensure that the number of layers is as few as possible, so that we can find the information more quickly, and the disk has less Igo O operations, and the class B tree is a balanced tree, and the height from each node to the leaf node is the same, which ensures that every query is stable.

Brief introduction

The B-tree here, that is, B-Tree in English, a B-tree of order m satisfies the following conditions:

Each node has at most m sub-trees.

The root node has at least two subtrees (in the case of subtrees)

Except for the root node, each branch node has at least 2 subtrees.

All the leaf nodes are on the same layer.

For the branch nodes of k sub-trees, there is a kmurl key, and the key codes are arranged in increasing order.

The number of keywords needs to meet ceil (m _ ram 2)-1 to accelerate the data to be used in pre-reading.

2.B+Tree can put multiple child nodes in a single node with the same number of IO times to retrieve more information.

3.B+TREE stores data only on leaf nodes & all leaf nodes contain a chain pointer & other inner non-leaf nodes store only index data. Only use the index to quickly locate the data index range, first locate the index and then locate the data efficiently and quickly through the index.

Mysql design uses the disk pre-reading principle, a B+Tree node size is set to a page size, when the new node is directly applied for a page space, so that a node can be physically stored in a page, coupled with the computer storage allocation is aligned by the page, so that each Node node needs only one Node O operation.

B-Tree index, B+Tree index: multiple child nodes can be placed in a single node with the same number of IO queries (mysql query IO up to 3-5 times-so each node needs to store a lot of data)

B+TREE stores data only on leaf nodes & all leaf nodes contain a chain pointer & other inner non-leaf nodes store only index data. Only use the index to quickly locate the data index range, first locate the index and then locate the data efficiently and quickly through the index.

B+Tree is more suitable for out-of-memory indexes, which is related to the degree d of the inner node. From the above analysis, we can see that the larger d, the better the performance of the index, and the upper limit of the output depends on the size of key and data in the node:

The nodes in the B+Tree have removed the data domain, so they can have greater output and better performance. Only use the index to quickly locate the data index range, first locate the index and then locate the data efficiently and quickly through the index.

Dmax=floor (pagesize/ (keysize+datasize+pointsize))

The underlying implementation of Mysql index-MyISAM & InnoDB

Clustered index: the index and data file are the same file. Non-clustered index: an index that is separate from a data file.

Both MyISAM and InnoDB use the B+Tree index structure. But the underlying index storage is different, MyISAM uses a non-clustered index, while InnoDB uses a clustered index.

MyISAM indexing principle: non-clustered index-MyISAM myi index file is separated from myd data file, and the index file only stores the pointer address of the data record. The leaf node data field stores the pointer address to the data record. (underlying storage structure: frm-table definition, myi-myisam index, myd-myisam data)

The MyISAM index searches by B+Tree, takes out the value of its data field if the specified Key exists, and then reads the corresponding data record with the data field value-data pointer address. There is no structural difference between the secondary index and the primary index, except that the primary index requires key to be unique, while the key of the secondary index can be repeated. The MyISAM index tree is as follows:

InnoDB advantages: high scalability, full use of hardware performance, Crash Safe, support for transactions, online hot backup

InnoDB features:

1. Transaction support (ACID) 2. Excellent expansibility 3. There is no conflict between reading and writing. Cache acceleration

two。 Functional components: redo/undo & Asynchronous IO & MVCC & Row level Lock & Page Cache (LRU)

The physical storage structure of InnoDB is shown below:

InnoDB tablespace management

Description of InnoDB physical storage file structure:

InnoDB is organized by tablespace Tablespace (idb file) structure, each Tablespace contains multiple Segment segments, each segment is divided into two types: leaf node Segment& non-leaf node Segment), a Segment segment contains multiple Extent, an Extent occupies 1m space contains 64 Page (each Page 16k), an InnoDB B-Tree logical node is assigned a physical Page, a node an IO operation. A Page contains a lot of ordered data Row row data, Row row data contains Filed attribute data and other information.

Tablespaces (ibd files)

Segment (an index 2 segment: leaf node Segment & non-leaf node Segment)

Extent (1MB): an Extent (1m) contains 64 Page (16k), a Page contains many ordered rows of data, an InnoDB B-Tree logical node is assigned a physical Page, and a node has an IO operation.

Page (16KB)

Row

Field

The principle of table insertion data expansion: expand one Extent space (1m) at a time, 64 Page, and insert order into each page according to the sequential structure.

InnoDB logical organizational structure:

InnoDB index tree structure

One B + tree per index, one B + tree node = one physical Page (16K)

The data is Page and numbered according to the 16KB slice, and the number can be mapped to the physical file offset (16K * N). The leaf node of the B+ tree forms a bi-directional linked list, the data is indexed and clustered by the primary key, and the secondary index leaf node stores the primary key value, and the primary key value of the leaf node is returned to the table to find the data.

InnoDB indexing principle:

Using clustered index-InnoDB data-index file as an idb file, the table data file itself is the main index, and the adjacent indexes are close to storage. The leaf node data field holds the complete data record (data [column data except primary key id] + primary index [index key: table primary key id]). The leaf node stores the data record directly, takes the primary key id as the key, and stores the data record directly in the leaf node. (underlying storage structure: frm-table definition, ibd: innoDB data & index file)

Note: because the InnoDB is stored in a clustered index structure, the data files of the index InnoDB need to be clustered according to the primary key, so InnoDB requires that the table must have a primary key (MyISAM can be absent). If mysql is not specified, it automatically selects a column that uniquely represents the data record as the primary key, and if no such column exists, mysql automatically generates an implicit field (6-byte length integer) as the primary key for the InnoDB table. All secondary indexes of InnoDB refer to the primary key of the data record as the data field.

The implementation of clustered index makes the search by the primary key very efficient, but the secondary index search needs to retrieve the index twice: first, retrieve the secondary index to obtain the primary key of the data record, and then use the primary key to retrieve the data record in the primary index. InnoDB clustering index structure:

Index lookup process:

# 1. Index precise search: determine the positioning conditions, find the root node Page No, the root node reads memory, look down layer by layer, read the leaf node Page, find the record or miss through binary search. (select * from user_info where id = 23)

# 2. Index range search: read the root node to memory, determine the index positioning condition id=18, and find the first leaf node that meets the condition.

, scan all results sequentially until the termination condition satisfies id > = 22 (select * from user_info where id > = 18 and id < 22)

# 3. Full table scan: directly read the leaf node head node, scan sequentially, return qualified records, to the end of the final node

(select * from user_info where name = 'abc')

# 4. Secondary index lookup

Create table table_x (int id primary key, varchar (64) name,key sec_index (name))

Select * from table_x where name = "d"

Find out the corresponding primary key through the secondary index, and take the primary key back to the table to check the primary key index to get the data. The secondary index can filter out a large number of invalid records and improve efficiency.

The above is all the contents of the article "sample Analysis of the principle of Mysql Index implementation". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report