Analysis on the principle and usage of Index implementation for MySQL Database Optimization 04/29 Update SLTechnology News&Howtos

Analysis on the principle and usage of Index implementation for MySQL Database Optimization

2025-04-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

This paper gives an example to describe the principle and usage of index implementation for MySQL database optimization. Share with you for your reference, the details are as follows:

Index what is an index

Indexes are used to quickly find records with specific values, and all MySQL indexes are saved in the form of a B-tree. If there is no index, MySQL must scan all records of the entire table from the first record when executing the query until a record that meets the requirements is found. The more records there are in the table, the higher the cost of this operation. If an index has been created on a column that is a search condition, MySQL can quickly find the location of the target record without scanning any records. If the table has 1000 records, finding records by index is at least 100 times faster than scanning records sequentially.

Classification of indexes

Primary key index

The primary key is a unique index, but it must be specified as "PRIMARY KEY". If you have ever used AUTO_INCREMENT-type columns, you may already be familiar with concepts such as primary keys. The primary key is generally specified when the table is created, such as "CREATE TABLE tablename ([…], PRIMARY KEY (list of columns);". However, we can also add the primary key by modifying the table, such as "ALTER TABLE tablename ADD PRIMARY KEY (list of columns);". Each table can have only one primary key.

Create a primary key index

When a table sets a column as the primary key, the column is the primary key index

Create table aaa (id int unsigned primary key auto_increment, name varchar (32) not null default'')

This is the id column, which is the primary key index.

Create table bbb (id int, name varchar (32) not null default'')

If you do not specify a primary key index when you create the table, you can also add the command after creating the table:

Example:

Alter table table name add primary key (column name)

Delete primary key index

Alter table articles drop primary key

Query index

Desc table name; cannot display index name show index from table name show keys from table name

Full-text index

Create a table structure

CREATE TABLE articles (id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR, body TEXT, FULLTEXT (title,body)) engine=myisam charset utf8 INSERT INTO articles (title,body) VALUES ('MySQL Tutorial','DBMS stands for DataBase...'), ('How To Use MySQL Well','After you went through a...'), ('Optimizing MySQL','In this tutorial we will show...'), ('1001 MySQL Tricks','1. Never run mysqld as root. 2.), ('MySQL vs. YourSQL','In the following database comparison...'), ('MySQL Security','When configured properly, MySQL...')

Misuse:

Select * from articles where body like'% mysql%'; error usage index will not take effect

Correct usage:

Select * from articles where match (title,body) against ('database')

Description:

1. In mysql, fulltext index is only valid for myisam.

The fulltext provided by 2.mysql is effective for English-> sphinx (coreseek) technology to handle Chinese.

3. The method to use is match (field name …) Against ('keyword')

4. Full-text indexing: stop words, because in a text, creating an index is an infinite number, so some common words and characters will not be created, these words are called stop words. For example (a _

Mysql > select match (title,body) against ('database') from articles; (output the match between each line and database)

Unique index

This index is basically the same as the previous "normal index", but there is one difference: all values of the index column can only appear once, that is, they must be unique. Uniqueness indexes can be created in the following ways:

Create an index, such as CREATE UNIQUE INDEX ON tablename (list of columns)

Modify the table, such as ALTER TABLE tablename ADD UNIQUE [name of index] (list of columns)

Specify an index when creating a table, such as CREATE TABLE tablename ([…] , UNIQUE [name of index] (list of columns)

Create a table structure

Create table ddd (id int primary key auto_increment, name varchar (32) unique)

Be careful

The unique field can be NULL and have multiple NULL, but it cannot be repeated if it is specific.

But there cannot be a duplicate empty string''.

General index

The only task of a normal index (an index defined by the keyword KEY or INDEX) is to speed up access to data. Therefore, indexes should be created only for those data columns that appear most frequently in query criteria (WHEREcolumn=) or sort conditions (ORDERBYcolumn). Whenever possible, you should select the column with the most neat and compact data (such as a column of integer type) to create the index.

Create table ccc (id int unsigned,name varchar (32)) create index index name on table (column 1, column name 2)

The implementation principle of Index

Database index is a sorted data structure in database management system to help quickly query and update data in database tables. The implementation of the index usually uses the B-tree and its variant B + tree.

In addition to the data, the database system also maintains data structures that meet specific search algorithms, which refer to (point to) the data in some way, so that advanced search algorithms can be implemented on these data structures. This kind of data structure is the index.

There is a price to pay for setting an index for a table: one is to increase the storage space of the database, and the other is to spend more time inserting and modifying data (because the index has to change with it).

The figure above shows one possible way of indexing. On the left is the data table, which has two columns and seven records, and the leftmost one is the physical address of the data record (note that logically adjacent records are not necessarily physically adjacent on disk). To speed up the Col2 lookup, you can maintain a binary lookup tree shown on the right, with each node containing an index key and a pointer to the physical address of the corresponding data record, so that you can use binary lookup to obtain the corresponding data within the complexity of O (log2n).

Creating an index can greatly improve the performance of the system.

First, by creating a uniqueness index, you can ensure the uniqueness of each row of data in the database table.

Second, it can greatly accelerate the speed of data retrieval, which is also the main reason for creating an index.

Third, the connection between the meter and the table can be accelerated, especially in achieving the referential integrity of the data.

Fourth, when using grouping and sorting clauses for data retrieval, the time of grouping and sorting in the query can also be significantly reduced.

Fifth, through the use of index, we can use the optimization hidden device in the process of query to improve the performance of the system.

One might ask: adding an index has so many advantages, why not create an index on each column in the table? Because, increasing the index also has many disadvantages.

First, it takes time to create and maintain an index, which increases as the amount of data increases.

Second, the index needs to occupy the physical space, in addition to the data table occupies the data space, each index also occupies a certain amount of physical space, if you want to establish a clustered index, then the space needed will be more.

Third, when the data in the table is added, deleted and modified, the index should also be maintained dynamically, which reduces the speed of data maintenance.

The index is built on top of some columns in the database table. When creating an index, you should consider on which columns you can create an index and on which columns you cannot. In general, indexes should be created on these columns: on columns that need to be searched frequently, you can speed up the search; on columns that act as primary keys, force the uniqueness of the column and organize the arrangement structure of the data in the table; on columns that are often used in join, these columns are mainly foreign keys, which can speed up the join Create an index on the column that often needs to search by scope, because the index has been sorted, and its specified range is continuous; create an index on the column that often needs to be sorted, because the index has been sorted, so that the query can make use of the sorting of the index to speed up the sorting query time; create an index on the column that is often used in the WHERE clause to speed up the judgment of conditions.

Similarly, indexes should not be created for some columns. In general, these columns that should not be indexed have the following characteristics:

First, indexes should not be created for columns that are rarely used or referenced in queries. This is because, since these columns are rarely used, indexing or no indexing does not improve query speed. On the contrary, due to the increase of the index, the maintenance speed of the system is reduced and the space requirement is increased.

Second, the index should not be added to columns that have only a few data values. This is because, because these columns have very few values, such as the gender column of the personnel table, the data rows of the result set account for a large proportion of the data rows in the table in the results of the query, that is, a large proportion of the data rows that need to be searched in the table. Increasing the index does not significantly speed up the retrieval speed.

Third, columns defined as text, image, and bit data types should not be indexed. This is because these columns either have a large amount of data or have very few values.

Fourth, indexes should not be created when modification performance is much greater than retrieval performance. This is because modification performance and retrieval performance contradict each other. When the index is added, the retrieval performance is improved, but the modification performance is reduced. When the index is reduced, the modification performance is improved and the retrieval performance is reduced. Therefore, indexes should not be created when modification performance is much greater than retrieval performance.

Depending on the functionality of the database, three indexes can be created in the database designer: unique index, primary key index, and clustered index.

Unique index

A unique index is an index that does not allow any two rows to have the same index value.

When there are duplicate key values in existing data, most databases do not allow the newly created unique index to be saved with the table. The database may also prevent the addition of new data that will create duplicate key values in the table. For example, if a unique index is created on the employee's last name (lname) in the employee table, no two employees can have the same last name. Primary key index database tables often have a column or combination of columns whose values uniquely identify each row in the table. This column is called the primary key of the table. Defining a primary key for a table in a database diagram automatically creates a primary key index, which is a specific type of unique index. The index requires that each value in the primary key be unique. It also allows quick access to data when a primary key index is used in a query. Clustered index in a clustered index, the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can contain only one clustered index.

If an index is not a clustered index, the physical order of rows in the table does not match the logical order of key values. Clustered indexes usually provide faster data access than nonclustered indexes.

Locality principle and disk pre-reading

Due to the characteristics of the storage medium, the access of the disk itself is much slower than the main memory, coupled with the cost of mechanical movement, the access speed of the disk is often one percentile of the main memory, so in order to improve efficiency, it is necessary to reduce the disk Imax O as much as possible. In order to achieve this goal, the disk is often not strictly read on demand, but will be pre-read every time, even if only one byte is needed, the disk will start from this position and read a certain length of data back into memory in order. The theory of this is based on the famous locality principle in computer science: when a data is used, the data near it is usually used immediately. The data needed during the running of the program is usually centralized.

Because of the high efficiency of disk sequential reading (no seek time, only a little rotation time), pre-reading can improve the efficiency of Ibind O for programs with locality.

The length of the pre-read is generally an integral multiple of the page. Pages are logical blocks of computer management memory. Hardware and operating systems often divide main memory and disk storage into continuous blocks of equal size. Each storage block is called a page (in many operating systems, the page size is usually 4k). Main memory and disk exchange data on a page-by-page basis. When the data to be read by the program is not in the main memory, a page fault exception will be triggered, and the system will send a read signal to the disk, which will find the starting position of the data and continuously read one or more pages back into memory, and then return with the exception, and the program continues to run.

Performance Analysis of B-/+Tree Index

At this point, you can finally analyze the performance of B-/+Tree indexes.

As mentioned above, disk Ibind O is generally used to evaluate the quality of the index structure. First from the B-Tree analysis, according to the definition of B-Tree, we can know that a search requires a maximum of h nodes. The designers of the database system skillfully make use of the principle of disk pre-reading, setting the size of a node to equal to one page, so that each node can be fully loaded only once. To achieve this, you need to use the following techniques in the actual implementation of B-Tree:

Each time you create a new node, you directly apply for a page of space, which ensures that a node is physically stored in a page. In addition, the computer storage allocation is page-aligned, so that a node needs only one Imax O.

One search in B-Tree requires no more than one search in hmi (root node resident memory), and the asymptotic complexity is O (h) = O (logdN). In general practical applications, the output degree d is a very large number, usually more than 100, so h is very small (usually no more than 3).

On the other hand, the structure of the red-black tree is obviously much deeper. Because the logically close nodes (father and son) may be physically far away and can not make use of the locality, the asymptotic complexity of the red-black tree is also O (h), which is obviously much lower than that of B-Tree.

To sum up, it is very efficient to use B-Tree as an index structure.

You should spend time learning the data structures of B-trees and B + trees.

1) B tree

Each node in the B-tree contains the address pointer to the data object of the key value and key-value pair, so a successful search for an object does not have to reach the leaf node of the tree.

Successful search includes intra-node search and search along a certain path, and the successful search time depends on the level of keys and the number of keys in the node.

The way to find a given keyword in the B-tree is to first take the root node and the keyword K1, which is contained in the root node. Kj looks for a given keyword (sequential search or binary lookup method is available), and if a keyword equal to a given value is found, the search is successful; otherwise, it must be sure that the keyword to be looked up is between a Ki or Ki+1, so the next layer Inode block referred to by Pi is taken to continue searching until it is found, or when the pointer Pi is empty.

2) B + tree

The keys stored in the non-leaf node of the B+ tree do not indicate the address pointer of the data object, and the non-leaf node is only the index part. All the leaf nodes are on the same layer and contain all the keys and the storage address pointers of the corresponding data objects, and the leaf nodes are linked in the order from the smallest to the largest. If the actual data objects are stored in the order in which they are added instead of the number of key codes, the index of the leaf node must be dense, and if the actual data storage is stored in the order of key codes, the index of the leaf node is sparse.

The B + tree has two head pointers, one is the root node of the tree, and the other is the leaf node of the minimum key.

So there are two search methods for B+ trees:

One is to search in the order of the linked list pulled up by the leaf node itself.

One is to search from the root node, similar to the B-tree, but if the key of the non-leaf node is equal to a given value, the search does not stop, but continues along the right pointer to the key on the leaf node. So no matter whether the search is successful or not, you will walk through all the layers of the tree.

In the B+ tree, the insertion and deletion of data objects take place only on the leaf node.

The differences between the two data structures that deal with indexes:

The same key value does not appear many times in the amemory B tree, and it may appear in the leaf node or in the non-leaf node. The keys of the B + tree must appear in the leaf nodes, and may also be repeated in the non-leaf nodes in order to maintain the balance of the B + tree.

B, because the location of the B-tree key is uncertain and appears only once in the whole tree structure, although it can save storage space, it obviously increases the complexity of inserting and deleting operations. Compared with B + tree, it is a good compromise.

The query efficiency of the cMagne B tree is related to the position of the key in the tree, the maximum time complexity is the same as that of the B + tree (at the leaf node), and the minimum time complexity is 1 (at the root node). However, the complexity of the B+ tree is fixed to a built tree. You can scan the power of 2.

The cost of the index

Take up disk space

Impact on the efficiency of DML (update, delete, insert) statements

Additions, deletions and changes will affect the index because the index needs to be reorganized.

Types of indexes allowed by the storage engine

Myisam btree

Innodb btree

Memory/yeap Hash,btree

Which columns are suitable for adding indexes

① query should create an index as a query condition field

Fields with poor ② uniqueness are not suitable for creating separate indexes, even if they are frequent

Select * from emp where sex=' male'

③ updates fields frequently and does not define indexes.

④ will not appear in the field of the where statement. Do not create an index.

Summary: you should create an index only if you fill the fields with the following conditions.

① must be used frequently under where conditions.

② the contents of this field are not the only values.

The content of ③ field does not change frequently.

Matters needing attention of index

Create a table

New dept data

Create PROCEDURE insert_dept (in start int (10), in max_num int (10)) BEGIN declare i int DEFAULT 0; set autocommit=0; REPEAT set iDifference 1; insert into dept values ((start+i), rand_string (10), rand_string (8)); UNTIL I = max_num end REPEAT; commit;END execution call insert_dept (100m 10)

Create a primary key index

Alter table table name add primary key (column name)

Create a federated index

Alter table dept add index my_ind (dname,loc); / / the column on the left of dname, loc is the column on the right

Note:

1. For a multi-column index created, if you do not use the first part, the index will not be created.

Explain select * from dept where loc='aaa'\ G

The index will not be used.

two。 Fuzzy queries that begin with a percent sign before like are invalidated.

3. If there is an or in the condition, it will not be used even if there is a conditional index in it. In other words, all fields that are required to be used must be indexed. We suggest that you avoid using the or keyword as much as possible.

4. If the column type is a string, be sure to quote the data in quotation marks in the condition. Otherwise, the index is not used. (when adding, the string must be''), that is, if the column is a string type, be sure to include it with''.

5. If mysql estimates that using a full table scan is faster than using an index, the index is not used.

Query usage show status like 'handler_read%'

You can pay attention to:

Handler_read_key: this value is as high as possible, which indicates the number of times it has been queried using the index.

Handler_read_rnd_next: the higher this value, the less efficient the query.

Readers who are interested in MySQL-related content can check out this site's special topics: "Summary of MySQL Index Operation skills", "Summary of MySQL Common functions", "Collection of MySQL Log Operation skills", "MySQL transaction Operation skills Summary", "MySQL stored procedure skills Collection" and "MySQL Database Lock related skills Summary"

It is hoped that what is described in this article will be helpful to everyone's MySQL database design.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.