How to use MySQL clustered index and nonclustered index 07/19 Update SLTechnology News&Howtos

How to use MySQL clustered index and nonclustered index

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

Editor to share with you how to use MySQL clustered index and non-clustered index, I believe most people do not know much about it, so share this article for your reference, I hope you can learn a lot after reading this article, let's go to know it!

The index is divided into clustered index and non-clustered index.

Take an English textbook as an example. If you want to find lesson 8, turn the book directly. If you turn to lesson 5 first, then turn backward, then turn back to lesson 10, then turn forward. The book itself is an index, that is, "clustering index".

If you are looking for the word "fire", you will turn to the appendix at the back of the book, which is in alphabetical order. Find the piece of the letter F, and then find the word "fire". It will correspond to which lesson it is in. This appendix is the "non-clustered index".

Thus it can be seen that the order of clustered indexes is the order in which the data is stored, so it is easy to understand that a data table can only have one clustered index.

Clustered index is much more efficient than non-clustered index, especially when it comes to range query. So, as to whether the clustered index should be a primary key or other fields, this can be discussed again.

1. Index of MYSQL

In mysql, different storage engines implement the index in different ways, roughly speaking, MyISAM and InnoDB storage engines.

The data on the leaf node of MyISAM's B+Tree is not the data itself, but the address where the data is stored. The primary index is no different from the secondary index, except that the key in the primary index must be unique. The indexes here are all non-clustered.

MyISAM also uses a compression mechanism to store indexes. For example, if the first index is "her" and the second index is "here", then the second index will be stored as "3jue e". The disadvantage is that indexes in the same node can only be searched sequentially.

The data file of InnoDB itself is the index file, the data on the leaf node of B+Tree is the data itself, and key is the primary key, which is the clustering index. Non-clustered index, the data on the leaf node is the primary key (so the key of the clustered index cannot be too long). The reason why the primary key is stored instead of the address where the record is located is quite simple, because the address where the record is located is not guaranteed to remain the same, but the primary key is guaranteed.

As for why the primary key is usually recommended to use self-increment id?

2. Clustering index

The physical storage order of the data of the cluster index is consistent with the index order, that is, as long as the index is adjacent, then the corresponding data must also be stored on the disk adjacent. If the primary key is not self-incrementing id, you can imagine what it will do, constantly adjusting the physical address of the data, paging, and of course there are other measures to reduce these operations, but can not be completely avoided. However, if it is self-increasing, it is simple, it only needs to be written page by page, the index structure is relatively compact, the disk fragmentation is less, and the efficiency is high.

Clustering index can greatly improve the efficiency not only in retrieval, but also in data reading. For example, you need to query all the words of fancit.

One primary index using MyISAM and one clustered index using InnoDB. The B+Tree retrieval time of the two indexes is the same, but there is a difference when reading.

Because the primary index of MyISAM is not a clustered index, then the physical address of his data must be messy. Get these physical addresses, read them according to the appropriate algorithm, and start to seek and rotate constantly. The clustered index needs only one Icano.

However, when it comes to sorting of large amounts of data, full table scans, count, and so on, MyISAM has an advantage because indexes take up a small amount of space and these operations need to be done in memory.

In view of the range query efficiency of the clustered index, many people think that using the primary key as the clustered index is too wasteful. After all, it is almost impossible to use the primary key for range query. However, if we consider the storage of clustered indexes, it is difficult to come to a conclusion.

The following table shows when to use clustered and non-clustered indexes:

action

Use clustered index

Use non-clustered index

Columns are often sorted in groups

Should

Return data within a certain range

Should

Should not

One or very few different values

Should not

A small number of different values

Should

Should not

A large number of different values

Should not

Should

Frequently updated columns

Should not

Should

Foreign key column

Should

Primary key column

Should

Frequently modify index columns

Should not

Should

According to the tuning practice, attention should be paid to the selection of clustering index. First of all, we need to find our most commonly used SQL query, such as this example is the case of a similar combination conditional query, it is best to use a combined clustering index, and the most used fields should be placed in front of the combined clustering index, otherwise the index will not have a good effect.

Why is Index seek better than Index scan?

Index scanning means traversing the B-tree, while seek is the direct location of the B-tree lookup.

Index scan is most likely to appear in index columns in expressions. The database engine cannot directly determine the value of the column you want, so it can only scan the entire index for calculation. Index seek is much better. The database engine only needs to scan a few branch nodes to locate the records you want. In turn, if the leaf node of a clustered index is a record, then Clustered Index Scan is basically equivalent to full table scan.

Some optimization principles

1. The index built by default is a non-clustered index, but sometimes it is not the best. Under a non-clustered index, the data is physically randomly stored on the data page. Reasonable index design should be based on the analysis and prediction of various queries. Generally speaking:

a. There are a large number of duplicate values and there are often range queries (>

< ，>

< =）和order by、group by发生的列，可考虑建立群集索引； b.经常同时存取多列，且每列都含有重复值可考虑建立组合索引； c.组合索引要尽量使关键查询形成索引覆盖，其前导列一定是使用最频繁的列。索引虽有助于提高性能但不是索引越多越好，恰好相反过多的索引会导致系统低效。用户在表中每加进一个索引，维护索引集合就要做相应的更新工作。 2、ORDER BY和GROPU BY使用ORDER BY和GROUP BY短语，任何一种索引都有助于SELECT的性能提高。 3、多表操作在被实际执行前，查询优化器会根据连接条件，列出几组可能的连接方案并从中找出系统开销最小的最佳方案。连接条件要充份考虑带有索引的表、行数多的表；内外表的选择可由公式：外层表中的匹配行数*内层表中每一次查找的次数确定，乘积最小为最佳方案。 4、任何对列的操作都将导致表扫描，它包括数据库函数、计算表达式等等，查询时要尽可能将操作移至等号右边。 5、IN、OR子句常会使用工作表，使索引失效。如果不产生大量重复值，可以考虑把子句拆开。拆开的子句中应该包含索引。建立聚簇索引的思想 1、大多数表都应该有聚簇索引或使用分区来降低对表尾页的竞争，在一个高事务的环境中，对最后一页的封锁严重影响系统的吞吐量。 2、在聚簇索引下，数据在物理上按顺序排在数据页上，重复值也排在一起，因而在那些包含范围检查(between、=)或使用group by或orderby的查询时，一旦找到具有范围中第一个键值的行，具有后续索引值的行保证物理上毗连在一起而不必进一步搜索，避免了大范围扫描，可以大大提高查询速度。 3、在一个频繁发生插入操作的表上建立聚簇索引时，不要建在具有单调上升值的列(如IDENTITY)上，否则会经常引起封锁冲突。 4、在聚簇索引中不要包含经常修改的列，因为码值修改后，数据行必须移动到新的位置。 5、选择聚簇索引应基于where子句和连接操作的类型。聚簇索引的侯选列 1、主键列,该列在where子句中使用并且插入是随机的。 2、按范围存取的列，如pri_order >

100 and pri_order < 200.

3. Columns used in group by or order by.

4. Columns that are not often modified.

5. Columns used in join operations.

Non-clustered index

Non-clustered index, leaf-level pages point to the records in the table, and the physical order of records is not necessarily related to the logical order. A non-clustered index is more like a standard index table for a book, and the order in the index table is usually inconsistent with the actual page number order.

There can be only one clustered index per table because records in a table can only be stored in one physical order. However, a table can have more than one non-clustered index. In fact, you can build up to 249 non-clustered indexes per table. Non-clustered indexes require a lot of hard disk space and memory. In addition, although a non-clustered index can increase the speed of fetching data from a table, it also slows down the speed of inserting and updating data into the table. Whenever you change the data in a table that has a non-clustered index, you must update the index at the same time. Therefore, you should carefully consider when building a non-clustered index on a table. If you expect a table to update data frequently, don't build too many non-clustered indexes on it. In addition, if the hard disk and memory space is limited, the number of non-clustered indexes should also be limited.

Use of non-clustered index

The index established by SQLServer by default is a non-clustered index, because the non-clustered index does not reorganize the data in the table, but stores the index column value for each row and uses a pointer to the page where the data is located. In other words, a non-clustered index has an extra level between the index structure and the data itself. If a table does not have clustered indexes, it can have 250 non-clustered indexes. Each non-clustered index provides a different sort order for accessing data. When building a non-clustered index, it is necessary to weigh the advantages and disadvantages of the index between speeding up the query speed and reducing the modification speed. In addition, consider these issues:

1. How much space is needed for the index.

2. Whether the appropriate column is stable.

3. How to select the index key and whether the scanning effect is better.

4. Whether there are many duplicate values.

For tables that update frequently, non-clustered indexes on the table require more overhead than clustered indexes and no indexes at all. For each row moved to a new page, the page-level row pointing to each non-clustered index of the data must also be updated, and sometimes index page sorting may be required. The process of deleting data from a page has a similar overhead, and the deletion process must also move the data to the top of the page to ensure data continuity. Therefore, it is very prudent to build a non-clustered index. Non-clustered indexes are often used in the following situations:

1. A column is often used for aggregate functions (such as Sum,....).

2. A column is often used for join,order by,group by.

3. The data found shall not exceed 20% of the data in the table.

The above is all the content of the article "how to use MySQL clustered index and non-clustered index". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.