What is the InnoDB index optimization method in MySQL?

2025-01-30 Update From: SLTechnology News&Howtos

This article introduces InnoDB index optimization methods in MySQL. Many people have questions about this topic in day-to-day work, so the editor has consulted various sources and compiled simple, practical methods. Hopefully they help answer the question. Let's get started.

Half-duplex communication: MySQL uses half-duplex communication for data transfer. At any given moment either the client is sending data to the server or the server is sending data to the client; the two cannot happen at the same time. MySQL also constrains how the client sends data: it must send a complete request at once and wait for the server's response before sending the next one.

Sequential versus random reads and writes: database data must be read from disk. Because of the disk's physical structure and its long seek time, sequential reads and writes are much more efficient than random ones. If that is not clear, think of taking a bus: would you rather take a direct line (sequential access) or change buses several times (random access)?

Result caching: MySQL supports caching query results, and the cache is turned off by default. (As a hint, avoid MySQL's built-in cache for frequently updated data, because cache invalidation wastes more performance than the cache saves. The query cache was removed entirely in MySQL 8.0.)

SQL query process: the client sends the SQL statement to the server, which first checks the query result cache. On a miss, the statement passes through the parser, the preprocessor, the optimizer (which produces the execution plan), the execution engine and the storage engine, and the result is returned to the client. (A follow-up article will cover this in detail.)

Index: a data structure that helps MySQL retrieve data efficiently. Most indexes in MySQL are multiway balanced search trees (B+ trees).

Before optimizing indexes, you need to know how an index is structured. The data storage structure differs by storage engine; the main storage engines are InnoDB and MyISAM.

InnoDB engine index description

Clustered index

Each table has a clustered index:

Use the primary key as the clustered index when the primary key exists

When there is no primary key, the first unique index whose columns are all NOT NULL is used as the clustered index

When neither exists, InnoDB creates a clustered index on a hidden row ID column (rowid).

The data of each table is organized by the clustered index into a B+ tree. The full row data is stored in the leaf nodes, the last level of the tree, and the leaf nodes are linked to each other by ordered pointers.
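The three selection rules above can be written down as a small decision function. This is only a sketch of the rule order described in the text, not InnoDB's implementation; the `Table` and `Index` classes and the `<hidden rowid>` label are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False


@dataclass
class Table:
    primary_key: Optional[List[str]] = None
    not_null_columns: Set[str] = field(default_factory=set)
    indexes: List[Index] = field(default_factory=list)  # in definition order


def choose_clustered_key(table: Table) -> List[str]:
    """Return the columns the clustered index would be built on."""
    # Rule 1: an explicit primary key always wins.
    if table.primary_key:
        return table.primary_key
    # Rule 2: otherwise the first unique index whose columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and all(c in table.not_null_columns for c in idx.columns):
            return idx.columns
    # Rule 3: otherwise a hidden row id generated by the engine (label is illustrative).
    return ["<hidden rowid>"]
```

For instance, a table without a primary key but with unique indexes on a nullable `phone` column and a NOT NULL `email` column would be clustered on `email` under these rules.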

Clustered index diagram 1

Secondary index

Apart from the clustered index, all other indexes in a table are non-clustered and are called secondary (or auxiliary) indexes. The leaf nodes of a secondary index do not hold the non-indexed row data; they store the clustered index value (the primary key) of the row instead.

Secondary index diagram 2

Joint index

A special kind of secondary index: in a joint (composite) index, each node of the B+ tree stores the values of several columns rather than one, combined into a single entry in the order in which the columns were defined.

Joint index diagram 3
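One way to picture a joint index is as a list of tuples sorted column by column. The sketch below uses a plain sorted Python list in place of a B+ tree; the index contents and the helper names `prefix_range` and `scan_by_last_column` are made up for illustration. It shows that a search on the leading columns can be narrowed by binary search, while a search on the last column alone has to inspect every entry.

```python
from bisect import bisect_left, bisect_right

# Hypothetical joint index on (a, b, c): entries are kept sorted as tuples.
index_abc = sorted([
    (1, 1, 1), (1, 1, 3), (1, 2, 2), (2, 1, 1), (2, 3, 2), (3, 1, 3),
])


def prefix_range(index, prefix):
    """Binary-search the contiguous slice whose leading columns equal `prefix`."""
    lo = bisect_left(index, prefix)
    # Pad the prefix with +infinity so that every entry sharing the prefix
    # compares less than or equal to the padded tuple.
    padding = (float("inf"),) * (len(index[0]) - len(prefix))
    hi = bisect_right(index, prefix + padding)
    return index[lo:hi]


def scan_by_last_column(index, c):
    """Filtering on c alone cannot use the ordering: every entry is inspected."""
    return [entry for entry in index if entry[2] == c]


print(prefix_range(index_abc, (1,)))      # entries with a = 1
print(prefix_range(index_abc, (1, 1)))    # entries with a = 1 and b = 1
print(scan_by_last_column(index_abc, 3))  # entries with c = 3, found by a full scan
```

The entries with `a = 1` sit next to each other in the sorted list, but the entries with `c = 3` are scattered, which is why only conditions on leading columns can be located directly.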

With this understanding of the B+ tree storage structure, the rest of this article looks at how to optimize SQL from a practical point of view. This is also what the SQL optimizer does.

Index optimization

Choice of primary key

First, understand that a B+ tree is an ordered multiway balanced search tree: entries must be placed in sorted position on insertion, and operations such as page splits and merges are needed to keep the tree balanced.

Consider ordering itself first. Order is the result of comparison, so how are values compared? A character set and a collation are always in effect when a table is created in MySQL (by default if not stated), and they define how values compare. Whatever the type of the primary key, numeric or string, its values are compared and sorted. How cheaply primary keys can be compared therefore affects the efficiency of the primary key.

Next, consider what order means on disk. Look closely at the leaf nodes in clustered index diagram 1, that is, the last level: they form an ordered list of pages (each group of data drawn together in the diagram is a page). Every insert first locates the position of the primary key and then writes the row, so whether inserts arrive in leaf order affects the efficiency of the primary key. Ordered primary keys give ordered disk reads and writes (sequential writes are much more efficient than random writes).

These two points show why the ordering of primary keys matters in MySQL. Prefer an ordered primary key; an auto-increment primary key is one. This is not absolute: when the amount of data is small, the efficiency gap is barely visible.

As for the frequently asked choice between a UUID primary key and an auto-increment primary key: a UUID is acceptable when the amount of data is small or when the business strictly requires it. When the amount of data is large, an auto-increment primary key is recommended, not only because it is ordered but also because strings take more storage space than integers.
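The effect of key order on insert position can be illustrated with a small simulation. It is a rough model only: a sorted Python list stands in for the ordered leaf level, increasing integers stand in for an auto-increment key, and seeded random integers stand in for UUID-like keys. The function name `count_mid_inserts` is made up for this sketch.

```python
import random
from bisect import bisect_right


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list and count the inserts
    that land between existing keys instead of at the end."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect_right(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1  # the key has to be placed between existing keys
        ordered.insert(pos, key)
    return mid_inserts


n = 1000
sequential_keys = list(range(n))            # stands in for an auto-increment key
rng = random.Random(42)                     # fixed seed so the run is repeatable
random_keys = rng.sample(range(10**9), n)   # stands in for UUID-like keys

print(count_mid_inserts(sequential_keys))   # 0: every insert appends at the end
print(count_mid_inserts(random_keys))       # most inserts land in the middle
```

With increasing keys every insert goes to the end of the leaf level, which corresponds to sequential writes; with random keys almost every insert targets an arbitrary position, which corresponds to random writes and more page reorganization.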

The choice of sorting

The ordering discussed above also matters for sorting. The leaf nodes of an index tree are already ordered, and the more closely ORDER BY matches that order, the more efficient the query. So when sorting, try to sort by the index the query uses. In practice a query over the whole table tends to come back in primary key order, and a query whose condition uses another index tends to come back in that index's order, but if you are not sure which index is used, specify the sort order explicitly.

For the same reason, it is recommended to build indexes on columns that are frequently sorted or grouped.

How to get the data in the index tree

First, be clear that an index tree holds two kinds of data: (1) non-leaf nodes store index data, and (2) leaf nodes store index data plus, in the clustered index, the non-indexed columns of the table.

Second, be clear that the clustered index is the index tree that holds the full table data, and every table has one. Each secondary index adds one more index tree, but, as the diagrams show, its leaf nodes store only the indexed columns and the primary key, not the full row.

Therefore, how a SQL query obtains its data should be analyzed from two angles.

From the perspective of which index tree is queried

Query the clustered index tree

Query a non-clustered index tree

From the point of view of the location of the query data

Query non-leaf node data in the index tree (that is, index data) without looking up other data

Query the data in the leaf node (including indexed and non-indexed data).

One of the key points of SQL index optimization is the location of the data.

If all the data the query needs is index data (that is, the query selects only indexed columns), efficiency is highest: because the index entries are ordered, the data can be found quickly with an efficient search and the query completes without reading the rest of the row. This kind of query is called a covering index query. The lesson for users: try not to use select *. Note also that if every column of a table is indexed, a query on it is sure to use an index. (So stop claiming that is not null and != can never use an index.)

If the data the query needs is not all index data (that is, the query selects non-indexed columns), note that the SQL optimizer is likely to rewrite the SQL at this point, so the SQL finally executed may differ from the SQL sent by the client.

Here is the usual data lookup process in that case:

If the query condition has an index, the first indexed condition column (after optimization) is used for the initial load of data rows

If that index is the clustered index, the leaf node position is found by searching the clustered index tree, and the row data is read at that position.

If that index is a non-clustered index, the leaf node position is found by searching the non-clustered index tree, which yields the clustered index value; that value is then looked up in the clustered index tree, where the row data is read from the leaf node. The step from a non-clustered index tree to the clustered index tree is called a table return (back to table).

If the query condition does not have an index

Since there is no index, the whole table is scanned along the leaf nodes of the clustered index tree and matched row by row until the scan is finished and the data is returned.

The data rows obtained from the clustered index are loaded into memory and then processed further

The remaining where conditions are applied as filters, and the filtered data is finally returned.

The lesson for users: the first condition in the where clause should narrow down the data as precisely as possible (for example, a primary key or a highly selective index column).
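The difference between a covering query and a table return can be sketched with two plain Python structures. This is a toy model, not how InnoDB stores data: a dict stands in for the clustered index, a sorted list of (value, primary key) pairs stands in for a secondary index, and the table `user(id, name, age)` and the function `select_by_name` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical table user(id PRIMARY KEY, name, age) with a secondary index on name.
# primary key -> full row (stands in for the leaf level of the clustered index)
clustered = {
    1: {"id": 1, "name": "ann", "age": 31},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "cat", "age": 40},
}
# Secondary index entries: (indexed value, primary key), kept sorted.
idx_name = sorted((row["name"], pk) for pk, row in clustered.items())


def select_by_name(name, columns):
    """Return the requested columns and the number of clustered-index lookups made."""
    lookups = 0
    results = []
    i = bisect_left(idx_name, (name,))
    while i < len(idx_name) and idx_name[i][0] == name:
        value, pk = idx_name[i]
        if set(columns) <= {"name", "id"}:
            # Covering case: the index entry already holds everything asked for.
            row = {"name": value, "id": pk}
        else:
            # Table return: follow the primary key into the clustered index.
            row = clustered[pk]
            lookups += 1
        results.append({c: row[c] for c in columns})
        i += 1
    return results, lookups


print(select_by_name("bob", ["id", "name"]))  # answered from the secondary index alone
print(select_by_name("bob", ["age"]))         # needs one lookup in the clustered index
```

Selecting only `id` and `name` never touches the clustered structure, while selecting `age` costs one extra lookup per matching row, which is the overhead the text calls a table return.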

Splitting of index tree and removal of nodes

The number of entries each page of the index tree can hold is fixed, say 4. When new data is added to a page that already holds 4 entries, the page has to be split into two pages so that each stays within the limit of 4.

When a node is removed, the index tree is rebalanced. Look up balanced trees for the specific process; all you need to know here is that adjusting the index tree is time-consuming and expensive.

Therefore, columns that are updated frequently are not suitable as primary keys or indexes.
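The page split described above can be sketched with the same capacity of 4. This is a simplified model that only tracks the leaf level as a list of sorted pages; real InnoDB pages have a fixed byte size rather than a fixed entry count, and the half-and-half split and the helper `insert_key` are assumptions made for illustration.

```python
from bisect import bisect_right

PAGE_CAPACITY = 4  # the example capacity used in the text


def insert_key(pages, key):
    """Insert `key` into a sorted list of sorted pages; return True if a page split."""
    if not pages:
        pages.append([key])
        return False
    # Pick the last page whose first key is <= key (or the first page).
    firsts = [page[0] for page in pages]
    i = max(bisect_right(firsts, key) - 1, 0)
    page = pages[i]
    page.insert(bisect_right(page, key), key)
    if len(page) <= PAGE_CAPACITY:
        return False
    # Overflow: split the page into two halves and put both back in order.
    mid = len(page) // 2
    pages[i:i + 1] = [page[:mid], page[mid:]]
    return True


pages = []
splits = sum(insert_key(pages, k) for k in [10, 20, 30, 40, 25])
print(pages)   # [[10, 20], [25, 30, 40]]
print(splits)  # 1
```

The fifth key lands in a page that is already full, so the page is split and part of its content moves to a new page. An update that changes an indexed value behaves like a removal plus an insert, which is why frequently updated columns make expensive index keys.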

Leftmost matching principle

Ask about index optimization and everyone mentions the leftmost matching principle, but do you know why it is leftmost matching and how the matching works?

The discussion of ordering above mentioned how values are sorted; matching works similarly. Take abc and abd as an example. What follows is an intuitive picture, not necessarily the implementation. Each character is encoded into a sort value according to the collation; assume a encodes to 32, b to 33, c to 34 and d to 35. When matching, first compare a with a: if they were not equal there would be no need to go on; since they are equal, compare b, then c against d, and find that 35 > 34, so the result is a mismatch. Starting the match from the first character, a, is the leftmost matching principle.
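The character-by-character comparison can be written out as follows. This is a sketch of the intuitive picture only: Python code points stand in for collation weights (so a is 97 here rather than the 32 assumed in the text), and the function name `compare_left_to_right` is made up.

```python
def compare_left_to_right(x: str, y: str):
    """Compare two strings character by character, stopping at the first difference.

    Returns (result, position): result is -1, 0 or 1, and position is the
    index at which the comparison was decided.
    """
    for i, (cx, cy) in enumerate(zip(x, y)):
        if cx != cy:
            # Code points stand in for the collation's sort values.
            return (-1 if ord(cx) < ord(cy) else 1), i
    # Every shared position is equal: the shorter string sorts first.
    if len(x) == len(y):
        return 0, len(x)
    return (-1 if len(x) < len(y) else 1), min(len(x), len(y))


print(compare_left_to_right("abc", "abd"))  # (-1, 2): decided at the third character
print(compare_left_to_right("abc", "xbc"))  # (-1, 0): decided at the first character
```

Because the decision is always made at the leftmost differing position, a search that knows the leading characters can discard most entries early, while a search that only knows trailing characters cannot.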

Leftmost matching application:

LIKE matching: only a pattern whose left-hand characters are fixed can follow the leftmost matching principle; that is, %xxx patterns are not supported.

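Joint index matching: the entries in a joint index are combined into one node in the order in which the joint index was defined, for example (index0, index1, index2).

Once the order index0, index1, index2 is broken, the index cannot be matched. But remember one thing: the combined entries are sorted column by column starting from the first column, so matching even one leading column improves efficiency. For example, with the joint index (a, b, c) and the condition where a=1 and c=1, a=1 can use the joint index but c cannot, which is equivalent to %c. There is a trap here: if asked whether this query uses the index, the answer is yes (partial use still counts). In addition, a range condition in the query (like, !=, >, < and so on) stops the matching of the columns after it. You can simply think of a joint index as an index on one concatenated string column; once a range condition is met, the cost rises sharply.

Index condition pushdown (ICP)

Before data rows are looked up in the clustered index tree, the fewer and more precisely matched the rows, the more efficient the query. ICP (index_condition_pushdown) optimizes exactly this step and aims to load as few data rows into memory as possible. In InnoDB, ICP applies only to secondary indexes and in practice pays off with joint indexes: the clustered index locates the wanted rows directly and leaves nothing further to filter, whereas a joint index has at least two columns, so the rows matched by the first column can be filtered by the remaining index columns before returning to the clustered index tree. Fewer rows are then read from the clustered index tree, which improves efficiency. ICP is enabled by default. explain reports it as Using index condition, and it is controlled by the index_condition_pushdown flag of optimizer_switch.

ICP application: prefer one joint index over several single-column indexes, and write the where conditions on the columns of the joint index.

Effect of functions on index conditions

Built-in functions: MySQL functions such as concat, date_format and count. Functions fall into two kinds: (1) deterministic functions, which always return the same result for the same arguments, and (2) non-deterministic functions, whose result can differ from call to call even with the same arguments.

Expressions: computed expressions such as 1+1 or 2*3.

A function or expression can appear on the left or the right side of a condition. The left side is the condition column; the right side is the value being searched for.

On the right side: most deterministic functions allow the index to be used, for example concat and pow. Non-deterministic functions generally prevent index use, for example rand and uuid.

On the left side: the index on that column is bypassed. Any processing of the indexed column on the left side defeats the index, including character set conversion, functions and expression calculation.

For example, where age + 10 = 30 should be written as where age = 30 - 10. That form is fine: MySQL automatically optimizes it to where age = 20.

Optimization of NULL

MySQL supports null lookups on indexed columns, with both is not null and is null, and treats them as range queries. When the index is not used, it is generally because the cost of table returns is too high, which depends on whether null rows are a small minority or the majority.

An is null query on a column with a not null constraint does not use the index, because there is a more efficient way to answer it.

Cost optimization

MySQL's optimizer is cost-based. For the client's SQL it works out several equivalent ways to execute the statement and finally chooses the one with the lowest cost. Essentially all optimization rests on this.

Cost reflected by selectivity

For example, an index is built on a gender column sex, but sex only takes the values 0 and 1. If the rows in the table are all male or all female, the optimizer will judge a full table scan to be better than an index lookup, since it avoids comparing step by step from the root of the index tree.

For an index, cost shows up outwardly as the selectivity (discreteness) of the indexed column's data, which is roughly count(distinct(column_name))/count(*). Building an index on a column with such low selectivity is not recommended.

Full table scan cost

For example, with the joint index (a, b, c) and the condition where a=1 or d=1, where d is a non-indexed column, matching d requires a full table scan. Since the full table is scanned anyway, all data rows are loaded into memory, so there is no point in filtering through the index and then locating positions in the clustered index tree. The optimizer therefore ends up with a full table scan and does not use the index. Note: if all columns of the table are in an index, the full scan also runs over an index tree. A covering index takes priority over a full table scan.

Joint index order cost

For example, with the joint index (a, b, c), writing the condition as where a=1 and b=1 and c=1 or as where c=1 and a=1 and b=1 makes no difference to index use: the SQL optimizer works out the lowest cost, which means reordering the conditions to follow the index definition. This only matters when the leftmost matching principle is satisfied.

Other index optimizations

MySQL has many optimization points; only some common ones are listed here.

Implicit conversion

Always put single quotes '' around values compared with string-type columns; otherwise the comparison is implicitly converted to a numeric one and the index is bypassed.

Negative conditions

Negative conditions (<>, !=, not in) may use an index, but most of the time they do not; it depends on the SQL optimizer. For example, for an indexed column a whose values are all 1 (very low selectivity), <>1, !=1 and not in(1) will all use the index. Note that not using an index means a full table scan.

The negative condition not like never uses an index.

Forced index

When the optimizer's choice is not the one you want, you can specify force index(idx_name) to make the statement use the given index. It is not guaranteed to be adopted: it only takes effect when one of the candidate execution plans uses that index (forcing an index the query cannot use is pointless anyway).

Other optimizations

The smaller the query result the better

As mentioned earlier, MySQL uses half-duplex communication, and the client has to wait until the server has finished and returned the result before it can continue. A very large result blocks subsequent requests. So make good use of limit, avoid select *, and keep in mind that in insert into xxx select xxx the select result should also be as small as possible.

The fewer subqueries the better, ideally none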
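Subqueries cause data rows to be queried multiple times, which wastes I/O. My personal suggestion is that even several separate requests are better than a subquery: they are easier to read, and efficiency does not necessarily suffer.

The more precise the SQL the better

The more precise the SQL, the fewer data rows are read during the search and the more efficient the query.

Avoid random reads where possible

Given disk performance, random reads are inefficient and index tree lookups are costly, so they are not recommended.

Constant queries are more efficient than index queries

Use a constant query wherever one will do.

For example, to check only whether a row exists, there is no need to select other columns: select 1 from user where name='xx' limit 1

Another example is an is null query on a column with a not null constraint.

Prefer database functions for time fields

Although most application servers and production databases keep their clocks synchronized, using the database's own time guards against hidden pitfalls and is slightly more efficient, though the performance difference is negligible. Unless there is a reason not to, fill time fields with the database's own time functions.

update user set modify_time=now()

Use IN instead of OR

For IN and OR on the same column: if the queried column is indexed, the two perform about the same; otherwise IN becomes increasingly faster than OR as the amount of data grows.

For IN, MySQL estimates the cost of the number of rows in the IN range. The larger the range, the higher the cost, especially on a non-unique column. In that case consider whether a join would work, since IN is really an equality comparison and so is a join condition. exists is another option.

For OR across different columns, for example where a=1 or b=1, the optimizer may turn the query into a union; it is better to write the union yourself.

select a,b from source where a=0 or b=2

Recommended form:

select a,b from source where a=0 union select a,b from source where b=2

IN and EXISTS

IN is efficient only when the total amount of data inside the IN list is small and the amount of data remaining after the IN filter is also small. EXISTS is efficient when the amount of data inside the exists statement is large but the amount remaining after matching is small.

When choosing between in and exists, work out which one traverses less; that one is more efficient.

That is the list of common index optimizations for now; more will be added as they come to mind.

Supplement: LIKE optimization

Tests on real data showed that like performs poorly on tables with tens of millions of rows and is actually slower than the instr function.

select xxx from xxx where xxx like '%abc%' is not as good as the following statement, which uses the index:

select xxx from xxx where xxx like 'abc%'

and the like that uses the index is not as good as the following statement:

select xxx from xxx where instr( xxx, 'abc' ) >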
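Why the 'abc%' form can use an index while the '%abc%' form cannot is easier to see in a small model. The sketch below treats the indexed column as a sorted Python list and is not a measurement of MySQL; the sample values and the helpers `like_prefix` and `like_contains` are made up, and it says nothing about the instr comparison, which rests on the author's own tests.

```python
from bisect import bisect_left

# Hypothetical indexed string column, kept sorted like index entries.
names = sorted(["abc", "abcd", "abd", "xabc", "zabcz", "ab"])


def like_prefix(index, prefix):
    """LIKE 'prefix%': binary-search one contiguous range of the sorted index.

    Assumes a non-empty prefix whose last character is not the maximum code point.
    """
    lo = bisect_left(index, prefix)
    # Smallest string that is greater than every string starting with `prefix`.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(index, upper)
    return index[lo:hi]


def like_contains(index, fragment):
    """LIKE '%fragment%': the ordering does not help, every entry is checked."""
    return [s for s in index if fragment in s]


print(like_prefix(names, "abc"))    # ['abc', 'abcd']
print(like_contains(names, "abc"))  # ['abc', 'abcd', 'xabc', 'zabcz']
```

A prefix pattern corresponds to a single range of the sorted entries, here everything from 'abc' up to but not including 'abd', so two binary searches are enough. A pattern with a leading wildcard can match anywhere in the order, so every entry has to be examined.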

That concludes this look at InnoDB index optimization methods in MySQL. Hopefully it has answered your questions. Theory works best combined with practice, so go and try it out.
