Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to optimize fuzzy matching Like% xxx%

2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "how to optimize fuzzy matching Like% xxx%". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Push ICP under index condition

ICP introduction

MySQL 5.6supports ICP (Index Condition Pushdown). Before ICP is not supported, when querying the index, first look up the data according to the index, and then filter according to the where condition, scan a large number of unnecessary data and increase the database IO operation.

After supporting ICP, MySQL judges whether where conditional filtering can be carried out while fetching index data, and puts part of the filtering operation of where on the storage engine layer to filter out unnecessary data in advance, thus reducing the IO overhead caused by unnecessary data scanning.

Under some queries, the reading of storage engine layer data by the Server layer can be reduced, thus providing the overall performance of the database.

ICP has the following characteristics

ICP related control parameters

Index_condition_pushdown: the default is enabled under the index condition, and the ICP feature is disabled when set to off.

Mysql > show variables like 'optimizer_switch'; | optimizer_switch | index_condition_pushdown=on # enable or disable ICP feature mysql > set optimizer_switch =' index_condition_pushdown=on | off'

ICP process

Suppose you have a user table users01 (id, name, nickname, phone, create_time) with 11W of data. Since ICP can only be used for secondary indexes, create a composite index idx_name_nickname (name,nickname) on the name,nickname column and analyze the execution of the SQL statement select * from users01 where name = 'Lyn' and nickname like'% SK%' with ICP off and on.

SQL performance Analysis of turning off ICP feature

Open profiling to track resource usage at each stage of SQL execution.

Mysql > set profiling = 1

Turn off the ICP feature to analyze SQL execution

Mysql > set optimizer_switch = 'index_condition_pushdown=off'

Mysql > explain select * from users01 where name = 'Lyn' and nickname like'% SK%'; | 1 | SIMPLE | users01 | NULL | ref | idx_name_nickname | 82 | const | 29016 | 100.00 | Using where | # View the resources of each stage during the execution of SQL using mysql > show profile cpu,block io for query 2 | | Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +-- +-+ | starting | 0.000065 | 0.000057 | 0.000009 | 0 | 0 |. | executing | 0.035773 | 0.034644 | 0.000942 | 0 | 0 | # the execution phase takes 0.035773 seconds. | | end | 0.000015 | 0.000006 | 0.000009 | 0 | # status status variable Analysis | Handler_read_next | 16384 | # # number of rows requested to be read | Innodb_data_reads | 2989 | # Total number of data physical reads | Innodb_pages_read | 2836 | # Total number of logical reads | | | Last_query_cost | 8580.324460 | cost COST of # SQL statement | Mainly include IO_COST and CPU_COST.

Through the explain analysis of the execution plan, when the CP feature is turned off, the SQL statement takes the compound index idx_name_nickname,Extra=Using Where, first reads all the records of name = 'Lyn' from the storage engine through the composite index idx_name_nickname prefix, and then uses where to filter the nickname like'% SK%' situation on the server side.

Handler_read_next=16384 indicates that 16384 rows of data have been scanned, and SQL actually returns only 12 rows, which is time-consuming 50ms. For this kind of SQL which scans a large number of data rows and returns only a small amount of data, it can be analyzed from two aspects.

1. Low index selection rate: if name appears where condition as the leading column in accordance with the index (name,nickname), CBO will choose to walk the index, because the scanned index is smaller than the COST scanned by the whole table, but because the cardinality of the name column is not high, a large amount of data in the index is scanned, resulting in low SQL performance.

Column_name: name Cardinality: 6 can see that there are only 6 different values of name in the users01 table, and the selection rate of 6max 114688 is very low.

two。 Uneven distribution of data: for where name =? Generally speaking, when the name data is unevenly distributed, the value passed in by SQL for the first time returns a very small result set, so CBO will choose to walk the index and cache the execution plan of SQL. In fact, it is wrong to scan the index regardless of any value passed in name. If the value passed in Fly100 returns 80% of the data in the table, it is faster to scan the whole table.

| | name | count (*) | +-+-+ | Grubby | 12 | Lyn | 1000 | | Fly100 | 98100 | |

The histogram statistical characteristics of columns are introduced in MySQL 8.0, which is mainly optimized for the uneven distribution of index column data.

Performance Analysis of turning on ICP feature

Enable ICP features to analyze SQL execution

Mysql > set optimizer_switch = 'index_condition_pushdown=on'

# execution Plan | 1 | SIMPLE | users01 | NULL | ref | idx_name_nickname | idx_name_nickname | 82 | const | 29016 | 11.11 | Using index condition | # status status variable Analysis | Handler_read_next | 12 | Innodb_data_reads | 2989 | Innodb_pages_read | 2836 | Last_query_cost | 8580.324460 |

As can be seen from the execution plan, the composite index idx_name_nickname,Extra=Using index condition is gone, and only 12 rows of data are scanned, indicating that using the index condition push ICP feature, SQL consumes a total of 10ms, which is 5 times higher than that when ICP is turned off.

When the ICP feature is enabled, because the like condition of nickname can be filtered through the index, the storage engine layer removes the records that do not meet the conditions by comparing the index with the where condition. This process does not need to read the records, and only returns the filtered records to the Server layer to reduce unnecessary IO overhead.

Index scan mode displayed by Extra

Using where: if the query uses an index, you need to go back to the table to query the required data.

Using index condition: the query uses an index, but needs to go back to the table to query the data.

Using index: this occurs when a query uses an override index.

Using index & using where: the query uses an index, but all the data you need can be found in the index column, so you don't need to go back to the table to query the data.

Fuzzy matching rewriting optimization

When the ICP feature is enabled, the composite index (name,nickname) can be used to reduce unnecessary data scans and improve SQL performance for the condition where name = 'Lyn' and nickname like'% SK%'. But can the ICP feature be used to improve the performance of where nickname like'% SK%' complete fuzzy matching queries? Start by creating a single-column index idx_nickname on nickname.

Mysql > alter table users01 add index idx_nickname (nickname); # SQL execution Plan | 1 | SIMPLE | users01 | NULL | ALL | NULL | 114543 | 11.11 | Using where |

From the execution plan, you can see that type=ALL,Extra=Using where takes full scans and does not take advantage of the ICP feature.

The secondary index idx_nickname (nickname) contains the primary key id, which is equivalent to the compound index of (id,nickname). Try to use the overlay index feature to rewrite SQL to select Id from users01 where nickname like'% SK%'.

| | 1 | SIMPLE | users01 | NULL | index | NULL | idx_nickname | 83 | NULL | 114543 | 11.11 | Using where; Using index |

As you can see from the execution plan, type=index,Extra=Using where; Using index, the index is fully scanned, but all the data needed can be found in the index column, and there is no need to return to the table. Taking advantage of this feature, the original SQL statement first obtains the primary key id, and then associates it with the original table through id to analyze its execution plan.

Select * from users01 a, (select id from users01 where nickname like'% SK%') b where a.id = b.id

| | 1 | SIMPLE | users01 | NULL | index | PRIMARY | idx_nickname | 83 | NULL | 114543 | 11.11 | Using where; Using index | 1 | SIMPLE | a | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.users01.id | 1 | 100.00 | NULL |

From the point of view of the execution plan, without the index idx_nickname, there is no need to go back to the table to access the data, and the execution time is reduced from 60ms to 40ms. Type = index means that the ICP feature is not used, but Using where; Using index can be used to reduce resource overhead and improve performance.

Full-text index

MySQL 5.6 began to support full-text indexing, which can be created on variable-length string types to speed up DML operations for fuzzy matching business scenarios. It is an inverted index (reverse index), which automatically creates 6 auxiliary index tables (secondary index table) when creating fulltext index, and supports parallel index creation. Parallelism can be set through the parameter innodb_ft_sort_pll_degree, and the parameter value can be increased appropriately for large tables.

Deleting the data of the table of the full-text index will result in a large number of delete operations in the secondary index table. The tag is deleted inside the InnoDB, and the deleted DOC_ID is recorded in the special FTS_*_DELETED table, but the size of the index will not be reduced. You need to set the parameter innodb_optimize_fulltext_only=ON, and then run OPTIMIZE TABLE to rebuild the full-text index.

Full-text indexing feature

Two retrieval modes

IN NATURAL LANGUAGE MODE: default mode, search in natural language, AGAINST ('watch the wind' IN NATURAL LANGUAGE MODE) is equivalent to AGAINST ('watch the wind').

IN BOOLEAN MODE: Boolean pattern. The characters before and after a string in a table have a special meaning. For example, to find records that contain SK but not Lyn, you can use the +,-symbol.

AGAINST ('+ SK-Lyn' in BOOLEAN MODE)

Look for nickname like'% Lyn%' at this time, and you can see by reverse indexing the associative array that the word Lyn is stored in document 4 and then located in the specific secondary index table.

Full-text index analysis

Add a full-text index supporting Chinese word segmentation to the nickname of table users01

Mysql > alter table users01 add fulltext index idx_full_nickname (nickname) with parser ngram

View data distribution

# set the current full-text index table mysql > set global innodb_ft_aux_table = 'test/users01'; # View data file mysql > select * from information_schema.innodb_ft_index_cache +-+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +-+- -+. | | Watch | 7 | 7 | 1 | 7 | 3 | | laugh | 7 | 7 | 1 | 7 | 0 |

Analysis of objects related to full-text indexing

# full-text indexing object analysis mysql > SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_TABLES where name like 'test/%'; | 1198 | test/users01 | 139 | # Store the document ID that is deleted by the tag and cleaned from the index, where _ being_deleted_cache is the memory version of the _ being_deleted table. | 1199 | test/fts_00000000000004ae_being_deleted | 1200 | test/fts_00000000000004ae_being_deleted_cache | 1200 | # Storage index internal status information and FTS_SYNCED_DOC_ID | 1201 | test/fts_00000000000004ae_config | 142 | # Storage document ID that has been deleted by the tag but not cleaned from the index, where _ deleted_cache is the memory version of the _ deleted table. | | 1202 | test/fts_00000000000004ae_deleted | 143c | 1203 | test/fts_00000000000004ae_deleted_cache | 144C |

Fuzzy matching optimization

For the condition nickname like'% watch% 'after the SQL statement, by default, CBO does not choose to take the nickname index, the write SQL is the way to match the full-text index: match (nickname) against (' watch').

Mysql > explain select * from users01 where match (nickname) against; | 1 | SIMPLE | users01 | NULL | fulltext | idx_full_nickname | idx_full_nickname | 0 | const | 1 | 100.00 | Using where; Ft_hints: sorted |

Using the full-text index query, type=fulltext, while hitting the full-text index idx_full_nickname, from the above analysis, we can see that in MySQL, for the full-text index of%% query, SQL can improve the efficiency through full-text index.

Generate column

MySQL 5.7supports generating columns, which are calculated from the value of the expression. There are two modes: VIRTUAL and STORED. If VIRTUAL is not specified by default, the creation syntax is as follows:

Col_name data_type [GENERATED ALWAYS] AS (expr) [* * VIRTUAL** | * * STORED**] [NOT NULL | NULL]

Generate column characteristics

VIRTUAL generated columns are used for complex condition definitions, can simplify and unify queries, do not take up space, and access columns will be calculated.

STORED generated columns are used as materialized caches, which can reduce computing costs and take up disk space for complex conditions.

Support for the creation of secondary indexes, partitioning and generating columns can simulate functional indexes.

Does not support stored procedures, expressions for user-defined functions, NONDETERMINISTIC built-in functions such as NOW (), RAND (), and unsupported subqueries

Generate column usage

# add the generated column based on function reverse reverse_nickname mysql > alter table users01 add reverse_nickname varchar (200) generated always as (reverse (nickname)); # View the generated column information mysql > show columns from users01; | reverse_nickname | varchar | YES | | NULL | VIRTUAL GENERATED | # Virtual generated column

Fuzzy matching optimization

Index scanning cannot be used for like'% xxx' after where condition. You can use MySQL 5.7 to generate columns to simulate functional indexes. The specific steps are as follows:

Use the built-in reverse function to convert like'% Fengyun 'into like' Yunfeng%', and add virtual generated columns based on this function.

Create an index on a virtual build column.

Rewrite the SQL to filter by generating the column like reverse ('% Fengyun') to generate the index on the column.

Add a virtual build column and create an index.

Mysql > alter table users01 add reverse_nickname varchar (reverse (nickname)); mysql > alter table users01 add index idx_reverse_nickname (reverse_nickname); # SQL execution Plan | 1 | SIMPLE | users01 | NULL | range | idx_reverse_nickname | idx_reverse_nickname | 803 | NULL | 1 | 100.00 | Using where |

You can see that indexes cannot be used for like'% xxx', which can be solved by indexing based on generated columns.

This is the end of "how to optimize fuzzy matching Like% xxx%". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report