Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to implement full-text indexing in MySQL

2025-03-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

In this issue, the editor will bring you about how to achieve full-text indexing in MySQL. The article is rich in content and analyzed and described from a professional point of view. I hope you can get something after reading this article.

Use

Use MATCH ()... Search in AGAINST mode

Match () indicates that the column is searched, and against indicates that the string is searched

Check the default participle (use these words to distinguish different keywords); you can also customize the participle to distinguish different keywords.

SELECT * FROM information_schema.INNODB_FT_DEFAULT_STOPWORD

Such as

+-+ | value | +-+ | a | | about | | an | | are | | as | | at | | be | | by | | com | | de | | en | | for | | from |

Three types of full-text search

Natural language search (Natural language search)

Pass a specific string through MATCH AGAINST for checking, default mode

Boolean search (Boolean search)

Add operators to the retrieved string, such as "+" means it must be included, "-" does not contain, "*" means wildcard, even if the passed string is small or appears in the stop word, it will not be filtered out.

Query expansion search (query extended search)

The search string is used to perform a natural language search, and then the word of the most relevant line returned by the search is added to the search string and searched again, and the query returns the line from the second search

Related parameters

Configure related parameters

Innodb_ft_min_token_size

Default 3, which means a minimum of 3 characters as a keyword. Increasing this value reduces the size of the full-text index.

Innodb_ft_max_token_size

The default is 84, which means a maximum of 84 characters as a keyword. Limiting this value reduces the size of the full-text index.

Ngram_token_size

Default 2, which indicates that 2 characters are used as a keyword in the built-in word segmentation parser, such as establishing a full-text index on "abcd". The keyword is' ab','bc','cd'.

Innodb_ft_min_token_size and innodb_ft_max_token_size are invalid when using the ngram word segmentation parser

Note that none of these three parameters can be modified dynamically. If you modify these parameters, you need to restart the MySQL service and re-establish the full-text index.

Testing the innodb engine using full-text indexing

Prepare for

1. Goal

Query whether an article contains a keyword; the number of times a keyword appears in a series of articles

Query whether the title of the article contains a keyword

2. Set the following parameters to reduce disk IO pressure

SET GLOBAL sync_binlog=100;SET GLOBAL innodb_flush_log_at_trx_commit=2

3. Import 1kw data for test full-text indexing

The data source is searched on the Internet.

Extraction code: iyip

4. The structure of an article table

CREATE TABLE `originle` (`id`bigint (10) NOT NULL, `url`varchar (1024) CHARACTER SET latin1 NOT NULL DEFAULT'', `title`varchar (1024) NOT NULL DEFAULT'', `source`varchar''COMMENT' COMMENT', `keywords` varchar (32) DEFAULT NULL, `publish_ time`timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), KEY `title_ idx` (`title`) ENGINE=InnoDB

Import test data using myloader multithreading

First decompress the test data tar-zxf mydumper_dump_article.tar.gztime myloader-u $user-p $passwd-S $socket-t 32-d / datas/dump_article-v 3

5. The total amount of data and the size of data files and index files after data import

SELECT COUNT (*) FROM `roomle` +-+ | COUNT (*) | +-+ | 10000000 | +-+ 1 row in set (7.85 sec) SELECT table_name, CONCAT (FORMAT (SUM (data_length) / 1024 / 1024),'M') AS dbdata_size, CONCAT (FORMAT (SUM (index_length) / 1024 / 1024),'M') AS dbindex_size CONCAT (FORMAT (SUM (data_length + index_length) / 1024 / 1024 / 1024 and table_name='article' 2),'G') AS `db_size (G) `, AVG_ROW_LENGTH,table_rows,update_time FROM information_schema.tables WHERE table_schema = DATABASE () and table_name='article' +-+ | table_name | dbdata_size | dbindex_size | Db_size (G) | AVG_ROW_LENGTH | table_rows | update_time | +- -+ | article | 3710.00m | 1003.00m | 4.60g | 414 | 9388739 | 2019-07-05 15:31:37 | +-+- -+

Create a full-text index using the default method

1. The table is already related to the key word field (a brief description of the content of the article), and uses "," as the word separator.

Select keywords from article limit 10 +-+ | keywords | +-+ | NULL | | NULL | Marriage, Love | | hair style, partial score, make-up, Fashion | | Xiao A, | clothing match, female, Fashion | | Beautiful, female | | Valentine's Day, Dongguan, female | | skin, skin care, skin care, food nutrition, beauty, health preservation | | Sanlitun | Beijing, Fashion | +-

2. Search for a keyword without a full-text index

A full table scan is required

Select count (*) from article where keywords like'% Fashion%'; +-+ | count (*) | +-+ | 163 | +-+ 1 row in set (7.56 sec)

3. Create a full-text index of the keyword field (as a participle)

Set innodb_ft_min_token_size in the my.cnf configuration file and restart the MySQL service (minimum two characters as one keyword, default three characters as one keyword)

[mysqld] innodb_ft_min_token_size=2

3.1 set custom stopwords (i.e. participle)

USE mysql;CREATE TABLE my_stopwords (VALUE VARCHAR (30)) ENGINE = INNODB;INSERT INTO my_stopwords (VALUE) VALUE (','); SET GLOBAL innodb_ft_server_stopword_table = 'mysql/my_stopwords';SHOW GLOBAL VARIABLES WHERE Variable_name IN (' innodb_ft_min_token_size','innodb_ft_server_stopword_table') +-- +-+ | Variable_name | Value | +-- -+ | innodb_ft_min_token_size | 2 | | innodb_ft_server_stopword_table | mysql/my_stopwords | +-+-+

3.2 create a full-text index

Alter table article add fulltext index idx_full_keyword (keywords); * [] Query OK, 0 rows affected, 1 warning (1 min 27.92 sec) * [] Records: 0 Duplicates: 0 Warnings: 1

3.3 the remaining disk space needs to be sufficient. The original table is 4.6g and the remaining 5.7g disk. Adding a full-text index will also fail.

Df-hFilesystem Size Used Avail Use% Mounted on/dev/vda1 7.8G 6.3G 1.2G 85% / tmpfs 1.9G 0 1.9G 0 / dev/shm/dev/mapper/vg_opt-lvol0 19G 12G 5.7G 68% / datas creates a temporary file of the original table size 8.6K Jul 5 16:19 # sql-5250_3533.frm 4.4G Jul 5 16:20 # sql-ib117-1768830977. Ibdalter table article add fulltext index idx_full_keyword (keywords) ERROR 1114 (HY000): The table 'article' is full

3.4 query the number of occurrences of a keyword using the created full-text index

Query response time has been greatly improved, only 0.05s; using where keywords like'% Fashion% 'takes 7.56s

Select count (*) from article where match (keywords) against ('% Fashion%'); +-+ | count (*) | +-+ | 163,163 | +-+ 1 row in set (0.05 sec)

3.5 if you need to match multiple keywords at the same time, use Boolean full-text search

Indicates that it exactly matches the number of records of "Sanlitun, Beijing" select count (*) from article where match (keywords) against ('+ Sanlitun, Beijing'in boolean mode) +-+ | count (*) | +-+ | 1 | +-+ 1 row in set (0.06 sec) indicates the number of records matching "Sanlitun" or "Beijing" select count (*) from article where match (keywords) against ("Sanlitun, Beijing") +-+ | count (*) | +-+ | 8 | +-+ 1 row in set (0.06 sec)

3.6 after the full-text index is created, some other files are created

96K Jul 5 16:30 FTS_00000000000000a7_00000000000000c0_INDEX_1.ibd

96K Jul 5 16:30 FTS_00000000000000a7_00000000000000c0_INDEX_2.ibd

96K Jul 5 16:30 FTS_00000000000000a7_00000000000000c0_INDEX_3.ibd

96K Jul 5 16:30 FTS_00000000000000a7_00000000000000c0_INDEX_4.ibd

128K Jul 5 16:30 FTS_00000000000000a7_00000000000000c0_INDEX_5.ibd

256K Jul 5 16:30 FTS_00000000000000a7_00000000000000c0_INDEX_6.ibd

96K Jul 5 16:29 FTS_00000000000000a7_BEING_DELETED_CACHE.ibd

96K Jul 5 16:29 FTS_00000000000000a7_BEING_DELETED.ibd

96K Jul 5 16:30 FTS_00000000000000a7_CONFIG.ibd

96K Jul 5 16:29 FTS_00000000000000a7_DELETED_CACHE.ibd

96K Jul 5 16:29 FTS_00000000000000a7_DELETED.ibd

-the first 6 represent inverted indexes (secondary index tables)

-7th, 8th represents a document ID (DOC_ID) that contains deleted documents, whose data is currently being deleted from the full-text index

-the ninth message that represents the internal status of the FULLTEXT index

-10th, 11 documents that contain deleted data but have not been deleted from the full-text index

Create a full-text index using the ngram word segmentation parser

1. Set up a full-text index to the title field (there is no fixed stopwords participle in this field, use the ngram participle parser)

You need to first set ngram_token_size in the my.cnf configuration file (2 characters by default as the keyword of ngram), and restart the mysql service

The default 2 is used here

Select title from article limit 10 +-- + | title | +- -- + | worth IT | | small show of Launchpad Jiangnan Tannery | | A rare moment behind the Raw when a "madman" is carried back to the background. | | Raw: the son scolded his father that you are a green tea boy with a dozen of four | | four groups of 30 square meters hardcover small households A large number of pictures, with household pictures | | Sexy smoked cat eye makeup of the queen of nightclubs | | Big Rock Johnson | | Girl Cui Xiuying dress Science Lin Yuner Huang Meiying Jin Taiyan Zheng Xiujing | | Deyang Outdoor outing Huatian self-help BBQ | +-+

2. Create a full-text index on the title field

Alter table article add fulltext index ft_index_title (title) with parser ngram;Query OK, 0 rows affected (3 min 29.22 sec) Records: 0 Duplicates: 0 Warnings: 0

3. An inverted index will be created (the longer the title field, the larger the inverted index will be created)

112M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_1.ibd

28M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_2.ibd

20M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_3.ibd

140M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_4.ibd

128M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_5.ibd

668M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_6.ibd

4. Do not set up a full-text index to search for a keyword in title

Select count (*) from article where title like'% outdoor%'; +-+ | count (*) | +-+ | 22058 | +-+ 1 row in set (8.60 sec) select count (*) from article where title like'% background%'; +-+ | count (*) | +-+ | 1142 | +-+

5. Search for a keyword using a full-text index

Response time has been greatly improved

Select count (*) from article where match (title) against ('Outdoor'); +-+ | count (*) | +-+ | 22058 | +-+ 1 row in set (0.07 sec) select count (*) from article where title like'% background%' +-+ | count (*) | +-+ | 1142 | +-+ 1 row in set (8.31 sec)

6. Note that inconsistencies will occur when the number of keywords searched is greater than 2 (ngram_token_size definition size).

For general search, the actual number of records with this keyword is 6select count (*) from article where title like'% Prince Da%; +-+ | count (*) | +-+ | 6 | +-+ 1 row in set (8.40 sec) full-text search, and the number of records with keywords is 9443select count (*) from article where match (title) against ('Prince Dada') +-+ | count (*) | +-+ | 9443 | +-+ 1 row in set (0.06 sec) the number of records in which the keyword actually appears is 1select count (*) from article where title like'% Huatian self-help%'. +-+ | count (*) | +-+ | 1 | +-+ 1 row in set (8.33 sec) full-text search the number of records with this keyword is 3202select count (*) from article where match (title) against ("Huatian self-help") +-+ | count (*) | +-+ | 3202 | +-+ 1 row in set (0.06 sec)

Conclusion

When there is a fixed stopword participle in a mysql field (English space character, Chinese ","-", etc.), establishing a full-text index to this field can quickly search the relevant record information of a certain keyword and achieve the effect of a simple search engine.

When a mysql field does not have a fixed stopword participle, the built-in parser ngram can be used to quickly search the field value into a fixed number of keywords (ngram_token_size defined size). When the number of characters of the searched keyword is not equal to the ngram_token_size defined size, there will be problems that are inconsistent with the actual situation.

The full-text index can be searched quickly, and there is also the overhead of maintaining the index; the larger the length of the field, the larger the full-text index created, which will affect the throughput of DML statements. You can use a special full-text search engine ES to do this.

The above is the editor for you to share how to achieve the full-text index in MySQL, if you happen to have similar doubts, you might as well refer to the above analysis to understand. If you want to know more about it, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report