Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

MySQL 4.1.0 Chinese reference Manual-6.8 MySQL full text search (transfer)

2025-03-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

MySQL 4.1.0 Chinese reference manual-- 6.8 MySQL full text search [@ more@] code {color:purple} tt {color:green} samp {color:navy} pre {color:maroon} MYSQL 4.1.0 Chinese reference manual > words "content=" MySQL,4.1.0,Shuixin13,MySQL 4.1.0, Chinese, Chinese reference manual, Dog (Xinfan) > CSS rel=STYLESHEET > MySQL Reference Manual for version 4.1.0-alpha.6.8 MySQL full text search

By 3.23.23, MySQL began to support full-text indexing and search. A full-text index is an index of type FULLTEXT in MySQL. FULLTEXT indexes are used for MyISAM tables and can be created on CHAR, VARCHAR, or TEXT columns using ALTER TABLE or CREATE INDEX at or after CREATE TABLE. For large databases, it would be very fast to load the data into a table without an FULLTEXT index, and then create the index using ALTER TABLE (or CREATE INDEX). Loading data into a table that already has an FULLTEXT index will be very slow.

Full-text search is done through the MATCH () function.

Mysql > CREATE TABLE articles (- > id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,-> title VARCHAR,-> body TEXT,-> FULLTEXT (title,body)->) Query OK, 0 rows affected (0.00 sec) mysql > INSERT INTO articles VALUES-> (NULL,'MySQL Tutorial', 'dbMS stands for DataBase...'),-> (NULL,'How To Use MySQL Efficiently', 'After you went through a...'),-> (NULL,'Optimising MySQL','In this tutorial we will show...'),-> (NULL,'1001 MySQL Tricks','1. Never run mysqld as root. 2.'),-> (NULL,'MySQL vs. YourSQL','In the following database comparison...'),-> (NULL,'MySQL Security', 'When configured properly, MySQL...'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql > select * FROM articles-> WHERE MATCH (title,body) AGAINST ('database') +-+ | id | title | body | +-+-- -+ | 5 | MySQL vs. YourSQL | In the following database comparison. | | 1 | MySQL Tutorial | DBMS stands for DataBase. | +-+-- -+ 2 rows in set (0.00 sec)

The function MATCH () performs a natural language search for a string against a text set (the column set of one or more columns contained in an FULLTEXT index). The search string is given as an argument to AGAINST (). The search is performed by ignoring the case of letters. MATCH () returns a correlation value for each record row in the table. That is, the measure of similarity between the search string and the text of the column specified by the record row in the MATCH () list.

When MATCH () is used in a WHERE clause (see the example above), the returned rows of records are automatically sorted in the order of correlation from highest to lowest. The correlation value is a non-negative floating-point number. Zero correlation means dissimilarity. The calculation of correlation is based on the number of words in the record line, the number of unique words in the line, the total number of words in the set, and the number of documents containing a particular word (record line).

It can also perform a logical pattern search. This is described in the following chapters.

The previous example is some basic instructions on the use of the function MATCH (). Record rows are returned in descending order of similarity.

The next example shows how to retrieve an explicit similarity value. If there is neither WHERE nor ORDER BY clause, the returned rows are not sorted.

Mysql > SELECT id,MATCH (title,body) AGAINST ('Tutorial') FROM articles +-+-+ | id | MATCH (title Body) AGAINST ('Tutorial') | +-- +-+ | 1 | 0.64840710366884 | | 2 | 0 | 3 | 0.66266459031789 | 4 | 0 | 5 | 0 | 6 | 0 | + | -+ 6 rows in set (0.00 sec)

The following example is a little more complex. The query returns similarity and still returns record rows in descending order. To accomplish this, you should specify MATCH () twice. This does not incur additional overhead because the MySQL optimizer notices the same MATCH () calls twice and invokes the full-text search code only once.

Mysql > SELECT id, body, MATCH (title,body) AGAINST-> ('Security implications of running MySQL as root') AS SCOre-> FROM articles WHERE MATCH (title,body) AGAINST-> (' Security implications of running MySQL as root') +-+-- +-+ | id | body | score | +-+-- -+ | 4 | 1. Never run mysqld as root. 2. | 1.5055546709332 | | 6 | When configured properly, MySQL. | 1.31140957288 | +-+-- + 2 rows in set (0.00 sec)

MySQL uses a very simple parser to separate text into words. A "word" is any sequence of characters consisting of text, data, "'" and "_". Any "word" that appears on the stopword list, or that is too short (3 characters or less) will be ignored.

Each appropriate word in the set and query is measured by its importance in the set and query. In this way, a word that appears in multiple documents will have a lower weight (maybe even a zero weight) because it has a lower semantic value in this particular set. Otherwise, if the word is less, it will get a higher weight. The weight of the word is then combined to calculate the similarity of the record rows.

Such a technical job works well with a large set (in fact, it will be carefully tuned to it). For very small tables, word classification is not enough to fully reflect their semantic values, and sometimes this pattern can produce strange results.

Mysql > SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL'); Empty set (0.00 sec)

In the above example, the search term MySQL did not get any results, because the word appears in more than half of the record lines. Similarly, it is effectively treated as a stopword (that is, a word with zero semantic value). This is the ideal behavior-- a natural language query should not return every second row from a 1GB table.

Words that match half of the rows recorded in the table are rarely likely to find the relevant document. In fact, it may find a lot of irrelevant documents. We all know that this often happens when we try to search for something on the Internet through search engines. For this reason, such a row is set to a low semantic value in this particular dataset.

By 4.0.1, MySQL can also use the IN BOOLEAN MODE modifier to perform a logical full-text search.

Mysql > SELECT * FROM articles WHERE MATCH (title,body)-> AGAINST ('+ MySQL-YourSQL' IN BOOLEAN MODE) +-+ | id | title | body | +-+-- -- + | 1 | MySQL Tutorial | DBMS stands for DataBase... | 2 | How To Use MySQL Efficiently | After you went through a... | 3 | Optimising MySQL | In this tutorial we will show... | 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. | | 6 | MySQL Security | When configured properly, MySQL. | +-+

This query returns all rows of records that contain the word MySQL (note: the threshold of 50% is not used), but it does not contain the word YourSQL. Note that the search for a logical pattern does not automatically sort record rows in descending order of similar values. You can see from the above results that the highest similarity value (the one that includes MySQL twice) is the last, not the first. A logical full-text search works even without an FULLTEXT index, but it is slower.

Logical full-text search supports the following operators:

A leading plus sign indicates that the word must appear in each returned record row. A leading minus sign indicates that the word must not appear in each returned record row. The default word (when neither a plus sign nor a minus sign is specified) is arbitrary, but the record lines that contain it will be arranged a little higher. This mimics MATCH () without the IN BOOLEAN MODE modifier. The behavior of AGAINST ().

< >

These two operators are used to change the base value of a word's similarity value. The operator adds it. See the following example. () parentheses are used to group words in subexpressions. A leading negative sign acts like a negative operator, and the base value of the word that causes row similarity is negative. It is useful for marking a noise word. A record containing such a word will be arranged a little lower, but will not be completely excluded, because the-operator can be used. * an asterisk is the truncation operator. Unlike other operators, it should be appended to a word, not in front of it. The phrase, enclosed in double quotes, matches only the record lines that contain the phrase (literally, as if typed).

Here are some examples:

Apple banana looks for a record line + apple + juice... that contains at least one of the above words Both words are included in + apple macintosh... Contains the word "apple", but if it also contains "macintosh", it will be arranged higher + apple-macintosh... Contains "apple" but not "macintosh" + apple + (> pie...) Contains "apple" and "pie", or "apple" and "strudel" (in any order), but "apple pie" is arranged a little higher than "apple strudel" apple*... Contains "apple", "apples", "applesauce" and "applet"some words". Can include "some words of wisdom", but not the restriction of "some noise words" 6.8.1 that all arguments to the MATCH () function must be from columns from the same table and must be part of the same FULLTEXT index, unless MATCH () is IN BOOLEAN MODE. The MATCH () column list must exactly match the column list defined in one of the FULLTEXT indexes of the table, unless MATCH () is IN BOOLEAN MODE. The argument to AGAINST () must be a constant string. 6.8.2 fine-tune MySQL full-text search

Unfortunately, full-text search still has only a few user-adjustable parameters, although some of the additions are high on TODO. If you have a MySQL source distribution (see section source > 2.3 to install a MySQL source distribution), you can exercise more control over full-text search.

Note that full-text search is the best search result and has been carefully adjusted. The behavior of changing the default value, in most cases, will only make the search results worse. Do not modify the source code of MySQL unless you know what you are doing!

The minimum length of the indexed word is specified by the MySQL variable ft_min_word_len. See section 4.5.6.4 SHOW VARIABLES. Change it to the value you want and rebuild your FULLTEXT index. (this variable is only supported since MySQL 4. 0.) the stopword list can be read from the file specified by the ft_stopword_file variable. See section 4.5.6.4 SHOW VARIABLES. After modifying the stopword list, rebuild your FULLTEXT index. (this variable is only supported from MySQL 4.0.10.) the 50% threshold selection is determined by the special measurement mode selected. To disable it, modify the following line in the `myisam/ftdefs.h' file:

# define GWS_IN_USE GWS_PROB

Change to:

# define GWS_IN_USE GWS_FREQ

Then recompile MySQL. In this case, there is no need to rebuild the index. Note: using this will significantly reduce the ability of MySQL to provide sufficient similarity values for MATCH (). If you do need to search for such a public word, it's best to use IN BOOLEAN MODE's search instead, which doesn't follow the 50% threshold. Sometimes, search engine maintainers want to change the operators used for logical full-text search. These are defined by the variable ft_boolean_syntax. See section 4.5.6.4 SHOW VARIABLES. However, this variable is read-only and its value is set in `myisam/ft_static.c'.

For these changes, you are required to rebuild your FULLTEXT index. For a MyISAM table, the easiest way to rebuild the index file is like the following statement:

Mysql > REPAIR TABLE tbl_name QUICK

6.8.3 full-text search TODO makes all operations on FULLTEXT indexes faster. Proximity operator support for "always-index words". They can be any string that the user wants to treat as a word, such as "C++", "AS/400", "TCP/IP", etc. Support for full-text search in the MERGE table supports multi-byte characters to create stopword lists Stemming (data-dependent language, of course) Generic user-suppliable UDF preparser. Make the mode more flexible (by adding some adjustable parameters to the FULLTEXT in CREATE/ALTER TABLE)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report