2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains what to do when the MySQL 5.7 full-text index does not support Chinese word segmentation.
Before MySQL 5.7.6, full-text indexing supported English but not Chinese, because the built-in parser relies on whitespace between words and Chinese text has none. Chinese paragraphs had to be split into words by a separate word segmenter before being stored in the database.
Since MySQL 5.7.6, MySQL ships with a built-in ngram full-text parser that supports Chinese, Japanese and Korean word segmentation.
This article uses MySQL 5.7.22 with the InnoDB storage engine.
ngram full-text parser
An ngram is a sequence of n consecutive characters in a piece of text. The ngram full-text parser segments text into tokens, each of which is a run of n consecutive characters. For example, segmenting "生日快乐" ("Happy Birthday") with the ngram full-text parser gives:
n=1: '生', '日', '快', '乐'
n=2: '生日', '日快', '快乐'
n=3: '生日快', '日快乐'
n=4: '生日快乐'
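The segmentation described above can be sketched in a few lines of Python. This is only an illustration of the sliding-window idea, not MySQL's implementation: the function name ngram_tokens is made up for this example, and the real parser also handles details such as spaces and stopwords that are ignored here.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the text, one character at a time.
    # If the text is shorter than n, the range is empty and no tokens result.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    # "生日快乐" means "Happy Birthday"
    for size in (1, 2, 3, 4):
        print(size, ngram_tokens("生日快乐", size))
```

A text of length L produces L - n + 1 tokens, so a smaller n means more tokens and a larger index.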
MySQL uses the global variable ngram_token_size to configure n, the token size. It ranges from 1 to 10, and the default is 2. ngram_token_size is usually set to the length of the shortest term you need to search for. If you need to search for single characters, set ngram_token_size to 1. With the default value of 2, searching for a single character returns no results. Because Chinese words are at least two characters long, the default value of 2 is recommended.
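To see why a one-character search finds nothing when the token size is 2, here is a toy inverted index in Python. It is a rough sketch under simplifying assumptions: the helper names (build_index, search) and the sample rows are hypothetical, and the matching rule (a row matches if it shares at least one token with the search term) only approximates natural language mode.

```python
from collections import defaultdict


def ngram_tokens(text, n):
    """Split text into overlapping runs of n characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n):
    """Map each n-character token to the ids of the rows containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in ngram_tokens(text, n):
            index[token].add(doc_id)
    return index


def search(index, term, n):
    """Return ids of rows sharing at least one n-character token with term."""
    hits = set()
    for token in ngram_tokens(term, n):
        hits |= index.get(token, set())
    return hits


docs = {1: "生日快乐", 2: "快乐每一天"}  # hypothetical sample rows

index2 = build_index(docs, 2)
print(search(index2, "快乐", 2))  # both rows contain the token "快乐"
print(search(index2, "乐", 2))    # a one-character term yields no 2-character tokens

index1 = build_index(docs, 1)
print(search(index1, "乐", 1))    # with n = 1 the single character is indexed
```

The search term is tokenized with the same n as the index, so a term shorter than n produces no tokens to look up.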
ngram_token_size cannot be changed while the server is running, so it must be set at startup. There are two ways to do this:
1. Pass it as an option when starting mysqld
mysqld --ngram_token_size=2
2. Set it in the MySQL configuration file
[mysqld]
ngram_token_size=2
Create a full-text index
1. Create the full-text index when creating the table
CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT (title, body) WITH PARSER ngram
) ENGINE = INNODB;
2. Add it with ALTER TABLE
ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER ngram;
3. Create it directly with CREATE INDEX
CREATE FULLTEXT INDEX ft_index ON articles (title,body) WITH PARSER ngram;
Full-text retrieval modes
There are two commonly used full-text retrieval modes:
1. Natural language mode (NATURAL LANGUAGE MODE)
Natural language mode is MySQL's default full-text retrieval mode. It does not support operators, so it cannot express complex queries such as requiring that a keyword must or must not appear.
2. Boolean mode (BOOLEAN MODE)
Boolean mode supports operators, which allow complex queries such as requiring that a keyword must or must not appear, or giving a keyword a higher or lower weight.
Example
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('Belt and Road' IN NATURAL LANGUAGE MODE);
-- No mode is specified, so natural language mode is used by default
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('Belt and Road');
The results returned by the example above are automatically sorted by relevance, most relevant first, as long as the query has no explicit ORDER BY clause. The relevance value is a non-negative floating-point number; 0 means no relevance.
Example
-- Get the relevance value
SELECT id, title, MATCH (title,body) AGAINST ('mobile' IN NATURAL LANGUAGE MODE) AS score
FROM articles
ORDER BY score DESC;
Example
-- Get the number of matching rows
SELECT COUNT(*) FROM articles
WHERE MATCH (title,body) AGAINST ('Belt and Road' IN NATURAL LANGUAGE MODE);
You can use BOOLEAN mode to perform advanced queries.
Example
-- Must include "Tencent"
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('+Tencent' IN BOOLEAN MODE);
Example
-- Must include "Tencent", but must not include "communication tool"
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('+Tencent -"communication tool"' IN BOOLEAN MODE);
The following examples demonstrate the operators available in BOOLEAN mode:
Example
'apple banana'
No operator, meaning OR: the row contains apple, banana, or both.
'+apple +juice'
Must contain both words.
'+apple macintosh'
Must contain apple, but the row is more relevant if it also contains macintosh.
'+apple -macintosh'
Must contain apple and must not contain macintosh.
'+apple ~macintosh'
Must contain apple, but if the row also contains macintosh, its relevance is lower than that of rows without macintosh.
'+apple +(>juice)
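As a rough illustration of how the + and - operators combine, the Python sketch below assembles a BOOLEAN mode search string from lists of terms. The helper name boolean_query is hypothetical, and the %s placeholder assumes a driver that uses that parameter style (common among Python MySQL drivers, but check yours). It does not escape terms that themselves contain quotes or operator characters.

```python
def boolean_query(required=(), excluded=(), optional=()):
    """Build the search string for AGAINST (... IN BOOLEAN MODE).

    Terms containing a space are wrapped in double quotes so that the
    operator applies to the whole phrase rather than only its first word.
    """
    def fmt(term):
        return f'"{term}"' if " " in term else term

    parts = [f"+{fmt(t)}" for t in required]    # must appear
    parts += [f"-{fmt(t)}" for t in excluded]   # must not appear
    parts += [fmt(t) for t in optional]         # raises relevance if present
    return " ".join(parts)


# Pass the search string as a bound parameter instead of concatenating it into SQL.
sql = "SELECT * FROM articles WHERE MATCH (title, body) AGAINST (%s IN BOOLEAN MODE)"
params = (boolean_query(required=["Tencent"], excluded=["communication tool"]),)
print(sql, params)
```

The operators must sit directly against the term they modify, with no space in between, which is why the helper emits "+apple" rather than "+ apple".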
© 2024 shulou.com SLNews company. All rights reserved.