What is the use of COLLATE in MYSQL

2025-02-25 Update From: SLTechnology News&Howtos


This article explains what COLLATE is used for in MySQL, how the common collations differ, and at which levels COLLATE can be set.

What is COLLATE in MYSQL?

When you run the SHOW CREATE TABLE statement in MySQL, you see the statement that creates a table. For example:

CREATE TABLE `table1` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `field1` text COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'field 1',
    `field2` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT 'field 2',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Most of this statement is easy to read, but today's topic is the COLLATE keyword. What does the value that follows it, here utf8mb4_unicode_ci, mean? Put this question to DBA candidates in an interview and it will probably stump a good number of them.

What is COLLATE used for?

Developers who use phpMyAdmin may find this familiar, because the column header in its Chinese interface already gives the answer:

Figure: phpMyAdmin screenshot

A value such as utf8_unicode_ci or utf8mb4_unicode_ci is a rule used for sorting. Every character-type column in MySQL, such as VARCHAR, CHAR, and TEXT, needs a COLLATE to tell MySQL how to sort and compare the values in that column. In short, COLLATE affects the order produced by ORDER BY, the rows selected by greater-than and less-than comparisons in a WHERE condition, and the results of DISTINCT, GROUP BY, and HAVING. It also affects index creation when the indexed column is a character type, although that effect is not visible to us. Anything that involves comparing or sorting character data is governed by COLLATE.
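The effect on ORDER BY and DISTINCT can be seen with a small sketch. The table name collate_demo and its single column v are hypothetical, chosen only for illustration; the collation names are standard MySQL ones.

```sql
-- Hypothetical demo table; the column inherits utf8mb4_general_ci from the table.
CREATE TABLE collate_demo (
    v VARCHAR(8) NOT NULL
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

INSERT INTO collate_demo (v) VALUES ('b'), ('A'), ('a'), ('B');

-- Case-insensitive collation: 'a' and 'A' count as the same value.
SELECT COUNT(DISTINCT v) AS distinct_ci FROM collate_demo;                      -- 2

-- Binary collation: every row is a different value.
SELECT COUNT(DISTINCT v COLLATE utf8mb4_bin) AS distinct_bin FROM collate_demo; -- 4

-- Binary ordering follows code points, so uppercase sorts before lowercase.
SELECT v FROM collate_demo ORDER BY v COLLATE utf8mb4_bin;                      -- A, B, a, b
```

With the case-insensitive collation, the relative order of 'a' and 'A' in an ORDER BY is not guaranteed, because the two values compare as equal.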

Differences between various COLLATE

COLLATE is tied to the character encoding (CHARSET). Each CHARSET supports several COLLATE values and designates one of them as its default. For example, the default COLLATE for latin1 is latin1_swedish_ci, the default for gbk is gbk_chinese_ci, and the default for utf8mb4 is utf8mb4_general_ci (before MySQL 8.0).

By the way, MySQL has two encodings named utf8 and utf8mb4. Forget utf8 and always use utf8mb4 in MySQL. This is a legacy problem: utf8 in MySQL supports characters of at most 3 bytes, so text that needs 4 bytes per character, such as emoji, cannot be stored in it. For that you need utf8mb4.
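A quick way to check the byte limits, and a sketch of moving an existing table to utf8mb4. The table name table1 is the example table from earlier in this article; the target collation is one reasonable choice, not the only one.

```sql
-- The Maxlen column shows the maximum bytes per character:
-- 3 for utf8 (reported as utf8mb3 on newer servers), 4 for utf8mb4.
SHOW CHARACTER SET LIKE 'utf8%';

-- Convert an existing table and all of its character columns to utf8mb4.
-- This rewrites the table, so try it on a copy first.
ALTER TABLE table1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```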

Many COLLATE names end in _ci, short for Case Insensitive, which means that "A" and "a" are treated as equal when sorting and comparing. SELECT * FROM table1 WHERE field1 = "a" therefore also returns rows where field1 is "A". Conversely, a COLLATE with the _cs suffix is Case Sensitive.
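A minimal check of the case rule, comparing two literals directly. SET NAMES utf8mb4 is there so the literals use utf8mb4; utf8mb4_bin stands in for the case-sensitive side here, since it is available on every version.

```sql
SET NAMES utf8mb4;

-- COLLATE binds more tightly than =, so it fixes the collation of the comparison.
SELECT 'a' = 'A' COLLATE utf8mb4_general_ci AS ci_equal,   -- 1: equal under _ci
       'a' = 'A' COLLATE utf8mb4_bin        AS bin_equal;  -- 0: different under _bin
```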

The SHOW COLLATION statement lists every COLLATE that MySQL supports. Taking utf8mb4 as an example, all the COLLATE values it supports are shown in the following figure.

Figure: all COLLATE values related to utf8mb4 in MySQL
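The same list can be produced on your own server with a filter. The list differs between MySQL versions.

```sql
-- Every collation available for utf8mb4.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Only the default collation of utf8mb4 on this server.
-- `Default` is a reserved word, so the column name needs backticks.
SHOW COLLATION WHERE Charset = 'utf8mb4' AND `Default` = 'Yes';
```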

The figure lists sorting rules for the languages of many countries. The ones commonly used in China are utf8mb4_general_ci (the default before MySQL 8.0), utf8mb4_unicode_ci, and utf8mb4_bin. Here is how these three differ:

First, utf8mb4_bin treats every character as a binary value and compares strings by those values, character by character from the start of the string. It is therefore case-sensitive.

For Chinese and English text, utf8mb4_unicode_ci and utf8mb4_general_ci behave the same, so for a system used only in China either one will do. The difference shows up with some Western letters, where utf8mb4_unicode_ci follows the conventions of the language more closely than utf8mb4_general_ci, which is an older MySQL standard. For example, the German letter "ß" is equivalent to "ss" in utf8mb4_unicode_ci (which matches German usage), while in utf8mb4_general_ci it is equivalent to the letter "s". Most developers will rarely notice these small differences between the two collations. We seldom sort directly on text fields, and even if one or two such letters end up out of order, the consequences for the system are unlikely to be serious. In online discussions, more people recommend utf8mb4_unicode_ci, but they do not strongly object to systems that use the default and do not see it as a big problem. Conclusion: utf8mb4_unicode_ci is recommended, and there is no need to spend time changing a system that already uses utf8mb4_general_ci.
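The "ß" rule can be checked with literal comparisons. This is a sketch that assumes the client sends UTF-8 text, and the expected results in the comments follow the rule described above.

```sql
SET NAMES utf8mb4;

SELECT 'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ss,  -- expected 1
       'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_ss,  -- expected 0
       'ß' = 's'  COLLATE utf8mb4_general_ci AS general_s;   -- expected 1
```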

Note also that since MySQL 8.0 the default CHARSET has changed from latin1 to utf8mb4, and the default COLLATE has changed to utf8mb4_0900_ai_ci. utf8mb4_0900_ai_ci is a further refinement of the Unicode collations: 0900 refers to the version of the Unicode Collation Algorithm (9.0.0), and ai stands for accent insensitive, meaning that e, è, é, ê and ë are treated as equal.
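On MySQL 8.0 or later, accent insensitivity can be checked the same way. utf8mb4_0900_as_cs, the accent-sensitive and case-sensitive sibling collation, is used here only for contrast.

```sql
SET NAMES utf8mb4;

-- Requires MySQL 8.0+, where the 0900 collations exist.
SELECT 'e' = 'é' COLLATE utf8mb4_0900_ai_ci AS ai_equal,  -- 1: accents ignored
       'e' = 'é' COLLATE utf8mb4_0900_as_cs AS as_equal;  -- 0: accents matter
```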

COLLATE setting level and its priority

COLLATE can be set at the instance level, database level, table level, and column level, and in an individual SQL statement. At the instance level, the setting is the collation_server system variable, given in the MySQL configuration file or on the startup command line. (collation_connection, by contrast, applies to the current client connection.)
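To see what an instance currently uses, the system variables can be queried directly. character_set_server and collation_server are the server-wide defaults, and collation_connection belongs to the current session.

```sql
-- Server-wide defaults, inherited by new databases that set nothing themselves.
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- Collation used for string literals sent over the current connection.
SHOW VARIABLES LIKE 'collation_connection';
```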

The statement to set COLLATE at the database level is as follows (db_name is a placeholder for the database name):

CREATE DATABASE db_name DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

If CHARSET and COLLATE are not set at the database level, the database uses the instance-level settings as its defaults. In versions before MySQL 8.0, if you change nothing, the default CHARSET is latin1 and the default COLLATE is latin1_swedish_ci. Starting with MySQL 8.0, the default CHARSET is utf8mb4 and the default COLLATE is utf8mb4_0900_ai_ci.
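The defaults a database ended up with can be read from information_schema. The database name mydb below is a hypothetical placeholder.

```sql
-- Default charset and collation of one database ('mydb' is a placeholder).
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'mydb';

-- Defaults of the database currently selected with USE.
SELECT @@character_set_database, @@collation_database;
```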

COLLATE is set at the table level by adding the relevant clause to CREATE TABLE (table_name is a placeholder), for example:

CREATE TABLE table_name (…) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

If CHARSET and COLLATE are not set at the table level, the table inherits them from the database level.

COLLATE is set at the column level when the column is declared in CREATE TABLE (table_name is a placeholder), for example:

CREATE TABLE table_name (`field1` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '', …) ……

If CHARSET and COLLATE are not set at the column level, the column inherits them from the table level.

Finally, you can also declare COLLATE explicitly in a SQL query to override the COLLATE settings of the database, table, and column. This is rarely used, so it is enough to know that it exists:

SELECT DISTINCT field1 COLLATE utf8mb4_general_ci FROM table1;
SELECT field1, field2 FROM table1 ORDER BY field1 COLLATE utf8mb4_unicode_ci;

If COLLATE is set explicitly at every level, the order of priority is: SQL statement > column level > table level > database level > instance level. That is, a COLLATE specified on a column overrides the one specified on the table, and a COLLATE specified on the table overrides the one at the database level. Where nothing is specified, the setting of the next level up is inherited: a column with no COLLATE of its own uses the COLLATE set on the table.
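The inheritance chain can be verified with a small sketch. The names collate_demo_db and t_inherit are hypothetical; the database, table, and column each use a different collation so that the result shows which level won.

```sql
CREATE DATABASE collate_demo_db DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE collate_demo_db;

CREATE TABLE t_inherit (
    a VARCHAR(16),                      -- no COLLATE: inherits from the table
    b VARCHAR(16) COLLATE utf8mb4_bin   -- explicit COLLATE: overrides the table
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;  -- overrides the database

-- a reports utf8mb4_general_ci, b reports utf8mb4_bin.
SELECT COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'collate_demo_db' AND TABLE_NAME = 't_inherit'
ORDER BY ORDINAL_POSITION;
```

The database-level utf8mb4_unicode_ci appears on neither column, because the table-level setting takes priority over it.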

That covers the essentials of COLLATE in MySQL. In system design, however, try not to let the system depend heavily on the sort order of Chinese-text fields, and try to avoid using Chinese text as a query condition in MySQL queries.
