What is the essential difference between utf8_general_ci and utf8_unicode_ci in Mysql 07/06 Update SLTechnology News&Howtos

What is the essential difference between utf8_general_ci and utf8_unicode_ci in Mysql

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

The following brings you about the essential difference between utf8_general_ci and utf8_unicode_ci in Mysql. If you are interested, let's take a look at this article. I believe it will be helpful for you to read the essential difference between utf8_general_ci and utf8_unicode_ci in Mysql.

It took me so long to find that I didn't know the difference between utf_bin and utf_general_ci.

Ci is case insensitive, that is, "case insensitive". An and A will be treated as the same in character judgment.

Bin is binary, so an and A will be treated differently.

For example, you run:

SELECT * FROM table WHERE txt ='a'

Then you can't find the line txt ='A'in utf8_bin, but utf8_general_ci can.

Utf8_general_ci is case-insensitive, which you need to use when registering a user name and email.

Utf8_general_cs is case-sensitive. If you use this for your user name and mailbox, it will lead to adverse consequences.

Utf8_bin: string each string is compiled and stored with binary data. Case-sensitive and can store binary content

I. official documentation states

Here is an excerpt of the instructions on utf8_unicode_ci and utf8_general_ci in the Chinese manual of Mysql 5.1:

At present, utf8_unicode_ci proofreading rules only partially support Unicode proofreading rules algorithm. Some characters are still not supported. Also, combined notation is not fully supported. This mainly affects some minority languages in Vietnam and Russia, such as Udmurt, Tatar, Bashkir and Mari.

The main feature of utf8_unicode_ci is its support for extensions, that is, when one letter is treated as equal to other letter combinations. For example, in German and some other languages,'²'equals' ss'.

Utf8_general_ci is a legacy proofreading rule and does not support extensions. It can only be compared one by one between characters. This means that utf8_general_ci proofreading rules are compared quickly, but are less correct than proofreading rules that use utf8_unicode_ci.

For example, using utf8_general_ci and utf8_unicode_ci proofreading rules, the following comparisons are equal:

= A

Boro = O

Ü D = U

The difference between the two proofreading rules is that the following equation holds for utf8_general_ci:

é n = s

However, the following equation holds for utf8_unicode_ci:

ü ber = ss

For a language, the utf8 character set proofreading rules related to the specific language are implemented only if the utf8_unicode_ci sorting is not done well. For example, for German and French, utf8_unicode_ci works well, so you no longer need to create special utf8 proofing rules for both languages.

Utf8_general_ci also works with German and French, except that the word''equals' ss' 'rather than' German'. If your application can accept this, then you should use utf8_general_ci because it is fast. Otherwise, use utf8_unicode_ci because it is more accurate.

If you want to use gb2312 encoding, it is recommended that you use latin1 as the default character set of the datasheet, so that you can insert data in Chinese directly into the command line tool and display it directly. Instead of using character sets such as gb2312 or gbk, if you are worried about query sorting, you can use binary attribute constraints, such as:

Create table my_table (name varchar (20) binary not null default'') type=myisam default charset latin1

Second, a brief summary

There is no real difference between utf8_unicode_ci and utf8_general_ci in Chinese and English.

The proofreading speed of utf8_general_ci is fast, but the accuracy is slightly lower.

The accuracy of utf8_unicode_ci is high, but the proofreading speed is slow.

If your application is in German, French or Russian, be sure to use utf8_unicode_ci. Generally speaking, it is enough to use utf8_general_ci, but no problem has been found up to now.

III. Detailed summary

1. For a language, the utf8 character set proofreading rules related to the specific language will be executed only if the utf8_unicode_ci sorting is not done well. For example, for German and French, utf8_unicode_ci works well, so you no longer need to create special utf8 proofing rules for both languages.

2. Utf8_general_ci also applies to German and French, except for'?' Equals' ss'', not 'outside'. If your application can accept this, then you should use utf8_general_ci because it is fast. Otherwise, use utf8_unicode_ci because it is more accurate.

An overview of the above paragraph in one sentence: utf8_unicode_ci is more accurate and utf8_general_ci is faster. Normally, utf8_general_ci is accurate enough for us to use. After I have seen a lot of program source code, I find that most of them also use utf8_general_ci, so utf8_general_ci is generally used when building a new database.

How to use UTF8 in MySQL5.0

Add the following parameters to my.cnf

[mysqld] init_connect='SET NAMES utf8'default-character-set=utf8default-collation = utf8_general_ci

The execution of the query mysql > show variables; is as follows:

In my opinion, utf8-general is accurate enough for the use of database, and it has an advantage over utf8-unicode in speed, so we can rest assured to adopt it.

Annex 1: methods for upgrading old data

Take the original character set as latin1 as an example, upgrade to utf8 character set. The original table: old_table (default charset=latin1), the new table: new_table (default charset=utf8).

Step 1: export old data

Mysqldump-default-character-set=latin1-hlocalhost-uroot-B my_db-- tables old_table > old.sql

Step 2: transcoding (similar to unix/linux environment)

Iconv-t utf-8-f gb2312-c old.sql > new.sql

Or you can remove the-f parameter and let iconv automatically determine the original character set.

Iconv-t utf-8-c old.sql > new.sql

Here, it is assumed that the original data is gb2312 encoded by default.

Step 3: import

Modify the old.sql by adding a sql statement: "SET NAMES utf8;" before the insert / update statement starts. Save.

Mysql-hlocalhost-uroot my_db < new.sql

It's done!

Attachment 2: MySQL clients that support viewing the utf8 character set are

1.) MySQL-Front, it is said that this project has been stopped by MySQL AB. For some reason, if there are still many cracked versions available for download in China (it does not mean that I recommend using the cracked version:-P).

2.) Navicat, another very good MySQL client, the Chinese version has just come out, and I have been invited to try it. Generally speaking, it is good, but I also have to pay for it.

3.) PhpMyAdmin, the open source php project, is very good.

4.) The terminal tool (Linux terminal) under Linux sets the character set of the terminal to utf8. After connecting to MySQL, SET NAMES UTF8; can also read and write utf8 data.

Read the details above about what is the essential difference between utf8_general_ci and utf8_unicode_ci in Mysql, and whether you have gained anything. If you want to know more about it, you can continue to follow our industry information section.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.