A brief understanding of mysql character set and proofreading rules 07/06 Update SLTechnology News&Howtos

A brief understanding of mysql character set and proofreading rules

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

The following mainly brings you a simple understanding of the mysql character set and proofreading rules. I hope these words can bring you practical use, which is also the main purpose of my editor's simple understanding of the mysql character set and proofreading rules. All right, don't talk too much nonsense, let's just read the following.

The default rules for MySQL database name, table name, column name, alias case under Linux are as follows:

1. Database names and table names are strictly case-sensitive.

two。 The aliases of the table are strictly case sensitive.

3. Column names and column aliases are case ignored in all cases.

4. The content of the field (that is, data) is case-insensitive by default.

5. Variable names (functions and stored procedures) are also strictly case-sensitive.

So we can't help but want to modify it, and what on earth is it that controls it? It is the character set and the proofreading rules of the character set.

What are the character set and proofreading rules?

The character set is the coding table of our character parsing. At the bottom of the computer, any character is just the code that can not be parsed directly. The most basic is the ASCII code table, but this table has too few characters, so it is OK to represent English and some daily punctuation marks, but it is obviously not enough to represent so many characters in the world, so there are various character sets to parse all kinds of characters, such as gb2312 in simplified Chinese, big5 in traditional Chinese, the most famous Universal Code utf8, and utf8mb4 that supports emoji expressions.

The access garbled we often talk about is caused by the asymmetry of character coding, which may be between you and the cloud server, between the cloud server and the database, or between the internal code, and so on.

Character set is universal and exists in all kinds of environments in the computer world, database is only one of them, and proofreading rules are for mysql, and the rules are fixed. Some people may feel confused, with the character set, why to proofread rules, easy to understand, there are differences between characters, but what to rely on to reflect the differences (such as sorting and grouping operations), that is the meaning of this proofreading rule. For example, if it is not case-sensitive, An and a have the same meaning, but after strict proofreading rules and case-sensitive, An and an are not the same.

Character set

We can use the following command to see which character sets and proofing rules are supported.

Generally speaking, mysql still supports more character sets by default, but in most cases we still use more utf8. The reason is to see the last column of Maxlen, which represents the maximum number of bytes occupied by a character after using this character set. Big5 (traditional Chinese) and gb2312 (simplified Chinese) account for less, only 2 bytes, but the versatility is not good. Although utf16 is very powerful, it occupies slightly more bytes and occupies 4 bytes, and it may not be used so much, utf8 is better to use, a character occupies 3 bytes.

To change the default character set of mysql, you can add it to the configuration file my.cnf

Character proofreading rules

After talking about the character set, let's take a look at the proofreading rules for characters.

# to show the proofreading rules under the utf8 character set, select mysql > SHOW COLLATION like 'utf8\ _%'. +-- +-+ | Collation | Charset | Id | Default | Compiled | Sortlen | +- +-+ | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | utf8_unicode_ci | utf8 | | | Yes | 8 | utf8_icelandic_ci | utf8 | 193 | Yes | 8 | utf8_unicode_520_ci | utf8 | 214 | Yes | 8 | utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 | utf8_general_mysql500_ci | utf8 | 223 | | Yes | | | 1 | +-- +-+ 27 rows in set (0.00 sec) |

There are many proofreading rules, but we can't use most of them. As you can see, many other rules have a sortlen of 8, which is more commonly used than the first two. It is certain that it will cost more resources.

Each character set has a default proofreading rule. For example, the default proofreading rule for utf8 is utf8_general_ci. And there are proofreading rule naming conventions: they start with their associated character set name, usually include a language name, and end with _ ci (case-insensitive), _ cs (case-sensitive), or _ bin (binary / case-sensitive).

So, by default, mysql's utf8 character set is not case-sensitive to data.

Change the default character proofreading rules for mysql, or add them to the configuration file my.cnf

# to add [mysqld] # Global default character proofreading rules to the [mysqld] sub-key, set collation_server = utf8_bin# to restart as needed, and enter mysql to have a look. (you can change it one by one) mysql > show variables like 'collation_%'. +-- +-+ | Variable_name | Value | +-+-+ | collation_connection | utf8_general_ci | | collation_database | utf8_bin | | collation _ server | utf8_bin | +-- +-+ 3 rows in set (0.00 sec) # of course, you can only change the library level or link level depending on the actual situation It is allowed to be different, but you should know that the risk is that collation_database = utf8_bin# needs to note that the character proofreading rules at the connection level are determined by the client, and changing the configuration file does not take effect. # so if you want to change, you can only execute mysql > set collation_connection = utf8_bin by command.

Be careful

It is important to note that these changes only take effect on subsequent database tables and users, and existing databases are not affected. If you want to change the definition of existing tables, you must change the table structure or rebuild the database.

The method to change is alter. If the method is created by default, you don't need to do anything. If you don't, you have to specify it artificially.

# change the character set and character proofreading rules of the table mysql > ALTER TABLE table name MODIFY COLUMN field name varchar (50) CHARACTER SET utf8 COLLATE utf8_bin

It should be noted that changing the character proofreading rules of the tables in the existing database does not affect the use of the table, as long as there is no conflict in the data of the table, but part of the query and sorting / grouping operations will cause a gap. However, if the character set is changed randomly, it will directly cause data garbled, so be very careful, sometimes it is not as practical as rebuilding the database.

Of course, if you do need it, you can also specify what character encoding and proofreading rules libraries and tables should use, such as suddenly making a library or table that is specified as utf8mb4.

# specify character encoding and proofreading rules when creating a database. The fields of new tables in this database will follow these two rules by default: mysql > CREATE DATABASE database name DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;# create a table and specify the use of character set and proofreading rules. If other tables are not specified, follow the default configuration of the database: mysql > CREATE TABLE table name (field name varchar (5)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci. # create a table that specifies that different fields have different character sets and proofreading rules. If other fields are not specified, the default configuration of the database is CREATE TABLE `table name` (`field 1` varchar (255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL, `field 2` varchar (255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL)

It is also listed at the beginning that mysql's default database name and table name are strictly case-sensitive. If only database name and table name are case-sensitive, my.cnf has a special configuration option. After the configuration is completed, restart will ignore the case rules of database name and table name. Both will take effect together. Lower_case_table_names

# to add [mysqld] # case-sensitive option for table names to the [mysqld] subkey. 0: case-sensitive, 1: case-insensitive lower_case_table_names = restart takes effect

For the above simple understanding of the mysql character set and proofreading rules, we do not find it very helpful. If you need to know more, please continue to follow our industry information. I'm sure you'll like it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.