Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Character set in MySQL 5.6

2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

This article introduces the character set in MySQL 5.6.It is basically compiled when I was studying the MySQL 5.6manual.

Introduction

Basic concept

A character set (character set) is a mapping set of encodings and character symbols. Collation (collation) is a set of rules used to compare characters in a character set.

Now let's customize a simple character set character set. Suppose we have an alphabet of only four letters: a, B, a, b. We give each letter a number: a = 0 ~ (th) B = 1 ~ () a = 2 ~ (b) = 3. The letter An is a character symbol, the number 0 is the coding of A, and the mapping of these four letters and their codes is a character set.

Suppose we want to compare the values of two strings: an and B. The easiest way is to look at their codes: a has a code of 0 and B has a code of 1. Because 0 is less than 1, we say An is less than B. All we do is apply the collation to the character set. The collection of these rules (there is only one rule in this example) is the collation collation: compare their encodings. Of course, real-world character sets and collations are much more complex, but the basic idea is that mentioned above.

Repertoire is a collection of characters in a character set. Every string expression has a repertoire attribute, which can have two values:

ASCII: the expression can only contain Unicode-encoded characters in the range of Ubun0000 to Ubun007F.

UNICODE: expressions can contain Unicode-encoded characters in the range of Ubun0000 to U+10FFFF. This includes characters in the range of Basic Multilingual Plane (BMP) (Ubun0000 to U+FFFF) and supplementary characters outside the range of BMP (Utt01000 to U+10FFFF).

The ASCII range is a subset of the UNICODE range, so an ASCII repertoire string can be safely converted to a character set of UNICODE repertoire strings (or any character set that contains an ASCII range) without any problems. From here, we can draw two conclusions:

1. A specific character set has a specific range of repertoire, which limits that tables / fields that use this character set can only use a specific range of characters, and characters out of range will not be supported by that character set. For example, the repertoire range of the ascii character set is ASCII, so if you use this character set, you can only use Unicode to encode characters in the range from 0000F to 007F.

2. The character set of the character subset can be safely converted to the character set of its superset without any problems. This is useful when mixing different character sets, and MySQL can do the automatic conversion. However, the reverse is not possible.

Common character sets in MySQL:

Utf8 character set: a UTF-8-encoded Unicode character set that takes up 1 to 3 bytes per character. Only characters within the BMP range can be covered, including not only English characters, but also Chinese characters.

Utf8mb4 character set: a UTF-8-encoded Unicode character set that takes up 1 to 4 bytes per character. Characters and supplementary characters within the BMP range can be overridden. The character encoding in the BMP range is exactly the same as that in the utf8 character set, and the length is exactly the same, so the utf8mb4 character set is compatible with the utf8 character set.

Character set of metadata

Metadata is "data about data". Anything that describes a database-as opposed to what is in the database-is metadata. Therefore, most of the string results of column names, database names, user names, version names, and SHOW commands are metadata. This is also true for the contents of the tables in the information_schema library, because by definition, those tables contain information about database objects. The presentation of metadata must meet these requirements:

All metadata must use the same character set. Otherwise, SHOW or SELECT commands on tables in the information_schema library will not work properly because different rows of the same column will be in different character sets in the results returned by these operations.

The metadata must contain all characters in all languages. Otherwise, users will not be able to name tables or columns in their own local language.

To meet the above requirements, MySQL stores metadata in the Unicode character set, UTF-8 to be exact. As long as you don't use dialect or non-Latin characters, this won't be a problem. But if you use it, you should be aware that the metadata is the UTF-8 character set.

MySQL sets the system variable character_set_system to the same value as the character set used by the metadata:

Mysql > SHOW VARIABLES LIKE 'character_set_system'

+-+ +

| | Variable_name | Value |

+-+ +

| | character_set_system | utf8 |

+-+ +

Just because Unicode is used for storing metadata does not mean that when the server returns the column name of the result of the DESCRIBE function, it defaults to the character set set by the character_set_system variable. When you use the SELECT column1 FROM t command, the column1 returned by the server to the client, the character set of the column name itself, is determined by the value of the system variable character_set_results, and the default value is latin1:

Mysql > SHOW VARIABLES LIKE 'character_set_results'

+-+ +

| | Variable_name | Value |

+-+ +

| | character_set_results | latin1 |

+-+ +

If you want the server to return metadata to the client using a different character set, use the SET NAMES'character_set' command to force the server to convert the current character set to the specified character_set. This command automatically sets relevant system variables such as character_set_results, which is valid only for the current session. If character_set_results is set to NULL, the server returns metadata without conversion and uses its original character set (that is, the character set indicated by the variable character_set_system).

In addition, the client program can also perform character set conversion after receiving the data returned by the server. Performing character set conversion on the client side is more efficient, but not all clients support this feature.

As can be seen from the above instructions, character set issues affect not only data storage, but also communication between the client and the MySQL server. If you want the client program to communicate with the server using a different character set than the default character set, you need to specify which one. For example, to use the utf8 character set, execute the command:

Mysql > SET NAMES 'utf8'

Of course, because the relevant variables have not been set in the MySQL configuration file, the settings here are not persistent. Takes effect only in the current session.

Character set when storing data

MySQL supports specifying the character set used at the MySQL server (server), database (database), table (table), and column (column) levels. MySQL supports configuring character sets for MyISAM, MEMORY, and InnoDB storage engines.

1. The MySQL server has a server character set and server collation, which are controlled by the variable character_set_server and the variable collation_server respectively. You can set the values of these two options in the MySQL configuration file or in the MySQL startup options, and you can also use the set command to dynamically modify them, with both global values and session values. The default value for character_set_server is the latin1 character set, and the default value for collation_server is the latin1_swedish_ci collation. If you specify a character set instead of a collation, the system automatically sets the collation to the default collation for that character set. For example, if you set the value of character_set_server to the utf8mb4 character set but not the value of collation_server, the value of collation_server will automatically become the utf8mb4_general_ci collation, because utf8mb4_general_ci is the default collation of the utf8mb4 character set.

[root@gw ~] # vim / usr/my.cnf

[mysqld]

Character-set-server=utf8mb4

Collation-server=utf8mb4_general_ci

If you do not specify a database character set and collation when you create a database using the CREATE DATABASE command, then the server-side character set and collation are used as default values, and that is what they are used for. Therefore, it can be said that server-side character sets and collations are possible default values for data (except metadata) in the MySQL database.

2. Each library in MySQL has a database character set and database collation. Both the CREATE DATABASE and ALTER DATABASE statements have options to specify the database character set and collation:

CREATE DATABASE db_name

[[DEFAULT] CHARACTER SET charset_name]

[[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name

[[DEFAULT] CHARACTER SET charset_name]

[[DEFAULT] COLLATE collation_name]

MySQL determines the character set and collation of the database as follows:

If both CHARACTER SETcharset_name and COLLATEcollation_name are specified, then the database character set and collation are the specified charset_name and collation_name.

If only CHARACTER SETcharset_name is specified and no COLLATE is specified, the database character set and collation are the specified charset_name and the default collation for that character set.

If only COLLATEcollation_name is specified and no CHARACTER SET is specified, then the database character set and collation are the character set associated with that collation_name and the specified collation_name.

If neither CHARACTER SETcharset_name nor COLLATEcollation_name is specified, then the database character set and collation use the MySQL server character set and collation (see the previous section).

The default character set and collation of the database can be obtained from the two system variables character_set_database and collation_database. To view the default character set and collation for the specified database, use the command:

Mysql > use db_name

Mysql > SELECT @ @ character_set_database, @ @ collation_database

You can also use the following command:

Mysql > SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME

FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'db_name'

Database character sets and collations affect these aspects of server operation:

For CREATE TABLE statements, if the character set and collation are not explicitly specified when the table is created, the database character set and collation are used as the default character set and collation for the table. To override this behavior, explicitly use the CHARACTER SET and COLLATE options.

For LOAD DATA statements that do not include the CHARACTER SET option, the server uses the character set indicated by the variable character_set_database to parse the information in the file. To override this behavior, explicitly use the CHARACTER SET option.

For stored programs (procedures and functions), if the declaration of the character data parameter (character data parameters) does not use the CHARACTER SET and COLLATE options when the program is created, then the database character set and collation will be used as the character set and collation of the character data parameter. To override this behavior, explicitly use the CHARACTER SET and COLLATE options.

3. Each table has a set of table characters and table collation. Both the CREATE TABLE and ALTER TABLE statements have options to specify the table character set and collation:

CREATE TABLE tbl_name (column_list)

[[DEFAULT] CHARACTER SET charset_name]

[COLLATE collation_name]]

ALTER TABLE tbl_name

[[DEFAULT] CHARACTER SET charset_name]

[COLLATE collation_name]

Example:

CREATE TABLE T1 (...)

CHARACTER SET latin1 COLLATE latin1_danish_ci

MySQL determines the character set and collation of the table as follows:

If both CHARACTER SETcharset_name and COLLATEcollation_name are specified, then the table character set and collation are the specified charset_name and collation_name.

If only CHARACTER SETcharset_name is specified and no COLLATE is specified, the table character set and collation are the specified charset_name and the default collation for that character set.

If only COLLATEcollation_name is specified and no CHARACTER SET is specified, then the table character set and collation are the character set associated with the collation_name and the specified collation_name.

If neither CHARACTER SETcharset_name nor COLLATEcollation_name is specified, then the table character set and collation use the database character set and collation (see the previous section).

The table character set and collation are used as default values in the column definition if the column character set and collation are not specified in a single column definition. Table character sets and collations are extensions of MySQL and are not part of the standard SQL.

4. Each "character" column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and column collation. There are options in both CREATE TABLE and ALTER TABLE statements to specify column character sets and collations:

Col_name {CHAR | VARCHAR | TEXT} (col_length)

[CHARACTER SET charset_name]

[COLLATE collation_name]

There are also options in the columns of ENUM and SET types:

Col_name {ENUM | SET} (val_list)

[CHARACTER SET charset_name]

[COLLATE collation_name]

Example:

CREATE TABLE t1

(

Col1 VARCHAR (5)

CHARACTER SET latin1

COLLATE latin1_german1_ci

);

ALTER TABLE t1 MODIFY

Col1 VARCHAR (5)

CHARACTER SET latin1

COLLATE latin1_swedish_ci

MySQL determines the character set and collation of columns as follows:

If both CHARACTER SETcharset_name and COLLATEcollation_name are specified, then the column character set and collation are the specified charset_name and collation_name.

If only CHARACTER SETcharset_name is specified and no COLLATE is specified, the column character set and collation are the specified charset_name and the default collation for that character set.

If only COLLATEcollation_name is specified and no CHARACTER SET is specified, then the column character set and collation are the character set associated with the collation_name and the specified collation_name.

If neither CHARACTER SETcharset_name nor COLLATEcollation_name is specified, then the column character set and collation use the table character set and collation (see the previous section).

The character set of the literal amount of a string

In addition to what I mentioned earlier, each string literal (string literal) has a character set and collation. For the simple statement SELECT 'string', the string used here has a concatenated default character set and a collation (connection default character set and collation), controlled by the system variables character_set_connection and collation_connection, respectively. A string literal may have an optional character set introducer and COLLATE clause to specify it as a string that uses a specific character set and collation:

[_ charset_name] 'string' [COLLATE collation_name]

Example:

SELECT 'abc'

SELECT _ latin1'abc'

SELECT _ binary'abc'

SELECT _ utf8'abc' COLLATE utf8_danish_ci

The expression _ charset_name is officially called introducer. It tells the parser that the string followed by it uses the character set charset_name. MySQL determines the character set and sorting rules for the literal amount of strings as follows:

If both _ charset_name and COLLATEcollation_name are specified, then the string literal character set and collation are the specified charset_name and collation_name.

If only _ charset_name is specified and no COLLATE is specified, the string literal character set and collation are the specified charset_name and the default collation for that character set.

If only COLLATEcollation_name is specified and no _ charset_name is specified, the string literal character set and collation are the concatenation default character set specified by the system variable character_set_connection and the specified collation_name. Collation_name must be one of the collations to connect to the default character set.

If neither _ charset_name nor COLLATEcollation_name is specified, then the string literal character set and collation use the concatenation default character set and collation specified by the system variables character_set_connection and collation_connection.

Connect related character sets

Each client has a connection-related (connection-related) character set and collation. A "connection" is the connection when you connect to the MySQL server. The client sends SQL statements, such as queries, over a connection to the server. The server returns data, such as query results or error messages, through a connection to the client. This leads to the following problems with character sets and collations related to client connections:

1. What is the character set of the statement when it leaves the client?

The server treats the value of the character_set_client system variable as the character set of the SQL statement sent by the client.

2. After receiving the SQL statement, what character set will the server convert it into?

The server converts the character set of the SQL statement sent by the client from character_set_client to the character set pointed to by character_set_connection (except for the literal amount of strings with introducer). The literal comparison of strings uses the collation that collation_connection points to. Comparisons of strings with column values have nothing to do with collation_connection, because columns have their own collation, which has higher priority.

3. What character set will the server convert SQL query results or error messages to before sending them back to the client?

The server converts the character set of the SQL query result to the character set pointed to by the system variable character_set_results. The data to be converted includes column values and metadata (such as column names and error messages).

There are two commands that uniformly set the connection-related character set:

1. SET NAMES command

The SET NAMES command tells the MySQL server what character set the client uses to interact with it. The value set by the SET NAMES command takes effect only for the current session. Syntax format:

SET NAMES 'charset_name' [COLLATE' collation_name']

A SET NAMES 'charset_name' statement (COLLATE' collation_name' is optional) is actually equivalent to the following three statements:

SET character_set_client = charset_name

SET character_set_results = charset_name

SET character_set_connection = charset_name

/ / this third statement also implies that the variable collation_connection is set to the default collation of the charset_name character set

2. SET CHARACTER SET command

The SET CHARACTER SET command is similar to SET NAMES, but sets character_set_connection and collation_connection to character_set_database and collation_database. Syntax format:

SET CHARACTER SET charset_name

A SET CHARACTER SET charset_name command is actually equivalent to the following three statements:

SET character_set_client = charset_name

SET character_set_results = charset_name

SET collation_connection = @ @ collation_database

/ / this third statement also implies the execution command SET character_set_connection = @ @ character_set_database

For the MySQL client programs mysql, mysqladmin, mysqlcheck, mysqlimport, and mysqlshow, they follow the following rules to determine the character set to use when communicating with the MySQL server:

1. In the absence of other information, they use the default character set (usually latin1) to communicate with the MySQL server. The default character set for this preset can also be specified in the MySQL configuration file with the following options:

[root@gw ~] # vim / usr/my.cnf

[client]

Default-character-set=utf8mb4

2. The program can automatically detect which character set to use, based on operating system settings, such as the value of LANG or LC_ALL environment variables. For example, if the value of the LANG environment variable in the operating system is ru_RU.KOI8-R, it causes these client programs to use the koi8r character set. This will take precedence over point 1.

3. These programs support the use of the option-default-character-set to allow the user to explicitly specify the character set to override the character set automatically determined by the program. This will take precedence over the first and second points.

If you want the server not to perform character set conversion when returning SQL query results or error messages to the client, you can set the character_set_results variable to NULL or binary:

SET character_set_results = NULL

View character set

To display all the character sets supported in MySQL, you can view the information_schema.character_ sets table or use the show character set command, followed by a LIKE or WHERE clause to filter:

Mysql > SHOW CHARACTER SET

Mysql > select * from information_schema.character_sets

A given character set will have at least one collation. Most character sets support multiple collations, one of which is the default. To display all the collations supported by the character set, you can look at the information_ schemas. Collations table or use the show collation command, followed by a LIKE or WHERE clause to filter:

Mysql > SHOW COLLATION

Mysql > select * from information_schema.collations

If you want to use all the system variables related to character sets and collations, you can use the command:

Mysql > SHOW VARIABLES WHERE Variable_name LIKE 'character_set%' OR Variable_name LIKE' collation%'

After execution, you can see the following variables:

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report