
What are the MySQL character sets?

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

Today I will talk about MySQL character sets, a topic many people do not know very well. To help you understand it better, I have summarized the following content. I hope you get something out of this article.

1. Character set basics

First, find out where MySQL uses character sets. The relevant server variables can be listed at a glance:

mysql> SHOW VARIABLES WHERE variable_name LIKE '%character%' OR variable_name LIKE '%collation%';
+-------------------------------+--------------------------------------+
| Variable_name                 | Value                                |
+-------------------------------+--------------------------------------+
| character_set_client          | utf8mb4                              |
| character_set_connection      | utf8mb4                              |
| character_set_database        | utf8mb4                              |
| character_set_filesystem      | binary                               |
| character_set_results         | utf8mb4                              |
| character_set_server          | utf8mb4                              |
| character_set_system          | utf8                                 |
| character_sets_dir            | /opt/idc/mysql8.0.23/share/charsets/ |
| collation_connection          | utf8mb4_bin                          |
| collation_database            | utf8mb4_bin                          |
| collation_server              | utf8mb4_bin                          |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci                   |
+-------------------------------+--------------------------------------+

Along the chain shown above (client, connection, database, results), a character set mismatch at any single link can produce garbled text.
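To see why one mismatched link is enough, here is a minimal Python sketch, assuming Python's built-in codecs stand in for the MySQL settings (Python's "latin-1" is close to, but not identical with, MySQL's latin1). The text is written as UTF-8, as utf8mb4 would store it, and read back as if it were Latin-1. The function name `roundtrip` is hypothetical.

```python
def roundtrip(text: str, write_charset: str, read_charset: str) -> str:
    """Encode with one character set and decode with another, like a mismatched link."""
    raw = text.encode(write_charset)
    # errors="replace" keeps the demo from raising if the bytes are invalid in read_charset
    return raw.decode(read_charset, errors="replace")


if __name__ == "__main__":
    print(roundtrip("café", "utf-8", "utf-8"))    # café (settings match)
    print(roundtrip("café", "utf-8", "latin-1"))  # cafÃ© (mismatch: garbled)
```

The two bytes of "é" in UTF-8 (0xC3 0xA9) are read as two separate Latin-1 characters, which is the typical shape of garbled text.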

MySQL 8.0 adds a new general-purpose Unicode collation for the utf8mb4 character set, named utf8mb4_0900_ai_ci, and uses it as the default collation for utf8mb4. Like utf8mb4_unicode_ci, it is based on the Unicode Collation Algorithm, but on a newer version of it.

What each part of the name means:

utf8mb4 denotes the UTF-8 encoding scheme with a maximum of 4 bytes per character.

utf8mb3 is also a UTF-8 encoding of the Unicode character set, but it uses only one to three bytes per character (utf8 is an alias for utf8mb3).

0900 is the version of the Unicode Collation Algorithm (UCA 9.0.0) the collation is based on. The UCA defines how to compare two Unicode strings in conformance with the Unicode standard.

ai means accent-insensitive: e, è, é, ê and ë are treated as equal.

ci means case-insensitive: p and P are treated as equal when sorting.
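The "ai" and "ci" ideas can be approximated in a few lines of Python. This is only a sketch under a stated assumption: stripping combining accents after NFD decomposition and then case-folding is used as a stand-in for accent- and case-insensitivity. It is not how MySQL computes utf8mb4_0900_ai_ci, which uses UCA 9.0.0 weight tables and can differ for some characters. The helper names are hypothetical.

```python
import unicodedata


def ai_ci_key(s: str) -> str:
    """Rough accent- and case-insensitive comparison key; NOT MySQL's UCA weights."""
    # NFD splits "é" into "e" followed by a combining acute accent
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, i.e. the accents
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes "P" and "p" compare equal
    return base.casefold()


def ai_ci_equal(a: str, b: str) -> bool:
    return ai_ci_key(a) == ai_ci_key(b)


if __name__ == "__main__":
    print([ai_ci_equal("e", c) for c in "èéêë"])  # [True, True, True, True]
    print(ai_ci_equal("p", "P"))                  # True
    print(ai_ci_equal("a", "b"))                  # False
```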

Let's look at where utf8 and utf8mb4 differ in practice:

utf8 (utf8mb3) stores at most 3 bytes per character, so emoji and some other characters, which need 4 bytes in UTF-8, cannot be stored in it. The utf8mb4 character set was added to solve this problem.
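A small Python sketch makes the byte limit concrete. It assumes Python's UTF-8 codec produces the same byte sequences MySQL stores for utf8mb4, which holds for valid Unicode text. The function names are hypothetical.

```python
def utf8_byte_length(ch: str) -> int:
    """Bytes one character needs in UTF-8; utf8mb4 can store all of them."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes, i.e. could go into a utf8 (utf8mb3) column."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = ["a", "\u00e9", "\u4e2d", "\U0001F642"]  # a, é, 中, 🙂
    for ch in samples:
        print(f"U+{ord(ch):04X}", utf8_byte_length(ch))  # 1, 2, 3 and 4 bytes
    print(fits_utf8mb3("\u4e2d\u6587"))                  # True  (中文)
    print(fits_utf8mb3("hi \U0001F642"))                 # False (the emoji needs 4 bytes)
```

Any character above U+FFFF needs 4 bytes, which is exactly the range utf8mb3 cannot hold.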

Commonly used utf8mb4 collations:

utf8mb4_bin: compares strings by the binary values of their characters (their code points), so comparisons are case-sensitive and accent-sensitive.

utf8mb4_general_ci: ci means case-insensitive. It does not implement the Unicode Collation Algorithm, so sorting results may be inconsistent for some languages or special characters. In the vast majority of cases, however, the order of these special characters does not need to be that precise.

utf8mb4_unicode_ci: sorts and compares according to the Unicode Collation Algorithm, so it can sort accurately across many languages. To handle special characters correctly, it implements a somewhat more complex sorting algorithm.
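The difference between a binary and a case-insensitive collation shows up directly in sort order. In this Python sketch, assumed stand-ins are used: Python's default string ordering (by code point) models utf8mb4_bin, and `str.casefold` models a case-insensitive collation. Neither reproduces the actual weights of utf8mb4_general_ci or utf8mb4_unicode_ci. The last helper checks that UTF-8 byte order matches code point order, which is why comparing raw bytes and comparing code points give the same result. All names are hypothetical.

```python
def bin_sort(words):
    """Approximates utf8mb4_bin: order by code point, so uppercase sorts before lowercase."""
    return sorted(words)


def ci_sort(words):
    """Rough case-insensitive order; ties keep their input order because sorted() is stable."""
    return sorted(words, key=str.casefold)


def byte_order_matches_code_points(words) -> bool:
    """UTF-8 preserves code point order, so sorting the encoded bytes gives the same result."""
    return sorted(words) == sorted(words, key=lambda w: w.encode("utf-8"))


if __name__ == "__main__":
    words = ["banana", "Apple", "apple", "Banana"]
    print(bin_sort(words))  # ['Apple', 'Banana', 'apple', 'banana']
    print(ci_sort(words))   # ['Apple', 'apple', 'banana', 'Banana']
```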

2. Character set configuration

How to set and change the MySQL character set:

1) Settings in the my.cnf configuration file. It is recommended to set these when the instance is initialized.

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
character-set-client-handshake = FALSE   # ignore the character set the client requests and use the server settings

2) init_connect='SET NAMES utf8mb4'. The server executes this string for each client that connects, but not for accounts with the SUPER or CONNECTION_ADMIN privilege.

3) Changing the character set at the database, table and column level. Each level can have its own character set.

# change the database default
ALTER DATABASE `db1` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# change the table default
ALTER TABLE `t1` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# change a column
ALTER TABLE `t1` MODIFY `name` VARCHAR(80) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci COMMENT 'nickname';

4) Routine character set check:

SELECT b.SCHEMA_NAME, b.DEFAULT_CHARACTER_SET_NAME, b.DEFAULT_COLLATION_NAME,
       a.TABLE_NAME, a.TABLE_COLLATION
FROM information_schema.SCHEMATA b
LEFT JOIN information_schema.TABLES a ON b.SCHEMA_NAME = a.TABLE_SCHEMA
WHERE b.SCHEMA_NAME NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
ORDER BY b.SCHEMA_NAME, a.TABLE_NAME;
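The check query returns one row per table, so a small script can flag tables whose collation differs from their schema's default. This is a minimal Python sketch that assumes the rows have already been fetched (with whatever MySQL client you use) as tuples in the column order of the SELECT above. The function name and the sample rows are hypothetical, not real output.

```python
from typing import Iterable, List, Optional, Tuple

# One row of the check query:
# (SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME, TABLE_NAME, TABLE_COLLATION)
Row = Tuple[str, str, str, Optional[str], Optional[str]]


def mismatched_tables(rows: Iterable[Row]) -> List[str]:
    """Return 'schema.table' for every table whose collation differs from its schema default."""
    result = []
    for schema, _charset, schema_collation, table, table_collation in rows:
        # The LEFT JOIN yields NULL (None) table columns for schemas that have no tables
        if table is None:
            continue
        if table_collation != schema_collation:
            result.append(f"{schema}.{table}")
    return result


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only
    sample = [
        ("db1", "utf8mb4", "utf8mb4_unicode_ci", "t1", "utf8mb4_unicode_ci"),
        ("db1", "utf8mb4", "utf8mb4_unicode_ci", "t2", "utf8mb4_0900_ai_ci"),
        ("db2", "utf8mb4", "utf8mb4_0900_ai_ci", None, None),
    ]
    print(mismatched_tables(sample))  # ['db1.t2']
```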

With these settings, change statements and checks in place, character sets no longer need to be a hidden problem.

3. The influence of character sets on the database

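Character sets have a considerable influence on the whole database. They can be set separately at the database, table and column levels, and a change at one level does not propagate to objects that already exist: changing a database's default does not affect existing tables, and changing a table's default does not convert its existing columns (ALTER TABLE ... CONVERT TO CHARACTER SET does that).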
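The "defaults are copied at creation time" rule can be modeled with a toy Python sketch. The class and method names are hypothetical, and the model ignores collations and everything else about real DDL; it only shows why altering a default leaves existing tables and columns unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Table:
    default_charset: str
    columns: Dict[str, str] = field(default_factory=dict)  # column name -> charset

    def add_column(self, name: str, charset: Optional[str] = None) -> None:
        # A new column copies the table's current default unless a charset is given
        self.columns[name] = charset or self.default_charset


@dataclass
class Database:
    default_charset: str
    tables: Dict[str, Table] = field(default_factory=dict)

    def create_table(self, name: str, charset: Optional[str] = None) -> Table:
        # A new table copies the database's current default unless a charset is given
        table = Table(charset or self.default_charset)
        self.tables[name] = table
        return table


if __name__ == "__main__":
    db = Database("utf8mb3")
    t1 = db.create_table("t1")
    t1.add_column("name")
    db.default_charset = "utf8mb4"  # like ALTER DATABASE ... DEFAULT CHARACTER SET
    print(t1.default_charset)       # utf8mb3: the existing table keeps its default
    t1.default_charset = "utf8mb4"  # like ALTER TABLE ... DEFAULT CHARACTER SET
    print(t1.columns["name"])       # utf8mb3: the existing column is not converted
```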

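FAQ 1: an index is not used because of implicit character set conversion. When a comparison or join mixes columns or values with different character sets or collations, MySQL may have to convert one side, and an index on the converted column can then no longer be used.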

FAQ 2: differences in trailing spaces

Collations based on UCA 9.0.0 (the utf8mb4_0900_* family) handle trailing spaces differently from the other collations when comparing string values (CHAR, VARCHAR and TEXT): trailing spaces are significant, so 'a ' and 'a' compare as different strings rather than the same string.

How each collation treats trailing spaces is recorded in its PAD_ATTRIBUTE:

mysql> SELECT COLLATION_NAME, PAD_ATTRIBUTE FROM INFORMATION_SCHEMA.COLLATIONS WHERE CHARACTER_SET_NAME = 'utf8mb4';
+--------------------------+---------------+
| COLLATION_NAME           | PAD_ATTRIBUTE |
+--------------------------+---------------+
| utf8mb4_general_ci       | PAD SPACE     |
| utf8mb4_bin              | PAD SPACE     |
| utf8mb4_unicode_ci       | PAD SPACE     |
| utf8mb4_icelandic_ci     | PAD SPACE     |
| utf8mb4_latvian_ci       | PAD SPACE     |
| ...                      | ...           |
| utf8mb4_0900_ai_ci       | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci | NO PAD        |
| utf8mb4_0900_bin         | NO PAD        |
+--------------------------+---------------+

Note: if you choose a utf8mb4_0900_* collation (NO PAD), trailing spaces must be handled explicitly.
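The two pad attributes differ only in how trailing spaces count in an equality comparison. This Python sketch models just that aspect and compares all other characters exactly, deliberately ignoring the case and accent rules of the collations involved. The helper names are hypothetical, and trimming before storing is only one possible way to "process the spaces".

```python
def pad_space_equal(a: str, b: str) -> bool:
    """PAD SPACE behaviour (e.g. utf8mb4_general_ci, utf8mb4_bin): trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


def no_pad_equal(a: str, b: str) -> bool:
    """NO PAD behaviour (utf8mb4_0900_* collations): a trailing space is an ordinary character."""
    return a == b


def strip_trailing_spaces(value: str) -> str:
    """One way to handle the spaces yourself: trim them before storing or comparing."""
    return value.rstrip(" ")


if __name__ == "__main__":
    print(pad_space_equal("a ", "a"))                      # True
    print(no_pad_equal("a ", "a"))                         # False
    print(no_pad_equal(strip_trailing_spaces("a "), "a"))  # True
```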

FAQ 3: case sensitivity of data

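Apart from lower_case_table_names, which controls whether database and table names are case-sensitive, the case sensitivity of the data itself comes from the collation. For case-sensitive queries and inserts, use a case-sensitive collation such as utf8mb4_bin. With a _ci collation, WHERE name = 'abc' also matches 'ABC', and a unique index treats 'abc' and 'ABC' as duplicates.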
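The unique-index effect is the case that usually surprises people on insert. Here is a minimal Python simulation, assuming `str.casefold` as a rough stand-in for a _ci collation and exact string equality as a stand-in for utf8mb4_bin (its trailing-space handling is ignored). The function names are hypothetical.

```python
from typing import Callable, Iterable, List


def rejected_by_unique_index(values: Iterable[str], key: Callable[[str], str]) -> List[str]:
    """Insert values into a simulated UNIQUE column; return the ones rejected as duplicates."""
    seen = set()
    rejected = []
    for value in values:
        k = key(value)
        if k in seen:
            rejected.append(value)  # a real server would report a duplicate-entry error here
        else:
            seen.add(k)
    return rejected


def ci_key(value: str) -> str:
    return value.casefold()  # rough stand-in for a case-insensitive (_ci) collation


def bin_key(value: str) -> str:
    return value  # stand-in for utf8mb4_bin: characters compared exactly


if __name__ == "__main__":
    names = ["Alice", "alice", "ALICE", "Bob"]
    print(rejected_by_unique_index(names, ci_key))   # ['alice', 'ALICE']
    print(rejected_by_unique_index(names, bin_key))  # []
```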

FAQ 4: emoji and some newer characters

Common error: 1366 Incorrect string value: '\xF0\x9F\x99\x82' for column 'name' at row 2

Fix: use the utf8mb4 character set (for the column, the table and the connection), together with the character-set-client-handshake setting described above.
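The bytes in the error message are the UTF-8 encoding of a single 4-byte character: '\xF0\x9F\x99\x82' is U+1F642 (🙂). A short Python sketch can locate such characters in a value before it is inserted and print them in the same escaped form. The helper names are hypothetical; the only assumption is that the column or connection uses utf8 (utf8mb3), which cannot store code points above U+FFFF.

```python
from typing import List


def four_byte_chars(text: str) -> List[str]:
    """Characters needing 4 UTF-8 bytes (code points above U+FFFF); utf8mb3 cannot store them."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def mysql_style_escape(ch: str) -> str:
    r"""Format a character's UTF-8 bytes like error 1366 prints them, e.g. \xF0\x9F\x99\x82."""
    return "".join(f"\\x{b:02X}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    value = "nice \U0001F642"  # "nice 🙂"
    for ch in four_byte_chars(value):
        print(mysql_style_escape(ch))  # \xF0\x9F\x99\x82
```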

Note: JDBC (Java) has no character set named utf8mb4; in the connection URL the encoding is given by its Java name, UTF-8 (characterEncoding=UTF-8).

After reading the above, do you have a better understanding of MySQL character sets? If you want to learn more, please follow the industry information channel. Thank you for your support.


© 2024 shulou.com SLNews company. All rights reserved.
