Shulou (Shulou.com), SLTechnology News&Howtos > Database. Updated 2025-01-28.
This article explains why MySQL character set settings can produce garbled text (mojibake). It is offered as a reference for interested readers, and I hope you find it useful.
Basic concepts
A character is the smallest meaningful symbol in a written language, for example 'A' or 'B'.
Given a set of characters, assigning each character a numeric value that represents it gives the encoding of that character. For example, if we assign 0 to the character 'A' and 1 to the character 'B', then 0 is the encoding of 'A'.
Given a set of characters and their encodings, the collection of all character/encoding pairs is a character set. For example, for the characters {'A', 'B'}, the pairs {'A' => 0, 'B' => 1} form a character set.
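To make the two definitions concrete, here is a small Python illustration. Python is used only as a convenient way to show encodings, and GBK is an extra example that the article does not mention. The toy mapping is the one from the text. The real encodings show that the same character can map to different bytes in different character sets.

```python
# The toy character set from the text: each character paired with its encoding
toy_charset = {"A": 0, "B": 1}
print(toy_charset["A"])  # 0

# Real character sets may agree on some characters and differ on others
print("A".encode("utf-8"), "A".encode("gbk"))    # b'A' b'A'
print("中".encode("utf-8"), "中".encode("gbk"))  # b'\xe4\xb8\xad' b'\xd6\xd0'
```

This is why bytes written in one character set and read back as another produce garbled text.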
A collation is the set of rules for comparing characters within a character set.
Once a collation is chosen, it defines which characters in the character set are equivalent and how characters are ordered relative to each other.
Each collation belongs to exactly one character set, but a character set can have several collations, one of which is its default collation.
MySQL collation names follow a naming convention: they begin with the name of their character set and end with _ci (case-insensitive), _cs (case-sensitive) or _bin (binary, comparing encoded values). For example, under the collation utf8_general_ci, the characters "a" and "A" are equivalent.
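The naming convention can be illustrated with a toy model. The helper collation_equal below is hypothetical and looks only at the name suffix. Real MySQL collations also handle accents, trailing spaces and ordering rules, none of which this sketch models.

```python
def collation_equal(a: str, b: str, collation: str, charset: str = "utf-8") -> bool:
    """Toy comparison driven only by the collation name's suffix (hypothetical)."""
    if collation.endswith("_ci"):
        # Case-insensitive: fold case before comparing
        return a.casefold() == b.casefold()
    if collation.endswith("_cs"):
        # Case-sensitive: the characters must match exactly
        return a == b
    if collation.endswith("_bin"):
        # Binary: compare the encoded byte values
        return a.encode(charset) == b.encode(charset)
    raise ValueError(f"unrecognized collation suffix: {collation}")

print(collation_equal("a", "A", "utf8_general_ci"))    # True
print(collation_equal("a", "A", "latin1_general_cs"))  # False
print(collation_equal("a", "A", "utf8_bin"))           # False
```

In this sketch _cs and _bin agree on equality. In MySQL they mainly differ in how distinct characters are ordered, which the toy model does not capture.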
MySQL character set settings
System variables:
- character_set_server: the server's default character set, used when no more specific default applies
- character_set_client: the character set in which the client sends statements
- character_set_connection: the character set of the connection layer; string literals without an introducer are interpreted in it
- character_set_results: the character set in which query results are returned to the client
- character_set_database: the default character set of the currently selected database
- character_set_system: the character set used for system metadata such as column names
- There are also collation_ variables (collation_server, collation_connection, collation_database) that specify the corresponding collations.
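One practical consequence of this list is that the statement SET NAMES x sets character_set_client, character_set_connection and character_set_results to x in one step; it also sets collation_connection. The sketch below models only those three session variables as a Python dict. The starting value of latin1 is an assumption that matches the default described later in this article; not every server uses it.

```python
SESSION_VARS = ("character_set_client", "character_set_connection", "character_set_results")

def new_session(default: str = "latin1") -> dict:
    # Hypothetical starting state: all three variables at the same default
    return {var: default for var in SESSION_VARS}

def set_names(session: dict, charset: str) -> None:
    # Model of SET NAMES: one statement updates all three variables
    # (collation_connection is omitted from this sketch)
    for var in SESSION_VARS:
        session[var] = charset

session = new_session()
set_names(session, "utf8")
print(session)
```

character_set_database and character_set_system are not changed by SET NAMES, which is why they are left out of the model.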
Using an introducer to specify the character set of a string literal:
- Format: [_charset_name] 'string' [COLLATE collation_name]
- For example:
SELECT _latin1 'string';
SELECT _utf8 'Hello' COLLATE utf8_general_ci;
- A literal with an introducer is exempt from the character_set_client to character_set_connection conversion. The introducer alone determines its character set, so no redundant transcoding happens while the request is processed.
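To see why the introducer matters, consider what happens when the same bytes are labeled with different character sets. The Python sketch below decodes the two bytes C3 A9 under each label. It assumes Python's latin-1 codec as a stand-in for MySQL's latin1, which is really cp1252; the two agree on these bytes. INTRODUCER_CODECS and interpret are hypothetical names. In SQL, the analogous comparison would be CHAR_LENGTH(_utf8 X'C3A9') against CHAR_LENGTH(_latin1 X'C3A9'), which we would expect to report 1 and 2.

```python
# Illustrative subset mapping MySQL introducers to Python codec names
INTRODUCER_CODECS = {"_utf8": "utf-8", "_latin1": "latin-1"}

def interpret(literal_bytes: bytes, introducer: str) -> str:
    # The introducer only labels the bytes; nothing is transcoded
    return literal_bytes.decode(INTRODUCER_CODECS[introducer])

raw = b"\xc3\xa9"  # hex C3A9, the UTF-8 bytes of 'é'
as_utf8 = interpret(raw, "_utf8")
as_latin1 = interpret(raw, "_latin1")
print(as_utf8, len(as_utf8))      # é 1
print(as_latin1, len(as_latin1))  # Ã© 2
```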
Character set conversion process in MySQL
1. When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection.
2. Before performing the operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
- the CHARACTER SET setting of the column involved;
- if that is not set, the DEFAULT CHARACTER SET of the table (a MySQL extension, not standard SQL);
- if that is not set, the DEFAULT CHARACTER SET of the database;
- if that is not set, character_set_server.
3. It converts the result of the operation from the internal operation character set to character_set_results.
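The three steps can be sketched as a small Python model. The function names are hypothetical, and only utf8 and latin1 are modeled, with Python's latin-1 codec approximating MySQL's latin1. Replacing unconvertible characters with '?' is also a simplification of what the server actually does.

```python
from typing import Optional

# MySQL charset names mapped to Python codecs (latin-1 approximates MySQL latin1)
CODECS = {"utf8": "utf-8", "latin1": "latin-1"}

def resolve_internal_charset(column: Optional[str], table: Optional[str],
                             database: Optional[str], server: str) -> str:
    # Step 2's fallback chain: column, table default, database default,
    # then character_set_server
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server

def convert(data: bytes, src: str, dst: str) -> bytes:
    # Reinterpret bytes from src as text and re-encode them in dst;
    # characters dst cannot hold become '?'
    if src == dst:
        return data
    return data.decode(CODECS[src]).encode(CODECS[dst], errors="replace")

def write_path(request: bytes, client: str, connection: str, internal: str) -> bytes:
    # Steps 1 and 2: character_set_client -> character_set_connection -> internal
    return convert(convert(request, client, connection), connection, internal)

def read_path(stored: bytes, internal: str, results: str) -> bytes:
    # Step 3: internal charset -> character_set_results
    return convert(stored, internal, results)

internal = resolve_internal_charset(None, None, "utf8", "latin1")
print(internal)  # utf8
stored = write_path("中".encode("utf-8"), "utf8", "utf8", internal)
print(read_path(stored, internal, "utf8").decode("utf-8"))  # 中
```

When every variable matches the actual encoding of the data, each conversion is a no-op and the text survives the round trip. The problems below come from mismatches along this path.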
Analysis of common problems
Inserting utf8-encoded data into a table whose default character set is utf8 without first setting the connection character set, then setting the connection character set to utf8 before querying:
- Under the MySQL server's default settings, character_set_client, character_set_connection and character_set_results are all latin1 at insert time.
- The inserted data therefore goes through the conversion latin1 => latin1 => utf8. Each byte of the original data is treated as a separate latin1 character and re-encoded as two or more UTF-8 bytes, so every 3-byte Chinese character grows to 6 bytes or more.
- At query time the data goes through utf8 => utf8, so the stored bytes are returned unchanged and the client displays garbled text.
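This first scenario can be reproduced directly with Python codecs. The sketch assumes Python's latin-1 codec as a stand-in for MySQL's latin1, which is really cp1252; the two agree on the bytes of '中' used here. The last step is an illustration rather than a recommended repair procedure. It shows that the double-encoded bytes still carry the original information, unlike the next scenario.

```python
original = "中"
sent = original.encode("utf-8")  # 3 bytes from the application

# Insert: the server treats the bytes as latin1 (client = connection = latin1),
# then converts latin1 => utf8 for the utf8 column
stored = sent.decode("latin-1").encode("utf-8")

# Query: with SET NAMES utf8, utf8 => utf8 is a no-op, so the stored bytes come back as-is
shown = stored.decode("utf-8")

print(len(sent), len(stored))  # 3 6
print(stored.hex().upper())    # C3A4C2B8C2AD
print(shown == original)       # False: the client displays mojibake

# Undoing the bogus latin1 => utf8 step recovers the text
repaired = shown.encode("latin-1").decode("utf-8")
print(repaired == original)    # True
```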
Setting the connection character set to utf8 before inserting utf8-encoded data into a table whose default character set is latin1:
- At insert time, character_set_client, character_set_connection and character_set_results are all utf8, following the connection setting.
- The inserted data goes through the conversion utf8 => utf8 => latin1. Any character that latin1 cannot represent (roughly, anything outside U+0000 to U+00FF, such as Chinese characters) is replaced with the "?" symbol (0x3F). However the connection character set is set afterwards, the original content cannot be restored. (With strict SQL mode enabled, the server rejects such an insert with an "Incorrect string value" error instead of storing "?".)
Some ways to detect character set problems
SHOW CHARACTER SET
SHOW COLLATION
SHOW VARIABLES LIKE 'character%'
SHOW VARIABLES LIKE 'collation%'
SQL functions HEX, LENGTH, CHAR_LENGTH
SQL functions CHARSET, COLLATION
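HEX, LENGTH and CHAR_LENGTH are often enough to tell correctly stored text from double-encoded text. The Python function below is a hypothetical helper that only imitates what those SQL functions would report for a utf8 column. As a heuristic, a single Chinese character that shows a CHAR_LENGTH of 3 and hex starting with C3 or C2 hints at the latin1 => utf8 double encoding described above.

```python
def describe(stored: bytes, charset: str = "utf-8") -> dict:
    # Python stand-ins for HEX() (hex digits), LENGTH() (bytes) and CHAR_LENGTH() (characters)
    return {
        "HEX": stored.hex().upper(),
        "LENGTH": len(stored),
        "CHAR_LENGTH": len(stored.decode(charset)),
    }

good = "中".encode("utf-8")                                      # correctly stored
double = "中".encode("utf-8").decode("latin-1").encode("utf-8")  # double-encoded
print(describe(good))    # {'HEX': 'E4B8AD', 'LENGTH': 3, 'CHAR_LENGTH': 1}
print(describe(double))  # {'HEX': 'C3A4C2B8C2AD', 'LENGTH': 6, 'CHAR_LENGTH': 3}
```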
Recommendations for using MySQL character sets
Explicitly specify the character set when creating databases and tables and when performing database operations, rather than relying on MySQL's defaults. Otherwise a MySQL upgrade that changes those defaults can cause serious trouble; MySQL 8.0, for instance, changed the default server character set from latin1 to utf8mb4.
Using latin1 for both the database and the connection character set avoids garbled text in most cases. The drawback is that SQL can no longer operate on a per-character basis, because functions such as CHAR_LENGTH and SUBSTRING see individual bytes rather than characters. In general, setting both the database and the connection character set to utf8 is the better choice. On MySQL 5.5.3 and later, utf8mb4 is preferable, because MySQL's utf8 uses at most 3 bytes per character and cannot store characters such as emoji.
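The per-character problem with "latin1 everywhere" can be shown in a few lines. With latin1 on both sides no conversion happens, so the UTF-8 bytes are stored as-is and every byte counts as one latin1 character. Python's latin-1 codec stands in for MySQL's latin1 here, and the SUBSTRING comparison is an analogy rather than a call into MySQL.

```python
text = "中文"
# latin1 everywhere: the UTF-8 bytes pass through unconverted, one latin1 "character" per byte
as_latin1 = text.encode("utf-8").decode("latin-1")
print(len(text), len(as_latin1))  # 2 6 (real characters vs. what CHAR_LENGTH reports)
first = as_latin1[:1]             # like SUBSTRING(col, 1, 1)
print(first.encode("latin-1"))    # b'\xe4': only the first byte of '中'
```

Reading the data back is fine because the bytes are untouched. Any operation that counts, slices or compares "characters", however, works on fragments of UTF-8 sequences.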
When using the MySQL C API, call mysql_options() to set MYSQL_SET_CHARSET_NAME to utf8 immediately after initializing the connection handle. You then do not need an explicit SET NAMES statement. In addition, when mysql_ping() reconnects a dropped persistent connection (with automatic reconnection enabled), the connection character set is restored to utf8.
With the MySQL PHP API, page-level PHP scripts run only briefly, so you can simply issue a SET NAMES statement right after connecting. With persistent connections, however, make sure the connection stays usable, and explicitly re-issue SET NAMES after any disconnect and reconnect.
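The "re-issue SET NAMES after reconnecting" advice can be captured in a small wrapper. This is a hypothetical sketch and does not use any real driver's API. connect_fn and the execute() method it returns are assumptions, and FakeConn merely records statements so the behavior can be observed. A real driver would report lost connections in its own way; mark_disconnected() only simulates that event.

```python
from typing import Any, Callable, List, Optional

class CharsetConnection:
    """Hypothetical wrapper that re-applies SET NAMES on every (re)connect."""

    def __init__(self, connect_fn: Callable[[], Any], charset: str = "utf8"):
        self._connect_fn = connect_fn  # assumed factory returning an object with execute()
        self._charset = charset
        self._conn: Optional[Any] = None

    def _ensure_connected(self) -> None:
        if self._conn is None:
            self._conn = self._connect_fn()
            # A new session starts from default character sets, so reset them first
            self._conn.execute(f"SET NAMES {self._charset}")

    def execute(self, sql: str) -> Any:
        self._ensure_connected()
        return self._conn.execute(sql)

    def mark_disconnected(self) -> None:
        # Stand-in for the driver reporting a lost connection
        self._conn = None


class FakeConn:
    """Records statements instead of talking to a server (demonstration only)."""

    def __init__(self, log: List[str]):
        self.log = log

    def execute(self, sql: str) -> None:
        self.log.append(sql)


log: List[str] = []
db = CharsetConnection(lambda: FakeConn(log))
db.execute("SELECT 1")
db.mark_disconnected()
db.execute("SELECT 2")
print(log)  # ['SET NAMES utf8', 'SELECT 1', 'SET NAMES utf8', 'SELECT 2']
```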
Other considerations
The default_character_set setting in my.cnf only affects the connection character set when the mysql command-line client connects to the server. It has no effect on applications that use the libmysqlclient library unless they explicitly read the option file!
SQL functions applied to columns usually operate in the internal operation character set and are not affected by the connection character set.
Bare string literals in SQL statements, on the other hand, are affected by the connection character set or by an introducer, so operations such as comparisons can produce completely different results. Be careful!
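A sketch of how a bare literal in a WHERE clause can stop matching. It rests on three assumptions: the column already holds the correct text 'é', the application always sends UTF-8 bytes, and Python's latin-1 codec stands in for MySQL's latin1. The helper literal_as_seen is hypothetical.

```python
column_value = "é"                   # correctly stored value
literal_bytes = "é".encode("utf-8")  # what WHERE col = 'é' puts on the wire

def literal_as_seen(connection_charset: str) -> str:
    # The server interprets the literal's bytes in the connection character set
    codecs = {"utf8": "utf-8", "latin1": "latin-1"}
    return literal_bytes.decode(codecs[connection_charset])

for cs in ("utf8", "latin1"):
    print(cs, literal_as_seen(cs) == column_value)
# utf8 True
# latin1 False
```

The stored data is identical in both runs. Only the connection setting changes, and that alone decides whether the comparison matches.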
© 2024 shulou.com SLNews company. All rights reserved.