Shulou (Shulou.com), updated 2025-01-28
Basic concepts
A character is the smallest semantic symbol in human language, for example 'A', 'B', etc.
Given a series of characters, each character is assigned a numeric value that represents it; that value is the encoding of the character. For example, if we assign 0 to the character 'A' and 1 to the character 'B', then 0 is the encoding of the character 'A'.
Given a series of characters and their encodings, the set of all these character and encoding pairs is a character set. For example, if the list of characters is {'A', 'B'}, then {'A' => 0, 'B' => 1} is a character set.
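As a rough illustration of these definitions, the toy character set from the example can be modeled as a mapping from characters to code values. This is only a sketch in Python; the names TOY_CHARSET, encode, and decode are made up for illustration and do not reflect how MySQL stores character sets internally.

```python
# Toy character set from the example: each character paired with its encoding.
TOY_CHARSET = {"A": 0, "B": 1}

def encode(text, charset=TOY_CHARSET):
    """Map each character to its code value (raises KeyError outside the set)."""
    return [charset[ch] for ch in text]

def decode(codes, charset=TOY_CHARSET):
    """Invert the mapping to turn code values back into characters."""
    reverse = {code: ch for ch, code in charset.items()}
    return "".join(reverse[code] for code in codes)

print(encode("ABBA"))   # [0, 1, 1, 0]
print(decode([1, 0]))   # BA
```

A character that is not in the mapping simply cannot be encoded, which is the root of the data loss described later for latin1 tables.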
A collation is the set of comparison rules between characters in the same character set.
Once the collation is determined, it defines which characters of a character set are equivalent and how characters are ordered relative to each other.
Each collation corresponds to exactly one character set, but a character set can have multiple collations, one of which is the default collation.
Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to, and end with _ci (case insensitive), _cs (case sensitive), or _bin (comparison by encoded value). For example, under the collation "utf8_general_ci", the characters "a" and "A" are equivalent.
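The difference between a _ci and a _bin collation can be sketched as two comparison rules over the same characters. This is a simplified Python analogy, not MySQL's implementation: real collations also define ordering weights and accent handling, and the function names equal_ci and equal_bin are hypothetical.

```python
def equal_ci(a, b):
    """Case-insensitive comparison, loosely analogous to a _ci collation."""
    return a.casefold() == b.casefold()

def equal_bin(a, b, encoding="utf-8"):
    """Comparison by encoded bytes, loosely analogous to a _bin collation."""
    return a.encode(encoding) == b.encode(encoding)

print(equal_ci("a", "A"))    # True
print(equal_bin("a", "A"))   # False
```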
MySQL character set settings
System variables:
- character_set_server: default internal operation character set
- character_set_client: character set of the data sent by the client
- character_set_connection: connection-layer character set
- character_set_results: character set of query results
- character_set_database: default character set of the currently selected database
- character_set_system: character set of system metadata (column names, etc.)
- There are also corresponding variables starting with collation_ that describe the collation.
Use an introducer to specify the character set of a string literal:
- Format: [_charset] 'string' [COLLATE collation]
- For example:
SELECT _latin1 'string';
SELECT _utf8 'Hello' COLLATE utf8_general_ci;
- A string literal marked with an introducer is treated directly as being in the stated character set, so it avoids the redundant transcoding that otherwise happens while the request is processed.
Character set conversion process in MySQL
1. When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection.
2. Before the internal operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
- Use the CHARACTER SET value of each column.
- If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not part of the SQL standard).
- If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding database.
- If that value does not exist, use the value of character_set_server.
3. It converts the result of the operation from the internal operation character set to character_set_results.
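The fallback order in step 2 and the overall conversion path can be sketched as follows. This is a simplified Python model of the steps described above, not MySQL code; the function names and the default of "latin1" for the server are assumptions chosen to match the scenario in this article.

```python
def resolve_internal_charset(column=None, table=None, database=None, server="latin1"):
    """Return the first character set that is set: column, table, database, then server."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server

def conversion_path(client, connection, results, **levels):
    """Return the character sets the request and the result pass through."""
    internal = resolve_internal_charset(**levels)
    request_path = [client, connection, internal]
    result_path = [internal, results]
    return request_path, result_path

# Nothing is set on the column, table, or database, so the server default applies.
request_path, result_path = conversion_path("utf8", "utf8", "utf8")
print(" => ".join(request_path))   # utf8 => utf8 => latin1
print(" => ".join(result_path))    # latin1 => utf8
```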
Now let's go back and analyze the garbled-text problem we ran into:
a. Our columns do not specify a character set, so the character set of the table is used.
b. Our table does not specify a character set, so it defaults to the character set of the database.
c. Our database did not specify a character set when it was created, so the value of character_set_server is used.
d. We did not deliberately change character_set_server, so the MySQL default is used.
e. The MySQL default character set is latin1 (in versions before MySQL 8.0), so the data is stored as latin1, while our character_set_connection is UTF-8, so inserted Chinese text inevitably becomes garbled.
Analysis of common problems
FAQ-1: The connection character set is not set before inserting utf8-encoded data into a table whose default character set is utf8, but it is set to utf8 when querying.
- With the default settings of the MySQL server, character_set_client, character_set_connection, and character_set_results are all latin1 at insert time.
- The inserted data goes through the conversion latin1 => latin1 => utf8. In this process, each inserted Chinese character grows from 3 bytes to 6 bytes.
- The query result goes through the conversion utf8 => utf8, so the stored 6 bytes are returned unchanged and appear garbled.
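The growth from 3 bytes to 6 bytes in FAQ-1 can be reproduced outside MySQL. The sketch below uses Python's latin-1 codec as a stand-in for MySQL's latin1, which is an assumption of this example; the variable names are illustrative.

```python
original = "中"                       # one Chinese character
sent = original.encode("utf-8")       # bytes the client actually sends

# The server believes the bytes are latin1, so each byte becomes one character.
misread = sent.decode("latin-1")
# Storing those characters in a utf8 column encodes each of them again.
stored = misread.encode("utf-8")

print(len(sent), len(stored))         # 3 6

# A utf8 connection later returns the stored bytes unchanged, so the client sees:
returned = stored.decode("utf-8")
print(returned == original)           # False

# In this model the mistake is reversible by undoing the extra encoding step.
recovered = returned.encode("latin-1").decode("utf-8")
print(recovered == original)          # True
```

In this model no information is lost; the bytes are only wrapped in one encoding layer too many, which is why the text looks garbled but can still be repaired.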
FAQ-2: The connection character set is set to utf8 before inserting utf8-encoded data into a table whose default character set is latin1 (this is the error we encountered).
- Following the connection character set setting, character_set_client, character_set_connection, and character_set_results are all utf8 at insert time.
- The inserted data goes through the conversion utf8 => utf8 => latin1. Any Unicode character in the original data outside the range \u0000 to \u00ff cannot be represented in latin1 and is converted to "?" (0x3F).
- The lost content cannot be recovered by later queries, whatever the connection character set is set to.
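The lossy conversion in FAQ-2 can be imitated in Python by encoding to latin-1 with replacement of unrepresentable characters. This is an analogy under the assumption that Python's latin-1 codec behaves like MySQL's latin1 for this purpose; it is not a trace of what the server does internally.

```python
original = "中文 ok"

# Characters outside \u0000 to \u00ff have no latin1 encoding and become "?".
stored = original.encode("latin-1", errors="replace")

print(stored)                  # b'?? ok'
print(hex(stored[0]))          # 0x3f

# Whatever character set is used to read the bytes back, only "?" remains.
print(stored.decode("latin-1"))   # ?? ok
```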
Some means of detecting character set problems
SHOW CHARACTER SET
SHOW COLLATION
SHOW VARIABLES LIKE 'character%'
SHOW VARIABLES LIKE 'collation%'
SQL functions HEX, LENGTH, CHAR_LENGTH
SQL functions CHARSET, COLLATION
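HEX, LENGTH, and CHAR_LENGTH are useful together because LENGTH counts bytes while CHAR_LENGTH counts characters. The Python helper below is a hypothetical analogue (the name inspect_text is made up) that shows what to look for: a correctly stored Chinese character is 3 bytes and 1 character in utf8, while the double-encoded form from FAQ-1 is 6 bytes and 3 characters.

```python
def inspect_text(text, encoding="utf-8"):
    """Rough analogue of SELECT HEX(col), LENGTH(col), CHAR_LENGTH(col)."""
    raw = text.encode(encoding)
    return {"hex": raw.hex().upper(), "length": len(raw), "char_length": len(text)}

good = "中"
# Double-encoded form: utf8 bytes misread as latin1 characters.
bad = good.encode("utf-8").decode("latin-1")

print(inspect_text(good))   # {'hex': 'E4B8AD', 'length': 3, 'char_length': 1}
print(inspect_text(bad))    # {'hex': 'C3A4C2B8C2AD', 'length': 6, 'char_length': 3}
```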
Recommendations when using the MySQL character set
Explicitly state the character set when creating databases and tables and when performing database operations, rather than relying on MySQL's default settings; otherwise a MySQL upgrade may cause serious trouble.
Using latin1 for both the database and the connection character set avoids garbled text in most cases, but the drawback is that SQL operations can no longer work character by character. In general, setting the database and connection character set to utf8 is the better choice.
When using the MySQL C API, set the MYSQL_SET_CHARSET_NAME option to utf8 with mysql_options immediately after initializing the database handle. There is then no need to set the connection character set explicitly with a SET NAMES statement, and reconnecting a dropped persistent connection with mysql_ping also resets the connection character set to utf8.
With the MySQL PHP API, page-level PHP programs usually run for a short time, so you can set the connection character set explicitly with a SET NAMES statement after connecting to the database. When using persistent connections, however, take care to keep the connection alive, and after a disconnect and reconnect, reset the connection character set explicitly with a SET NAMES statement.
Other considerations
The default_character_set setting in my.cnf only affects the connection character set used when the mysql command-line client connects to the server. It has no effect on applications that use the libmysqlclient library!
SQL function operations on columns are usually performed in the internal operation character set and are not affected by the connection character set setting.
Bare string literals in SQL statements are affected by the connection character set or by an introducer, and operations such as comparisons can then produce completely different results, so be careful!
Summary
With the analysis and suggestions above, it should be clearer how to solve the problem we ran into: specify the character set when creating the database, rather than changing the default configuration. You can also specify the character set on each table, but that is easy to miss, especially when many people are involved in the design.
Although changing MySQL's default character set is not the recommended way to solve the problem, some ways to modify it are given below for reference.
MySQL default character set
MySQL lets you specify the character set at the database, table, and column level. Traditional programs do not use such detailed configuration when creating databases and tables; they rely on the defaults.
(1) When MySQL is compiled, a default character set is specified, which is latin1.
(2) When MySQL is installed, a default character set can be specified in the configuration file (my.ini). If it is not, the value is inherited from the one specified at compile time.
(3) When mysqld is started, a default character set can be specified in the command-line parameters. If it is not, the value is inherited from the configuration file. At this point character_set_server is set to this default character set.
(4) If multilingual support is selected when installing MySQL, the installer automatically sets default_character_set to UTF-8 in the configuration file, so that all columns of all tables in all databases are stored in UTF-8 by default.
View the default character set
By default, the MySQL character set is latin1 (ISO_8859_1). The commands for checking it are given above.
Modify the default character set
(1) The easiest way is to edit the character set keys in MySQL's my.ini file, for example:
default-character-set = utf8
character_set_server = utf8
After the change, restart the MySQL service.
(2) Another way to change the character set is to use MySQL commands. Note that the collation_ variables take a collation name such as utf8_general_ci, not a character set name:
mysql> SET character_set_client = utf8;
mysql> SET character_set_connection = utf8;
mysql> SET character_set_database = utf8;
mysql> SET character_set_results = utf8;
mysql> SET character_set_server = utf8;
mysql> SET collation_connection = utf8_general_ci;
mysql> SET collation_database = utf8_general_ci;
mysql> SET collation_server = utf8_general_ci;
If the table's default character set is utf8 and the query is sent UTF-8 encoded, but the data stored in the database is still garbled, the problem may lie in the connection layer. The fix is to execute the following statement before sending the query: SET NAMES 'utf8'; which is equivalent to the following three statements:
SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;