Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

The principle of MySQL 4.1character set support (transfer)

2025-02-25 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

The principle of MySQL 4.1character set support [@ more@] the following is a very boring article, full of a wide variety of encodings, transformations, clients, servers, connections. Well, I don't even want to see it myself, but when I think about it, it makes sense to write it down for four reasons:

MySQL 4.1 multilingual support has changed a lot (which leads to problems)

Although MySQL 3 is still dominant in most places (including personal use and hosting providers), MySQL 4.1is the database officially recommended by MySQL, which has been provided by hosting providers and will be more and more.

Many PHP programs use MySQL as the default database management software, but they generally do not distinguish between MySQL 4.1 and the following version. Generally speaking, "MySQL 3.xx.xx version or above" meets the installation requirements.

Because latin1 is used as the default character set in many places (which will be described in detail below), it successfully deceives many developers and users of PHP programs and masks the problems that may arise in language environments such as Chinese.

To put it simply, the changes in MySQL itself and the neglect of PHP programs using MySQL lead to the emergence and complexity of the problem, which is ignored because most users use English. The PHP program mentioned here is mainly in terms of WordPress.

Principles supported by MySQL 4.1character set

MySQL 4.1 the specification of the character set can be refined into a MySQL installed on a machine, one of the databases, one of the tables, one of the columns, what character set should be used. However, traditional Web programs do not use such complex configurations when creating databases and data tables, they use the default configuration, so where does the default configuration come from?

When compiling MySQL, you specify a default character set, which is latin1

When you install MySQL, you can specify a default character set in the configuration file (my.ini). If not, this value inherits from the

When you start mysqld, you can specify a default character set in the command line arguments, which, if not specified, inherits from the

At this point, character_set_server is set to the default character set

When creating a new database, unless explicitly specified, the character set of the database is set to character_set_server by default

When a database is selected, character_set_database is set as the default character set for that database

When you create a table in this database, the default character set of the table is set to character_set_database, which is the default character set of the database.

When setting a column in a table, unless explicitly specified, the default character set for this column is the default character set for the table

This character set is the character set used to actually store data in the database, and the content of mysqldump is the character set under this character set.

To sum up, if there is no modification anywhere, then all tables in all databases and all fields are stored in latin1, but if we install MySQL, we will generally choose multi-language support, that is to say, the installer will automatically set default_character_set to UTF-8 in the configuration file, which ensures that by default, all tables in all databases are stored in UTF-8.

When a PHP program connects to MySQL, what character set does the program use to send data to MySQL? MySQL does not know (it can only guess at most), so MySQL 4.1requires the client to specify this character set, which is the strange thing about character_set_client,MySQL is that the resulting character set is not immediately converted to the character set stored in the database, but is first converted to a character set specified by the character_set_connection variable. I don't quite understand the use of this connection layer, but after converting to this character set of character_set_connection, it has to be converted to the default character set of the database, that is to say, it has to be converted twice; when this data is output, it has to be converted from the default character set of the database to the character set specified by character_set_results.

A typical environment

A typical environment takes MySQL 4.1installed on my own computer as an example. My own computer installs Apache 2 Magi PHP 5 and WordPress 1.5.1.3 Magi specifies default_character_set as utf8 in the MySQL configuration file. So the problem arises:

WordPress is installed by default, so all tables store data in UTF-8

The default browsing character set for WordPress is UTF-8 (set in Options- > Reading), so the meta of all WP pages indicates that charset is utf-8.

So the browser will display all WP pages in utf-8, so that all Post and Comment of Write will be sent from the browser to Apache in UTF-8 format, and then Apache will give it to PHP

So the data WP gets from all forms is encoded by utf-8; WP sends the data directly to MySQL without conversion.

Both character_set_client and character_set_connection set by MySQL by default are latin1, and something strange happens, which is actually data in utf-8 format, which is converted into "as latin1". Unexpectedly, it is still converted to latin1, and then converted from this latin1 to utf-8. After such two conversions, some characters of utf-8 are lost and become?

The most amazing thing is not this, if WordPress is set to read in GB2312 format, then the GB2312 encoded data sent by WP to MySQL will be "as latin1" converted, what will be saved in the database is a strange format (really a strange format, mysqldump can come out and find that it is garbled either as utf-8 or as gb2312), but if this format is output to latin1, it can be changed back to GB2312!

What phenomenon will this lead to? If WP uses the MySQL 4.1database, it would be normal to change the encoding to GB2312, but unfortunately, this kind of normal only seems normal.

How to solve the problem

If you are already impatient (almost certainly), google, you will find that most of the answers are: SET NAMES 'utf8',: yes, this is the solution, but the purpose of this article is to explain why it is the solution.

To ensure that the results are correct, we must ensure that the data table is in the correct format, that is, at least all the Chinese characters can be stored, then we have only two options, gbk or utf-8, which is discussed below in the case of utf-8.

Because the default_character_set set by the configuration file is utf8, the datasheet is created by utf-8 by default. This should also be the configuration that all host providers using MySQL 4. 1 should adopt. So all we need to make sure is the correct encoding specified between the client and the MySQL interaction.

There are only two possibilities, the client sends data in gb2312 format, or it sends data in utf-8 format.

If sent in gb2312 format:

SET character_set_client='gb2312'

SET character_set_connection='utf8' or

SET character_set_connection='gb2312'

It is possible to ensure that the data will not be lost in the transcoding, that is, to ensure that the content stored in the database is correct.

How to ensure that what is taken out is the right content? Considering that the vast majority of clients (including WP), the encoding of the sent data is the encoding of the data it wants to receive, so:

SET character_set_results='gb2312'

It is guaranteed that the format taken out for display to the browser is gb2312.

In the second case, the client sends it in utf-8 format (the default for WP) and can be configured as follows:

SET character_set_client='utf8'

SET character_set_connection='utf8'

SET character_set_results='utf8'

This configuration is equivalent to SET NAMES 'utf8'.

What changes should be made to WP

Again, it is impossible for the database to know exactly what encoded data the client wants to send to the database. It can only be explained by the client itself. Therefore, WP must send the correct SET.... For MySQL. What is the best way to send it? PLog colleagues in Taiwan give some suggestions:

First of all, test whether the server > = 4.1and whether UTF-8 support is added at compile time; if yes, continue

Then test what format the database is stored in ($dbEncoding)

SET NAMES $dbEncoding

For the second point, the situation of WP is different. According to the typical configuration above, as long as WP is used, it is certain that the database is stored in UTF-8, so it is necessary to judge (bloginfo ('charset')) according to the user's setting of GB2312 or UTF-8 browsing, but this value can only be obtained after connecting to the database, so the most efficient way is to set SET NAMES according to this configuration after connecting to the database. You don't have to set it up before each query.

My modification goes like this, adding: in wp_includes/wp-db.php:

Function set_charset ($charset)

{

/ / check mysql version first.

$serverVersion = mysql_get_server_info ($this- > dbh)

$version = explode ('.', $serverVersion)

If ($version [0])

< 4) return; // check if utf8 support was compiled in $result = mysql_query("SHOW CHARACTER SET like 'utf8'", $this->

Dbh)

If (mysql_num_rows ($result)

< = 0) return; if ($charset == 'utf-8' || $charset == 'UTF-8') $charset = 'utf8'; @mysql_query("SET NAMES '$charset'", $this->

Dbh)

}

In the require of wp-settings.php (ABSPATH. WPINC. '/ vars.php'); add:

$wpdb- > set_charset (get_bloginfo ('charset'))

Posted in: Server-Side Author: jjgod

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report