What are the tips for PHP Chinese coding? 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article covers practical tips for handling Chinese character encoding in PHP. The methods described are simple, fast and practical.

Chinese encoding problems in PHP programming have puzzled many people, but the cause is actually simple. Each country or region has defined its own character set for computer information exchange, such as extended ASCII in the United States, GB2312-80 in China, and JIS in Japan. These character sets are the basis of information processing in their country or region and serve to unify encoding there. By code length, they are divided into SBCS (single-byte character sets) and DBCS (double-byte character sets). Early software, operating systems in particular, shipped in various localized versions (L10N) to handle local character data, and concepts such as LANG and Codepage were introduced to tell them apart. However, because the code ranges of the local character sets overlap, exchanging information between them is difficult, and maintaining each localized version separately is costly. It therefore became necessary to factor out what localization work has in common and handle it uniformly, keeping locale-specific processing to a minimum. This is called internationalization (I18N). Language-specific information was further standardized as Locale information, and the underlying character set became Unicode, which covers almost all characters.

Nowadays, the core character handling of most internationalized software is based on Unicode. At run time, the software determines the local character encoding from the current Locale/LANG/Codepage settings and processes local characters accordingly. This requires converting between Unicode and the local character set, or even between two different local character sets with Unicode as the intermediate. The same approach extends to networked environments: character data at either end of a connection must be converted into a form the other end accepts, according to the character set settings.
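The conversion between two local character sets can be sketched in PHP as follows. This is a minimal illustration, assuming the mbstring extension is enabled; the sample bytes and variable names are chosen here and do not come from the original article.

```php
<?php
// Assumption: the mbstring extension is enabled.
// "中文" encoded as GBK: two bytes per character.
$gbk = "\xD6\xD0\xCE\xC4";

// Convert the GBK bytes to UTF-8.
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// Convert back to confirm that the round trip preserves the bytes.
$back = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo bin2hex($gbk), "\n";   // the GBK bytes in hex
echo bin2hex($utf8), "\n";  // the UTF-8 bytes in hex
echo $back === $gbk ? "round trip ok\n" : "round trip failed\n";
```

The same text occupies four bytes in GBK and six in UTF-8, which is why a string read with the wrong character set setting comes out garbled.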

Character set encoding in the database

Popular relational database systems support a database character set: you can specify the character set when creating a database, and its data is stored in that encoding. When an application accesses the data, character set conversion takes place on both the way in and the way out. For Chinese data, the database character encoding must be chosen so that the data remains intact. GB2312, GBK and UTF-8 are all suitable choices. ISO8859-1 (8-bit) is also possible, but then the application must split each 16-bit Chinese or Unicode character into two 8-bit characters before writing, and after reading must merge the two bytes again and tell them apart from SBCS characters. This fails to use the database's own character set support and makes programming more complex, so we do not recommend ISO8859-1 as the database character set. When programming, first use the management tools provided by the database management system to check whether the Chinese data is stored correctly.

Before a PHP program queries the database, first execute mysql_query("SET NAMES xxxx"); where xxxx is the encoding of your web page (charset=xxxx). If the page uses charset=utf8, then xxxx=utf8; if the page uses charset=gb2312, then xxxx=gb2312. Almost every web program keeps its common database connection code in a single file, so adding mysql_query("SET NAMES xxxx") to that file is enough. Note that the mysql_* functions were removed in PHP 7.0; with mysqli or PDO, the same statement can be sent through that extension's own query call.

SET NAMES tells the server which character set the client uses for the SQL statements it sends. The statement SET NAMES 'utf8' therefore tells the server that "future messages from this client are in the utf8 character set". It also specifies the character set the server uses for results sent back to the client (for example, the character set of the column values returned by a SELECT statement).
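On current PHP versions the same idea might look like the sketch below, which uses mysqli instead of the removed mysql_* functions. The helper names mysqlCharsetFor and connectWithCharset, the connection credentials and the database name are all hypothetical placeholders, not part of the original article.

```php
<?php
// Hypothetical helper: map a page charset label to the name MySQL expects.
function mysqlCharsetFor(string $pageCharset): string
{
    $map = [
        'utf-8'  => 'utf8',   // MySQL spells it without the hyphen
        'utf8'   => 'utf8',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
    ];
    $key = strtolower(trim($pageCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("Unsupported charset: $pageCharset");
    }
    return $map[$key];
}

// Hypothetical connection routine for the shared connection file.
// Host, user, password and database name are placeholders.
function connectWithCharset(string $pageCharset): mysqli
{
    $db = new mysqli('localhost', 'user', 'password', 'example_db');
    // Sets the connection character set, as SET NAMES does.
    $db->set_charset(mysqlCharsetFor($pageCharset));
    return $db;
}

echo mysqlCharsetFor('UTF-8'), "\n";
```

Keeping the mapping in one function means the page charset and the connection charset cannot drift apart. If the data may contain four-byte characters, MySQL's utf8mb4 character set would be the name to map to instead of utf8.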

Techniques commonly used in locating problems

The clumsiest but most effective way to locate a Chinese encoding problem is to print the internal code (the byte values) of the string after each processing step you suspect. From the printed codes you can tell when Chinese characters were converted to Unicode, when Unicode was converted back to the Chinese internal code, when one Chinese character became two Unicode characters, when a Chinese string turned into a run of question marks, and when the high-order bits of a Chinese string were truncated.
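Printing the internal code can be done with PHP's built-in bin2hex(). The helper dumpBytes below is a hypothetical name introduced for illustration, and the sample bytes are the encodings of "中" in UTF-8 and GBK.

```php
<?php
// Return a label plus the raw bytes of $s as space-separated hex pairs.
function dumpBytes(string $label, string $s): string
{
    $hex = $s === '' ? '' : implode(' ', str_split(bin2hex($s), 2));
    return sprintf('%s [%d bytes]: %s', $label, strlen($s), $hex);
}

$utf8 = "\xE4\xB8\xAD";  // "中" in UTF-8
$gbk  = "\xD6\xD0";      // "中" in GBK
$lost = "?";             // what remains after a failed conversion

echo dumpBytes('utf-8', $utf8), "\n";
echo dumpBytes('gbk', $gbk), "\n";
echo dumpBytes('lost', $lost), "\n";
```

Calling such a helper after each suspect step shows where the byte sequence changes, for instance from three bytes to two, or to a single 3f (the question mark).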

Choosing a suitable sample string also helps to identify the type of problem, for example a string such as "aa ah aa?@aa" that mixes Chinese and English and includes characters specific to GB and GBK. Generally speaking, English characters are not corrupted however they are converted or processed (if they are, try a longer run of consecutive English letters).

Solving garbled text in various applications

1) Use the meta tag to set the page encoding

A tag of the form <meta http-equiv="Content-Type" content="text/html; charset=xxx"> declares which character set the client browser should use to display the page. Here xxx can be GB2312, GBK, UTF-8 (with a hyphen, unlike MySQL, where the name is utf8), and so on. Most pages can therefore use this tag to tell the browser which encoding to use, avoiding encoding errors and garbled text. Sometimes, however, the tag seems to have no effect: whatever xxx is, the browser always uses the same encoding. The reason is explained in the sections that follow.

Note that this tag belongs to the HTML content and is only a declaration; it takes effect only once the server has passed the HTML to the browser.

2) header ("content-type:text/html; charset=xxx")

The header() function sends the string in parentheses as an HTTP header. With the content shown in the heading above, it does essentially the same job as the meta tag, and the two strings look similar. The difference is that with this function the browser always uses the xxx encoding you requested, without exception, which makes it very useful. Why is that? Consider the difference between HTTP headers and HTML content:

The HTTP header is a string the server sends before the HTML content when delivering a page to the browser over HTTP. The meta tag is part of the HTML content, so what header() sends reaches the browser first. Put loosely, header() has the higher priority. If a PHP page contains both header("content-type:text/html; charset=xxx") and a meta tag, the browser honors only the HTTP header and ignores the meta tag. Of course, this function can only be used in PHP pages.
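A page that sends both declarations could be sketched as below. The helper names contentTypeHeader and metaCharsetTag are hypothetical, and UTF-8 is used only as an example value for xxx. The header() call has to come before any output, which is why it is guarded with headers_sent().

```php
<?php
// Build the Content-Type header line for an HTML page in the given charset.
function contentTypeHeader(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Build the matching meta declaration for the HTML head.
function metaCharsetTag(string $charset): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset, ENT_QUOTES) . '">';
}

$charset = 'UTF-8';

// The HTTP header must be sent before any part of the body.
if (!headers_sent()) {
    header(contentTypeHeader($charset));
}

echo '<html><head>', metaCharsetTag($charset), '</head>',
     '<body>中文</body></html>', "\n";
```

Generating both strings from one $charset value keeps the header and the meta tag from contradicting each other.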

One question remains: why does the header always work while the meta tag sometimes does not? To answer that, we need to look at Apache.

3) AddDefaultCharset

The conf folder under the Apache root directory contains httpd.conf, the configuration file for the whole Apache server.

Open httpd.conf in a text editor. Around line 708 (this varies between versions) there is a line AddDefaultCharset xxx, where xxx is the encoding name. This line sets the character set in the HTTP header of every web file on the server to the default xxx. Having it is equivalent to adding header("content-type:text/html; charset=xxx") to every file. This explains why a browser may keep using gb2312 even though the page clearly declares utf-8.

If the page itself calls header("content-type:text/html; charset=xxx"), that call replaces the default character set with the one you specify, which is why this function always works. If you comment out the AddDefaultCharset xxx line by putting a "#" in front of it, and the page contains no header("content-type …") call, then the meta tag finally takes effect.

In order of priority, from highest to lowest:

.. header("content-type:text/html; charset=xxx")

.. AddDefaultCharset xxx

.. the meta tag
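The precedence described in this section can be modeled in a few lines of PHP. This is only a sketch of the article's account, not a description of how every browser or server behaves; the function name effectiveCharset is hypothetical, and each argument is null when that mechanism is absent.

```php
<?php
// Model of the precedence described in this section: a header() call in the
// page wins, then Apache's AddDefaultCharset, and the meta tag comes last.
function effectiveCharset(?string $phpHeader, ?string $apacheDefault, ?string $metaTag): ?string
{
    return $phpHeader ?? $apacheDefault ?? $metaTag;
}

// The meta tag says utf-8 but Apache has a gb2312 default: the default wins.
echo effectiveCharset(null, 'gb2312', 'utf-8'), "\n";

// An explicit header() call overrides both.
echo effectiveCharset('utf-8', 'gb2312', 'gbk'), "\n";
```

The first call reproduces the situation where the browser keeps using gb2312 although the page declares utf-8.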

If you are a web programmer, it is recommended that you add header("content-type:text/html; charset=xxx") to every page, so that the page displays correctly on any server and remains portable.

4) default_charset configuration in php.ini:

The setting default_charset = "gb2312" in php.ini defines the default character set for PHP. It is generally recommended to comment this line out and let the browser choose the encoding from the charset in the page header rather than forcing one, so that the same server can serve pages in several languages.
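To see what a given server is configured with, the setting can be read at run time with ini_get(), and overridden for a single script with ini_set(). The sketch below uses UTF-8 purely as an example value.

```php
<?php
// Read the value PHP is currently configured with.
$before = ini_get('default_charset');

// Override it for this script only; php.ini itself is not modified.
ini_set('default_charset', 'UTF-8');
$after = ini_get('default_charset');

echo 'before: ', var_export($before, true), "\n";
echo 'after: ', $after, "\n";
```

A per-script override of this kind leaves the server-wide setting alone, so other sites on the same server can still use a different encoding.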

