How to understand PHP and UTF-8

2025-01-18 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/03

This article explains how to handle UTF-8 in PHP: at the language level, in file I/O, with MySQL, in the browser, and across operating systems.

PHP does not support the Unicode character set at the language level (its strings are plain byte sequences), but most problems can be handled by working in UTF-8.

The best practice is to know the input encoding explicitly (detect it when it is unknown), convert the data to UTF-8 internally, and always produce output in UTF-8.

How to deal with UTF-8 at PHP level

When manipulating Unicode text, be sure to install the mbstring extension and use its functions instead of the native string functions. For example, calling strlen() on a string from a UTF-8 encoded PHP file gives the wrong answer, because it counts bytes rather than characters; use mb_strlen() instead.
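The difference between the two functions can be seen in a minimal sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// "中文" is two characters; in UTF-8 each one takes three bytes.
$text = "中文";

$byteLength = strlen($text);             // counts bytes
$charLength = mb_strlen($text, "UTF-8"); // counts characters

echo $byteLength, "\n"; // 6
echo $charLength, "\n"; // 2
```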

Most mbstring functions operate on the basis of an encoding (the internal encoding), so be sure to use UTF-8 consistently. Most of the related settings can be configured in php.ini.

Starting with PHP 5.6, the default_charset setting can replace mbstring.http_input and mbstring.http_output.

Another important setting is mbstring.language, which defaults to neutral (UTF-8).

Note that the encoding of the source file and the internal encoding of the mbstring extension are not the same concept.
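As a rough illustration, the same defaults can also be set at runtime, and the gap between the internal encoding and the actual encoding of the data becomes visible once a string holds non-UTF-8 bytes. This sketch assumes mbstring is loaded; the php.ini equivalent of the first call would be default_charset = "UTF-8".

```php
<?php
// Runtime equivalents of the php.ini settings discussed above.
ini_set("default_charset", "UTF-8");
mb_internal_encoding("UTF-8");

// The internal encoding is only the default that mbstring assumes.
echo mb_internal_encoding(), "\n"; // UTF-8

// The data itself may still be in another encoding: here "中文" as GBK bytes.
$gbkBytes = mb_convert_encoding("中文", "GBK", "UTF-8");

echo strlen($gbkBytes), "\n";           // 4 (two bytes per character in GBK)
echo mb_strlen($gbkBytes, "GBK"), "\n"; // 2 (correct once the real encoding is named)
```

The internal encoding does not change what the bytes are; when data is not UTF-8, its real encoding still has to be passed to the function.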

To sum up:

Set the mbstring-related options in php.ini to UTF-8 wherever possible.

Use the mbstring functions instead of the native string functions.

When using these functions, always know the encoding of the characters you are operating on, and pass the UTF-8 encoding parameter explicitly, for example as the third parameter of the htmlentities() function.
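Passing the encoding explicitly might look like the following sketch. The sample strings are made up for illustration, and mbstring is assumed to be loaded.

```php
<?php
$title = "<b>中文标题</b>";

// Third parameter: the encoding of the input string, stated explicitly.
$escaped = htmlentities($title, ENT_QUOTES, "UTF-8");

// Fourth parameter of mb_substr(): again the encoding, stated explicitly.
$firstTwo = mb_substr("中文标题", 0, 2, "UTF-8");

echo $escaped, "\n";  // &lt;b&gt;中文标题&lt;/b&gt;
echo $firstTwo, "\n"; // 中文
```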

How file I/O handles UTF-8

Suppose you want to open a file but do not know how its contents are encoded. What should you do?

The best practice is to convert the contents to UTF-8 when you read them, then convert them back to the original encoding before saving them to the file. Look at the code:

    if (mb_internal_encoding() != "UTF-8") {
        mb_internal_encoding("UTF-8");
    }

    $file = "file.txt"; // a Chinese file encoded as GBK
    $srcbm = "GBK";     // the file's original encoding
    $str = file_get_contents($file);

    // Convert to UTF-8 so that all processing uses one encoding
    if (mb_check_encoding($str, $srcbm)) {
        $str = mb_convert_encoding($str, "UTF-8", $srcbm);
    }

    $str = "modify content";

    // Convert back to the original encoding and write the file
    $str = mb_convert_encoding($str, $srcbm, "UTF-8");
    file_put_contents($file, $str);

Best practices for MySQL and UTF-8

This is relatively simple. First, make sure that your MySQL database uses UTF-8. Then make sure the MySQL client also uses UTF-8 when connecting. In PHP, this means setting UTF-8 as the connection encoding when the mysqli or PDO extension connects to MySQL. If the two sides are consistent, there are generally no problems.
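One way to fix the connection encoding with PDO is to name the charset in the DSN. The sketch below only builds the DSN string, since connecting needs a running server; buildMysqlDsn(), the host, the database name, and the credentials are all hypothetical. It uses utf8mb4, MySQL's name for full four-byte UTF-8 (MySQL's "utf8" covers at most three bytes per character).

```php
<?php
// Hypothetical helper: builds a PDO DSN that fixes the connection charset.
function buildMysqlDsn(string $host, string $dbName, string $charset = "utf8mb4"): string
{
    return sprintf("mysql:host=%s;dbname=%s;charset=%s", $host, $dbName, $charset);
}

$dsn = buildMysqlDsn("localhost", "example_db");
echo $dsn, "\n"; // mysql:host=localhost;dbname=example_db;charset=utf8mb4

// Connecting requires a real server and credentials, so it is left commented out:
// $pdo = new PDO($dsn, "user", "password");
// With mysqli, the equivalent step is: $mysqli->set_charset("utf8mb4");
```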

Best practices for browsers and UTF-8

This is also relatively simple. If your output is a web page, keep the output of your string processing in UTF-8, explicitly set default_charset to UTF-8 in php.ini, and declare UTF-8 in the HTML meta tag as well.
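A minimal sketch of a page that declares UTF-8 in both the HTTP header and the meta tag follows. renderPage() is a hypothetical helper, not part of PHP, and the page content is a placeholder.

```php
<?php
// Hypothetical helper that returns a minimal HTML page declaring UTF-8.
function renderPage(string $title): string
{
    // Escape with the encoding stated explicitly.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, "UTF-8");

    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
        . "<title>{$safeTitle}</title>\n</head>\n<body></body>\n</html>\n";
}

// Declare the charset in the HTTP header too (skipped if output has already started).
if (!headers_sent()) {
    header("Content-Type: text/html; charset=UTF-8");
}

$page = renderPage("中文 <test>");
echo $page;
```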

Is that everything? No. Although the server and the browser ask the user to use UTF-8, the user's behavior is not constrained: they may enter characters in another encoding, or upload a file whose name is in another encoding. What then? You can check the user's encoding with the mb_http_input() and mb_check_encoding() functions and then convert the data to UTF-8 internally. Make sure that at every level you end up dealing with UTF-8. In other words, you need a way to know what encoding your input is in, and you need to ensure that the output after processing is UTF-8.

The mbstring.encoding_translation directive and the mb_detect_encoding() function are not recommended. They caused me trouble for a long time.
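The check-then-convert step could be sketched as below. toUtf8() is a hypothetical helper, and the candidate list (GBK only) is an assumption that should be replaced with whatever encodings your users can realistically send. mbstring is assumed to be loaded.

```php
<?php
// Hypothetical helper: returns $input as UTF-8, trying an explicit list
// of candidate encodings when the input is not already valid UTF-8.
function toUtf8(string $input, array $candidates = ["GBK"]): string
{
    if (mb_check_encoding($input, "UTF-8")) {
        return $input; // already valid UTF-8, leave untouched
    }
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($input, $encoding)) {
            return mb_convert_encoding($input, "UTF-8", $encoding);
        }
    }
    throw new InvalidArgumentException("Input is not valid in any expected encoding");
}

$gbkBytes = "\xD6\xD0\xCE\xC4"; // "中文" encoded as GBK
echo toUtf8($gbkBytes), "\n";   // 中文
echo toUtf8("abc"), "\n";       // abc
```

Validity checks are not proof of the real encoding, since some byte sequences are valid in more than one encoding, so it is safer to keep the candidate list short and specific.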

Best practices for operating systems and UTF-8

PHP handles Unicode file names differently depending on the operating system.

On Linux, file names are typically UTF-8 encoded, while in a Chinese Windows environment PHP expects GBK-encoded file names. Keep that difference in mind.

The following examples are given:

    // Command-line program, run on the Chinese version of Windows 10;
    // the source file is encoded as UTF-8
    function filenameexample()
    {
        $filename = "test.txt"; // stands in for a Chinese file name
        $gbk_filename = iconv("UTF-8", "GBK", $filename);
        file_put_contents($gbk_filename, "test");
        echo file_get_contents($gbk_filename);
    }

    function scandirexample()
    {
        $arr = scandir("./tmp");
        foreach ($arr as $v) {
            if ($v == "." || $v == "..") {
                continue;
            }
            // Names returned by the file system are GBK; convert them for display
            $filename = iconv("GBK", "UTF-8", $v);
            // Use the original GBK name when accessing the file
            $content = file_get_contents("./tmp/" . $v);
        }
    }

If you want to avoid writing separate handling for Windows and Linux, you can encode the file name with urlencode(), for example:

    function urlencodeexample()
    {
        $filename = "Test 2.txt";
        $urlencodefilename = urlencode($filename);
        file_put_contents($urlencodefilename, "Test");
        echo file_get_contents($urlencodefilename);
    }
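The reason this works on any system is that the encoded name contains only ASCII characters and can be decoded back to the original. A small sketch, with a made-up Chinese file name and the system temporary directory as an assumed location:

```php
<?php
$filename = "测试 2.txt";          // hypothetical UTF-8 file name
$encoded = urlencode($filename);   // ASCII-only form, safe on any file system

$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $encoded;
file_put_contents($path, "Test");
$content = file_get_contents($path);
unlink($path); // clean up the demo file

echo $content, "\n";             // Test
echo urldecode($encoded), "\n";  // the original name, recovered for display
```

The encoded name is longer than the original (each non-ASCII byte becomes three characters), which matters for very long names.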

When sending file downloads from PHP with the header() function, you should also consider the browser and the operating system (most people use Windows). For Chrome, the output file name can be UTF-8; Chrome automatically converts the file name to GBK.

Older versions of IE inherit the operating system's environment, so a Chinese download file name must be transcoded from UTF-8 to GBK; otherwise the user will see a garbled file name when downloading. In code:

    $agent = $_SERVER["HTTP_USER_AGENT"];
    if (strpos($agent, 'MSIE') !== false) {
        // "Annex.txt" stands in for a Chinese file name
        $filename = iconv("UTF-8", "GBK", "Annex.txt");
        header("Content-Disposition: attachment; filename=\"$filename\"");
    }
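The browser check for download file names can be wrapped in a small function so that both cases are covered. This is only a sketch of the rule described in this section: contentDisposition() is a hypothetical helper, the "MSIE" substring test is the same simple check used in this article, and mbstring is assumed to be loaded.

```php
<?php
// Hypothetical helper: GBK file name for old IE (user agent contains "MSIE"),
// UTF-8 file name for every other browser.
function contentDisposition(string $utf8Name, string $userAgent): string
{
    $name = $utf8Name;
    if (strpos($userAgent, "MSIE") !== false) {
        $name = mb_convert_encoding($utf8Name, "GBK", "UTF-8");
    }
    return "Content-Disposition: attachment; filename=\"{$name}\"";
}

$agent = $_SERVER["HTTP_USER_AGENT"] ?? ""; // empty when run from the command line
$headerLine = contentDisposition("附件.txt", $agent);

// In a real download script, send it before any output:
// header($headerLine);
```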
