An Example Explaining PHP Encoding Conversion

2025-07-13 Update. From: SLTechnology News&Howtos. Shulou (Shulou.com), 06/03 report.

This article walks through an example of character encoding conversion in PHP, a problem that often comes up in practice: how a Unicode code maps to UTF-8 bytes, and how to convert the bytes back.

The difference between Unicode and UTF-8 encoding

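Unicode is a character set that assigns a number (a code point) to each character, and UTF-8 is one of the encodings used to store those code points. In this article, "Unicode" refers to the fixed-length two-byte form (UCS-2), in which every character of the Basic Multilingual Plane occupies two bytes. UTF-8 is variable-length: a Chinese character takes three bytes in UTF-8, so the two-byte form is one byte shorter.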
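As a quick check of these byte counts, the following sketch compares the two storage forms of 你 (U+4F60). It assumes the two-byte form is big-endian UCS-2, built here with pack(); the variable names are only illustrative.

```php
<?php
// The UTF-8 bytes of 你 (U+4F60), written as explicit escapes so the
// example does not depend on the encoding of this source file.
$utf8 = "\xE4\xBD\xA0";

// The same code as a fixed-width two-byte value (assumed big-endian UCS-2).
// pack('n', ...) writes an unsigned 16-bit integer in big-endian order.
$ucs2 = pack('n', 0x4F60);

echo strlen($utf8), "\n";  // 3
echo strlen($ucs2), "\n";  // 2
echo bin2hex($ucs2), "\n"; // 4f60
```

strlen() counts bytes rather than characters, which is exactly what is being compared here.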

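As originally designed, a UTF-8 sequence can be up to 6 bytes long, although the current standard (RFC 3629) restricts it to 4 bytes, which is enough for code points up to U+10FFFF. Characters in the 16-bit BMP (Basic Multilingual Plane) need at most 3 bytes. Let's take a look.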

UTF-8 encoding table:

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

The x positions are filled with the bits of the character code in binary, with the rightmost x holding the least significant bit. Only the shortest multi-byte sequence that can hold the character code is used. Note that in a multi-byte sequence, the number of leading "1" bits in the first byte equals the number of bytes in the whole sequence. The first row starts with 0 to stay compatible with ASCII and takes one byte, the second row is a two-byte sequence, the third row is three bytes (Chinese characters, for example), and so on. (Put simply: the number of leading 1s can be read as the number of bytes.)
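The leading-1s rule can be sketched as a small helper. The function name utf8_sequence_length is hypothetical, and the sketch follows the six-row table above rather than the stricter four-byte limit of the current standard.

```php
<?php
/**
 * Returns the total length in bytes of a UTF-8 sequence, judged from its
 * first byte, or 0 if the byte cannot start a sequence.
 * (Hypothetical helper; follows the six-row table.)
 */
function utf8_sequence_length(int $leadByte): int
{
    if ($leadByte < 0x80) {
        return 1; // 0xxxxxxx: plain ASCII
    }
    // Count the leading 1 bits by testing from the top bit down.
    $count = 0;
    for ($mask = 0x80; $mask > 0 && ($leadByte & $mask) !== 0; $mask >>= 1) {
        $count++;
    }
    // A single leading 1 (10xxxxxx) marks a continuation byte, and the
    // table has no row with more than six leading 1s.
    return ($count >= 2 && $count <= 6) ? $count : 0;
}

echo utf8_sequence_length(ord('A')), "\n"; // 1
echo utf8_sequence_length(0xE4), "\n";     // 3 (first byte of a Chinese character)
echo utf8_sequence_length(0xBD), "\n";     // 0 (continuation byte)
```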

How to convert Unicode to UTF-8?

To convert Unicode to UTF-8, you first need to know how the two differ, so let's look at how a Unicode code is turned into UTF-8. If the code of a character is less than 0x80, it is an ASCII character: it occupies one byte and needs no conversion, because UTF-8 is compatible with ASCII. Now take the Chinese character 你 ("you"), whose Unicode code is U+4F60. In binary that is 100111101100000, which is then converted according to the UTF-8 format. The bits are taken from low to high, six at a time, and placed into the x positions of the format; any x positions left over in the first byte are filled with 0. For the binary value above, the result is shown below.

The two forms side by side:

Unicode: 100111101100000 (4F60)

UTF-8: 11100100 10111101 10100000 (E4BDA0)

The conversion from Unicode to UTF-8 can be seen directly from the above. Knowing the UTF-8 format, you can also reverse the operation: take the bits out of their positions in the UTF-8 bytes and reassemble them into the Unicode code. Both directions can be done with bit shifts. For the conversion of 你 above, the value is at least 0x800 and less than 0x10000, so it is stored in three bytes. To get the first byte, shift the code right by 12 bits and OR (|) it with the three-byte lead pattern 11100000 (0xE0). To get the second byte, shift the code right by 6 bits, which leaves the first and second groups of bits, AND (&) with 111111 (0x3F) to keep only the second group, and then OR (|) with 10000000 (0x80). The third byte needs no shift: AND (&) the code with 111111 (0x3F) to take the last six bits, and OR (|) the result with 10000000 (0x80).
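The three steps can be traced for U+4F60 with a minimal sketch like the following. It covers only the three-byte case described here, and the variable names are illustrative.

```php
<?php
$code = 0x4F60; // Unicode code of 你

// First byte: the top 4 bits, shifted down 12 places, under the 1110 prefix.
$byte1 = 0xE0 | ($code >> 12);
// Second byte: the middle 6 bits under the 10 prefix.
$byte2 = 0x80 | (($code >> 6) & 0x3F);
// Third byte: the low 6 bits under the 10 prefix.
$byte3 = 0x80 | ($code & 0x3F);

printf("%08b %08b %08b\n", $byte1, $byte2, $byte3);
// 11100100 10111101 10100000

$utf8 = chr($byte1) . chr($byte2) . chr($byte3);
echo strtoupper(bin2hex($utf8)), "\n"; // E4BDA0
```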

How to convert UTF-8 back to Unicode?

The conversion from UTF-8 back to Unicode is also done with shifts and masks: take the payload bits out of each UTF-8 byte and put them back in their original positions. In the example above, 你 occupies three bytes, so each byte is processed in turn, from high to low. In UTF-8, 你 is 11100100 10111101 10100000. From the first byte, 11100100, we take out "0100" by ANDing (&) it with 11111 (0x1F). Strictly, the payload of a three-byte lead byte is only its low four bits (mask 0x0F), but 0x1F gives the same result because the fifth bit of 1110xxxx is always 0. Because six bits come from each of the two following bytes, these four bits belong above the low 12 bits, so the result is shifted left by 12, giving 0100 000000 000000. For the second byte, we take out "111101" by ANDing 10111101 with 111111 (0x3F), shift the result left by 6, and OR (|) it with the value from the first byte, giving 0100 111101 000000. Finally, the last byte is ANDed with 111111 (0x3F) and ORed with the previous result, giving 0100 111101 100000, which is 0x4F60.
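The intermediate values in this walkthrough can be reproduced with a short sketch. It uses 0x0F, the exact four-bit mask for a three-byte lead byte, and again handles only the three-byte case; the variable names are illustrative.

```php
<?php
$utf8 = "\xE4\xBD\xA0"; // UTF-8 bytes of 你

// Strip the 1110 prefix from the first byte and move its 4 bits to the top.
$high = (ord($utf8[0]) & 0x0F) << 12;
// Strip the 10 prefix from the second byte and place its 6 bits in the middle.
$mid = (ord($utf8[1]) & 0x3F) << 6;
// The last byte supplies the low 6 bits unshifted.
$low = ord($utf8[2]) & 0x3F;

printf("%016b\n", $high);               // 0100000000000000
printf("%016b\n", $high | $mid);        // 0100111101000000
printf("%016b\n", $high | $mid | $low); // 0100111101100000
printf("%04X\n", $high | $mid | $low);  // 4F60
```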

PHP implementation

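/**
 * Converts a UTF-8 character to its Unicode code.
 * Handles three-byte characters only (U+0800 to U+FFFF).
 * @param  string $utf8_str UTF-8 character
 * @return string           Unicode code in hexadecimal
 */
function utf8_str_to_unicode($utf8_str) {
    $unicode = (ord($utf8_str[0]) & 0x1F) << 12;
    $unicode |= (ord($utf8_str[1]) & 0x3F) << 6;
    $unicode |= (ord($utf8_str[2]) & 0x3F);
    return dechex($unicode);
}

/**
 * Converts a Unicode code to a UTF-8 character.
 * Handles three-byte characters only (U+0800 to U+FFFF).
 * @param  string $unicode_str Unicode code in hexadecimal
 * @return string              UTF-8 character
 */
function unicode_str_to_utf8($unicode_str) {
    // The code must be an integer for the bitwise operations to work.
    $code = intval(hexdec($unicode_str));
    $ord_1 = decbin(0xe0 | ($code >> 12));
    $ord_2 = decbin(0x80 | (($code >> 6) & 0x3f));
    $ord_3 = decbin(0x80 | ($code & 0x3f));
    $utf8_str = chr(bindec($ord_1)) . chr(bindec($ord_2)) . chr(bindec($ord_3));
    return $utf8_str;
}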

A quick test:

// The source file must be saved as UTF-8 so that 我 is three bytes.
$utf8_str = '我';
// This is the Unicode code of the Chinese character 你
$unicode_str = '4F60';
// Outputs 6211
echo utf8_str_to_unicode($utf8_str) . "\n";
// Outputs the Chinese character 你
echo unicode_str_to_utf8($unicode_str);
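The functions above cover only the three-byte row of the table. As a hedged generalization, the sketch below applies the same shift-and-mask steps to sequences of one to four bytes, the range allowed by the current standard. The names codepoint_to_utf8 and utf8_to_codepoint are hypothetical, and the decoder does not reject overlong sequences or surrogate values, so it is not a full validator.

```php
<?php
/** Encodes one code point (U+0000 to U+10FFFF) as a UTF-8 byte string. */
function codepoint_to_utf8(int $code): string
{
    if ($code < 0 || $code > 0x10FFFF) {
        throw new InvalidArgumentException('code point out of range');
    }
    if ($code < 0x80) {
        return chr($code); // one byte, identical to ASCII
    }
    if ($code < 0x800) {
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

/** Decodes the first UTF-8 sequence in $bytes to its code point. */
function utf8_to_codepoint(string $bytes): int
{
    if ($bytes === '') {
        throw new InvalidArgumentException('empty input');
    }
    $lead = ord($bytes[0]);
    if ($lead < 0x80) {
        return $lead;
    }
    // Pick the payload mask and the number of continuation bytes
    // from the lead byte's prefix.
    if (($lead & 0xE0) === 0xC0) {
        $code = $lead & 0x1F;
        $extra = 1;
    } elseif (($lead & 0xF0) === 0xE0) {
        $code = $lead & 0x0F;
        $extra = 2;
    } elseif (($lead & 0xF8) === 0xF0) {
        $code = $lead & 0x07;
        $extra = 3;
    } else {
        throw new InvalidArgumentException('invalid lead byte');
    }
    if (strlen($bytes) < $extra + 1) {
        throw new InvalidArgumentException('truncated sequence');
    }
    for ($i = 1; $i <= $extra; $i++) {
        $byte = ord($bytes[$i]);
        if (($byte & 0xC0) !== 0x80) {
            throw new InvalidArgumentException('invalid continuation byte');
        }
        // Make room for six more bits and append this byte's payload.
        $code = ($code << 6) | ($byte & 0x3F);
    }
    return $code;
}

printf("%X\n", utf8_to_codepoint("\xE6\x88\x91")); // 6211
echo bin2hex(codepoint_to_utf8(0x4F60)), "\n";     // e4bda0
echo bin2hex(codepoint_to_utf8(0x1F600)), "\n";    // f09f9880
```

Working with integers on one side and byte strings on the other, instead of hexadecimal strings, keeps the bitwise steps free of string conversions; callers can still use hexdec() and dechex() at the edges.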
