How to use regular expressions in PHP to detect Chinese characters in UTF-8 or GBK

2026-03-25 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03 report

This article explains how to use regular expressions in PHP to test whether a string consists of Chinese characters, under either UTF-8 or GBK encoding. It should be a useful reference for anyone who runs into the same problem.

UTF-8 matching: in JavaScript, it is easy to test whether a string consists entirely of Chinese characters. The code is as follows:

Var str = "php programming"

If (/ ^ [\ u4e00 -\ u9fa5] + $/ .test (str)) {

Alert ("the string is all in Chinese")

} else {

Alert ("the string is not all in Chinese")

}

In PHP, hexadecimal values are written with \x, so a direct translation gives the following code:

$str = "php programming"

If (preg_match ("/ ^ [\ x4e00 -\ x9fa5] + $/", $str)) {

Print ("the string is all in Chinese")

} else {

Print ("the string is not all in Chinese")

}

This runs without errors, and the result for this input is correct. However, replacing $str with "编程" ("programming"), a string that is entirely Chinese, still prints "the string is not all in Chinese", so the test is not reliable.

Important: after checking, I found the cause in how PHP's regular expressions interpret [\x4e00-\x9fa5]. Inside a character class, \x followed by hex digits denotes a single character by its hexadecimal value. Without braces, only one or two hex digits are read, so \x4e00 means \x4e followed by the literal characters "00". A longer value, such as a four-digit one, must be wrapped in curly braces: \x{4e00}. A value greater than \x{FF} must also be used together with the u modifier; otherwise the pattern is rejected as invalid.
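The parsing rule described above can be checked directly. The following is a minimal sketch, not code from the original article; the variable names are made up for illustration, and the Chinese test string is written as explicit UTF-8 bytes so the result does not depend on how the file is saved.

```php
<?php
// UTF-8 bytes of 编程 (U+7F16, U+7A0B), spelled out so file encoding does not matter.
$chinese = "\xE7\xBC\x96\xE7\xA8\x8B";

// Without braces, PCRE reads \x4e00 as \x4e ("N") followed by the literals "00",
// so this class holds "N", "0", the byte range "0"..\x9f, "a" and "5".
$naive = '/^[\x4e00-\x9fa5]+$/';

// With braces and the u modifier, the class is the code point range U+4E00..U+9FA5.
$braced = '/^[\x{4e00}-\x{9fa5}]+$/u';

$naiveOnAscii    = preg_match($naive, 'php');     // ASCII letters fall inside "0"..\x9f
$naiveOnChinese  = preg_match($naive, $chinese);  // lead byte 0xE7 is outside the class
$bracedOnChinese = preg_match($braced, $chinese);
$bracedOnAscii   = preg_match($braced, 'php');

// Braces without the u modifier: the pattern fails to compile and
// preg_match() returns false (the warning is suppressed with @).
$bracedNoU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $chinese);

var_dump($naiveOnAscii, $naiveOnChinese, $bracedOnChinese, $bracedOnAscii, $bracedNoU);
```

Note that the unbraced pattern is wrong in both directions: it rejects the all-Chinese string and accepts plain ASCII.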

The only patterns I could find online match full-width (multi-byte) characters by byte value: /^[\x80-\xff]*$/. Curly braces are not needed there, because each value has only two hex digits.

[\u4e00-\u9fa5] matches Chinese characters in JavaScript, but PHP's regular expression engine does not support the \u escape.
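As a small illustration of the \u point, the sketch below shows the pattern failing to compile when \u reaches the regex engine. It also shows one possible workaround that is not part of the original article and assumes PHP 7.0 or later: in a double-quoted string, PHP itself expands "\u{...}" into UTF-8 bytes before the pattern is compiled. The variable names are illustrative only.

```php
<?php
// In a single-quoted string the backslash reaches PCRE unchanged. PCRE does not
// support \u, so compilation fails and preg_match() returns false.
$unsupported = @preg_match('/^[\u4e00-\u9fa5]+$/', 'abc');

// Assumption: PHP 7.0+. Here "\u{4e00}" and "\u{9fa5}" are expanded by PHP into
// the UTF-8 encodings of those code points, so PCRE only sees literal characters.
// The u modifier is still required so the class is treated as a code point range.
$pattern = "/^[\u{4e00}-\u{9fa5}]+\$/u";

$onChinese = preg_match($pattern, "\u{7f16}\u{7a0b}"); // 编程
$onAscii   = preg_match($pattern, 'php');

var_dump($unsupported, $onChinese, $onAscii);
```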

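Still, since \x does represent hexadecimal values, the range itself (4e00 to 9fa5) should be the same as the one used in JavaScript; only the way it is written differs. So I rewrote the pattern with curly braces and the u modifier, as in the following code, and found that it is indeed accurate: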

The code is as follows:

$str = "php programming"

If (preg_match ("/ ^ [\ x {4e00} -\ x {9fa5}] + $/ u", $str)) {

Print ("the string is all in Chinese")

} else {

Print ("the string is not all in Chinese")

}

So the final, correct regular expression for matching Chinese characters under UTF-8 encoding in PHP is: /^[\x{4e00}-\x{9fa5}]+$/u
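For reuse, the final expression can be wrapped in a small function. This is only a sketch: the function name is_all_chinese is hypothetical and does not come from the article, and the test strings are given as explicit UTF-8 bytes.

```php
<?php
/**
 * Hypothetical helper: true only if $str is valid UTF-8 and every character
 * lies in the range U+4E00..U+9FA5.
 */
function is_all_chinese(string $str): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example when $str is not valid UTF-8), so compare strictly with 1.
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str) === 1;
}

$chinese = "\xE7\xBC\x96\xE7\xA8\x8B"; // UTF-8 bytes of 编程

var_dump(is_all_chinese($chinese));          // all Chinese
var_dump(is_all_chinese('php' . $chinese));  // mixed Latin and Chinese
var_dump(is_all_chinese(''));                // empty string: + needs at least one character
var_dump(is_all_chinese("\xB1\xE0"));        // GBK bytes, not valid UTF-8
```

The strict comparison matters: with the u modifier, input that is not valid UTF-8 makes preg_match() return false rather than 0, and both should count as "not all Chinese".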

Based on the article above, I wrote test code (copy it and save it as a .php file) that prompts for input with "Enter characters (digits, letters, Chinese characters, underscores):" and checks the submitted string. For GBK, the check is:

preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // regular expression for GB2312 Chinese characters, letters, digits, and underscores
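A minimal sketch of how the GBK check behaves on raw bytes follows. The function name is_gbk_word is hypothetical, and the input is assumed to already be GBK/GB2312-encoded (it is written as explicit byte escapes, so no conversion extension is needed).

```php
<?php
/**
 * Hypothetical helper: applies the article's byte-range pattern to a string
 * that is assumed to be GBK/GB2312-encoded. No u modifier is used, so the
 * character class works on single bytes.
 */
function is_gbk_word(string $bytes): bool
{
    $pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+\$/";
    return preg_match($pattern, $bytes) === 1;
}

$gbk  = "\xB1\xE0\xB3\xCC";          // GBK bytes of 编程
$utf8 = "\xE7\xBC\x96\xE7\xA8\x8B";  // UTF-8 bytes of 编程

var_dump(is_gbk_word($gbk));            // Chinese only
var_dump(is_gbk_word('php_' . $gbk));   // letters, underscore, Chinese
var_dump(is_gbk_word('a b'));           // space is not allowed
var_dump(is_gbk_word($utf8));           // bytes 0x96 and 0x8B fall below 0xA1
var_dump(is_gbk_word("\xE4\xB8\xAD"));  // UTF-8 bytes of 中: every byte is >= 0xA1
```

Because the class only tests individual bytes, it cannot confirm that high bytes come in valid pairs, and some UTF-8 sequences (such as the last example) also pass. It works as a character whitelist for input already known to be GBK, not as an encoding detector.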
