2025-04-10 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to convert between full-width and half-width characters in Python. The material is detailed but easy to follow, and the operations involved are simple, so it should be a useful reference. Let's take a look.
1. Application areas:
Full-width/half-width conversion is commonly used in natural language processing. When a text mixes full-width and half-width forms of the same character, information extraction becomes inconsistent, and a language model trained on such a corpus gives less accurate results. The text therefore needs to be normalized to one form first.
2. Overview of full-width and half-width conversion
Full-width characters occupy the Unicode code points 65281 to 65374 (hexadecimal 0xFF01 to 0xFF5E).
Half-width characters occupy the Unicode code points 33 to 126 (hexadecimal 0x21 to 0x7E).
The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).
Apart from the space, full-width and half-width characters appear in the same order in Unicode, so the two ranges differ by a constant offset (half-width + 65248 = full-width, or in hexadecimal, half-width + 0xFEE0 = full-width).
Non-space characters can therefore be converted simply by adding or subtracting this offset, while the space is handled separately.
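The offset rule above can be checked directly. The following is a minimal sketch, assuming Python 3 (where chr() accepts any code point); the constant names OFFSET, FULL_START and so on are illustrative and do not come from the article.

```python
# Boundaries of the two ranges described above
FULL_START, FULL_END = 0xFF01, 0xFF5E   # 65281..65374
HALF_START, HALF_END = 0x21, 0x7E       # 33..126

# Constant distance between a full-width character and its half-width form
OFFSET = FULL_START - HALF_START

print(OFFSET, hex(OFFSET))              # 65248 0xfee0

# Both ranges hold the same number of characters, so the mapping is one-to-one
full_count = FULL_END - FULL_START + 1
half_count = HALF_END - HALF_START + 1
print(full_count, half_count)           # 94 94

# "A" (U+0041) plus the offset gives full-width "A" (U+FF21), and back again
full_a = chr(ord("A") + OFFSET)
half_a = chr(ord("\uff21") - OFFSET)
print(full_a == "\uff21", half_a == "A")  # True True
```

The space does not follow the rule: 0x3000 minus 0x20 is 12256, not 65248, which is why it needs its own branch.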
3. Please note:
Chinese characters are always full-width; only English letters, digits, and symbols come in both full-width and half-width forms.
A letter or digit that takes up the same width as one Chinese character is called full-width; one that takes up half that width is called half-width.
Quotation marks differ between Chinese and English as well as between full-width and half-width forms.
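To illustrate these notes, here is a small sketch, assuming Python 3. The helper is_fullwidth_form is a hypothetical name introduced for this example. It shows that a Chinese character falls outside the convertible range, and, as one reading of the note on quotation marks, that the Chinese curly quote (U+201C) is a different character from the full-width form of the ASCII quote (U+FF02), so the offset rule does not touch it. The standard-library function unicodedata.east_asian_width reports the width class Unicode assigns to each character.

```python
import unicodedata

def is_fullwidth_form(ch):
    """Return True if ch is the full-width space or lies in 0xFF01..0xFF5E."""
    code = ord(ch)
    return code == 0x3000 or 0xFF01 <= code <= 0xFF5E

samples = [
    "A",        # half-width Latin letter
    "\uff21",   # full-width Latin letter A
    "\u4e2d",   # a Chinese character
    "\uff02",   # full-width form of the ASCII double quote
    "\u201c",   # Chinese/typographic left double quotation mark
]

for ch in samples:
    # Print the code point, Unicode width class, and whether the offset rule applies
    print(hex(ord(ch)), unicodedata.east_asian_width(ch), is_fullwidth_form(ch))
```

Only U+FF21 and U+FF02 are reported as convertible; the Chinese character and the curly quote would pass through a conversion unchanged.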
4. Library functions used
The chr() function takes an integer and returns the corresponding character. In Python 2 the argument must be in range(256); in Python 3, chr() accepts any Unicode code point.
unichr() (Python 2 only) works the same way, except that it returns a Unicode character. Python 3 has no unichr(); chr() covers both cases.
The ord() function is the inverse of chr() (for 8-bit ASCII strings) and of unichr() (for Unicode objects). It takes a single character (a string of length 1) and returns the corresponding ASCII or Unicode numeric value.
Example:
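A minimal sketch of how ord() and chr() round-trip between characters and code points, assuming Python 3 (under Python 2, unichr() would be needed for values above 255). The variable names are illustrative only.

```python
# ord() maps a single character to its code point
code_a = ord("A")
code_full_a = ord("\uff21")     # full-width A
print(code_a, code_full_a)      # 65 65313

# chr() maps a code point back to a character
char_a = chr(65)
full_space = chr(12288)         # the full-width space, U+3000
print(char_a, full_space == "\u3000")  # A True

# The two functions are inverses of each other
round_trip = chr(ord("\uff21"))
print(round_trip == "\uff21")   # True
```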
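5. Full-width to half-width:
```python
def strQ2B(ustring):
    """Convert full-width characters in ustring to half-width."""
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 12288:  # full-width space: convert directly
            inside_code = 32
        elif 65281 <= inside_code <= 65374:  # other full-width characters: apply the offset from section 2
            inside_code -= 65248
        rstring += chr(inside_code)  # chr() as in Python 3; Python 2 needs unichr()
    return rstring
```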
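The reverse direction follows from the same rule in section 2 (half-width + 65248 = full-width, with the space handled separately). The sketch below assumes Python 3; the function name strB2Q is hypothetical, chosen only to mirror strQ2B, and is not taken from the article.

```python
def strB2Q(ustring):
    """Convert half-width characters in ustring to their full-width forms."""
    result = []
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 32:               # half-width space: map directly to U+3000
            inside_code = 12288
        elif 33 <= inside_code <= 126:      # other half-width characters: add the offset
            inside_code += 65248
        # Anything else (for example Chinese characters) is left unchanged
        result.append(chr(inside_code))
    return "".join(result)

converted = strB2Q("A1 !")
print([hex(ord(c)) for c in converted])  # ['0xff21', '0xff11', '0x3000', '0xff01']
```

Collecting the characters in a list and joining once at the end avoids repeated string concatenation; the behavior is otherwise the mirror image of the full-width to half-width case.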