2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how Python knows how many bytes to group together into one character when it decodes bytes data.
When writing Python, we often encode a string into bytes data, or decode bytes data into a string.
In UTF-8, which str.encode() uses by default, most common Chinese characters take 3 bytes, so encoding a single Chinese character produces 3 bytes. For example:
>>> a = '青'
>>> a.encode()
b'\xe9\x9d\x92'
>>> b = '青南'
>>> b.encode()
b'\xe9\x9d\x92\xe5\x8d\x97'
Note that \xe9 here should be read as a single unit: it represents one byte, written as a two-digit hexadecimal number.
So when I convert the bytes data \xe9\x9d\x92\xe5\x8d\x97 back into a string, Python turns \xe9\x9d\x92 into 青 and \xe5\x8d\x97 into 南. It looks as if Python knows that the bytes should be processed in groups of three.
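The sizes above can be checked by comparing the length of the string with the length of its encoded form. This is a minimal sketch; the variable names `text` and `data` are chosen only for illustration.

```python
# len() on a str counts characters; len() on bytes counts bytes.
text = '青南'
data = text.encode('utf-8')  # str.encode() also defaults to UTF-8

print(len(text))  # number of characters
print(len(data))  # number of bytes
print(data.decode('utf-8') == text)  # decoding restores the original string
```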
However, an emoji takes 4 bytes in UTF-8. For example, the emoji 🤔 corresponds to the bytes data \xf0\x9f\xa4\x94.
If I encode 青🤔南 as bytes data, the value is \xe9\x9d\x92\xf0\x9f\xa4\x94\xe5\x8d\x97, 10 bytes in total.
So the question is: what happens when I decode this bytes data?
Python correctly splits the bytes data into:
\xe9\x9d\x92 corresponds to 青
\xf0\x9f\xa4\x94 corresponds to 🤔
\xe5\x8d\x97 corresponds to 南
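This split can be reproduced by decoding the 10 bytes and then re-encoding each resulting character on its own. The sketch below only uses the built-in `bytes.decode()` and `str.encode()`; the name `groups` is an illustrative choice.

```python
data = b'\xe9\x9d\x92\xf0\x9f\xa4\x94\xe5\x8d\x97'
text = data.decode('utf-8')

# Re-encode each character separately to see which bytes belong to it.
groups = [(ch, ch.encode('utf-8')) for ch in text]
for ch, encoded in groups:
    print(ch, len(encoded), encoded)
```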
How does Python know to group the four bytes \xf0\x9f\xa4\x94 together? Why doesn't it group them like this?
\xe9\x9d\x92
\xf0\x9f\xa4
\x94\xe5\x8d\x97
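To see that this alternative grouping is not just unlikely but invalid, each chunk can be decoded on its own. In the sketch below, `wrong_groups` and `results` are illustrative names; a chunk that is not valid UTF-8 raises `UnicodeDecodeError`, which is recorded here as `None`.

```python
wrong_groups = [b'\xe9\x9d\x92', b'\xf0\x9f\xa4', b'\x94\xe5\x8d\x97']

results = []
for chunk in wrong_groups:
    try:
        results.append(chunk.decode('utf-8'))
    except UnicodeDecodeError as exc:
        # The chunk is not a valid UTF-8 sequence by itself.
        results.append(None)
        print(chunk, '->', exc.reason)

print(results)
```

Only the first chunk decodes; the other two are rejected because they do not form complete, correctly started UTF-8 sequences.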
The answer becomes clear once we look at the bytes in binary.
The first byte of 青 is \xe9, where e9 is a hexadecimal number. It is 233 in decimal and 11101001 in binary.
The first byte of 南 is \xe5, where e5 is a hexadecimal number. It is 229 in decimal and 11100101 in binary.
The first byte of 🤔 is \xf0, where f0 is a hexadecimal number. It is 240 in decimal and 11110000 in binary.
If you can't see their differences, let's put them together and compare them:
11101001
11100101
11110000
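These three binary values can be printed directly in Python. The sketch below uses the built-in `format()` with the `'08b'` format spec to show each leading byte as 8 binary digits; the dictionary `leading` is an illustrative name.

```python
leading = {}
for ch in '青南🤔':
    first = ch.encode('utf-8')[0]  # indexing bytes yields an int
    leading[ch] = format(first, '08b')
    print(ch, hex(first), first, leading[ch])
```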
Do you see the difference? These Chinese characters take 3 bytes each, and in the bytes data the first byte of each begins with 1110 in binary. The emoji takes 4 bytes, and its first byte begins with 1111 (in UTF-8, the full prefix of a 4-byte character is 11110).
So when Python has to convert bytes data into a string, the leading bits of a character's first byte tell it how many bytes belong to that character.
Given the bytes data \xe9\x9d\x92\xf0\x9f\xa4\x94\xe5\x8d\x97, Python sees that the high 4 bits of the first byte are 1110, so it parses this byte together with the following two bytes (3 bytes in total) and gets 青. It skips the bytes it has already parsed and moves to the fourth byte, \xf0, whose high 4 bits are 1111, so this byte is parsed together with the next 3 bytes (4 bytes in total) to get 🤔. It then skips to the eighth byte, \xe5, whose high 4 bits are 1110, so this byte and the next two are parsed as a group to get 南. Done.
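The walk-through above can be sketched in pure Python. This is only an illustration of the grouping rule, not Python's actual decoder (which is implemented in C and also validates the continuation bytes); `sequence_length` and `split_utf8` are hypothetical helper names introduced here.

```python
def sequence_length(first_byte):
    """Return how many bytes a UTF-8 character occupies, given its first byte."""
    if first_byte >> 7 == 0b0:       # 0xxxxxxx: 1 byte (ASCII)
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx: 2 bytes
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx: 3 bytes
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx: 4 bytes
        return 4
    raise ValueError('not a valid UTF-8 leading byte: %#x' % first_byte)


def split_utf8(data):
    """Split bytes into per-character groups using only the leading bytes."""
    groups = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        groups.append(data[i:i + n])
        i += n  # skip the bytes that were just grouped
    return groups


groups = split_utf8(b'\xe9\x9d\x92\xf0\x9f\xa4\x94\xe5\x8d\x97')
print(groups)
print([g.decode('utf-8') for g in groups])
```

The sketch trusts that the input is well-formed UTF-8; a real decoder would additionally check that every following byte starts with the bits 10.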
Digits and English letters take only one byte in UTF-8, and their ASCII codes are less than 128, so the highest bit of the byte is 0. Every byte of a multi-byte character has a value of 128 or more, so its highest bit is 1. As a result, bytes data produced from a mixture of English letters, digits, and Chinese characters can still be grouped unambiguously when decoding.
That is why decoding bytes data combines several bytes together into one character.