Example Analysis of unicode and bytes in Python 3

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article walks through examples of unicode (str) and bytes in Python 3. The content is short and organized around small code samples.

When you write Python 3 programs, you see the bytes type everywhere. Python 2 has no separate bytes type (from Python 2.6 on, the name bytes is only an alias for str), and this is one of the significant differences between Python 3 and Python 2.

When I wrote Python 2 code, I often ran into encoding errors, because Python 2's support for Unicode was not ideal. In Python 3, every string literal you write is Unicode text unless you explicitly define it as bytes, which reduces the possibility of errors.

In Python 3 there are two string-like types. The default is str, which holds Unicode and is also known as the text type. But a program always performs I/O (disk, network), and I/O deals in binary data, which Python 3 represents with the bytes type. A bytes object is a sequence of bytes, each an integer from 0 to 255.
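The "sequence of integers" view can be checked directly. The following is a minimal sketch; the variable names are illustrative only.

```python
# A bytes object is an immutable sequence of integers, each in range(256).
data = b"abc"

first = data[0]        # indexing yields an int, not a one-byte bytes object
as_list = list(data)   # the underlying integer values
chunk = data[0:1]      # slicing yields bytes

# Values outside 0..255 are rejected when building a bytes object.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(first, as_list, chunk, out_of_range_ok)
```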

So how do you define a bytes object? There are two ways, for example:

    # Way 1: a bytes literal; it may contain only ASCII characters,
    # so other byte values are written as \x escapes
    x = b'abc'
    y = b'\xe6\x88\x91'
    print(x, y)

    # Way 2: encode Unicode text with a specific encoding
    t = bytes("我们", "UTF-8")
    # Prints b'\xe6\x88\x91\xe4\xbb\xac'; each of these Chinese characters takes three bytes in UTF-8
    print(t)
    # Returns 6: for bytes, len() is the length of the byte sequence
    print(len(t))
    # Returns 2: for str, len() counts the two characters
    print(len("我们"))

Next, the conversion between str and bytes. For example, after you read binary data from the network, you must convert it to str yourself; Python never converts implicitly between str and bytes. This seems like extra trouble, but it reduces your chances of making mistakes.
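A short sketch of what "no implicit conversion" means in practice. The variable names are illustrative.

```python
text = "abc"
raw = b"abc"

# Mixing the two types in one operation raises TypeError instead of guessing.
try:
    combined = text + raw
except TypeError:
    combined = None

# Comparison does not convert either: a str never equals a bytes object.
same = (text == raw)

# An explicit conversion makes both the intent and the encoding visible.
joined = text + raw.decode("ascii")

print(combined, same, joined)
```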

To convert str to bytes, you must choose an encoding, which determines how each character is turned into bytes, for example:

    x = "我"
    y = x.encode("UTF-8")
    z = x.encode("GBK")
    # Prints b'\xe6\x88\x91' b'\xce\xd2'
    print(y, z)
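Not every encoding can represent every character. As a hedged illustration, the sketch below encodes the character 我 with UTF-8 and GBK, then shows what happens with ASCII, which has no representation for it. The `errors="replace"` argument is a standard option of `str.encode`.

```python
x = "我"

utf8 = x.encode("UTF-8")
gbk = x.encode("GBK")

# ASCII cannot represent this character, so strict encoding fails.
try:
    x.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# errors="replace" substitutes "?" instead of raising; the character is lost.
lossy = x.encode("ascii", errors="replace")

print(utf8, gbk, ascii_failed, lossy)
```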

To convert bytes to str, you also need an encoding, and you must know which encoding the binary data actually uses. If you choose the wrong one, decoding either raises an error or produces garbled text. Python itself does not track how binary data was encoded: as long as the object is of type bytes, it is just a sequence of bytes. For example:

    x = b'\xe6\x88\x91'
    print(x.decode("UTF-8"))  # prints 我
    # Raises UnicodeDecodeError: these bytes are not valid GBK
    print(x.decode("GBK"))
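A wrong encoding does not always fail loudly. The sketch below decodes the same three bytes in several ways; Latin-1 is chosen here only because it maps every byte value to some character, so it never raises.

```python
raw = "我".encode("UTF-8")   # b'\xe6\x88\x91'

# The correct encoding recovers the original text.
right = raw.decode("UTF-8")

# A wrong multi-byte encoding can fail with an exception.
try:
    raw.decode("GBK")
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

# Latin-1 accepts any byte, so the wrong choice "succeeds" and yields
# three unrelated characters instead of one.
wrong = raw.decode("latin-1")

# errors="replace" swaps undecodable bytes for U+FFFD rather than raising.
replaced = raw.decode("ascii", errors="replace")

# ascii() keeps the output printable on any terminal.
print(gbk_failed, ascii(right), ascii(wrong), ascii(replaced))
```

Because silent garbling is harder to notice than an exception, it helps to record the encoding alongside the data wherever possible.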

In a word: "Python uses unicode internally and bytes externally." Many functions in the built-in libraries specify whether they require str or bytes (strictly speaking, bytes-like objects such as bytes and bytearray), so read the documentation carefully when writing code. For example, the new function of the hmac library requires:

    hmac.new(key, msg=None, digestmod=None)
    key is a bytes or bytearray object giving the secret key
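A minimal sketch of calling `hmac.new` with text inputs. The key and message values are made up for illustration, and SHA-256 is an arbitrary choice of digest. Note that since Python 3.8 the `digestmod` argument is required, so it is passed explicitly here.

```python
import hashlib
import hmac

secret = "my-secret"   # hypothetical key, held as text
message = "hello"      # hypothetical message, held as text

# A str key is rejected: hmac wants bytes or bytearray.
try:
    hmac.new(secret, message.encode("UTF-8"), hashlib.sha256)
    str_key_accepted = True
except TypeError:
    str_key_accepted = False

# Encode both inputs explicitly before handing them to hmac.
mac = hmac.new(secret.encode("UTF-8"), message.encode("UTF-8"), hashlib.sha256)
digest = mac.hexdigest()   # the result comes back as a str of hex digits

print(str_key_accepted, len(digest))
```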

Many libraries, especially third-party ones such as requests, do a lot of conversion work internally in order to support both Python 2 and Python 3, so you may never notice that the bytes type exists. That improves productivity, but it does little for your understanding of Python.

To fully understand how bytes and str are used, look at the built-in function open and the write method of the file objects it returns.

When you open a file in text mode, Python automatically decodes its contents to str, for example:

    file = "t.txt"
    with open(file, mode="r") as f:
        t = f.read()

If you open the file in binary mode and want to display the contents in the terminal, you need to convert them to str, for example:

    file = "t.txt"
    with open(file, mode="rb") as f:
        t = f.read()
    print(t.decode())  # decode() defaults to UTF-8
    print(t, type(t))

When writing in binary mode, write bytes data directly, for example:

    file = "t.txt"
    with open(file, mode="wb") as t:
        t.write(b'\xe6\x88\x91')

None of the above examples specifies an encoding. When a file is opened in text mode without an explicit encoding, the default is generally whatever locale.getpreferredencoding() returns. Note that t.decode() with no argument is a separate case: it always defaults to UTF-8.
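To avoid depending on the machine's locale, one option is to pass the encoding explicitly. The sketch below is a self-contained round trip under that assumption; it uses a temporary directory, so the file name t.txt is only a throwaway path.

```python
import locale
import os
import tempfile

# The encoding that text-mode open() falls back to when none is given.
locale_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t.txt")   # throwaway file for this sketch

    # Binary mode: write() takes bytes.
    with open(path, mode="wb") as f:
        f.write("我".encode("UTF-8"))

    # Binary mode: read() returns the raw bytes.
    with open(path, mode="rb") as f:
        raw = f.read()

    # Text mode with an explicit encoding: read() returns decoded str.
    with open(path, mode="r", encoding="UTF-8") as f:
        text = f.read()

    # Text-mode files reject bytes, just as binary-mode files reject str.
    with open(path, mode="w", encoding="UTF-8") as f:
        try:
            f.write(b"abc")
            bytes_accepted = True
        except TypeError:
            bytes_accepted = False

# ascii() keeps the output printable on any terminal.
print(locale_encoding, raw, ascii(text), bytes_accepted)
```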
