About unicode and str in python2 and str and bytes in python3

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

If you often run into this kind of error message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128), or sadly discover that a program which works properly in Eclipse prints the message above as soon as you run it in a terminal, then it proves that you, like me, have been bitten by Python's tragic encoding problem.

I was writing my graduation project in Python, and everything was fine until, with the whole thing finished, I had a whim to test its support for Chinese. Then, as you can guess, the disgusting error above popped up, followed by a long stretch of trying all kinds of changes, hitting all kinds of errors, and wanting to smash the keyboard. In fact, as mentioned in a previous log, the quickest way to make this kind of problem go away is to add the following lines at the beginning of the program:

    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")

This takes care of roughly 95% of such problems, but if you want to get to the bottom of it, there is a lot more to know.

First of all, this is a problem with Python itself. In Python 2.x, the default str is not a string as we usually understand it, but a byte array (it can also be thought of as a string of pure ASCII characters). It corresponds to the bytes type in Python 3. The truly universal string type is unicode, which corresponds to str in Python 3. This seemingly bizarre design, in which a type that ought to be a byte array is used as the string type, has long been one of the most criticized parts of Python 2, but it could not be changed without breaking compatibility with existing programs.
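The same split can be observed directly on the Python 3 side of this correspondence. The snippet below is a minimal sketch, assuming a Python 3 interpreter (it is not the Python 2 code discussed in this article); the names text and data are chosen only for illustration, and "\u55b5" is the character 喵 used later in the article.

```python
# Python 3 sketch: str holds Unicode code points, bytes holds raw bytes.
text = "\u55b5"                  # str, the counterpart of Python 2's unicode
data = text.encode("utf-8")      # bytes, the counterpart of Python 2's str

print(type(text).__name__, len(text))   # str 1   (one character)
print(type(data).__name__, len(data))   # bytes 3 (three UTF-8 bytes)
print(list(data))                       # [229, 150, 181]
```

One character becomes three bytes under UTF-8, which is why treating a byte array as if it were a string goes wrong as soon as the text is not pure ASCII.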

Since Python 2 has two string types, str and unicode, all kinds of conversions between them are needed. The first kind is explicit conversion, that is, encode and decode. The meanings of these two methods are easily mixed up; the correct way to call them is:

    str     -- decode method --> unicode
    unicode -- encode method --> str

For example:

    >>> type('x')
    <type 'str'>
    >>> type('x'.decode('utf-8'))
    <type 'unicode'>
    >>> type(u'x'.encode('utf-8'))
    <type 'str'>

The logic is this: a unicode string is encoded with the utf-8 encoding, that is, calling encode('utf-8') on it produces a result of the byte array type. Conversely, decoding a byte array produces a unicode string. Novices often find this confusing, but it stops being surprising once you are familiar with it.
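The round trip described above can be sketched as follows. This is an illustrative example, assuming Python 3, where the text type is str and the byte type is bytes; the helper names to_bytes and to_text are hypothetical and are not part of any library.

```python
def to_bytes(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text into raw bytes (Python 2: unicode -> str)."""
    return text.encode(encoding)


def to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes back into text (Python 2: str -> unicode)."""
    return raw.decode(encoding)


original = "x\u55b5"             # 'x' followed by one Chinese character
raw = to_bytes(original)
restored = to_text(raw)

print(raw)                       # b'x\xe5\x96\xb5'
print(restored == original)      # True
```

In Python 3 the two directions can no longer be mixed up by accident, because str has no decode method and bytes has no encode method.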

The other kind is implicit conversion, similar to int being promoted to double in C: when a unicode string is concatenated with a str string, the str is automatically converted to unicode first, and then the two are concatenated. The encoding used for this conversion is the system default encoding, which you can check with:

    import sys
    print sys.getdefaultencoding()

This prints the current default encoding. Is it 'ascii'? If so, congratulations, you have won the lottery. In that case, code like the following is guaranteed to raise an error:

    >>> x = u'喵'
    >>> x
    u'\u55b5'
    >>> y = x.encode('utf-8')
    >>> x + y
    Traceback (most recent call last):
      File "", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

Here x is a variable of type unicode, and y is the str obtained by encoding x. To evaluate x + y, Python first has to convert y to a unicode string. But which encoding should it use for that conversion: utf-8, gb2312, or utf-16? The answer is whatever sys.getdefaultencoding() returns.

Since sys.getdefaultencoding() returns 'ascii', and the ASCII table has no characters with codes of 128 or above, the conversion of course fails! By adding

    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")

you change the default encoding used for this conversion to utf-8. In most cases the strings in a program are utf-8 encoded anyway, so adding these three lines is enough.
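To see why the default encoding decides between failure and success, the hidden decoding step can be written out by hand. The sketch below assumes Python 3 and uses a hypothetical function, concat_like_python2, that imitates what Python 2 does for unicode + str; it is an illustration of the mechanism, not Python 2's actual implementation.

```python
def concat_like_python2(text: str, raw: bytes, default_encoding: str) -> str:
    """Imitate Python 2's `unicode + str`: decode the bytes, then join."""
    return text + raw.decode(default_encoding)


x = "\u55b5"                 # the unicode string from the example above
y = x.encode("utf-8")        # its UTF-8 bytes: b'\xe5\x96\xb5'

error = None
try:
    concat_like_python2(x, y, "ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc.reason)        # ordinal not in range(128)

joined = concat_like_python2(x, y, "utf-8")
print(len(joined))           # 2
```

With 'ascii' the first byte 0xe5 is already out of range, which matches the traceback above; with 'utf-8' the bytes decode cleanly. Python 3 never performs this decode implicitly: there, adding a str and a bytes object raises a TypeError.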

But doesn't adding this make the code feel a little dirty? To me, at least, it really does. So I think we should get into the habit of using explicit conversion as much as possible while writing a program: be clear about whether each function returns str or unicode, actively decode any str into unicode, and never mix the two. Code written this way is relatively clean. In addition, adding 'from __future__ import unicode_literals' at the top of the file makes the string literals in your code unicode by default.
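One way to apply this habit is to convert at the edges of the program and work only with text inside. The following is a minimal sketch, assuming Python 3 and assuming that incoming bytes are UTF-8 encoded; ensure_text and greet are hypothetical names invented for this example.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def greet(name):
    """Accept bytes or text at the boundary, use only text inside."""
    return "Hello, " + ensure_text(name)


print(greet(b"world") == greet("world"))   # True
```

Because the decode happens in one clearly named place, every other function can assume it receives and returns text, and no change to the default encoding is needed.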

Finally, I want to shout it out loud: str in Python 2.x is not a string, but an array of bytes!
