How Python 3 solves the thorny problem of character encoding

2025-01-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many newcomers are unclear about how Python 3 solves the thorny problem of character encoding. This article explains it in detail.

One of the most important improvements in Python 3 is that it fixes the pitfalls that strings and character encoding left behind in Python 2. The design of Python 2 strings has two well-known flaws:

ASCII is the default encoding, which makes handling Chinese and other non-ASCII text awkward.

Strings are split into two types, unicode and str, which misleads developers.

Of course, neither of these is a bug, and careful handling avoids the pitfalls. But Python 3 solves both problems well.

First, Python 3 sets the default encoding to UTF-8:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>>

Second, text and binary data are distinguished more clearly, represented by str and bytes respectively. All text is represented by the str type, which can hold any character in the Unicode character set, while binary data is represented by a new type, bytes.

str

>>> a = "a"
>>> a
'a'
>>> type(a)
<class 'str'>
>>> b = "禅"
>>> b
'禅'
>>> type(b)
<class 'str'>

bytes

In Python 3, prefixing the quotation marks with b marks the literal as an object of type bytes, which is simply a sequence of binary bytes. A bytes literal may contain ASCII characters and hexadecimal escapes such as \xe7, but it cannot contain Chinese or other non-ASCII characters directly.

>>> c = b'a'
>>> c
b'a'
>>> type(c)
<class 'bytes'>
>>> d = b'\xe7\xa6\x85'
>>> d
b'\xe7\xa6\x85'
>>> type(d)
<class 'bytes'>
>>>
>>> e = b'禅'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
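
The following sketch, using the same three bytes as above, shows one way to look at a bytes object as a sequence of integers and how non-ASCII text can still become bytes through an explicit encode. It is an illustration, not the only approach.

```python
d = b'\xe7\xa6\x85'

# Each element of a bytes object is an integer in range(256).
print(list(d))                         # [231, 166, 133]

# The same value can be built from integers or from a hex string.
print(bytes([231, 166, 133]) == d)     # True
print(bytes.fromhex("e7a685") == d)    # True

# Non-ASCII text has to be encoded explicitly to become bytes.
e = "禅".encode("utf-8")
print(e == d)                          # True
```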

The bytes type provides much the same operations as str, including slicing, indexing, and basic operators such as + and *. Note that indexing a bytes object returns an integer. However, str and bytes values cannot be combined with +, although this worked in Python 2.

>>> b"a" + b"c"
b'ac'
>>> b"a" * 2
b'aa'
>>> b"abcdef\xd6"[1:]
b'bcdef\xd6'
>>> b"abcdef\xd6"[-1]
214
>>> b"a" + "b"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
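
When the two types do need to be combined, one option is to convert one side first. The sketch below assumes UTF-8 data, and the helper names concat_as_text and concat_as_bytes are hypothetical, introduced here only for illustration.

```python
def concat_as_text(data: bytes, text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode the bytes, then join two str values."""
    return data.decode(encoding) + text


def concat_as_bytes(data: bytes, text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: encode the text, then join two bytes values."""
    return data + text.encode(encoding)


try:
    b"a" + "b"  # mixing the two types raises TypeError
except TypeError as exc:
    print(type(exc).__name__)          # TypeError

print(concat_as_text(b"a", "b"))       # ab
print(concat_as_bytes(b"a", "b"))      # b'ab'
```

Which direction to convert depends on what the result is for: text shown to a user is usually kept as str, while data written to a socket or binary file is usually kept as bytes.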

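Correspondence between the byte and character types in Python 2 and Python 3: str in Python 2 corresponds to bytes in Python 3, and unicode in Python 2 corresponds to str in Python 3.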

Encode and decode

Conversion between str and bytes is done with the encode and decode methods.

encode converts characters to bytes. It uses UTF-8 by default.

> s = "Python"

> > s.encode ()

B'Python\ xe4\ xb9\ x8b\ xe7\ xa6\ x85'

> s.encode ("gbk")

B'Python\ xd6\ xae\ xec\ xf8'

decode converts bytes to characters. It also uses UTF-8 by default.

> > b'Python\ xe4\ xb9\ x8b\ xe7\ xa6\ x85'.decode ()

'Python'

> > b'Python\ xd6\ xae\ xec\ xf8'.decode ("gbk")

Will Python' be helpful to you after reading the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.
