Analysis of Python String and Encoding Examples

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article walks through Python strings and encodings using examples. The approach is simple, quick, and practical, so let's get started.

A string is a data type, and the most important issue with strings is encoding.

Let's start with some examples of encodings.

Here are the differences between the common encodings, along with their advantages and disadvantages:

ASCII: invented in the United States. For example, the capital letter A is encoded as 65 and the lowercase letter z as 122. Each character takes one byte.

GB2312: created in China to add Chinese characters.

Unicode: unifies all languages into one set of encodings. A character is usually represented with two bytes (very rare characters need four).

This introduces a new problem, though. Unicode solves the garbled-text problem, but if the text is entirely English, a two-byte Unicode encoding needs twice as much space as ASCII, which is wasteful for storage and transmission.

This is why UTF-8 exists and is now used almost everywhere. UTF-8 encodes a Unicode character into 1 to 4 bytes depending on its code point (the original design allowed up to 6 bytes, but the current standard stops at 4): common English letters take 1 byte, and Chinese characters usually take 3 bytes. Web pages are almost all UTF-8 encoded.
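To make the size trade-off concrete, here is a minimal sketch that counts the bytes a few characters need in UTF-8 and in UTF-16. Using UTF-16 as a stand-in for the "two bytes per character" form of Unicode is an assumption made for illustration, and the sample characters are arbitrary.

```python
# Compare encoded sizes of a few sample characters.
# "utf-16-be" is used so that no byte-order mark is added to the output.
samples = ["A", "z", "中", "✐"]

sizes = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))
    sizes[ch] = (utf8_len, utf16_len)
    print(f"{ch!r}: code point {ord(ch)}, "
          f"UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

English letters come out smaller in UTF-8, while these Chinese and symbol characters come out one byte larger, which is the trade-off described above.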

Strings in Python

I am using Python 3, where strings are Unicode, so a Python string can hold text in multiple languages, for example:

> print ('including English' in Chinese)

Contains English in Chinese

For a single character, Python provides the ord() function to get the integer code of the character, and the chr() function to convert a code to the corresponding character, for example:

>>> ord('A')

65

>>> ord('中')

20013

>>> chr(66)

'B'

>>> chr(25991)

'文'

>>> chr(10000)

'✐'

To use chr(), you first need to know the integer code of the character. If you know it, you can also write the character as a hexadecimal escape:

>>> '\u4e2d\u6587'

'中文'

The two ways of writing are equivalent. Python's string type is str, which is represented in memory as Unicode, where one character corresponds to several bytes. To send a string over the network or save it to disk, you have to convert the str to bytes, which is a sequence of bytes.
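The equivalence between the literal and the escaped form can be checked directly. The following is a small sketch; the variable names are only illustrative.

```python
# ord() and chr() are inverses of each other for a single character.
ch = "中"
code_point = ord(ch)
round_trip = chr(code_point)
print(code_point, hex(code_point))  # 20013 0x4e2d

# A \u escape spells the same code point as four hexadecimal digits.
escaped = "\u4e2d\u6587"
literal = "中文"
print(escaped == literal)  # True
```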

In Python, data of type bytes is written with single or double quotation marks and a b prefix, for example:

x = b'ABC'

Note that 'ABC' and b'ABC' are not the same. The former is a str; the latter is bytes, in which each character takes only one byte.

A Unicode str can be encoded into bytes in a specified encoding with the encode() method, for example:

>>> 'ABC'.encode('ascii')

b'ABC'

>>> '中文'.encode('utf-8')

b'\xe4\xb8\xad\xe6\x96\x87'

>>> '中文'.encode('ascii')

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

As you can see, a pure-English str can be encoded to bytes with ASCII, and the content stays the same.

A str containing Chinese can be encoded to bytes with UTF-8.

A str containing Chinese cannot be encoded with ASCII: Python raises an error because Chinese characters fall outside the ASCII range.
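If the text may or may not be pure ASCII, the error can be caught instead of crashing the program. The helper below, encode_with_fallback, is a hypothetical name introduced for illustration and is not part of Python.

```python
def encode_with_fallback(text: str) -> bytes:
    """Try ASCII first; fall back to UTF-8 for non-ASCII text.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Raised when a character is outside the ASCII range (0-127).
        return text.encode("utf-8")


ascii_bytes = encode_with_fallback("ABC")
utf8_bytes = encode_with_fallback("中文")
print(ascii_bytes)  # b'ABC'
print(utf8_bytes)   # b'\xe4\xb8\xad\xe6\x96\x87'
```

Because UTF-8 encodes ASCII characters to the same single bytes, calling encode('utf-8') directly gives the same result for both inputs; the try/except is only here to show how the error surfaces.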

In bytes, any byte that cannot be displayed as an ASCII character is shown as \x##. Conversely, if you read a byte stream from the network or from disk, the data you get is bytes. To convert bytes to str, use the decode() method, for example:

>>> b'ABC'.decode('ascii')

'ABC'

>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')

'中文'

Note that if the bytes contain a sequence that cannot be decoded, the decode() method raises an error.

If only a few bytes are invalid, you can pass errors='ignore' to skip them, for example:

>>> b'\xe4\xb8\xad\xff'.decode('utf-8')

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte

>>> b'\xe4\xb8\xad\xff'.decode('utf-8', errors='ignore')

'中'
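As a sketch of how the error handlers compare, the snippet below decodes the same invalid data three ways. errors='replace' is not mentioned in the original text; it is a standard handler that substitutes U+FFFD for each undecodable byte, which keeps the damage visible instead of silently dropping it.

```python
data = b"\xe4\xb8\xad\xff"  # valid UTF-8 for '中' followed by an invalid byte

# Strict decoding (the default) raises UnicodeDecodeError.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decoding failed:", exc.reason, "at position", exc.start)

ignored = data.decode("utf-8", errors="ignore")    # drops the bad byte
replaced = data.decode("utf-8", errors="replace")  # inserts U+FFFD instead
print(repr(ignored))
print(repr(replaced))
```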

To count how many characters a str contains, use the len() function, for example:

>>> len('ABC')

3

>>> len('中文')

2

If you pass bytes instead, len() counts the number of bytes, for example:

>>> len(b'ABC')

3

>>> len(b'\xe4\xb8\xad\xe6\x96\x87')

6

>>> len('中文'.encode('utf-8'))

6

As you can see, a Chinese character encoded in UTF-8 usually takes 3 bytes, while an English character takes only 1 byte.

In practice, we convert between str and bytes all the time. To avoid garbled text, stick to UTF-8 for these conversions.

PS: Python also supports other encodings, for example encoding a str to GB2312, but that is more trouble than it is worth.

Python source code is itself a text file, so if it contains Chinese, save it as UTF-8 and put this line at the top of the file:

# -*- coding: utf-8 -*-

This comment is not decoration: it tells the Python interpreter to read the source code as UTF-8, otherwise Chinese text written in the code may come out garbled. (Python 3 already assumes UTF-8 for source files by default, so the line mainly matters for Python 2 or for making the encoding explicit.)

If the .py file is saved as UTF-8 and declares # -*- coding: utf-8 -*-, running it from the command prompt displays Chinese correctly.
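The same "always say UTF-8" habit applies when reading and writing files. Here is a small sketch that writes a string to a temporary file and reads it back; the file name demo.txt is made up for the example.

```python
import tempfile
from pathlib import Path

text = "Hello, 中文"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file name
    # Passing encoding explicitly avoids depending on the platform default.
    path.write_text(text, encoding="utf-8")
    raw = path.read_bytes()                      # bytes as stored on disk
    restored = path.read_text(encoding="utf-8")  # decoded back to str

print(raw)
print(restored == text)  # True
```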

Formatting

Another common task is producing formatted strings. In Python this can be done with the % operator, for example:

>>> 'Hello, %s' % 'world'

'Hello, world'

>>> 'Hi, %s, you have $%d.' % ('jack', 1000000)

'Hi, jack, you have $1000000.'

The % operator formats a string: %s is replaced with a string and %d with an integer. There must be as many variables or values after the % as there are %? placeholders, in the same order. If there is only one %?, the parentheses can be omitted.

Common placeholders are as follows:

%d: integer

%f: floating-point number

%s: string

%x: hexadecimal integer

When formatting integers and floating-point numbers, you can also specify zero-padding and the number of integer and decimal digits.
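A short sketch of width, zero-padding, and precision with the % operator follows; the values are arbitrary.

```python
# %2d pads to width 2 with spaces; %02d pads to width 2 with zeros.
padded = '%2d-%02d' % (3, 1)

# %.2f keeps two digits after the decimal point.
rounded = '%.2f' % 3.1415926

# %x formats an integer in hexadecimal.
as_hex = '%x' % 255

print(repr(padded))  # ' 3-01'
print(rounded)       # 3.14
print(as_hex)        # ff
```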

If you are not sure which placeholder to use, %s always works: it converts any data type to a string, for example:

>>> 'Age: %s. Gender: %s' % (25, True)

'Age: 25. Gender: True'

Sometimes the % in a string is just an ordinary character. In that case it must be escaped, using %% to represent a single %, for example:

>>> 'growth rate: %d%%' % 7

'growth rate: 7%'

format()

Another way to format a string is the string's format() method, which replaces the placeholders {0}, {1}, and so on in the string with the arguments passed in.

This is more verbose than using %, for example:

>>> 'Hello, {0}, the score has been improved by {1:.1f}%'.format('Xiaoming', 17.125)

'Hello, Xiaoming, the score has been improved by 17.1%'

When formatting strings, it is convenient to try things out in Python's interactive environment.
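As a quick side-by-side check, the sketch below builds the same sentence with the % operator and with format(). The variable names are illustrative only.

```python
name, gain = "Xiaoming", 17.125

# With %, a literal percent sign must be written as %%.
with_percent = "Hello, %s, the score has been improved by %.1f%%" % (name, gain)

# With format(), % is an ordinary character and {1:.1f} carries the precision.
with_format = "Hello, {0}, the score has been improved by {1:.1f}%".format(name, gain)

print(with_format)
print(with_percent == with_format)  # True
```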

That wraps up this look at Python strings and encodings. The best way to absorb it is to try the examples yourself.
