What are the knowledge points of Python coding? 02/10 Update SLTechnology News&Howtos


2026-02-10 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces the key points of character encoding in Python. Many people are confused by encoding issues in their daily work, so the editor consulted a variety of sources and put together some simple, practical methods. I hope it helps answer the question "what are the key points of Python encoding?" Please follow along with the editor!

1. str and bytes in Python 3

In Python 3, there are two string types: str and bytes.

Let's look at the difference between the two:

Unicode string (str type): a sequence of Unicode code points, the form humans read

Byte string (bytes type): a sequence of raw bytes, the form machines store and transmit
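To make the distinction concrete, here is a minimal sketch (the variable names are illustrative) that looks at the same two-character text first as a str and then as the bytes produced by UTF-8 encoding. Iterating over a str yields characters, whose code points ord returns; iterating over bytes yields integers from 0 to 255.

```python
text = "你好"                # str: a sequence of Unicode code points ("hello" in Chinese)
data = text.encode("utf-8")  # bytes: the UTF-8 encoding of those code points

# Each character corresponds to exactly one code point.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)  # ['0x4f60', '0x597d']

# Each element of a bytes object is an integer in range(256).
print(list(data))  # [228, 189, 160, 229, 165, 189]

# Two characters, but six bytes once encoded as UTF-8.
print(len(text), len(data))  # 2 6
```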

Any string you define with plain quotes in Python 3 is a Unicode string (str). You can check this with type and isinstance:

# python3
>>> str_obj = "Hello"
>>> type(str_obj)
<class 'str'>
>>> isinstance("Hello", str)
True
>>> isinstance("Hello", bytes)
False

bytes, by contrast, is a binary sequence type: prefixing a string literal with b defines a bytes object instead.

# python3
>>> byte_obj = b"Hello World!"
>>> type(byte_obj)
<class 'bytes'>
>>> isinstance(byte_obj, str)
False
>>> isinstance(byte_obj, bytes)
True

However, a bytes literal may only contain ASCII characters, so you cannot simply put b in front of a Chinese string. Instead, create a str and convert it with encode:

>>> byte_obj = b"你好"
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> str_obj = "你好"  # "hello" in Chinese
>>> str_obj.encode("utf-8")
b'\xe4\xbd\xa0\xe5\xa5\xbd'
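The restriction applies only to the source text of a bytes literal. A minimal sketch: compile the offending literal at runtime to trigger the same SyntaxError, then get the bytes in two legal ways, via encode or by spelling out each byte with a \x escape. The exact wording of the error message can vary between Python versions, so the code only checks that a SyntaxError occurs.

```python
# A bytes literal containing non-ASCII source characters is rejected at compile time.
try:
    compile('b"你好"', "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True

# Option 1: build a str and encode it.
via_encode = "你好".encode("utf-8")

# Option 2: write the bytes directly with \x escapes (the source text stays ASCII).
via_escapes = b"\xe4\xbd\xa0\xe5\xa5\xbd"

print(via_encode == via_escapes)  # True
```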

2. str and unicode in Python 2

Python 2's string types differ from Python 3's, so they need to be kept carefully apart.

Python 2 also has exactly two string types: unicode and str.

The distinction is between unicode objects and non-unicode objects (that is, str objects):

Unicode string (unicode type): a sequence of Unicode code points, the form humans read

Byte string (str type): a sequence of raw bytes, the form machines store and transmit

A string literal written with plain single or double quotes is a str (byte string) object. Note that in Python 2, bytes is just an alias for str:

# python2
>>> str_obj = "Hello"
>>> type(str_obj)
<type 'str'>
>>> isinstance(str_obj, bytes)
True
>>> isinstance(str_obj, str)
True

Putting a u in front of the quotes defines a unicode string object instead:

# python2
>>> unicode_obj = u"Hello"
>>> type(unicode_obj)
<type 'unicode'>
>>> isinstance(unicode_obj, bytes)
False
>>> isinstance(unicode_obj, str)
False
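Python 2 is rarely installed nowadays, so the snippet below does not run Python 2 itself. It is a small Python 3 sketch of how the two Python 2 types map onto Python 3: Python 2's unicode corresponds to Python 3's str, and Python 2's str corresponds to Python 3's bytes. Python 3.3 and later still accept the u prefix (PEP 414), but it simply produces an ordinary str.

```python
# In Python 3 the u prefix is accepted but has no effect: the result is a plain str.
unicode_like = u"你好"
print(type(unicode_like) is str)  # True

# Python 2's byte-oriented str corresponds to Python 3's bytes.
bytes_like = unicode_like.encode("utf-8")
print(type(bytes_like) is bytes)  # True

# Unlike Python 2, Python 3 refuses to concatenate text and bytes implicitly.
try:
    unicode_like + bytes_like
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```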

3. How to detect the encoding of an object

Every character has a corresponding number in the Unicode character set, called its code point.

The rules for storing these code points as binary bytes are called encodings; examples include UTF-8 and GB2312.

In other words, when we save an in-memory string to disk we must choose an encoding, and when we read it back we must specify the same encoding (this step is called decoding); otherwise the text comes out garbled.
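Both ideas can be sketched with the standard library alone: a character has one fixed code point, but its bytes depend on the encoding, and decoding those bytes with the wrong encoding produces garbled text (mojibake). The file name and temporary directory below are hypothetical, chosen only for illustration.

```python
import os
import tempfile

ch = "中"
print(hex(ord(ch)))  # 0x4e2d: the code point does not depend on any encoding

# The same code point becomes different bytes under different encodings.
for enc in ("utf-8", "gbk", "utf-16-be"):
    print(enc, ch.encode(enc).hex())
# utf-8 e4b8ad
# gbk d6d0
# utf-16-be 4e2d

text = "中文"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    # Persist the text to disk using GBK.
    with open(path, "w", encoding="gbk") as f:
        f.write(text)
    # Reading back with the same encoding restores the original text.
    with open(path, encoding="gbk") as f:
        right = f.read()
    # Reading with a different (wrong) encoding yields garbled text.
    with open(path, encoding="latin-1") as f:
        wrong = f.read()

print(right)  # 中文
print(wrong)  # ÖÐÎÄ
```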

This raises a problem: decoding works fine when we know which encoding was used, but we do not always know it.

At this point, we will introduce a library of python, chardet, which needs to be installed before using it:

Python3-m pip install chardet

Chardet has a detect method that predicts its encoding format:

> import chardet > chardet.detect ('Wechat official account: Python programming time' .encode ('gbk')) {' encoding': 'GB2312',' confidence': 0.99, 'language':' Chinese'}

Why call it a guess? Look at the output above: the confidence field, a value between 0 and 1, indicates how sure chardet is that its guess is correct.

However, with only a few characters the guess can be wrong. In the example below, a two-character Chinese string is encoded with GBK, but chardet identifies it as KOI8-R:

>>> str_obj = "中文"  # "Chinese"
>>> byte_obj = bytes(str_obj, encoding='gbk')  # first get GBK-encoded bytes
>>> chardet.detect(byte_obj)
{'encoding': 'KOI8-R', 'confidence': 0.682639754276994, 'language': 'Russian'}
>>> str_obj2 = str(byte_obj, encoding='KOI8-R')
>>> str_obj2
'жпнд'

Therefore, to get an accurate result, give chardet as much text as possible.
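The misdiagnosis is easy to reproduce without chardet. A minimal sketch: the four GBK bytes of "中文" are also valid in KOI8-R and in Latin-1, so each of these decodes succeeds, and nothing in the bytes alone says which result is right. With so little input, a statistical detector has very little evidence to go on.

```python
data = "中文".encode("gbk")
print(data)  # b'\xd6\xd0\xce\xc4'

# All three decodes succeed without error; only one gives the intended text.
for enc in ("gbk", "koi8_r", "latin-1"):
    print(enc, data.decode(enc))
# gbk 中文
# koi8_r жпнд
# latin-1 ÖÐÎÄ
```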

chardet supports many languages and encodings; the full list is in its official documentation.
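When chardet is not available, or when you already know the handful of encodings your data is likely to use, a simpler approach is to try each candidate with strict decoding and keep the first that succeeds. This is a different technique from chardet's statistical detection: it produces no confidence value, and the order of the candidates matters. The helper name decode_with_fallback and its default candidate list are hypothetical choices for this sketch.

```python
from typing import Iterable, Tuple


def decode_with_fallback(
    data: bytes, candidates: Iterable[str] = ("utf-8", "gbk")
) -> Tuple[str, str]:
    """Hypothetical helper: return (text, encoding) for the first candidate
    that decodes `data` without error."""
    for enc in candidates:
        try:
            return data.decode(enc), enc  # strict mode raises on invalid bytes
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this last resort never fails,
    # although the result may be garbled.
    return data.decode("latin-1"), "latin-1"


print(decode_with_fallback("中文".encode("utf-8")))  # ('中文', 'utf-8')
print(decode_with_fallback("中文".encode("gbk")))    # ('中文', 'gbk')
```

UTF-8 is tried first because its strict byte structure rarely accepts data written in another encoding by accident, whereas GBK accepts many byte sequences and would happily turn UTF-8 input into garbage if it came first.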

4. The difference between encoding and decoding

Encoding and decoding are simply conversions between str and bytes in opposite directions. (Python 2 is long obsolete, so the examples in this section use Python 3 only.)

Encoding: the encode method converts a str object into a bytes object (a binary byte sequence).

Decoding: the decode method converts a bytes object back into a str object.
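A minimal round trip, assuming GBK as the encoding: encode turns the str into bytes, and decode turns the bytes back. Both methods also accept an errors argument ("strict" by default, or for example "replace"), which controls what happens when a character or byte cannot be converted.

```python
text = "中文"

data = text.encode("gbk")          # str -> bytes
print(data)                        # b'\xd6\xd0\xce\xc4'
print(data.decode("gbk") == text)  # True: bytes -> str restores the original

# ASCII cannot represent Chinese characters, so strict mode raises an exception.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(type(exc).__name__)      # UnicodeEncodeError

# errors="replace" substitutes '?' for each character that cannot be encoded.
print(text.encode("ascii", errors="replace"))  # b'??'
```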

[Figure: Unicode & Character Encodings in Python]

So, if we do know the encoding, how do we convert the bytes back into a Unicode string?

There are two ways:

The first is to call the decode method directly:

>>> byte_obj.decode('gbk')
'中文'

The second is to use the str constructor:

>>> str_obj = str(byte_obj, encoding='gbk')
>>> str_obj
'中文'
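The two approaches are equivalent whenever an encoding is supplied, and the same pairing exists in the other direction, where bytes(text, encoding) matches text.encode(encoding). One pitfall worth knowing: calling str() on a bytes object without an encoding does not decode it; it returns the object's printable representation instead.

```python
text = "中文"
byte_obj = text.encode("gbk")

# Decoding: both forms give the same str.
print(byte_obj.decode("gbk") == str(byte_obj, encoding="gbk"))  # True

# Encoding: both forms give the same bytes.
print(text.encode("gbk") == bytes(text, encoding="gbk"))  # True

# Pitfall: without an encoding, str() returns the repr, not the decoded text.
print(str(byte_obj))  # b'\xd6\xd0\xce\xc4'
```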

5. How to set the source file encoding

Python 2 reads source files as ASCII by default, so if a Python 2 source file contains Chinese characters, you will get an error like this:

SyntaxError: Non-ASCII character '\xe4' in file demo.py

The reason is that the ASCII table is too small to represent Chinese characters.

Python 3 reads source files as UTF-8 by default, which saves a lot of trouble.

In Python 2, there are usually two solutions to this problem:

(1) The first method

Declare the encoding in a comment on the first or second line of the file.

It can be written like this, which looks nice:

# -*- coding: utf-8 -*-

But it is tedious to type, so I usually use one of these two shorter forms:

# coding:utf-8
# coding=utf-8
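Python 3 still honors these declarations (defined in PEP 263); they matter when a file is saved in something other than UTF-8. A minimal sketch, assuming the source is held in memory as GBK bytes rather than in a real file: tokenize.detect_encoding reads the declaration, compile decodes the bytes accordingly, and the same bytes without a declaration are rejected because Python 3 then assumes UTF-8.

```python
import io
import tokenize

# Source code stored as GBK bytes, with a coding declaration on the first line.
source = '# -*- coding: gbk -*-\ntext = "中文"\n'.encode("gbk")

# tokenize.detect_encoding reads the declaration the way the interpreter does.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # gbk

# compile() accepts bytes and decodes them using the declared encoding.
namespace = {}
exec(compile(source, "<gbk-demo>", "exec"), namespace)
print(namespace["text"])  # 中文

# Without the declaration, Python 3 assumes UTF-8 and the GBK bytes are invalid.
undeclared = 'text = "中文"\n'.encode("gbk")
try:
    compile(undeclared, "<no-declaration>", "exec")
    accepted = True
except (SyntaxError, UnicodeDecodeError):  # normally reported as SyntaxError
    accepted = False
print(accepted)  # False
```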

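(2) The second method

# Python 2 only
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Here, reload(sys) must run before sys.setdefaultencoding('utf-8') is called to change the default encoding, because Python removes setdefaultencoding from the sys module during startup (site.py deletes it); reloading sys makes the function available again. Note that this setting only affects implicit conversions between str and unicode at runtime. It does not change how the interpreter reads the source file, so the SyntaxError above is fixed by the coding declaration of the first method.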
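None of this is needed in Python 3: the default encoding is fixed at UTF-8, and the sys module no longer has a setdefaultencoding function, so the usual practice is to pass an explicit encoding wherever bytes and text meet (open, encode, decode). A quick check to run under Python 3:

```python
import sys

print(sys.getdefaultencoding())            # utf-8
print(hasattr(sys, "setdefaultencoding"))  # False: the function does not exist in Python 3
```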
