How to understand the encoding of Python files

2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to understand Python file encoding, a topic that confuses many people in day-to-day work. It walks through what happens when Python reads a source file and how strings and bytes are converted, using simple, practical examples.

1. Source file encoding

Many people have heard that "the default encoding of Python 3 is UTF-8". That sounds a little abstract, so what does Python's default encoding actually mean?

When we run a .py file (test.py), the Python interpreter first reads the bytes of test.py, decodes them as UTF-8 by default (unless the file declares another encoding), and then compiles and runs the resulting code.
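One way to check which encoding Python would choose for a source file is the standard library function tokenize.detect_encoding, which looks for a byte order mark or a coding declaration and otherwise falls back to UTF-8. The following is a minimal sketch; the two byte strings are hypothetical file contents made up for illustration, and source_encoding is a helper name invented here.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode these source bytes."""
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical file contents, used only for illustration.
no_declaration = b"print('hi')\n"
gbk_declaration = b"# coding=gbk\nprint('hi')\n"

default_encoding = source_encoding(no_declaration)
declared_encoding = source_encoding(gbk_declaration)

print(default_encoding)   # utf-8
print(declared_encoding)  # gbk
```

The function only inspects the first two lines of the file, which is why a coding declaration has to appear at the very top of a script.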

Decoding and encoding come in pairs and must use the same scheme, otherwise the decoded data will differ from the original. In that respect it resembles AES encryption and decryption, where both sides must agree on the key.

Now imagine that test.py was saved by a text editor in GBK. Decoding it with the default UTF-8 produces garbled text, or fails outright when the bytes are not valid UTF-8.
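The pairing rule can be tried directly on a string, without involving a source file. This is a small sketch under the assumption that the sample text is 小甲; any Chinese text would behave similarly. Because strict decoding with the wrong codec may raise an error, the sketch falls back to errors="replace" so that the garbled result can still be inspected.

```python
text = "小甲"  # sample text, assumed for illustration

gbk_bytes = text.encode("gbk")

# Matching codec: the round trip restores the original text.
round_trip = gbk_bytes.decode("gbk")

# Mismatched codec: strict decoding may raise UnicodeDecodeError, so fall
# back to errors="replace", which substitutes U+FFFD for invalid bytes.
try:
    mismatched = gbk_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    mismatched = gbk_bytes.decode("utf-8", errors="replace")
    strict_failed = True

print(round_trip == text)   # True
print(mismatched == text)   # False
print(strict_failed)
```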

Here is some code to demonstrate:

Test environment:
OS: macOS 10.15
IDE: PyCharm
Python: Python 3.8
Author: Xiyuan Childe

1.1, case 1:

The test script is test.py, and the script file itself is saved in UTF-8 encoding.

Content:

    # coding=gbk
    # Author: zwjjiaozhu
    # Date: 2021
    # IDE: VsCode
    import sys

    # Default encoding used by str.encode() and bytes.decode()
    print(sys.getdefaultencoding())

    name = '小甲'  # "Xiaojia"; the file is saved as UTF-8 but declared as GBK
    print(f"name: {name}\nname_type: {type(name)}\nrepr: {repr(name)}")

    # Write the decoded value of name using two different encodings
    with open('utf.txt', 'w', encoding='utf8') as f:
        f.write(name)
    with open('gbk.txt', 'w', encoding='gbk') as f:
        f.write(name)

Results:

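    utf-8
    name: (garbled characters instead of Xiaojia)
    name_type: <class 'str'>
    repr: (the same garbled characters, in quotes)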

This is probably confusing. Why does print show garbled characters? And why does the file written with GBK display correctly as Xiaojia, while the file written with UTF-8 displays garbled text?

Here is what happens, step by step:

The first line of the code above is # coding=gbk. It tells the interpreter to decode test.py using GBK, convert the bytes to the corresponding Unicode code points, and then run the code.

The value in name = 'Xiaojia' is Chinese text stored as UTF-8 bytes. Decoding those bytes as GBK maps them to the wrong Unicode code points, so the printed value is garbled. The rest of the code consists of ASCII letters and symbols, which GBK and UTF-8 encode identically, so it is unaffected. (ASCII was, after all, invented in the United States.)
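The effect of the declaration can be imitated in memory by decoding the same bytes twice. This is only a sketch of what the interpreter does with the file contents, not the interpreter's actual code path; the Chinese literal 小甲 is an assumption standing in for the name used in the test script.

```python
# Sample source line; the Chinese text is assumed for illustration.
line = "name = '小甲'"
file_bytes = line.encode("utf-8")      # what a UTF-8 editor saves to disk

as_utf8 = file_bytes.decode("utf-8")   # default decoding: the text survives
as_gbk = file_bytes.decode("gbk")      # decoding requested by "# coding=gbk"

ascii_part = "name = '"

print(as_utf8)
print(as_gbk)                          # ASCII part intact, Chinese part garbled
print(as_gbk.startswith(ascii_part))   # True
print(as_gbk == as_utf8)               # False
```

The ASCII prefix is identical in both results because GBK and UTF-8 agree on every byte below 0x80; only the multi-byte characters are misread.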

Then write to the file:

With encoding='utf8', the value of name as decoded by the interpreter (the garbled text) is written to utf.txt in UTF-8. Notepad opens utf.txt as UTF-8 by default, so it shows the same garbled text. Encoding and decoding are paired here, and the shared scheme is UTF-8.

With encoding='gbk', the same garbled Unicode value of name is written to gbk.txt in GBK. If you open this file as UTF-8, it displays Xiaojia correctly; if you open it as GBK, it still shows the garbled text. The GBK decoding of test.py and the GBK encoding used when writing the file cancel each other out, so gbk.txt contains exactly the original UTF-8 bytes from test.py.
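The cancellation described above can be verified on bytes in memory, without writing any files. This is a minimal sketch that assumes the name in test.py is 小甲; the variable names utf_txt_bytes and gbk_txt_bytes are made up here to stand for the contents of utf.txt and gbk.txt.

```python
original = "小甲"  # assumed sample text

# test.py is saved as UTF-8 but declared as GBK.
file_bytes = original.encode("utf-8")
name = file_bytes.decode("gbk")          # garbled value the program sees

# The two writes performed by the script.
utf_txt_bytes = name.encode("utf-8")
gbk_txt_bytes = name.encode("gbk")

print(gbk_txt_bytes == file_bytes)                # True: decode and encode cancel
print(gbk_txt_bytes.decode("utf-8") == original)  # True: reads correctly as UTF-8
print(gbk_txt_bytes.decode("gbk") == name)        # True: garbled when read as GBK
print(utf_txt_bytes.decode("utf-8") == name)      # True: garbled when read as UTF-8
```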

If this is still unclear, sketching the steps by hand is highly recommended. The original article includes a flow chart here to deepen understanding.

[Figure: flow chart of the decoding and encoding steps]

That was exhausting to write, but it is finally explained clearly.

2. String encoding

Python has two common types for text data: str (strings) and bytes. Strings are usually converted to bytes for network transfer and when writing to files. For example, "你好" and b"\xe4\xbd\xa0\xe5\xa5\xbd" (its UTF-8 encoding) represent the same text.

2.1, string and byte conversion

In a bytes literal, each byte outside the printable ASCII range is written as \x followed by two hexadecimal digits.
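A short sketch of how this notation relates to the underlying byte values, using the UTF-8 encoding of 你好 as the sample:

```python
greeting = "你好"
encoded = greeting.encode("utf-8")

print(encoded)                       # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(encoded.hex())                 # e4bda0e5a5bd
print(len(greeting), len(encoded))   # 2 6  (two characters, six bytes)
print(list(encoded[:3]))             # [228, 189, 160]

# Printable ASCII bytes are shown as characters rather than \x escapes.
ascii_encoded = "12".encode("utf-8")
print(ascii_encoded)                 # b'12'
```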

Method 1: convert directly with encode and decode

    name = "你好"
    name_bytes = name.encode("utf8")   # encode with utf8 (gbk also works; ascii only for ASCII-only text)
    name2 = name_bytes.decode("utf8")  # decode with utf8 as well, so encoding and decoding match
    print(f"name: {name}, type: {type(name)}")
    print(f"name_bytes: {name_bytes}, type: {type(name_bytes)}")
    print(f"name2: {name2}, type: {type(name2)}")
    # Result:
    # name: 你好, type: <class 'str'>
    # name_bytes: b'\xe4\xbd\xa0\xe5\xa5\xbd', type: <class 'bytes'>
    # name2: 你好, type: <class 'str'>

Method 2: convert with the str and bytes constructors

    age = '12'
    age_bytes = bytes(age, encoding="utf8")     # str -> bytes
    age_str = str(age_bytes, encoding="utf8")   # bytes -> str
    print(f"age: {age}, type: {type(age)}")
    print(f"age_bytes: {age_bytes}, type: {type(age_bytes)}")
    print(f"age_str: {age_str}, type: {type(age_str)}")
    # Result:
    # age: 12, type: <class 'str'>
    # age_bytes: b'12', type: <class 'bytes'>
    # age_str: 12, type: <class 'str'>

Description: in Python, strings exist in memory as Unicode. When converting between encodings, Unicode therefore acts as the intermediate form: first decode the bytes in the source encoding into a Unicode str, then encode that str into the target encoding.

decode converts bytes into a str. For example, str1.decode('gb2312') decodes str1, which must be a bytes object holding gb2312-encoded data, into a Python str.

encode converts a str into bytes. For example, str2.encode('gb2312') converts the string str2 into a bytes object using the gb2312 encoding. When transcoding, you must therefore first find out which encoding the data is in, and then decode it with that same encoding.
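The decode-then-encode route described above can be sketched as a small helper. The function name transcode and its parameters are invented for this illustration, and the sample data is assumed to be gb2312-encoded.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes from one encoding to another via a Unicode str."""
    text = data.decode(source_encoding)   # bytes -> str (Unicode)
    return text.encode(target_encoding)   # str -> bytes in the new encoding


# Sample data, assumed to be gb2312-encoded.
gb2312_bytes = "你好".encode("gb2312")
utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")

print(utf8_bytes)                    # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(utf8_bytes.decode("utf-8"))    # 你好
print(gb2312_bytes == utf8_bytes)    # False: same text, different bytes
```

Passing the wrong source_encoding would either raise UnicodeDecodeError or silently produce garbled text, which is why the source encoding has to be known in advance.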

This concludes the walkthrough of how to understand Python file encoding. Theory sticks best when combined with practice, so try the examples yourself.
