Common character conversion bugs in Python

2025-01-19 Update From: SLTechnology News&Howtos

1. Why does Python report an error when writing a unicode string to a file?

The write method expects a str argument, and in Python 2 a str is a byte string that carries no encoding information. When you pass a unicode object, Python implicitly converts it to str before handing it to write. That conversion from unicode to str is an encoding step. If no encoding is specified, ASCII is used by default, and because the ASCII character set contains no Chinese characters, the conversion fails with an error.

The right approach is to specify the encoding explicitly. Either pass it when opening the file, for example fp = open('test.txt', 'w', encoding='utf-8') (in Python 2 the built-in open does not accept an encoding argument, so use io.open or codecs.open there), or encode the unicode object to str yourself at write time, that is, fp.write(s.encode('utf8')). Note that encode is only meaningful for unicode objects. Python 2 does let you call encode on a str object, but it first decodes the str implicitly using the default encoding, so it only works when the default encoding can decode those bytes. Beginners are therefore advised not to call encode on a str directly.
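The two fixes described above can be sketched as follows. This is a minimal illustration, not the author's code: the file names, the temporary directory, and the sample string are assumptions, and io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u"\u4e2d\u6587"  # two Chinese characters, written as escapes (sample data)

# What the implicit conversion amounts to: encoding with ASCII, which fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Fix A: let the file object do the encoding.
path_a = os.path.join(tempfile.mkdtemp(), "test_a.txt")  # hypothetical path
with io.open(path_a, "w", encoding="utf-8") as fp:
    fp.write(text)

# Fix B: open in binary mode and encode explicitly before writing.
path_b = os.path.join(tempfile.mkdtemp(), "test_b.txt")  # hypothetical path
with io.open(path_b, "wb") as fp:
    fp.write(text.encode("utf8"))

# Both approaches should put the same UTF-8 bytes on disk.
with io.open(path_a, "rb") as fa, io.open(path_b, "rb") as fb:
    bytes_a, bytes_b = fa.read(), fb.read()

print(ascii_failed, bytes_a == bytes_b)
```

Either fix works; the point is that the encoding is chosen explicitly rather than left to the ASCII default.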

2. Error: UnicodeEncodeError: 'gbk' codec can't encode character u'\u200e' in position 43: illegal multibyte sequence

The root cause of the "'gbk' codec can't encode" error is as follows. Whether you decode the original byte string with

titleHtml.decode("UTF-8")

or titleHtml.decode("UTF-8", 'ignore')

or titleHtml.decode("UTF-8", 'replace')

you get a valid Unicode string, titleUni. To print it, however, Python must encode it for the console. The local system here is cmd on Windows 7, whose default code page is CP936, that is, GBK, so titleUni is first encoded as GBK and then displayed in cmd. Because titleUni contains characters that GBK cannot represent, the "'gbk' codec can't encode" error is raised at that point.

For this class of problem, read the error message as follows:

(1) UnicodeEncodeError -> the problem occurred while encoding Unicode.

(2) 'gbk' codec can't encode character -> the failure happened while encoding Unicode characters to GBK.

In that case, the most likely cause is that the Unicode string itself contains characters that cannot be converted to GBK.
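To confirm which character is at fault, the exception object itself can be inspected. The sketch below reproduces the error without a console by encoding to GBK directly; the sample string title_uni is hypothetical and simply ends with the U+200E character from the error message above.

```python
title_uni = u"title\u200e"  # hypothetical string ending in U+200E (left-to-right mark)

try:
    title_uni.encode("gbk")
    error = None
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # encoding, start, end and object are standard UnicodeEncodeError attributes
    bad_char = error.object[error.start:error.end]
    print(error.encoding, error.start, repr(bad_char))
```

The reported position and character show exactly which part of the string GBK cannot represent.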

There are two solutions:

Option 1:

When encoding the Unicode string, pass the 'ignore' argument so that characters that cannot be encoded are dropped and the rest is encoded as GBK normally.

The corresponding code is:

gbkTypeStr = unicodeTypeStr.encode("GBK", 'ignore')

Option 2:

Alternatively, encode it as GB18030, a superset of GBK (that is, GBK is a subset of GB18030):

gb18030TypeStr = unicodeTypeStr.encode("GB18030")

The result is then encoded in GB18030.
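The difference between the two options can be seen side by side in a short sketch. The sample string is an assumption chosen to contain one character (U+200E) that GBK cannot encode.

```python
unicode_type_str = u"abc\u200e"  # sample data: U+200E is not representable in GBK

# Option 1: drop characters GBK cannot encode.
gbk_ignored = unicode_type_str.encode("GBK", "ignore")

# Option 2: encode with the GB18030 superset, which keeps every character.
gb18030_bytes = unicode_type_str.encode("GB18030")
round_trip = gb18030_bytes.decode("GB18030")

print(gbk_ignored == b"abc", round_trip == unicode_type_str)
```

Option 1 silently loses the unencodable character, while option 2 preserves it. Which one is appropriate depends on whether whatever consumes the bytes can read GB18030.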
