In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article mainly explains "how to solve the problem of reading and writing garbled codes in csv documents". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "how to solve the problem of reading and writing garbled csv files".
You may have a similar experience, use excel to open a csv file, Chinese all show garbled. Then, open it manually with notepad++, modify the code to utf-8 and save it, and then open it with excel and display normally.
With Python today, the above process can be automated with very little code. First, import three modules:
# coding: utf-8 # @ author: zhenguo # @ date: 2020-12-16 # @ describe: functions about automatic file processing import pandas as pd import os import chardet
The chardet module is used to get the encoding format of the file, which is read by pandas and saved to xlsx format.
Get the encoding format of the filename file:
Def get_encoding (filename): "return file encoding format" with open (filename,'rb') as f: return chardet.detect (f.read ()) ['encoding'] "
Save as utf-8 encoding xlsx format file, support csv, xls, xlsx format file garbled handling. It is important to note that if the read-in file is in csv format, save it in xlsx format:
Def to_utf8 (filename): "" Save as to_utf-8 "" encoding= get_encoding (filename) ext = os.path.splitext (filename) if ext [1] = '.csv': if 'gb' in encoding or' GB' in encoding: df = pd.read_csv (filename,engine='python',encoding='GBK') else: df = pd.read_csv (filename) Engine='python',encoding='utf-8') df.to_excel (ext [0] + '.xlsx') elif ext [1] = = '.xls' or ext [1] = '.xlsx': if 'gb' in encoding or' GB' in encoding: df = pd.read_excel (filename,encoding='GBK') else: df = pd.read_excel (filename) Encoding='utf-8') df.to_excel (filename) else: print ('only support csv, xls, xlsx format')
The above function implements the conversion of a single file, and the following batch_to_utf8 implements batch garbled conversion of all ext_name files with suffixes under the directory path:
Def batch_to_utf8 (path,ext_name='csv'): "" under path, garbled files with the suffix ext_name are converted into readable files in batches "" for file in os.listdir (path): if os.path.splitext (file) [1] = ='.'+ ext_name: to_utf8 (os.path.join (path,file))
Call:
If _ _ name__ = ='_ main__': batch_to_utf8 ('.') # Save all csv files in the current directory as xlsx format, utf-8-encoded files
The problem of garbled code when reading and writing files is often encountered. I believe the to_utf8,batch_to_utf8 function in today's article will solve this problem. If you encounter it later, you might as well directly refer to these two functions to try.
At this point, I believe you have a deeper understanding of "how to solve the problem of reading and writing garbled csv files". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.