Common errors in Python data reading

2025-02-25 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains the common errors that come up when reading data in Python, mainly with pandas read_csv, and how to fix each of them.

1. UnicodeDecodeError

By default, pandas read_csv decodes the file as UTF-8. If the file was saved in another encoding (for example GBK, which is common for Chinese text on Windows), bytes that are not valid UTF-8 cause a UnicodeDecodeError.
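A minimal sketch that reproduces the error, assuming a tiny in-memory CSV saved in GBK (the sample text and the helper name read_with_encoding are hypothetical): decoding it with the default UTF-8 fails, while passing the correct encoding works.

```python
import io

import pandas as pd

# Hypothetical sample: a tiny CSV encoded in GBK instead of UTF-8
raw = "name,city\n张三,青州\n".encode("gbk")


def read_with_encoding(data: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Hypothetical helper: parse CSV bytes with the given encoding."""
    return pd.read_csv(io.BytesIO(data), encoding=encoding)


try:
    read_with_encoding(raw)  # read_csv's default encoding is UTF-8
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True  # GBK bytes are not valid UTF-8

# Passing the file's real encoding decodes it cleanly
df = read_with_encoding(raw, encoding="gbk")
print(utf8_failed)
print(df)
```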

But how do we find out which encoding a file uses? The chardet package can detect it: given the raw bytes, it returns the encoding it considers most likely. Install it first with pip install chardet. The function below returns the detected encoding, where file is the name of the file to read.

# get the file's encoding type (requires: pip install chardet)
import chardet


def get_encoding(file):
    # read in binary mode to get the raw bytes, then let chardet detect the encoding
    with open(file, 'rb') as f:
        return chardet.detect(f.read())['encoding']

Once chardet has detected the file's encoding, you can pass it to the encoding parameter, whether you read the file with Python's built-in open or with pandas read_csv.
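A sketch of the full workflow, under a few assumptions: chardet may not be installed everywhere, and its guess is a heuristic that can be wrong on short inputs, so this hypothetical variant of get_encoding checks the guess by actually decoding the bytes and falls back to a short candidate list (utf-8, gbk, latin-1, chosen here only for illustration). The detected name is then passed to both open and read_csv.

```python
import os
import tempfile

import pandas as pd

try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None  # fall back to trial decoding below


def get_encoding(file, candidates=("utf-8", "gbk", "latin-1")):
    """Hypothetical variant: chardet's guess first, then trial decoding."""
    with open(file, "rb") as f:
        data = f.read()
    guesses = []
    if chardet is not None:
        guesses.append(chardet.detect(data)["encoding"])
    guesses.extend(candidates)
    for enc in guesses:
        if not enc:
            continue  # chardet returns None when it cannot decide
        try:
            data.decode(enc)  # keep only an encoding that really decodes the bytes
            return enc
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("no candidate encoding could decode the file")


# Hypothetical sample file written in GBK
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="gbk") as f:
    f.write("name,city\n张三,青州\n李四,潍坊\n")

enc = get_encoding(path)
with open(path, encoding=enc) as f:  # Python's built-in open
    header = f.readline().strip()
df = pd.read_csv(path, encoding=enc)  # pandas read_csv
print(enc, header)
print(df)
```

Note that latin-1 can decode any byte sequence, so as the last candidate it never fails but may produce garbled text; it is a last resort, not a correct answer.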

2. Sep delimiter

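read_csv assumes a comma delimiter by default, which matches ordinary CSV files. However, files exported from large data warehouses such as Hive sometimes use a tab (\t) as the delimiter, so you need to set the sep parameter accordingly, for example sep='\t'. With the wrong delimiter, pandas typically either puts each whole line into a single column or raises a tokenizing error. This kind of mistake is easy to fix.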
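A small sketch of the difference, using a hypothetical tab-separated sample of the kind a Hive export might produce: with the default comma delimiter each line lands in a single column, while sep='\t' splits it correctly.

```python
import io

import pandas as pd

# Hypothetical tab-separated export (e.g. from a Hive query)
tsv = "id\tprovince\tcity\n1\tShandong\tWeifang\n2\tShandong\tQingzhou\n"

# Default sep="," finds no commas, so every line becomes one column
wrong = pd.read_csv(io.StringIO(tsv))

# Matching the delimiter splits the fields as intended
right = pd.read_csv(io.StringIO(tsv), sep="\t")

print(wrong.shape, right.shape)
print(right)
```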

3. Error when a row's number of fields does not match the number of columns

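This is especially frustrating with files of hundreds of millions of lines: the read gets almost to the end and then suddenly fails because one line parses into a different number of fields than the rows before it. pandas reports this as a ParserError along the lines of "Error tokenizing data. C error: Expected N fields in line M, saw K".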

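At this point, you can tell pandas to skip such lines instead of failing. In older pandas versions this is done with error_bad_lines=False; that parameter was deprecated in pandas 1.3 and removed in pandas 2.0, where on_bad_lines='skip' takes its place.

pandas.read_csv(..., error_bad_lines=False)  # pandas < 2.0 (deprecated since 1.3)
pandas.read_csv(..., on_bad_lines='skip')    # pandas >= 1.3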
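A minimal sketch with hypothetical data, assuming pandas 1.3 or later (required for on_bad_lines): one line carries an extra field, the default read raises ParserError, and on_bad_lines="skip" drops that line.

```python
import io

import pandas as pd

# Hypothetical data: the "Bob" line has one field too many
csv_text = "id,name,score\n1,Alice,90\n2,Bob,85,extra\n3,Carol,77\n"

try:
    pd.read_csv(io.StringIO(csv_text))
    raised = False
except pd.errors.ParserError as exc:
    raised = True
    print(exc)  # the message reports the expected and actual field counts

# pandas >= 1.3: drop the malformed line instead of failing
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)
```

Skipping silently discards data, so on a large file it is worth counting how many rows were lost (on_bad_lines="warn" reports each skipped line instead).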

In real projects, the data you read is often messier than expected. Suppose the data file uses the default comma delimiter, and one cell in a row contains the value:

Shandong Province, Weifang City, Qingzhou City

Because this single cell contains commas, it is parsed as several columns, so an error is naturally reported. Data like this must be cleaned thoroughly before it is read.
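One way to clean such data, assuming you control how the file is written, is to quote any field that contains the delimiter; read_csv's default quoting then keeps the quoted text in one cell. The sketch below (the address comes from the example above, the other values are hypothetical) shows the unquoted row failing and a to_csv round trip succeeding.

```python
import io

import pandas as pd

address = "Shandong Province, Weifang City, Qingzhou City"

# Unquoted: the commas inside the address split it into extra fields
raw = f"id,address\n1,Beijing\n2,{address}\n"
try:
    pd.read_csv(io.StringIO(raw))
    raised = False
except pd.errors.ParserError:
    raised = True  # the last line has 4 fields instead of 2

# Cleaned: to_csv quotes any field that contains the delimiter
source = pd.DataFrame({"id": [1, 2], "address": ["Beijing", address]})
quoted = source.to_csv(index=False)  # returns the CSV text when no path is given
roundtrip = pd.read_csv(io.StringIO(quoted))
print(quoted)
print(roundtrip)
```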

4. "EOF inside string starting at line" error

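This error often occurs when a field in the file contains an unmatched quote character ("): the parser treats everything after the opening quote as one quoted string and reaches the end of the file before finding the closing quote. Fixing it requires changing the quoting parameter.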

import csv
import pandas as pd

# csvfile is the path of the file to read
df = pd.read_csv(csvfile, quoting=csv.QUOTE_NONE)
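A minimal sketch with hypothetical data: a comment field opens a quote that is never closed, so the default read fails with the "EOF inside string" error, while quoting=csv.QUOTE_NONE treats the quote as an ordinary character.

```python
import csv
import io

import pandas as pd

# Hypothetical data: the first comment opens a quote that is never closed
csv_text = 'id,comment\n1,"unterminated quote\n2,ok\n'

try:
    pd.read_csv(io.StringIO(csv_text))  # default quoting=csv.QUOTE_MINIMAL (0)
    raised = False
except pd.errors.ParserError as exc:
    raised = True
    print(exc)  # the message mentions "EOF inside string"

# QUOTE_NONE treats '"' as an ordinary character, so each line parses normally
df = pd.read_csv(io.StringIO(csv_text), quoting=csv.QUOTE_NONE)
print(df)
```

QUOTE_NONE is a trade-off: fields that are legitimately quoted because they contain the delimiter will then be split, so it suits files that do not rely on quoting.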

The default value of quoting is 0 (csv.QUOTE_MINIMAL). When you hit this error, choose another value according to the pandas documentation, which describes the parameter as follows:

quoting : int or csv.QUOTE_* instance, default 0
Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

Thank you for reading. The above covers the common errors in reading data with Python; the specific usage still needs to be verified in practice.


© 2024 shulou.com SLNews company. All rights reserved.
