How to use Pandas to deal with large files in blocks

Updated 2025-02-24, from SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/01 report:

This article explains how to use Pandas to process large files in chunks. The editor found it quite practical and is sharing it here for reference.

Processing large files in chunks using Pandas

Question: While processing Kuaishou user data today, I ran into a txt file of almost 600 MB. Sublime choked when I tried to open it. Reading it with pandas.read_table() took almost 2 minutes, and the result turned out to have almost 30 million rows. That was only opening the file; actually processing it would take far more effort.

Solution: I looked through the documentation. File-reading functions of this kind have two relevant parameters: chunksize and iterator.

The principle is to avoid reading the whole file into memory at once and to read it in several pieces instead.
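As a rough illustration of that principle, the sketch below counts the rows of a tab-separated source one piece at a time, so only one piece is held in memory at any moment. The in-memory sample data, its column names, and the tiny chunk size of 4 are assumptions made for the demonstration; they are not from the original file.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large tab-separated file (10 data rows).
data = "user_id\tpage\n" + "".join(f"{i}\t{i % 3}\n" for i in range(10))

total_rows = 0
chunk_sizes = []
# chunksize=4 is only for illustration; a real file would use a much larger value.
for chunk in pd.read_table(io.StringIO(data), sep="\t", chunksize=4):
    chunk_sizes.append(len(chunk))  # each chunk is an ordinary DataFrame
    total_rows += len(chunk)

print(chunk_sizes, total_rows)
```

The last chunk is simply shorter when the row count is not a multiple of the chunk size.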

1. Specify chunksize to read the file in chunks

read_csv and read_table have a chunksize parameter that specifies the chunk size (how many rows to read at a time). When it is set, they return an iterable TextFileReader object instead of a DataFrame.

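import pandas as pd

# path is the directory prefix of the data file, defined elsewhere
table = pd.read_table(path + 'kuaishou.txt', sep='\t', chunksize=1000000)
for df in table:
    # process each chunk here, e.g.
    # df.drop(columns=['page', 'video_id'], inplace=True)
    print(type(df), df.shape)  # print to inspect each chunk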
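Per-chunk processing is often combined with a running aggregate, so that a result for the whole file is built up without ever holding the whole file. The following is a minimal sketch of that idea, not the author's code: the sample data is invented, and counting values of the page column is just an assumed example task.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical sample with the column names used in the example above.
data = "user_id\tpage\tvideo_id\n" + "".join(
    f"{i}\t{i % 3}\t{100 + i}\n" for i in range(10)
)

page_counts = Counter()
for df in pd.read_table(io.StringIO(data), sep="\t", chunksize=4):
    # Count values within this chunk, then fold them into the running total.
    page_counts.update(df["page"].value_counts().to_dict())

print(dict(page_counts))
```

Only the small Counter survives between iterations; each chunk can be garbage-collected once its counts are merged.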

Here I split the file into several sub-files and process each one separately (and yes, to_csv also has a chunksize parameter).
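One way the split into sub-files might look is sketched below. The temporary directory, the file names big.txt and part_N.txt, and the sample rows are all hypothetical. Note that the chunksize parameter of to_csv controls how many rows are written at a time to a single output file, so the split itself is done here by writing each chunk from the reader to its own file.

```python
import os
import tempfile

import pandas as pd

# Build a small hypothetical source file in a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "big.txt")
with open(source, "w") as f:
    f.write("user_id\tpage\n")
    for i in range(10):
        f.write(f"{i}\t{i % 3}\n")

part_paths = []
for n, df in enumerate(pd.read_table(source, sep="\t", chunksize=4)):
    part = os.path.join(workdir, f"part_{n}.txt")
    df.to_csv(part, sep="\t", index=False)  # each chunk becomes one sub-file
    part_paths.append(part)

# Read the parts back to check how many rows each one holds.
rows_per_part = [len(pd.read_table(p, sep="\t")) for p in part_paths]
print(rows_per_part)
```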

2. Specify iterator=True

iterator=True also returns a TextFileReader object.

reader = pd.read_table('tmp.sv', sep='\t', iterator=True)
df = reader.get_chunk(10000)  # get_chunk(size) returns a chunk of size rows
# df can then be processed like any other DataFrame
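Unlike a fixed chunksize, get_chunk lets each call ask for a different number of rows. The sketch below, which uses invented in-memory data, takes a small first chunk and then reads the rest in larger pieces; it relies on get_chunk raising StopIteration once the input is exhausted.

```python
import io
import pandas as pd

# Hypothetical stand-in for a tab-separated file with 10 data rows.
data = "user_id\tpage\n" + "".join(f"{i}\t{i % 3}\n" for i in range(10))

reader = pd.read_table(io.StringIO(data), sep="\t", iterator=True)

first = reader.get_chunk(3)  # e.g. peek at the first few rows
sizes = [len(first)]

while True:
    try:
        df = reader.get_chunk(5)  # later calls may use a different size
    except StopIteration:
        break  # no rows left
    sizes.append(len(df))

print(sizes)
```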
