
How to handle big data problems when saving Excel files with Python


This article shows you how to deal with big data problems when working with Excel in Python. The content is concise and easy to follow, and I hope you get something useful out of it.

Excel can do a lot of things, but it becomes a pain with larger datasets. The data takes a long time to load, and the whole thing becomes unmanageable before you realize the machine is running out of memory. Not to mention that Excel only supports up to 1,048,576 rows per sheet.

A simpler way is to move the data into a SQL database for analysis. This is where Python saves the day.

SQL in Python

First, let's take a look at the most popular options for using SQL in Python: MySQL and SQLite.

MySQL has two popular libraries, PyMySQL and MySQLdb, while SQLite has the built-in sqlite3 module.
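For illustration, here is a minimal sketch of opening a MySQL connection with PyMySQL; the host, credentials, and database name are placeholders, and it assumes a MySQL server is already installed and running somewhere:

import pymysql

# all connection details below are placeholders -- a real MySQL
# server must already be running for this to work
conn = pymysql.connect(host='localhost', user='user',
                       password='secret', database='mydb')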

SQLite is a so-called embedded database: it runs inside our application, so there is nothing to install or run separately first (unlike MySQL).

This is an important difference, and it plays a key role in our goal of quick data analysis. Therefore, we will continue with SQLite.

Set up SQLite in Python

The first thing we need to do is import the library:

import sqlite3
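Since sqlite3 ships with the Python standard library, there is nothing to install. As a quick sanity check, you can print the version of the bundled SQLite engine:

# version of the SQLite engine compiled into the sqlite3 module
print(sqlite3.sqlite_version)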

Then we need to decide whether to save the database to disk, or to keep it only in memory while the application is running.

If we decide to actually save the database so the imported data persists, we must give it a name, such as 'FinanceExplainedDb', and use the following command (SQLite creates the file if it does not already exist):

dbname = 'FinanceExplainedDb'
conn = sqlite3.connect(dbname + '.sqlite')

On the other hand, if we want to keep everything in memory and have it disappear when the application finishes, we can use the following command:

conn = sqlite3.connect(':memory:')

At this point, SQLite is all set up and ready to be used in Python. Assuming we have loaded some data into Table1, we can execute SQL commands as follows:

cur = conn.cursor()
cur.execute('SELECT * FROM Table1')
for row in cur:
    print(row)
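As a side note, if you need to filter rows, sqlite3 supports parameterized queries with ? placeholders, which is safer than building SQL strings by hand. A minimal sketch (the column name target is only a hypothetical example; substitute one of your own columns):

# the column name 'target' is a hypothetical example
cur.execute('SELECT * FROM Table1 WHERE target = ?', (0,))
for row in cur:
    print(row)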

Now let's explore how to get data into the database using pandas.

Use pandas to load data

Assuming we already have the data and we want to analyze it, we can use the Pandas library to do this.

First, we need to import the pandas library, and then we can load the data:

import pandas as pd

# if we have a csv file
df = pd.read_csv('ourfile.csv')

# if we have an excel file
df = pd.read_excel('ourfile.xlsx')

Once we load the data, we can put it directly into our SQL database with a simple command:

df.to_sql(name='Table1', con=conn)
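To verify the load, we can read the data straight back into a DataFrame with pandas' read_sql; a quick sketch (the query is only illustrative):

# read the table back into pandas to confirm the data arrived
check = pd.read_sql('SELECT * FROM Table1', conn)
print(check.head())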

If you load multiple files into the same table, you can use the if_exists parameter, which accepts 'fail' (the default), 'replace', or 'append':

df.to_sql(name='Table1', con=conn, if_exists='append')

When dealing with large datasets, we will not be able to use this single-line command to load the data; our application would run out of memory. Instead, we have to load the data bit by bit. In this example, we load 10,000 rows at a time:

chunksize = 10000
for chunk in pd.read_csv('ourfile.csv', chunksize=chunksize):
    chunk.to_sql(name='Table1', con=conn, if_exists='append')
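The same chunking idea works in the other direction: pd.read_sql also accepts a chunksize argument, so large query results can be streamed back without loading everything at once. A minimal sketch, assuming the Table1 built above:

# stream query results back in chunks to keep memory usage bounded
for result_chunk in pd.read_sql('SELECT * FROM Table1', conn, chunksize=10000):
    print(result_chunk.shape)  # process each piece, e.g. aggregate it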

Putting it all together

To put it all together, we provide a Python script that covers most of what we discussed.

import sqlite3
import pandas as pd
import numpy as np

# Creating test data for us -- you can ignore this part
from sklearn import datasets
iris = datasets.load_iris()
df1 = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                   columns=iris['feature_names'] + ['target'])
df1.to_csv('TestData.csv', index=False)
####################################################

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

chunksize = 10
for chunk in pd.read_csv('TestData.csv', chunksize=chunksize):
    chunk.columns = chunk.columns.str.replace(' ', '_')  # replace spaces with underscores in column names
    chunk.to_sql(name='Table1', con=conn, if_exists='append')

cur.execute('SELECT * FROM Table1')
names = list(map(lambda x: x[0], cur.description))  # returns the column names
print(names)
for row in cur:
    print(row)
cur.close()

That is how to handle big data problems when saving Excel files with Python. I hope you picked up something useful from it.
