
How to quickly process more than 30, 000 pieces of data with Python

2025-02-23 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article shows how to quickly process more than 30,000 records with Python. The walkthrough is fairly detailed; interested readers can use it as a reference, and I hope it is helpful to you.

Application scenario: we often need to merge, de-duplicate, and export large amounts of data in a specific format. Doing this in Excel is not only time-consuming and laborious but also error-prone. Is there a more efficient solution?

Taking 17 txt files containing more than 30,000 records as an example, we use Python to connect to a MySQL database and get the job done quickly.

My Python helper finished in a few seconds what took others hours of overtime!

Write data to the MySQL database

The data source for this article is a folder containing 17 txt files.

Our idea is to write a small program, drop it into this folder, and double-click it to automatically read the data in every txt file and write it to the database.

The code is as follows:

import pymysql
import os

conn = pymysql.connect(host='localhost', user='root', password='123456',
                       db='qq', charset='utf8')
cur = conn.cursor()
cur.execute("CREATE TABLE qq (id int(5) NOT NULL auto_increment, "
            "qq varchar(20) NOT NULL, PRIMARY KEY (id))")
conn.commit()

path = os.getcwd()
files = os.listdir(path)
i = 0
for file in files:
    if not file.endswith('.txt'):   # skip the packaged exe and anything else
        continue
    f = open(file, 'r', encoding='utf-8')
    next(f)                         # skip the header line of each file
    for line in f:
        i += 1
        sql = "insert into qq (qq) values (%s);"
        cur.execute(sql, (line.strip(),))
        print("inserted record", i)
    conn.commit()
    f.close()
cur.close()
conn.close()


Key code explanation:

This code uses two libraries, pymysql and os.

pymysql: used to operate the MySQL database.

os: used to traverse all the files in the folder.

The main code is now explained as follows:

1. Traverse all file names under any folder

After the program is written, use pyinstaller to package it into an exe and put it in the folder you want to operate on.

The path = os.getcwd() call gets the directory where the exe file is located.

The files = os.listdir(path) call gets all the file names in the same directory as the exe and stores them in the files list.

In this way we get all the txt file names; you can name your txt files anything you like, and the program will still read them.
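This discovery step can be tried on its own. A minimal, runnable sketch (the temporary folder and file names here are made up for illustration):

```python
import os
import tempfile

def list_txt_files(path):
    """Return the .txt file names in `path`, the way the script above
    discovers its input files (other files, e.g. the exe, are skipped)."""
    return sorted(f for f in os.listdir(path) if f.endswith('.txt'))

# Demo with a throwaway folder and made-up file names.
with tempfile.TemporaryDirectory() as d:
    for name in ('a.txt', 'b.txt', 'tool.exe'):
        open(os.path.join(d, name), 'w').close()
    print(list_txt_files(d))  # only the txt files are returned
```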

2. Write data to the database

(1) connect to the database and create a new table in the database

a. Connect to my qq database

conn = pymysql.connect(host='localhost', user='root', password='123456', db='qq', charset='utf8')

b. Create a new table qq

Create a new table named qq in the qq database, with two fields: id, the auto-incrementing primary key, and qq, a character field used to store the data.

cur.execute("CREATE TABLE qq (id int(5) NOT NULL auto_increment, qq varchar(20) NOT NULL, PRIMARY KEY (id))")

(2) write data to the database

Two layers of loops are used here:

for file in files:
    f = open(file, 'r', encoding='utf-8')
    next(f)  # skip the header line
    for line in f:
        i += 1
        sql = "insert into qq (qq) values (%s);"
        cur.execute(sql, (line.strip(),))
        print("inserted record", i)
    conn.commit()
    f.close()

The outer loop opens each of the 17 txt files in turn.

The inner loop reads each txt file line by line and inserts each line into the qq field of the database table qq.

At this point, the import of data is completed, with a total of 32073 pieces of data.
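Row-by-row execute() works, but for tens of thousands of rows, batching with executemany() is usually much faster. A hedged sketch of that pattern, using Python's built-in sqlite3 as a stand-in so it runs without a MySQL server (with pymysql the placeholder is %s instead of ?, but the call shape is the same):

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # stand-in for the MySQL connection
cur = conn.cursor()
cur.execute("CREATE TABLE qq (id integer PRIMARY KEY, qq varchar(20) NOT NULL)")

lines = ['10001', '10002', '10003']  # made-up sample rows
# One round trip for the whole batch instead of one per row.
cur.executemany("insert into qq (qq) values (?)", [(l,) for l in lines])
conn.commit()

cur.execute("select count(*) from qq")
print(cur.fetchone()[0])  # 3
```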

Data cleaning

Here we take removing duplicate values as an example to give a brief introduction to data cleaning.

1. Create a new table to store the cleaned data

You can log in to MySQL under the cmd window, open the qq database, and do the following:

CREATE TABLE qq_dist (id int(5) NOT NULL auto_increment, qq varchar(20) NOT NULL, PRIMARY KEY (id));

In this way, a new table qq_dist is created to store the cleaned data for later use.

2. Clean the data

After logging in to MySQL, do the following:

insert into qq_dist (qq) select distinct qq from qq;

This inserts the distinct qq values found in table qq into the qq field of table qq_dist.
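The same de-duplication pattern can be tried end to end with the built-in sqlite3 module, so no MySQL server is needed (the table and column names follow the article; the sample values are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE qq (id integer PRIMARY KEY, qq varchar(20))")
cur.execute("CREATE TABLE qq_dist (id integer PRIMARY KEY, qq varchar(20))")

# Made-up data with duplicates.
cur.executemany("insert into qq (qq) values (?)",
                [('111',), ('222',), ('111',), ('333',), ('222',)])

# The article's cleaning step: copy only distinct values into the new table.
cur.execute("insert into qq_dist (qq) select distinct qq from qq")
conn.commit()

cur.execute("select qq from qq_dist order by qq")
print([r[0] for r in cur.fetchall()])  # ['111', '222', '333']
```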

Export data in a specific format

Example: export lines 101-200 of the cleaned data to a new txt file.

The code is as follows:

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='123456',
                       db='wxid', charset='utf8')
print("writing, please wait...")
cur = conn.cursor()
sql = "select wxid from wd_dis limit 100,100;"
cur.execute(sql)
alldata = cur.fetchall()
f = open('data101-200.txt', 'w')
i = 0
for data in alldata:
    i += 1
    f.write(data[0].strip() + '\n')  # one record per line
f.flush()
f.close()
cur.close()
conn.close()
print("write completed, a total of {} records written!".format(i))

Key code explanation:

1. limit

In MySQL, limit m, n skips the first m rows and reads the next n rows, i.e., it returns rows m+1 through m+n.

So reading lines 101-200 in this case is limit 100,100.
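The limit m, n arithmetic is easy to check with sqlite3, which accepts the same limit m, n form, so the check runs without a MySQL server (the 300-row table here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE t (n integer)")
cur.executemany("insert into t (n) values (?)", [(k,) for k in range(1, 301)])

# limit 100,100: skip 100 rows, return the next 100 (rows 101..200).
cur.execute("select n from t order by n limit 100,100")
rows = [r[0] for r in cur.fetchall()]
print(rows[0], rows[-1], len(rows))  # 101 200 100
```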

2. flush()

Calling f.flush() forces the buffered data to be written out to the file. Without it (and without properly closing the file), the generated txt file can end up blank.
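An alternative that avoids the flush/close bookkeeping entirely is a with block, which flushes and closes the file automatically when the block exits. A small sketch with made-up records and a throwaway file path:

```python
import os
import tempfile

records = ['10001', '10002', '10003']        # stand-ins for the fetched rows

path = os.path.join(tempfile.gettempdir(), 'data101-200-demo.txt')
with open(path, 'w') as f:                   # flushed and closed on exit
    for rec in records:
        f.write(rec + '\n')

print(open(path).read().splitlines())        # nothing is left in the buffer
```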

That is all on how to quickly process more than 30,000 records with Python. I hope the above content is helpful and that you can learn something from it. If you think the article is good, feel free to share it so more people can see it.
