This article shows how to quickly process more than 30,000 pieces of data with Python. The walkthrough is fairly detailed; interested readers are welcome to follow along, and hopefully it will be helpful to you.
Application scenario: we often run into large amounts of data that need to be merged, de-duplicated, or exported in a specific format. Doing this in Excel is not only time-consuming and laborious but also error-prone. Is there a more efficient solution?
Taking 17 txt files with more than 30,000 records as an example, this article uses Python with a MySQL database to get the job done quickly.
A small Python helper finished in a few seconds what others spent overtime doing by hand!
Write data to the MySQL database
The files shown in the figure below are the data source for this article:
The idea is to write a small program, drop it into this folder, and have it read the data in every txt document and write it into the database when it is double-clicked.
The code is as follows:
import pymysql
import os

# connect to the local MySQL server and select the qq database
conn = pymysql.connect(host='localhost', user='root', password='123456',
                       db='qq', charset='utf8')
cur = conn.cursor()

# create the table that will hold the imported data
cur.execute("CREATE TABLE qq (id int(5) NOT NULL auto_increment, "
            "qq varchar(20) NOT NULL, PRIMARY KEY (id))")
conn.commit()

path = os.getcwd()            # directory the program runs from
files = os.listdir(path)      # every file name in that directory
i = 0
for file in files:
    f = open(file, 'r', encoding='UTF-8')
    next(f)                   # skip the first line of each txt file
    for line in f:
        i += 1
        # print(line)
        sql = "insert into qq (qq) values (%s);"
        cur.execute(sql, line)
        print("insert item", i, "data!")
    conn.commit()
    f.close()
cur.close()
conn.close()
Running effect:
Key code explanation:
This code uses two libraries, pymysql and os:
pymysql: used to operate the MySQL database.
os: used to traverse all the files in the folder.
The main code is now explained as follows:
1. Traverse all file names under any folder
After the program is written, use pyinstaller to package it into an exe and put the exe in the folder you want to work on.
The path = os.getcwd() call gets the directory the exe file is located in.
The files = os.listdir(path) call gets all the file names in the same directory as the exe and stores them in the files list.
In this way we get all the txt file names; you can name the txt files whatever you like and the program will still read them.
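One thing to note: os.listdir() also returns the packaged exe itself (and anything else sitting in the folder), so it can be safer to keep only the txt files before looping over them. A minimal sketch of that filter, reusing the names from the code above:

import os

path = os.getcwd()                                               # directory the program runs from
files = os.listdir(path)                                         # every file name in that directory
txt_files = [name for name in files if name.endswith('.txt')]    # keep only the txt data files
print(txt_files)

For packaging, pyinstaller -F script.py is one common way to bundle the script into a single exe (script.py here stands for whatever you named the program).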
2. Write data to the database
(1) connect to the database and create a new table in the database
a. Connect to my qq database
conn = pymysql.connect(host='localhost', user='root', password='123456', db='qq', charset='utf8')
b. Create a new table qq
Create a new table named qq in the qq database with two fields: id, the primary key, which auto-increments, and qq, a character field used to store the data.
cur.execute("CREATE TABLE qq (id int(5) NOT NULL auto_increment, qq varchar(20) NOT NULL, PRIMARY KEY (id))")
(2) write data to the database
Two layers of loops are used here:
for file in files:
    f = open(file, 'r', encoding='UTF-8')
    next(f)                   # skip the first line of each txt file
    for line in f:
        i += 1
        # print(line)
        sql = "insert into qq (qq) values (%s);"
        cur.execute(sql, line)
        print("insert item", i, "data!")
    conn.commit()
    f.close()
The outer loop opens each of the 17 txt files in turn.
The inner loop reads each txt file line by line and inserts every line it reads into the qq field of the qq table.
At this point the data import is complete, with a total of 32,073 records.
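A quick way to double-check the total is to count the rows from Python as well. A minimal sketch, reusing the same connection settings as above:

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='123456',
                       db='qq', charset='utf8')
cur = conn.cursor()
cur.execute("select count(*) from qq")          # count every imported row
print("rows in table qq:", cur.fetchone()[0])
cur.close()
conn.close()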
Data cleaning
Here, taking the removal of duplicate values as an example, is a brief introduction to data cleaning.
1. Create a new table to store the cleaned data
You can log in to MySQL in a cmd window, switch to the qq database, and run the following:
CREATE TABLE qq_dist (id int(5) NOT NULL auto_increment, qq varchar(20) NOT NULL, PRIMARY KEY (id));
In this way, a new table qq_dist is created to store the cleaned data for later use.
2. Clean the data
After logging in to MySQL, run the following:
insert into qq_dist (qq) select distinct qq from qq;
This inserts the distinct qq values found in the qq table into the qq field of the qq_dist table.
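If you prefer not to open a cmd window, the same two statements can also be run from Python with pymysql. A minimal sketch, assuming the same qq database and credentials used earlier:

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='123456',
                       db='qq', charset='utf8')
cur = conn.cursor()
# create the table for the cleaned data, then fill it with the distinct values
cur.execute("CREATE TABLE qq_dist (id int(5) NOT NULL auto_increment, "
            "qq varchar(20) NOT NULL, PRIMARY KEY (id))")
cur.execute("insert into qq_dist (qq) select distinct qq from qq")
conn.commit()
cur.close()
conn.close()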
Export data in a specific format
Example: export lines 101-200 of the cleaned data to a new txt file.
The code is as follows:
import pymysql

conn = pymysql.connect(host='localhost', user='root', password='123456',
                       db='qq', charset='utf8')
print("writing, please wait ...")
cur = conn.cursor()
sql = "select qq from qq_dist limit 100,100;"   # rows 101-200 of the cleaned table
cur.execute(sql)
alldata = cur.fetchall()

f = open('data101-200.txt', 'w', encoding='utf-8')
i = 0
for data in alldata:
    i += 1
    f.write(data[0])
f.flush()
f.close()
cur.close()
conn.close()
print("write completed, a total of {} pieces of data have been written!".format(i))
Key code explanation:
1. limit
In MySQL, limit m, n skips the first m rows of the result and returns the next n rows; for example, limit 0, 10 returns rows 1-10.
So, reading lines 101-200 in this case is limit 100, 100.
2. flush()
The flush() call must be added: it writes the data still sitting in the buffer out to the file. Without it, the generated txt file can come out blank.
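An alternative worth knowing: open the file with a with block and Python flushes and closes it for you when the block ends. A minimal sketch of just the writing step (the sample rows below stand in for the result of cur.fetchall()):

# each row from cur.fetchall() is a tuple; the sample below mimics that shape
alldata = [('10001\n',), ('10002\n',)]

with open('data101-200.txt', 'w', encoding='utf-8') as f:
    for data in alldata:
        f.write(data[0])          # data[0] is the qq value in that row
# the file is flushed and closed automatically when the with block ends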
That is all on how to quickly process more than 30,000 pieces of data with Python. Hopefully the above content is helpful and teaches you something new. If you think the article is good, feel free to share it so more people can see it.