How to use Python to read and modify Excel files 04/15 Update SLTechnology News&Howtos

How to use Python to read and modify Excel files

2025-04-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/02 Report--

This article introduces you how to use Python to read and modify Excel files, the content is very detailed, interested friends can refer to, hope to be helpful to you.

1. Use xlrd module to read xls file.

Suppose our table is as follows, which is a table of "per capita net income of rural households and per capita consumption of rural households". The suffix is .xls. It contains two worksheets, "provinces and cities" and "Test sheet".

To mention, an Excel file is equivalent to a "workbook", and a "workbook" can contain multiple "sheet".

1.1 get workbook objects

Introduce the module to get the workbook object.

Import xlrd # introduction module # Open the file and get the workbook (workbook) object workbook=xlrd.open_workbook ("DataSource/Economics.xls") of the excel file ("workbook") # File path 1.2 get the worksheet object

We know that a workbook can contain multiple worksheets, and when we get the "workbook object", we can then get the worksheet object, either by "index" or by "table name".

'' operate on workbook object''# get the names of all sheet names=workbook.sheet_names () print (names) # ['provinces and cities', 'test table'] output all table names in the form of a list # get sheet object worksheet=workbook.sheet_by_index (0) print (worksheet) # # get sheet object worksheet=workbook.sheet_by_name ("provinces and cities") print (worksheet) # # as shown above by sheet index Workbook.sheet_names () returns a list object. You can operate on this list object sheet0_name=workbook.sheet_names () [0] # get the sheet name print (sheet0_name) # through the sheet index # get the basic information of the worksheet

After getting the Table object, we can get the basic information about the worksheet. Including table name, number of rows, and number of columns.

'' operate on the sheet object''name=worksheet.name # get the name of the table print (name) # provinces and cities nrows=worksheet.nrows # get the total number of rows of the table print (nrows) # 32 ncols=worksheet.ncols # get the total number of columns of the table print (ncols) # 131.4 get the data of the worksheet by row or column

With the number of rows and columns, it is easy to print out the entire contents of the table in a loop.

For i in range (nrows): # circularly print each line print (worksheet.row_values (I)) # read out in the form of a list, each item in the list is of str type # ['provinces and cities', 'wage income', 'net household income', 'property income', …] # [Beijing', '5047.4', '1957.1', '678.8', '592.2', '1879.0,.] col_data=worksheet.col_values (0) # get the content of the first column print (col_data)

1.5 get the data of a cell

We can also pinpoint the query to a particular cell.

In the xlrd module, the rows and columns of the worksheet are counted starting at 0.

# read the data in the table cell_value1=sheet0.cell_value (0) cell_value2=sheet0.cell_value (1) print (cell_value1) # provinces and cities print (cell_value2) # Beijing cell_value1=sheet0.cell (0). Valueprint (cell_value1) # provinces and cities cell_value1=sheet0.row (0) [0] .valueprint (cell_value1) # provinces and cities 2. Use the xlwt module to write to the xls file 2.1 create a workbook # Import xlwt module import xlwt # create a Workbook object Equivalent to creating an Excel file book=xlwt.Workbook (encoding= "utf-8", style_compression=0)''Workbook class initialization with encoding and style_compression parameters encoding: set character encoding, generally set: W = Workbook (encoding='utf-8'), you can output Chinese in excel. The default is ascii. Style_compression: indicates whether it is compressed or not, which is not commonly used.'' 2.2 create a worksheet

After you have created the workbook, you can create a worksheet in the appropriate workbook.

# create a sheet object, and a sheet object corresponds to a table in the Excel file. Sheet = book.add_sheet ('test01', cell_overwrite_ok=True) # where test is the name of the table, cell_overwrite_ok, indicating whether cells can be overridden, which is actually a parameter instantiated by Worksheet The default value is False2.3 to add data to the worksheet as cells # add data to the table test sheet.write (0,0, 'provinces and cities') # where'0-rows, 0-columns' specify the cells in the table "provinces and cities" is the content written to the unit sheet.write (0,1, 'wage income') # you can also add the data txt1 = 'Beijing' sheet.write (1 txt1 0, txt1) txt2 = 5047.4sheet.write (1, 1, txt2)

After the file is finally saved, the worksheet formed by the above statement is as follows:

2.4 add data to the worksheet in rows or columns

To verify this function, we create another worksheet in the workbook, the last worksheet is called "test01", then the worksheet is named "test02", all belong to the same workbook. In the following code, test02 is the table name, and sheet2 is the worksheet object that can be manipulated.

# add the second table sheet2=book.add_sheet ("test02", cell_overwrite_ok=True) Province= ['Beijing', 'Tianjin', 'Hebei', 'Shanxi Province', 'Inner Mongolia Autonomous region', 'Liaoning Province', 'Jilin Province', 'Heilongjiang Province', 'Shanghai', 'Jiangsu Province', 'Zhejiang Province', 'Anhui Province', 'Fujian Province' Jiangxi Province, Shandong Province, Henan Province, Hubei Province, Hunan Province, Guangdong Province, Guangxi Zhuang Autonomous region, Hainan Province, Chongqing, Sichuan Province, Guizhou Province, Yunnan Province, Xizang Autonomous region, Shaanxi Province, Gansu Province, Qinghai Province 'Ningxia Hui Autonomous region', 'Xinjiang Uygur Autonomous region'] Income= ['5047.4', '3247.9', '1514.7', '1374.3', '590.7', '1499.5', '605.1', '654.9', '6686.0', '3104.8', '3575.1', '1184.1', '1855.5', '1441.3', '1671.5' '1022.7', '1199.2', '1449.6', '2906.2', '972.3', '555.7', '1309.9', '1219.5', '715.5', '441.8', '568.4', '848.3', '637.4', '653.3', '823.1', '254.1'] Project= ['provinces and cities'' 'wage income', 'net income from household operations', 'property income', 'transfer income', 'food', 'clothing', 'residence', 'household equipment and services', 'transportation and communications', 'cultural, educational and recreational supplies and services', 'health care', 'other goods and services'] # fill in the first column for i in range (0 Len (Province): sheet2.write (sheet2.write 1,0, Province [I]) # fill in the second column for i in range (0Len (Income)): sheet2.write (iLike 1Len (Income)) # fill in the first line for i in range (0Len (Project)): sheet2.write (0Len I, Project [I]) 2.5 save the created file

Finally, save it in a specific path.

# finally, save the above operations to the specified Excel file book.save ('DataSource\\ test1.xls')

The executed worksheet test02 is as follows:

3. Use openpyxl module to read xlsx file.

The above two modules, xlrd and xlwt, both operate on Excel97-2003, that is, files that end in xls. Obviously, most of them are now above Excel2007, with xlsx as the suffix. To operate on this type of Excel file, you can use openpyxl, which can be used for both "read" and "write" operations, and to modify existing files.

3.1 get workbook object import openpyxl # get the difference between workbook object workbook=openpyxl.load_workbook ("DataSource\ Economics.xlsx") # and xlrd module # wokrbook=xlrd.open_workbook ("" DataSource\ Economics.xls) 3.2get all worksheet names # get all worksheet shenames=workbook.get_sheet_names () print (shenames) of workbook workbook # ['provinces and municipalities' 'test table'] # using the above statement for sheetnames=workbook.sheet_names () # in the xlrd module will issue a warning: DeprecationWarning: Call to deprecated function get_sheet_names (Use wb.sheetnames). # indicates that get_sheet_names has been deprecated and you can use the wb.sheetnames method shenames=workbook.sheetnamesprint (shenames) # ['provinces and cities', 'Test Table'] 3.3 to get worksheet objects

The worksheet name obtained in the previous section can be applied in this section to get the worksheet object.

# after obtaining the table name of the workbook, you can get the table object worksheet=workbook.get_sheet_by_name ("provinces and cities") print (worksheet) # # use the above statement to also pop up the warning: DeprecationWarning: Call to deprecated function get_sheet_by_name (Use wb [sheetname]). # rewrite it into the following format: worksheet=workbook ["provinces and cities"] print (worksheet) # # you can also obtain the table object worksheet1=workbook [shenames [1]] print (worksheet1) #

3.4 get worksheet objects according to the index method

The way to get the worksheet object in the previous section is actually through the "table name", we can get the worksheet object in a more convenient way, that is, through the "index" way.

# you can also get the table object worksheet=workbook.worksheets [0] print (worksheet) # # you can also use the following way to get the currently active worksheet. By default, the first worksheetws = workbook.active3.5 gets the properties of the worksheet

After you get the worksheet object, you can get the corresponding properties of the worksheet, including "table name", "number of rows", "number of columns"

# after the above operations, we have obtained the "table object" of the first "table". Next, we can operate on the table object name=worksheet.title # to get the table name print (name) # provinces and cities # get the corresponding number of rows and columns of the table for worksheet.name # in xlrd. Rows=worksheet.max_rowcolumns=worksheet.max_columnprint (rows,columns) # 321 gets the data in the table by row or column for worksheet.nrows worksheet.ncols3.6 in xlrd

To get the contents of the entire worksheet in rows or columns, we need to use the following two generators:

Sheet.rows, which is a generator that contains each row of data, each wrapped by a tuple type.

Sheet.columns, ditto, contains each column of data.

For row in worksheet.rows: for cell in row: print (cell.value,end= ") print ()" wage income families in provinces and cities operate net income, property income, transfer income, food, clothing, housing, household equipment and services. Beijing 5047.4 1957.1 678.8 592.2 1879.0 451.6 859.4 303.5 698.1 844.1 575.8 113.1. Tianjin 3247.9 2707.4 126.4 146.3 1212.6 265.3 664.4 122.4 441.3 315.6 263.2 56.1. For col in worksheet.columns: for cell in col: print (cell.value,end= ") print () all provinces and cities Beijing, Tianjin, Hebei Province, Shanxi Province, Inner Mongolia Autonomous region, Liaoning Province, Jilin Province, Heilongjiang Province, Shanghai, Jiangsu and Zhejiang provinces. Wage income 5047.4 3247.9 1514.7 1374.3 590.7 1499.5 605.1 654.9 6686.0 3104.8 3575.1. Net income from household operation 1957.1 2707.4 2039.6 1622.9 2406.2 2210.8 2556.7 2521.5 767.7 2271.4.

We can better understand the code by looking at the specific format in sheet.rows.

For row in worksheet.rows: print (row)'(,. (,.'' # We need two iterations for row in worksheet.rows: for cell in row: print (cell,end= "") print () '... .'' # cell.valuefor row in worksheet.rows: for cell in row: print (cell.value,end= "") print () 3.7 is also required to get data for a specific row or column

The above method can iterate over all the contents of the output table, but what if you want to get the contents of a specific row or column? We can think of an "index" approach, but sheet.rows is a generator type and cannot be used as an index. So we convert it to list and then use the index, such as list (sheet.rows) [3] to get the tuple object in the fourth row.

# output specific row for cell in list (worksheet.rows) [3]: # get the data of the fourth row print (cell.value,end= ") print () # Hebei 1514.7 2039.6 107.7 139.8 915.5 167.9 531.7 115.8 285.7 265.4 166.3 47.0 # output specific column for cell in list (worksheet.columns) [2]: # get the data print of the third column (cell.value End= ") print () # net income from household operations 1957.1 2707.4 2039.6 1622.9 2406.2 2210.8 2556.7 2521.5 767.7 2271.4 3084.3. # has been converted to the list type, naturally counting from 0. 3.8 get a piece of data

Sometimes we don't need a whole row or column of content, so we can get a small piece of content in the following way.

Notice the difference between the two ways. In the first way, because the generator is converted to the form of a list, the index is counted from 0.

In the second way, rows and columns are counted from 1, which is the biggest difference from the xlrd module. In xlrd, rows and columns are counted from 0, and openpyxl does this to be consistent with the Excel table, because in the Excel table, it starts at 1.

For rows in list (worksheet.rows) [0:3]: for cell in rows [0:3]: print (cell.value,end= "") print () net operating income of wage income households in various provinces and cities Beijing 5047.4 1957.1 Tianjin 3247.9 2707.4''for i in range (1,4): for j in range (1,4): print (worksheet.cell (row=i, column=j). Value End= () print () net operating income of wage income households in various provinces and cities Beijing 5047.4 1957.1 Tianjin 3247.9 2707.4 '3.9 to obtain the data of a certain cell

There are two ways.

# accurate reading of a cell in the table content _ A1=worksheet ['A1'] .valueprint (content_A1) content_A1=worksheet.cell (row=1,column=1) .value # equals content_A1=worksheet.cell (1Power1) .valueprint (content_A1) # the number of rows and columns here is counted from 1, while in xlrd, the number of rows and columns is counted from 0. 4. Use the openpyxl module to write to the xlsx file 4.1 to create a workbook and get a worksheet

The "W" in the same workbook=openpyxl.Workbook () should be capitalized.

Import openpyxl # creates a Workbook object, which is equivalent to creating an Excel file workbook=openpyxl.Workbook () # wb=openpyxl.Workbook (encoding='UTF-8') # to get the currently active worksheet. The default is the first worksheetworksheet = workbook.activeworksheet.title= "mysheet".

4.2 create a new worksheet

Worksheet2 = workbook.create_sheet () # insert at the end of the workbook by default # worksheet2 = workbook.create_sheet (0) # insert in the first position of the workbook worksheet2.title = "New Title" 4.3 write the data to the worksheet # the following is the data we want to write Province= ['Beijing', 'Tianjin', 'Hebei', 'Shanxi', 'Inner Mongolia Autonomous region', 'Liaoning Province' Jilin Province, Heilongjiang Province, Shanghai, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, Shandong, Henan, Hubei, Hunan, Guangdong, Guangxi Zhuang Autonomous region, Hainan, Chongqing 'Sichuan Province', 'Guizhou Province', 'Yunnan Province', 'Xizang Autonomous region', 'Shaanxi Province', 'Gansu Province', 'Qinghai Province', 'Ningxia Hui Autonomous region', 'Xinjiang Uygur Autonomous region'] Income= ['5047.4', '3247.9', '1514.7', '1374.3', '590.7', '1499.5', '605.1' '654.9', '6686.0', '3104.8', '3575.1', '1184.1', '1855.5', '1441.3', '1671.5', '1022.7', '1199.2', '1449.6', '2906.2', '972.3', '555.7', '1309.9', '1219.5', '715.5', '441.8' '568.4', '848.3', '637.4', '653.3', '823.1', '254.1'] Project= ['provinces and cities', 'wage income', 'net household income', 'property income', 'transfer income', 'food', 'clothing', 'residence', 'household equipment and services' 'Transportation and communications', 'cultural, educational and recreational supplies and services', 'health care', 'other goods and services'] # write to the first line of data Both row and column numbers are counted from 1 to for i in range (len (Project)): worksheet.cell (1, item1jinjinProject [I]) # writes to the first column of data, because the first row already has data I+2for i in range (len (Province)): worksheet.cell (iMagazine 1 script [I]) # write the second column of data for i in range (len (Income)): worksheet.cell (iMagazine 2, Income [I]) 4.4 Save the workbook workbook.save (filename='DataSource\\ myfile.xlsx')

The final run result is as follows:

5. Modify the existing workbook (table) 5.1 insert a column of data

Using the last saved myfile.xlsx in section 4 as the table we want to modify, we plan to insert a column "number" at the top, as follows:

Import openpyxl workbook=openpyxl.load_workbook ("DataSource\ myfile.xlsx") worksheet=workbook.worksheets [0] # insert a list of worksheet.insert _ cols (1) # for index,row in enumerate (worksheet.rows) before the first column: if index==0: row [0] .value = "number" # one row [0] of each row is the first column else: row [0] .value = index# enumerates to be tuple type Count workbook.save from 0 (filename= "DataSource\ myfile.xlsx")

The running results are as follows:

5.2 modify a specific cell worksheet.cell (2d3) worksheet ["B2"] = "Peking"

The running results are as follows:

5.3 modify data in batches

Batch modification of data is equivalent to writing and will be overwritten automatically. It has been introduced in the previous section and I will not repeat it.

There is also the sheet.append () method, which can be used to add rows.

Taiwan= [32, Taiwan Province] worksheet.append (taiwan)

The running results are as follows:

On how to use Python to read and modify Excel files to share here, I hope that the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.