In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article will explain in detail how pandas reads excel files. The editor thinks it is very practical, so I share it for you as a reference. I hope you can get something after reading this article.
Basic usage of read_excel () import pandas as pdfile_name = 'xxx.xlsx'pd.read_excel (file_name) two common parameters of read_excel ():
Io: the excel path can be a file path, a class file object, a file path object, etc.
Sheet_name=0: access a worksheet of the specified excel. Sheet_name can be of type str, int, list or None, with a default value of 0.
The str type is to specify the name of the worksheet directly
The int type is the index that specifies the worksheet starting at 0, so the sheelt_name default value is 0, which is the first worksheet.
The list type is a list consisting of multiple indexes or worksheet names that specify multiple worksheets.
None type, accessing all worksheets
Sheet_name=0: you get the data of type DataFrame of the first sheet
Sheet_name=2: what you get is the DataFrame type data of the third sheet
What sheet_name='Test1': gets is data of type DataFrame named 'Test1'' sheet
Sheet_name= [0,3, 'Test5']: the first, the fourth, and the worksheet named Test5 are used as dictionaries of data of type DataFrame.
Header=0:header is the header row, which is used as the header row of the data, that is, the column name of the entire data, by specifying a specific row index. The default first row data (0-index) is used as the header row, and if a list of integers is passed in, these rows are combined into a multi-level column index. No header line uses header=None.
Name=None: pass in a column of data of class array type, which is used as the column name of the data. If the file data does not contain a header line, explicitly indicate the header=None.
Skiprows:int type, class list type, or tunable function. The number of lines to skip (index 0) or the number of lines to skip at the beginning of the file (int). If callable, the callable function evaluates based on the row index, returning True if the row should be skipped, or False otherwise. An example of a valid callable parameter is lambda x: X in [0,1,2].
Skipfooter=0: int type. Default is 0. From the bottom up, data that specifies the number of skipped rows from the tail.
Usecols=None: specify the columns to use, if all columns are not resolved by default.
Index_col=None: the int or element is a list of int, using the data of a column as the row label of DataFrame. If a list is passed, these columns will be combined into a multiple index, and if a subset selected by usecols is used, the index_col will be based on that subset.
Squeeze=False, Boolean value, default False. If the parsed data has only one column, a Series is returned.
Dtype=None: specifies the data type of a column, which can make the type name or a dictionary corresponding to the column name and type, for example {'missing: np.int64,' substituted: str}
Nrows=None: int type, default None. Parses only the data for the specified number of rows.
Three examples
The figure shows the excel file used in the illustration, which contains five worksheets.
1. IO: path
To take an example of IO as a file object, sometimes when the file file path contains more complex Chinese strings, pandas may fail to parse the file path, which can be solved by using the file object.
File = 'xxxx.xlsx'f = open (file,' rb') df = pd.read_excel (f, sheet_name='Sheet1') f.close () # if you don't use with, remember to release it manually. #-with mode-with open (file, 'rb') as f: df = pd.read_excel (f, sheet_name='Sheet1') 2. Sheet_name: specify the worksheet name
Sheet_name='Sheet', specifies that the worksheet named "Sheet1" is resolved. Returns a data of type DataFrame.
Df = pd.read_excel (file, sheet_name='Sheet1')
Sheet_name= [0,1, 'Sheet1'] corresponds to the first and second worksheets of the parsing file and the worksheet named "Sheet1". It returns an ordered dictionary. The structure is of type {name:DataFrame}.
Df_dict = pd.read_excel (file, sheet_name= [0meme 1pm sheet 1'])
Sheet_name=None parses all the worksheets in the file and returns data of the same dictionary type as above.
Df_dict = pd.read_excel (file, sheet_name=None)
3. Header: specify the header line
Header is used to specify the header row of the data, that is, the column name of the data. The sample file used in this article has two row column names in both English and Chinese, and the default header=0 is the column name that uses the first row of data as the data.
Df_dict = pd.read_excel (file, sheet_name='Sheet1')
Header=1, using the English column name specified to use the second row.
Df_dict = pd.read_excel (file, sheet_name='Sheet1', header=1)
It is important to note that if it is not possible to specify any row as the column name, or if the data source is untitled row data, you can display the specified header=None to indicate that the column name is not used.
Df_dict = pd.read_excel (file, sheet_name='Sheet1', header=None)
4. Names: specify the column name
Specify the column name of the data. If the data already has a column name, the original column name will be replaced.
Df = pd.read_excel (file, sheet_name='Sheet1', names=list ('123456789ABCDE'))
The figure above shows that the first Chinese name of header=0 is the header row by default, and finally the column name is replaced by names. If you only want to use names and do not make any changes to the source data, we can specify header=None.
Df = pd.read_excel (file, sheet_name='Sheet1', names=list ('123456789ABCDE'), header=None)
5. Index_col: specify column index df = pd.read_excel (file, sheet_name='Sheet1', header=1, index_col=0)
6. Skiprows: skip the data of specified number of rows df = pd.read_excel (file, sheet_name='Sheet1', skiprows=0)
Df = pd.read_excel (file, sheet_name='Sheet1', skiprows= [1, 3, 5, 7, 9])
Header and skiprows sometimes have the same effect, such as skiprows=5 and header=5. Because after skipping five lines, the sixth row, that is, the row with an index of 5, defaults to the title row. It is important to note that 5 of skiprows=5 is the number of rows and 5 of header=5 is the row with index 5.
Df = pd.read_excel (file, sheet_name='Sheet1', header=5)
Df = pd.read_excel (file, sheet_name='Sheet1', skiprows=5)
7. Skipfooter: omit row data from the tail
The original data has 47 rows, as shown in the following figure:
Skip 5 lines from the tail:
Df = pd.read_excel (file, sheet_name='Sheet1', skipfooter=5)
8.dtype specifies the data type for some columns
In the sample data, the test encoded data is text, and pandas is automatically converted to int64 type when parsing, so that the first zero of the codes column disappears, resulting in a data error, as shown in the following figure
Specify the data type of the codes column:
Df = pd.read_excel (file, sheet_name='Sheet1', header=1, dtype= {'codes': str})
This is the end of the article on "how pandas reads excel files". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.