2025-03-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces how to deal with missing values in Python using pandas. Many people have questions about this in day-to-day work, so the editor has gathered simple, easy-to-use methods for detecting, deleting and filling missing values. Let's go through them step by step.
Detecting missing values
Let's first create a DataFrame with missing values.
import pandas as pd
df = pd.DataFrame({'A': [None, 2, None, 4], 'B': [10, None, None, 40], 'C': [100, 200, None, 400], 'D': [None, 2000, 3000, None]})
df
Pandas displays missing numeric values as NaN (Not a Number). Let's see how to determine which columns or rows have missing values.
1. info()
In the output of info(), compare the Non-Null Count of each column with the number of entries reported by RangeIndex. A column whose count is lower than the number of entries has missing values.
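The comparison that info() invites can also be made in code. The following is a minimal sketch, assuming the four-column df created above; `non_null` and `cols_with_missing` are names introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# count() gives the number of non-null entries per column,
# the same figure info() lists under "Non-Null Count".
non_null = df.count()

# A column has missing values when its non-null count is below the row count.
cols_with_missing = non_null[non_null < len(df)].index.tolist()
print(non_null)
print(cols_with_missing)
```

With this sample data every column falls short of the four entries in the index, so all four column names are reported.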
2. isnull()
isnull() returns a DataFrame with the same shape (number of rows and columns) as the original. Each cell is True if the value at that position is missing and False otherwise.
df.isnull()
Use sum() to count the missing values in each column.
df.isnull().sum()
Transpose the DataFrame with .T to count the missing values in each row.
df.isnull().T.sum()
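As a sketch of the two counts side by side, assuming the same sample df: passing axis=1 to sum() is an alternative to transposing with .T, and the variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# True counts as 1 when summed, so sum() yields the number of missing cells.
missing_per_column = df.isnull().sum()        # one count per column
missing_per_row = df.isnull().sum(axis=1)     # one count per row

# The transpose-based form from the text gives the same per-row counts.
same_as_transpose = missing_per_row.equals(df.isnull().T.sum())

print(missing_per_column)
print(missing_per_row)
print(same_as_transpose)
```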
Missing value processing
Deleting missing values
If the rows or columns that contain missing values are of little importance, you can delete them directly with dropna().
df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameter meaning
axis: whether rows or columns are deleted; 0 deletes rows, 1 deletes columns.
how: with any, a row or column is deleted if it contains at least one NaN; with all, it is deleted only if all of its values are NaN.
thresh: the minimum number of non-NaN values a row or column must have to be kept; those with fewer are deleted. Recent pandas versions do not allow how and thresh to be passed in the same call.
subset: the labels to check for missing values. For example, when deleting rows, subset names the columns to look at; the default is all columns.
inplace: whether to modify the original data. True modifies the original DataFrame directly and returns None; False (the default) returns the processed DataFrame and leaves the original unchanged.
Specify axis=1 to delete every column that contains a missing value.
df.dropna(axis=1, how='any')
Because every column has a missing value, only the index is left.
Specify axis=0 (the default) to delete every row that contains a missing value.
df.dropna(axis=0, how='any')
Using columns A, B and C as the reference, delete the rows in which all three are missing.
df.dropna(axis=0, subset=['A', 'B', 'C'], how='all')
Keep only the rows that have at least 3 non-NaN values.
df.dropna(axis=0, thresh=3)
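To make the effect of each call concrete, here is a small sketch that runs the four dropna() variants on the sample df and inspects what survives. The result variable names are illustrative, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Every column contains a NaN, so no columns survive.
cols_kept = df.dropna(axis=1, how='any')
# Every row contains a NaN as well, so no rows survive.
rows_kept = df.dropna(axis=0, how='any')
# Only row 2 is missing A, B and C at once, so it alone is dropped.
abc_kept = df.dropna(axis=0, subset=['A', 'B', 'C'], how='all')
# Rows 1 and 3 have three non-NaN values each; rows 0 and 2 have fewer.
thresh_kept = df.dropna(axis=0, thresh=3)

print(cols_kept.shape, rows_kept.shape)
print(abc_kept.index.tolist(), thresh_kept.index.tolist())
# inplace defaults to False, so df itself is unchanged by all four calls.
print(df.shape)
```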
Filling missing values
Another common way to handle missing values is to fill them in with fillna().
df.fillna(value=None, method=None, axis=0, inplace=False, limit=None)
1. Specify the fill value directly
df.fillna(666)
2. Fill with the value before / after the missing value
Fill with the previous value
When method is ffill or pad, each missing value is filled with the value that precedes it.
When axis=0, the fill uses the previous value in the same column; a missing value in the first row is left unfilled.
When axis=1, the fill uses the previous value in the same row; a missing value in the first column is left unfilled.
df.fillna(axis=0, method='pad')
Fill with the next value
When method is backfill or bfill, each missing value is filled with the value that follows it.
When axis=0, the fill uses the next value in the same column; a missing value in the last row is left unfilled.
When axis=1, the fill uses the next value in the same row; a missing value in the last column is left unfilled.
df.fillna(axis=0, method='bfill')
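Recent pandas releases deprecate the method argument of fillna() in favor of the dedicated DataFrame.ffill() and DataFrame.bfill() methods. The sketch below uses those methods on the sample df to show the same forward and backward fills; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Forward fill down each column: a NaN takes the value above it.
forward = df.ffill()
# Backward fill up each column: a NaN takes the value below it.
backward = df.bfill()
# Forward fill along each row: a NaN takes the value to its left.
across = df.ffill(axis=1)

print(forward)
print(backward)
print(across)
```

Note that the leading NaN in column A survives the forward fill and the trailing NaN in column D survives the backward fill, because there is no neighboring value to copy from.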
Fill with a computed value (here, the mean of each column)
df.fillna(df.mean())
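df.mean() returns one value per column, and fillna() matches those values to columns by name, so each column is filled with its own mean. The sketch below shows this on the sample df, and also passes a dict as one possible way to choose a different fill value per column; the values 0 and -1 are arbitrary placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# Column means ignore NaN by default, e.g. A -> (2 + 4) / 2 = 3.
means = df.mean()
filled_with_mean = df.fillna(means)

# A dict maps column names to fill values; unlisted columns are untouched.
filled_with_dict = df.fillna({'A': 0, 'D': -1})

print(means)
print(filled_with_mean)
print(filled_with_dict)
```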
Use limit to cap the number of fills
In columns A, B, C and D, only the first missing value in each column is filled.
df.fillna(value=666, axis=0, limit=1)
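A short sketch of limit in action on the sample df, assuming a scalar fill value and the default column-wise direction (axis=0). The name `limited` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [10, None, None, 40],
    'C': [100, 200, None, 400],
    'D': [None, 2000, 3000, None],
})

# With limit=1, at most one NaN per column is replaced, starting from the top.
limited = df.fillna(value=666, limit=1)

print(limited)
# Remaining NaN count per column after the capped fill.
print(limited.isnull().sum())
```

Columns that had two missing values keep their second NaN, while column C, which had only one, ends up complete.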
That concludes this look at how to deal with missing values in Python. Theory is best absorbed alongside practice, so go and try these methods on your own data.