2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shows how to deal with missing values in Python data analysis using Pandas. I hope you will gain something from reading it.
Let's first create a sample DataFrame and add some missing values to it.
The DataFrame has 10 rows and 6 columns.
The next step is to add the missing values. We use the loc indexer to select row and column combinations and set them to "np.nan", which is one of the standard missing value representations.
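The original sample data is not reproduced here, so the following is only a minimal sketch of these two steps. The column names and values are hypothetical, chosen to give 10 rows and 6 columns. Because np.nan is a float, the sketch casts the two integer columns to float explicitly before assigning, since newer Pandas versions warn about implicit upcasting on assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 10 rows, 6 columns (not the article's original values).
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "item": range(1014, 1024),
    "measure_1": [1, 5, 7, 3, 4, 2, 8, 6, 9, 0],
    "measure_2": [0.5, 1.2, 3.1, 2.2, 4.8, 0.9, 1.7, 2.6, 3.3, 4.0],
    "measure_3": [10.0, 11.5, 9.8, 12.1, 10.7, 11.1, 9.5, 10.2, 12.4, 11.9],
    "measure_4": list("ABCDEFGHIJ"),
})

# np.nan is a float, so integer columns have to become float to hold it.
df = df.astype({"item": "float64", "measure_1": "float64"})

# Select row/column combinations with .loc and set them to np.nan.
df.loc[[1, 5], "item"] = np.nan
df.loc[[2, 7], "measure_1"] = np.nan
df.loc[2, ["measure_2", "measure_3"]] = np.nan

print(df.shape)
print(df.isna().sum())  # number of missing values per column
```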
In the resulting DataFrame, the item and measure 1 columns originally held integer values, but because of the missing values they have been upcast to floating point numbers.
In Pandas 1.0, a missing value representation that works with integer types (`pd.NA`) was introduced, so integer columns can also hold missing values. However, we need to declare the data type explicitly.
Even with missing values present, the columns can now remain integer columns.
We now have a DataFrame that contains some missing values. It's time to look at the different ways to deal with them.
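As a small sketch of the explicit declaration, the nullable integer dtype can be requested with the string "Int64" (note the capital "I"). The series below is a hypothetical stand-in for the item column.

```python
import numpy as np
import pandas as pd

# A float column that was upcast because of np.nan (illustrative values).
item = pd.Series([1014.0, np.nan, 1016.0, 1017.0], name="item")

# Explicitly request the nullable integer dtype.
item_int = item.astype("Int64")

print(item_int)        # the gap is kept as a missing value, the rest are integers
print(item_int.dtype)
```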
1. Delete rows or columns with missing values
One option is to delete rows or columns that contain missing values.
With its default parameter values, the dropna function deletes every row that contains at least one missing value. There is only one row in our DataFrame without any missing values. We can also use the axis parameter to delete columns that have at least one missing value.
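A minimal sketch of both calls, using a small hypothetical DataFrame rather than the 10-row sample:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame (not the article's data).
df = pd.DataFrame({
    "item": [1014.0, np.nan, 1016.0, 1017.0],
    "measure_1": [1.0, 5.0, np.nan, 3.0],
    "measure_2": [0.5, 1.2, np.nan, 2.2],
    "label": ["A", "B", "C", "D"],
})

rows_kept = df.dropna()        # default: axis=0, how="any"
cols_kept = df.dropna(axis=1)  # drop columns with at least one missing value

print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
```

Both calls return a new DataFrame and leave df unchanged.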
2. Delete rows or columns with only missing values
Another case is a column or row that consists entirely of missing values. Such columns or rows carry no information, so we can delete them.
The dropna function can also be used for this purpose. We just need to change the value of the how parameter.
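For illustration, here is a sketch with how="all" on a hypothetical frame that has one fully empty row and one fully empty column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 and the "empty" column contain only missing values.
df = pd.DataFrame({
    "item": [1014.0, np.nan, 1016.0],
    "measure_1": [1.0, np.nan, np.nan],
    "empty": [np.nan, np.nan, np.nan],
})

rows_kept = df.dropna(how="all")          # drops only rows that are entirely missing
cols_kept = df.dropna(axis=1, how="all")  # drops only columns that are entirely missing

print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
```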
3. Delete rows or columns based on threshold
Deletions based on "any" or "all" are not always the best choice. We sometimes need to delete rows or columns with "a large number" or "some" missing values.
We cannot pass such an expression to the how parameter, but Pandas provides a more precise tool: the thresh parameter.
For example, "thresh=4" means that rows with at least four non-missing values are retained. The rest are discarded.
Our DataFrame has six columns, so rows with 3 or more missing values will be deleted.
Only the third row has more than 2 missing values, so it is the only one that is discarded.
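The threshold rule can be sketched as follows. The six-column frame is hypothetical and built so that only the third row (index 2) has fewer than four non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical six-column frame; row 2 has three missing values.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [1.0, np.nan, np.nan, 4.0],
    "c": [1.0, np.nan, np.nan, 4.0],
    "d": [1.0, 2.0, 3.0, 4.0],
    "e": [1.0, 2.0, 3.0, np.nan],
    "f": [1.0, 2.0, 3.0, 4.0],
})

# Keep rows that have at least 4 non-missing values.
kept = df.dropna(thresh=4)

print(df.notna().sum(axis=1).tolist())  # non-missing count per row
print(kept.index.tolist())
```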
4. Delete based on a specific subset of columns
When deleting rows, we can take only some of the columns into account.
The subset parameter of the dropna function is used for this task. For example, we can delete the rows that have a missing value in the measure 1 or measure 2 column, as follows:
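A minimal sketch of the subset parameter, assuming hypothetical column names measure_1 and measure_2:

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not the article's data).
df = pd.DataFrame({
    "item": [1014.0, np.nan, 1016.0, 1017.0],
    "measure_1": [1.0, 5.0, np.nan, 3.0],
    "measure_2": [0.5, 1.2, 3.1, np.nan],
})

# Only missing values in the listed columns cause a row to be dropped.
kept = df.dropna(subset=["measure_1", "measure_2"])

print(kept.index.tolist())
```

Row 1 is kept even though its item value is missing, because item is not part of the subset.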
So far, we have seen different ways to delete rows or columns based on missing values. Dropping data is not the only option. In some cases, we may choose to fill in the missing values rather than delete them.
In fact, filling is often the better choice, because data is valuable. How to fill the missing values depends on the structure of the data and on the task.
The fillna function is used to fill in the missing values.
5. Fill in a constant value
We can choose a constant value to replace the missing values. If we pass only a constant to the fillna function, it replaces all missing values in the DataFrame with that value.
A more reasonable approach is to choose a separate constant for each column. We can write them into a dictionary and pass it to the value parameter.
The missing value in the item column is replaced with 1014, while the missing value in the measure 1 column is replaced with 0.
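Both variants can be sketched like this, on a small hypothetical frame with the column names item and measure_1 assumed:

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not the article's data).
df = pd.DataFrame({
    "item": [1014.0, np.nan, 1016.0],
    "measure_1": [1.0, np.nan, 7.0],
    "measure_2": [0.5, 1.2, np.nan],
})

# One constant for every missing value in the frame.
all_zero = df.fillna(0)

# A separate constant per column; columns not in the dict are left untouched.
per_column = df.fillna(value={"item": 1014, "measure_1": 0})

print(all_zero)
print(per_column)
```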
6. Fill in an aggregate value
Another option is to use aggregate values, such as average, median, or mode.
The following line of code replaces the missing values in the measure 2 column with the average of that column.
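A sketch of the mean fill, assuming the column is named measure_2 and using illustrative values:

```python
import numpy as np
import pandas as pd

# Hypothetical column with two gaps.
df = pd.DataFrame({"measure_2": [1.0, np.nan, 3.0, np.nan, 8.0]})

# mean() skips missing values by default, so the average is (1 + 3 + 8) / 3 = 4.0.
df["measure_2"] = df["measure_2"].fillna(df["measure_2"].mean())

print(df["measure_2"].tolist())
```

Swapping mean() for median() gives the median variant; for the mode, mode() returns a Series, so its first element would be used.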
7. Replace with the previous or next value
You can replace a missing value in a column with the previous or next value in that column. This method may come in handy when dealing with time series data. Suppose you have a DataFrame that contains daily temperature measurements, but one day's temperature is missing. A reasonable solution is to use the temperature of the next day or the day before.
The method parameter of the fillna function is used to perform this task. Recent Pandas versions deprecate this parameter in favor of the dedicated ffill and bfill methods, which behave the same way.
"bfill" fills missing values backwards, replacing each one with the next valid value. Look at the last column: the missing values are filled all the way up to the first row. This may not be appropriate in some situations.
Fortunately, we can limit the number of missing values that are replaced in this way. If we set the limit parameter to 1, at most one consecutive missing value is filled from a given valid value, namely the one directly before it. Missing values two or three positions away remain missing.
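The effect of the limit can be sketched with a hypothetical temperature series. This example uses the bfill and ffill methods, which are the equivalents of the "bfill" and "ffill" options of fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperatures with gaps (illustrative values).
temps = pd.Series([np.nan, np.nan, 21.5, np.nan, 19.0], name="temperature")

filled_all = temps.bfill()         # every gap takes the next valid value
filled_one = temps.bfill(limit=1)  # at most one consecutive gap is filled
forward = temps.ffill()            # gaps take the previous valid value instead

print(filled_all.tolist())
print(filled_one.tolist())
print(forward.tolist())
```

With limit=1 the first entry stays missing, because it is two positions away from the nearest following valid value.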
8. Fill in from another DataFrame
We can also pass another DataFrame to the fillna function. The values in the other DataFrame are used to replace the missing values in the current one.
The value is selected based on the row index and column name. For example, if there is a missing value in the second row of the item column, the value at the same location in the other DataFrame is used.
The above are two DataFrames with the same columns. The first one does not have any missing values.
We can use the fillna function as follows:
The missing values in df are replaced with the values in df2 that have the same column name and row index.
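A minimal sketch of this alignment, with two small hypothetical DataFrames named df and df2 as in the text:

```python
import numpy as np
import pandas as pd

# df has gaps; df2 is complete and shares the same index and columns (illustrative values).
df = pd.DataFrame({"item": [1014.0, np.nan, 1016.0], "measure_1": [np.nan, 5.0, 7.0]})
df2 = pd.DataFrame({"item": [2000.0, 2001.0, 2002.0], "measure_1": [10.0, 11.0, 12.0]})

# Each missing cell in df takes the df2 value with the same row index and column name.
filled = df.fillna(df2)

print(filled)
```

Cells that already hold a value in df are not overwritten.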
After reading this article, you should have a working understanding of how to deal with missing values in Python data analysis. Thank you for reading!
© 2024 shulou.com SLNews company. All rights reserved.