2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to replace np.nan values in a two-dimensional NumPy array with a specified value. Many people run into this situation in practice, so the walkthrough below covers the common ways to handle it.
In NumPy, nan values typically show up before data cleaning and indicate missing data. Sometimes the affected records can simply be deleted, but when they cannot, the np.nan values have to be replaced with a specified value.
Basics:
(1) np.nan means "not a number" and stands for a missing value, such as a missing income or age in a data set; np.inf means infinity.
(2) np.nan == np.nan evaluates to False.
(3) Any arithmetic operation involving nan yields nan; for example, sum((np.nan, 4)) is nan.
(4) For an ndarray t1, np.isnan(t1) locates the nan values, and t1[np.isnan(t1)] = value replaces them with the specified value.
(5) np.nan_to_num(t1) returns a copy of t1 with nan replaced by 0.
(6) t1[t1 == t1] drops every nan and keeps only the non-nan values (as a flattened one-dimensional array).
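The points above can be checked with a short script. This is only a minimal sketch: the array a and its values are made up for illustration and do not come from the article's data.

```python
import numpy as np

# Illustrative data: one missing value and one infinity.
a = np.array([1.0, np.nan, 3.0, np.inf])

# (2) nan never compares equal to itself.
nan_equals_itself = (np.nan == np.nan)

# (3) Arithmetic involving nan yields nan.
total = sum((np.nan, 4))

# (4) np.isnan returns a Boolean mask marking the nan positions.
mask = np.isnan(a)

# (5) np.nan_to_num returns a copy with nan replaced by 0.
#     By default it also maps +/-inf to very large finite numbers.
zero_filled = np.nan_to_num(a)

# (6) a == a is False only at nan positions, so indexing with it drops nan.
without_nan = a[a == a]

print(nan_equals_itself, total, mask, zero_filled, without_nan, sep="\n")
```

Note that point (6) removes only nan; the infinity in a survives the filter.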
Now generate a 3×4 array and set the elements in the second row (index 1), from the third column (index 2) onward, to np.nan.
import numpy as np
t1 = np.arange(12).reshape(3, 4).astype('float')
t1[1, 2:] = np.nan
print(t1)
[[ 0.  1.  2.  3.]
 [ 4.  5. nan nan]
 [ 8.  9. 10. 11.]]
1. Question 1:
How to replace nan in t1 with 0
# Method 1: loop over the columns and replace nan in each one
for i in range(t1.shape[1]):
    col = t1[:, i]  # a view into t1, so the assignment below modifies t1
    col[np.isnan(col)] = 0

# Method 2: call np.nan_to_num
t1 = np.nan_to_num(t1)

# Method 3 (recommended): index with np.isnan(t1) and assign
t1[np.isnan(t1)] = 0
Method 3 can substitute any value, not just 0, which is why it is the recommended approach.
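As a small illustration of replacing nan with something other than 0, the sketch below uses a made-up sentinel, fill_value = -1.0, which is an assumption and not part of the article. It also shows that np.nan_to_num accepts a nan= keyword in NumPy 1.17 and later, which gives the same result as the Boolean-mask assignment.

```python
import numpy as np

# Same array as in the article: row 1, columns 2 and 3 are nan.
t1 = np.arange(12).reshape(3, 4).astype('float')
t1[1, 2:] = np.nan

fill_value = -1.0  # hypothetical sentinel chosen for this example

# Boolean-mask assignment (Method 3), done on a copy to keep t1 intact.
masked = t1.copy()
masked[np.isnan(masked)] = fill_value

# np.nan_to_num with the nan= keyword (requires NumPy >= 1.17).
# It returns a new array and leaves t1 unchanged.
converted = np.nan_to_num(t1, nan=fill_value)

print(masked)
print(np.array_equal(masked, converted))
```

The mask form modifies an array in place, while np.nan_to_num returns a copy by default, so the choice can depend on whether the original array still needs its nan markers.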
2. Question 2:
How to replace nan in t1 with a computed value, such as the mean of all non-nan elements in the same column
Replacing missing values with 0 is not always appropriate. For example, if some people's ages are missing from the original data and are replaced with 0, the average age and any later analysis will be distorted. In that case it is more reasonable to set the missing ages to the average of the known ages.
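A tiny worked example makes the distortion concrete. The ages below are invented for illustration only.

```python
import numpy as np

# Made-up ages with one missing entry.
ages = np.array([20.0, 30.0, np.nan, 40.0])

# Filling with 0 drags the average down.
zero_filled = np.where(np.isnan(ages), 0.0, ages)

# Filling with the mean of the known ages keeps the average unchanged.
mean_filled = np.where(np.isnan(ages), np.nanmean(ages), ages)

print(zero_filled.mean())  # 22.5
print(mean_filled.mean())  # 30.0
```

With the zero fill, the average drops from 30 to 22.5 even though no known age changed.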
(1) Method 1:
for i in range(t1.shape[1]):
    col = t1[:, i]
    # np.nan is not equal to np.nan, so col != col is True exactly where
    # the column holds nan. np.count_nonzero counts those True entries,
    # which tells us whether the column contains any nan.
    nan_num = np.count_nonzero(col != col)
    if nan_num:
        # Using the Boolean array col == col as an index drops the
        # elements at its False positions, i.e. the nan values.
        not_nan_col = col[col == col]
        col[np.isnan(col)] = not_nan_col.mean()
print(t1)
Running result:
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
(2) Method 2: np.nanmean computes the mean of the non-nan values; np.nanmax and np.nanmin work the same way for the maximum and minimum. The program above can therefore be rewritten as follows:
mean = np.nanmean(t1, axis=0)
print('the average of each column is: %s' % mean)
for i in range(t1.shape[1]):
    col = t1[:, i]
    col[np.isnan(col)] = mean[i]
print(t1)
The running result is the same as above.
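The per-column loop can also be written without an explicit loop. The sketch below is one possible variant, not the article's method: it swaps in np.where and relies on broadcasting the column means across the rows.

```python
import numpy as np

# Same array as in the article: row 1, columns 2 and 3 are nan.
t1 = np.arange(12).reshape(3, 4).astype('float')
t1[1, 2:] = np.nan

# Column means that ignore nan; shape (4,).
col_means = np.nanmean(t1, axis=0)

# Where t1 is nan, take the mean of that column; elsewhere keep t1.
# col_means is broadcast along the rows to match t1's shape (3, 4).
filled = np.where(np.isnan(t1), col_means, t1)

print(filled)
```

If a column consists entirely of nan, np.nanmean emits a RuntimeWarning and returns nan for it, so that column would stay unfilled and needs separate handling.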
(3) Method 3:
Use the powerful pandas library
# pandas handles this more simply and conveniently
import pandas as pd
df = pd.DataFrame(t1)
# df.mean() gives per-column means that skip nan. Use .values rather than
# the older as_matrix() to convert the DataFrame back to an ndarray.
t1 = df.fillna(df.mean()).values
print(t1)
The running result is the same as above.
Addendum: quickly replacing nan (null) and inf (infinite) values in NumPy with Python
In data processing, to keep the number of records unchanged, null and infinite values in the data have to be replaced with a specified value (255 here). Because the data set is large (50,000,000 records), efficiency also matters.
The following is the core code for replacing the data. The comment above each method gives its measured running time on 50,000,000 records.
import datetime
import numpy as np
import pandas as pd

# +-+
print('Predict New Data.')
start = datetime.datetime.now()
dataPre = input_Data  # the raw data to be processed goes here

# 0:00:23.012951
dataPre0 = np.array(dataPre)
dataPre0[np.isnan(dataPre0)] = 255
dataPre0[np.isinf(dataPre0)] = 255

# 0:02:03.038840
# The constructor name is missing in the source; pd.DataFrame is assumed
# here because .replace and .fillna are pandas methods.
dataPre1 = pd.DataFrame(dataPre)
dataPre1 = dataPre1.replace([np.inf, -np.inf], np.nan)
dataPre1 = dataPre1.fillna(value=255)

# 0:02:03.140287
dataPre2 = pd.DataFrame(dataPre)
dataPre2 = dataPre2.replace([np.inf, -np.inf], np.nan).fillna(value=255)  # fill with the specified value

# 0:00:30.346661
# Note: == float('nan') is always False, so this method replaces only
# the infinities; nan needs np.isnan.
dataPre3 = np.array(dataPre)
dataPre3[(dataPre3 == float('inf')) | (dataPre3 == float('-inf')) | (dataPre3 == float('nan'))] = 255

# 0:00:19.702519
dataPre4 = np.array(dataPre)
dataPre4[np.isinf(dataPre4)] = np.nan  # convert the infinite values in the array to nan
dataPre4[np.isnan(dataPre4)] = 255  # replace the nan values with 255

# 0:01:10.404677
dataPre5 = np.array(dataPre)
dataPre5 = np.where(np.isnan(dataPre5), 255, dataPre5)
dataPre5 = np.where(np.isinf(dataPre5), 255, dataPre5)
The methods differ considerably in efficiency. Those that use replace or np.where are noticeably slower (about 2 minutes and about 1 minute 10 seconds respectively), while the Boolean-mask versions finish in roughly 20 to 30 seconds.
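To repeat a comparison like this on your own machine, a small timing harness along the following lines may help. It is a sketch under stated assumptions: the data is randomly generated (1,000,000 values, far fewer than the article's 50,000,000), the function names are made up for this example, and two variants not in the article are included, one using np.isfinite and one using the nan=, posinf= and neginf= keywords of np.nan_to_num (NumPy 1.17 and later). Timings depend on hardware and library versions, so none are quoted here.

```python
import timeit
import numpy as np

# Synthetic test data (an assumption; the article's input is not available).
rng = np.random.default_rng(0)
data = rng.random(1_000_000)
data[::10] = np.nan
data[5::10] = np.inf
data[7::10] = -np.inf

def by_mask(arr):
    # Boolean-mask assignment on a copy, as in dataPre0.
    out = arr.copy()
    out[np.isnan(out)] = 255
    out[np.isinf(out)] = 255
    return out

def by_where(arr):
    # Two passes of np.where, as in dataPre5.
    out = np.where(np.isnan(arr), 255, arr)
    return np.where(np.isinf(out), 255, out)

def by_isfinite(arr):
    # Single mask covering nan and +/-inf (variant not in the article).
    out = arr.copy()
    out[~np.isfinite(out)] = 255
    return out

def by_nan_to_num(arr):
    # Keyword form of np.nan_to_num (variant not in the article).
    return np.nan_to_num(arr, nan=255, posinf=255, neginf=255)

for fn in (by_mask, by_where, by_isfinite, by_nan_to_num):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

All four functions return a new array and leave the input untouched, so they can be compared on the same data without resetting it between runs.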
This concludes the walkthrough of how to replace np.nan values in a two-dimensional NumPy array with a specified value. Thank you for reading.