2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
How do you remove outliers and interpolate missing values in Python? This article walks through the problem and a solution in detail, in the hope of giving readers who face it a simpler, workable method.
1. Use the box-plot (IQR) method to remove outliers:
import pandas as pd

# Reading .xls files needs the xlrd package.
data = pd.read_excel('try.xls', header=0)
# print(data.shape)
# print(data.head(10))
# print(data.describe())
neg_list = ['displacement']  # the column is named '位移' in the original workbook

print("(1) Number of rows of data:")
R = data.shape[0]
print(R)

print("(2) Flag data below or above the thresholds:")
for item in neg_list:
    iqr = data[item].quantile(0.75) - data[item].quantile(0.25)
    q_abnormal_L = data[item] < data[item].quantile(0.25) - 1.5 * iqr
    q_abnormal_U = data[item] > data[item].quantile(0.75) + 1.5 * iqr
    print("(3) Number of outliers:")
    print(item + ' has ' + str(q_abnormal_L.sum() + q_abnormal_U.sum()) + ' outliers')

print("(4) Upper and lower limits from the box plot:")
for item in neg_list:
    iqr = data[item].quantile(0.75) - data[item].quantile(0.25)
    Too_small = data[item].quantile(0.25) - 1.5 * iqr
    Too_big = data[item].quantile(0.75) + 1.5 * iqr  # upper limit uses Q3, not Q1
print("Lower limit is", Too_small)
print("Upper limit is", Too_big)

print("(5) All data:")
a = []
for i in neg_list:
    a.append(data[i].astype(float))  # float copy, so outliers can be set to NaN
print(a)

print("(6) All normal data:")
b = []
j = 0
while j < R:
    if Too_small < a[0][j] < Too_big:
        b.append(a[0][j])
    j += 1
print(b)

print("(7) All abnormal data:")
c = []
i = 0
while i < R:
    if a[0][i] < Too_small or a[0][i] > Too_big:
        c.append(a[0][i])
        a[0][i] = None  # blank out the outlier
    i += 1
print(c)

print("(8) After deleting all abnormal data:")
print(a)

print("(9) Output after all data processing:")
d = []
k = 0
while k < R:
    d.append(a[0][k])
    k += 1
print(d)
df = pd.DataFrame(d, columns=['displacement'])
# pandas 2.x can no longer write .xls files; use an .xlsx name there.
df.to_excel("try_result.xls")

2. Lagrange interpolation:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange

# Only needed when the plot labels are Chinese.
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # display the minus sign correctly

# Read the data
data = pd.read_excel('try.xls', header=0)
neg_list = ['displacement']  # the column is named '位移' in the original workbook
data['displacement'] = data['displacement'].astype(float)

# Number of rows
R = data.shape[0]

# Number of outliers
for item in neg_list:
    iqr = data[item].quantile(0.75) - data[item].quantile(0.25)
    q_abnormal_L = data[item] < data[item].quantile(0.25) - 1.5 * iqr
    q_abnormal_U = data[item] > data[item].quantile(0.75) + 1.5 * iqr
    print(item + ' has ' + str(q_abnormal_L.sum() + q_abnormal_U.sum()) + ' outliers')

# Determine the upper and lower limits of the data
for item in neg_list:
    iqr = data[item].quantile(0.75) - data[item].quantile(0.25)
    Too_small = data[item].quantile(0.25) - 1.5 * iqr
    Too_big = data[item].quantile(0.75) + 1.5 * iqr  # upper limit uses Q3, not Q1

# Filter the outliers and make them null
outlier = (data['displacement'] < Too_small) | (data['displacement'] > Too_big)
data.loc[outlier, 'displacement'] = np.nan

# s is the column vector, n is the position to interpolate, and k is the
# number of points taken before and after it (k=5 is assumed here; the
# default is not legible in the source).
def ployinter(s, n, k=5):
    window = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))
    y = s.reindex(window)  # positions outside the data become null
    y = y[y.notnull()]     # drop nulls
    return lagrange(y.index, list(y))(n)

# Check element by element whether interpolation is needed
for i in data.columns:
    for j in range(len(data)):
        if data[i].isnull()[j]:
            data.loc[j, i] = ployinter(data[i], j)
# print(data['displacement'])

# Output the Lagrange-interpolated data
# pandas 2.x can no longer write .xls files; use an .xlsx name there.
data.to_excel("try_result.xls")

# Copy the column into arr; arr is the corrected data
print("Data after Lagrange interpolation:")
d = []
k = 0
while k < R:
    d.append(data['displacement'][k])
    k += 1
print(d)
arr = np.array(d)
print(arr)

# Output the image
x = np.arange(len(d))
plt.plot(x, d, 'b-', label='displacement', linewidth=1)  # 'b' is blue, '-' is a solid line
plt.title('displacement curve')
plt.legend(loc='upper left', bbox_to_anchor=(1.0, 1.0))
plt.xlabel('time / h')
plt.ylabel('displacement / mm')
# plt.grid(True)
plt.show()
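The masking and interpolation steps above can be tried without an Excel file. The following is a minimal sketch on a synthetic series, not the article's own script: the helper names `iqr_bounds` and `lagrange_fill`, the window size `k=3`, and the sample values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange


def iqr_bounds(s, factor=1.5):
    """Return the (lower, upper) box-plot limits of a numeric Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def lagrange_fill(s, k=3):
    """Fill each NaN from a Lagrange polynomial through up to k known
    neighbours on either side (hypothetical helper, not from the article)."""
    s = s.astype(float).reset_index(drop=True)
    out = s.copy()
    for n in np.flatnonzero(s.isna().to_numpy()):
        window = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))
        # Positions outside the index come back as NaN and are dropped.
        y = s.reindex(window).dropna()
        if len(y) >= 2:
            out.iloc[n] = lagrange(y.index.to_numpy(), y.to_numpy())(n)
    return out


# Made-up sample: a straight line with one obvious outlier at position 4.
values = pd.Series([10.0, 11.0, 12.0, 13.0, 500.0, 15.0, 16.0, 17.0, 18.0])
low, high = iqr_bounds(values)
masked = values.mask((values < low) | (values > high))
filled = lagrange_fill(masked)
print(low, high)
print(filled.tolist())
```

With these sample values the outlier at position 4 should be replaced by a value close to 14, because its neighbours lie on a straight line. Keeping the window small matters: Lagrange polynomials of high degree tend to oscillate and become numerically unstable.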
3. Data fitting:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

def Fun(p, x):  # define the form of the fitting function
    a1, a2, a3, a4 = p
    return a1 * x ** 3 + a2 * x ** 2 + a3 * x + a4

def error(p, x, y):  # fitting residual
    return Fun(p, x) - y

def main():
    x = np.linspace(1, 31, 31)  # create the time series (the data must have 31 rows)
    data = pd.read_excel('try.xls', header=0)
    y = data['displacement']  # the column is named '位移' in the original workbook
    p0 = [0.1, 0.01, 100, 1000]  # initial parameters for the fit
    para = leastsq(error, p0, args=(x, y))  # run the fit
    y_fitted = Fun(para[0], x)  # evaluate the fitted curve
    plt.figure()
    plt.plot(x, y, 'r', label='Original curve')
    plt.plot(x, y_fitted, '-b', label='Fitted curve')
    plt.legend()
    plt.show()
    print(para[0])

if __name__ == '__main__':
    main()
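Because the model above is a cubic polynomial, the fit is linear in its coefficients, so `numpy.polyfit` should recover the same parameters as `leastsq` without needing initial guesses. A minimal sketch of that cross-check follows; the "true" coefficients and the noise-free samples are made up for illustration and stand in for the Excel data.

```python
import numpy as np
from scipy.optimize import leastsq


def cubic(p, x):
    """Cubic model with coefficients ordered from x**3 down to the constant."""
    a1, a2, a3, a4 = p
    return a1 * x ** 3 + a2 * x ** 2 + a3 * x + a4


def residual(p, x, y):
    """Difference between the model and the observations."""
    return cubic(p, x) - y


x = np.linspace(1, 31, 31)
true_p = np.array([0.05, -1.0, 20.0, 1000.0])  # made-up coefficients
y = cubic(true_p, x)                           # synthetic, noise-free data

# Iterative fit, starting from the same initial guess as the script above.
p_lsq, _ = leastsq(residual, [0.1, 0.01, 100, 1000], args=(x, y))

# Direct polynomial fit; coefficients come back highest degree first.
p_poly = np.polyfit(x, y, 3)

print(p_lsq)
print(p_poly)
```

For real, noisy data the two results should agree closely rather than exactly. `leastsq` remains the more general tool, since it also handles models that are not polynomials.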
4. Output image:
import matplotlib.pyplot as plt

# Only needed when the plot labels are Chinese.
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # display the minus sign correctly

jiaodu = ['0', '15', '30', '45', '60', '75', '90', '105', '120']  # angles
x = range(len(jiaodu))
y = [85.6801, 7.64586, 86.0956, 159.229, 179.534, 163.238, 96.4436, 10.1619, 90.9262]

# plt.figure(figsize=(10, 6))
plt.plot(x, y, 'b-', label='Luminance', marker='*', markersize=7, linewidth=3)  # 'b' is blue, '-' is a solid line
plt.title('brightness change of each region')
plt.legend(loc='upper left', bbox_to_anchor=(1.0, 1.0))
plt.xticks(x, jiaodu)  # label the X axis with the angles
plt.xlabel('Angle')
plt.ylabel('Luminance')
# plt.grid(True)
plt.show()

That covers how to remove outliers and interpolate missing values in Python. I hope the content above helps to some extent.