
How to implement the AdaBoost algorithm in Python


This article explains how to implement the AdaBoost algorithm in Python. The material is simple and clear and easy to learn; follow along step by step to study and master it.

Implementing the AdaBoost algorithm by hand:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    # first 100 rows are two classes; keep the first two features and the label
    data = np.array(df.iloc[:100, [0, 1, -1]])
    for i in range(len(data)):
        if data[i, -1] == 0:
            data[i, -1] = -1
    return data[:, :2], data[:, -1]


class AdaBoost:
    def __init__(self, n_estimators=50, learning_rate=1.0):
        self.clf_num = n_estimators
        self.learning_rate = learning_rate

    def init_args(self, datasets, labels):
        self.X = datasets
        self.Y = labels
        self.M, self.N = datasets.shape
        # set of weak classifiers
        self.clf_sets = []
        # initialize the sample weights
        self.weights = [1.0 / self.M] * self.M
        # coefficients alpha of G(x)
        self.alpha = []

    # find the best threshold stump on a single feature
    def _G(self, features, labels, weights):
        m = len(features)
        error = 100000.0  # effectively infinity
        best_v = 0.0
        # single feature dimension
        features_min = min(features)
        features_max = max(features)
        n_step = (features_max - features_min + self.learning_rate) // self.learning_rate
        direct, compare_array = None, None
        for i in range(1, int(n_step)):
            v = features_min + self.learning_rate * i
            if v not in features:
                # misclassification error for both stump directions
                compare_array_positive = np.array(
                    [1 if features[k] > v else -1 for k in range(m)])
                weight_error_positive = sum(
                    [weights[k] for k in range(m)
                     if compare_array_positive[k] != labels[k]])

                compare_array_nagetive = np.array(
                    [-1 if features[k] > v else 1 for k in range(m)])
                weight_error_nagetive = sum(
                    [weights[k] for k in range(m)
                     if compare_array_nagetive[k] != labels[k]])

                if weight_error_positive < weight_error_nagetive:
                    weight_error = weight_error_positive
                    _compare_array = compare_array_positive
                    direct = 'positive'
                else:
                    weight_error = weight_error_nagetive
                    _compare_array = compare_array_nagetive
                    direct = 'nagetive'

                if weight_error < error:
                    error = weight_error
                    compare_array = _compare_array
                    best_v = v
        return best_v, direct, error, compare_array

    # compute alpha
    def _alpha(self, error):
        return 0.5 * np.log((1 - error) / error)

    # normalization factor Z
    def _Z(self, weights, a, clf):
        return sum([weights[i] * np.exp(-1 * a * self.Y[i] * clf[i])
                    for i in range(self.M)])

    # weight update
    def _w(self, a, clf, Z):
        for i in range(self.M):
            self.weights[i] = self.weights[i] * np.exp(
                -1 * a * self.Y[i] * clf[i]) / Z

    # linear combination of G(x) (unused placeholder)
    def _f(self, alpha, clf_sets):
        pass

    # weak classifier decision
    def G(self, x, v, direct):
        if direct == 'positive':
            return 1 if x > v else -1
        else:
            return -1 if x > v else 1

    def fit(self, X, y):
        self.init_args(X, y)
        for epoch in range(self.clf_num):
            axis = 0
            final_direct = 'null'
            best_clf_error, best_v, clf_result = 100000, None, None
            # pick the feature dimension with the smallest error
            for j in range(self.N):
                features = self.X[:, j]
                # threshold, direction, error and predictions for this feature
                v, direct, error, compare_array = self._G(
                    features, self.Y, self.weights)
                if error < best_clf_error:
                    best_clf_error = error
                    best_v = v
                    final_direct = direct
                    clf_result = compare_array
                    axis = j  # axis is the index of the chosen feature column
                if best_clf_error == 0:
                    break
            # coefficient a of G(x)
            a = self._alpha(best_clf_error)
            self.alpha.append(a)
            # record the classifier
            self.clf_sets.append((axis, best_v, final_direct))
            # normalization factor
            Z = self._Z(self.weights, a, clf_result)
            # weight update
            self._w(a, clf_result, Z)

    def predict(self, feature):
        result = 0.0
        for i in range(len(self.clf_sets)):
            axis, clf_v, direct = self.clf_sets[i]
            f_input = feature[axis]
            result += self.alpha[i] * self.G(f_input, clf_v, direct)
        # sign
        return 1 if result > 0 else -1

    def score(self, X_test, y_test):
        right_count = 0
        for i in range(len(X_test)):
            feature = X_test[i]
            if self.predict(feature) == y_test[i]:
                right_count += 1
        return right_count / len(X_test)


X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = AdaBoost(n_estimators=3, learning_rate=0.5)
clf.fit(X_train, y_train)
print("score: {}".format(clf.score(X_test, y_test)))
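For reference, the helper methods _alpha, _Z and _w implement the standard AdaBoost update rules, where e_m is the weighted classification error of the m-th weak classifier G_m:

\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m}, \qquad
Z_m = \sum_{i=1}^{M} w_{m,i}\,\exp\left(-\alpha_m\, y_i\, G_m(x_i)\right), \qquad
w_{m+1,i} = \frac{w_{m,i}}{Z_m}\,\exp\left(-\alpha_m\, y_i\, G_m(x_i)\right)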

Results: depending on the random split, the score comes out around 1.0, 0.75, 0.6 or 0.4. Note that this program may raise an error when computing the normalization factor: TypeError: 'NoneType' object is not subscriptable. The reason is that, when the data are divided, the chosen threshold v can happen to leave one side of the split empty and the other side full. Because one side is empty, no valid split is recorded and the clf parameter is None when the normalization factor is computed; indexing it as clf[i] then fails with this error.
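One possible guard against this (my addition, not part of the original article) is to stop boosting as soon as no valid split was found, so that None is never passed to _Z. A minimal sketch:

# Hypothetical patch inside fit(), placed right after the loop over features:
if clf_result is None:
    # no threshold v produced a valid split this round, so stop adding
    # weak classifiers instead of passing None to _Z
    break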

Sklearn already has a ready-made implementation to call:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier


def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    # same binary subset of the iris data as above
    data = np.array(df.iloc[:100, [0, 1, -1]])
    for i in range(len(data)):
        if data[i, -1] == 0:
            data[i, -1] = -1
    return data[:, :2], data[:, -1]


X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.5)
clf.fit(X_train, y_train)
print("score: {}".format(clf.score(X_test, y_test)))
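As a side note (an illustration, not from the original article), the fitted AdaBoostClassifier exposes the same quantities the hand-written version tracks, via the standard sklearn attributes estimator_weights_, estimator_errors_ and the staged_score method:

# peek inside the fitted ensemble
print(clf.estimator_weights_[:3])  # alpha coefficients of the first weak learners
print(clf.estimator_errors_[:3])   # their weighted training errors
# accuracy on the test set after each boosting stage
for i, s in enumerate(clf.staged_score(X_test, y_test)):
    if (i + 1) % 25 == 0:
        print("after {} estimators: {:.2f}".format(i + 1, s))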

Thank you for reading. The above covers how to implement the AdaBoost algorithm in Python. After studying this article you should have a deeper understanding of the topic, though the details still need to be verified in practice. More articles on related topics will follow; welcome to stay tuned!
