This article works through example analyses of the logistic regression model and the maximum entropy model in Python. Many people have doubts about how these two models actually work, so below each one is implemented from scratch, with the hand-rolled logistic regression then checked against sklearn's built-in version. I hope it helps clear up those doubts; follow along and study the code as you read!
Simulating the logistic regression model
Idea: use the regression function y = 1 / (1 + exp(-x)), where x is the value of the linear classification function w1*x1 + w2*x2 + b (the decision boundary is w1*x1 + w2*x2 + b = 0). For each training sample we compute y once, take the error Δy between the label and the prediction, and then update the weight vector with w = w + α * Δy * xᵀ, where α is the learning rate, Δy is the error on the current sample, and xᵀ is the transpose of the current sample vector.
In this example training simply stops after a fixed number of passes over the data; a threshold on the error could be used as the stopping condition instead.
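As a warm-up before the full implementation, here is a minimal sketch of what a single update step looks like on one sample; the weights, feature values, label, and learning rate below are made up for illustration and are not taken from the iris data used later.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(3)                # (w0, w1, w2), with the bias b stored as w0
x = np.array([1.0, 5.1, 3.5])  # one sample, extended with a leading 1.0
y = 1                          # its true label

y_hat = sigmoid(np.dot(w, x))  # prediction; 0.5 with all-zero weights
error = y - y_hat              # error on this sample
w = w + 0.01 * error * x       # step of size alpha = 0.01 toward the label
print(w)                       # [0.005  0.0255 0.0175]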
from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# data
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])  # first 100 rows: classes 0 and 1 only
    # print(data)
    return data[:, :2], data[:, -1]

class LogisticReressionClassifier:
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter            # maximum number of passes over the data
        self.learning_rate = learning_rate  # learning rate

    def sigmoid(self, x):
        # regression model (np.exp keeps this safe for array inputs)
        return 1 / (1 + np.exp(-x))

    # Prepend a constant column to the data: because the linear classifier is
    # w1*x1 + w2*x2 + b*1.0 = 0, each original (x1, x2) is extended to
    # (1.0, x1, x2), so the bias b becomes w0.
    def data_matrix(self, X):
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat

    def fit(self, X, y):
        data_mat = self.data_matrix(X)  # preprocess the training data
        # weight vector: a zero array with one column and len(data_mat[0]) rows,
        # i.e. (w0, w1, w2)
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)
        for iter_ in range(self.max_iter):
            for i in range(len(X)):  # traverse every sample
                # np.dot is matrix multiplication (one row times one column):
                # the inner product w.x is fed into the regression model,
                # which returns the predicted value
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result  # error on this sample
                # update rule: w = w + learning rate * error * x transposed
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print('logistic regression model training completed (learning_rate={}, max_iter={})'.format(
            self.learning_rate, self.max_iter))

    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
lr_clf = LogisticReressionClassifier()
lr_clf.fit(X_train, y_train)
print("score:")
print(lr_clf.score(X_test, y_test))
x_points = np.arange(4, 8)
# the fitted function is w1*x1 + w2*x2 + b = 0, i.e. w1*x + w2*y + w0 = 0,
# so the boundary is y = -(w1*x + w0) / w2
y_ = -(lr_clf.weights[1]*x_points + lr_clf.weights[0])/lr_clf.weights[2]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
plt.show()

The result is as follows:

logistic regression model training completed (learning_rate=0.01, max_iter=200)
score:
1.0

Calling sklearn's built-in logistic regression function directly

from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    # print(data)
    return data[:, :2], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
print("score:{}".format(clf.score(X_test, y_test)))
print(clf.coef_, clf.intercept_)
x_points = np.arange(4, 8)
y_ = -(clf.coef_[0][0]*x_points + clf.intercept_)/clf.coef_[0][1]
plt.plot(x_points, y_)
plt.plot(X[:50, 0], X[:50, 1], 'o', color='blue', label='0')
plt.plot(X[50:, 0], X[50:, 1], 'o', color='orange', label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()

The result:

score:1.0
[[ 2.72989376 -2.5726044 ]] [-6.86599549]

Maximum entropy model

Maximum entropy principle: from the set of models that satisfy the constraints, choose the model with the largest entropy. The idea itself is simple, but the derivation involves many formulas, so the code below is best read alongside the formulas in the textbook.
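Before diving into the class, it may help to see how the model turns feature weights into probabilities: P(y|x) = exp(sum over i of w_i * f_i(x, y)) / Z(x), where Z(x) sums the numerator over all labels y. Here is a minimal sketch with two labels and made-up weight sums (the 0.8 and -0.3 values are purely illustrative); it mirrors what the _Zx and _model_pyx methods below compute.

import math

# Suppose the features active for (x, 'yes') have weights summing to 0.8,
# and those active for (x, 'no') sum to -0.3 (made-up numbers).
score = {'yes': 0.8, 'no': -0.3}

Zx = sum(math.exp(s) for s in score.values())          # normalization factor Z(x)
pyx = {y: math.exp(s) / Zx for y, s in score.items()}  # P(y|x) for each label
print(pyx)  # the two probabilities sum to 1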
import math
from copy import deepcopy
# deep copy: the copied object is duplicated in full
# shallow copy: both names refer to the same object, so changing one changes the other

class MaxEntropy:
    def __init__(self, EPS=0.005):   # the parameter is the convergence threshold
        self._samples = []   # our training data
        self._Y = set()      # label set, i.e. the deduplicated y values
        self._numXY = {}     # key is (x, y), value is its number of occurrences
        self._N = 0          # number of samples
        self._Ep_ = []       # feature expectations under the empirical distribution
        self._xyID = {}      # key records (x, y), value records its id
        self._n = 0          # number of distinct feature pairs (x, y)
        self._C = 0          # maximum number of features in any sample
        self._IDxy = {}      # key is the id, value is the corresponding (x, y)
        self._w = []         # our w coefficients
        self._EPS = EPS      # convergence threshold
        self._lastw = []     # w values from the previous iteration

    def loadData(self, dataset):
        self._samples = deepcopy(dataset)
        for items in self._samples:
            y = items[0]
            X = items[1:]
            self._Y.add(y)   # adding a y that already exists in the set is ignored
            for x in X:
                if (x, y) in self._numXY:
                    self._numXY[(x, y)] += 1
                else:
                    self._numXY[(x, y)] = 1
        self._N = len(self._samples)
        self._n = len(self._numXY)
        self._C = max([len(sample) - 1 for sample in self._samples])
        self._w = [0] * self._n   # initialize the n weights to 0, n being the number of feature pairs
        self._lastw = self._w[:]
        self._Ep_ = [0] * self._n
        # compute the expectation of each feature function fi under the empirical
        # distribution, where i is the feature index and xy is the pair (x, y)
        for i, xy in enumerate(self._numXY):
            self._Ep_[i] = self._numXY[xy] / self._N
            self._xyID[xy] = i
            self._IDxy[i] = xy

    def _Zx(self, X):
        # compute Z(x), the normalization factor
        zx = 0
        for y in self._Y:
            ss = 0
            for x in X:
                if (x, y) in self._numXY:
                    ss += self._w[self._xyID[(x, y)]]
            zx += math.exp(ss)
        return zx

    def _model_pyx(self, y, X):
        # compute P(y|x)
        zx = self._Zx(X)
        ss = 0
        for x in X:
            if (x, y) in self._numXY:
                ss += self._w[self._xyID[(x, y)]]
        pyx = math.exp(ss) / zx
        return pyx

    def _model_ep(self, index):
        # compute the expectation of feature function fi under the model
        x, y = self._IDxy[index]
        ep = 0
        for sample in self._samples:
            if x not in sample:
                continue
            pyx = self._model_pyx(y, sample)
            ep += pyx / self._N
        return ep

    def _convergence(self):
        # check whether every weight has converged
        for last, now in zip(self._lastw, self._w):
            if abs(last - now) >= self._EPS:
                return False
        return True

    def predict(self, X):
        # compute the predicted probability for each label
        Z = self._Zx(X)
        result = {}
        for y in self._Y:
            ss = 0
            for x in X:
                if (x, y) in self._numXY:
                    ss += self._w[self._xyID[(x, y)]]
            pyx = math.exp(ss) / Z
            result[y] = pyx
        return result

    def train(self, maxiter=1000):
        for loop in range(maxiter):  # maximum number of training iterations
            self._lastw = self._w[:]
            for i in range(self._n):
                ep = self._model_ep(i)  # model expectation of feature i
                # update the parameter (iterative-scaling step with constant C)
                self._w[i] += math.log(self._Ep_[i] / ep) / self._C
            if self._convergence():  # stop once every weight has converged
                break

dataset = [['no', 'sunny', 'hot', 'high', 'FALSE'],
           ['no', 'sunny', 'hot', 'high', 'TRUE'],
           ['yes', 'overcast', 'hot', 'high', 'FALSE'],
           ['yes', 'rainy', 'mild', 'high', 'FALSE'],
           ['yes', 'rainy', 'cool', 'normal', 'FALSE'],
           ['no', 'rainy', 'cool', 'normal', 'TRUE'],
           ['yes', 'overcast', 'cool', 'normal', 'TRUE'],
           ['no', 'sunny', 'mild', 'high', 'FALSE'],
           ['yes', 'sunny', 'cool', 'normal', 'FALSE'],
           ['yes', 'rainy', 'mild', 'normal', 'FALSE'],
           ['yes', 'sunny', 'mild', 'normal', 'TRUE'],
           ['yes', 'overcast', 'mild', 'high', 'TRUE'],
           ['yes', 'overcast', 'hot', 'normal', 'FALSE'],
           ['no', 'rainy', 'mild', 'high', 'TRUE']]

maxent = MaxEntropy()
x = ['overcast', 'mild', 'high', 'FALSE']
maxent.loadData(dataset)
maxent.train()
print('predict:', maxent.predict(x))
Results:
predict: {'yes': 0.9999971802186581, 'no': 2.819781341881656e-06}

This concludes the study of these Python implementations of the logistic regression model and the maximum entropy model. Pairing the theory with hands-on practice is the best way to learn, so go and try the code yourself!