This article walks through naive Bayes in Python with a worked example: first the underlying idea, then a from-scratch implementation, and finally the equivalent classifiers from sklearn. The goal is a simple, easy-to-follow set of steps for anyone with questions about how naive Bayes works in practice.
The idea behind naive Bayes is the conditional probability formula P(Y|X) = P(Y) P(X|Y) / P(X). P(Y) and P(X|Y) are estimated from the sample, and the probability of each class Y given X is then computed; the Y with the largest probability is the predicted classification of X. In other words, to classify X, we estimate P(Y) and P(X|Y) for every category (every value of Y) from the sample, compute the probability of each candidate Y given that X occurred, and pick the class whose probability is largest. Note that P(X) is the same for every class, so it can be ignored when comparing.
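As a toy illustration (all numbers here are hypothetical, chosen only to show the arithmetic), suppose there are two classes with priors P(Y=0) = 0.6 and P(Y=1) = 0.4, and the observed X has likelihoods P(X|Y=0) = 0.2 and P(X|Y=1) = 0.5:

# Hypothetical two-class example: pick the class with the largest P(Y) * P(X|Y).
priors = {0: 0.6, 1: 0.4}           # P(Y), normally estimated from the sample
likelihoods = {0: 0.2, 1: 0.5}      # P(X|Y) for the observed X
scores = {y: priors[y] * likelihoods[y] for y in priors}
print(max(scores, key=scores.get))  # prints 1, since 0.4*0.5 = 0.20 > 0.6*0.2 = 0.12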
Note that X usually has many components, X = (X1, X2, ..., Xn). Naive Bayes additionally assumes that the features used for classification are conditionally independent once the class is fixed, so P(X|Y) factorizes into per-feature terms:

P(X|Y) = P(X1|Y) * P(X2|Y) * ... * P(Xn|Y)
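Combining the two steps and dropping P(X), which is the same for every class, the standard naive Bayes prediction rule can be written compactly in LaTeX:

\hat{y} = \arg\max_{y} P(Y = y) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y)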
Implementation of naive Bayes
import math
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

class NaiveBayes:
    def __init__(self):
        self.model = None

    # Mathematical expectation (mean)
    @staticmethod
    def mean(X):
        return sum(X) / float(len(X))

    # Standard deviation
    def stdev(self, X):
        avg = self.mean(X)
        return math.sqrt(sum([pow(x - avg, 2) for x in X]) / float(len(X)))

    # Gaussian probability density function
    def gaussian_probability(self, x, mean, stdev):
        exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
        return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

    # Compute the mean and standard deviation of every feature.
    # zip(*train_data) transposes the samples, so each tuple i collects the
    # values of feature i across all samples; the result is one (mean, stdev)
    # pair per feature, i.e. the numeric summary of each of the four features.
    def summarize(self, train_data):
        summaries = [(self.mean(i), self.stdev(i)) for i in zip(*train_data)]
        return summaries

    # Train: group the samples by label, then summarize each group.
    # Result format: {0.0: [(mean, stdev) for each of the 4 features],
    #                 1.0: [(mean, stdev) for each of the 4 features]}
    def fit(self, X, y):
        labels = list(set(y))             # deduplicate; here labels = [0.0, 1.0]
        data = {label: [] for label in labels}
        for f, label in zip(X, y):
            data[label].append(f)         # put each sample f under its label
        self.model = {label: self.summarize(value) for label, value in data.items()}
        return 'gaussianNB train done!'

    # Multiply the per-feature Gaussian densities for each class
    def calculate_probabilities(self, input_data):
        probabilities = {}
        for label, value in self.model.items():
            probabilities[label] = 1
            for i in range(len(value)):
                mean, stdev = value[i]
                probabilities[label] *= self.gaussian_probability(input_data[i], mean, stdev)
        return probabilities

    # Predict the label with the highest probability
    def predict(self, X_test):
        label = sorted(self.calculate_probabilities(X_test).items(),
                       key=lambda x: x[-1])[-1][0]
        return label

    def score(self, X_test, y_test):
        right = 0
        for X, y in zip(X_test, y_test):
            label = self.predict(X)
            if label == y:
                right += 1
        if right / float(len(X_test)) == 1.0:
            return "perfect!"
        else:
            return right / float(len(X_test))

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, :])     # first 100 rows: only classes 0 and 1
    return data[:, :-1], data[:, -1], df

X, y, df = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = NaiveBayes()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
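As a quick sanity check, a single flower can also be classified directly. The four values below are a hypothetical measurement (sepal length, sepal width, petal length, petal width) in the setosa region of the feature space:

sample = [4.4, 3.2, 1.3, 0.2]  # hypothetical measurement, not from the test split
print(model.predict(sample))   # expected output: 0.0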
The printed result shows the classifier performs well on the test split.
Using the existing sklearn implementations directly
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import BernoulliNB, MultinomialNB  # Bernoulli and multinomial models

# Data
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, :])
    return data[:, :-1], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = GaussianNB()
clf.fit(X_train, y_train)
print("GaussianNB:")
print(clf.score(X_test, y_test))
print(clf.predict([[4.4, 3.2, 1.3, 0.2]]))   # predict expects a 2D array of samples

clf2 = BernoulliNB()
clf2.fit(X_train, y_train)
print("\nBernoulliNB:")
print(clf2.score(X_test, y_test))
print(clf2.predict([[4.4, 3.2, 1.3, 0.2]]))

clf3 = MultinomialNB()
clf3.fit(X_train, y_train)
print("\nMultinomialNB:")
print(clf3.score(X_test, y_test))
print(clf3.predict([[4.4, 3.2, 1.3, 0.2]]))
Output:

GaussianNB:
1.0
[0.]

BernoulliNB:
0.4666666666666667
[1.]

MultinomialNB:
1.0
[0.]
As the output shows, the Gaussian and multinomial models predict well, while the Bernoulli model's predictions are poor.

Reason: the data do not follow a Bernoulli distribution; BernoulliNB is designed for binary (0/1) features, whereas the iris measurements are continuous.
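A contributing detail: sklearn's BernoulliNB binarizes features at the threshold given by its binarize parameter, which defaults to 0.0. Since every iris measurement is positive, all features collapse to 1 and the model has nothing to discriminate on. A minimal sketch of raising the threshold (the value 2.0 here is an arbitrary choice for illustration, not a tuned one):

# Binarize at 2.0 instead of the default 0.0, so some features map to 0 and some to 1.
clf4 = BernoulliNB(binarize=2.0)
clf4.fit(X_train, y_train)
print(clf4.score(X_test, y_test))  # typically better than the default, though results vary per split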
At this point, the study of this naive Bayes example is complete. Hopefully it has answered some of your questions; pairing the theory with the code above is the best way to learn, so go and try it yourself!