2025-02-27 Update From: SLTechnology News & Howtos (shulou)
This article introduces the principle of the naive Bayes classifier in machine learning, walking through the underlying theorem, a worked example, and the main variants of the model.
Background introduction
What is a classifier?
A classifier is a machine learning model that distinguishes between different objects based on certain features.
The principle of the naive Bayes classifier:
The naive Bayes classifier is a probabilistic machine learning model used for classification tasks. It is based on Bayes' theorem.
Bayes' theorem:

P(A | B) = P(B | A) · P(A) / P(B)

Using Bayes' theorem, we can find the probability that A occurs given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made is that the predictors/features are independent, i.e. the presence of one particular feature does not affect any other. This is why the method is called "naive".
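As a minimal numeric sketch, Bayes' theorem can be applied directly; the probabilities below are made-up illustrative values, not taken from the article's dataset:

```python
def posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

    Returns the probability of hypothesis A given evidence B.
    """
    return p_b_given_a * p_a / p_b

# Illustrative values: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.5
print(round(posterior(0.8, 0.3, 0.5), 2))
```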
For example:
Let's use an example to build better intuition. Consider the problem of deciding whether to play golf. The dataset records, for each day, the outlook, temperature, humidity and wind, along with whether golf was played.
Based on the characteristics of a day, we classify whether that day is suitable for playing golf. The columns represent the features, and the rows represent individual entries. Taking the first row of the dataset, we can observe that the day is not suitable for golf when the outlook is rainy, the temperature is hot, the humidity is high and it is not windy. We make two assumptions here. First, as mentioned above, we assume the predictors are independent: a hot temperature does not necessarily imply high humidity. Second, we assume all predictors have an equal effect on the outcome: a windy day is not more important than any other feature in deciding whether to play golf.
Applied to this example, Bayes' theorem can be rewritten as:

P(y | X) = P(X | y) · P(y) / P(X)

The variable y is the class variable (play golf), which indicates whether the day is suitable for golf given the conditions. The variable X represents the parameters/features.
X is given as:

X = (x_1, x_2, …, x_n)

Here x_1, x_2, …, x_n represent the features; in this example they map to outlook, temperature, humidity and wind. Substituting X and expanding with the chain rule, together with the independence assumption, we get:

P(y | x_1, …, x_n) = [P(x_1 | y) · P(x_2 | y) ⋯ P(x_n | y) · P(y)] / [P(x_1) · P(x_2) ⋯ P(x_n)]
Each value can now be obtained by counting entries in the dataset and substituting into the equation. The denominator is the same for every entry in the dataset, so it can be dropped and proportionality introduced:

P(y | x_1, …, x_n) ∝ P(y) · ∏ P(x_i | y)
In our example, the class variable y has only two outcomes, yes or no, but in general there may be multiple classes. We therefore need to find the class y with the highest probability:

y = argmax_y P(y) · ∏ P(x_i | y)

Using this function, we can obtain the class for a given set of predictors.
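The whole procedure can be sketched in a few lines of Python. Since the original data table is not reproduced here, the rows below are invented for illustration; Laplace smoothing is added so unseen feature values do not zero out a class:

```python
from collections import Counter

# Invented golf-style dataset:
# (outlook, temperature, humidity, windy) -> play
data = [
    (("rainy", "hot", "high", "false"), "no"),
    (("rainy", "hot", "high", "true"), "no"),
    (("overcast", "hot", "high", "false"), "yes"),
    (("sunny", "mild", "high", "false"), "yes"),
    (("sunny", "cool", "normal", "false"), "yes"),
    (("sunny", "cool", "normal", "true"), "no"),
    (("overcast", "cool", "normal", "true"), "yes"),
]

def predict(x):
    """Return argmax_y P(y) * prod_i P(x_i | y), with Laplace smoothing."""
    classes = Counter(y for _, y in data)
    best_class, best_score = None, -1.0
    for y, n_y in classes.items():
        score = n_y / len(data)  # prior P(y)
        for i, value in enumerate(x):
            # count of rows in class y where feature i takes this value
            match = sum(1 for feats, label in data
                        if label == y and feats[i] == value)
            values = {feats[i] for feats, _ in data}  # feature-i vocabulary
            score *= (match + 1) / (n_y + len(values))  # smoothed P(x_i|y)
        if score > best_score:
            best_class, best_score = y, score
    return best_class

print(predict(("sunny", "cool", "normal", "false")))
```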
Types of naive Bayesian classifiers:
Multinomial naive Bayes:
This is mainly used for document classification, i.e. deciding whether a document belongs to a category such as sports, politics or technology. The features/predictors used by the classifier are the frequencies of the words appearing in the document.
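A rough sketch of this idea, using an invented two-class mini-corpus and log-probabilities for numerical stability:

```python
import math
from collections import Counter

# Invented mini-corpus: class -> list of tokenized documents
corpus = {
    "sports": [["goal", "match", "team"], ["team", "win", "goal"]],
    "tech": [["code", "model", "data"], ["data", "code", "api"]],
}

vocab = {w for docs in corpus.values() for doc in docs for w in doc}

def log_score(tokens, label):
    """log P(y) + sum over tokens of log P(w|y), with Laplace smoothing."""
    n_docs = sum(len(docs) for docs in corpus.values())
    counts = Counter(w for doc in corpus[label] for w in doc)
    total = sum(counts.values())
    score = math.log(len(corpus[label]) / n_docs)  # log prior
    for w in tokens:
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

def classify(tokens):
    return max(corpus, key=lambda y: log_score(tokens, y))

print(classify(["goal", "team"]))
```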
Bernoulli naive Bayes:
This is similar to multinomial naive Bayes, but the predictors are Boolean variables. The parameters we use to predict the class variable take only yes/no values, for example whether a word appears in the text at all.
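A minimal sketch of the Bernoulli variant, where each feature is just word presence or absence; the training rows are invented for illustration:

```python
# Invented training data: each document is reduced to Boolean
# word-presence features (contains "free", contains "meeting") -> label
train = [
    ((True, False), "spam"),
    ((True, True), "spam"),
    ((False, True), "ham"),
    ((False, False), "ham"),
]

def prob(feature_idx, value, label):
    """Smoothed P(x_i = value | y = label) over the training rows."""
    rows = [x for x, y in train if y == label]
    match = sum(1 for x in rows if x[feature_idx] == value)
    return (match + 1) / (len(rows) + 2)  # Laplace smoothing, 2 possible values

def classify(x):
    labels = {y for _, y in train}
    def score(y):
        prior = sum(1 for _, lab in train if lab == y) / len(train)
        s = prior
        for i, v in enumerate(x):
            s *= prob(i, v, y)
        return s
    return max(labels, key=score)

print(classify((True, False)))
```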
Gaussian naive Bayes:
When the predictors take continuous rather than discrete values, we assume these values are sampled from a Gaussian distribution.
Gaussian distribution (normal distribution):
Because the feature values are now continuous, the conditional probability formula becomes:

P(x_i | y) = (1 / √(2π σ_y²)) · exp(−(x_i − μ_y)² / (2 σ_y²))

where μ_y and σ_y² are the mean and variance of feature x_i within class y.
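The Gaussian conditional probability P(x_i | y) can be computed directly; the mean and standard deviation below are made-up illustrative values (in practice they are estimated per class from the training data):

```python
import math

def gaussian_pdf(x, mean, std):
    """Gaussian likelihood of a continuous feature value:
    (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return (math.exp(-((x - mean) ** 2) / (2 * std ** 2))
            / math.sqrt(2 * math.pi * std ** 2))

# Example: likelihood of temperature 72 given illustrative class
# statistics mean=70, std=5 (made-up values)
print(round(gaussian_pdf(72.0, mean=70.0, std=5.0), 4))
```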
Conclusion:
The naive Bayes algorithm is mainly used for sentiment analysis (an NLP problem), spam filtering, recommender systems, and so on. These classifiers are fast and easy to implement, but their biggest drawback is the requirement that the predictors be independent. In most real-life settings the predictors are dependent, which hurts the classifier's performance.