Many people will have come across Bayes' theorem. What is the appeal of this theorem, which seems to belong purely to mathematics, for the AI product manager?
We often encounter scenes like this: when chatting with a friend, you may have no idea what he is going to say at first, but once he has said a sentence, you can guess what comes next. The more information the friend gives, the better we can infer what he wants to express. That is exactly the way of thinking Bayes' theorem describes.
Bayes' theorem is so widely used because it matches the natural way human cognition works.
We are not born knowing the inner laws of everything. Most of the time we face situations where information is insufficient and uncertain, and all we can do is make a decision with the limited resources at hand, then revise it as events unfold.
1. Naive Bayes' debut
Bayesian classification is the collective name for a family of classification algorithms, all based on Bayes' theorem and premised on the conditional independence of features. Naive Bayes is the most common of these, and one of the most classical machine learning algorithms.
It handles many problems directly and efficiently, so it is widely used in fields such as spam filtering, text classification, and spelling correction. For product managers, naive Bayes is also a good entry point into the study of natural language processing.
Naive Bayes is a very simple classification algorithm, simple in the sense that its solution procedure is simple: for a given item to be classified, compute the probability of each category given the item, and assign the item to the category with the highest probability.
For example, suppose you see a dark-skinned foreigner on the street and are asked to guess where he comes from. Nine times out of ten you will guess Africa, because Africans make up the largest share of dark-skinned people, even though a dark-skinned foreigner might also be American or Asian. When there is no other information available to help us judge, we choose the category with the highest probability. That is the basic idea of naive Bayes.
It is worth noting that naive Bayes is neither a wild guess nor without theoretical basis: it is a classification algorithm built on Bayes' theorem and the assumption of conditional independence between features.
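In standard notation (the symbols here are ours, not the article's), the rule "assign the item x = (x1, ..., xn) to the category with the highest probability" can be written as:

```latex
\hat{y} \;=\; \arg\max_{c}\; P(c \mid x_1, \dots, x_n)
       \;=\; \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

where the second equality uses the feature conditional independence assumption explained below.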
To understand how the algorithm works, we first need to understand the "feature conditional independence assumption" and "Bayes' theorem", and Bayes' theorem in turn involves the concepts of "prior probability", "posterior probability", and "conditional probability".
Although that is quite a few concepts, they are all fairly easy to understand, and we will go through them one by one.
The feature conditional independence assumption is the foundation of Bayesian classification. It means assuming that each feature in a sample is unrelated to every other feature.
For example, when predicting which credit-card customers will go overdue, we judge from features of several kinds, such as the customer's monthly income, credit-card limit, and real estate. Two seemingly unrelated things may in fact be intrinsically linked, much like the butterfly effect: in general, banks grant higher credit-card limits to higher-income customers.
A high income also means the customer is better able to buy real estate, so there is some dependence among these features, and some features are partly determined by others.
In the naive Bayes algorithm, however, we ignore these internal connections and simply treat monthly income, real estate, and credit-card limit as three independent features.
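To make this concrete, here is a minimal sketch of a count-based naive Bayes classifier on a toy version of the credit-card scenario. The feature values, training rows, and labels are all invented for illustration; this is not the article's model or a production implementation:

```python
from collections import Counter, defaultdict

# Toy training data, invented purely for illustration:
# (monthly_income, owns_property, credit_limit) -> did the customer go overdue?
rows = [
    (("low", "no", "low"), "overdue"),
    (("low", "no", "high"), "overdue"),
    (("high", "yes", "high"), "ok"),
    (("high", "no", "high"), "ok"),
    (("low", "yes", "low"), "ok"),
    (("high", "yes", "low"), "ok"),
]

# Class frequencies and per-(class, feature) value frequencies.
class_counts = Counter(label for _, label in rows)
feature_counts = defaultdict(Counter)
values_per_feature = [set() for _ in rows[0][0]]
for features, label in rows:
    for i, value in enumerate(features):
        feature_counts[(label, i)][value] += 1
        values_per_feature[i].add(value)

def predict(features):
    """Pick the class with the highest naive-Bayes score.

    The likelihood factorizes into one term per feature because the
    features are treated as conditionally independent given the class.
    """
    best_label, best_score = None, 0.0
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(class)
        for i, value in enumerate(features):
            count = feature_counts[(label, i)][value]
            # Laplace smoothing: avoids zero probability for unseen values.
            score *= (count + 1) / (n + len(values_per_feature[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict(("low", "no", "low")))     # -> "overdue" on this toy data
print(predict(("high", "yes", "high")))  # -> "ok"
```

Because of the independence assumption, the score for each class is just the prior times one term per feature, which is what makes the algorithm so cheap to train and run.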
Next, let's look at what "theoretical probability" and "conditional probability" are, as well as the difference between "prior probability" and "posterior probability".
2. True and false probabilities
First of all, let's do a little experiment.
Suppose we toss a coin of uniform density into the air. In theory, because the coin is symmetric, there is a 50% chance it lands heads up and a 50% chance it lands tails up, and this probability does not change as the number of throws grows. Even if you throw it 10 times and get heads every time, the probability that the next throw comes up heads is still 50%.
But in an actual test, if we flip a coin 100 times, the counts of heads and tails are usually not exactly 50 each. There may be 40 heads and 60 tails, or 35 heads and 65 tails.
Only by tossing the coin over and over, thousands of times, do the counts of heads and tails tend toward equality.
So when we say there is a 50% probability of heads and 50% of tails, we mean the theoretical, objective probability, an ideal reached only as the number of throws approaches infinity. Under the theoretical probability, even if the first five of ten flips all come up heads, the probability that the sixth is heads is still 50%.
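A quick simulation (a sketch using only Python's standard library) shows the gap between the empirical frequency and the theoretical 50% narrowing as the number of flips grows:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

for n in (10, 100, 1_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>7} flips: {heads / n:.4f} heads")
# The frequency drifts toward 0.5 as n grows (law of large numbers),
# but any single short run can be far from 50/50.
```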
But in practice, anyone who has flipped coins has the feeling that after five heads in a row, the next flip is very likely to come up tails. How likely, exactly? Is there a way to work out the actual probability?
To answer this, a mathematician named Thomas Bayes invented a method for computing the probability that one event occurs given that another related event is known to have occurred. The method asks us first to estimate a subjective prior probability, then adjust it according to what we observe afterwards; the more adjustments we make, the closer we get to the real probability.
How should we understand this?
Let's explain with an example of taking the subway. Shenzhen Metro Line 1 has 18 stops from Chegongmiao to the terminus, and every morning Xiao Lin rides five stops from Chegongmiao to work at High-tech Park.
One morning during rush hour, Xiao Lin was hemmed in by the standing crowd and, wearing headphones, could not hear the announcements, so he did not know whether the train had reached High-tech Park station.
If he simply got off when the train stopped at the next station, the theoretical probability that it happened to be High-tech Park would be only 1/18, so the chance of getting off at the right station would be very small. Just then, Xiao Lin spotted a colleague in the crowd who was making his way off the train.
Xiao Lin reasoned that although he did not know where the colleague was going, during the morning rush the colleague was most likely heading to the company. With this piece of information, Xiao Lin followed him off the train and did arrive at High-tech Park station. This way of thinking is exactly the one Bayes' theorem describes.
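Back to the coin from a moment ago: one standard way to formalize "estimate a subjective prior, then revise it with each observation" is Beta-Bernoulli updating. The article itself never names this model, so treat the sketch below as one illustrative choice rather than the method Bayes described:

```python
# Beta(a, b) prior over the coin's heads probability; a uniform Beta(1, 1)
# says "no idea yet". Each observed flip updates the counts.
a, b = 1.0, 1.0

for flip in ["H", "H", "H", "H", "H"]:  # five heads in a row
    if flip == "H":
        a += 1
    else:
        b += 1
    # Posterior mean: the current best estimate of P(heads).
    print(f"after {flip}: estimated P(heads) = {a / (a + b):.3f}")
# Output climbs from 0.667 to about 0.857.
```

Note that the estimate moves toward heads, not tails: after five straight heads, the data suggest the coin itself may be biased toward heads, which is the Bayesian reading of the evidence.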
3. Introducing Bayes' theorem
In probability theory and statistics, Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to that event.
Suppose, for example, that the incidence of cancer is related to age. Using Bayes' theorem, knowing a person's age lets us assess more accurately the probability that he or she has cancer. In other words, Bayes' theorem computes the probability of one event from the probability of another, related event.
Mathematically, Bayes' theorem can be expressed as follows:

P(B | A) = P(A | B) × P(B) / P(A)

In this formula:
P(B) is the probability that event B occurs, here the probability that Xiao Lin's train is at High-tech Park station;
P(A) is the probability that event A occurs, here the probability that Xiao Lin's colleague gets off the train;
P(B | A) is the probability that B occurs given that A has occurred, here the probability that the train is at High-tech Park station given that the colleague got off;
P(A | B) is the probability that A occurs given that B has occurred, here the probability that the colleague gets off given that the train is at High-tech Park station.
Looking at the formula, Bayes' theorem describes the relationship between two conditional probabilities whose events are swapped; the two are linked through the joint probability, since P(A | B)·P(B) = P(B | A)·P(A) = P(A, B). So if you know the value of P(A | B), you can calculate the value of P(B | A).
So the Bayes formula states precisely this relationship. We can use a Venn diagram to deepen the intuition: picture two overlapping regions A and B, where the overlap represents both events happening together.
In the subway example, Xiao Lin happening to see his colleague get off during the morning rush represents the arrival of new information. It is as if we learn that a random point has landed inside region A; since most of region A overlaps region B, the point is quite likely to lie in region B as well. What we ultimately want is P(B | A): the probability of the event we care about once the known factors are taken into account.
With that probability in hand, we can make targeted decisions about many things. To compute the target value P(B | A) we need P(B), P(A | B), and P(A) all at once, and the value of P(A) looks hard to obtain.
Think about it, though: P(A) does not change with the value of B we are considering. Whichever candidate value of B we evaluate, P(A) is the same fixed denominator, so it does not affect the relative sizes of the results and can be ignored in the comparison.
Suppose P(A) = m, and that B can take one of three values b1, b2, b3, with priors and likelihoods

P(b1) = o, P(A | b1) = x;
P(b2) = p, P(A | b2) = y;
P(b3) = q, P(A | b3) = z.

Then Bayes' theorem gives the posteriors

P(b1 | A) = ox / m, P(b2 | A) = py / m, P(b3 | A) = qz / m.

And because P(b1 | A), P(b2 | A), and P(b3 | A) must sum to 1, we get ox + py + qz = m. So it does not matter that m is unknown: the products ox, py, qz can all be computed, and their sum recovers m. The remaining work is to compute P(B) and P(A | B), and these two probabilities have to be estimated from the data sets we have.
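A tiny numeric check (the numbers are invented purely for illustration) shows the normalization trick in action:

```python
# Invented priors P(b_i) and likelihoods P(A | b_i), for illustration only.
priors      = {"b1": 0.5, "b2": 0.3, "b3": 0.2}
likelihoods = {"b1": 0.9, "b2": 0.5, "b3": 0.1}

# Unnormalized scores o*x, p*y, q*z; their sum recovers m = P(A).
scores = {b: priors[b] * likelihoods[b] for b in priors}
m = sum(scores.values())
posteriors = {b: s / m for b, s in scores.items()}

print(f"P(A) = {m:.2f}")
for b, post in posteriors.items():
    print(f"P({b} | A) = {post:.3f}")
# The posteriors sum to 1 even though we never measured P(A) directly.
```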
There is an episode in the history of the Bayesian approach: after it was invented, it was largely ignored for nearly 200 years.
Classical statistics could already solve the simple, objectively explainable probability problems of the day, and compared with a Bayesian method that leans on subjective judgment, people at the time were naturally more willing to accept classical statistics grounded in objective facts. They preferred the view that no matter how many times a coin is flipped, the probability of heads is 50%.
Yet life is full of complex problems whose probabilities cannot be predicted this way, such as typhoons and earthquakes. Facing them, classical statistics often cannot gather enough sample data to infer an overall law: you cannot claim the probability of a typhoon tomorrow is 50% just because there are only two outcomes, coming or not coming.
This sparsity of data kept classical statistics hitting a wall. With the rapid development of modern computing, large-scale number crunching is no longer difficult, and the Bayesian approach has come to be valued again.
4. What is Bayes' theorem good for?
At this point some readers may ask: Bayes' theorem simulates the human thought process, but what kinds of problems can it actually help us solve? Let's start with a classic case that is almost always mentioned when Bayes' theorem comes up.
In disease screening, suppose a disease infects 0.1% of the whole population, and the hospital's current test for it is 99% accurate. That is, a sick person tests positive 99% of the time, and a healthy person tests negative 99% of the time. Now a person is picked at random from the population and the hospital reports a positive result. What is the actual probability that this person has the disease?
Perhaps many readers will blurt out "99%". But the real probability is far lower, because they have confused the prior probability with the posterior probability.
Let A denote "the person has the disease" and B denote "the test result is positive". Then P(B | A) = 99% is the probability of a positive test given that the person is known to be sick. What we are asking, though, is the probability that the randomly chosen person is sick given a positive test, that is, P(A | B), which works out to about 9%. So even after testing positive, the actual probability of having the disease is under 10%, and a false positive is quite likely; the person should be retested, and new information gathered, before the diagnosis is trusted.
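The number can be checked directly with Bayes' theorem; here is a short sketch using the figures from the example:

```python
p_disease = 0.001           # prior: 0.1% of the population is infected
p_pos_given_disease = 0.99  # sick people test positive 99% of the time
p_pos_given_healthy = 0.01  # healthy people test positive 1% of the time

# Total probability of a positive result, over sick and healthy people.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # ~0.090
```

The tiny prior (0.1%) is swamped by the false positives from the much larger healthy population, which is why the posterior lands near 9% rather than 99%.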
The example shows that in daily life we often confuse prior and posterior probabilities and so reach wrong judgments. Bayes' theorem helps us sort out the logical order of conditions and outcomes and arrive at a more accurate probability.
In fact, the core idea of the theorem holds real lessons for how product managers think:
On the one hand, we need to figure out which quantity in a requirement scenario is the prior probability and which is the posterior, and not be misled by the surface appearance of the data;
On the other hand, we can use Bayes' theorem to build a thinking framework: keep adjusting our view of something, and settle on a stable, correct judgment only after a series of new facts has confirmed it.
When we have a new idea, most of the time we can only judge from experience whether the product is promising; no one can say in advance how strongly the market will respond.
So we often need to experiment: build a simple version, put it on the market to validate the idea quickly, and then keep finding ways to collect new evidence and fold it into our judgment, steadily raising the odds that the new product succeeds.
This is why "small steps, fast iteration" is the best way to improve fault tolerance.
Source: http://www.woshipm.com/ai/2850961.html