How Do You Calculate the Area of Batman? The Makka Pakka Song My Roommate Sang Inspired Me

For most people, Bayesian statistics is probably just a term they have heard of, and the Markov chain Monte Carlo method, one of its signature tools, is more mysterious still. Although the method involves an enormous amount of computation, the basic principle behind it can be expressed quite intuitively. That is what this article sets out to do.

So, what is the Markov chain Monte Carlo (MCMC) method? The short answer:

MCMC methods approximate the posterior distribution of a parameter of interest by random sampling in a probabilistic space.

Don't understand? Don't worry. In this article I will explain that short statement, without any mathematical derivation!

First, some terminology. A parameter of interest is just some number that summarizes a quantity we care about; we generally use statistics to estimate parameters. For example, if we want to learn about the heights of adults, our parameter of interest might be average height in inches. A distribution is a mathematical representation of every possible value of a parameter and how likely we are to observe each one. The most famous example is the bell curve:

(Image credit: M. W. Toews.) In Bayesian statistics, a distribution has an additional interpretation. Rather than just representing the values of a parameter and how likely each is to be the true value, a Bayesian treats a distribution as describing our beliefs about a parameter: what we expect before seeing any real, measured data, and how plausible each possibility is. Thus the bell curve above shows that we are fairly sure the parameter's value is close to 0, while believing the true value is equally likely to be above or below 0.

As it happens, human heights do follow a bell curve. So suppose we believe the true value of average human height follows the bell curve below:

Clearly, the person whose beliefs this graph represents has been living among giants, since they think the most probable average adult height is 6 feet 2 inches (about 188 cm).

Let's imagine this person goes out and collects some data, observing a number of people between 5 and 6 feet tall. We can plot that data below, along with the normal curve that best explains it:

In Bayesian statistics, the distribution representing our beliefs about a parameter is called the prior distribution, because it captures our understanding before seeing any data. The likelihood distribution, in turn, summarizes what the observed data are telling us, by giving the range of parameter values along with the probability of each value producing the data we observed. Finding the parameter value that maximizes the likelihood answers the question: which parameter value makes the data we actually observed most probable? In the absence of prior beliefs, we might stop there.
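To make "maximizing the likelihood" concrete, here is a tiny sketch of my own (not from the original article): scan candidate values of the average height and score each by the normal log-likelihood of some made-up observations.

```python
import numpy as np

# Made-up observations (heights in feet), roughly between 5 and 6 feet.
data = np.array([5.2, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0])

# Candidate values for the parameter of interest: the average height.
candidates = np.linspace(4.0, 7.0, 601)

# Normal log-likelihood of each candidate mean, assuming a known spread
# of 0.5 feet, up to an additive constant that doesn't affect the argmax.
log_lik = np.array([-0.5 * np.sum(((data - mu) / 0.5) ** 2) for mu in candidates])

best = candidates[np.argmax(log_lik)]
print(f"maximum-likelihood average height: {best:.2f} feet")  # ~ the sample mean
```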

The key to Bayesian statistics, however, is combining the prior distribution and the likelihood distribution to produce the posterior distribution. The posterior tells us which parameter values are most probable once our prior beliefs are taken into account. In our example, the posterior distribution looks like this:

In the image above, the red line represents the posterior distribution. You can think of it as a kind of average of the prior and the likelihood. Because the prior is shorter and wider, it represents greater "uncertainty" about the average human height. The likelihood, by contrast, summarizes data lying within a narrow range, so it represents greater "certainty" about the true parameter value.

With the prior and likelihood combined, the posterior ends up resembling the likelihood much more closely: the weak prior belief of the person who grew up among giants is overwhelmed by the data. Although he still believes average human height is slightly taller than the data suggest, he has largely been persuaded by the data.
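When both the prior and the likelihood are bell curves, as here, the posterior follows from a closed-form update, the "simple equation" mentioned in the next paragraph. A minimal sketch of this standard conjugate-normal calculation, with made-up numbers of my own:

```python
import numpy as np

# Prior belief about average height: N(mu0, tau0^2), the "giant" prior.
mu0, tau0 = 6.17, 0.5          # 6'2" is about 6.17 feet; wide means uncertain

# Observed heights (feet) with an assumed known spread sigma.
data = np.array([5.2, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0])
sigma = 0.5
n, xbar = len(data), data.mean()

# Closed-form posterior for a normal mean with known variance:
# a precision-weighted average of the prior mean and the sample mean.
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + n * xbar / sigma**2)

# Prints a mean slightly above the sample mean, just as the story describes.
print(f"posterior mean {post_mean:.2f} ft, posterior sd {np.sqrt(post_var):.2f} ft")
```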

When prior and likelihood are both bell curves, solving for the posterior is easy: the two can be combined with a simple equation. But what if our prior and likelihood distributions aren't so well behaved? Sometimes an irregularly shaped distribution describes our data most accurately. What if our data are best described by a bimodal distribution, and our prior has some odd shape? In the example below, I have deliberately drawn an ugly prior:

(Visualization rendered in Matplotlib, enhanced with MS Paint.) As before, a posterior distribution exists that gives the probability of every parameter value. But it is hard to see what it looks like, and impossible to solve for analytically. Enter MCMC methods.

MCMC methods let us estimate the shape of the posterior distribution when we cannot compute it directly. To understand how they work, I will first introduce Monte Carlo simulations and then discuss Markov chains.

A Monte Carlo simulation estimates a particular quantity by repeatedly generating random numbers. By generating random numbers and performing some computation on them, we can obtain estimates of quantities that would be impossible or prohibitively expensive to compute directly.

Suppose we want to estimate the area of the following circle:

Since the circle is inscribed in a square with 10-inch sides, it is easy to calculate that its area is 78.5 square inches. Alternatively, we can scatter points at random across the square, count the proportion that fall inside the circle, and multiply that proportion by the area of the square. The resulting number is a very close approximation of the circle's area.

Since 15 of the 20 points fall inside the circle, the area of the circle appears to be about 75 square inches. Even with only 20 points, the Monte Carlo method gives a fairly good answer.
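A minimal sketch of this estimate (my own code, not from the original article); the same recipe works for any shape that admits an inside/outside test, including the Batman shape below:

```python
import random

random.seed(42)

def estimate_circle_area(n_points: int, side: float = 10.0) -> float:
    """Estimate the area of a circle inscribed in a side-by-side square."""
    radius = side / 2
    inside = 0
    for _ in range(n_points):
        # Drop a point uniformly at random inside the square.
        x = random.uniform(-radius, radius)
        y = random.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:  # the inside/outside test
            inside += 1
    return (inside / n_points) * side * side  # fraction inside times square area

print(estimate_circle_area(20))       # rough, like the 20-point figure above
print(estimate_circle_area(100_000))  # converges toward 25 * pi = 78.5
```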

Now imagine we need to compute the area of the shape defined by the Batman equation:

What formula gives the area of this shape? We never learned one! So the task looks hopeless. But by randomly dropping points inside a rectangle that contains the shape, a Monte Carlo simulation gives an approximate answer quite easily!

Monte Carlo simulations aren't used only to estimate the areas of complicated shapes. By generating large quantities of random numbers, they can model very complicated processes; in practice they have been used to forecast the weather and to estimate the probability of winning an election.

The second element we need in order to understand MCMC methods is the Markov chain: a sequence of events that are probabilistically related to one another. Each event comes from a set of possible outcomes, and each outcome determines which outcome comes next, according to a fixed set of probabilities.

An important property of Markov chains is that they are memoryless: everything you could possibly need to predict the next event is contained in the current state, and nothing new is gained from knowing the history of events. A game like Chutes and Ladders exhibits this memorylessness, or Markov property. Few real-world events actually work this way, yet Markov chains are still a powerful tool for understanding the world.

In the 19th century, the bell curve was observed to be a common pattern in nature. (We have already noted, for example, that human heights follow a bell curve.) Galton boards, which simulate the average outcome of repeated random events by dropping marbles through a board fitted with pegs, reproduce the normal curve in the distribution of the marbles:

The Russian mathematician and theologian Pavel Nekrasov argued that the bell curve, and the law of large numbers more generally, were artifacts of children's games and trivial puzzles, in which every event is completely independent. He believed that interdependent events in the real world, such as human actions, do not conform to tidy mathematical patterns or distributions.

Andrey Markov, after whom Markov chains are named, set out to prove that non-independent events may also conform to patterns. One of his best-known pieces of work involved counting thousands of two-character pairs in a work of Russian poetry. Using those pairs, he computed the conditional probability of each character: given a particular preceding letter or white space, there is a certain chance that the next character will be an A, a T, white space, or something else. Using those probabilities, Markov could simulate arbitrarily long sequences of characters. This is a Markov chain. Although the first few characters depend heavily on the choice of starting character, Markov showed that over the long run the distribution of characters settles into a pattern. Thus even interdependent events, if they are subject to fixed probabilities, conform to an average.
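Here is a toy sketch of Markov's idea, my own illustration using an English sentence rather than Russian poetry: count character pairs, treat the counts as conditional probabilities, and generate a new sequence one character at a time.

```python
import random
from collections import defaultdict

random.seed(0)

text = "the theory of markov chains was born from counting letter pairs in poetry"

# Count two-character pairs: how often each character follows another.
pairs = defaultdict(lambda: defaultdict(int))
for a, b in zip(text, text[1:]):
    pairs[a][b] += 1

def next_char(current: str) -> str:
    """Sample the next character given only the current one (the Markov step)."""
    followers = pairs[current]
    chars = list(followers)
    weights = list(followers.values())  # counts act as unnormalized probabilities
    return random.choices(chars, weights=weights)[0]

# Generate an arbitrarily long character sequence from a starting character.
state, out = "t", ["t"]
for _ in range(60):
    state = next_char(state)
    out.append(state)
print("".join(out))
```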

To take a more concrete example, imagine you live in a house with five rooms: a bedroom, a bathroom, a living room, a dining room, and a kitchen. Let's collect some data, recording which room you are in at any given point in time and which room you are likely to enter next. For instance, if you are in the kitchen, you have a 30% chance of staying in the kitchen, a 30% chance of going into the dining room, a 20% chance of going into the living room, a 10% chance of going into the bathroom, and a 10% chance of going into the bedroom. With a set of probabilities like this for every room, we can build a chain of predictions for which rooms you are likely to occupy next.
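A minimal sketch of such a prediction chain; only the kitchen row of probabilities comes from the text, and the other rows are invented for illustration:

```python
import random

random.seed(1)

# Transition probabilities: P(next room | current room).
# Only the "kitchen" row comes from the text; the rest are made up.
transitions = {
    "kitchen":  {"kitchen": 0.3, "dining": 0.3, "living": 0.2, "bathroom": 0.1, "bedroom": 0.1},
    "dining":   {"kitchen": 0.4, "dining": 0.2, "living": 0.3, "bathroom": 0.05, "bedroom": 0.05},
    "living":   {"kitchen": 0.2, "dining": 0.2, "living": 0.3, "bathroom": 0.1, "bedroom": 0.2},
    "bathroom": {"kitchen": 0.1, "dining": 0.05, "living": 0.15, "bathroom": 0.1, "bedroom": 0.6},
    "bedroom":  {"kitchen": 0.1, "dining": 0.05, "living": 0.15, "bathroom": 0.2, "bedroom": 0.5},
}

def simulate(start: str, steps: int) -> list[str]:
    """Walk the chain: each move depends only on the current room."""
    room, path = start, [start]
    for _ in range(steps):
        rooms = list(transitions[room])
        probs = list(transitions[room].values())
        room = random.choices(rooms, weights=probs)[0]
        path.append(room)
    return path

print(simulate("kitchen", 5))  # a short prediction chain starting in the kitchen
```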

Predicting the next few rooms someone will visit after entering the kitchen might well be useful. But since our predictions are based only on observations of where a person is in the house, we have good reason to think they won't be very accurate. If someone goes from the bedroom to the bathroom, for example, they are more likely to head straight back to the bedroom than someone who arrived from the kitchen. So the Markov property rarely holds exactly in the real world.

Running the Markov chain for thousands of iterations, however, does give a good long-run prediction of which room you are likely to be in. Better still, this prediction is completely unaffected by which room the person started in! Intuitively, this makes sense: where someone happens to be at one moment in the house doesn't matter for describing where they are likely to be over the long run, or in general. So Markov chains, which may seem an unreasonable way to model a random variable over a few time steps, can nevertheless be used to compute that variable's long-run tendencies, as long as we know the probabilities that govern its behavior.
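This independence from the starting room can be checked numerically. In the sketch below (my own, reusing the invented transition probabilities above), a probability vector is pushed through the chain many times from two different starting rooms and lands on the same long-run distribution:

```python
import numpy as np

rooms = ["kitchen", "dining", "living", "bathroom", "bedroom"]
# Row i gives P(next room | current room i); same invented numbers as above.
P = np.array([
    [0.3, 0.3, 0.2, 0.1, 0.1],
    [0.4, 0.2, 0.3, 0.05, 0.05],
    [0.2, 0.2, 0.3, 0.1, 0.2],
    [0.1, 0.05, 0.15, 0.1, 0.6],
    [0.1, 0.05, 0.15, 0.2, 0.5],
])

def long_run(start_index: int, steps: int = 1000) -> np.ndarray:
    """Distribution over rooms after many steps from a fixed starting room."""
    dist = np.zeros(len(rooms))
    dist[start_index] = 1.0        # certain of the starting room
    for _ in range(steps):
        dist = dist @ P            # one step of the chain
    return dist

print(np.round(long_run(0), 3))  # starting in the kitchen
print(np.round(long_run(4), 3))  # starting in the bedroom instead; same answer
```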

Now that we know something about Monte Carlo simulations and Markov chains, I hope to show visually how MCMC methods work, still without any mathematical derivation.

Recall that we are trying to estimate the posterior distribution of the parameter we are interested in: average human height.

I am no visualization expert, and evidently no good at keeping my examples within the bounds of common sense either: my example posterior seriously overestimates average human height.

We know the posterior distribution lies somewhere within the range of our prior distribution and our likelihood distribution, but for whatever reason we cannot compute it directly. With MCMC methods, we will draw samples from the posterior distribution efficiently, and then compute statistics, such as the mean, from the samples we draw.

To begin, an MCMC method picks a random parameter value. The simulation then keeps generating random values (this is the Monte Carlo part), but subject to a rule for deciding which values are good. The trick is that, given our prior beliefs and the data, we can compute, for any pair of parameter values, which one explains the data better. A newly generated value is added to the chain of parameter values with a probability determined by how much better it is than the current value (this is the Markov chain part).

To see this intuitively, recall that the height of a distribution at a given value represents the probability of observing that value. So our parameter values (the x-axis) have regions of high and low probability, shown on the y-axis. For a single parameter, an MCMC method starts by sampling randomly along the x-axis:

The red points are random parameter samples. Because the samples are subject to fixed probabilities, over time they tend to converge on the region where the parameter has the highest probability.

The blue points simply represent the random samples after some arbitrary point in time, once convergence is expected to have occurred. (Note: I stack the points vertically purely for illustration.)

After convergence has occurred, MCMC sampling yields a set of points that are samples from the posterior distribution. Draw a histogram around those points and compute whatever statistics you like.
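Putting the pieces together, here is a minimal sketch in the spirit of the Metropolis algorithm, one of the best-known MCMC methods (my own illustration, with made-up prior, data, and tuning choices; the article itself does not commit to a specific algorithm): propose a value near the current one, and accept it with a probability set by how much better it explains the data under the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up ingredients: the "giant" prior on mean height, plus observed data.
data = np.array([5.2, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0])
sigma = 0.5  # assumed known spread of the observations

def log_posterior(mu: float) -> float:
    """log prior + log likelihood, up to a constant we never need."""
    log_prior = -0.5 * ((mu - 6.17) / 0.5) ** 2          # prior ~ N(6.17, 0.5^2)
    log_lik = -0.5 * np.sum(((data - mu) / sigma) ** 2)  # normal likelihood
    return log_prior + log_lik

samples = []
mu = 4.0  # arbitrary random starting value
for _ in range(20_000):
    proposal = mu + rng.normal(0, 0.2)  # random candidate near the current value
    # Accept with probability min(1, posterior ratio): better values are always
    # accepted, worse ones only sometimes. This rule defines the Markov chain.
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

kept = np.array(samples[2_000:])  # discard pre-convergence "burn-in" samples
print(f"posterior mean ~ {kept.mean():.2f}, sd ~ {kept.std():.2f}")
```

The mean of the retained samples agrees with the closed-form conjugate answer computed earlier, which is exactly the point: the sampler recovers the posterior without ever writing it down.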

Any statistic computed from the set of samples generated by an MCMC simulation is our best guess at that statistic for the true posterior distribution.

MCMC methods can also be used to estimate the posterior distribution of more than one parameter (say, a person's height and weight). With n parameters, there are regions of high probability in n-dimensional space where certain sets of parameter values better explain the observed data. I therefore think of MCMC methods as randomly sampling inside a probabilistic space to approximate the posterior distribution.

Recall the question "what is the Markov chain Monte Carlo method?" The short answer:

MCMC methods approximate the posterior distribution of a parameter of interest by random sampling in a probabilistic space.

I hope I have now explained that short answer: why you would use MCMC methods, and how they work. This article was inspired by a talk I gave at a data science meetup in Washington, D.C. That talk aimed to explain Markov chain Monte Carlo methods to a non-technical audience, and I have tried to do the same here.

This article comes from the WeChat official account of the Institute of Physics, Chinese Academy of Sciences (ID: cas-iop). Author: B; translator: Yun Kai Ye Luo; reviser: Nothing.
