This article introduces how to understand the EM algorithm. In real applications many people run into exactly this kind of dilemma, so let the editor walk you through how to handle these situations. I hope you read carefully and come away with something!
As we all know, maximum likelihood estimation is a widely used parameter estimation method. For example, suppose I have some data on the heights of people from Northeast China, and I know that the probability model for height is a Gaussian distribution; then maximizing the likelihood function yields estimates of the Gaussian's two parameters, the mean and the variance. Essentially every probability textbook covers this method, so I will not say more; if it is unclear, look it up (Baidu is your friend).
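As a minimal sketch (the height data below are made up for illustration), the Gaussian MLE has a closed form: the sample mean and the 1/n sample variance:

```python
import numpy as np

# Hypothetical height data (cm), assumed to come from a single Gaussian.
heights = np.array([175.2, 168.9, 181.4, 172.6, 177.8, 170.1])

# MLE for a Gaussian: the sample mean and the biased (1/n) variance.
mu_hat = heights.mean()
sigma2_hat = ((heights - mu_hat) ** 2).mean()

print(f"MLE mean: {mu_hat:.2f} cm, MLE variance: {sigma2_hat:.2f}")
```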
However, now I am faced with this situation: the data I have is a mixed collection of heights of people from Sichuan and from Northeast China, and for each individual data point it is not clear whether it came from a "Northeast person" or a "Sichuan person". If you plot the probability density of this data set, it looks like this (the original post shows a hand-drawn bimodal density, one hump per population):

All right, stop complaining. I've worked very hard to draw it like that.
In fact, this bimodal probability density is modeled by what is called a Gaussian mixture model (GMM).
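Concretely, for the two-population height example, the GMM density is a weighted sum of two Gaussians (a sketch of the standard form; $\pi$ denotes the mixing weight):

$$p(x;\theta) = \pi\,\mathcal{N}(x;\mu_1,\sigma_1^2) + (1-\pi)\,\mathcal{N}(x;\mu_2,\sigma_2^2), \qquad \theta = (\pi,\mu_1,\sigma_1^2,\mu_2,\sigma_2^2).$$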
However, we do not yet have a usable expression for $p(x;\theta)$. Some students will say: I know it, isn't it just the mixed Gaussian model above? It merely has a few more parameters.

Think about it carefully. The $\theta$ in the GMM bundles the parameters of the Sichuan and Northeast populations together. If you directly use the GMM as the likelihood function, then when estimating, say, the average height of Sichuan people, every sample, Sichuan and Northeast alike, enters the same log-of-a-sum expression, which is obviously not appropriate.
Another idea is to introduce hidden variables. It would be nice if we already knew which samples come from Sichuan and which from the Northeast. If we use $Z$ to mark which population each sample comes from, then $Z$ is the hidden variable, and the likelihood function that needs to be maximized becomes:
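In its standard form (a sketch; $n$ is the number of samples), this log-likelihood marginalizes the joint density over each sample's hidden label:

$$\ell(\theta) = \sum_{i=1}^{n} \log p(x_i;\theta) = \sum_{i=1}^{n} \log \sum_{z_i} p(x_i, z_i;\theta).$$

The sum sitting inside the log is precisely what makes direct maximization hard.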
The tool we need here is Jensen's inequality: for a convex function $f$ and a random variable $X$, $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$. The equal sign holds if and only if $X$ is a constant, and if $f$ is a concave function, the inequality is reversed.

With regard to this inequality, I am neither going to prove it nor explain it. I hope you will simply accept that it is correct.
Why drag in Jensen's inequality out of nowhere? Because we need it to resolve the dilemma above; otherwise, what would it be for. If we cannot maximize the likelihood function directly, then finding a tight lower bound of the likelihood and repeatedly pushing that bound up, while ensuring that the true likelihood also increases at every iteration, achieves the same goal. Picture a lower-bound curve touching the likelihood curve from below at the current parameter value: maximizing the bound moves the parameter to a point where the likelihood can only be higher.
The next steps pack a lot of formulas into a few sentences; let's lay out the whole chain at once:
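A sketch of that chain in standard notation: introduce an arbitrary distribution $Q_i$ over the hidden label $z_i$, multiply and divide by it inside the sum, and apply Jensen's inequality ($\log$ is concave, so the inequality points downward):

$$\ell(\theta) = \sum_i \log \sum_{z_i} Q_i(z_i)\,\frac{p(x_i, z_i;\theta)}{Q_i(z_i)} \;\geq\; \sum_i \sum_{z_i} Q_i(z_i)\,\log \frac{p(x_i, z_i;\theta)}{Q_i(z_i)}.$$

The right-hand side is the lower bound we will maximize; it is tight exactly when Jensen's equality condition is met.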
Equality requires the quantity inside the expectation, $p(x_i, z_i;\theta)/Q_i(z_i)$, to be a constant that does not depend on $z_i$. And because $Q_i(z)$ is a distribution over $z$, it must sum to one, so:

$$Q_i(z_i) = \frac{p(x_i, z_i;\theta)}{\sum_{z} p(x_i, z;\theta)} = p(z_i \mid x_i;\theta).$$

Getting $Q_i(z_i)$ settles everything: $Q_i(z_i)$ is exactly $p(z_i \mid x_i)$, the posterior probability that the $i$-th data point came from population $z_i$.
So the EM algorithm is born, and it works like this:

First, initialize the parameter θ.

(1) E-Step: according to the current parameter θ, compute for each sample the probability that it belongs to each $z_i$, that is, the probability that a given height came from Sichuan or from Northeast China. This probability is Q.

(2) M-Step: plug the computed Q into the lower bound of the likelihood, which is now a function of θ alone, and maximize it to obtain the new parameter θ.

Repeat (1) and (2) until convergence. You can see that, in spirit, nothing has changed since the beginning; it is just that maximizing the likelihood directly is hard, so we reach the goal by a roundabout route. A runnable sketch of these two steps follows.
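Below is a minimal sketch of the loop for the height example: a one-dimensional, two-component GMM fitted by EM. The data, initial values, and population labels are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up heights (cm): a mix of two hypothetical populations.
data = np.concatenate([rng.normal(165, 5, 300),   # "Sichuan"
                       rng.normal(175, 5, 300)])  # "Northeast"

def gauss_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Initialize theta = (mixing weights, means, variances).
weights = np.array([0.5, 0.5])
mu = np.array([160.0, 180.0])
var = np.array([25.0, 25.0])

for step in range(100):
    # E-step: Q[i, k] = posterior probability that sample i came from component k.
    dens = np.stack([w * gauss_pdf(data, m, v)
                     for w, m, v in zip(weights, mu, var)], axis=1)
    Q = dens / dens.sum(axis=1, keepdims=True)

    # M-step: maximize the lower bound, which gives weighted MLE updates.
    Nk = Q.sum(axis=0)
    weights = Nk / len(data)
    mu = (Q * data[:, None]).sum(axis=0) / Nk
    var = (Q * (data[:, None] - mu) ** 2).sum(axis=0) / Nk

print("weights:", weights, "means:", mu, "variances:", var)
```

Each E-step fills in Q (who probably came from where), and each M-step is just the weighted version of the textbook Gaussian MLE from earlier.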
As for why such iterations guarantee that the likelihood increases monotonically, that is, the proof of the EM algorithm's convergence, I will not write it out for now; I will consider it later when I have time. One thing worth noting: the EM algorithm generally converges, but it is not guaranteed to converge to the global optimum; it may land in a local optimum. The EM algorithm is applied in the Gaussian mixture model and the hidden Markov model, and it is one of the ten classic data mining algorithms.
This is the end of "how to understand the EM algorithm". Thank you for reading. If you want to learn more about the industry, you can follow this site, where the editor will keep putting out practical, high-quality articles for you!