[Machine learning] (3) Degree of fit and maximum likelihood estimation

2025-01-16 Update From: SLTechnology News&Howtos

After a general overview of the categories of machine learning algorithms (supervised, unsupervised, and reinforcement learning) and of gradient algorithms, today we look at degree of fit and maximum likelihood estimation.

1. Degree of fit of the least squares method

A typical application of supervised learning is regression, and the most basic form is linear regression, in which a straight line is used to approximate the training set. The least squares method determines, for a chosen family of functions, the curve that best fits the existing training samples. However, the kind of curve is chosen by the person doing the modeling, and different curves have different properties, so the degree of fit obtained by least squares differs from one function model to another.

Take a data set M of m samples of house size and price as an example. We can fit it with linear regression (a straight line) or with a cubic curve (which can have a local maximum and a local minimum), but the best fit may be a quadratic curve (a parabola). For a training set whose distribution resembles a parabola, the straight line clearly "underfits", while the cubic curve tends to "overfit", and neither works as well as the parabola. So even in supervised regression the degree of fit has to be judged, and that judgment depends heavily on the researcher's own experience.

Fitting a function model in this way with least squares is called parametric learning: its key feature is that the function model is decided before training, so the number of parameters is fixed. When the training set is very complex, it is hard to assume a model directly, and the number of parameters may instead change with the sample set; this kind of problem is called nonparametric learning. One method for it is locally weighted regression.
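The difference between a line, a parabola, and a cubic can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic (y = x^2 plus Gaussian noise, standing in for the house size and price example), and the helper names `solve`, `fit_poly`, and `mse` are hypothetical. It uses plain Python to stay dependency-free.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) theta = A^T y."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)

def mse(theta, xs, ys):
    """Mean squared error of the polynomial with coefficients theta (lowest power first)."""
    def predict(x):
        return sum(t * x ** i for i, t in enumerate(theta))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Assumption: synthetic parabola-shaped data with noise, not real house prices.
rng = random.Random(0)

def sample(n):
    xs = [rng.uniform(0.0, 4.0) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = sample(30)
test_x, test_y = sample(30)

train_mse, test_mse = {}, {}
for degree in (1, 2, 3):
    theta = fit_poly(train_x, train_y, degree)
    train_mse[degree] = mse(theta, train_x, train_y)
    test_mse[degree] = mse(theta, test_x, test_y)
    print(degree, round(train_mse[degree], 4), round(test_mse[degree], 4))
```

On the training set a higher-degree polynomial can only match or lower the error, so training error alone cannot show that the cubic is a worse choice; the error on held-out samples is the more informative number when judging the degree of fit.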

2. Locally weighted regression

For linear regression (LR) with a given hypothesis function H(X, θ), the goal is to find the θ that minimizes the sum of the squared errors (H(X, θ) - Y)^2 over the known training set M, that is, the θ that makes the deviation between H(X, θ) and the samples as small as possible. That θ is then returned.
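As a minimal sketch of this objective, assume a one-feature hypothesis H(x, θ) = θ0 + θ1·x, for which the minimizing θ has a well-known closed form. The function names `fit_line` and `cost` and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of H(x, theta) = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx                  # slope
    theta0 = mean_y - theta1 * mean_x   # intercept
    return theta0, theta1

def cost(theta, xs, ys):
    """Sum of squared errors between H(x, theta) and the samples."""
    theta0, theta1 = theta
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical training set: house sizes and prices in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [5.1, 6.9, 9.2, 10.8, 13.0]

theta = fit_line(sizes, prices)
print(theta, cost(theta, sizes, prices))
```

Moving θ away from the fitted value in any direction increases `cost`, which is what "minimizing the deviation" means here.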

For locally weighted regression (LWR), we instead find the θ that minimizes a weighted sum of squared errors, in which each sample's term (H(X, θ) - Y)^2 is multiplied by a weight. The point of the weights is that when we predict for a new input, the training samples closest to that input get larger weights and dominate the fit, while samples farther away have less influence. Locally weighted regression therefore fits, for each new input, mainly the training samples near it. The cost is that every prediction requires a new fit over the whole training set.
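A small sketch of one prediction step follows. The source does not say which weight function it uses, so the Gaussian-shaped weight w_i = exp(-(x_i - x)^2 / (2·tau^2)) and the bandwidth parameter `tau` are assumptions here (a common choice, not necessarily the author's), and `lwr_predict` is a hypothetical name.

```python
import math

def lwr_predict(x_query, xs, ys, tau=0.3):
    """Predict at x_query with a weighted straight-line fit centred on x_query.

    Assumption: Gaussian-shaped weights with bandwidth tau.
    """
    w = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    theta1 = sxy / sxx                  # weighted least-squares slope
    theta0 = mean_y - theta1 * mean_x   # weighted least-squares intercept
    return theta0 + theta1 * x_query

# Parabola-shaped training set: a single global line would underfit it.
xs = [i * 0.1 for i in range(41)]   # 0.0, 0.1, ..., 4.0
ys = [x ** 2 for x in xs]

for q in (1.0, 2.0, 3.0):
    print(q, round(lwr_predict(q, xs, ys), 3), q ** 2)
```

Every call to `lwr_predict` goes through all of `xs` and `ys` again, which is why the whole training set has to be available at prediction time. A smaller `tau` makes the fit more local; a larger one moves it toward ordinary linear regression.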

3. Maximum likelihood estimation

The algorithms above can be derived from maximum likelihood, but the derivation involves a lot of formulas and is not given here. Instead, let us take the opportunity to review maximum likelihood estimation, which is a general way to estimate the parameters of a model. The main idea is that the parameterized probability function H(X, θ) is regarded as a function of θ once X is known. Knowing X means that m samples have been drawn at random from the population, and we assume they are independent of one another. The probability of drawing exactly these m samples is then the product of their individual probabilities, P(θ). If the hypothesized model holds, its parameter θ should maximize P, that is, make it most likely that these same m samples would be drawn again. The maximizing value is the maximum likelihood estimate, and it is used in place of the true θ.
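To make the "product of probabilities" concrete, here is a minimal sketch under an assumed model that is not from the article: each sample is an independent 0/1 outcome that equals 1 with probability θ. The sample values are made up, and the search is a plain grid search over θ. The code maximizes the logarithm of the product, which has the same maximizer and avoids multiplying many small numbers.

```python
import math

# Hypothetical observations: m = 10 independent 0/1 samples.
samples = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

def log_likelihood(theta, xs):
    """log P(theta): sum of the log-probabilities of the independent samples."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in xs)

# Candidate values 0.01, 0.02, ..., 0.99 (0 and 1 are excluded to avoid log(0)).
grid = [k / 100 for k in range(1, 100)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, samples))

print(theta_hat, sum(samples) / len(samples))
```

For this simple model the maximizing θ coincides with the fraction of ones in the sample, so the grid search and the sample mean agree.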

This is only a brief sketch. For more detail, see the CSDN blog article "Summary of maximum likelihood estimation".
