Overfitting and Its Solutions in Big Data Machine Learning

2025-01-30 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report

This article explains overfitting in big data machine learning and the ways to deal with it. The editor (Xiaobian) found it useful and shares it here for reference. I hope that after reading it you will have a working understanding of the topic.

What is overfitting?

Overfitting is a common problem in machine learning projects. So what exactly is it?

Wikipedia:

In statistics, overfitting refers to using too many parameters when fitting a statistical model. Relative to the amount of data available, even an absurd model can fit the data perfectly if it is complex enough. Overfitting can generally be seen as a violation of Occam's razor. When the degrees of freedom of the parameters exceed the information content of the data, the final (fitted) model ends up using arbitrary parameters, which reduces or destroys its ability to generalize beyond the data it was fitted to. The likelihood of overfitting depends not only on the number of parameters and the amount of data, but also on how well the model's structure matches the shape of the data. It also depends on the size of the model's error compared with the expected level of noise or error in the data.
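To make "more degrees of freedom than the data can support" concrete, here is a minimal pure-Python sketch on hypothetical toy data. The true line y = 2x + 1, the fixed ±0.3 label errors, and all function names are illustrative assumptions, not taken from the source. It compares a 2-parameter straight line with a 10-parameter polynomial that passes exactly through all 10 training points, measuring error on the training points and on noise-free test points in between.

```python
# Hypothetical toy data: the true relationship is the straight line y = 2x + 1,
# and each of the 10 training labels carries a fixed, alternating error of +/-0.3.
def true_fn(x):
    return 2 * x + 1

xs_train = [i / 9 for i in range(10)]
ys_train = [true_fn(x) + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs_train)]

# Noise-free test points lying halfway between neighboring training points.
xs_test = [(i + 0.5) / 9 for i in range(9)]
ys_test = [true_fn(x) for x in xs_test]

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (2 parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree n-1 through all n points (n parameters)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def mse(model, xs, ys):
    """Mean squared error of a model over the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line_model = fit_line(xs_train, ys_train)
interp_model = fit_interpolant(xs_train, ys_train)
for name, model in [("2-parameter line", line_model), ("10-parameter polynomial", interp_model)]:
    print(f"{name}: train MSE={mse(model, xs_train, ys_train):.4f}, "
          f"test MSE={mse(model, xs_test, ys_test):.4f}")
```

The 10-parameter polynomial reaches zero training error because it memorizes the noise, but between the training points, especially near both ends of the interval, it swings far away from the true line, so its test error is much larger than that of the simple line.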

The concept of overfitting is also important in machine learning. A learning algorithm is usually trained on a set of training examples, that is, examples whose expected outputs are known. The learner is then expected to produce correct outputs for other examples as well, so it should generalize to situations beyond the data used in training (based on its inductive bias). In practice, however, learners tend to adapt to very specific but random features of the training data, especially when training runs too long or there are too few training examples. When overfitting occurs, performance on the training examples keeps improving while performance on unseen data gets worse.

The opposite of overfitting, in which too many parameters are used so that the model fits the particular data rather than the general case, is underfitting: using too few parameters to fit the data.

Underfitting is not covered in detail here and will be discussed separately later. In short, overfitting is over-learning, or rote memorization: predictions on the training data are very accurate, but when the model faces new problems it generalizes poorly and cannot make correct predictions.

In the figure, the green line represents the overfitted model and the black line represents the regularized model. Although the green line matches the training data perfectly, it depends too heavily on that data and has a higher error rate than the black line on new test data.
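One common way to obtain a smoother, regularized model like the black line is ridge regression, which adds an L2 penalty lam * ||w||^2 to the squared error. The source does not say which regularizer produced its figure, so treat this as an assumed illustration: the toy data, the degree-9 polynomial features, and the helper names are all hypothetical. The sketch solves the penalized normal equations (X^T X + lam I) w = X^T y in pure Python.

```python
# Hypothetical toy data: the line y = 2x + 1 plus a fixed, alternating +/-0.3 error.
xs = [i / 9 for i in range(10)]
ys = [2 * x + 1 + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

DEGREE = 9  # 10 coefficients for 10 points: enough capacity to memorize the data

def features(x, degree=DEGREE):
    """Polynomial features [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def ridge_fit(xs, ys, lam):
    """Minimize sum (y - w.phi(x))^2 + lam * ||w||^2 via (X^T X + lam I) w = X^T y."""
    X = [features(x) for x in xs]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

def predict(w, x):
    return sum(wk * fk for wk, fk in zip(w, features(x)))

def train_mse(w):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for lam in (1e-3, 1e-1, 1.0):
    w = ridge_fit(xs, ys, lam)
    print(f"lam={lam:g}  train MSE={train_mse(w):.4f}  ||w||^2={sum(v * v for v in w):.3f}")
```

Raising lam lowers ||w||^2 and raises the training error: the model gives up some fit to the training points (including their noise) in exchange for smaller coefficients. For simplicity this sketch also penalizes the intercept w[0]; many implementations leave it unpenalized. With Gaussian noise, the ridge solution is also the MAP estimate under a zero-mean Gaussian prior on w.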

Zhihu

There is a question on Zhihu: "Can you describe 'overfitting' in simple, easy-to-understand language?"

Overfitting is really a case where machine learning fails to find the correct pattern. So to understand overfitting, you first have to understand why machine learning is able to find the correct pattern at all.

In practice

For one problem encountered in practice, the training and test loss curves looked like this:

You can see that the training loss keeps falling, while the test loss first falls and then rises.
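Given per-epoch loss histories like the ones in the figure, the point where the test loss stops improving can be located programmatically. The loss values below are made up to mimic the described curve shapes; they are not the article's measurements, and the helper names are hypothetical.

```python
# Made-up loss histories shaped like the curves described above.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18, 0.12]
test_losses = [1.05, 0.80, 0.62, 0.58, 0.61, 0.67, 0.75]

def best_epoch(losses):
    """Index of the epoch with the lowest loss (the first one on ties)."""
    return min(range(len(losses)), key=lambda i: losses[i])

def generalization_gaps(train, test):
    """Test loss minus training loss for each epoch."""
    return [te - tr for tr, te in zip(train, test)]

best = best_epoch(test_losses)
gaps = generalization_gaps(train_losses, test_losses)
print(f"test loss is lowest at epoch {best}")  # prints epoch 3 (0-based)
print("gap per epoch:", [round(g, 2) for g in gaps])
```

After the best epoch the training loss keeps falling while the gap between test and training loss keeps widening, which is the overfitting signature described above.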

Solutions

In statistics and machine learning, avoiding overfitting requires additional techniques that indicate when further training no longer leads to better generalization. The main approaches are:

Get more data;

Use an appropriate model;

Combine multiple models;

Use Bayesian methods.
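Of the approaches listed, combining multiple models is the easiest to sketch. Its simplest form averages the predictions of several models (bagging, for example, averages models trained on bootstrap resamples of the data). The three linear "models" below are hypothetical stand-ins for separately trained models. The final check illustrates a general property of squared error: by convexity, the averaged prediction's error is never larger than the average of the members' errors.

```python
def average_models(models):
    """Return a model whose prediction is the mean of the members' predictions."""
    def ensemble(x):
        return sum(m(x) for m in models) / len(models)
    return ensemble

def mse(model, xs, ys):
    """Mean squared error of a model over the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical members, each off from the true line y = 2x + 1 in its own way.
members = [
    lambda x: 2.4 * x + 0.9,
    lambda x: 1.7 * x + 1.2,
    lambda x: 2.0 * x + 0.8,
]
ensemble = average_models(members)

xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
member_errors = [mse(m, xs, ys) for m in members]
ensemble_error = mse(ensemble, xs, ys)
print("member MSEs:", [round(e, 4) for e in member_errors])
print("ensemble MSE:", round(ensemble_error, 4))
```

Here the members' individual errors partly cancel, so the average lands much closer to the true line than any single member. In general only the weaker guarantee (no worse than the members' average error) holds.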

After adding more training data and applying early stopping, the curves improved slightly.
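Early stopping halts training once the validation loss has not improved for a set number of epochs (the "patience") and keeps the best model seen so far. Below is a minimal sketch; the EarlyStopping class, its patience and min_delta parameters, and the simulated losses are hypothetical. Real frameworks ship their own versions (Keras, for example, has an EarlyStopping callback).

```python
class EarlyStopping:
    """Signal a stop when the loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # in a real loop, save a model checkpoint here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Simulated validation losses standing in for a real training loop.
val_losses = [1.05, 0.80, 0.62, 0.58, 0.61, 0.67, 0.75, 0.80]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    # ... one epoch of training would run here ...
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(f"stopped after epoch {stopped_at}, best epoch {stopper.best_epoch}")
# prints: stopped after epoch 5, best epoch 3
```

A larger patience tolerates noisy validation curves at the cost of a few wasted epochs; restoring the checkpoint from best_epoch gives the model from just before the test loss started to rise.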

That concludes this look at overfitting in big data machine learning and its solutions. I hope the content above is helpful and that you learned something from it. If you found the article useful, please share it so that more people can see it.
