2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
How do you solve the collinearity problem in big data analysis? Many newcomers are at a loss when they run into it, so this article summarizes the causes of the problem and the ways to solve it. I hope it helps you solve this problem.
Introduction
In linear regression analysis, collinearity easily arises among the independent variables (the explanatory variables). When it does, the sign of a regression coefficient may be the opposite of what it should be, independent variables that should be significant may appear non-significant, and variables that should not be significant may appear significant. Collinearity can therefore seriously bias a study's conclusions or even reverse them, so it needs to be addressed.
The cause of collinearity
Multicollinearity means that a change in one explanatory variable goes together with a change in another. If there is a strong linear relationship among the independent variables x, one variable cannot be varied while the others are held fixed, so the true relationship between each x and y cannot be isolated.
Put simply, collinearity arises when several independent variables X (explanatory variables) used to explain the dependent variable Y (explained variable) are strongly correlated with one another, so that one X can largely substitute for another.
Testing for multicollinearity
When running the regression, check the VIF (variance inflation factor) values directly. If all of them are below 10 (or below 5 under a stricter criterion), the model has no multicollinearity problem and is well constructed; if any VIF is greater than 10, the model is poorly constructed.
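As a rough illustration of the VIF check, the sketch below computes the VIF of each column by regressing it on the remaining columns and taking 1 / (1 - R²). It assumes NumPy is available; the function name `vif` and the simulated data are made up for this example, and the code does not handle a perfectly collinear column (where R² equals 1).

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on all other columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated data (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # x1 and x2 should be far above 10, x3 close to 1
```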
A correlation analysis can also be run directly. If the correlation coefficient between two independent variables X (explanatory variables) is greater than 0.7, there may be a strong collinearity problem.
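The correlation screen can be sketched in plain Python as follows. The helper names `pearson` and `correlated_pairs` and the small data set are hypothetical, and the code compares the absolute value of the coefficient with 0.7, which is an assumption: the text only mentions coefficients greater than 0.7, but a strong negative correlation raises the same concern.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlated_pairs(columns, threshold=0.7):
    """List (name_a, name_b, r) for every pair with |r| above the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs

# Made-up explanatory variables: income and spending move together.
data = {
    "income":   [30, 42, 55, 61, 78, 90],
    "spending": [28, 40, 50, 60, 75, 88],
    "age":      [41, 23, 35, 52, 29, 47],
}
flagged = correlated_pairs(data)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Each flagged pair is a candidate for the remedies described in the following section.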
Solution method
There are five solutions to the collinearity problem.
1. Remove collinear independent variables manually
First run a correlation analysis. If the correlation coefficient between two independent variables X (explanatory variables) is greater than 0.7, remove one of them and then rerun the regression. The drawback of this method is that sometimes you do not want to remove an independent variable from the model at all. In that case, consider using stepwise regression so that the software removes variables automatically; ridge regression may be an even better option.
2. Stepwise regression method
Let the software select and eliminate independent variables automatically; stepwise regression will drop the collinear ones on its own. The drawback is that the algorithm may eliminate independent variables that you want to keep. If that happens, it is best to use ridge regression instead.
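To give a feel for how automatic selection drops a redundant variable, here is a simplified forward-selection sketch. It is not the procedure of any particular statistics package: real stepwise regression usually enters and removes variables based on F-tests or p-values, whereas this stand-in adds the variable that raises R² the most and stops when the gain falls below `min_gain` (a made-up parameter). NumPy and the simulated data are assumptions.

```python
import numpy as np

def r_squared(X, y, cols):
    """R² of an OLS fit of y on the chosen columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = ((y - y.mean()) ** 2).sum()
    return float(1.0 - (resid @ resid) / tss)

def forward_select(X, y, min_gain=0.01):
    """Greedily add the column that improves R² most; stop on small gains."""
    selected, remaining = [], list(range(X.shape[1]))
    current = 0.0
    while remaining:
        best_r2, best_col = max(
            (r_squared(X, y, selected + [c]), c) for c in remaining
        )
        if best_r2 - current < min_gain:
            break
        selected.append(best_col)
        remaining.remove(best_col)
        current = best_r2
    return selected

# Simulated data: column 1 is nearly a copy of column 0.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + 0.05 * rng.normal(size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 3.0 * x3 + 0.5 * rng.normal(size=300)

selected = forward_select(X, y)
print(selected)  # only one of the two collinear columns should be kept
```

Which of the two collinear columns survives is decided by the data, not by the analyst, which is exactly the drawback mentioned above.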
3. Increase the sample size
Increasing the sample size is one way to address the collinearity problem, but it is often impractical because collecting more samples costs time and money.
4. Ridge regression
The first and second solutions above are often used in practical research, but they run into trouble when some independent variables are too important to be eliminated. In that case ridge regression is probably the most suitable choice. Ridge regression is currently the most effective way to deal with the collinearity problem, although the analysis is relatively complex.
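A minimal sketch of ridge regression on standardized predictors is shown below, using the closed-form solution (ZᵀZ + αI)⁻¹Zᵀy. NumPy, the function name `ridge_fit`, the simulated data and the penalty value `alpha=10.0` are all assumptions for illustration; in practice the penalty is usually chosen with a ridge trace or cross-validation.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Ridge coefficients for standardized predictors and a centered response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column
    yc = y - y.mean()                          # centering removes the intercept
    p = Z.shape[1]
    # Closed-form solution; alpha = 0 reduces to ordinary least squares.
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(p), Z.T @ yc)

# Simulated data: both variables matter, but they are nearly identical.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=100)

b_ols = ridge_fit(X, y, alpha=0.0)
b_ridge = ridge_fit(X, y, alpha=10.0)
print("OLS:  ", b_ols)    # may be large and of opposite sign
print("ridge:", b_ridge)  # both variables kept, with similar coefficients
```

Both variables stay in the model, which is the point of choosing ridge regression over the elimination methods.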
5. Merging variables by factor analysis
In theory, factor analysis (or principal component analysis) can be used to reduce the dimension of the data through a mathematical transformation and extract a few components that concentrate the information. The condensed components are then entered into the model as the independent variables (explanatory variables). This approach is feasible and effective in theory. In actual research, however, the original variables are replaced by component 1, component 2 and so on, which no longer correspond to the variables the study set out to examine. The whole line of reasoning of the research changes as a result, so this method suits exploratory research but not confirmatory research.
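The principal component variant of this idea can be sketched with a singular value decomposition. NumPy, the function name `principal_components` and the simulated data are assumptions; factor analysis proper would need a dedicated routine and is not shown.

```python
import numpy as np

def principal_components(X, k):
    """Return the first k component scores and their explained variance share."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T                      # data projected onto k components
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:k]

# Simulated data: x1 and x2 are collinear, x3 is independent of both.
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

scores, explained = principal_components(X, k=2)
corr = np.corrcoef(scores.T)[0, 1]
print("explained variance share:", explained)
print("correlation between components:", corr)  # essentially zero
```

The component scores are uncorrelated by construction, so they can enter a regression without collinearity, but each one is a mixture of the original variables, which is the interpretability cost described above.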
Treatment principles
1. Multicollinearity is common, and mild multicollinearity does not need any action. A VIF greater than 10 indicates serious collinearity that needs to be dealt with; a VIF below 5 needs no treatment; a VIF between 5 and 10 should be judged case by case.
2. Serious multicollinearity can usually be spotted from experience or from the regression results, for example when a coefficient has the wrong sign or an important explanatory variable has a very low t value. Take the necessary measures according to the situation.
3. If the model is used only for prediction, multicollinearity can be left untreated as long as the fit is good. Using a model with multicollinearity for prediction often does not affect the predictions.
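The VIF thresholds in these principles can be written down as a small decision helper. The function name `vif_action`, its return strings and the `prediction_only` flag are hypothetical; the flag stands for a model that is used only for prediction and already fits well.

```python
def vif_action(vif_value, prediction_only=False):
    """Map a VIF value to the treatment suggested by the principles above."""
    if prediction_only:
        # Principle 3: a well-fitting, prediction-only model can be left as is.
        return "no treatment needed"
    if vif_value > 10:
        return "treat"
    if vif_value < 5:
        return "no treatment needed"
    return "judge case by case"

for value in (2.3, 7.5, 42.0):
    print(value, "->", vif_action(value))
```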
After reading the above, you should have a grasp of how to solve the collinearity problem in big data analysis. Thank you for reading!