Data analysis: OLS regression analysis

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Variables are often correlated. Height and weight are an example: taller people are generally heavier, but the relationship is statistical rather than deterministic. Regression analysis is a mathematical tool for studying such relationships; it lets us estimate the value of one variable from the value of another.

OLS (ordinary least squares) is mainly used for parameter estimation in linear regression. The idea is simple: find the parameter values that minimize the sum of squared differences between the actual values and the model's estimates, and take them as the parameter estimates. In other words, the best-fitting function for the data is found by minimizing the sum of squared errors. Least squares makes it straightforward to estimate unknown values such that the sum of squared differences between the estimates and the actual data is minimized. Least squares can also be used for curve fitting, and some other optimization problems, posed as minimizing energy or maximizing entropy, can be expressed in least-squares form.

First, OLS regression

OLS regression predicts the response variable from one or more predictor variables (equivalently, it regresses the response variable on the predictor variables). Linear regression refers to a regression model that is linear in its parameters (that is, each parameter appears only to the first power):

$y_t = \alpha + \beta x_t + \mu_t, \quad t = 1, \dots, n$, where $n$ represents the number of observations

$y_t$ is called the dependent variable.

$x_t$ is called the independent variable.

$\alpha$ and $\beta$ are the parameters to be determined by the least squares method, also called regression coefficients.

$\mu_t$ is the random error term.

The basic principle of OLS linear regression: the best-fitting line is the one that minimizes the sum of squared vertical distances from each point to the line, that is, the residual sum of squares (RSS).

The goal of OLS linear regression is to obtain the model parameters (intercept and slope) by minimizing the differences between the observed and predicted values of the response variable, that is, by minimizing the RSS.
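The minimization can be checked numerically. The following is a minimal sketch in R, assuming the built-in women data set (heights and weights of 15 women) as example data; the variable names are illustrative and not from the original article. It computes the intercept and slope from the closed-form solution for simple regression and compares them with the result of lm().

```r
# Example data: built-in 'women' data set (an assumption, chosen for illustration)
x <- women$height
y <- women$weight

# Closed-form OLS estimates for simple linear regression
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Residual sum of squares (RSS) at the OLS solution
rss <- sum((y - (alpha_hat + beta_hat * x))^2)

# RSS for a slightly different slope, to show that it is larger
rss_other <- sum((y - (alpha_hat + (beta_hat + 0.1) * x))^2)

# The same estimates obtained with lm()
fit <- lm(weight ~ height, data = women)

print(c(intercept = alpha_hat, slope = beta_hat))
print(coef(fit))
print(c(rss = rss, rss_other = rss_other))
```

Any other choice of intercept and slope gives an RSS at least as large as the one at the OLS solution, which is what the comparison with the perturbed slope illustrates.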

In order to properly interpret the coefficients of the OLS model, the data must meet the following statistical assumptions:

Normality: for fixed values of the independent variables, the values of the dependent variable are normally distributed.

Independence: the observations are independent of one another.

Linearity: the dependent variable is linearly related to the independent variables.

Homoscedasticity: the variance of the dependent variable does not change with the level of the independent variables, that is, the variance of the dependent variable is constant.
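Some of these assumptions can be examined after fitting. The sketch below is one possible approach, not a complete diagnostic procedure: it uses simulated data (an assumption made for illustration), applies the Shapiro-Wilk test to the residuals, and draws two of the standard diagnostic plots that plot() provides for lm objects. Independence usually has to be judged from how the data were collected rather than from a plot.

```r
set.seed(42)  # simulated data, for illustration only
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

res <- residuals(fit)

# Normality: Shapiro-Wilk test on the residuals
# (a small p-value suggests the residuals are not normal)
sw <- shapiro.test(res)
print(sw)

# Linearity and homoscedasticity: residuals vs fitted values should show
# no clear pattern and a roughly constant spread.
# Normality: points in the Q-Q plot should lie close to the line.
par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs fitted
plot(fit, which = 2)  # normal Q-Q
```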

Second, use lm() to fit the regression model

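In R, the most basic function for fitting a regression model is lm(), and its format is:

lm(formula, data)

Here formula describes the model to be fitted and data is the data frame that contains the variables.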
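As a small usage sketch, assuming the built-in women data set as example data (the object name myfit is illustrative), a call might look like this:

```r
# Regress weight on height; both variables are looked up in 'women'
myfit <- lm(weight ~ height, data = women)

# Printing the object shows the call and the estimated coefficients
print(myfit)
```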

Symbols used in formula:

~ Separator, with the response variable on the left and the predictor variables on the right; for example, z~x+y means that z is predicted by x and y

+ Separates predictor variables

: Represents an interaction between predictor variables, for example, z~x+y+x:y

* Represents all possible interactions, for example, z~x*y expands to z~x+y+x:y

^ Represents interactions up to the given degree, for example, z~(x+y)^2 expands to z~x+y+x:y

. Represents all variables in the data frame except the dependent variable. For example, if a data frame contains only the three variables x, y and z, then z~. expands to z~x+y

-1 Deletes the intercept term and forces the regression line to pass through the origin, for example, z~x-1

I() Interprets the expression in parentheses arithmetically. For example, the model represented by z~y+I(x^2) is z = a + by + cx^2

function Mathematical functions can be used in formulas, for example, log(z)~x+y
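The expansions can be verified in R itself. The sketch below uses terms() to list the terms that a formula expands to; the helper name expand_terms and the small data frame are hypothetical and exist only for this illustration.

```r
# Helper (hypothetical name): list the terms a formula expands to
expand_terms <- function(f, data = NULL) {
  attr(terms(f, data = data), "term.labels")
}

print(expand_terms(z ~ x * y))        # main effects plus the interaction
print(expand_terms(z ~ (x + y)^2))    # interactions up to degree 2
print(expand_terms(z ~ y + I(x^2)))   # I() keeps x^2 as arithmetic

# '.' needs a data frame so that R knows which variables exist
df <- data.frame(x = 1:3, y = c(2, 5, 4), z = c(1, 3, 2))
print(expand_terms(z ~ ., data = df))

# '-1' removes the intercept: the "intercept" attribute becomes 0
print(attr(terms(z ~ x - 1), "intercept"))
```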

For the fitted model (the object returned by lm()), you can apply the following functions to get additional information about the model.

summary() Shows detailed results for the fitted model

coefficients() Lists the parameters of the fitted model (intercept and slopes)

confint() Provides confidence intervals for the model parameters

residuals() Lists the residuals of the fitted model

fitted() Lists the predicted values of the fitted model

anova() Generates an analysis of variance table for the fitted model

predict() Uses the fitted model to predict the response variable for new data
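A brief sketch of these functions in use, again assuming the built-in women data set; the new height values passed to predict() are made up for illustration.

```r
fit <- lm(weight ~ height, data = women)

print(summary(fit))          # coefficients, standard errors, R-squared
print(coefficients(fit))     # intercept and slope
print(confint(fit))          # 95% confidence intervals by default
print(head(residuals(fit)))  # observed minus fitted values
print(head(fitted(fit)))     # predicted values for the training data
print(anova(fit))            # analysis of variance table

# Predict weight for new heights (values chosen for illustration)
new_data <- data.frame(height = c(60, 65, 70))
pred <- predict(fit, newdata = new_data)
print(pred)
```

The column name in the new data frame has to match the predictor name used in the formula, otherwise predict() cannot find the variable.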
