1. Introduction
Regression analysis is a subfield of supervised learning. Its purpose is to model the relationship between a number of features and a continuous target variable.
In regression problems, we try to give a quantitative answer, such as predicting the price of a house or how long someone will watch a video.
2. Simple linear regression: fitting a line through data
Regression algorithms model the relationship between a single feature (the explanatory variable x) and its corresponding value (the target variable y) from a set of points.
This is done by proposing a line and computing the vertical distance from that line to each data point. This vertical distance is the residual, or prediction error, of that data point.
The algorithm adjusts the line at each iteration, looking for the best-fitting line, i.e., the line with the smallest error.
We can accomplish this task through several techniques.
2.1 Moving the line
2.1.1 Trick 1
When we have a point and a line, the goal is to move the line closer to the point. To do this we use a parameter called the learning rate, which makes the line approach the point in a controlled way.
In other words, the learning rate determines how far the line moves toward the point in each iteration. It is usually denoted α.
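Here is a minimal sketch of this trick in Python (the function, variable names, and numbers are illustrative, not from the original article):

```python
# Absolute trick (sketch): nudge the line y = w1*x + w0 toward the point (x, y)
# by a fixed small step controlled by the learning rate alpha.

def absolute_trick(w1, w0, x, y, alpha=0.01):
    prediction = w1 * x + w0
    direction = 1 if y > prediction else -1  # is the point above or below the line?
    w1 += direction * alpha * x              # rotate the line toward the point
    w0 += direction * alpha                  # translate the line toward the point
    return w1, w0

w1, w0 = absolute_trick(w1=0.5, w0=0.0, x=2.0, y=3.0)
print(w1, w0)  # the line has moved slightly toward (2, 3)
```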
2.1.2 Trick 2
Trick 2 refines this idea: the amount the line moves should depend on how far away the point is. If the point is close to the line and the distance is small, the line moves only a little; if the point is far away, the line moves more.
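A sketch of this second trick, under the same illustrative setup as above: the update is scaled by the signed vertical distance to the point.

```python
# Square trick (sketch): the step is proportional to the vertical distance
# between the point and the line, so near points move the line only a little
# and far points move it a lot. Names and numbers are illustrative.

def square_trick(w1, w0, x, y, alpha=0.01):
    prediction = w1 * x + w0
    error = y - prediction   # signed vertical distance (the residual)
    w1 += alpha * error * x  # larger error -> larger rotation
    w0 += alpha * error      # larger error -> larger translation
    return w1, w0
```

Notice that this update is exactly one gradient descent step on the squared error, which connects it to the next section.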
3. Gradient descent
Suppose we have a set of points and want to develop an algorithm that finds the line that best fits them. As mentioned earlier, the error is the distance from the line to a point.
We try different lines and calculate the error each time. Repeating this process over and over reduces the error until we reach the best line, the one with the smallest error.
To minimize the error, we use gradient descent. Gradient descent lets us determine, at each step, in which direction to move the line so as to reduce the error.
Note: the gradient of f, written ∇f, is a vector field. Evaluated at a point, it points in the direction in which f increases fastest.
So gradient descent moves one step in the direction of the negative gradient.
When the algorithm has taken enough steps, it eventually reaches a local or global minimum. Be aware that if the learning rate is too high, the algorithm overshoots the minimum because its steps are too large, and if the learning rate is too low, it takes a very long time to find the minimum.
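A minimal sketch of gradient descent fitting a line to synthetic data (the learning rate, iteration count, and data are illustrative choices):

```python
import numpy as np

# Gradient descent for a best-fit line (sketch). Each step evaluates the
# residuals on all points and moves (w1, w0) along the negative gradient
# of the mean squared error.

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)  # noisy line y = 3x + 2

w1, w0, alpha = 0.0, 0.0, 0.01
for _ in range(5000):
    error = y - (w1 * x + w0)          # residuals at the current step
    w1 += alpha * np.mean(error * x)   # negative-gradient step for the slope
    w0 += alpha * np.mean(error)       # negative-gradient step for the intercept

print(w1, w0)  # should be close to the true values 3 and 2
```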
4. Mini-batch gradient descent
4.1 Batch gradient descent
In batch gradient descent, we compute, for every data point, the values to be added to the model weights, sum them all, and update the weights once with that sum.
4.2 Stochastic gradient descent
Alternatively, we can apply gradient descent point by point, updating the weights after each individual data point.
4.3 Gradient descent in practice
In practice, neither of these extremes is used, because both are computationally slow. The best way to perform linear regression is to split the data into many batches, each with approximately the same number of points, and use each batch in turn to update the weights. This method is known as mini-batch gradient descent.
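A sketch of mini-batch gradient descent on the same kind of synthetic data (batch size, learning rate, and epoch count are illustrative choices):

```python
import numpy as np

# Mini-batch gradient descent (sketch): shuffle the data each epoch, split it
# into batches of roughly equal size, and update the weights once per batch.

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=1000)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=1000)  # noisy line y = 3x + 2

w1, w0, alpha, batch_size = 0.0, 0.0, 0.01, 32
for epoch in range(100):
    order = rng.permutation(len(x))             # new random order each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]   # indices of this mini-batch
        error = y[idx] - (w1 * x[idx] + w0)
        w1 += alpha * np.mean(error * x[idx])   # one update per batch
        w0 += alpha * np.mean(error)

print(w1, w0)  # close to the true values 3 and 2
```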
5. Higher dimensions
When we have one input column and one output column, we are dealing with a two-dimensional problem, and the regression is a straight line. The prediction is a value given by the independent variable times a slope plus a constant: ŷ = w₁x + w₀.
If we have more input columns, that means more dimensions, and the output is no longer a line but a plane or a hyperplane, depending on the number of dimensions.
6. Multiple linear regression
Independent variables are the variables we use to predict other variables; the variable we are trying to predict is called the dependent variable.
When the outcome we are trying to predict depends on more than one variable, we can build a more complex model that takes those variables into account, provided they are relevant to the problem at hand. In general, using more relevant predictors can help us get better results.
[Figure: a simple linear regression fit with one feature.]
[Figure: a multiple linear regression fit with two features.]
As we add more independent variables, the problem is no longer confined to a two-dimensional plane and becomes harder to visualize, but the core idea stays the same. A sketch with two features follows.
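A minimal sketch of multiple linear regression with two features, using scikit-learn on synthetic data (the coefficients and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Multiple linear regression (sketch): the model learns one coefficient per
# feature plus an intercept, so the fitted surface is a plane, not a line.

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))  # two feature columns
y = 4.0 * X[:, 0] - 1.5 * X[:, 1] + 6.0 + rng.normal(0, 1, size=200)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # roughly [4.0, -1.5] and 6.0
print(model.predict([[5.0, 3.0]]))    # prediction for a new point
```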
7. Some Suggestions on Linear Regression
Linear regression is not always appropriate.
a) Linear regression works best when the data is linear:
It generates a straight line from the training data. If the relationships in the training data are not truly linear, you will need to make adjustments (transform the training data), add features, or use other models.
b) Linear regression is sensitive to outliers:
Linear regression tries to find the best line through the training data. If the dataset contains values that do not fit the general pattern, the model can be heavily influenced by those outliers, so we have to treat them with care and remove them in a reasoned way.
To handle outliers, I recommend the random sample consensus (RANSAC) algorithm, which fits the model to the subset of inliers in the data. The algorithm performs the following steps:
1. Select a random number of samples as inliers and fit the model.
2. Test all other data points against the fitted model, and add those that fall within a user-given tolerance to the inlier set.
3. Refit the model using the enlarged inlier set.
4. Estimate the error of the fitted model against the inliers.
5. If the performance meets a user-defined threshold or a set number of iterations has been reached, stop; otherwise, go back to step 1 and repeat.
A sketch using scikit-learn's implementation follows.
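This is what RANSAC looks like with scikit-learn's RANSACRegressor on synthetic data contaminated with outliers (the threshold and data are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# RANSAC (sketch): fit a line while ignoring gross outliers. The estimator
# repeatedly fits on random subsets and keeps the largest consensus set.

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=100)
y[:10] += 30.0  # corrupt ten points so they become clear outliers

ransac = RANSACRegressor(LinearRegression(), residual_threshold=2.0)
ransac.fit(x.reshape(-1, 1), y)

print(ransac.estimator_.coef_, ransac.estimator_.intercept_)  # near 3 and 2
print(ransac.inlier_mask_.sum(), "points kept as inliers")
```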
8. Polynomial regression
Polynomial regression is a special case of multiple linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. In other words, when the data distribution is more complex than a line, we use a linear model over polynomial features to generate a curve that fits the nonlinear data.
Independent (or explanatory) variables obtained from polynomial expansion of the predictors have been used to describe nonlinear phenomena such as the growth rate of tissues and the progression of disease epidemics.
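A sketch of polynomial regression as a pipeline in scikit-learn (the degree and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Polynomial regression (sketch): expand x into [x, x^2, x^3] and fit an
# ordinary linear model on the expanded features; the model is still linear
# in its parameters, but the fitted curve is not a straight line.

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 3 - x[:, 0] + rng.normal(0, 0.5, size=200)  # cubic + noise

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict([[2.0]]))  # follows the cubic trend, not a straight line
```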
9. Regularization
Regularization is a common way to deal with overfitting. It is mainly achieved through the following techniques:
Reduce the size of the model: reduce the number of learnable parameters in the model, and with it its capacity to learn. The goal is to strike a balance between too much and too little capacity. Unfortunately, there is no magic formula for finding this balance; it must be found by testing different numbers of parameters and observing the performance.
Add weight regularization: in general, the simpler the model, the better, because a simple model is less likely to overfit. A common way to achieve this is to constrain the complexity of the network by forcing its weights to take only small values, regularizing the distribution of the weights. This is done by adding a cost for large weights to the network's loss function. The cost comes in two flavors:
L1 regularization: the cost is proportional to the absolute value of the weight coefficients.
L2 regularization: the cost is proportional to the square of the weight coefficients.
To determine which of these applies to our model, keep the following in mind and consider the specific nature of the problem:
The λ parameter: this is the coefficient that scales the regularization penalty added to the error. With a large λ we penalize complexity heavily and end up with a simpler model; with a small λ we end up with a more complex model. A sketch of both penalties follows.
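This is what the two penalties look like in scikit-learn, where the λ from the text is called alpha (the data and alpha value are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# L1 (Lasso) vs. L2 (Ridge) regularization (sketch): a larger alpha means a
# stronger penalty and therefore a simpler model.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, size=100)  # only feature 0 matters

print(Ridge(alpha=1.0).fit(X, y).coef_)  # all coefficients shrunk a little
print(Lasso(alpha=1.0).fit(X, y).coef_)  # irrelevant coefficients driven to 0
```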
10. Evaluation metrics
To track the performance of the model, we need to set some evaluation metrics. The metric measures the error between the fitted line and the real points, and it is this quantity that gradient descent minimizes.
When dealing with linear regression, the following metrics are commonly used:
10.1 Mean absolute error:
Mean absolute error (MAE) is the average of the absolute differences between the real data points and the predicted results. If we use it as our objective, each step of gradient descent reduces the MAE.
10.2 Mean squared error:
Mean squared error (MSE) is the average of the squared differences between the actual data points and the predicted results. The larger the distance, the greater the penalty.
If we use it as our objective, each step of gradient descent reduces the MSE. This is the preferred method for finding the best-fit line, and it is also known as ordinary least squares.
10.3 Coefficient of determination, or R squared
The coefficient of determination can be understood as a normalized version of the MSE, which gives a better interpretation of model performance.
Technically, R squared is the fraction of the response variance that is captured by the model. It is defined as R² = 1 − SSE/SST, where SSE is the sum of squared errors of the model and SST is the total sum of squares of the response around its mean.
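All three metrics are available in scikit-learn; here is a sketch with placeholder values (the numbers stand in for real targets and predictions):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Computing the three metrics from this section on illustrative values.
y_true = [3.0, 5.0, 7.5, 10.0]  # actual target values (placeholders)
y_pred = [2.8, 5.4, 7.0, 10.3]  # model predictions (placeholders)

print(mean_absolute_error(y_true, y_pred))  # average |y - y_hat|
print(mean_squared_error(y_true, y_pred))   # average (y - y_hat)^2
print(r2_score(y_true, y_pred))             # 1 - SSE/SST, best value is 1
```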
11. Other algorithms
Although this article focuses primarily on linear and multiple regression models, almost every regression algorithm in scikit-learn, the popular machine learning library, can be applied, and some of them produce very good results.
Some examples:
Decision tree regression
Random forest regression
Support vector regression
Lasso
Elastic Net
Gradient boosting regression
AdaBoost regression
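Because they all share the same fit/predict interface, trying a few of them is straightforward (a sketch on synthetic data; the models and their default settings are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Fit several of the listed regressors on the same synthetic data and
# report the R^2 score on the training set for a rough comparison.
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

for model in [DecisionTreeRegressor(), RandomForestRegressor(),
              SVR(), GradientBoostingRegressor()]:
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))  # R^2 on the training data
```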
12. Conclusion
In this article, we have covered the basics of regression models, how they work, common problems, and how to handle them. We also learned what the most common evaluation metrics are.
by Victor Roman
Source: towardsdatascience.com/supervised-learning-basics-of-linear-regression-1cbab48d0eba