2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shows how to use a machine learning regression model in Python to predict house prices.
First, regression prediction
Today we take a concrete look at using machine learning algorithms for regression prediction.
Regression prediction estimates a continuous-valued attribute of an object, so its output is a number. Typical applications include all kinds of price prediction and response prediction.
Next, we use the sklearn module and the Boston house price data set bundled with sklearn to demonstrate how to do regression prediction.
Second, Boston house price forecast
1. Introduce data set
The data sets built into sklearn all live in the datasets submodule, so we can import the loader directly from there. After importing and loading it, we look at what the data set contains.
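The original listing is not preserved, so the following is only a minimal sketch of this step, not the author's exact code. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. For that reason the sketch falls back to a synthetic stand-in when the import fails. The stand-in (random values from make_regression with the same 506 x 13 layout, wrapped in a Bunch) is purely an assumption for illustration and carries fewer keys than the real data set.

```python
import numpy as np
from sklearn.utils import Bunch

FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

try:
    # Works on scikit-learn < 1.2, where the Boston data set still ships.
    from sklearn.datasets import load_boston
    boston = load_boston()
except ImportError:
    # Assumption: synthetic stand-in with the same layout, NOT real house prices.
    from sklearn.datasets import make_regression
    X, y = make_regression(n_samples=506, n_features=13,
                           noise=10.0, random_state=0)
    boston = Bunch(data=X, target=y,
                   feature_names=np.array(FEATURE_NAMES),
                   DESCR="Synthetic stand-in for the Boston house price data")

# A Bunch behaves like a dict, so its keys list what the data set contains.
print(boston.keys())
```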
There are five keys. Going by their names, data holds the feature values of every sample, DESCR is the description of the data set, target holds the target value that corresponds to each sample, and feature_names holds the names of the features. The remaining key, filename, is the path to the underlying data file.
Let's first look at the names of the data features. The data set has 13 features, and the abbreviated names alone do not make clear what each one means, but the DESCR description explains them.
According to DESCR, the fields have the following meanings:
CRIM: per capita crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 square feet
INDUS: proportion of non-retail business acres per town
CHAS: Charles River dummy variable (1 if the tract bounds the river, 0 otherwise)
NOX: nitric oxides concentration (parts per 10 million)
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built before 1940
DIS: weighted distances to five Boston employment centers
RAD: index of accessibility to radial highways
TAX: full-value property tax rate per $10,000
PTRATIO: pupil-teacher ratio by town
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
LSTAT: percentage of lower-status population
MEDV: median value of owner-occupied homes in $1000s (this is the target value, not one of the 13 features)
Let's take a look at the feature data and the target values.
The feature data has 506 rows and 13 columns, one column per feature, and the target array has exactly 506 values, one per row.
In other words, the data has already been preprocessed into arrays.
For beginners, though, data in this processed form is not very intuitive. We can use pandas to convert it into a DataFrame and see at a glance what the data set looks like.
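A minimal sketch of that conversion follows. To stay runnable on any scikit-learn version it uses a synthetic stand-in with the same 506 x 13 layout (an assumption, generated with make_regression). With the real data set the arrays would be boston.data, boston.target and boston.feature_names instead. Appending the target as a MEDV column is also an illustrative choice.

```python
import pandas as pd
from sklearn.datasets import make_regression

feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Assumption: synthetic stand-in for boston.data / boston.target.
data, target = make_regression(n_samples=506, n_features=13,
                               noise=10.0, random_state=0)

# One column per feature, then the target appended as the last column.
df = pd.DataFrame(data, columns=feature_names)
df["MEDV"] = target

print(df.shape)
print(df.head())
```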
For readers who are used to pandas, viewing the data set this way is much friendlier.
With that, we can move straight on to the next step.
2. Splitting training data and test data
To test how well the machine learning model we create and train actually works, we need to divide the data set into a training set and a test set.
sklearn provides a dedicated function for this split, train_test_split, which we import first.
We then pass in the feature data and target values and set the test set to 15% of the samples.
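A sketch of the split, under stated assumptions: the data is a synthetic stand-in with the Boston layout (so the block runs without load_boston), and random_state=42 is an arbitrary seed added only to make the split reproducible. The names x_test and y_test follow the ones used later in the article.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for boston.data / boston.target.
X, y = make_regression(n_samples=506, n_features=13,
                       noise=10.0, random_state=0)

# test_size=0.15 reserves 15% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

print(x_train.shape, x_test.shape)
```

With 506 rows, test_size=0.15 yields 76 test rows and 430 training rows, because the test count is rounded up.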
3. Select a regression algorithm estimator
In sklearn, every machine learning algorithm is exposed as an "estimator". Each estimator is a class, and a model is created by instantiating that class. The estimator for linear regression, for example, is LinearRegression.
Every estimator, whether supervised or unsupervised, has a fit() method that receives the training data and trains the model.
Supervised estimators also have a predict() method that receives data and returns predictions for it.
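The instantiate / fit / predict pattern can be sketched as follows. The toy data (points on the line y = 2x + 1) is an assumption chosen so the result is easy to check by hand. It is not the house price data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data lying exactly on y = 2x + 1 (illustrative only).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()      # instantiate the estimator class
model.fit(X_train, y_train)     # train on the training data

# predict() takes a 2-D array of samples and returns a 1-D array.
prediction = model.predict(np.array([[4.0]]))
print(prediction)
```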
In sklearn, the API is organized by algorithm family, so the algorithm we want is easy to find. Each family of supervised learning algorithms lives in its own submodule that contains the concrete estimator classes. Generalized linear models, for example, live in sklearn.linear_model.
Within a family, some algorithms come in two variants: an estimator for regression and an estimator for classification.
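As a small illustration of this layout, the sketch below imports a regression and a classification estimator from two submodules and asks sklearn which kind each one is. The choice of these four classes is an assumption for the example. is_regressor comes from sklearn.base.

```python
from sklearn.base import is_regressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression

kinds = {}
for cls in (LinearRegression, LogisticRegression,
            RandomForestRegressor, RandomForestClassifier):
    estimator = cls()
    # Every class here is either a regressor or a classifier.
    kinds[cls.__name__] = ("regression" if is_regressor(estimator)
                           else "classification")

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```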
Here we choose the random forest estimator for regression.
First, import the estimator.
Then instantiate the random forest regression estimator, set its parameters, and pass the training set to its fit() method.
After the model is trained, use the predict() method to make predictions for the test set.
This gives us the random forest model's predictions for the test set as a one-dimensional array, which we can print directly.
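These steps can be sketched as below. The article's original parameter values are not preserved, so n_estimators=100 and random_state=42 are assumptions, as is the synthetic stand-in data with the Boston layout.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for boston.data / boston.target.
X, y = make_regression(n_samples=506, n_features=13,
                       noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

# Hypothetical parameter choices; tune them for real data.
rfr = RandomForestRegressor(n_estimators=100, random_state=42)
rfr.fit(x_train, y_train)        # train on the training set only

y_pred = rfr.predict(x_test)     # one prediction per test row
print(y_pred.shape)
print(y_pred[:5])
```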
Recall that splitting the data also produced an array called y_test. It holds the correct house prices for the x_test features, so let's look at it as well.
How do we judge how close the predicted array is to the correct array? A naive way is to compare the two arrays element by element and look at the differences.
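The naive comparison might look like this. The two short arrays are made-up values standing in for y_test and the predictions, so the differences are easy to verify by eye.

```python
import numpy as np

# Made-up stand-ins for the correct values and the model's predictions.
y_test = np.array([20.0, 25.0, 30.0, 35.0])
y_pred = np.array([22.0, 24.0, 29.0, 38.0])

# Element-wise difference: positive means the model predicted too high.
diff = y_pred - y_test

for true_value, predicted, error in zip(y_test, y_pred, diff):
    print(f"true={true_value:5.1f}  predicted={predicted:5.1f}  error={error:+5.1f}")
```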
If the test set is small this is not much work, but for a large test set it is impractical. Fortunately, sklearn also provides model evaluation methods, all collected in the sklearn.metrics submodule. For a regression model we can use the mean absolute error (MAE), the mean squared error (MSE) and the R2 score.
We then pass the correct array and the predicted array to each evaluation function.
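A sketch of the three metric calls, using small made-up arrays in place of y_test and the model's predictions so the values can be checked by hand. Each function takes the correct values first and the predictions second.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up stand-ins for the correct values and the model's predictions.
y_test = np.array([20.0, 25.0, 30.0, 35.0])
y_pred = np.array([22.0, 24.0, 29.0, 38.0])

mae = mean_absolute_error(y_test, y_pred)   # mean of |error|
mse = mean_squared_error(y_test, y_pred)    # mean of error squared
r2 = r2_score(y_test, y_pred)               # 1 - SS_res / SS_tot

print(f"MAE={mae:.2f}  MSE={mse:.2f}  R2={r2:.2f}")
```

For these four values the errors are 2, -1, -1 and 3, which gives an MAE of 1.75, an MSE of 3.75 and an R2 of 0.88.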
This gives us the mean absolute error, the mean squared error and the R2 score. The theoretical optimum of the R2 score is 1, and the theoretical optimum of both the mean absolute error and the mean squared error is 0.0.
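These optima follow from the usual definitions of the three metrics. The notation is an assumption, since the article does not define symbols: y_i is the correct value, \hat{y}_i the prediction, \bar{y} the mean of the correct values, and n the number of test samples.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

When every prediction equals the correct value, each error term is zero, so MAE and MSE are both 0 and R2 is 1.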
So how well does this random forest regression model perform? You can judge for yourself from the scores, or try other regression models to see which algorithm predicts better on this data set.
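One possible way to run such a comparison is sketched below. The three candidate models and their parameters are assumptions chosen for illustration. Because the data is a synthetic, linearly generated stand-in, the ranking it prints says nothing about how these models would rank on the real house price data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in for boston.data / boston.target.
X, y = make_regression(n_samples=506, n_features=13,
                       noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

# Hypothetical candidates; any sklearn regressor could be added here.
candidates = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=42),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, estimator in candidates.items():
    estimator.fit(x_train, y_train)
    scores[name] = r2_score(y_test, estimator.predict(x_test))
    print(f"{name}: R2 = {scores[name]:.3f}")
```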
That is how to use a machine learning regression model in Python to predict house prices. Several of these steps come up often in day-to-day work, so I hope this article is useful to you.