A Case Study of Boston House Price Data Analysis in Python Artificial Intelligence

2025-02-22 Update From: SLTechnology News&Howtos


This article walks through a case study of Boston house price data analysis in Python. The content is kept simple and clear; I hope it helps resolve your doubts as we study it together.

1. Data Overview and Analysis

1.1 Data Overview

This time, the following files are provided:

train.csv, the training set

test.csv, the test set

submission.csv, the file of true house prices

The training set has 404 rows and 14 columns; each row describes a house and its surroundings, and the corresponding average house price is given. The average house prices for the 102 test records must be predicted.

1.2 Data Analysis

Using the detailed information about each house and its surroundings, including the urban crime rate, nitric oxide concentration, average number of rooms, weighted distance to the city centre, and so on, we train a model that predicts the average price of owner-occupied homes in an area from that area's house and neighbourhood details.

This is a regression problem: submit the predicted average house price for each record in the test set. The evaluation metric is mean squared error (MSE).

2. Overall Project Approach

2.1 Data Reading

Dataset: the Boston house training set, train.csv (404 records)

The dataset fields are as follows:

CRIM: per capita crime rate by town.

ZN: proportion of residential land zoned for lots over 25,000 sq. ft.

INDUS: proportion of non-retail business land per town.

CHAS: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise).

NOX: nitric oxide concentration.

RM: average number of rooms per dwelling.

AGE: proportion of owner-occupied homes built before 1940.

DIS: weighted distance to five Boston employment centres.

RAD: index of accessibility to radial highways.

TAX: full-value property tax rate per $10,000.

PTRATIO: pupil-teacher ratio by town.

B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.

LSTAT: proportion of lower-status population.

MEDV: average price of owner-occupied homes, in thousands of dollars.

2.2 Model Preprocessing

(1) Outlier handling

First, the training set is divided into a sub-training set and a sub-test set. The training data is sorted by each feature with train_data.sort_values, and the outlier samples for that feature are deleted in turn; the model is then trained and tested on the sub-training and sub-test sets to determine the optimal number of samples to delete for that feature.
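The loop above can be sketched as follows. Synthetic data stands in for train.csv, the RM column is one illustrative feature from the field list, and the linear model is only a placeholder scorer, not necessarily the model the article uses:

```python
# Sketch: for one feature, try deleting the n largest values (treated here
# as the outliers), refit on a sub-train/sub-test split, and keep the n
# that minimises sub-test MSE.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Stand-in for train.csv: 404 rows with an RM feature and a MEDV target.
train_data = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, 404),
    "MEDV": rng.normal(22.5, 9.0, 404),
})

def score_after_dropping(df, feature, n_drop):
    """Drop the n_drop largest values of `feature`, then fit and score."""
    trimmed = df.sort_values(feature, ascending=False).iloc[n_drop:]
    X = trimmed[[feature]].values
    y = trimmed["MEDV"].values
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = LinearRegression().fit(X_tr, y_tr)
    return mean_squared_error(y_te, model.predict(X_te))

# Pick the number of deleted samples that minimises sub-test MSE.
best_n = min(range(0, 10), key=lambda n: score_after_dropping(train_data, "RM", n))
print("optimal number of deleted samples for RM:", best_n)
```

In the full workflow this search would be repeated for each feature in turn, as the article describes.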

(2) Data standardization

sklearn.preprocessing.StandardScaler is used to standardize the dataset and the labels separately.
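A minimal sketch of the two-scaler setup, with small illustrative arrays standing in for the real feature matrix and MEDV labels:

```python
# Standardize features and target with separate StandardScaler instances,
# so predictions made in standardized space can be mapped back to dollars.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0.02, 6.5], [0.27, 5.9], [8.98, 4.1]])  # two feature columns
y = np.array([[24.0], [21.6], [13.2]])                  # MEDV, shape (n, 1)

x_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y)
X_std = x_scaler.transform(X)
y_std = y_scaler.transform(y)

# Each standardized column now has mean ~0 and std ~1.
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))

# Standardized-space predictions are inverted back with the label scaler.
y_back = y_scaler.inverse_transform(y_std)
```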

2.3 Feature Engineering

A random forest feature-selection algorithm is used to eliminate insensitive features.
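One common way to realise this, sketched here on synthetic data with the same shape as the Boston set (404 rows, 13 features), is to rank features by random-forest importance and drop those below the mean importance via SelectFromModel; the article does not spell out its exact threshold, so that choice is an assumption:

```python
# Random-forest feature selection: keep only features whose importance
# is at least the mean importance across all features.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=404, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
selector = SelectFromModel(forest, threshold="mean", prefit=True)
X_selected = selector.transform(X)  # insensitive features removed
print(X.shape, "->", X_selected.shape)
```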

2.4 Model Selection

GradientBoostingRegressor, an ensemble regression model, is used.

Gradient boosting chooses the direction of gradient descent at each iteration to work toward the best final result. A loss function describes how "reliable" the model is: provided the model is not overfitted, the larger the loss, the higher the model's error.

If our model keeps driving the loss function down, it is steadily improving, and the best way to achieve this is to let the loss function descend along its gradient direction.
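This behaviour is visible in scikit-learn's GradientBoostingRegressor: the recorded training loss shrinks as boosting stages are added. A sketch on synthetic stand-in data (the hyperparameter values are illustrative, not the article's):

```python
# Fit a gradient boosting ensemble; train_score_[i] records the in-sample
# loss after stage i, so it decreases as trees are added along the gradient.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=404, n_features=13, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)

print("loss after first stage:", gbr.train_score_[0])
print("loss after last stage: ", gbr.train_score_[-1])
print("test MSE:", mean_squared_error(y_te, gbr.predict(X_te)))
```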

2.5 Model Evaluation

The scoring standard is mean squared error (MSE: Mean Squared Error). MSE is the expected value of the squared difference between the estimated value and the true value of the parameter.

MSE measures the degree of variation in the data: the smaller the MSE, the more accurately the prediction model describes the experimental data. The calculation formula is:

MSE = (1/n) * sum_i (y_i - y_hat_i)^2

where y_i is the true value, y_hat_i is the predicted value, and n is the number of samples.
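As a concrete check of the formula (the numbers below are illustrative), MSE computed by hand matches scikit-learn's mean_squared_error helper:

```python
# MSE: mean of squared differences between true and predicted values.
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([22.0, 18.5, 30.1])
y_pred = np.array([21.0, 20.0, 29.1])

mse_manual = np.mean((y_true - y_pred) ** 2)   # (1 + 2.25 + 1) / 3
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)
```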

Its MSE value on the test set is:

2.6 Model Parameter Tuning

Tuning the n_estimators parameter:
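The article does not show its tuning code, so here is one plausible sketch using cross-validated grid search over n_estimators (the candidate values, synthetic data, and negative-MSE scoring are assumptions):

```python
# Grid-search n_estimators for GradientBoostingRegressor, scoring by
# negative MSE (higher is better, so the best MSE is -best_score_).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=404, n_features=13, noise=15.0, random_state=0)

grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100, 200]},
    scoring="neg_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
print("best n_estimators:", grid.best_params_["n_estimators"])
print("cross-validated MSE:", -grid.best_score_)
```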

The above is the full content of "A Case Study of Boston House Price Data Analysis in Python Artificial Intelligence". Thank you for reading! I hope sharing this content has helped you; if you want to learn more, welcome to follow the industry information channel!
