How to predict missing values in Python

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shows how to predict missing values in Python, using the wine review data in "winemag-data-130k-v2.csv" as an example. First, let's read the data into a pandas DataFrame:

import pandas as pd
df = pd.read_csv("winemag-data-130k-v2.csv")

Next, let's output the first five rows of the data:

print(df.head())

Let's randomly select 500 records from this data. This speeds up model training and testing; readers can easily change the sample size:

import pandas as pd
df = pd.read_csv("winemag-data-130k-v2.csv").sample(n=500, random_state=42)

Now, let's print summary information about the data, which will give us an idea of which columns have missing values:

df.info()
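
As a complement to the summary above, missing values can also be counted directly per column. The following is a minimal sketch on a small made-up DataFrame (the name `toy` and all of its values are hypothetical, not taken from the wine data), using the pandas `isnull()` method:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the wine reviews; values are made up.
toy = pd.DataFrame({
    "points": [87, 90, 85, 92, 88],
    "price": [15.0, np.nan, 12.0, np.nan, 30.0],
    "region": ["Etna", None, "Napa", "Rioja", None],
})

# Number of missing entries in each column.
missing_counts = toy.isnull().sum()
print(missing_counts)

# Fraction of rows that are missing in each column.
missing_fraction = toy.isnull().mean()
print(missing_fraction)
```

Applied to the sampled data, the same two calls would show at a glance which columns fall short of 500 non-null values.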

Several columns have fewer than 500 non-null values, which means they contain missing values. Let's consider building a model that uses "points" to estimate the missing "price" values. First, let's print the correlation between "price" and "points":

print("Correlation:", df['points'].corr(df['price']))
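
It may help to see how the correlation behaves when some prices are missing. The sketch below uses made-up numbers (the `toy` frame is hypothetical) to illustrate that `Series.corr` computes the Pearson correlation by default and leaves out rows where either value is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; two of the prices are missing.
toy = pd.DataFrame({
    "points": [85, 87, 88, 90, 92, 95],
    "price": [12.0, 15.0, np.nan, 25.0, 40.0, np.nan],
})

# Pearson correlation; rows with a missing value are skipped.
corr_all = toy["points"].corr(toy["price"])

# The same computation after dropping incomplete rows explicitly.
complete = toy.dropna(subset=["points", "price"])
corr_complete = complete["points"].corr(complete["price"])

print("Correlation:", corr_all)
print("Correlation on complete rows:", corr_complete)
```

Both values agree, so the missing prices do not need to be removed before calling `corr`.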

We see a weak positive correlation. Let's build a linear regression model that uses "points" to predict "price". First, let's import the "LinearRegression" class from scikit-learn:

from sklearn.linear_model import LinearRegression

Now, let's split the data for training and testing. We want to be able to predict missing values, but we need real "price" values to verify our predictions. Let's filter out the missing values by selecting only rows with a positive price:

import numpy as np
df_filter = df[df['price'] > 0].copy()
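
The filter works because a comparison with a missing value is always false. A small sketch with made-up values (the names `toy`, `known` and `unknown` are hypothetical) shows the effect, and also how the rows that still need a price could be set aside:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; two of the prices are missing.
toy = pd.DataFrame({
    "points": [85, 87, 88, 90],
    "price": [12.0, np.nan, 20.0, np.nan],
})

# NaN > 0 evaluates to False, so rows with a missing price are dropped.
known = toy[toy["price"] > 0].copy()

# Rows whose price is missing; a fitted model could fill these in later.
unknown = toy[toy["price"].isnull()].copy()

print(len(known), len(unknown))
```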

We can also initialize lists that store the predicted and actual values:

y_pred = []
y_true = []

We will use K-fold cross-validation to validate our model. Let's import the "KFold" class from scikit-learn. We will use 10 folds to validate our model:

from sklearn.model_selection import KFold
# shuffle=True is needed for random_state to have an effect
kf = KFold(n_splits=10, shuffle=True, random_state=42)
for train_index, test_index in kf.split(df_filter):
    df_test = df_filter.iloc[test_index]
    df_train = df_filter.iloc[train_index]
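
To make the splitting step concrete, here is a minimal sketch on 20 made-up samples (the array `data` is hypothetical). It shows that each of the 10 folds holds out a different tenth of the data for testing, and that every sample is tested exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(20)  # 20 hypothetical samples

kf = KFold(n_splits=10, shuffle=True, random_state=42)

fold_sizes = []
seen_test = []
for train_index, test_index in kf.split(data):
    # kf.split yields positional indices, not the data itself.
    fold_sizes.append((len(train_index), len(test_index)))
    seen_test.extend(test_index.tolist())

print(fold_sizes)                              # (train size, test size) per fold
print(sorted(seen_test) == list(range(20)))    # every sample tested once
```

Because the indices are positional, they are used with `.iloc` rather than `.loc` when the data is a DataFrame.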

We can now define our inputs and outputs:

for train_index, test_index in kf.split(df_filter):
    ...
    X_train = np.array(df_train['points']).reshape(-1, 1)
    y_train = np.array(df_train['price']).reshape(-1, 1)
    X_test = np.array(df_test['points']).reshape(-1, 1)
    y_test = np.array(df_test['price']).reshape(-1, 1)

And fit our linear regression model:

for train_index, test_index in kf.split(df_filter):
    ...
    model = LinearRegression()
    model.fit(X_train, y_train)

Now let's generate and store our predictions:

for train_index, test_index in kf.split(df_filter):
    ...
    # store the first prediction and the first actual value of each fold
    y_pred.append(model.predict(X_test)[0])
    y_true.append(y_test[0])
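
Putting the steps together, the following sketch runs the whole procedure on synthetic data and then uses a model fitted on all known rows to fill in the missing prices. It is a variation under stated assumptions, not the article's exact code: the data is randomly generated (hypothetical), every prediction in each fold is stored rather than only the first, and the names `toy`, `known`, `missing`, `final_model` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hypothetical synthetic data: price loosely increases with points.
rng = np.random.default_rng(0)
points = rng.integers(80, 100, size=200)
price = 2.0 * (points - 78) + rng.normal(0, 3, size=200)
toy = pd.DataFrame({"points": points, "price": price})
toy.loc[toy.index[::10], "price"] = np.nan  # blank out every 10th price

known = toy[toy["price"].notnull()]
missing = toy[toy["price"].isnull()]

# Cross-validate on the rows whose price is known.
kf = KFold(n_splits=10, shuffle=True, random_state=42)
y_pred, y_true = [], []
for train_index, test_index in kf.split(known):
    train, test = known.iloc[train_index], known.iloc[test_index]
    model = LinearRegression()
    model.fit(train[["points"]], train["price"])
    y_pred.extend(model.predict(test[["points"]]))
    y_true.extend(test["price"])

# Refit on every known row, then predict the missing prices.
final_model = LinearRegression().fit(known[["points"]], known["price"])
filled = toy.copy()
filled.loc[missing.index, "price"] = final_model.predict(missing[["points"]])

print(filled["price"].isnull().sum())  # number of prices still missing
```

Storing all test predictions gives an error estimate based on every known row, at the cost of departing from the one-value-per-fold bookkeeping used above.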

Now let's evaluate the performance of the model using the mean squared error:

from sklearn.metrics import mean_squared_error
print("Mean Square Error:", mean_squared_error(y_true, y_pred))
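
The mean squared error is the average of the squared differences between the actual and predicted values. A small worked example with made-up numbers (the arrays `y_true_demo` and `y_pred_demo` are hypothetical), computed directly with NumPy, may make the metric easier to interpret:

```python
import numpy as np

# Hypothetical actual and predicted prices.
y_true_demo = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_demo = np.array([12.0, 18.0, 33.0, 35.0])

# Errors are -2, 2, -3, 5; their squares are 4, 4, 9, 25.
errors = y_true_demo - y_pred_demo

# Mean of the squared errors: 42 / 4 = 10.5
mse = np.mean(errors ** 2)

# The square root puts the error back in the units of price.
rmse = np.sqrt(mse)

print("Mean Square Error:", mse)
print("Root Mean Square Error:", rmse)
```

Because the errors are squared, a single large miss (such as one very expensive wine) can dominate the result.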

Not so good. We can improve this by training only on prices bounded by the mean price plus one standard deviation:

df_filter = df[df['price']
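
The line above is cut off, so the exact filter is not known. One plausible reading of the preceding sentence is that only rows with a price at or below the mean plus one standard deviation are kept. The sketch below illustrates that assumption on made-up data; the names `toy`, `upper_bound` and `toy_filter` are hypothetical:

```python
import pandas as pd

# Hypothetical toy data with one extreme price.
toy = pd.DataFrame({
    "points": [84, 85, 86, 87, 88, 90, 91, 99],
    "price": [10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0, 500.0],
})

# Assumed cutoff: mean price plus one standard deviation.
upper_bound = toy["price"].mean() + toy["price"].std()

# Keep rows at or below the cutoff; rows with a missing price
# would also be dropped, since NaN <= x evaluates to False.
toy_filter = toy[toy["price"] <= upper_bound].copy()

print(upper_bound)
print(len(toy), "->", len(toy_filter))
```

Removing the extreme price keeps a handful of very expensive wines from dominating the fit and the squared-error metric.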
