
How to use Python Neural Network to predict Automobile Insurance Expenditure


This article explains how to use a Python neural network to predict automobile insurance expenditure. The method introduced here is simple, fast and practical. Let's take a look at "how to use a Python neural network to predict automobile insurance expenditure".

Developing neural network prediction models for new data sets can be challenging.

One approach is to first examine the dataset and develop ideas for possible models, then explore the learning dynamics of simple models on the dataset, and finally use robust testing tools to develop and adjust the model for the dataset. This process can be used to develop effective neural network models for classification and regression prediction modeling problems.

In this tutorial, you will discover how to develop a multi-layer Perceptron neural network model for Swedish auto insurance regression datasets. After completing this tutorial, you will know:

How to load and summarize the Swedish auto insurance dataset, and how to use the data preparation and model configuration recommended by the results.

How to explore the learning dynamics of the simple MLP model and the data transformation on the data set.

How to develop a reliable estimation of the model performance, adjust the model performance and predict the new data.

Overview of the tutorial

This tutorial is divided into four parts. They are:

Automobile insurance regression data set

The first MLP and learning dynamics

Evaluate and adjust the MLP model

Final model and prediction

Automobile insurance regression data set

The first step is to define and explore the dataset. We will use the Auto Insurance standard regression dataset. The dataset describes Swedish automobile insurance. There is only one input variable, the number of claims, and the target variable is the total payment for the claims in thousands of Swedish kronor. The goal is to predict the total payment given the number of claims.

You can learn more about datasets here:

Automobile Insurance data set (auto-insurance.csv)

Auto Insurance dataset details (auto-insurance.names)

You can see the first few rows of the dataset below.

108,392.5
19,46.2
13,15.7
124,422.2
40,119.4

We can see that the values are numeric and range from tens to hundreds. This suggests that some form of scaling would be appropriate when modeling the data with a neural network.

We can load the dataset as pandas DataFrame directly from URL; for example:

# load the dataset and summarize the shape
from pandas import read_csv
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
# load the dataset
df = read_csv(url, header=None)
# summarize shape
print(df.shape)

Running the example loads the dataset directly from URL and reports the shape of the dataset.

In this case, we can confirm that the dataset has two variables (an input and an output) and that the dataset has 63 rows of data.

For neural networks, this is not a lot of data rows, which indicates that a small network (possibly with regularization) would be appropriate.

This also suggests that using k-fold cross-validation is a good idea, because it provides a more reliable estimate of model performance than a single train/test split, and because a single model can be fit in seconds rather than the hours or days required for the largest datasets.

(63,2)

Next, we can learn more about the dataset by looking at summary statistics and data graphs.

# show summary statistics and plots of the dataset
from pandas import read_csv
from matplotlib import pyplot
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
# load the dataset
df = read_csv(url, header=None)
# show summary statistics
print(df.describe())
# plot histograms
df.hist()
pyplot.show()

Running the example loads the data and then outputs summary statistics for each variable.

We can see that the mean of each variable is in the tens, with values ranging from 0 to the hundreds. This confirms that scaling the data is probably a good idea.

                0           1
count   63.000000   63.000000
mean    22.904762   98.187302
std     23.351946   87.327553
min      0.000000    0.000000
25%      7.500000   38.850000
50%     14.000000   73.400000
75%     29.000000  140.000000
max    124.000000  422.200000

Then create a histogram for each variable.

We can see that each variable has a similar distribution. It looks like a skewed Gaussian distribution or an exponential distribution.

We can use power transformations on each variable to reduce the skewness of the probability distribution, which may improve the performance of the model.
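To get a quick feel for what such a transform would do, the short sketch below (my own addition, not part of the original workflow) applies scikit-learn's PowerTransformer to both columns and re-plots the histograms; the transformed values should look much closer to a standard Gaussian.

# sketch: apply a power transform to both columns and re-plot the histograms
from pandas import read_csv, DataFrame
from sklearn.preprocessing import PowerTransformer
from matplotlib import pyplot
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(url, header=None)
# fit and apply a Yeo-Johnson power transform (also standardizes to zero mean, unit variance)
pt = PowerTransformer()
transformed = DataFrame(pt.fit_transform(df.values))
# plot histograms of the transformed variables
transformed.hist()
pyplot.show()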

Now that we are familiar with the dataset, let's explore how to develop a neural network model.

The first MLP and learning dynamics

We will use TensorFlow to develop a multilayer perceptron (MLP) model for the dataset. We do not know which model architecture or learning hyperparameters will be good or best for this dataset, so we must experiment and discover what works well. Given that the dataset is small, a small batch size is probably a good idea, such as 8 or 16 rows. When getting started, using the Adam version of stochastic gradient descent is a good idea because it automatically adapts the learning rate and works well on most datasets. Before we evaluate models seriously, it is best to review the learning dynamics and tune the model architecture and learning configuration until we have stable learning dynamics, and then look at getting the most out of the model.

We can do this with a simple train/test split of the data and a review of the learning curves. This will help us see whether we are overfitting or underfitting; then we can adjust the configuration accordingly. First, we can split the dataset into input and output variables, and then into 67/33 train and test sets.

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

Next, we can define a minimal MLP model. In this case, we will use one hidden layer with 10 nodes and one output layer (chosen arbitrarily). We will use the ReLU activation function and "he_normal" weight initialization in the hidden layer, as together they are a good practice.

The output layer of the model uses a linear activation (no activation), and we will minimize the mean squared error (MSE) loss.

# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')

We will fit the model for 100 training epochs (chosen arbitrarily) with a batch size of 8, because it is a very small dataset. We are fitting the model on the raw data, which we think may not be a good idea, but it is an important starting point.

# fit the model
history = model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0, validation_data=(X_test, y_test))

At the end of training, we will evaluate the performance of the model on the test dataset and report performance as the mean absolute error (MAE), which I prefer over MSE or RMSE.

# predict test set
yhat = model.predict(X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)

Finally, we will draw the learning curve of MSE loss on the training and test sets during training.

# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

To sum up, a complete example of evaluating our first MLP on the auto insurance dataset is listed below.

# fit a simple mlp model and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0, validation_data=(X_test, y_test))
# predict test set
yhat = model.predict(X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example first fits the model on the training dataset, and then reports the MAE on the test dataset.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see that the model achieves a MAE of about 33.2, which is a good baseline in performance that we may be able to improve upon.

MAE: 33.233

Line plots of the MSE on the train and test sets are then created.

We can see that the model fits well and converges nicely. The configuration of the model is a good starting point.

The learning dynamics are good so far, and the MAE is a rough estimate that should not be relied upon.

We may be able to slightly increase the capacity of the model and expect similar learning dynamics. For example, we can add a second hidden layer with eight nodes (an arbitrary choice) and double the number of training epochs to 200.

# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0, validation_data=(X_test, y_test))

A complete example is as follows:

# fit a deeper mlp model and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0, validation_data=(X_test, y_test))
# predict test set
yhat = model.predict(X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example first fits the model on the training dataset, and then reports the MAE on the test dataset.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see a slight improvement in MAE, about 27.9, although the high variance of the training / test split means that the assessment is unreliable.

MAE: 27.939

The learning curves of MSE on the train and test sets are then plotted. We can see that, as expected, the model achieves a good fit and converges within a reasonable number of iterations.

Finally, we can try transforming the data and see how it affects the learning dynamics.

In this case, we will use a power transform to reduce the skew of the data distribution. This will also automatically standardize the variables so that they have a mean of zero and a standard deviation of one, which is good practice when modeling with neural networks.

First, we must make sure that the target variable is a two-dimensional array.

# ensure that the target variable is a 2d array
y_train, y_test = y_train.reshape((len(y_train), 1)), y_test.reshape((len(y_test), 1))

Next, we can apply PowerTransformer to the input and target variables.

This can be achieved by fitting the transform on the training data only, then applying it to the train and test sets.

This process is applied separately to the input and output variables to avoid data leakage.

# power transform input data
pt1 = PowerTransformer()
pt1.fit(X_train)
X_train = pt1.transform(X_train)
X_test = pt1.transform(X_test)
# power transform output data
pt2 = PowerTransformer()
pt2.fit(y_train)
y_train = pt2.transform(y_train)
y_test = pt2.transform(y_test)

The data is then used to fit the model.

Afterwards, the transform can be inverted on the predictions made by the model and on the expected target values from the test set, so that we can calculate the MAE at the correct scale, as before.

# inverse transforms on target variable
y_test = pt2.inverse_transform(y_test)
yhat = pt2.inverse_transform(yhat)

Taken together, the following is a complete example of fitting and evaluating MLP using the transformed data and creating a learning curve for the model.

# fit a mlp model with data transforms and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PowerTransformer
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# ensure that the target variable is a 2d array
y_train, y_test = y_train.reshape((len(y_train), 1)), y_test.reshape((len(y_test), 1))
# power transform input data
pt1 = PowerTransformer()
pt1.fit(X_train)
X_train = pt1.transform(X_train)
X_test = pt1.transform(X_test)
# power transform output data
pt2 = PowerTransformer()
pt2.fit(y_train)
y_train = pt2.transform(y_train)
y_test = pt2.transform(y_test)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
history = model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0, validation_data=(X_test, y_test))
# predict test set
yhat = model.predict(X_test)
# inverse transforms on target variable
y_test = pt2.inverse_transform(y_test)
yhat = pt2.inverse_transform(yhat)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example first fits the model on the training dataset, and then reports the MAE on the test dataset.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, the model achieves a reasonable MAE score, although worse than the performance reported previously. We will ignore model performance for now.

MAE: 34.320

A plot of the learning curves is created, showing that the model achieves a reasonable fit and has enough time to converge.

Now that we have some understanding of the learning dynamics of a simple MLP model with or without data transformation, we can take a look at evaluating the performance of the model and adjusting the configuration of the model.

Evaluate and adjust the MLP model

The k-fold cross-validation procedure can provide a more reliable estimate of MLP performance, although it can be very slow because k models must be fit and evaluated. This is not a problem when the dataset is small, as with the auto insurance dataset. We can use the KFold class to create the splits and enumerate each fold manually, fit the model, evaluate it, and then report the mean of the evaluation scores at the end of the procedure.

# prepare cross validation
kfold = KFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # fit and evaluate the model...
    ...
# summarize all scores
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

We can use this framework to develop reliable estimates of the performance of MLP models through a series of different data preparation, model architecture, and learning configurations.
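As an illustration of that idea, here is a rough sketch (my own restructuring, with arbitrary function and parameter names, not part of the original tutorial) of wrapping the cross-validation loop into a reusable helper so that different configurations can be compared with the same harness:

# sketch: a reusable cross-validation harness for comparing model configurations
from numpy import mean, std
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def evaluate_model(X, y, n_nodes=10, epochs=100, batch_size=8, n_folds=10):
    # run k-fold cross-validation for one configuration and return the per-fold MAE scores
    scores = list()
    kfold = KFold(n_folds)
    for train_ix, test_ix in kfold.split(X, y):
        X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
        model = Sequential()
        model.add(Dense(n_nodes, activation='relu', kernel_initializer='he_normal', input_shape=(X.shape[1],)))
        model.add(Dense(1))
        model.compile(optimizer='adam', loss='mse')
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
        yhat = model.predict(X_test)
        scores.append(mean_absolute_error(y_test, yhat))
    return scores

# example usage: scores = evaluate_model(X, y); print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))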

Importantly, we first developed an understanding of the learning dynamics of the model on the dataset in the previous section before using k-fold cross-validation to estimate performance. If we started tuning the model directly, we might get good results, but if not, we might have no idea why, e.g. whether the model was overfitting or underfitting.

If we make major changes to the model later, it is a good idea to go back and confirm that the model is still converging properly.

A complete example of this framework for evaluating the basic MLP model in the previous section is listed below.

# k-fold cross-validation of base model for the auto insurance regression dataset
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# prepare cross validation
kfold = KFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # split data
    X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(1))
    # compile the model
    model.compile(optimizer='adam', loss='mse')
    # fit the model
    model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0)
    # predict test set
    yhat = model.predict(X_test)
    # evaluate predictions
    score = mean_absolute_error(y_test, yhat)
    print('> %.3f' % score)
    scores.append(score)
# summarize all scores
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example reports model performance at each iteration of the evaluation process and reports the average and standard deviation of MAE at the end of the run.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see that the MAE of the MLP model is about 38.913.

We will use this result as a benchmark to see if better performance can be achieved.

> 27.314
> 69.577
> 20.891
> 14.810
> 13.412
> 69.540
> 25.612
> 49.508
> 35.769
> 62.696
Mean MAE: 38.913 (21.056)

First, let's try to evaluate a deeper model on the original dataset to see if it performs better than the benchmark model.

# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0)

A complete example is as follows:

# k-fold cross-validation of deeper model for the auto insurance regression dataset
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# prepare cross validation
kfold = KFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # split data
    X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1))
    # compile the model
    model.compile(optimizer='adam', loss='mse')
    # fit the model
    model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0)
    # predict test set
    yhat = model.predict(X_test)
    # evaluate predictions
    score = mean_absolute_error(y_test, yhat)
    print('> %.3f' % score)
    scores.append(score)
# summarize all scores
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

The mean and standard deviation of the MAE are reported at the end of the run.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see that the MAE obtained by the MLP model is about 35.384, which is slightly better than the baseline model with a MAE of about 38.913.

Mean MAE: 35.384 (14.951)

Next, let's try the same model with a power transform applied to the input and target variables, as in the previous section.

A complete example is listed below.

# k-fold cross-validation of deeper model with data transforms
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PowerTransformer
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# prepare cross validation
kfold = KFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # split data
    X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
    # ensure target is a 2d array
    y_train, y_test = y_train.reshape((len(y_train), 1)), y_test.reshape((len(y_test), 1))
    # prepare input data
    pt1 = PowerTransformer()
    pt1.fit(X_train)
    X_train = pt1.transform(X_train)
    X_test = pt1.transform(X_test)
    # prepare target
    pt2 = PowerTransformer()
    pt2.fit(y_train)
    y_train = pt2.transform(y_train)
    y_test = pt2.transform(y_test)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1))
    # compile the model
    model.compile(optimizer='adam', loss='mse')
    # fit the model
    model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0)
    # predict test set
    yhat = model.predict(X_test)
    # inverse transforms
    y_test = pt2.inverse_transform(y_test)
    yhat = pt2.inverse_transform(yhat)
    # evaluate predictions
    score = mean_absolute_error(y_test, yhat)
    print('> %.3f' % score)
    scores.append(score)
# summarize all scores
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

The mean and standard deviation of the MAE are reported at the end of the run.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see that the MAE obtained by the MLP model is about 37.371, which is better than the benchmark model, but not better than the deeper benchmark model.

Perhaps this transform is not as helpful as we initially thought.

Mean MAE: 37.371 (29.326)

Another transformation is to normalize input variables and target variables.

This means that the values of each variable are scaled to the range [0,1]. We can use the MinMaxScaler to achieve this; for example:

# prepare input data
pt1 = MinMaxScaler()
pt1.fit(X_train)
X_train = pt1.transform(X_train)
X_test = pt1.transform(X_test)
# prepare target
pt2 = MinMaxScaler()
pt2.fit(y_train)
y_train = pt2.transform(y_train)
y_test = pt2.transform(y_test)

Taken together, a complete example of evaluating a deeper MLP using data normalization is listed below.

# k-fold cross-validation of deeper model with normalization transforms
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# prepare cross validation
kfold = KFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # split data
    X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
    # ensure target is a 2d array
    y_train, y_test = y_train.reshape((len(y_train), 1)), y_test.reshape((len(y_test), 1))
    # prepare input data
    pt1 = MinMaxScaler()
    pt1.fit(X_train)
    X_train = pt1.transform(X_train)
    X_test = pt1.transform(X_test)
    # prepare target
    pt2 = MinMaxScaler()
    pt2.fit(y_train)
    y_train = pt2.transform(y_train)
    y_test = pt2.transform(y_test)
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(1))
    # compile the model
    model.compile(optimizer='adam', loss='mse')
    # fit the model
    model.fit(X_train, y_train, epochs=200, batch_size=8, verbose=0)
    # predict test set
    yhat = model.predict(X_test)
    # inverse transforms
    y_test = pt2.inverse_transform(y_test)
    yhat = pt2.inverse_transform(yhat)
    # evaluate predictions
    score = mean_absolute_error(y_test, yhat)
    print('> %.3f' % score)
    scores.append(score)
# summarize all scores
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

The mean and standard deviation of the MAE are reported at the end of the run.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see that the MAE obtained by the MLP model is about 30.388, which is better than any other configuration we have tried so far.

Mean MAE: 30.388 (14.258)

We could continue to test alternative configurations of the model architecture (more or fewer nodes or layers), learning hyperparameters (e.g. a different batch size), and data transforms, as in the sketch below.
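For example, a minimal sketch of one such variation (arbitrary choices of my own: a wider first hidden layer of 20 nodes and a smaller batch size of 4), dropped into the same cross-validation loop used above:

# sketch: one alternative configuration to try inside the k-fold loop (values are arbitrary)
model = Sequential()
model.add(Dense(20, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=200, batch_size=4, verbose=0)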

I leave this as an exercise; let me know what you discover. Can you get better results?

Post your results in the comments below, and I'd love to see what you get.

Next, let's look at how to fit the final model and use it for prediction.

Final model and prediction

After selecting a model configuration, we can train a final model on all available data and use it to make predictions on new data. In this case, we will use the deeper model with data normalization as the final model. This means that if we want to save the model to file, we must save the model itself (for making predictions), the transform for the input data (for new input data), and the transform for the target variable (for inverting new predictions). We can prepare the data and fit the model as before, although on the entire dataset instead of a training subset of the dataset.

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure target is a 2d array
y = y.reshape((len(y), 1))
# prepare input data
pt1 = MinMaxScaler()
pt1.fit(X)
X = pt1.transform(X)
# prepare target
pt2 = MinMaxScaler()
pt2.fit(y)
y = pt2.transform(y)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
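If we also wanted to persist the fitted model and both transforms to file for later reuse, one possible sketch (my own addition, after fitting the model; the filenames are arbitrary) is to save the Keras model with model.save() and the scalers with pickle:

# sketch: save the fitted model and both MinMaxScaler transforms for later use (filenames are arbitrary)
from pickle import dump
model.save('model.h5')
dump(pt1, open('input_scaler.pkl', 'wb'))
dump(pt2, open('target_scaler.pkl', 'wb'))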

We can then use this model to make predictions on new data. First, we can define a row of new data, which is just one value for the single input variable of the dataset.

# define a row of new data
row = [13]

We can then convert this new data to prepare for use as input to the model.

# transform the input data
X_new = pt1.transform([row])

Then we can make predictions.

# make prediction
yhat = model.predict(X_new)

We can then invert the transform on the prediction so that we can use or interpret the result at the correct scale.

# invert transform on prediction
yhat = pt2.inverse_transform(yhat)

In this case, we will only report the forecast.

# report prediction
print('f(%s) = %.3f' % (row, yhat[0]))

To sum up, the following is a complete example of fitting the final model for a car insurance dataset and using it to predict new data.

# fit a final model and make predictions on new data
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure target is a 2d array
y = y.reshape((len(y), 1))
# prepare input data
pt1 = MinMaxScaler()
pt1.fit(X)
X = pt1.transform(X)
# prepare target
pt2 = MinMaxScaler()
pt2.fit(y)
y = pt2.transform(y)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
# compile the model
model.compile(optimizer='adam', loss='mse')
# fit the model
model.fit(X, y, epochs=200, batch_size=8, verbose=0)
# define a row of new data
row = [13]
# transform the input data
X_new = pt1.transform([row])
# make prediction
yhat = model.predict(X_new)
# invert transform on prediction
yhat = pt2.inverse_transform(yhat)
# report prediction
print('f(%s) = %.3f' % (row, yhat[0]))

Running the example fits the model on the entire dataset and makes a prediction for a single row of new data.

Note: your results may be different due to the randomness of the algorithm or evaluation program, or due to differences in numerical accuracy. Consider running the example several times and compare the average results.

In this case, we can see that an input of 13 results in an output of about 62.6 (thousand Swedish kronor).

f([13]) = 62.595

At this point, I believe you have a deeper understanding of "how to use Python neural network to predict automobile insurance expenditure". You might as well try it out in practice. For more related content, follow us and continue to learn!
