This article explains how to use the scikit-learn machine learning library to make predictions. It is quite practical, so I am sharing it with you; I hope you get something out of it after reading.
Scikit-learn is a Python-based machine learning library: you select an appropriate model from it, train the model on a dataset, and then use it to predict new data.
For beginners, there is a common puzzle:
How do you use the models in the scikit-learn library to make predictions?
Without further ado, let's get started!
First, choose a model
Model selection is the first step in machine learning.
You can use k-fold cross-validation or a training/test split to evaluate a model on the dataset; both are ways of estimating how well the trained model will predict new data.
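Here is a minimal sketch of both approaches, using LogisticRegression as a stand-in for whichever model you choose and the same make_blobs generator used throughout this article:
# Minimal sketch: hold-out split vs. k-fold cross-validation
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)

# Approach 1: hold out 30% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression()
model.fit(X_train, y_train)
print('hold-out accuracy:', model.score(X_test, y_test))

# Approach 2: 5-fold cross-validation over the whole dataset
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print('5-fold CV mean accuracy:', scores.mean())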
It is also necessary to determine whether the problem is a classification problem or a regression problem. A classification problem predicts categories or labels, generally two classes, i.e. (0, 1), for example whether or not it will rain. A regression problem predicts a continuous value, such as the price of a stock.
Second, how to use classification models
In a classification problem, the model learns the mapping between input features and output labels, then predicts the labels of new inputs. Take spam detection: the inputs are features of an email such as its text, time, and title, and the output is one of two labels, spam or not spam. By training on a dataset, the model learns the relationship between features and labels and can then make predictions.
Here is a simple LogisticRegression (logistic regression) code example for a binary classification problem.
Although we use the LogisticRegression classification model here, the other classification models in scikit-learn work the same way (see the sketch after the code below).
# Import the LogisticRegression class
from sklearn.linear_model import LogisticRegression
# Import the dataset generator (the old sklearn.datasets.samples_generator path was removed in newer scikit-learn)
from sklearn.datasets import make_blobs

# Generate 2-D data with 2 categories
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)

# Train the model
model = LogisticRegression()
model.fit(X, y)
Note: make_blobs is a clustered data generator
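As a quick illustration of that interchangeability (RandomForestClassifier is just one possible choice, not part of the original example), another classifier can be dropped in without changing the surrounding code:
# Swap in a different classifier: the fit/predict interface is the same
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=1)
model.fit(X, y)  # X, y from the make_blobs call above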
Two kinds of classification prediction are introduced in particular: category prediction and probability prediction.
1. Category prediction
Category prediction: given a model trained on a dataset, predict the category of new data instances with scikit-learn's predict() function.
For example, the Xnew array holds one or more data instances; passing it to the predict() function predicts the category of each instance.
Xnew = [[...], [...]]
ynew = model.predict(Xnew)
The code:
# Category prediction example
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs

# Generate a dataset with 100 instances (rows) and 2 target categories: (0, 1)
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)

# Fit the model
model = LogisticRegression()
model.fit(X, y)

# Generate a new dataset to predict; 3 instances here, but one or more both work
Xnew, ynew_true = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)

# Predict
ynew = model.predict(Xnew)

# Show the predicted categories
print('Predicted categories:')
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))

# Show the true categories from the generator
print('True categories:')
for i in range(len(Xnew)):
    print("X=%s, Real=%s" % (Xnew[i], ynew_true[i]))
Output result:
Predicted categories:
X=[-0.79415228  2.10495117], Predicted=0
X=[-8.25290074 -4.71455545], Predicted=1
X=[-2.18773166  3.33352125], Predicted=0
True categories:
X=[-0.79415228  2.10495117], Real=0
X=[-8.25290074 -4.71455545], Real=1
X=[-2.18773166  3.33352125], Real=0
As you can see, the predicted values match the true values, i.e. the accuracy on these three instances is 100%.
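To check this programmatically rather than by eye, you could compare the two arrays with scikit-learn's accuracy_score (a small optional addition, reusing ynew and ynew_true from the example above):
# Quantify how many predictions match the true labels
from sklearn.metrics import accuracy_score

print('accuracy:', accuracy_score(ynew_true, ynew))  # prints 1.0 when all match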
Tip: handling string category labels
Sometimes the categories in a dataset are strings, such as (yes, no) or (hot, cold), but the model does not accept string inputs and outputs; the string categories must be converted to integers, e.g. (1, 0) corresponding to (yes, no).
Scikit-learn provides the LabelEncoder class to convert strings to integers.
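A minimal sketch of how LabelEncoder works (the yes/no labels here are made up for illustration):
# Encode string labels as integers, then map predictions back to strings
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
y_int = encoder.fit_transform(['yes', 'no', 'yes'])  # array([1, 0, 1]); classes are sorted, so 'no'->0, 'yes'->1
y_str = encoder.inverse_transform(y_int)             # back to ['yes', 'no', 'yes']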
2. Probability prediction
Another kind of classification prediction estimates the probability that a data instance belongs to each category. With two categories (0, 1), the model predicts the probability that the instance is a 0 and the probability that it is a 1.
For example, the Xnew array holds one or more data instances; passing it to the predict_proba() function predicts the per-category probabilities for each instance.
Xnew = [[...], [...]]
ynew = model.predict_proba(Xnew)
Probability prediction only applies to models that can estimate probabilities; most (but not all) scikit-learn models can.
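If you are unsure whether a given model supports it, a defensive check (a sketch, not from the original) is:
# predict_proba exists only on estimators that can output probabilities
if hasattr(model, 'predict_proba'):
    probabilities = model.predict_proba(Xnew)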
The following example uses a trained model to predict the class probabilities of each instance in the Xnew array.
Enter the code:
# probabilistic prediction cases
From sklearn.linear_model import LogisticRegression
From sklearn.datasets.samples_generator import make_blobs
# generate a dataset with 100 real columns, that is, 100 rows, and 2 target categories: (0prime1)
X, y = make_blobs (n_samples=100, centers=2, n_features=2, random_state=1)
# training model
Model = LogisticRegression ()
Model.fit (X, y)
# generate a new prediction set with 3 instances or 3 rows
Xnew, _ = make_blobs (n_samples=3, centers=2, n_features=2, random_state=1)
# start forecasting
Ynew = model.predict_proba (Xnew)
# show the predicted category probability, generating a probability of 0 and a probability of 1, respectively
Print ('predicted class probability:')
For i in range (len (Xnew)):
Print ("X% s, Predicted=%s"% (Xnew [I], ynewn [I]))
Print ('real category:')
For i in range (len (Xnew)):
Print ("X% s, Predicted=%s"% (Xnew [I], _ [I]))
Output result:
Predicted class probabilities:
X=[-0.79415228  2.10495117], Predicted=[0.94556472 0.05443528]
X=[-8.25290074 -4.71455545], Predicted=[3.60980873e-04 9.99639019e-01]
X=[-2.18773166  3.33352125], Predicted=[0.98437415 0.01562585]
True categories:
X=[-0.79415228  2.10495117], Real=0
X=[-8.25290074 -4.71455545], Real=1
X=[-2.18773166  3.33352125], Real=0
The probability output can be read as follows: the model outputs one probability per category, so there are as many probability values as there are categories.
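If you need hard class labels rather than probabilities, one common approach (an optional sketch, reusing ynew from the example above) is to take the most likely category per row:
# Convert per-category probabilities to class labels
import numpy as np

labels = np.argmax(ynew, axis=1)
print(labels)  # [0 1 0], matching the predict() results earlier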
Third, how to use regression models
Regression prediction, like classification prediction, is supervised learning: by training on given examples (the training set), the model learns the mapping between input features and continuous output values such as 0.1, 0.4, 0.8.
The code below uses the most common linear regression model, LinearRegression, but any other scikit-learn regression model can be used the same way.
The code:
# Linear regression prediction example
# Import the relevant methods
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate a random regression training dataset with 100 instances (rows)
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)

# Fit the model
model = LinearRegression()
model.fit(X, y)

# Generate a new prediction set with 3 instances (rows)
Xnew, ynew_true = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)

# Predict
ynew = model.predict(Xnew)

# Show the predicted values
print('Predicted values:')
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))

# Show the true values
print('True values:')
for i in range(len(Xnew)):
    print("X=%s, Real=%s" % (Xnew[i], ynew_true[i]))
Note: the make_regression function is a random regression dataset generator
Output result:
Predicted values:
X=[-1.07296862 -0.52817175], Predicted=-80.24979831685631
X=[-0.61175641  1.62434536], Predicted=120.64928064345101
X=[-2.3015387   0.86540763], Predicted=0.5518357031232064
True values:
X=[-1.07296862 -0.52817175], Real=-95.68750948023445
X=[-0.61175641  1.62434536], Real=26.204828091429512
X=[-2.3015387   0.86540763], Real=-121.28229571474058
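To quantify the gap between predictions and true values instead of inspecting it by eye, a small optional addition (reusing ynew and ynew_true from the example above) is to compute standard regression metrics:
# Measure regression error on the three new instances
from sklearn.metrics import mean_squared_error, r2_score

print('MSE:', mean_squared_error(ynew_true, ynew))
print('R^2:', r2_score(ynew_true, ynew))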
This article used the classification and regression models in the scikit-learn library to make predictions and explained the difference between the two kinds of prediction. You can also explore other related functions and reimplement the cases in this article yourself.
That is how to use the scikit-learn machine learning library to make predictions. Some of these knowledge points may come up in your daily work; I hope you have learned something more from this article.