This article mainly shows how to implement support vector machine (SVM) data classification and regression prediction in Python. The content is easy to understand and clearly organized; I hope it helps resolve your doubts as you follow along and study the topic.
Support vector machines are most often used for data classification, but they can also be used for regression prediction.
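The rest of the article focuses on classification, so here is a minimal sketch of the regression side using scikit-learn's SVR class; the toy data and parameter values are illustrative assumptions, not taken from the original article:

import numpy as np
from sklearn.svm import SVR

# illustrative data: a noisy sine curve (assumed for this sketch)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# RBF-kernel support vector regression
reg = SVR(kernel='rbf', C=10, gamma=0.5)
reg.fit(X, y)

# predict on new inputs
X_new = np.linspace(0, 5, 100).reshape(-1, 1)
y_pred = reg.predict(X_new)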
1. Question?
We often encounter this problem: we are given data belonging to two classes (as in sub-figure 1) and need a linear classifier to separate them. Many lines can do so (as in sub-figure 2), which raises a question: of two such classifiers, which one is better? To judge that, we need a criterion: a good classifier should not only separate the existing data set well, but also classify new, unseen data correctly. Suppose a new data point actually belongs to the red class (the green triangle in sub-figure 3); the black line misclassifies it, while the blue line does not. So how do we judge the robustness of the two lines? Here we introduce an important concept, the maximum margin, which describes the distance between the current classifier and the nearest points of the data set (the shaded region in sub-figure 4). The margin of the blue line is larger than that of the black line, so we choose the blue line as our classifier (sub-figure 5). Is this classifier already optimal, or is there a better one with an even larger margin? There is (sub-figure 6), and we introduce the SVM to find that optimal classifier.
2. Answer! -- SVM

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# generate two-class sample data
X, y = make_blobs(n_samples=60, centers=2, random_state=0, cluster_std=0.4)
x_fit = np.linspace(0, 3)

# SVM with a linear kernel
clf = SVC(kernel='linear')
clf.fit(X, y)

# best separating line
w = clf.coef_[0]
a = -w[0] / w[1]
y_p = a * x_fit - clf.intercept_[0] / w[1]
# lower boundary of the maximum margin
b_down = clf.support_vectors_[0]
y_down = a * x_fit + b_down[1] - a * b_down[0]
# upper boundary of the maximum margin
b_up = clf.support_vectors_[-1]
y_up = a * x_fit + b_up[1] - a * b_up[0]

# draw the scatter plot
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=plt.cm.Paired)
# draw the separating line
plt.plot(x_fit, y_p, '-c')
# draw the margin band
plt.fill_between(x_fit, y_down, y_up, edgecolor='none', color='#AAAAAA', alpha=0.4)
# draw the support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], edgecolor='b', s=80, facecolors='none')
plt.show()
Running result
The circled points are the ones closest to the current classifier; they are called support vectors. The support vector machine gives us a principle for choosing among the many possible classifiers, and thereby ensures better generalization to unseen data.
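As a quick sanity check on these ideas, the fitted model above exposes both the support vectors and, for a linear kernel, the weight vector, so the margin width can be computed directly. A minimal sketch, assuming the clf object fitted in the code above and using the standard 2/||w|| formula for a linear SVM:

# how many support vectors each class contributes, and where they are
print(clf.n_support_)
print(clf.support_vectors_)

# for a linear SVM, the margin width is 2 / ||w||
w = clf.coef_[0]
print(2 / np.linalg.norm(w))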
3. Soft margin
In many cases, the data we get is not as cleanly separable as above (as shown in the figure below), and it is no longer possible to find a hard maximum-margin separator. This is where the soft margin comes in: unlike the hard margin, we allow some data points to fall inside the margin band. If nothing constrains this, many classifiers would satisfy the soft-margin condition, so we penalize misclassified data instead. The SVC function has a penalty parameter C for this purpose: the smaller C is, the greater the tolerance.
-- Here C is set to 1:

# soft margin
X, y = make_blobs(n_samples=60, centers=2, random_state=0, cluster_std=0.9)
x_fit = np.linspace(-2, 4)

# penalty parameter: C=1
clf = SVC(C=1, kernel='linear')
clf.fit(X, y)

# best separating line
w = clf.coef_[0]
a = -w[0] / w[1]
y_great = a * x_fit - clf.intercept_[0] / w[1]
# lower boundary of the maximum margin
b_down = clf.support_vectors_[0]
y_down = a * x_fit + b_down[1] - a * b_down[0]
# upper boundary of the maximum margin
b_up = clf.support_vectors_[-1]
y_up = a * x_fit + b_up[1] - a * b_up[0]

# draw the scatter plot
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=plt.cm.Paired)
# draw the separating line
plt.plot(x_fit, y_great, '-c')
# draw the margin band
plt.fill_between(x_fit, y_down, y_up, edgecolor='none', color='#AAAAAA', alpha=0.4)
# draw the support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], edgecolor='b', s=80, facecolors='none')
plt.show()
Running result
-- When C is set to 0.2, the SVM is more tolerant and therefore accepts more misclassified samples, as shown in the result below.
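A minimal sketch of that change, reusing X, y and the plotting steps from the block above (only the penalty parameter differs):

# a smaller penalty parameter tolerates more points inside the margin
clf = SVC(C=0.2, kernel='linear')
clf.fit(X, y)
# then recompute y_great, y_down, y_up and redraw exactly as in the block above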
4. Hyperplane
Sometimes the data we get looks like the figure below, and no straight line in the original two-dimensional (low-dimensional) space can separate it. In that case we can map the data into three-dimensional (high-dimensional) space, where a hyperplane can separate it. The purpose of the mapping is to exploit the SVM's ability to find a separating hyperplane in the high-dimensional space.
# hyperplane
from sklearn.datasets import make_circles

# draw the scatter plot
X, y = make_circles(100, factor=.1, noise=.1, random_state=2019)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=plt.cm.Paired)

# map the data to a third dimension
r = np.exp(-(X[:, 0] ** 2 + X[:, 1] ** 2))

ax = plt.subplot(projection='3d')
ax.scatter3D(X[:, 0], X[:, 1], r, c=y, s=50, cmap=plt.cm.Paired)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

# a plane that separates the two classes in 3D
x_1, y_1 = np.meshgrid(np.linspace(-1, 1), np.linspace(-1, 1))
z = 0.01 * x_1 + 0.01 * y_1 + 0.5
ax.plot_surface(x_1, y_1, z, alpha=0.3)
plt.show()
Running result
In practice, this kind of classification is achieved by using the Gaussian (RBF) kernel function.
# use the Gaussian kernel function to achieve this classification: kernel='rbf'
# draw the scatter plot
X, y = make_circles(100, factor=.1, noise=.1, random_state=2019)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=plt.cm.Paired)

clf = SVC(kernel='rbf')
clf.fit(X, y)

# evaluate the decision function on a grid
ax = plt.gca()
x = np.linspace(-1, 1)
y = np.linspace(-1, 1)
x_1, y_1 = np.meshgrid(x, y)
P = np.zeros_like(x_1)
for i, xi in enumerate(x):
    for j, yj in enumerate(y):
        P[j, i] = clf.decision_function([[xi, yj]])[0]

# draw the decision boundary and the margins
ax.contour(x_1, y_1, P, colors='k', levels=[-1, 0, 0.9], alpha=0.5, linestyles=['--', '-', '--'])
# draw the support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], edgecolor='b', s=80, facecolors='none')
plt.show()
Running result
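To use the trained RBF classifier for prediction on new samples, predict works the same way as with the linear kernel; a brief sketch, with the query points chosen arbitrarily for illustration:

# points near the centre should land in the inner class,
# points far from the centre in the outer class
new_points = np.array([[0.0, 0.0], [0.9, 0.9]])
print(clf.predict(new_points))            # predicted class labels
print(clf.decision_function(new_points))  # signed distances to the decision boundary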
These are all the contents of the article "how to achieve support vector machine data classification and regression prediction in Python". Thank you for reading! I hope the content shared here has helped you understand the topic.