How to understand SVM


This article introduces the basics of "how to understand SVM" through concrete examples. The steps are simple and practical, and I hope the article helps you work through the topic.

SVM classifying two points. Given two known sample points, an SVM model places the decision boundary at the line g that is perpendicular to the segment connecting the two points and passes through its midpoint.

SVM regards this line g as the best boundary for separating the two sample points.
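To make this concrete, here is a minimal sketch (my own addition, not from the article) that fits a linear SVM on two points using scikit-learn, an assumed library choice, and checks that the learned boundary is the perpendicular bisector of the segment between them:

```python
# Sketch (not from the article): fit a linear SVM on two points and check that
# the boundary g(x) = w·x + b is the perpendicular bisector of the two points.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [2.0, 2.0]])    # the two known sample points
y = np.array([-1, 1])

clf = SVC(kernel="linear", C=1e6)          # very large C: effectively a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
midpoint = X.mean(axis=0)
d = X[1] - X[0]                            # direction from one point to the other

print("g(midpoint) =", w @ midpoint + b)   # ~0: the boundary passes through the midpoint
# The normal vector w is parallel to d, so the boundary line is perpendicular to d.
print("w parallel to d?", abs(w[0] * d[1] - w[1] * d[0]) < 1e-3)
```

With C chosen very large the fit behaves like a hard-margin SVM, so the result matches the geometric picture described above.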

SVM classifies multiple points

If we add more sample points, keeping them consistent with the distribution above, does the optimal decision boundary change? No. Even with many sample points, SVM relies only on the two points that play a supporting role; they are the support vectors, which is where the name comes from, so the decision boundary does not change. All of this is easy to see by eye, but how does a computer do it?

This also tells us how SVM builds a decision boundary: it only cares about the sample points closest to the boundary, and then chooses the boundary g that is farthest from both of them; that g is taken to be the best decision boundary.
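Written out (standard geometric-margin notation with f(x) = w · x + b; the article states this only in words), the idea is:

\max_{w,\,b}\ \min_i\ \frac{|f(x_i)|}{\|w\|}, \qquad f(x_i) = w \cdot x_i + b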

With the above foundation, the structure of the SVM objective function is almost clear: max(min(...)). SVM then adds a constraint, with the benefit that the objective function becomes more concise:

arg max 1 / ||w||

s.t.  y * f(x) >= 1

Note that this more concise objective function only holds together with the constraint above; the two belong together and neither can be dropped.

Converting the maximum into a minimum. In machine learning, an objective that is to be maximized is routinely converted into an equivalent minimization, and SVM is no exception.

The conversion is simple: the original expression is largest exactly when the denominator ||w|| is smallest, that is:

arg min 0.5 * ||w||^2

s.t.  y * f(x) >= 1

There is no deep reason for the coefficient 0.5 in the objective function; it is there only to make the derivative cleaner.

This is a standard quadratic programming problem, and there are many methods for solving it, such as the Lagrangian method, Lemke's method, interior point methods, active set methods, the ellipsoid method, and so on.

The Lagrangian method is used to solve the SVM objective above; the details, including the KKT conditions, can be found in standard references. The problem is converted into first minimizing the Lagrangian over w and b, then maximizing over the multipliers alpha_i, and finally recovering the parameters w and b. That completes the solution.
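For reference, the Lagrangian for this problem and its dual, written here in standard form (the article only names the steps), are:

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big[ y_i (w \cdot x_i + b) - 1 \big], \qquad \alpha_i \ge 0

Setting the derivatives with respect to w and b to zero gives w = \sum_i \alpha_i y_i x_i and \sum_i \alpha_i y_i = 0; substituting back leaves a problem in \alpha alone, which is maximized:

\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \alpha_i \ge 0,\ \ \sum_i \alpha_i y_i = 0

Once the alpha_i are known, w follows from the sum above and b from any support vector, which is exactly the w and b the article refers to.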

SVM also has to consider soft margins and kernel functions. First, noise. As shown in the figure below, there is a circled noise point in the lower right corner: where should the decision boundary be?

If the decision boundary bends to accommodate that point, as in the figure, you can see that it is not a good decision boundary, because the noise point is a wrongly labeled point and should not be used as a support vector.

Overcoming noise: slack. SVM relaxes the constraint from yi * f(xi) >= 1 to yi * f(xi) >= 1 - ei. The quantity ei is the slack, which gives the soft margin. Why subtract ei rather than add it? Because subtracting lets more sample points satisfy the constraint: in the first figure, a positive support-vector point may fail yi * f(xi) >= 1 but still satisfy yi * f(xi) >= 1 - ei, so even when noise points appear we may still recover the original classification boundary.

In SVM, ei is called the slack (relaxation factor), and the control factor C determines how much influence it has. When C is very large, the ei contribute very little, that is, the relaxation is small; when C is very small, the ei contribute a lot and the relaxation effect is stronger.
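Putting the slack and the control factor C together, the soft-margin objective in its standard form (consistent with the relaxed constraint above, though the article does not write it out) is:

\min_{w,\,b,\,e}\ \frac{1}{2}\|w\|^2 + C \sum_i e_i \quad \text{s.t.} \quad y_i f(x_i) \ge 1 - e_i,\ \ e_i \ge 0

A large C makes each e_i expensive, which is why the relaxation stays small; a small C makes slack cheap and the relaxation stronger.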

The parameter solution and the soft margin introduced above are not the most ingenious parts of SVM. The real beauty of SVM is that it turns a problem that is not linearly separable in a low-dimensional space into one that is linearly separable in a high-dimensional space. For example, a data set with three features may not be linearly separable in that space, but after mapping it to nine features it may become linearly separable. As shown in the figure below, data that is not linearly separable in two dimensions can become separable by a plane once it is mapped into three dimensions.

How do you do this? Kernel functions map the original features from low-dimensional to high-dimensional.

Does mapping the data into a high-dimensional space make the solution dramatically more expensive? No. The kernel computes, directly in the low-dimensional space, the value that the inner product would take in the high-dimensional space, at essentially the cost of a low-dimensional inner product and without ever constructing the mapping explicitly.
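A quick numeric check of this idea (my own illustration; the degree-2 polynomial kernel is an assumed choice, not one named by the article):

```python
# Illustrative sketch: the degree-2 polynomial kernel k(a, b) = (a·b)^2 equals the
# inner product of explicit high-dimensional features, but is computed entirely
# in the original low-dimensional space.
import numpy as np

def phi(x):
    """Explicit map from 2 features to 3: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(a, b):
    """Kernel evaluated in the original 2-D space: no explicit mapping needed."""
    return (a @ b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

print(phi(a) @ phi(b))      # inner product in the mapped 3-D space: 121.0
print(poly_kernel(a, b))    # same value, computed without ever forming phi(a), phi(b)
```

The two printed values agree, which is exactly the point: the high-dimensional inner product is obtained from a low-dimensional computation.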

Let me say a bit more about SVM.

How do the objective function and constraints come about?

It is not hard to see where SVM's objective function and constraints come from, so it is worth writing down and sharing. Let g(x) = wx + b; the distance from a sample point x to the line g(x) = 0 is:

|g(x)| / ||w||

When building the decision boundary, SVM only cares about the sample points closest to the boundary, and then chooses the boundary that is farthest from them. Written as a formula:

max( min( |g(x)| / ||w|| ) )

Simplify it to:

max( 1 / ||w|| )

s.t.  yi * g(xi) >= 1

How did you come up with that?

If we impose |g(x)| >= 1, with the closest points sitting exactly at |g(x)| = 1, then min( |g(x)| / ||w|| ) = 1 / ||w||, and max( min( |g(x)| / ||w|| ) ) simplifies to:

max( 1 / ||w|| )

So how does |g(x)| >= 1 turn into yi * g(xi) >= 1? Note that yi is the label of sample i, which is either 1 or -1. When g(x) >= 0 the sample is a positive example, i.e. yi = 1; when g(x) < 0 it is a negative example, i.e. yi = -1. Therefore |g(x)| = yi * g(x) >= 1.

OK. What remains is to solve the optimization problem with the following objective and constraint:

max( 1 / ||w|| )

s.t.  yi * g(xi) >= 1
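As a sanity check of this constraint, here is a small sketch (my own, using scikit-learn as an assumed tool) showing that after fitting a near-hard-margin linear SVM, the support vectors sit at yi * g(xi) ≈ 1 and every other point satisfies yi * g(xi) > 1:

```python
# Sketch: on separable data with a very large C (approximating a hard margin),
# support vectors satisfy y_i * g(x_i) ≈ 1 and all other points have y_i * g(x_i) > 1,
# matching the constraint derived above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2, 2], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2, -2], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)
margins = y * clf.decision_function(X)        # y_i * g(x_i) for every sample

print("support-vector margins:", margins[clf.support_].round(3))   # each ≈ 1
print("smallest margin overall:", round(float(margins.min()), 3))  # >= 1 up to solver tolerance
```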

This concludes the introduction to "how to understand SVM". Thank you for reading. If you want to learn more about the field, you can follow the industry information channel, where new topics are posted every day.
