2025-02-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
For some time now I have been reading Zhou Zhihua's "Machine Learning". While reading, I sometimes search for articles written by other people; by comparison, what Professor Zhou writes is still quite profound. But when I reached the SVM chapter a few days ago, it felt very obscure. My first impression was that it is quite abstract, especially for someone like me, whose IQ is not very high, when it comes to high-dimensional vectors; my understanding of the model was quite muddled. In particular, the geometric distance (that is, the maximum margin) stayed ambiguous and seemingly incomprehensible. I also read SVM articles written by others, and many do not explain why, in the maximum-margin model d = 1/||w||, the numerator is 1 rather than |f(x)|. After thinking hard, I arrived at an explanation that suits my own understanding.
Suppose we have a dataset of n-dimensional sample points X = {x1, x2, x3, ..., xi, ..., xm}, where each sample xi has a label yi ∈ Y = {+1, -1}, and the labeled dataset is D = {d1, d2, ..., di, ..., dm} = {(x1, y1), (x2, y2), (x3, y3), ..., (xi, yi), ..., (xm, ym)}. The sample distribution is shown in the following figure:
We want to separate these two kinds of data points (vectors) with a "straight line" (a hyperplane), positioned so that the samples on its two sides belong to different classes. Obviously there are many such lines or hyperplanes, so how should the best classifying hyperplane be chosen and defined?
Intuitively, the thick black line in the middle is the best hyperplane, because it keeps the two kinds of sample points farthest away from it; that is, it makes the classification model the most robust, with the greatest tolerance to perturbations of the samples (which can be understood as "noise"), unlike the other hyperplanes. Clearly the "straight line" can be represented by the parameters w and b, where w = (w1, w2, w3, ..., wi, ..., wn).
The hyperplane is:
wx + b = 0
where x = (x1; x2; x3; ...; xn) is a column vector. This can also be written as w^T x + b = 0, in which case w is the column vector; the first notation is used here simply because it is easier to typeset, and the meaning is the same. Clearly w and b are unknown parameters, obtained by analogy with the equation of a line in the plane and generalized to higher dimensions. If a sample point (vector) xi satisfies wxi + b > 0, then xi belongs to the +1 class, i.e. yi = +1; if wxi + b < 0, then xi belongs to the -1 class, i.e. yi = -1. Define the function f(xi) = wxi + b, and further define g(xi, yi) = g(di) = yi · (wxi + b) = yi · f(xi). Clearly yi and f(xi) always have the same sign, so g(xi, yi) > 0.
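The sign rule and the function g above can be checked numerically. The sample points, w, and b below are made-up values chosen purely for illustration:

```python
import numpy as np

# Made-up 2-D samples and labels (illustrative, not from the text)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([+1, +1, -1, -1])

# An assumed separating hyperplane wx + b = 0
w = np.array([1.0, 1.0])
b = -1.0

f = X @ w + b   # f(x_i) = w x_i + b
g = y * f       # g(x_i, y_i) = y_i * f(x_i)

print(g)        # every entry is positive: y_i and f(x_i) share a sign
```

For correctly classified samples, g is always strictly positive, which is exactly the condition the derivation below relies on.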
Back to the point: why define the functions f and g? Because we will use these two functions to obtain the weight vector w and the threshold b, which in turn determine the best classifying hyperplane. The question then becomes: what conditions must the "optimal" hyperplane satisfy? In other words, under what conditions is a hyperplane "optimal"? This turns into an optimization problem. In the discussion above, we have assumed the hyperplane L:
wx + b = 0. For a given dataset D = {d1, d2, ..., di, ..., dm} = {(x1, y1), ..., (xm, ym)}, there must be at least one positive sample point di = (xi, +1) and at least one negative sample dj = (xj, -1) such that di is the positive sample nearest "to" L, at distance li, and dj is the negative sample nearest to L, at distance lj. Recalling the point-to-line distance formula in two-dimensional space and generalizing it, define the distance γ = li + lj = (|wxi + b| + |wxj + b|) / ||w||. Clearly, L is the hyperplane we want exactly when γ is maximal. That is:
max γ
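As a concrete instance of this definition, γ can be computed for an assumed hyperplane and its two nearest samples (all values below are made up for illustration):

```python
import numpy as np

x_i = np.array([2.0, 2.0])    # assumed nearest positive sample
x_j = np.array([-1.0, -1.0])  # assumed nearest negative sample
w = np.array([1.0, 1.0])      # assumed hyperplane parameters
b = -1.0

norm_w = np.linalg.norm(w)        # ||w||
l_i = abs(w @ x_i + b) / norm_w   # distance from x_i to the hyperplane
l_j = abs(w @ x_j + b) / norm_w   # distance from x_j to the hyperplane
gamma = l_i + l_j                 # γ = (|wx_i + b| + |wx_j + b|) / ||w||

print(gamma)
```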
So here comes the problem again: in the form of the γ defined above (which is my own definition), the maximum does not look easy to find. In Zhou's "Machine Learning", one directly sets:
wxi + b ≥ +1  (yi = +1)
wxi + b ≤ -1  (yi = -1)
and then defines the distance d = 2/||w||. I wondered why wxi + b ≥ 1 (yi = +1) and wxi + b ≤ -1 (yi = -1) should geometrically be "parallel" to the hyperplane wxi + b = 0 at "distance" 1; this really puzzled me, so I began consulting blogs written by other people. Many people do not understand where this 2 comes from, and I did not find an article that made it easy for me to understand. I did find a good series on SVM at http://www.blogjava.net/zhenandaci/archive/2009/02/13/254519.html, but it did not make clear why the distance is d = 2/||w||, so I had to think it through myself.
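One way to see where the 1 (and hence the 2) comes from: scaling (w, b) by any positive constant c does not move the hyperplane and does not change the geometric distance of any point to it, so (w, b) may be freely rescaled until the nearest sample satisfies |wx + b| = 1. A small numerical check, with assumed values for the point and the hyperplane:

```python
import numpy as np

x = np.array([2.0, 2.0])   # an assumed sample point
w = np.array([1.0, 1.0])   # assumed hyperplane parameters
b = -1.0

def geom_dist(w, b, x):
    """Point-to-hyperplane distance |wx + b| / ||w||."""
    return abs(w @ x + b) / np.linalg.norm(w)

d_before = geom_dist(w, b, x)
c = 1.0 / abs(w @ x + b)            # rescale so the functional margin becomes 1
d_after = geom_dist(c * w, c * b, x)

print(d_before, d_after)            # the geometric distance is unchanged
print(abs((c * w) @ x + c * b))     # the functional margin is now exactly 1
```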
I found that for a given sample set, the distances li and lj from di and dj to the hyperplane mentioned above are determined. That is, in Di = |wxi + b|, although w and b are unknown parameters, Di is determined; in the same way Dj = |wxj + b| is determined. So we may set

DiLj = Di + Dj

Then DiLj is also fixed, and therefore:

γ = DiLj / ||w||
So max γ becomes max (DiLj / ||w||), which, since DiLj is a constant factor, is equivalent to max (1/||w||). Equivalently, li = |wxi + b| / ||w|| and lj = |wxj + b| / ||w|| can be normalized to li' = 1/||w|| and lj' = 1/||w||, so that:

max 2/||w||
s.t. g(di) = g((xi, yi)) ≥ 1  (i = 1, 2, 3, ..., m)
Here ||w|| is the norm of w, which generally speaking is the length of the vector: ||w|| = √(w1² + w2² + ... + wn²). The model above is then equivalent to:

min (1/2)||w||²
s.t. g(di) = g((xi, yi)) ≥ 1  (i = 1, 2, 3, ..., m)
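The two objectives pick out the same w, because 2/||w|| is a strictly decreasing function of (1/2)||w||². A quick numerical check over a few assumed candidate weight vectors:

```python
import numpy as np

# Hypothetical candidate weight vectors (illustration only)
candidates = [np.array([1.0, 1.0]),
              np.array([0.5, 0.5]),
              np.array([2.0, 0.0])]

margin = [2.0 / np.linalg.norm(w) for w in candidates]          # max 2/||w||
objective = [0.5 * np.linalg.norm(w) ** 2 for w in candidates]  # min (1/2)||w||^2

# The candidate that maximizes the margin also minimizes the objective
print(int(np.argmax(margin)), int(np.argmin(objective)))
```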
It's easy to understand.
The sample points (vectors) lying on wx + b = ±1 are called "support vectors". It seems the model mainly makes use of the "support vectors", though I am not yet sure about this (to be continued).
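The standard textbook toy problem (not from this article) with positive samples (3,3), (4,3) and negative sample (1,1) has the known hard-margin solution w = (1/2, 1/2), b = -2; the support vectors are precisely the samples with yi(wxi + b) = 1:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([+1, +1, -1])

# Known hard-margin solution for this classic toy problem
w = np.array([0.5, 0.5])
b = -2.0

g = y * (X @ w + b)                      # functional margins y_i (w x_i + b)
support_vectors = X[np.isclose(g, 1.0)]  # samples on wx + b = ±1
print(support_vectors)
```

Here (3,3) and (1,1) come out as support vectors, while (4,3) lies strictly outside the margin and does not constrain the solution.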