Support vector machines (SVMs) are a well-established technique for learning decision boundaries. The original linear method goes back to work by Vladimir Vapnik and Alexey Chervonenkis in the 1960s, and the modern formulation was developed by Vapnik and his colleagues in the early 1990s.
In this article we will see the concept on which support vector machines are based, and an implementation with Scikit-Learn in Python.
Suppose we have a space containing data points that belong to two classes (orange and green points in our case).
The question is how to separate the orange points from the green points. The first idea of SVM is to draw a straight line. But which line?
As the last figure shows, there are infinitely many lines that solve the problem, so we need a criterion for choosing one.
We would like to draw the straight line in the middle of the widest street that separates the orange samples from the green samples.
So the approach is to place the line so that the street between the two classes is as wide as possible.
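To make the "widest street" idea concrete, here is a minimal sketch that compares a few candidate lines by how far they stay from the closest sample. The toy points, the candidate lines and the helper `street_half_width` are all hypothetical and chosen only for illustration. Each line is written as w·x + b = 0, the same form used in the decision rule later in the article.

```python
import numpy as np

# Hypothetical toy data: two small, linearly separable clusters.
orange = np.array([[3.0, 3.0], [4.0, 3.0], [4.0, 4.0]])
green = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
points = np.vstack([orange, green])


def street_half_width(w, b, points):
    """Distance from the line w.x + b = 0 to the closest sample."""
    w = np.asarray(w, dtype=float)
    return np.min(np.abs(points @ w + b)) / np.linalg.norm(w)


# Three candidate lines; each one separates the two clusters.
candidates = {
    "x = 2": ([1.0, 0.0], -2.0),
    "y = 2": ([0.0, 1.0], -2.0),
    "x + y = 4": ([1.0, 1.0], -4.0),
}

widths = {
    name: street_half_width(w, b, points)
    for name, (w, b) in candidates.items()
}
best = max(widths, key=widths.get)

for name, width in widths.items():
    print(f"{name}: half-width of the street = {width:.3f}")
print("widest street:", best)
```

All three lines classify the toy points correctly, but the diagonal one leaves the most room on both sides, which is the line an SVM would prefer among these candidates.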
Decision rule:
The decision rule will be used to choose which class a data point belongs to. Since in SVM we have a straight line that separates the two classes, our decision will be:
wX + b < 0 : Green sample
wX + b > 0 : Orange sample
Where X means the data point that we want to classify, and 'w' and 'b' are two parameters of the equation of the line that separates the two classes. Now we should put some constraints in order to solve the problem.
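The decision rule can be sketched in a few lines of NumPy. The values of `w` and `b` below are hypothetical (they describe the line x + y = 4), and the function name `classify` is made up for this example. In practice both parameters come out of training.

```python
import numpy as np


def classify(X, w, b):
    """Apply the rule: w.X + b > 0 gives orange, w.X + b < 0 gives green."""
    scores = np.atleast_2d(X) @ w + b
    # A score of exactly 0 lies on the line itself; this sketch
    # arbitrarily reports it as green.
    return np.where(scores > 0, "orange", "green")


# Hypothetical parameters of the separating line x + y = 4.
w = np.array([1.0, 1.0])
b = -4.0

samples = np.array([[3.0, 3.0], [1.0, 0.0]])
labels = classify(samples, w, b)
print(labels)
```

The first sample lies on the positive side of the line and the second on the negative side, so they are labelled orange and green respectively.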
In order to avoid some complicated algebra, we will assume these two constraints:
min ||w||²
yi (w Xi + b) - 1 = 0 for each Xi on the borders of the street
where yi = 1 for the orange class and yi = -1 for the green class.
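The first one, which is really the objective to minimize, comes from the fact that the width of the separating street equals 2/||w||, so it is inversely proportional to the norm of w. Minimizing the norm therefore gives the maximum width of the street. The second one fixes the scale of w and b: for samples lying exactly on the borders of the street, wX + b equals +1 on the orange side and -1 on the green side.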
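These relations can be checked numerically with Scikit-Learn. The following is a minimal sketch, not a full implementation: the toy data is hypothetical, and a very large `C` is used as an assumption to approximate the hard-margin problem described here (`SVC` actually solves the soft-margin version).

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: +1 for the orange class, -1 for the green class.
X = np.array([[3.0, 3.0], [4.0, 3.0], [4.0, 4.0],
              [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Linear kernel: the decision boundary is a straight line.
# A very large C leaves almost no tolerance for margin violations.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # the vector w of the decision rule
b = clf.intercept_[0]   # the offset b of the decision rule

# Width of the street between the two classes: 2 / ||w||.
street_width = 2.0 / np.linalg.norm(w)

# For samples on the borders (the support vectors),
# yi * (w.Xi + b) should be close to 1.
border_values = y[clf.support_] * (clf.support_vectors_ @ w + b)

print("w =", w, "b =", b)
print("street width =", street_width)
print("yi (w Xi + b) on the borders =", border_values)
```

For this toy data the closest pair of opposite-class points is (3, 3) and (1, 1), so the street width should come out close to 2√2 ≈ 2.83, and the border values should all be close to 1 up to the solver's tolerance.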