SVMs Don't Learn Patterns: They Find the Best Boundary Between Them

svm support-vector-machine classification hyperplane supervised-learning

Support Vector Machines approach classification differently from everything I've studied so far. Decision Trees ask questions recursively. Logistic Regression estimates probabilities. SVMs do something geometrically clean: find the line (or plane, or hyperplane) that separates classes with the maximum possible margin.

The decision boundary is a hyperplane, parameterized by a weight vector $\mathbf{w}$ (normal to the hyperplane) and a bias $b$:

$$\mathbf{w} \cdot \mathbf{x} + b = 0$$

In 2D this is a line, in 3D a plane, and in higher dimensions a hyperplane. Everything on one side gets one class label and everything on the other side gets the other; concretely, the prediction is the sign of $\mathbf{w} \cdot \mathbf{x} + b$.
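To make the decision rule concrete, here is a minimal sketch in plain Python. The `w` and `b` values are hand-picked for illustration (nothing is learned here), and the helper names `decision_value` and `predict` are hypothetical. Points exactly on the boundary are assigned +1, which is an arbitrary convention chosen for this sketch.

```python
# Minimal sketch of the SVM decision rule: classify a point by which side
# of the hyperplane w . x + b = 0 it falls on.
# w and b below are hand-picked for illustration, not learned.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def decision_value(w, b, x):
    """Signed score w . x + b: positive on one side, negative on the other."""
    return dot(w, x) + b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane.
    Points exactly on the boundary get +1 (an arbitrary convention here)."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# A 2D example: the vertical line x1 = 0, i.e. w = (1, 0), b = 0.
w, b = (1.0, 0.0), 0.0
print(predict(w, b, (2.0, 5.0)))   # 1
print(predict(w, b, (-3.0, 1.0)))  # -1
```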

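There are infinitely many hyperplanes that could separate two linearly separable classes. SVM picks the one that maximizes the margin: the distance from the hyperplane to the nearest data points of either class. If $\mathbf{w}$ and $b$ are scaled so that those nearest points satisfy $|\mathbf{w} \cdot \mathbf{x} + b| = 1$, the full width of the margin is $2 / \lVert \mathbf{w} \rVert$, so maximizing the margin is equivalent to minimizing $\lVert \mathbf{w} \rVert$.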
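A small sketch can compare the margins of two candidate hyperplanes on a toy dataset. The six points below are made up for illustration, and this does not run an SVM optimizer; it only measures the geometric margin $\min_i y_i(\mathbf{w} \cdot \mathbf{x}_i + b) / \lVert \mathbf{w} \rVert$ for two hand-chosen separating lines. An SVM would search over all separating hyperplanes for the one with the largest such value.

```python
import math

# Toy linearly separable data (hypothetical, chosen for illustration).
X = [(1.0, 0.0), (2.0, 1.0), (2.0, -1.0),     # class +1
     (-1.0, 0.0), (-2.0, 1.0), (-2.0, -1.0)]  # class -1
Y = [1, 1, 1, -1, -1, -1]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def geometric_margin(w, b, X, Y):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over all points.
    It is positive only if every point lies on its correct side."""
    norm = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm for x, y in zip(X, Y))

# Two hyperplanes that both separate the classes.
w_v, b_v = (1.0, 0.0), 0.0   # vertical line x1 = 0
w_t, b_t = (1.0, 1.0), 0.0   # tilted line x1 + x2 = 0

print(geometric_margin(w_v, b_v, X, Y))  # 1.0
print(geometric_margin(w_t, b_t, X, Y))  # about 0.7071
```

Both lines classify every point correctly, but the vertical line keeps the nearest points farther away, so it is the better candidate under the maximum-margin criterion.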

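The data points that sit exactly on the margin boundary are the support vectors. They alone determine where the hyperplane ends up: remove any other point and the hyperplane doesn't move. (This is the hard-margin picture; with a soft margin, points inside the margin or on the wrong side of the boundary also count as support vectors.)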
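Here is a sketch of picking out support vectors, assuming the maximum-margin hyperplane is already known. For the same made-up six points, the closest pair across the two classes is $(1, 0)$ and $(-1, 0)$, so the maximum-margin line is their perpendicular bisector $x_1 = 0$; with $\mathbf{w} = (1, 0)$, $b = 0$ it is already scaled so the nearest points have $y(\mathbf{w} \cdot \mathbf{x} + b) = 1$. The helper `support_vectors` and its tolerance `tol` are hypothetical names for this illustration.

```python
# Hypothetical toy data; the max-margin hyperplane x1 = 0 was worked out by
# hand and scaled so the closest points satisfy y * (w . x + b) = 1.
X = [(1.0, 0.0), (2.0, 1.0), (2.0, -1.0),     # class +1
     (-1.0, 0.0), (-2.0, 1.0), (-2.0, -1.0)]  # class -1
Y = [1, 1, 1, -1, -1, -1]
w, b = (1.0, 0.0), 0.0

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def support_vectors(w, b, X, Y, tol=1e-9):
    """Points whose functional margin y * (w . x + b) equals 1 (within tol),
    i.e. the points lying exactly on the margin boundaries."""
    return [x for x, y in zip(X, Y) if abs(y * (dot(w, x) + b) - 1.0) <= tol]

print(support_vectors(w, b, X, Y))  # [(1.0, 0.0), (-1.0, 0.0)]
```

In this toy case, the two support vectors on their own have the same perpendicular bisector, so deleting the other four points would leave the maximum-margin line at $x_1 = 0$.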

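The objective balances margin maximization against penalizing points that violate the margin. Hinge loss handles the penalty: for a point $\mathbf{x}_i$ with label $y_i \in \{-1, +1\}$, the loss is $\max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$. If a point is correctly classified and outside the margin, the loss is 0; if it falls inside the margin or is misclassified, the loss is $> 0$.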
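A minimal sketch of hinge loss and one standard way to write the soft-margin objective, $\frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_i \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$, where $C$ sets how heavily margin violations are penalized. The value `C=1.0` and the toy data are illustrative assumptions, and nothing is optimized here; the code only evaluates the loss for a few fixed choices of `w` and `b`.

```python
# Sketch of hinge loss and the soft-margin SVM objective.
# C and all data below are illustrative assumptions.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge_loss(w, b, x, y):
    """0 when y * score >= 1 (correct and outside the margin); grows
    linearly as the point moves into the margin or across the boundary."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def objective(w, b, X, Y, C=1.0):
    """Margin term 0.5 * ||w||^2 (small ||w|| = wide margin)
    plus C times the total hinge loss."""
    return 0.5 * dot(w, w) + C * sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y))

w, b = (1.0, 0.0), 0.0
print(hinge_loss(w, b, (2.0, 1.0), 1))   # 0.0: correct, outside the margin
print(hinge_loss(w, b, (0.5, 0.0), 1))   # 0.5: correct side, inside the margin
print(hinge_loss(w, b, (-0.5, 0.0), 1))  # 1.5: misclassified

X = [(1.0, 0.0), (2.0, 1.0), (2.0, -1.0),
     (-1.0, 0.0), (-2.0, 1.0), (-2.0, -1.0)]
Y = [1, 1, 1, -1, -1, -1]
print(objective((1.0, 0.0), 0.0, X, Y))  # 0.5
print(objective((0.5, 0.0), 0.0, X, Y))  # 1.125
```

Halving `w` doubles the nominal margin width, but two points then fall inside the margin and the hinge penalty outweighs the smaller $\lVert \mathbf{w} \rVert^2$ term, which is the trade-off the objective encodes.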