Support Vector Machines

Basic SVM Formulation

Formulating a Decision Rule

\[ \overrightarrow w \cdot \overrightarrow u \ge c \]
\[ \overrightarrow w \cdot \overrightarrow u + b \ge 0 \tag{1} \]
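A minimal numerical sketch of decision rule (1). The weight vector `w`, offset `b`, and test point `u` are arbitrary values chosen for illustration; in practice `w` and `b` come from training.

```python
import numpy as np

# Hypothetical trained parameters (assumed for illustration).
w = np.array([1.0, 2.0])
b = -3.0

def classify(u):
    """Decision rule (1): positive class iff w . u + b >= 0."""
    return 1 if np.dot(w, u) + b >= 0 else -1

print(classify(np.array([2.0, 2.0])))  # w.u + b = 2 + 4 - 3 = 3 >= 0 -> 1
print(classify(np.array([0.0, 0.0])))  # w.u + b = -3 < 0 -> -1
```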

Calculate an Equation for the Margins

\[ \overrightarrow w \cdot \overrightarrow x_+ + b \ge 1 \]
\[ \overrightarrow w \cdot \overrightarrow x_- + b \le -1 \]
\[ y_i (\overrightarrow w \cdot \overrightarrow x_i + b) \ge 1 \implies y_i (\overrightarrow w \cdot \overrightarrow x_i + b) -1 \ge 0 \]
\[ y_i (\overrightarrow w \cdot \overrightarrow x_i + b) -1 = 0 \tag{2} \]
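The combined constraint can be checked numerically. In this sketch (with an assumed separator and hand-picked toy points), the residual of equation (2) is exactly zero for points on the margins (the support vectors) and strictly positive for points further inside their class region:

```python
import numpy as np

# Assumed separator and toy labeled points for illustration.
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[1.0, 0.0],    # on the + margin
              [-1.0, 0.0],   # on the - margin
              [3.0, 2.0]])   # well inside the + side
y = np.array([1.0, -1.0, 1.0])

margins = y * (X @ w + b) - 1   # residuals of equation (2)
print(margins)  # 0 for support vectors, > 0 for interior points
```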

Calculating the Width of our Decision Boundary/Margins

\[ \text{width} = \hat w \cdot (\overrightarrow{x_+} - \overrightarrow{x_-}) \]
\[ \text{where: } \hat w = \frac{\overrightarrow w}{||w||} \]

\[ \overrightarrow w \cdot \overrightarrow{x_+} = 1 - b, \quad \overrightarrow w \cdot \overrightarrow{x_-} = -1 - b \]
\[ \text{width} = \hat w \cdot (\overrightarrow{x_+} - \overrightarrow{x_-}) = \frac{\overrightarrow w \cdot \overrightarrow{x_+} - \overrightarrow w \cdot \overrightarrow{x_-}}{||w||} = \frac{(1 - b) - (-1 - b)}{||w||} = \frac{2}{||w||} \]
\[ \max_w \frac{2}{||w||} \iff \min_w ||w|| \iff \operatorname*{argmin}_w \frac12||w||^2 \tag{3} \]
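A quick sanity check of the width formula, using an assumed separator and two points placed by hand on the plus and minus margins. The projection of their difference onto the unit normal matches 2/||w||:

```python
import numpy as np

# Assumed separator w . x + b = 0 for illustration.
w = np.array([3.0, 4.0])   # ||w|| = 5
b = -5.0
w_hat = w / np.linalg.norm(w)

# Hand-picked points on the plus/minus margins (w . x + b = +1 / -1):
x_plus = np.array([2.0, 0.0])       # 3*2 - 5 = +1
x_minus = np.array([4/3, 0.0])      # 3*(4/3) - 5 = -1

width = np.dot(w_hat, x_plus - x_minus)
print(width, 2 / np.linalg.norm(w))  # both 0.4
```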

Optimization

\[ L = \frac12||w||^2 - \sum_i\alpha_i[y_i (\overrightarrow w \cdot \overrightarrow x_i + b) -1] \]
\[ \frac{\partial L}{\partial w} = 0 \]
\[ \frac{\partial L}{\partial w} = w- \sum_i\alpha_iy_ix_i \]
\[ w = \sum_i\alpha_iy_ix_i \tag{4} \]
\[ \frac{\partial L}{\partial b} = 0 \]
\[ \frac{\partial L}{\partial b} = \sum_i\alpha_iy_i \]
\[ 0 = \sum_i\alpha_iy_i \]
\[ L = \frac12||w||^2 - \sum_i\alpha_i[y_i (\overrightarrow w \cdot \overrightarrow x_i + b) -1] \]
\[ L = \frac12(\sum_i\alpha_iy_ix_i)(\sum_j\alpha_jy_jx_j) - \sum_i\alpha_iy_i\overrightarrow x_i (\sum_j\alpha_jy_jx_j)- b\sum_i\alpha_iy_i + \sum_i\alpha_i \]
\[ \text{since: } \sum_i\alpha_iy_i = 0 \]
\[ L = \frac12(\sum_i\alpha_iy_ix_i)(\sum_j\alpha_jy_jx_j) - \sum_i\alpha_iy_i\overrightarrow x_i (\sum_j\alpha_jy_jx_j) + \sum_i\alpha_i \]
\[ L = \sum_i\alpha_i - \frac12(\sum_i\alpha_iy_ix_i)(\sum_j\alpha_jy_jx_j) \]
\[ L = \sum_i\alpha_i - \frac12\sum_i\sum_j\alpha_iy_i\alpha_jy_j(x_i \cdot x_j) \tag{5} \]
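The dual objective (5) is easy to evaluate directly. This sketch uses a two-point toy dataset (assumed for illustration, one point per class); note that the data enters only through the Gram matrix of dot products:

```python
import numpy as np

# Toy data: one point per class (assumed for illustration).
X = np.array([[1.0, 1.0],
              [-1.0, -1.0]])
y = np.array([1.0, -1.0])

def dual_objective(alpha):
    """Equation (5): sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    G = (X @ X.T) * np.outer(y, y)   # Gram matrix weighted by labels
    return alpha.sum() - 0.5 * alpha @ G @ alpha

alpha = np.array([0.25, 0.25])       # satisfies the constraint sum_i alpha_i y_i = 0
print(dual_objective(alpha))         # 0.25
```

For this tiny problem, alpha = (0.25, 0.25) is in fact the maximizer of (5) subject to the constraint.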

Bringing all of this Together

\[ \overrightarrow w \cdot \overrightarrow u + b \ge 0 \tag{1} \]
\[ y_i (\overrightarrow w \cdot \overrightarrow x_i + b) -1 = 0 \tag{2} \]

\[ \operatorname*{argmin}_w \frac12||w||^2 \tag{3} \]
\[ w = \sum_i\alpha_iy_ix_i \tag{4} \]
\[ \sum_i\alpha_iy_i(x_i \cdot \overrightarrow u) + b \ge 0 \tag{6} \]
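Equation (6) is rule (1) with substitution (4) applied: the unknown `u` is classified using only dot products with the training points. A sketch on the same assumed two-point toy problem as above:

```python
import numpy as np

# Toy "trained" state (assumed for illustration): data, labels, multipliers, offset.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
b = 0.0

def decide(u):
    """Equation (6): sum_i alpha_i y_i (x_i . u) + b >= 0 -> positive class."""
    return np.sum(alpha * y * (X @ u)) + b >= 0

print(decide(np.array([2.0, 0.0])))   # True: positive side
print(decide(np.array([-2.0, 0.0])))  # False: negative side
```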

Key Takeaways

Both the dual objective (5) and the decision rule (6) depend on the data only through dot products between pairs of vectors, never through the individual vectors themselves. This is the observation that makes the kernel trick below possible: any function that acts like a dot product in some feature space can be substituted in directly.

The Kernel Function

\[ L = \sum_i\alpha_i - \frac12\sum_i\sum_j\alpha_iy_i\alpha_jy_j(\phi(x_i) \cdot \phi(x_j)) \tag5 \]
\[ \sum_i\alpha_iy_i(\phi(x_i) \cdot \phi(\overrightarrow u)) + b \ge 0 \tag6 \]
\[ K(\overrightarrow x_i, \overrightarrow x_j)=\phi(x_i) \cdot \phi(x_j) \]
\[ L = \sum_i\alpha_i - \frac12\sum_i\sum_j\alpha_iy_i\alpha_jy_jK(\overrightarrow x_i, \overrightarrow x_j) \tag5 \]
\[ \sum_i\alpha_iy_iK(\overrightarrow x_i, \overrightarrow u) + b \ge 0 \tag6 \]
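The point of the kernel trick is that K computes the dot product in feature space without ever forming φ. A sketch for the degree-2 polynomial kernel with r = 0 on 2-D inputs, where φ can be written out explicitly:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel (r = 0), 2-D input."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(xi, xj):
    """Kernel trick: same value as phi(xi) . phi(xj), without computing phi."""
    return np.dot(xi, xj) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(K(x, z), np.dot(phi(x), phi(z)))  # identical: 16.0 16.0
```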

Polynomial kernel:

\[ K(\overrightarrow x_i, \overrightarrow x_j) = (\overrightarrow x_i \cdot \overrightarrow x_j + r)^n \]

Exponential/Radial Basis Kernel

\[ K(\overrightarrow x_i, \overrightarrow x_j) = \exp\left(-\frac{||\overrightarrow x_i - \overrightarrow x_j||^2}{2\sigma^2}\right) \]
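A direct sketch of the radial basis kernel. It equals 1 when the two points coincide and decays toward 0 as they move apart, with σ controlling how fast:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """Radial basis kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
    d = xi - xj
    return np.exp(-np.dot(d, d) / (2 * sigma**2))

print(rbf_kernel(np.array([0.0, 0.0]), np.array([0.0, 0.0])))  # 1.0 at zero distance
print(rbf_kernel(np.array([0.0]), np.array([10.0])))           # near 0 far away
```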

