Implementing Kernel SVM from Scratch in Python

Support Vector Machines (SVMs) are powerful supervised learning models renowned for their effectiveness in classification and regression tasks. Kernel SVMs extend the capabilities of linear SVMs by enabling them to handle non-linearly separable data. This article will guide you through the implementation of a Kernel SVM from scratch in Python, focusing on the crucial concepts and steps involved. We'll be using the Gaussian Radial Basis Function (RBF) kernel for demonstration.

Understanding Kernel SVMs

The core idea behind Kernel SVMs is to map the data points into a higher-dimensional feature space where they become linearly separable. Instead of explicitly performing this mapping (which can be computationally expensive), we use a kernel function – a shortcut that calculates the dot product in the high-dimensional space without actually transforming the data. The RBF kernel is a popular choice due to its ability to handle complex non-linear relationships:

K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)

where:

  • x_i and x_j are data points.
  • gamma is a hyperparameter controlling the width of the Gaussian: larger values make the kernel more local, so two points must be closer together to be considered similar (illustrated in the short snippet below).
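
To get a feel for gamma, the following standalone sketch (the two points are chosen arbitrarily for illustration) evaluates the RBF kernel for a fixed pair of points at several gamma values. As gamma grows, the kernel value for the same pair shrinks, i.e. the notion of similarity becomes more local:

import numpy as np

x_i = np.array([0.0, 0.0])
x_j = np.array([1.0, 0.0])  # squared distance ||x_i - x_j||^2 = 1
for gamma in (0.1, 1.0, 10.0):
    print(gamma, np.exp(-gamma * np.linalg.norm(x_i - x_j) ** 2))
# gamma = 0.1  -> ~0.905
# gamma = 1.0  -> ~0.368
# gamma = 10.0 -> ~0.000045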

Implementation Steps

  1. Data Preparation: We'll start with a simple example dataset. For this tutorial, we'll create synthetic data using scikit-learn's make_moons function to generate non-linearly separable data.

  2. RBF Kernel Function: We define a function to compute the RBF kernel for pairs of data points.

  3. Kernel Matrix: We construct the kernel matrix, a matrix where each element (i, j) represents the kernel function applied to data points x_i and x_j.

  4. Optimization: This is the most computationally intensive part. We need to solve a quadratic programming (QP) problem to find the optimal Lagrange multipliers (α). Since we're building this from scratch, we'll use a simplified approach suitable for small datasets rather than a dedicated QP solver such as cvxopt: projected gradient ascent on the SVM dual objective, which is sketched right after this list. For larger datasets, a dedicated QP solver is highly recommended.

  5. Prediction: Once we have the optimal Lagrange multipliers, we can predict the class of new data points.
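
For reference, the quadratic programming problem mentioned in step 4 is the standard soft-margin SVM dual: maximize

W(alpha) = sum_i alpha_i - 1/2 * sum_i sum_j alpha_i * alpha_j * y_i * y_j * K(x_i, x_j)

subject to 0 <= alpha_i <= C and sum_i alpha_i * y_i = 0. The gradient computed in the code below is the partial derivative dW/d(alpha_i) = 1 - y_i * sum_j alpha_j * y_j * K(x_i, x_j); the simplified implementation takes gradient ascent steps on W, clips alpha into [0, 1] to enforce the box constraint with C = 1, and, for simplicity, ignores the equality constraint. As a minimal sketch (assuming alpha and y are 1-D NumPy arrays and K is the precomputed kernel matrix), the dual objective itself can be written as:

import numpy as np

def dual_objective(alpha, y, K):
    # W(alpha) = sum(alpha) - 0.5 * (alpha * y)^T K (alpha * y)
    ay = alpha * y
    return np.sum(alpha) - 0.5 * ay @ K @ ay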

Python Code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# 1. Data Preparation
X, y = make_moons(n_samples=100, noise=0.1, random_state=42)
y = np.where(y == 0, -1, 1)  # Map labels to -1 and 1

# 2. RBF Kernel Function
def rbf_kernel(x1, x2, gamma):
    return np.exp(-gamma * np.linalg.norm(x1 - x2)**2)

# 3. Kernel Matrix
gamma = 1
n_samples = X.shape[0]
kernel_matrix = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(n_samples):
        kernel_matrix[i, j] = rbf_kernel(X[i], X[j], gamma)
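
# One possible fully vectorized alternative (sketch) that builds the same matrix without the double loop:
# sq_norms = np.sum(X ** 2, axis=1)
# sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
# kernel_matrix = np.exp(-gamma * sq_dists)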

# 4. Simplified Optimization (Projected Gradient Ascent on the Dual) - NOT ROBUST FOR LARGE DATASETS!
alpha = np.zeros(n_samples)
learning_rate = 0.01
iterations = 1000
for _ in range(iterations):
    gradient = np.zeros(n_samples)
    for i in range(n_samples):
        gradient[i] = 1 - y[i] * np.sum(alpha * y * kernel_matrix[:,i])
    alpha += learning_rate * gradient
    alpha = np.clip(alpha, 0, 1)  # project onto the box constraint 0 <= alpha <= C (here C = 1)

# 5. Prediction
# Bias term, averaged over all training points for simplicity
# (a stricter implementation would average only over support vectors with 0 < alpha < C)
b = np.mean(y - np.dot(alpha * y, kernel_matrix))
def predict(x):
    prediction = np.sum(alpha * y * np.array([rbf_kernel(xi, x, gamma) for xi in X])) + b
    return 1 if prediction > 0 else -1
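
# Quick sanity check (illustrative only): accuracy of the simplified model on its own training data
train_predictions = np.array([predict(x) for x in X])
print("Training accuracy:", np.mean(train_predictions == y))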

# Visualization
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = np.array([predict(np.array([x1, x2])) for x1, x2 in zip(xx.ravel(), yy.ravel())])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.title("Kernel SVM (RBF Kernel) - Simplified Implementation")
plt.show()

Important Note: The optimization step in this example (projected gradient ascent with a fixed learning rate) is a vastly simplified version of SVM training and is not suitable for large datasets or real-world applications. For production-ready code, use established QP solvers such as those in cvxopt, or rely on the optimized implementations in libraries like scikit-learn. This example serves as an educational tool to illustrate the core concepts of Kernel SVMs. Remember to install the necessary libraries (numpy, matplotlib, scikit-learn).
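
For comparison, the same dataset can be fit in a couple of lines with scikit-learn's optimized implementation. The snippet below is a minimal sketch; the hyperparameters (kernel="rbf", gamma=1.0, C=1.0) are simply chosen to mirror the toy example above, not tuned:

from sklearn.svm import SVC

clf = SVC(kernel="rbf", gamma=1.0, C=1.0)
clf.fit(X, y)  # X, y as prepared above (labels in {-1, 1} work fine with SVC)
print("scikit-learn SVC training accuracy:", clf.score(X, y))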
