python, numpy, scipy, matplotlib, support-vector-machines, kernel-trick, multiclass-classification

Implemented Support Vector Machine (SVM) Algorithms

Project Overview

This project is an educational implementation of Support Vector Machines (SVM) written entirely from scratch in Python. It avoids high-level machine learning libraries such as scikit-learn in order to expose the underlying mathematics and optimization steps.

The implementation relies only on numpy for vector and matrix operations, scipy (scipy.optimize.minimize with the SLSQP method) for solving the quadratic programming problems that arise from the primal and Lagrangian dual formulations, and matplotlib for visualizing decision boundaries and support vectors.
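
As an illustration of how scipy.optimize.minimize with SLSQP can solve the SVM dual, here is a minimal sketch. It is not the project's code: the function name fit_dual_svm, the toy dataset, and the 1e-6 thresholds are assumptions made for this example, and it uses a linear kernel only.

```python
import numpy as np
from scipy.optimize import minimize


def fit_dual_svm(X, y, C=None):
    """Solve the SVM dual with SLSQP (hypothetical helper, linear kernel).

    C=None gives the hard-margin problem (alpha >= 0);
    a finite C gives the soft-margin box constraint 0 <= alpha <= C.
    """
    n = X.shape[0]
    K = X @ X.T                              # Gram matrix of the linear kernel
    Q = (y[:, None] * y[None, :]) * K

    def objective(a):
        # Negated dual objective, because minimize() minimizes.
        return 0.5 * a @ Q @ a - a.sum()

    def gradient(a):
        return Q @ a - np.ones(n)

    # Equality constraint of the dual: sum_i alpha_i * y_i = 0.
    constraints = {"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}
    bounds = [(0.0, C)] * n                  # an upper bound of None means unbounded

    res = minimize(objective, np.zeros(n), jac=gradient, bounds=bounds,
                   constraints=constraints, method="SLSQP")
    alpha = res.x

    # Recover the primal weights and the bias from the margin support vectors.
    w = (alpha * y) @ X
    on_margin = alpha > 1e-6
    if C is not None:
        on_margin &= alpha < C - 1e-6
    b = np.mean(y[on_margin] - X[on_margin] @ w)
    return w, b, alpha


# Toy, linearly separable data (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b, alpha = fit_dual_svm(X, y)
predictions = np.sign(X @ w + b)
```

Supplying the analytic gradient through jac is optional, but it spares SLSQP from estimating it by finite differences. A general-purpose solver like this scales poorly with the number of samples, which is acceptable for small teaching datasets.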

📂 Architecture & Core Modules

The project is split into several modules, each adding complexity to the SVM algorithm:

hard_margin/ (Linear Model): The baseline algorithm, designed for perfectly linearly separable datasets. It finds the maximum-margin hyperplane and tolerates no margin violations.
soft_margin/ (Smoothed Model): An extension of the hard-margin model that introduces slack variables ($\epsilon$) and a regularization parameter ($C$). By tolerating some margin violations, the model can handle overlapping or noisy data that is not perfectly separable.
dual_kernel/ (Kernel Trick): Solves the SVM problem in its dual formulation so that the kernel trick can be applied. It implements non-linear kernels (such as Gaussian/RBF and polynomial) to classify distributions that no straight line can separate, such as concentric rings or crossed data, and computes the Gram matrix dynamically.
multiclass_classifier/ (Multiclass Architectures): Because a standard SVM is a binary classifier, this module builds on dual_kernel/ to support $N$-class datasets using two standard strategies:

One-vs-All (OvA): Trains $N$ binary models, each comparing one class against the rest.
One-vs-One (OvO): Trains $\frac{N(N-1)}{2}$ binary models, one per pair of classes, and combines their predictions with a voting system.

The module also includes automated execution-time benchmarking (time.perf_counter) to compare the computational cost of both strategies.

⚙️ Shared Core (core/)

To adhere to the DRY (Don't Repeat Yourself) principle, the project extracts all shared utilities into a central core/ package:

datasets.py: Generates synthetic datasets (e.g., rings, blobs) for testing different scenarios.
kernels.py: Contains the mathematical definitions of the Linear, Polynomial, and Gaussian (RBF) kernel functions.
visualization.py: Standardized plotting functions that map data points, highlight support vectors, and draw non-linear decision boundaries.

🖼️ Assets (assets/)

When any of the main scripts is executed, the project automatically generates 2D plots and saves them, grouped by algorithm, into the assets/ directory as a visual check of the implementations.
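
For reference, the kernel functions described under kernels.py and the Gram matrix used by dual_kernel/ could look roughly like the sketch below. The function names, signatures, and default parameters (degree, coef0, gamma) are assumptions chosen for illustration and are not taken from the project's source.

```python
import numpy as np


def linear_kernel(A, B):
    # K(a, b) = a . b
    return A @ B.T


def polynomial_kernel(A, B, degree=3, coef0=1.0):
    # K(a, b) = (a . b + coef0) ** degree
    return (A @ B.T + coef0) ** degree


def rbf_kernel(A, B, gamma=1.0):
    # K(a, b) = exp(-gamma * ||a - b||^2), with the squared distances
    # expanded as ||a||^2 + ||b||^2 - 2 a . b to avoid explicit loops.
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


# Gram matrix of a small sample: entry (i, j) is K(x_i, x_j).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_rbf = rbf_kernel(X, X, gamma=0.5)
K_poly = polynomial_kernel(X, X)
```

Taking two matrices as arguments lets the same function build the square training Gram matrix and the rectangular test-versus-support-vector matrix needed at prediction time.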
