Project Overview

This project is an educational yet highly technical implementation of Support Vector Machines (SVMs), written entirely "from scratch" in Python. It deliberately avoids high-level machine learning libraries such as scikit-learn in order to expose the underlying mathematics and optimization process.
The implementation relies exclusively on numpy for vector and matrix operations, scipy (scipy.optimize.minimize with the SLSQP method) for solving the quadratic programs of the primal and Lagrangian dual formulations, and matplotlib for visualizing decision boundaries and support vectors.
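For reference, the problems handed to SLSQP are presumably the standard textbook ones. Below is the soft-margin primal and its Lagrangian dual, written with the document's $\epsilon$ for slack and $C$ for regularization. The project's exact formulation may differ.

$$
\min_{w,\,b,\,\epsilon}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\epsilon_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \epsilon_i,\qquad \epsilon_i \ge 0
$$

$$
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0
$$

Here $K(x_i, x_j) = x_i^\top x_j$ for a linear SVM, and predictions use $f(x) = \sum_{i} \alpha_i y_i K(x_i, x) + b$. The hard-margin case fixes every $\epsilon_i = 0$ in the primal, which removes the upper bound $C$ on $\alpha_i$ in the dual. Because scipy.optimize.minimize only minimizes, the dual objective has to be negated before it is passed to SLSQP.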
📂 Architecture & Core Modules

The project is organized into several iterations, each progressively increasing the complexity of the SVM algorithm:
hard_margin/ (Linear Model)

The baseline algorithm, designed for perfectly linearly separable datasets. It solves the primal/dual formulation to find the maximum-margin hyperplane without tolerating any margin violations.
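The following is a minimal sketch of the hard-margin primal solved with scipy.optimize.minimize and SLSQP. It assumes a small, hypothetical separable dataset, and the helper name fit_hard_margin is illustrative rather than taken from the project. The solver minimizes $\frac{1}{2}\lVert w\rVert^2$ over the packed vector $[w, b]$ subject to $y_i(w^\top x_i + b) \ge 1$. These constraints are expressed as SLSQP "ineq" constraints, which must be non-negative at a feasible point.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: perfectly linearly separable (not from the project).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])


def fit_hard_margin(X, y):
    """Hypothetical helper: min 1/2 ||w||^2  s.t.  y_i (w . x_i + b) >= 1."""
    d = X.shape[1]

    def objective(z):
        w = z[:d]  # z packs [w_1, ..., w_d, b]
        return 0.5 * np.dot(w, w)

    # SLSQP "ineq" constraints must satisfy g(z) >= 0 at a feasible point.
    margin = {"type": "ineq", "fun": lambda z: y * (X @ z[:d] + z[d]) - 1.0}
    res = minimize(objective, np.zeros(d + 1), method="SLSQP", constraints=[margin])
    return res.x[:d], res.x[d], res


w, b, res = fit_hard_margin(X, y)
margins = y * (X @ w + b)
# Support vectors are the samples whose constraint is active (margin == 1).
support = np.flatnonzero(np.isclose(margins, 1.0, atol=1e-3))
print("w =", w, "b =", b, "support vectors:", support)
```

For this particular dataset, the optimum should be approximately $w = (0.64, 0.48)$ and $b = -3.48$, with the samples at indices 1 and 3 as the support vectors. Those two samples are the closest pair across the classes, so the margin width $2/\lVert w\rVert = 2.5$ equals their distance.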
soft_margin/ (Smoothed Model)

An evolution of the hard-margin model that introduces slack variables ($\epsilon$) and a regularization parameter ($C$). By tolerating some margin violations, with $C$ controlling how heavily they are penalized, the model can handle overlapping or noisy data that is not perfectly separable.
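The same primal approach extends to the soft margin by appending one slack variable per sample to the optimization vector. This is a sketch under stated assumptions. The helper name fit_soft_margin, the toy data (a separable set plus one negative sample placed inside the positive cluster), and $C = 1$ are illustrative only. Starting with every slack at 1 gives SLSQP a feasible initial point.

```python
import numpy as np
from scipy.optimize import minimize


def fit_soft_margin(X, y, C=1.0):
    """Hypothetical helper: min 1/2 ||w||^2 + C * sum(eps)
    s.t. y_i (w . x_i + b) >= 1 - eps_i and eps_i >= 0."""
    n, d = X.shape

    def objective(z):
        w, eps = z[:d], z[d + 1:]  # z packs [w (d values), b, eps (n values)]
        return 0.5 * np.dot(w, w) + C * np.sum(eps)

    # Margin constraints rewritten as g(z) >= 0 for SLSQP.
    margin = {"type": "ineq",
              "fun": lambda z: y * (X @ z[:d] + z[d]) - 1.0 + z[d + 1:]}
    # w and b are free; each slack variable is bounded below by 0.
    bounds = [(None, None)] * (d + 1) + [(0.0, None)] * n
    z0 = np.concatenate([np.zeros(d + 1), np.ones(n)])  # feasible start
    res = minimize(objective, z0, method="SLSQP", bounds=bounds, constraints=[margin])
    return res.x[:d], res.x[d], res.x[d + 1:]


# Hypothetical toy data: the last point is a negative sample inside the positive cluster.
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0],
              [4.2, 4.3]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0])

w, b, eps = fit_soft_margin(X, y, C=1.0)
print("w =", w, "b =", b)
print("slack per sample:", np.round(eps, 3))
```

Raising $C$ makes slack more expensive and pushes the solution toward the hard-margin behaviour. Lowering it favours a wider margin at the cost of more violations.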
dual_kernel/ (Kernel Trick)

This module solves the SVM problem in its dual formulation. Because the dual depends on the data only through inner products, it admits the Kernel Trick. The module implements non-linear kernels (such as Gaussian/RBF and Polynomial) to classify complex distributions like concentric rings or crossed data, computing the Gram matrix dynamically.
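The following is a compact sketch of the kernelized dual using a precomputed Gram matrix. It assumes an RBF kernel with $\gamma = 0.5$, $C = 10$, and a hypothetical make_rings helper standing in for the project's datasets.py. All function names and parameters here are illustrative, not the project's API. The bias is averaged over the margin support vectors ($0 < \alpha_i < C$), which is a common choice for numerical stability.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off


def poly_kernel(A, B, degree=3, coef0=1.0):
    """Polynomial kernel matrix: K[i, j] = (A_i . B_j + coef0) ** degree."""
    return (A @ B.T + coef0) ** degree


def fit_kernel_svm(X, y, kernel, C=10.0):
    """Hypothetical helper solving the dual with SLSQP:
    min 1/2 a^T Q a - sum(a),  Q_ij = y_i y_j K(x_i, x_j),
    s.t. 0 <= a_i <= C and sum_i a_i y_i = 0."""
    n = len(y)
    K = kernel(X, X)                      # Gram matrix, computed once per fit
    Q = np.outer(y, y) * K
    res = minimize(
        lambda a: 0.5 * a @ Q @ a - a.sum(),   # negated dual objective
        np.zeros(n),
        jac=lambda a: Q @ a - 1.0,
        method="SLSQP",
        bounds=[(0.0, C)] * n,
        constraints=[{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}],
        options={"maxiter": 500},
    )
    alpha = res.x
    sv = alpha > 1e-6                     # support vectors have non-zero alpha
    free = sv & (alpha < C - 1e-6)        # margin support vectors: 0 < alpha < C
    idx = free if free.any() else sv
    b = np.mean(y[idx] - (alpha * y) @ K[:, idx])
    return alpha, b, sv


def decision_function(X_train, y, alpha, b, kernel, X_new):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b for each row of X_new."""
    return (alpha * y) @ kernel(X_train, X_new) + b


def make_rings(n_per_ring=30, radii=(1.0, 3.0), noise=0.1, seed=0):
    """Hypothetical stand-in for datasets.py: inner ring -> -1, outer ring -> +1."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, r in zip((-1.0, 1.0), radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_ring)
        radius = r + noise * rng.standard_normal(n_per_ring)
        X.append(np.column_stack([radius * np.cos(theta), radius * np.sin(theta)]))
        y.append(np.full(n_per_ring, label))
    return np.vstack(X), np.concatenate(y)


X, y = make_rings()
alpha, b, sv = fit_kernel_svm(X, y, rbf_kernel, C=10.0)
accuracy = np.mean(np.sign(decision_function(X, y, alpha, b, rbf_kernel, X)) == y)
print(f"support vectors: {sv.sum()}, training accuracy: {accuracy:.2f}")
```

Passing poly_kernel instead of rbf_kernel swaps the non-linear transformation without touching the solver. That decoupling is what the Kernel Trick buys: only the Gram matrix changes.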
multiclass_classifier/ (Multiclass Architectures)

Because the standard SVM is a binary classifier, this module builds on dual_kernel to support $N$-class datasets using two standard strategies:
One-vs-All (OvA): trains $N$ binary models, each separating one class from all the others.

One-vs-One (OvO): trains $\frac{N(N-1)}{2}$ binary models, one per pair of classes, and predicts by majority vote.

The module also benchmarks execution time automatically (time.perf_counter) to compare the computational cost of both strategies.

⚙️ Shared Core (core/)

Following the DRY (Don't Repeat Yourself) principle, the project extracts all shared utilities into a central core/ package:
datasets.py: generates synthetic datasets (e.g., rings, blobs) for testing different scenarios.

kernels.py: contains the mathematical definitions of the Linear, Polynomial, and Gaussian (RBF) kernel functions.

visualization.py: standardized plotting functions that map data points, highlight support vectors, and draw non-linear decision boundaries.

🖼️ Assets (assets/)

When any of the main scripts is executed, the project automatically generates high-quality 2D plots and saves them, grouped by algorithm, in the assets/ directory as visual evidence of the mathematical implementations.
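Returning to multiclass_classifier/, the following sketch shows how OvA and OvO can wrap a binary kernel SVM and be timed with time.perf_counter. It is self-contained and illustrative. The following are all assumptions rather than the project's code:

- the names fit_binary, fit_ova, and fit_ovo;
- the four-blob dataset;
- the RBF parameters;
- the prediction rules (argmax of decision values for OvA, majority vote for OvO).

```python
import time
from itertools import combinations

import numpy as np
from scipy.optimize import minimize


def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_binary(X, y, C=10.0):
    """Hypothetical binary kernel SVM (dual + SLSQP); y in {-1, +1}.
    Returns a function that computes decision values f(x)."""
    K = rbf(X, X)
    Q = np.outer(y, y) * K
    res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(), np.zeros(len(y)),
                   jac=lambda a: Q @ a - 1.0, method="SLSQP",
                   bounds=[(0.0, C)] * len(y),
                   constraints=[{"type": "eq", "fun": lambda a: a @ y,
                                 "jac": lambda a: y}],
                   options={"maxiter": 500})
    alpha = res.x
    sv = alpha > 1e-6
    free = sv & (alpha < C - 1e-6)
    idx = free if free.any() else sv
    b = np.mean(y[idx] - (alpha * y) @ K[:, idx])
    return lambda X_new: (alpha * y) @ rbf(X, X_new) + b


def fit_ova(X, labels):
    """One-vs-All: one model per class; assumed rule: largest decision value wins."""
    classes = np.unique(labels)
    models = [fit_binary(X, np.where(labels == c, 1.0, -1.0)) for c in classes]

    def predict(X_new):
        scores = np.column_stack([f(X_new) for f in models])  # shape (m, N)
        return classes[np.argmax(scores, axis=1)]

    return predict, len(models)


def fit_ovo(X, labels):
    """One-vs-One: one model per pair of classes; predict by majority vote."""
    classes = np.unique(labels)
    models = []
    for i, j in combinations(range(len(classes)), 2):
        mask = (labels == classes[i]) | (labels == classes[j])
        y_pair = np.where(labels[mask] == classes[i], 1.0, -1.0)
        models.append((i, j, fit_binary(X[mask], y_pair)))

    def predict(X_new):
        votes = np.zeros((len(X_new), len(classes)), dtype=int)
        for i, j, f in models:
            wins_i = f(X_new) > 0
            votes[wins_i, i] += 1
            votes[~wins_i, j] += 1
        return classes[np.argmax(votes, axis=1)]

    return predict, len(models)


# Hypothetical data: four Gaussian blobs, 15 samples each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((15, 2)) for c in centers])
labels = np.repeat(np.arange(len(centers)), 15)

results = {}
for name, fit in (("OvA", fit_ova), ("OvO", fit_ovo)):
    start = time.perf_counter()
    predict, n_models = fit(X, labels)
    elapsed = time.perf_counter() - start
    accuracy = np.mean(predict(X) == labels)
    results[name] = {"n_models": n_models, "seconds": elapsed, "accuracy": accuracy}
    print(f"{name}: {n_models} models, {elapsed:.3f} s, training accuracy {accuracy:.2f}")
```

With $N = 4$ classes, OvA trains 4 models on all 60 samples each, while OvO trains 6 models on only 30 samples each. Since the size of each dual problem grows with the number of training samples, the extra OvO models are individually cheaper. That trade-off is why timing the two strategies side by side is informative.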