General kernel models are predictors of the form
The algorithm is based on Projected dual-preconditioned Stochastic Gradient Descent. A derivation is given in this paper
Title: Toward Large Kernel Models (2023)
Authors: Amirhesam Abedsoltan, Mikhail Belkin, Parthe Pandit.
TL;DR: Recent studies indicate kernel machines can perform similarly or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by their equivalence to wide neural networks. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines the model size is tied to the data size. EigenPro3 provides a way forward for constructing large-scale general kernel models that decouple the model and training data.
pip install git+https://github.com/EigenPro/EigenPro3.git
Tested on:
- pytorch
$\ge$ 1.13 (not installed along with this package) - CUDA
$\geq$ 11.6
Set an environment variable export DATA_DIR='/path/to/dataset/'
where cifar-10-batches-py
can be downloaded.
from torchvision.datasets import CIFAR10
import os
CIFAR10(os.environ['DATA_DIR'], train=True, download=True)
import torch
from eigenpro3.utils import accuracy, load_dataset
from eigenpro3.datasets import CustomDataset
from eigenpro3.models import KernelModel
from eigenpro3.kernels import laplacian, ntk_relu
p = 5000 # model size
if torch.cuda.is_available():
DEVICES = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
else:
DEVICES = [torch.device('cpu')]
kernel_fn = lambda x, z: laplacian(x, z, bandwidth=20.0)
# kernel_fn = lambda x, z: ntk_relu(x, z, depth=2)
n_classes, (X_train, y_train), (X_test, y_test) = load_dataset('cifar10')
centers = X_train[torch.randperm(X_train.shape[0])[:p]]
testloader = torch.utils.data.DataLoader(
CustomDataset(X_test, y_test.argmax(-1)), batch_size=512,
shuffle=False, pin_memory=True)
model = KernelModel(n_classes, centers, kernel_fn, X=X_train, y=y_train, devices=DEVICES)
model.fit(model.train_loaders, testloader, score_fn=accuracy, epochs=20)
If you want to train your kernel model 1 batch at a time time, you can use your own dataloader and call the fit_batch
method for the model
object.
Refer to the demo FashionMNIST_batched.ipynb where you can use your own dataloader. A similar method fit_epoch
is also provided.
EigenPro2 can only train models of the form
EigenPro3 trains a smaller model of size
EigenPro3 solves the optimization problem,
It applies a dual preconditioner, one for the model and one for the data. It applies a projected-preconditioned SGD