This repository explores various methods to generate heavy tails in the weight matrix spectrum of neural networks without the influence of gradient noise. Specifically, we train shallow neural networks with full-batch Gradient Descent (GD) or the Adam optimizer, using large learning rates over multiple steps.
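For intuition, here is a minimal, self-contained sketch of this setup, assuming a PyTorch-style two-layer model and synthetic data; it is not the repository's training code, and the dimensions and learning rate are illustrative:

```python
# Minimal sketch (not the repository's code): train a two-layer network with
# full-batch gradient descent at a large learning rate on synthetic data, then
# inspect the first-layer singular values. Sizes and the learning rate are
# illustrative and may need tuning to avoid divergence.
import torch

torch.manual_seed(0)
n, d, k = 256, 128, 64                          # samples, input dim, hidden width
X = torch.randn(n, d)
y = torch.randn(n, 1)

model = torch.nn.Sequential(
    torch.nn.Linear(d, k, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(k, 1, bias=False),
)
opt = torch.optim.SGD(model.parameters(), lr=2.0)   # deliberately large step size
loss_fn = torch.nn.MSELoss()

for step in range(200):                         # every step sees the full batch: no gradient noise
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

W = model[0].weight.detach()                    # first-layer weights after training
print(torch.linalg.svdvals(W)[:5])              # largest singular values
```

Because every step uses the entire dataset, any heavy-tailed structure that emerges in the weight spectrum cannot be attributed to mini-batch gradient noise.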
To get started, set up your virtual environment and install the required dependencies:
$ python3.9 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
Investigate the properties of weights, features, overlap matrices, and more for a single configuration:
(.venv) $ python main.py configs/main.yml
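The "PL alpha" reported by the analysis refers to a power-law exponent fit to the tail of the weight-matrix eigenvalue spectrum. The sketch below shows one common way to estimate such an exponent (a Hill-type maximum-likelihood fit) on a random stand-in matrix; the tail fraction and estimator choice are assumptions, not the script's own analysis code:

```python
# Hypothetical post-hoc check (not main.py's own analysis): estimate a power-law
# exponent for the eigenvalues of W^T W with a simple Hill-type maximum-likelihood
# fit. The tail fraction below is an arbitrary choice.
import numpy as np

def pl_alpha(W: np.ndarray, tail_frac: float = 0.2) -> float:
    eigs = np.sort(np.linalg.eigvalsh(W.T @ W))[::-1]   # eigenvalues, largest first
    tail = eigs[: max(2, int(tail_frac * len(eigs)))]   # keep only the upper tail
    x_min = tail[-1]
    return 1.0 + len(tail) / np.sum(np.log(tail / x_min))

W = np.random.randn(128, 64)                             # stand-in for a trained weight matrix
print(f"estimated PL alpha: {pl_alpha(W):.2f}")
```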
To run with a learning rate schedule:
(.venv) $ python main.py configs/main_lr_schedule.yml
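The schedule itself is specified in the config file. As a rough illustration of the kind of stepwise decay such a schedule might apply, here is a PyTorch StepLR sketch; the initial rate, step size, and gamma are placeholders rather than values from configs/main_lr_schedule.yml:

```python
# Illustrative stepwise learning-rate decay with PyTorch's StepLR; all values
# are placeholders, not taken from the repository's configs.
import torch

params = [torch.nn.Parameter(torch.randn(8, 8))]
opt = torch.optim.SGD(params, lr=2.0)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.5)  # halve the lr every 50 steps

for step in range(150):
    opt.zero_grad()
    (params[0] ** 2).sum().backward()           # toy quadratic objective
    opt.step()
    sched.step()
    if step % 50 == 0:
        print(step, sched.get_last_lr())
```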
Conduct experiments with multiple runs to plot losses, Kernel Target Alignment (KTA), and Power-Law (PL) alphas for different learning rates and optimizers:
(.venv) $ python bulk_lr.py configs/bulk_lr.yml
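Kernel Target Alignment measures how well a Gram matrix of learned features aligns with the target kernel yy^T. Below is a minimal reference computation on stand-in data; the feature and label shapes are assumptions, and this is not bulk_lr.py's implementation:

```python
# Illustrative Kernel Target Alignment (KTA): the cosine similarity (in the
# Frobenius inner product) between a feature Gram matrix and the target kernel.
import numpy as np

def kta(features: np.ndarray, y: np.ndarray) -> float:
    K = features @ features.T                   # Gram matrix of the features
    Y = np.outer(y, y)                          # target kernel y y^T
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

feats = np.random.randn(256, 64)                # stand-in for hidden-layer features
labels = np.random.choice([-1.0, 1.0], size=256)
print(f"KTA: {kta(feats, labels):.3f}")
```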
Perform experiments with multiple runs to plot the losses for different parameter settings:
(.venv) $ python bulk_losses.py configs/bulk_losses_vary_n.yml
(.venv) $ python bulk_losses.py configs/bulk_losses_vary_reg_lambda.yml
(.venv) $ python bulk_losses.py configs/bulk_losses_vary_label_noise_std.yml
(.venv) $ python bulk_losses.py configs/bulk_losses_vary_step_lr_gamma.yml
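These configs vary, respectively, the training-set size, the regularization strength, the label-noise level, and the step-LR decay factor. As a loose illustration of what the middle two knobs typically control (an assumption about the configs, not code from this repository):

```python
# Hypothetical illustration only: here reg_lambda is assumed to be an L2 penalty
# weight and label_noise_std the standard deviation of additive label noise.
import numpy as np

rng = np.random.default_rng(0)
y_clean = rng.standard_normal(256)
label_noise_std = 0.1
y_noisy = y_clean + label_noise_std * rng.standard_normal(y_clean.shape)

W = rng.standard_normal((64, 32))               # stand-in weight matrix
residuals = rng.standard_normal(256)            # stand-in prediction errors
reg_lambda = 1e-3
loss = np.mean(residuals ** 2) + reg_lambda * np.sum(W ** 2)
print(f"regularized loss: {loss:.3f}")
```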
The outputs of the experiments are stored in the out/ directory, with each run named according to a hash of its experiment context.
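A hash-based naming scheme of this general kind might look like the following (a hypothetical sketch, not the repository's exact hashing code):

```python
# Hypothetical sketch of hash-based output naming: derive a short, stable
# directory name from the experiment configuration. Not the repository's scheme.
import hashlib
import json

config = {"optimizer": "GD", "lr": 2.0, "steps": 200}    # example experiment context
digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:8]
print(f"out/{digest}")                                   # prints something like out/<8 hex chars>
```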
If you find this work useful, please cite:

@misc{kothapalli2024crafting,
      title={Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise},
      author={Vignesh Kothapalli and Tianyu Pang and Shenyang Deng and Zongmin Liu and Yaoqing Yang},
      year={2024},
      eprint={2406.04657},
      archivePrefix={arXiv},
}