Reference implementation for the paper "Relative gradient optimization of the Jacobian term in unsupervised deep learning".
A commented implementation of a training example over the MNIST dataset is found in the following notebooks (JAX and PyTorch versions):
To reproduce the results reported in the paper, see below.
To install the required Python packages:
pip install -r requirements.txt
MNIST and CIFAR are automatically downloaded when needed.
To download and set up the needed UCI datasets, follow the instructions in and put the individual dataset folders inside experiments/storage.
To train the model in the paper, run the command:
$ cd experiments
$ python <args>
Here is the full list of optional arguments:
--epochs number of epochs of training
--batch_size size of the batches
--lr learning rate for the Adam optimizer
--log_every interval between logs
--seed random generator seed
--dataset one among: MNIST | CIFAR | POWER | GAS | HEPMASS | MINIBOONE | BSDS300 | TOY
--toy_name one among: sine | moons | smile | 2spirals | checkerboard | rings | trimodal
--num_layers define a model with `num_layers - 1` linear layers and 1 final affine layer
--nonlinearity nonlinear activation function
--alpha slope of the left (negative-input) side of ReLU-type activation functions
--look_ahead stop training if no improvement has been observed for `look_ahead` epochs
--log_dir directory in which to save the model (ending in '/')
--bias whether to include the bias in the linear layers or not
--trick whether to apply the relative gradient trick to the gradients or not
--generation whether to perform data generation during training
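As a rough illustration of what the `--trick` option refers to, the sketch below compares an ordinary gradient with a relative gradient for a single square linear layer. It is not the repository's code: it assumes a loss of the form f(Wx) - log|det W|, uses a 2x2 matrix so the inverse can be written by hand, and every helper name (`ordinary_gradient`, `relative_gradient`, and so on) is hypothetical. The point is that right-multiplying the ordinary gradient by W^T W removes the matrix inverse that the log-determinant term would otherwise require.

```python
# Minimal pure-Python sketch (assumption: one square linear layer z = W x,
# loss L(W) = f(W x) - log|det W|, with delta = df/dz already backpropagated).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ordinary_gradient(W, x, delta):
    # dL/dW = delta x^T - W^{-T}; the log-determinant term needs an inverse.
    return subtract(outer(delta, x), transpose(inverse_2x2(W)))

def relative_gradient(W, x, delta):
    # (delta x^T - W^{-T}) W^T W = delta (W x)^T W - W; no inverse needed.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return subtract(matmul(outer(delta, z), W), W)

# Hypothetical toy values, chosen only so that W is invertible.
W = [[2.0, 0.5], [-1.0, 1.5]]
x = [1.0, -2.0]
delta = [0.3, -0.7]

G = ordinary_gradient(W, x, delta)
R = relative_gradient(W, x, delta)

# Check that the inverse-free expression matches G W^T W.
GWtW = matmul(matmul(G, transpose(W)), W)
max_diff = max(abs(p - q) for rp, rq in zip(GWtW, R) for p, q in zip(rp, rq))
print("largest entrywise difference:", max_diff)
```

In this sketch the two expressions agree up to floating-point rounding, while `relative_gradient` only uses matrix products.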
To train the model over the MNIST dataset specifying a set of hyperparameters, run:
$ python --dataset MNIST \
    --log_dir mnist_run1/ \
    --batch_size 10 \
    --lr 1e-4 \
    --bias True \
    --num_layers 2
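The command above passes `--bias True` as text. The sketch below shows one way such flags could be parsed with `argparse`; it is an assumption about how a parser for these options might look, not the repository's actual parser. The defaults and the `str2bool` helper are hypothetical. The explicit converter matters because `type=bool` would turn any non-empty string, including "False", into `True`.

```python
import argparse

def str2bool(value):
    # Hypothetical helper: converts textual booleans explicitly, since
    # bool("False") is True in Python.
    text = value.lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

def build_parser():
    # Only the flags used in the MNIST example; defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", type=str, default="MNIST")
    parser.add_argument("--log_dir", type=str, default="run/")
    parser.add_argument("--batch_size", type=int, default=10)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--bias", type=str2bool, default=True)
    parser.add_argument("--num_layers", type=int, default=2)
    return parser

# Parse the same flags as in the MNIST example, given as an explicit list.
args = build_parser().parse_args([
    "--dataset", "MNIST", "--log_dir", "mnist_run1/", "--batch_size", "10",
    "--lr", "1e-4", "--bias", "True", "--num_layers", "2",
])
print(args)
```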
To train over the sine toy distribution:
$ python --dataset TOY --toy_name sine
To evaluate a trained model, run:
$ cd experiments
$ python --dataset MNIST --log_dir mnist_run1/
Test set evaluation: -1375.2 +- 1.4
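When collecting results from several runs, the evaluation line can be read back programmatically. The sketch below assumes the output keeps exactly the format shown above; the function name `parse_evaluation` and the regular expression are illustrative, not part of the repository.

```python
import re

# Matches lines such as "Test set evaluation: -1375.2 +- 1.4".
PATTERN = re.compile(
    r"Test set evaluation:\s*(-?\d+(?:\.\d+)?)\s*\+-\s*(\d+(?:\.\d+)?)"
)

def parse_evaluation(line):
    # Returns (value, uncertainty) as floats, or raises if the line does not match.
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized evaluation line: {line!r}")
    return float(match.group(1)), float(match.group(2))

value, uncertainty = parse_evaluation("Test set evaluation: -1375.2 +- 1.4")
print(value, uncertainty)
```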
Our model achieves the following performance for the density estimation task on the tested datasets:
| POWER | GAS | HEPMASS | MINIBOONE | BSDS300 | MNIST |
|---|---|---|---|---|---|
| 0.065 +- 0.013 | 6.978 +- 0.020 | -21.958 +- 0.019 | -13.372 +- 0.450 | 151.12 +- 0.28 | -1375.2 +- 1.4 |