
CoolMomentum Optimizer for Deep Neural Networks

Stochastic Optimization by Langevin Dynamics with Simulated Annealing

This repository contains TensorFlow and PyTorch implementations of CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing (published in Scientific Reports).

Usage

In TensorFlow:

from coolmomentum_tf import Coolmomentum

# CoolMomentum used as a drop-in Keras optimizer
opt = Coolmomentum(learning_rate=0.01, rho_0=0.99, alpha=0.99997)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

• learning_rate is the squared timestep "dt^2". Default: learning_rate=0.01.
• rho_0 is the initial value of the momentum coefficient. Default: rho_0=0.99.
• alpha is the cooling rate, a Simulated Annealing parameter, calculated as alpha=(1-rho_0)^(1/S), where S is the total number of iterations (see the sketch below). If alpha=1 the momentum coefficient stays constant and Simulated Annealing is not applied, so the optimizer behaves like plain Momentum.
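
For illustration, the cooling rate can be computed directly from this formula. The values of rho_0 and S below are examples only, not prescribed settings:

# Computing the cooling rate alpha from rho_0 and the total number of
# iterations S (example values, not prescribed settings).
rho_0 = 0.99
S = 150_000                      # total training iterations for a given run
alpha = (1 - rho_0) ** (1 / S)   # alpha = (1 - rho_0)^(1/S)
print(round(alpha, 8))           # ≈ 0.9999693, close to the default alpha=0.99997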

Benchmarking CoolMomentum on CIFAR-10 with ResNet-34

Requirements: Python 3.6+, PyTorch 1.3+, tqdm

The benchmarking was done by modifying this code by Istvan Fehervari and running:

• python train_cifar10.py
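
In such a script, CoolMomentum's PyTorch optimizer is used as a drop-in replacement for torch.optim.SGD. A minimal sketch is shown below; the model and hyperparameter values are illustrative placeholders, not the exact benchmark settings:

from coolmom_pytorch import SGD   # CoolMomentum's PyTorch optimizer

# model stands for the network built by the training script (here ResNet-34);
# lr, momentum and beta are illustrative values only.
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.99, beta=0.99997)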

The results obtained are compared against those calculated by Istvan Fehervari for other popular optimizers:

[Figures: train loss, train accuracy, and test accuracy]

Benchmarking CoolMomentum on ImageNet with EfficientNet-B0

The benchmarking was done by modifying this code by the TensorFlow team and running:

python3 main_cool.py \
  --tpu=${TPU_NAME} \
  --data_dir=${DATA_DIR} \
  --model_dir=${MODEL_DIR} \
  --model_name='efficientnet-b0' \
  --skip_host_call=true \
  --train_batch_size=2048 \
  --train_steps=218948

The result obtained is compared to that of the original code by the TensorFlow team:

[Figure: top accuracies]

Comparison with Adam and SGD optimizers on CIFAR-10 with ResNet-20

Requirements: Python 3.6+, TensorFlow 2.x

The comparison was done by modifying this Keras example and running the scripts below:

• python resnet_adam200.py
• python resnet_sgd200.py
• python resnet_cool200.py
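
The three scripts differ mainly in which optimizer is passed to model.compile. A rough sketch is shown here; the hyperparameter values are illustrative, not the exact settings used in the comparison:

# model is the ResNet-20 built by the Keras example; hyperparameters are illustrative.
from tensorflow.keras.optimizers import Adam, SGD
from coolmomentum_tf import Coolmomentum

optimizer = Adam(learning_rate=0.001)                                       # resnet_adam200.py
# optimizer = SGD(learning_rate=0.01, momentum=0.9)                         # resnet_sgd200.py
# optimizer = Coolmomentum(learning_rate=0.01, rho_0=0.99, alpha=0.99997)   # resnet_cool200.py

model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])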

[Figures: training results]

Rescaled temperature:

[Figure: rescaled temperature during training]

Comparison with SGD optimizer on the Penn Treebank dataset with LSTM

Requirements: Python 3.6+, PyTorch 1.3+

In PyTorch, the comparison was done using this code, by running:

python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500

For a fair comparison of SGD and CoolMomentum, the ASGD optimizer was not used.

SGD was replaced by CoolMomentum as follows:

from coolmom_pytorch import SGD

# CoolMomentum's optimizer used as a drop-in replacement for torch.optim.SGD
optimizer = SGD(params, lr=0.1, momentum=0.99, weight_decay=args.wdecay, beta=0.9999998018)
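
The optimizer is then driven exactly like torch.optim.SGD in the training loop. A minimal sketch of one training step, with model, criterion, inputs and targets standing in for the script's own objects:

optimizer.zero_grad()                        # clear accumulated gradients
loss = criterion(model(inputs), targets)     # forward pass (placeholder objects)
loss.backward()                              # backpropagate
optimizer.step()                             # CoolMomentum parameter update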

[Figure: training results]

References:

Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. "Optimization by simulated annealing." Science 220 (1983): 671-680.

Ma, Y.-A., Chen, Y., Jin, C., Flammarion, N., & Jordan, M. I. "Sampling can be faster than optimization." Proceedings of the National Academy of Sciences 116 (2019): 20881-20885.
