This repository contains TensorFlow and PyTorch implementations of CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing (published in Scientific Reports).
In TensorFlow:
from coolmomentum_tf import Coolmomentum
opt = Coolmomentum(learning_rate=0.01, rho_0=0.99, alpha=0.99997)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
• learning_rate is the squared timestep "dt^2". Default: learning_rate=0.01.
• rho_0 is the initial value of the momentum coefficient. Default: rho_0=0.99.
• alpha is the cooling rate, a Simulated Annealing parameter, calculated as alpha=(1-rho_0)^(1/S),
where S is the total number of iterations (see the sketch after this list). If alpha=1, the momentum
coefficient stays constant, Simulated Annealing is not applied, and the optimizer behaves like plain Momentum.
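A minimal sketch of choosing alpha for a planned run; the step counts and the toy model below are assumptions for illustration, not values from the paper:

import tensorflow as tf
from coolmomentum_tf import Coolmomentum

rho_0 = 0.99
steps_per_epoch = 390          # assumed: CIFAR-10, batch size 128
epochs = 150                   # assumed training length
S = steps_per_epoch * epochs   # total number of iterations
alpha = (1.0 - rho_0) ** (1.0 / S)

model = tf.keras.Sequential([  # assumed toy model
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
opt = Coolmomentum(learning_rate=0.01, rho_0=rho_0, alpha=alpha)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])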
Requirements: Python 3.6+, PyTorch 1.3+, tqdm
The benchmarking was done by modifying this code by Istvan Fehervari and running:
• python train_cifar10.py
The results obtained are compared against those calculated by Istvan Fehervari for other popular optimizers:
The benchmarking was done by modifying this code by the TensorFlow team and running:
python3 main_cool.py \
  --tpu=${TPU_NAME} \
  --data_dir=${DATA_DIR} \
  --model_dir=${MODEL_DIR} \
  --model_name='efficientnet-b0' \
  --skip_host_call=true \
  --train_batch_size=2048 \
  --train_steps=218948
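For illustration, applying the formula from the parameter list above to these settings (assuming rho_0=0.99 and S equal to the 218948 train steps) gives:

rho_0 = 0.99
S = 218948                       # train_steps from the command above
alpha = (1 - rho_0) ** (1 / S)   # approximately 0.999979
print(alpha)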
The result obtained is compared to that of the original code by the TensorFlow team:
Requirements: Python 3.6+, Tensorflow 2.x
The comparison was done by modifying this Keras example and running:
• python resnet_adam200.py
• python resnet_sgd200.py
• python resnet_cool200.py
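The three scripts presumably differ mainly in the optimizer passed to model.compile; a hedged sketch of those choices (the optimizer settings here are assumptions for illustration, not read from the actual scripts):

from tensorflow.keras.optimizers import Adam, SGD
from coolmomentum_tf import Coolmomentum

opt_adam = Adam(learning_rate=0.001)                                    # resnet_adam200.py (assumed)
opt_sgd = SGD(learning_rate=0.01, momentum=0.9)                         # resnet_sgd200.py (assumed)
opt_cool = Coolmomentum(learning_rate=0.01, rho_0=0.99, alpha=0.99997)  # resnet_cool200.py (assumed)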
Rescaled temperature:
Requirements: Python 3.6+, PyTorch 1.3+
In PyTorch:
The comparison was done using this code, by running:
python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500
For a fair comparison of SGD and CoolMomentum, the ASGD optimizer was not used.
SGD was replaced by CoolMomentum with the commands
from coolmom_pytorch import SGD
optimizer = SGD(params, lr=0.1, momentum=0.99, weight_decay=args.wdecay, beta=0.9999998018)
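A minimal sketch of a training step with this optimizer; the toy model, dummy data, and weight decay value are assumptions (in the actual comparison, args.wdecay comes from the script's command-line arguments):

import torch
from coolmom_pytorch import SGD

model = torch.nn.Linear(10, 2)             # assumed toy model
criterion = torch.nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.99,
                weight_decay=1.2e-6, beta=0.9999998018)  # weight decay assumed

x = torch.randn(20, 10)                    # assumed dummy batch
y = torch.randint(0, 2, (20,))
for step in range(100):                    # standard PyTorch training loop
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()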
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671-680 (1983).
Ma, Y.-A., Chen, Y., Jin, C., Flammarion, N. & Jordan, M. I. Sampling can be faster than optimization. Proc. Natl. Acad. Sci. USA 116, 20881-20885 (2019).