This repository contains TensorFlow and PyTorch implementations of CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing (published in Scientific Reports).
In TensorFlow:
from coolmomentum_tf import Coolmomentum
opt = Coolmomentum(learning_rate=0.01, rho_0=0.99, alpha=0.99997)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
• learning_rate is the squared timestep "dt^2". Default: learning_rate=0.01.
• rho_0 is the initial value of the momentum coefficient. Default: rho_0=0.99.
• alpha is the cooling rate, a Simulated Annealing parameter, calculated as alpha=(1-rho_0)^(1/S),
where S is the total number of iterations (see the sketch after this list). If alpha=1, the momentum
coefficient stays constant, Simulated Annealing is not applied, and the optimizer behaves like plain Momentum.
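A minimal sketch of choosing alpha for a planned run; the step counts and the toy model below are assumptions for illustration, not values from the paper:

import tensorflow as tf
from coolmomentum_tf import Coolmomentum

rho_0 = 0.99
steps_per_epoch = 390          # assumed: CIFAR-10, batch size 128
epochs = 150                   # assumed training length
S = steps_per_epoch * epochs   # total number of iterations
alpha = (1.0 - rho_0) ** (1.0 / S)

model = tf.keras.Sequential([  # assumed toy model
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
opt = Coolmomentum(learning_rate=0.01, rho_0=rho_0, alpha=alpha)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])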
Requirements: Python 3.6+, PyTorch 1.3+, tqdm
The benchmarking was done by modifying this code by Istvan Fehervari and running:
• python train_cifar10.py
The results obtained are compared against those calculated by Istvan Fehervari for other popular optimizers:
The benchmarking was done by modifying this code by the TensorFlow team and running:
python3 main_cool.py \
  --tpu=${TPU_NAME} \
  --data_dir=${DATA_DIR} \
  --model_dir=${MODEL_DIR} \
  --model_name='efficientnet-b0' \
  --skip_host_call=true \
  --train_batch_size=2048 \
  --train_steps=218948
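For illustration, applying the formula from the parameter list above to these settings (assuming rho_0=0.99 and S equal to the 218948 train steps) gives:

rho_0 = 0.99
S = 218948                       # train_steps from the command above
alpha = (1 - rho_0) ** (1 / S)   # approximately 0.999979
print(alpha)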
The result obtained is compared to that of the original code by the TensorFlow team:
Requirements: Python 3.6+, Tensorflow 2.x
The comparison was done by modifying this Keras example and running:
• python resnet_adam200.py
• python resnet_sgd200.py
• python resnet_cool200.py
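The three scripts presumably differ mainly in the optimizer passed to model.compile; a hedged sketch of those choices (the optimizer settings here are assumptions for illustration, not read from the actual scripts):

from tensorflow.keras.optimizers import Adam, SGD
from coolmomentum_tf import Coolmomentum

opt_adam = Adam(learning_rate=0.001)                                    # resnet_adam200.py (assumed)
opt_sgd = SGD(learning_rate=0.01, momentum=0.9)                         # resnet_sgd200.py (assumed)
opt_cool = Coolmomentum(learning_rate=0.01, rho_0=0.99, alpha=0.99997)  # resnet_cool200.py (assumed)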
Rescaled temperature:
Requirements: Python 3.6+, PyTorch 1.3+
In PyTorch:
The comparison was done using this code, by running:
python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500
For a fair comparison of SGD and CoolMomentum, the ASGD optimizer was not used.
SGD was replaced by CoolMomentum with the commands
from coolmom_pytorch import SGD
optimizer = SGD(params, lr=0.1, momentum=0.99, weight_decay=args.wdecay, beta=0.9999998018)
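A minimal sketch of a training step with this optimizer; the toy model, dummy data, and weight decay value are assumptions (in the actual comparison, args.wdecay comes from the script's command-line arguments):

import torch
from coolmom_pytorch import SGD

model = torch.nn.Linear(10, 2)             # assumed toy model
criterion = torch.nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.99,
                weight_decay=1.2e-6, beta=0.9999998018)  # weight decay assumed

x = torch.randn(20, 10)                    # assumed dummy batch
y = torch.randint(0, 2, (20,))
for step in range(100):                    # standard PyTorch training loop
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()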
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671-680 (1983).
Ma, Y.-A., Chen, Y., Jin, C., Flammarion, N. & Jordan, M. I. Sampling can be faster than optimization. Proc. Natl. Acad. Sci. USA 116, 20881-20885 (2019).