Pytorch implementation of Neural Combinatorial Optimization with Reinforcement Learning
The neural network consists in LSTM encoder decoder with an attention module connecting the decoder to the encoder. The model is trained by Policy Gradient.Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city? Wikipedia
- pytorch
- matplotlib
- or-gym
pip install or-gym
This repository is tested ...
- Windows 10
- Python 3.6
- Pytorch 1.7.1
We did not use the reward function in the paper.
We used the sum of rewards from the emulator. but the emulator doesn't go back to the starting city.
After the game, the distance from the last city and the starting city is added to the total reward.
python tsp_trainer.py --mode active-search
python tsp_trainer.py --mode actor-critic
python tsp_tester.py --seq_len 20 --actor_dir ./result/actor.pth
--lr 3e-4
--embedding_size 128
--hidden_size 128
--grad_clip 1.5
--decay 0.01
--n_glimpses 2
--tanh_exploration 10
--beta 0.99
--episode 1000
--seq_len10
--mode active-search
--result_dir ./result
--actor_dir ./result/actor.pth #use tsp_tester
- RL (Actor-critic)
- Pointer Network