This repository contains a PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3), a reinforcement learning algorithm for continuous control tasks. TD3 builds on Deep Deterministic Policy Gradients (DDPG) with three changes aimed at stability and performance. The primary motivation is to mitigate the overestimation bias in Q-learning, which can lead to suboptimal policies. To that end, TD3 trains a pair of critic networks and uses the smaller of their two estimates when forming the learning target. TD3 also delays policy updates, updating the actor less often than the critics, which reduces the variance in policy updates and makes learning more robust. Finally, target policy smoothing adds clipped noise to the target action, which makes it harder for the policy to exploit errors in the approximated Q-function.
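The three ideas above can be sketched on single scalar values, so the arithmetic is easy to check by hand. This is a framework-free illustration, not the code in this repository: the function names are hypothetical, and the defaults (`policy_noise=0.2`, `noise_clip=0.5`, `policy_delay=2`) are the values commonly used with TD3, assumed here rather than read from this repo's configuration.

```python
import random


def clip(x, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(target_action, max_action=1.0,
                           policy_noise=0.2, noise_clip=0.5, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result back into the action range."""
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    return clip(target_action + noise, -max_action, max_action)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target critics' estimates, and do not bootstrap on terminal steps."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: update the actor (and the target networks)
    only once every `policy_delay` critic updates."""
    return step % policy_delay == 0


# Example: the pessimistic critic estimate (3.0) is the one that is used.
target = td3_target(reward=1.0, done=False, q1_next=5.0, q2_next=3.0, gamma=0.9)
```

In a batched PyTorch version the same steps would typically be written with tensor operations such as `torch.min` and `torch.clamp`, with the target computed without gradient tracking.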
🤔 It kind of seems like the catastrophic drops in average score are occurring at regular intervals... could this be a function of the parameter updates?
I'm also not convinced I'm handling the actions correctly for envs with action bounds |x| > 1.
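On the action-bounds question, one common approach is to let the actor output values in [-1, 1] (for example through a `tanh` layer) and rescale them per dimension to the environment's bounds. The sketch below assumes that setup; `scale_action` and `clip_action` are hypothetical helpers, not functions from this repository. With Gymnasium, the bounds would normally come from `env.action_space.low` and `env.action_space.high`.

```python
def scale_action(raw_action, low, high):
    """Map each component of an action in [-1, 1] linearly onto the
    environment's bounds [low, high], one dimension at a time."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo)
            for a, lo, hi in zip(raw_action, low, high)]


def clip_action(action, low, high):
    """Clip an action (e.g. after adding exploration noise) to the bounds."""
    return [max(lo, min(hi, a)) for a, lo, hi in zip(action, low, high)]


# Pendulum-v1 accepts torques in [-2, 2], so a raw output of 0.5 maps to 1.0.
scaled = scale_action([0.5], low=[-2.0], high=[2.0])
```

If the actor is scaled this way, exploration noise and target-smoothing noise probably need to be expressed in the same units and clipped to the same bounds; a mismatch there is one possible source of trouble in environments where |x| > 1.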
Install the required dependencies using the following command:
pip install -r requirements.txt
You can run the algorithm on any supported Gymnasium environment. For example:
python main.py --env 'LunarLanderContinuous-v2'
No hyperparameter tuning was conducted for the individual environments. This was an intentional choice, made to see how well the algorithm generalizes across tasks. As a result, the agent learned successfully in some environments and was still training after 10,000 epochs in others.
- Pendulum-v1
- LunarLanderContinuous-v2
- MountainCarContinuous-v0
- BipedalWalker-v3
- Hopper-v4
- Humanoid-v4
- Ant-v4
- HalfCheetah-v4
- HumanoidStandup-v4
- InvertedDoublePendulum-v4
- InvertedPendulum-v4
- Pusher-v4
- Reacher-v4
- Swimmer-v3
- Walker2d-v4
Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.