Release v2.5.0
New Algorithm
Cal-QL has been added to d3rlpy in v2.5.0! Please check the reproduction script here. To support faithful reproduction, `SparseRewardTransitionPicker` has also been added and is used in the reproduction script.
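For a rough idea of what Cal-QL training looks like, here is a minimal sketch. It assumes an AntMaze D4RL dataset as in the reproduction script, that the algorithm is exposed as `d3rlpy.algos.CalQLConfig`, and default hyperparameters; `SparseRewardTransitionPicker` would be supplied as the transition picker of the dataset or replay buffer, with its exact arguments shown in the reproduction script.

```python
import d3rlpy

# Minimal sketch, assuming an AntMaze task as in the reproduction script.
# SparseRewardTransitionPicker would be passed as the transition_picker of the
# dataset/replay buffer; its constructor arguments are omitted here, so see
# the reproduction script for the exact setup.
dataset, env = d3rlpy.datasets.get_d4rl("antmaze-umaze-v2")

# offline pretraining with Cal-QL (assumption: class name and hyperparameters)
calql = d3rlpy.algos.CalQLConfig().create(device="cpu")
calql.fit(dataset, n_steps=100000, n_steps_per_epoch=1000)
```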
Custom Algorithm Example
One of the most frequent questions is "How can I implement a custom algorithm on top of d3rlpy?". A new example script has been added to answer this question. Based on this example, you can build your own algorithm while still utilizing the whole training pipeline provided by d3rlpy. Please check the script here.
Enhancement
- Exporting Decision Transformer models as TorchScript and ONNX has been implemented. You can use this feature via the `save_policy` method in the same way as you do with Q-learning algorithms (see the sketch after this list).
- Tuple observation support has been added to PyTorch/ONNX export.
- Return-to-go calculation has been modified for Q-learning algorithms and is now skipped when return-to-go is not necessary.
- `n_updates` option has been added to the `fit_online` method to control the update-to-data (UTD) ratio (see the sketch after this list).
- `write_at_termination` option has been added to `ReplayBuffer`.
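As a rough illustration of the new Decision Transformer export, here is a minimal sketch. It assumes the built-in cartpole dataset, the `DiscreteDecisionTransformerConfig` class name, and that, as with Q-learning algorithms, the file extension selects TorchScript vs. ONNX.

```python
import d3rlpy

# minimal sketch: assumes the built-in cartpole dataset and default settings
dataset, env = d3rlpy.datasets.get_cartpole()

dt = d3rlpy.algos.DiscreteDecisionTransformerConfig().create(device="cpu")

# build the model from the dataset (or train it with fit()) before exporting
dt.build_with_dataset(dataset)

# the file extension selects the export format, as with Q-learning algorithms
dt.save_policy("policy.pt")    # TorchScript
dt.save_policy("policy.onnx")  # ONNX
```

The new online-training options can be combined as in the following sketch. It assumes a Pendulum environment with SAC, that `n_updates` and `write_at_termination` are plain keyword arguments of `fit_online` and `ReplayBuffer` respectively, and arbitrary hyperparameters.

```python
import gymnasium as gym
import d3rlpy

env = gym.make("Pendulum-v1")

sac = d3rlpy.algos.SACConfig().create(device="cpu")

# replay buffer that writes an episode only once it terminates
# (assumption: write_at_termination is accepted by the ReplayBuffer constructor)
buffer = d3rlpy.dataset.ReplayBuffer(
    d3rlpy.dataset.FIFOBuffer(limit=100000),
    env=env,
    write_at_termination=True,
)

# n_updates controls the update-to-data (UTD) ratio together with
# update_interval: here, 4 gradient steps per environment step
sac.fit_online(
    env,
    buffer,
    n_steps=10000,
    n_steps_per_epoch=1000,
    update_interval=1,
    n_updates=4,
)
```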
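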
Bugfix
- Action scaling has been fixed for D4RL datasets.
- Default replay buffer creation in the `fit_online` method has been fixed.