Release v2.3.0
Distributed data parallel training
Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now it's finally available, and it's extremely easy to use.
Example:
```python
# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
```
You need to use the torchrun command to start training; it should already be installed once you install PyTorch.
```
$ torchrun \
   --nnodes=1 \
   --nproc_per_node=3 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   train.py
```
In this case, 3 processes will be launched and run the training loop in parallel. DecisionTransformer-based algorithms also support this distributed training feature. The example is also available here.
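As a rough sketch (not an official snippet from this release), a DecisionTransformer-based setup would follow the same pattern. The evaluation-related arguments of its fit() are omitted here, and passing enable_ddp to it is an assumption based on the statement above that DecisionTransformer-based algorithms support distributed training:

```python
# dt_train.py -- minimal sketch, assuming the same process-group helpers and
# enable_ddp flag shown in the CQL example above.
import d3rlpy


def main() -> None:
    rank = d3rlpy.distributed.init_process_group("gloo")  # "nccl" for GPUs

    dataset, _ = d3rlpy.datasets.get_pendulum()

    dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cpu:0")

    dt.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        show_progress=rank == 0,
        enable_ddp=True,  # assumed to be accepted by DT-based fit() as well
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
```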
Minari support (thanks, @grahamannett!)
Minari is an OSS library that provides a standard format for offline reinforcement learning datasets. Now, d3rlpy provides easy access to this library.
You can install Minari via the d3rlpy CLI.
```
$ d3rlpy install minari
```
Example:
```python
import d3rlpy

dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```
Minimize redundant computation
Starting with this version, the computation in some algorithms has been optimized to remove redundant inference. As a result, algorithms with dual optimization, such as SAC and CQL, are significantly faster than in the previous version.
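The snippet below is not d3rlpy's actual code, but it illustrates the general idea behind this kind of optimization: in dual-optimization algorithms the actor loss and the temperature loss both need the policy's sampled action and log-probability, so running the policy once and reusing the result avoids a repeated forward pass. The function and variable names are hypothetical.

```python
import torch


def compute_actor_and_alpha_losses(policy, q_func, log_alpha, obs):
    # before: the actor loss and the temperature loss each re-ran policy(obs),
    # i.e. two separate forward passes per update step.
    # after: run the policy once and reuse the sampled action and log-prob.
    action, log_prob = policy(obs)          # single shared inference
    q_value = q_func(obs, action)

    alpha = log_alpha.exp().detach()
    actor_loss = (alpha * log_prob - q_value).mean()

    # the temperature loss reuses log_prob instead of sampling again
    target_entropy = -float(action.shape[-1])
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()

    return actor_loss, alpha_loss
```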
Enhancements
- `GoalConcatWrapper` has been added to support goal-conditioned environments.
- `return_to_go` has been added to `Transition` and `TransitionMiniBatch`.
- `MixedReplayBuffer` has been added to sample experiences from two buffers with an arbitrary ratio (see the sketch below).
- `initial_temperature` now supports 0 in `DiscreteSAC`.
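The following is a hedged sketch of how `MixedReplayBuffer` might be used to mix an offline dataset with newly collected experience; the class location and parameter names are assumptions rather than details taken from this release note, so check the documentation for the exact constructor arguments.

```python
import d3rlpy

# an offline dataset plus an online FIFO buffer for newly collected experience
offline_dataset, env = d3rlpy.datasets.get_pendulum()
online_buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)

# hypothetical argument names: two source buffers and the sampling ratio
mixed_buffer = d3rlpy.dataset.MixedReplayBuffer(
    primary_replay_buffer=offline_dataset,
    secondary_replay_buffer=online_buffer,
    secondary_mix_ratio=0.5,  # half of each mini-batch from each buffer
)
```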
Bugfix
- Getting started page has been fixed.