Release v2.4.0
Tuple observations
In v2.4.0, d3rlpy supports tuple observations.
import numpy as np
import d3rlpy
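# A tuple observation is a list of arrays, one per component,
# all sharing the same first dimension (1000 steps here).
observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
terminals = np.random.randint(2, size=(1000, 1))

# Pass the list of component arrays as the observations argument.
dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)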
You can find an example script here
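To make the data layout in the snippet above concrete, here is a minimal sketch that uses only NumPy. It assumes, as in that snippet, that every component array has one row per step. The helpers `check_tuple_observations` and `observation_at` are hypothetical names introduced for illustration; they are not part of the d3rlpy API.

```python
import numpy as np


def check_tuple_observations(observations):
    """Check that all components share the first dimension.

    Returns the per-step shape of each component.
    Hypothetical helper, not a d3rlpy function.
    """
    lengths = {obs.shape[0] for obs in observations}
    if len(lengths) != 1:
        raise ValueError(f"components disagree on step count: {sorted(lengths)}")
    return [obs.shape[1:] for obs in observations]


def observation_at(observations, index):
    """Assemble the tuple observation for a single step (hypothetical helper)."""
    return tuple(obs[index] for obs in observations)


rng = np.random.default_rng(0)
observations = [rng.random((1000, 100)), rng.random((1000, 32))]

component_shapes = check_tuple_observations(observations)
first_observation = observation_at(observations, 0)
print(component_shapes)
print(len(first_observation))
```

Running a check like this before building the dataset may catch components whose lengths have drifted apart during preprocessing.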
Enhancements
- logging_steps and logging_strategy options have been added to the fit and fit_online methods (thanks, @claudius-kienle).
- Logging with WandB is now supported (thanks, @claudius-kienle).
- Goal-conditioned envs in Minari are now supported.
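
The logging options listed above suggest a choice between step-based and epoch-based metric logging. The sketch below illustrates one plausible reading of that distinction in plain Python. It is not d3rlpy's implementation: the `Strategy` enum, the `should_log` function, and the `n_steps_per_epoch` argument are assumptions made for illustration, and the library's actual behavior may differ.

```python
from enum import Enum


class Strategy(Enum):
    """Illustrative stand-in for a logging strategy option."""
    STEPS = "steps"
    EPOCH = "epoch"


def should_log(total_step, strategy, logging_steps, n_steps_per_epoch):
    """Decide whether metrics would be written at this training step.

    Assumed semantics: STEPS logs every `logging_steps` steps,
    EPOCH logs once at the end of each epoch.
    """
    if strategy is Strategy.STEPS:
        return total_step % logging_steps == 0
    return total_step % n_steps_per_epoch == 0


# Ten training steps, two steps per log interval, five steps per epoch.
step_based = [s for s in range(1, 11) if should_log(s, Strategy.STEPS, 2, 5)]
epoch_based = [s for s in range(1, 11) if should_log(s, Strategy.EPOCH, 2, 5)]
print(step_based)   # [2, 4, 6, 8, 10]
print(epoch_based)  # [5, 10]
```

Under these assumptions, step-based logging gives finer-grained curves for long epochs, while epoch-based logging keeps the output smaller.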
Bugfix
- Fix errors for distributed training.
- OPE documentation has been fixed.