Release v2.4.0

takuseno released this 18 Feb 03:57

Tuple observations

As of v2.4.0, d3rlpy supports tuple observations, where each observation is made up of several arrays that may have different shapes.

import numpy as np
import d3rlpy

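# Tuple observation: two arrays with different feature sizes over the same 1000 steps
observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
# Random 0/1 flags marking episode ends
terminals = np.random.randint(2, size=(1000, 1))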
dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)
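
To make the layout concrete, the sketch below uses only NumPy to show how the components of a tuple observation line up across time steps: every component array shares the leading (step) dimension, and the observation at one step is the tuple of per-component rows. The helper observation_at is a hypothetical illustration and is not part of the d3rlpy API.

```python
import numpy as np


def observation_at(observations, step):
    """Return the tuple observation at one time step (hypothetical helper)."""
    return tuple(component[step] for component in observations)


rng = np.random.default_rng(0)
observations = [rng.random((1000, 100)), rng.random((1000, 32))]

# Every component must cover the same number of steps.
num_steps = {component.shape[0] for component in observations}
assert len(num_steps) == 1

first = observation_at(observations, 0)
shapes = [part.shape for part in first]
print(shapes)  # [(100,), (32,)]
```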

You can find an example script here

Enhancements

logging_steps and logging_strategy options have been added to the fit and fit_online methods (thanks, @claudius-kienle)
Logging with WandB is now supported (thanks, @claudius-kienle)
Goal-conditioned environments in Minari are now supported.
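
For the logging_steps and logging_strategy options listed above, the following sketch shows one plausible reading of how the two settings interact. It is plain Python and does not call d3rlpy. The should_log function, the strategy strings "steps" and "epoch", and the exact rule are assumptions made for illustration, so check the d3rlpy documentation for the actual behaviour.

```python
def should_log(total_step, steps_per_epoch, logging_strategy, logging_steps):
    """Decide whether metrics would be written after `total_step` (1-based).

    Assumed semantics (not confirmed by the release notes):
    - "steps": log every `logging_steps` training steps.
    - "epoch": log once at the end of each epoch.
    """
    if logging_strategy == "steps":
        return total_step % logging_steps == 0
    if logging_strategy == "epoch":
        return total_step % steps_per_epoch == 0
    raise ValueError(f"unknown logging strategy: {logging_strategy}")


# With 10 training steps and 5 steps per epoch:
steps_mode = [s for s in range(1, 11) if should_log(s, 5, "steps", 2)]
epoch_mode = [s for s in range(1, 11) if should_log(s, 5, "epoch", 2)]
print(steps_mode)  # [2, 4, 6, 8, 10]
print(epoch_mode)  # [5, 10]
```

Under this reading, a step-based strategy gives finer-grained curves for long epochs, while the epoch-based strategy keeps one record per epoch.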

Bugfix

Errors in distributed training have been fixed.
OPE documentation has been fixed.

Contributors

claudius-kienle
