Releases: takuseno/d3rlpy
Release v0.60
logo
New logo images have been made for d3rlpy 🎉 in standard and inverted variants.
ActionScaler
ActionScaler provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to lie within [-1.0, 1.0]; from now on, you don't need to care about the range of actions.
from d3rlpy.algos import CQL
cql = CQL(action_scaler='min_max') # just pass action_scaler argument
handling timeout episodes
Episodes terminated by timeouts should not be treated as terminal states when bootstrapping. From this version, you can specify episode boundaries in addition to the terminal flags.
from d3rlpy.dataset import MDPDataset
observations = ...
actions = ...
rewards = ...
terminals = ... # this indicates the environmental termination
episode_terminals = ... # this indicates episode boundaries
dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)
# if episode_terminals is omitted, terminals will be used to specify episode boundaries
# dataset = MDPDataset(observations, actions, rewards, terminals)
In online training, you can specify this option via the timelimit_aware flag.
import gym
from d3rlpy.algos import SAC
env = gym.make('Hopper-v2') # make sure the environment is wrapped by gym.wrappers.TimeLimit
sac = SAC()
sac.fit_online(env, timelimit_aware=True) # this flag is True by default
reference: https://arxiv.org/abs/1712.00378
batch online training
When training with computationally expensive environments such as robotics simulators or rich 3D games, it will take a long time to finish due to the slow environment steps.
To solve this, d3rlpy supports batch online training.
import gym
from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__': # this is necessary if you use AsyncBatchEnv
    # distribute 10 environments across different processes
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])
    sac = SAC(use_gpu=True)
    sac.fit_batch_online(env) # train with 10 environments concurrently
docker image
A pre-built d3rlpy Docker image is available on Docker Hub.
$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash
enhancements
- BEAR algorithm is updated based on the official implementation
  - new mmd_kernel option is available
- new to_mdp_dataset method is added to ReplayBuffer
- ConstantEpsilonGreedy explorer is added
- d3rlpy.envs.ChannelFirst wrapper is added (thanks for reporting, @feyza-droid)
- new dataset utility function d3rlpy.datasets.get_d4rl is added (see the sketch after this list)
  - timeouts are handled inside the function
- offline RL paper reproduction codes are added
- smoothed moving average plot in the d3rlpy plot CLI function (thanks, @pstansell)
- user-friendly messages for assertion errors
- better memory consumption
- save_interval argument is added to fit_online
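As a minimal sketch of two of these additions (the dataset name and environment id are illustrative; get_d4rl requires the d4rl package):
import gym
from d3rlpy.datasets import get_d4rl
from d3rlpy.envs import ChannelFirst
# load a D4RL dataset; timeouts are handled inside the function
dataset, env = get_d4rl('hopper-medium-v0')
# wrap an image-based environment so that observations become channel-first
# (illustrative env id; requires the Atari dependencies)
atari_env = ChannelFirst(gym.make('Breakout-v0'))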
bugfix
- core dumps are fixed in Google Colaboratory tutorials
- typos in some documentation pages are fixed (thanks for reporting, @pstansell)
Release v0.51
minor fix
- add typing-extensions dependency
- update MANIFEST.in
Release v0.50
typing
Now, d3rlpy is fully type-annotated, not only for better usability but also for a better contribution experience.
- mypy and pylint check type consistency and code quality.
- Due to the large number of changes needed to add type annotations, there might be regressions that are not detected by linters.
CLI
v0.50 introduces a new command-line interface, the d3rlpy command, which helps you do more with less effort. For now, d3rlpy provides the following commands.
# plot CSV data
$ d3rlpy plot d3rlpy_logs/XXX/YYY.csv
# plot all CSV data in the directory
$ d3rlpy plot-all d3rlpy_logs/XXX
# export the saved model as inference formats (e.g. ONNX, TorchScript)
$ d3rlpy export d3rlpy_logs/XXX/model_YYY.pt
enhancements
- faster CPU-to-GPU transfer
  - this change makes online training 2x faster
- make IQN Q function more precise based on the paper
documentation
- Add documentation about SB3 integration (thanks, @araffin)
Release v0.41
Algorithm
- Policy in Latent Action Space (PLAS)
Off-Policy Evaluation
Off-policy evaluation (OPE) is a method to evaluate policy performance using only an offline dataset.
# train policy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet
dataset, env = get_pybullet('hopper-bullet-mixed-v0')
cql = CQL()
cql.fit(dataset.episodes)
# Off-Policy Evaluation
from d3rlpy.ope import FQE
from d3rlpy.metrics.scorer import soft_opc_scorer
from d3rlpy.metrics.scorer import initial_state_value_estimation_scorer
fqe = FQE(algo=cql)
fqe.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={
            'soft_opc': soft_opc_scorer(1000),
            'init_value': initial_state_value_estimation_scorer
        })
- Fitted Q-Evaluation
Q Function Factory
d3rlpy provides flexible control over Q functions through the Q function factory. Following this change, the previous q_func_type argument was renamed to q_func_factory.
from d3rlpy.algos import DQN
from d3rlpy.q_functions import QRQFunctionFactory
# initialize Q function factory
q_func_factory = QRQFunctionFactory(n_quantiles=32)
# give it to algorithm object
dqn = DQN(q_func_factory=q_func_factory)
You can also pass the Q function name as a string.
dqn = DQN(q_func_factory='qr')
You can also make your own Q function factory. See the documentation for the currently supported Q function factories.
EncoderFactory
- DenseNet architecture (only for vector observation)
from d3rlpy.algos import DQN
dqn = DQN(encoder_factory='dense')
N-step TD calculation
d3rlpy supports N-step TD calculation for ALL algorithms. You can pass the n_steps argument to configure this parameter.
from d3rlpy.algos import DQN
dqn = DQN(n_steps=5) # n_steps=1 by default
Paper reproduction scripts
d3rlpy supports many algorithms across online and offline paradigms. Originally, d3rlpy was designed for industrial practitioners, but academic research is still important to push deep reinforcement learning forward. Currently, reproduction codes for online DQN variants are available.
The evaluation results will also be available soon.
enhancements
- build_with_dataset and build_with_env methods are added to algorithm objects (see the sketch below)
- shuffle flag is added to the fit method (thanks, @jamartinh)
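A rough sketch of how these could be used (the exact signatures are not spelled out in these notes, so treat the argument lists as assumptions):
from d3rlpy.algos import SAC
from d3rlpy.datasets import get_pybullet
dataset, env = get_pybullet('hopper-bullet-mixed-v0')
sac = SAC()
# build networks without training, either from a dataset or from an environment (assumed usage)
sac.build_with_dataset(dataset)
# sac.build_with_env(env)
# shuffle flag on fit (assumed keyword usage)
sac.fit(dataset.episodes, n_epochs=1, shuffle=True)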
Release v0.40
Algorithms
- Support the discrete version of Soft Actor-Critic
- fit_online has an n_steps argument instead of n_epochs for complete reproduction of the papers (see the sketch below).
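For illustration, a sketch of online training with the new n_steps argument (the DiscreteSAC class name and the replay buffer setup are assumptions based on the library's online API, not spelled out in these notes):
import gym
from d3rlpy.algos import DiscreteSAC
from d3rlpy.online.buffers import ReplayBuffer
env = gym.make('CartPole-v0')
sac = DiscreteSAC()
buffer = ReplayBuffer(maxlen=100000, env=env)
# fit_online now takes n_steps instead of n_epochs
sac.fit_online(env, buffer, n_steps=100000)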
OptimizerFactory
d3rlpy provides more flexible control over optimizer configuration via OptimizerFactory.
from d3rlpy.optimizers import AdamFactory
from d3rlpy.algos import DQN
dqn = DQN(optim_factory=AdamFactory(weight_decay=1e-4))
See more at https://d3rlpy.readthedocs.io/en/v0.40/references/optimizers.html .
EncoderFactory
d3rlpy provides more flexible control over the neural network architecture via EncoderFactory.
from d3rlpy.algos import DQN
from d3rlpy.encoders import VectorEncoderFactory
# encoder factory
encoder_factory = VectorEncoderFactory(hidden_units=[300, 400], activation='tanh')
# set EncoderFactory
dqn = DQN(encoder_factory=encoder_factory)
You can also build your own encoders.
import torch
import torch.nn as nn
from d3rlpy.encoders import EncoderFactory
# your own neural network
class CustomEncoder(nn.Module):
    def __init__(self, observation_shape, feature_size):
        super().__init__()
        self.feature_size = feature_size
        self.fc1 = nn.Linear(observation_shape[0], 64)
        self.fc2 = nn.Linear(64, feature_size)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.relu(self.fc2(h))
        return h

    # THIS IS IMPORTANT!
    def get_feature_size(self):
        return self.feature_size

# your own encoder factory
class CustomEncoderFactory(EncoderFactory):
    TYPE = 'custom' # this is necessary

    def __init__(self, feature_size):
        self.feature_size = feature_size

    def create(self, observation_shape, action_size=None, discrete_action=False):
        return CustomEncoder(observation_shape, self.feature_size)

    def get_params(self, deep=False):
        return {'feature_size': self.feature_size}
dqn = DQN(encoder_factory=CustomEncoderFactory(feature_size=64))
See more at https://d3rlpy.readthedocs.io/en/v0.40/references/network_architectures.html .
Stable Baselines 3 wrapper
- Now d3rlpy is partially compatible with Stable Baselines 3.
- More documentation will be available soon.
bugfix
- fix the memory leak problem in fit_online
  - Now, you can train online algorithms with a large replay buffer size for image observations.
- fix preprocessing in CQL
- fix ColorJitter augmentation
installation
PyPi
- From this version, d3rlpy officially supports Windows.
- The binary packages for each platform are built and uploaded via GitHub Actions, which means that you don't have to install Cython to install this package from PyPI.
Anaconda
- As of the previous version, d3rlpy is available on conda-forge.
Release v0.32
This version introduces a hotfix.
⚠️ Fix a significant bug in online training with image observations.
Release v0.31
This version introduces minor changes.
- Move n_epochs argument to the fit method (see the sketch below).
- Fix scikit-learn compatibility issues.
- Fix zero-division error during online training.
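For example, a minimal sketch of the first change (the dataset helper and epoch count are illustrative):
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet
dataset, env = get_pybullet('hopper-bullet-mixed-v0')
cql = CQL()  # n_epochs is no longer a constructor argument
cql.fit(dataset.episodes, n_epochs=10)  # it is passed to fit instead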
Release version v0.30
Algorithm
- Support Advantage-Weighted Actor-Critic (AWAC)
- fit_online method is available as a convenient alias to the d3rlpy.online.iterators.train function.
- the action unnormalization problem in AWR is fixed.
Metrics
- The following metrics are available.
- initial_state_value_estimation_scorer
- soft_opc_scorer
⚠️ MDPDataset
- d3rlpy.dataset module is now implemented with Cython in order to speed up memory copies.
- The following operations are significantly faster than in the previous version:
  - creating TransitionMiniBatch objects
  - frame stacking via the n_frames argument
  - lambda return calculation in AWR algorithms
- This change makes Atari training approximately 6% faster.
Release version v0.23
Algorithm
- Support Advantage-Weighted Regression (AWR)
- n_frames option is added to all algorithms (see the sketch below)
  - the n_frames option controls frame stacking for image observations
- eval_results_ property is added to all algorithms
  - evaluation results can be retrieved from eval_results_ after training
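A small sketch of these two additions (the get_atari helper, dataset id, and the structure of eval_results_ are assumptions, not from these notes):
from d3rlpy.algos import DQN
from d3rlpy.datasets import get_atari
# illustrative Atari dataset with image observations (assumed helper and dataset id)
dataset, env = get_atari('breakout-mixed-v0')
# stack the last 4 frames of image observations
dqn = DQN(n_frames=4)
dqn.fit(dataset.episodes)
# metrics recorded during training are exposed via eval_results_
print(dqn.eval_results_)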
MDPDataset
- prev_transition and next_transition properties are added to d3rlpy.dataset.Transition (see the sketch below).
  - these properties are used for frame stacking and Monte-Carlo return calculation in AWR.
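As a sketch, the new properties let you walk along an episode transition by transition (this assumes episodes expose a transitions list, that next_transition is None at the episode end, and that the get_cartpole helper is available):
from d3rlpy.datasets import get_cartpole
dataset, env = get_cartpole()
transition = dataset.episodes[0].transitions[0]
# follow the chain of transitions within the episode
while transition.next_transition is not None:
    transition = transition.next_transition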
Document
- new tutorial page is added
Release version v0.22
Support ONNX export
Now, the trained policy can be exported as ONNX as well as TorchScript.
cql.save_policy('policy.onnx', as_onnx=True)
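For reference, the exported ONNX file can then be loaded with a generic runtime such as onnxruntime (a sketch; the observation shape below is illustrative):
import numpy as np
import onnxruntime as ort
session = ort.InferenceSession('policy.onnx')
input_name = session.get_inputs()[0].name
observation = np.random.rand(1, 11).astype(np.float32)  # illustrative observation shape
action = session.run(None, {input_name: observation})[0]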
Support more data augmentations
- data augmentations for vector observations
- ColorJitter augmentation for image observation