Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update readme #284

Merged
merged 1 commit into from
Dec 20, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 15 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,8 @@ Currently, the features supported by OpenRL include:

- Reinforcement learning training support for natural language tasks (such as dialogue)

- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)

- Support [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html) , which allows convenient evaluation of
various agents (even submissions for [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl)) in a competitive environment.

Expand Down Expand Up @@ -160,19 +162,19 @@ Here we provide a table for the comparison of OpenRL and existing popular RL lib
OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
through a unified and user-friendly interface.

| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fullly supported | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fullly supported | not fullly supported | :heavy_check_mark: | :heavy_check_mark: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fullly supported | :x: | :x: |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:--------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fullly supported | :heavy_check_mark: | :x: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fullly supported | not fullly supported | :heavy_check_mark: | :x: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fullly supported | :x: | :x: |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation

Expand Down
11 changes: 6 additions & 5 deletions README_zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ OpenRL基于PyTorch进行开发,目标是为强化学习研究社区提供一
- 支持通过专家数据进行离线强化学习训练
- 支持自博弈训练
- 支持自然语言任务(如对话任务)的强化学习训练
- 支持[DeepSpeed](https://github.com/microsoft/DeepSpeed)
- 支持[竞技场](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html)功能,可以在多智能体对抗性环境中方便地对各种智能体(甚至是[及第平台](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html#openrl)上提交的智能体)进行评测。
- 支持从[Hugging Face](https://huggingface.co/)上导入模型和数据。支持加载Hugging Face上[Stable-baselines3的模型](https://openrl-docs.readthedocs.io/zh/latest/sb3/index.html)来进行测试和训练。
- 提供用户自有环境接入OpenRL的[详细教程](https://openrl-docs.readthedocs.io/zh/latest/custom_env/index.html).
Expand Down Expand Up @@ -128,18 +129,18 @@ OpenRL-Lab将持续维护和更新OpenRL,欢迎大家加入我们的[开源社

这里我们提供了一个表格,比较了OpenRL和其他常用的强化学习库。 OpenRL采用模块化设计和高层次的抽象,使得用户可以通过统一的简单易用的接口完成各种任务的训练。

| 强化学习库 | 自然语言任务/RLHF | 多智能体训练 | 自博弈训练 | 离线强化学习 | 双语文档 |
| 强化学习库 | 自然语言任务/RLHF | 多智能体训练 | 自博弈训练 | 离线强化学习 | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fullly supported | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fullly supported | not fullly supported | :heavy_check_mark: | :heavy_check_mark: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fullly supported | :heavy_check_mark: | :x: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fullly supported | not fullly supported | :heavy_check_mark: | :x: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fullly supported | :x: | :x: |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## 安装
Expand Down