Merge pull request #285 from OpenRL-Lab/main
v0.2.0
huangshiyu13 authored Dec 20, 2023
2 parents 179ef90 + 59efab1 commit aea74bf
Showing 132 changed files with 3,999 additions and 6,566 deletions.
6 changes: 5 additions & 1 deletion .github/workflows/unit_test.yml
@@ -17,6 +17,10 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y xvfb libglu1-mesa-dev python3-opengl
- name: Upgrade pip
run: |
python -m pip install --upgrade pip setuptools wheel
@@ -27,7 +31,7 @@ jobs:
- name: do_unittest
timeout-minutes: 40
run: |
python3 -m pytest tests --cov=openrl --cov-report=xml -m unittest --cov-report=term-missing --durations=0 -v --color=yes
xvfb-run -s "-screen 0 1400x900x24" python3 -m pytest tests --cov=openrl --cov-report=xml -m unittest --cov-report=term-missing --durations=0 -v --color=yes -s
- name: Upload coverage reports to Codecov with GitHub Action
uses: codecov/codecov-action@v3
with:
3 changes: 2 additions & 1 deletion .gitignore
@@ -153,10 +153,11 @@ run_results/
api_docs
.vscode
*.pkl
api_docs
*.json
opponent_pool
!/examples/selfplay/opponent_templates/tictactoe_opponent/info.json
!/examples/nlp/ds_config.json
!/examples/nlp/eval_ds_config.json
wandb_run
examples/dmc/new.gif
/examples/snake/submissions/rl/actor_2000.pth
4 changes: 2 additions & 2 deletions Project.md
@@ -18,7 +18,7 @@ However, in many practical applications, it is important to develop reasonable a
In this paper, we propose an on-policy framework for discovering multiple strategies for the same task.
Experimental results show that our method efficiently finds diverse strategies in a wide variety of reinforcement learning tasks.

- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAMAS Extended Abstract 2023)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Ting Chen, Jun Zhu
- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAAI 2024)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu


36 changes: 19 additions & 17 deletions README.md
@@ -1,5 +1,5 @@
<div align="center">
<a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
<a href="https://openrl-docs.readthedocs.io/"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
</div>

---
@@ -25,10 +25,10 @@
[![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)

[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.7 is updated on Sep 21, 2023
OpenRL-v0.2.0 is updated on Dec 20, 2023

The main branch is the latest version of OpenRL, which is under active development. If you just want to try
OpenRL, you can switch to the stable branch.
@@ -58,6 +58,8 @@ Currently, the features supported by OpenRL include:

- Reinforcement learning training support for natural language tasks (such as dialogue)

- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)

- Support [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html), which allows convenient evaluation of
various agents (even submissions for [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl)) in a competitive environment.

@@ -160,19 +162,19 @@ Here we provide a table for the comparison of OpenRL and existing popular RL lib
OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
through a unified and user-friendly interface.
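To make the "unified and user-friendly interface" claim concrete, here is a minimal training sketch modeled on OpenRL's documented CartPole quickstart; the `make`, `PPONet`, `PPOAgent`, and `agent.train` names follow the OpenRL docs, but treat the exact signatures as indicative rather than authoritative:

```python
# Minimal OpenRL training sketch (modeled on the documented CartPole quickstart).
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # create 9 vectorized CartPole environments
net = Net(env)                        # build a PPO network for this environment
agent = Agent(net)                    # wrap the network in a training agent
agent.train(total_time_steps=20000)   # train through the unified agent interface
```

Swapping the algorithm, environment, or task largely amounts to changing the Net/Agent classes or the environment id, which is the kind of difference the comparison table below highlights.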

| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)               | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou)                     | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)              | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:--------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)               | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :x:                  |
| [Tianshou](https://github.com/thu-ml/tianshou)                     | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :x:                  |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)              | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                  |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation

@@ -333,7 +335,7 @@ If you are using OpenRL in your research project, you are also welcome to join t

- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an email to: [huangshiyu@4paradigm.com](mailto:huangshiyu@4paradigm.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

19 changes: 10 additions & 9 deletions README_zh.md
@@ -1,5 +1,5 @@
<div align="center">
<a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
<a href="https://openrl-docs.readthedocs.io/"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
</div>


@@ -26,10 +26,10 @@
[![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)

[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.7 is updated on Sep 21, 2023
OpenRL-v0.1.10 is updated on Oct 27, 2023

The main branch is the latest version of OpenRL, which is under active development. If you just want to try
OpenRL, you can switch to the stable branch.
@@ -51,6 +51,7 @@ OpenRL is developed based on PyTorch, and aims to provide the reinforcement learning research community with a
- Support offline reinforcement learning training with expert data
- Support self-play training
- Support reinforcement learning training for natural language tasks (such as dialogue)
- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- Support the [Arena](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html) feature, which allows convenient evaluation of various agents (even agents submitted to the [JiDi platform](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html#openrl)) in multi-agent adversarial environments.
- Support importing models and data from [Hugging Face](https://huggingface.co/), including loading [Stable-baselines3 models from Hugging Face](https://openrl-docs.readthedocs.io/zh/latest/sb3/index.html) for testing and training.
- Provide a [detailed tutorial](https://openrl-docs.readthedocs.io/zh/latest/custom_env/index.html) on integrating user-defined environments into OpenRL.
@@ -128,18 +129,18 @@ OpenRL-Lab will continue to maintain and update OpenRL; everyone is welcome to join our [open-source

Here we provide a table comparing OpenRL with other popular reinforcement learning libraries. OpenRL adopts a modular design and high-level abstractions, allowing users to complete training for various tasks through a unified, easy-to-use interface.

| Library | NLP tasks/RLHF | Multi-agent Training | Self-Play Training | Offline RL | Bilingual Documentation |
| Library | NLP tasks/RLHF | Multi-agent Training | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/)               | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou)                     | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
| [DI-engine](https://github.com/opendilab/DI-engine/)               | :x:                | :heavy_check_mark:   | not fully supported  | :heavy_check_mark: | :x:                |
| [Tianshou](https://github.com/thu-ml/tianshou)                     | :x:                | not fully supported  | not fully supported  | :heavy_check_mark: | :x:                |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib)              | :x:                | :heavy_check_mark:   | not fully supported  | :x:                | :x:                |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation
@@ -293,7 +294,7 @@ openrl --mode train --env CartPole-v1

- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an email to: [huangshiyu@4paradigm.com](mailto:huangshiyu@4paradigm.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

9 changes: 9 additions & 0 deletions examples/arena/README.md
@@ -3,6 +3,7 @@

```bash
pip install "openrl[selfplay]"
pip install "pettingzoo[mpe]","pettingzoo[butterfly]"
```

### Usage
@@ -15,3 +16,11 @@ python run_arena.py
### Evaluate Google Research Football submissions for JiDi locally

If you want to evaluate your Google Research Football submissions for JiDi locally, please use tizero as illustrated [here](https://github.com/OpenRL-Lab/TiZero#evaluate-jidi-submissions-locally).

### Evaluate more environments

We also provide a script to evaluate agents in more environments, including MPE, Go, Texas Holdem, and Butterfly. You can run the script as follows:

```shell
python evaluate_more_envs.py
```
104 changes: 104 additions & 0 deletions examples/arena/evaluate_more_envs.py
@@ -0,0 +1,104 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright 2023 The OpenRL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

""""""

from pettingzoo.butterfly import cooperative_pong_v5
from pettingzoo.classic import connect_four_v3, go_v5, rps_v2, texas_holdem_no_limit_v6
from pettingzoo.mpe import simple_push_v3

from openrl.arena import make_arena
from openrl.arena.agents.local_agent import LocalAgent
from openrl.arena.agents.random_agent import RandomAgent
from openrl.envs.PettingZoo.registration import register
from openrl.envs.wrappers.pettingzoo_wrappers import RecordWinner


def ConnectFourEnv(render_mode, **kwargs):
    return connect_four_v3.env(render_mode)


def RockPaperScissorsEnv(render_mode, **kwargs):
    return rps_v2.env(num_actions=3, max_cycles=15)


def GoEnv(render_mode, **kwargs):
    return go_v5.env(render_mode=render_mode, board_size=5, komi=7.5)


def TexasHoldemEnv(render_mode, **kwargs):
    return texas_holdem_no_limit_v6.env(render_mode=render_mode)


# MPE
def SimplePushEnv(render_mode, **kwargs):
    return simple_push_v3.env(render_mode=render_mode)


def CooperativePongEnv(render_mode, **kwargs):
    return cooperative_pong_v5.env(render_mode=render_mode)


def register_new_envs():
    new_env_dict = {
        "connect_four_v3": ConnectFourEnv,
        "RockPaperScissors": RockPaperScissorsEnv,
        "go_v5": GoEnv,
        "texas_holdem_no_limit_v6": TexasHoldemEnv,
        "simple_push_v3": SimplePushEnv,
        "cooperative_pong_v5": CooperativePongEnv,
    }

    for env_id, env in new_env_dict.items():
        register(env_id, env)
    return new_env_dict.keys()


def run_arena(
    env_id: str,
    parallel: bool = True,
    seed=0,
    total_games: int = 10,
    max_game_onetime: int = 5,
):
    env_wrappers = [RecordWinner]

    arena = make_arena(env_id, env_wrappers=env_wrappers, use_tqdm=False)

    agent1 = LocalAgent("../selfplay/opponent_templates/random_opponent")
    agent2 = RandomAgent()

    arena.reset(
        agents={"agent1": agent1, "agent2": agent2},
        total_games=total_games,
        max_game_onetime=max_game_onetime,
        seed=seed,
    )
    result = arena.run(parallel=parallel)
    arena.close()
    print(result)
    return result


def test_new_envs():
    env_ids = register_new_envs()
    seed = 0
    for env_id in env_ids:
        run_arena(env_id=env_id, seed=seed, parallel=False, total_games=1)


if __name__ == "__main__":
    test_new_envs()