pass black code style test and pre-commit test
lwaekfjlk committed Mar 13, 2024
1 parent a2223d0 commit 1ebd2d1
Showing 140 changed files with 5,082 additions and 2,973 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -24,4 +24,4 @@ repos:
- repo: https://github.com/kynan/nbstripout
rev: 0.6.0
hooks:
- id: nbstripout
12 changes: 6 additions & 6 deletions README.md
@@ -1,11 +1,11 @@

# Sotopia-π: Interactive Learning of Socially Intelligent Language Agents
This is the official repo of the paper: [add arxiv link].
For highlights of the paper, please see our [website](https://sotopia-dev.vercel.app/projects/sotopia-pi).

![title](imgs/acl2024_teaser.png)

We introduce Sotopia-π, a method that improves the social intelligence of large language models (LLMs) through social interaction. The method involves three steps: (1) automatically generating new social tasks, (2) collecting data from both the expert policy and the agent policy for training, and (3) updating the agent policy based on positive data rated by GPT-4. The training and evaluation environment is based on the [Sotopia](https://github.com/XuhuiZhou/sotopia) framework.

## Preparations
- Install dependencies:
@@ -21,10 +21,10 @@ We introduce Sotopia-π, a method that improves the social intelligence of large
conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
```
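To quickly verify that the configured Redis instance is reachable before running the steps below, a minimal check (our own sketch, not part of the repo's scripts; assumes the `redis` Python package is installed) is:
```python
# Connectivity check for the Redis instance behind REDIS_OM_URL.
# This helper is an assumption for illustration, not part of the repo.
import os

import redis

r = redis.Redis.from_url(os.environ["REDIS_OM_URL"])
print(r.ping())  # prints True if the server is reachable
```
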
## Step 1: Social Task Generation
The first step is to generate synthesized social tasks by sampling keywords from datasets and prompting GPT-4 Turbo to generate corresponding social tasks. For detailed implementation, please refer to [this section](https://github.com/sotopia-lab/sotopia-pi/tree/main/data_generate#social-task-generation).
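As a rough sketch of what this step amounts to (illustrative only; the helper name, prompt wording, and model id are our assumptions, not the repo's actual implementation):
```python
# Illustrative sketch of social task generation: sample an inspirational
# prompt and ask GPT-4 Turbo for a new social task. The helper name, prompt
# text, and model id are assumptions, not the repo's code.
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_social_task(inspirational_prompts: list[str]) -> str:
    seed = random.choice(inspirational_prompts)
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        temperature=0.5,
        messages=[
            {
                "role": "user",
                "content": "Generate a new social task scenario inspired by: "
                + seed,
            }
        ],
    )
    return response.choices[0].message.content
```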

## Step 2: Training Data Collection
The second step is to collect data from expert (GPT-4 vs. GPT-4) as behavior cloning trajectories and from self (our model vs. our model) as self-reinforcement trajectories.
To collect behavior cloning data, run
```bash
cd data_generate
@@ -48,7 +48,7 @@ This step requires (1) filter the collected conversation data based on GPT-4 rat

## Human Evaluation

* We develop a personalized project based on oTree and release the human evaluation project via Prolific.
* Detailed instructions on reproducing the human evaluation are provided [here](https://github.com/sotopia-lab/sotopia-pi/tree/main/human_eval).

## Model Checkpoints
22 changes: 11 additions & 11 deletions data_generate/README.md
@@ -27,11 +27,11 @@ python3 generate_conversations.py --eval-script scripts/generate_conv_sft.sh --e
```

## Agent Performance Evaluation
We evaluate our trained models based on the social tasks in Sotopia. The tag `sotopia_env` in `env_files/used_env.json` represents all 90 social tasks in Sotopia, and the tag `sotopia_hard_env` represents the 14 hard social tasks.

We provide an example script `scripts/eval_sft.sh` of evaluating the trained model (named `custom_model`) and a partner model (GPT-3.5 Turbo by default) under the Sotopia framework. In the script, make sure to modify the `custom_model`'s name (e.g. `checkpoint_improve-0_epoch-20`) and API URL (e.g. `http://0.0.0.0:8106/v1` if the model is deployed on localhost) in the corresponding gin file (e.g. `data_generate/scripts/sotopia_conf/generation_utils_conf/generate_mistral_gpt-3.5-turbo.gin`).

Running the following command runs conversations between the trained model and a partner model, and automatically prompts GPT-4 to provide scores on seven social dimensions.
```bash
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name SFT-round-1 --tag sft_round_1_mistral_gpt-3.5-turbo_test --batch-size 4 --agent1-model custom_model --agent2-model gpt-3.5-turbo --push-to-db True
```
@@ -43,7 +43,7 @@ python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file e

For Sotopia's inspirational prompts, a few cherry-picked examples are included from 6 datasets (`social_iqa`, `social_chem`, `normbank`, `deal-or-no-deal`, `persuation_for_good`, `mindcraft`).

For our inspirational prompts, we include the full set of examples from 3 datasets (`social_iqa`, `social_chem`, `normbank`).

Notice 1: We do not include `deal-or-no-deal` and `mindcraft` because the inspirational prompts within each of those datasets are too similar to one another, which would cause some leakage if we trained on them and tested on the Sotopia ones.

Expand All @@ -52,20 +52,20 @@ Notice2: The reason why we do not include `persuation_for_good` is because we ca

### Explanations for EnvProfile generation:

With the inspirational prompts, we utilize `gpt-4-turbo` to generate EnvProfiles.

Note that our function also allows other OpenAI models and temperatures; the default model is `gpt-4-turbo` and the default temperature is 0.5.
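
A minimal sketch of such a parameterized call (the function name and prompt wording are illustrative assumptions, not the repo's actual API; defaults mirror the ones described above):
```python
# Illustrative EnvProfile generation with a configurable OpenAI model and
# temperature. Function name and prompt text are assumptions, not repo code.
from openai import OpenAI

client = OpenAI()


def generate_env_profile(
    inspirational_prompt: str,
    example_profile: str,
    model: str = "gpt-4-turbo",
    temperature: float = 0.5,
) -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {
                "role": "user",
                "content": (
                    "Here is an example environment profile:\n"
                    + example_profile
                    + "\n\nGenerate a new environment profile inspired by:\n"
                    + inspirational_prompt
                ),
            }
        ],
    )
    return response.choices[0].message.content
```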


### Detailed Steps of generating inspirational prompts

1. We create a new inspirational-prompts CSV under the `env_files` folder, based on the three sources used in SOTOPIA scenario generation: `social_iqa`, `social_chemistry`, and `normbank`. For each source, we make sure duplicates are dropped and there is NO overlap with SOTOPIA.
2. We generate 430 new scenarios, roughly evenly distributed across the three sources. The logic to generate new scenarios is as follows:
<br> a. For a target number of scenarios, we divide that number by three (the number of sources) to get X, the per-source count.
<br> b. For each source, we randomly select X unused prompts, and for each prompt we randomly select an environment profile example currently in the database, then use the OpenAI completion API with the specified model and temperature to generate a new scenario.
<br> c. After generation, we save all used prompts, the corresponding pks, and the generation model into `used_prompts.csv` under `env_files`, so as to track used prompts and avoid future repetition.

3. We also create a sampling function that randomly samples from the current Redis database and filters out SOTOPIA scenarios and already-used scenarios, which are saved under `used_env.json`. This is because we want to avoid generating conversations with the same scenarios, to keep diversity; a minimal sketch of the filtering logic follows below.
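
A minimal sketch of that filtering logic (the helper name is ours; the JSON layout, experiment name mapping to a list of env pks, is inferred from how `generate_conversations.py` reads the file):
```python
# Sketch of step 3: sample unused scenario pks while filtering out ones
# already recorded in used_env.json. Helper name and layout are assumptions.
import json
import random


def sample_unused_envs(
    candidate_pks: list[str],
    used_env_file: str = "env_files/used_env.json",
    n: int = 10,
) -> list[str]:
    with open(used_env_file, "r") as f:
        # used_env.json maps experiment names to lists of env pks
        used = {pk for pks in json.load(f).values() for pk in pks}
    unused = [pk for pk in candidate_pks if pk not in used]
    return random.sample(unused, min(n, len(unused)))
```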


## Setting up Redis database
@@ -133,9 +133,9 @@ To do so, one of the member must first have access to TIGER. Then, follow the st

* 7.2.0-v6 - this is due to version incompatibility. On TIGER, the Redis version is currently 5.0.7, but the dump.rdb we are using is in version 7.2.3. Using the latest redis-stack without specifying a version would lead to incompatibility, so we must specify that redis-stack should run with a newer image version. If the dump.rdb is in version 6.2.12, for example, `redis/redis-stack:latest` is enough.

To check the Redis version, run `redis-cli INFO SERVER` on the command line.

To check whether the server is running successfully, you can either open `http://SERVER:PORT/redis-stack/browser` in a browser, or run `docker ps` on the command line and see whether the container named NAMEYOUWANT is running.


@@ -176,4 +176,4 @@ Link: <https://github.com/RedisJSON/RedisJSON>

The default Redis version may be 7.2.x. However, to deploy on TIGER, we need to use the 6.2.x version of Redis. Therefore, the command to run locally could be:

`docker run -p 6379:6379 --name redis-stack-old redis/redis-stack:6.2.6-v10` instead of using latest. After running locally and saving all data to the Redis DB, we should get a dump.rdb in the folder in version 6.2.6. We can then upload this file to the TIGER server.
2 changes: 1 addition & 1 deletion data_generate/env_files/used_env.json
@@ -527,4 +527,4 @@
"01H7VFHP8AN5643B0NR0NP00VE",
"01H7VFHN7A1ZX5KSMT2YN9RXC4"
]
}
135 changes: 86 additions & 49 deletions data_generate/generate_conversations.py
@@ -1,85 +1,122 @@
```python
import argparse
import json
import re
import subprocess


def overwrite_eval_bash(
    eval_script: str,
    tag: str,
    env_ids: list,
    batch_size: int = 2,
    agent1_model: str = "gpt-3.5-turbo",
    agent2_model: str = "gpt-3.5-turbo",
    push_to_db: bool = True,
) -> None:

    with open(eval_script, "r") as f:
        lines = f.readlines()

    for i in range(len(lines)):
        # change TAG, TAG_TO_CHECK_EXISTING_EPISODES
        if "--gin.TAG_TO_CHECK_EXISTING_EPISODES" in lines[i]:
            pattern = (
                r'(--gin\.TAG_TO_CHECK_EXISTING_EPISODES=")([^"]*)(".*\n)'
            )
            lines[i] = re.sub(pattern, r"\1" + tag + r"\3", lines[i])
        elif "--gin.TAG" in lines[i]:
            pattern = r'(--gin\.TAG=")([^"]*)(".*\n)'
            lines[i] = re.sub(pattern, r"\1" + tag + r"\3", lines[i])
        # change ENV_IDS
        elif "--gin.ENV_IDS" in lines[i]:
            pattern = r"(--gin\.ENV_IDS=).*?(\s*\\)"
            lines[i] = re.sub(
                pattern, r"\1" + json.dumps(env_ids) + r"'" + r"\2", lines[i]
            )
        # change batch size
        elif "--gin.BATCH_SIZE" in lines[i]:
            pattern = r"(--gin\.BATCH_SIZE=)(\d+)"
            lines[i] = re.sub(pattern, r"\g<1>" + str(batch_size), lines[i])
        # change agent models
        elif "--gin.AGENT1_MODEL" in lines[i]:
            pattern = r'(--gin\.AGENT1_MODEL=")([^"]*)(".*\n)'
            lines[i] = re.sub(pattern, r"\1" + agent1_model + r"\3", lines[i])
        elif "--gin.AGENT2_MODEL" in lines[i]:
            pattern = r'(--gin\.AGENT2_MODEL=")([^"]*)(".*\n)'
            lines[i] = re.sub(pattern, r"\1" + agent2_model + r"\3", lines[i])
        # change push to db flag
        elif "--gin.PUSH_TO_DB" in lines[i]:
            pattern = r"(--gin\.PUSH_TO_DB=)(True|False)"
            lines[i] = re.sub(pattern, r"\g<1>" + str(push_to_db), lines[i])

    with open(eval_script, "w") as f:
        f.write("".join(lines))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--eval-script",
        type=str,
        required=True,
        help="Required. Provide template bash file for sotopia evaluation.",
    )
    parser.add_argument(
        "--env-file",
        type=str,
        default="env_files/used_env.json",
        help="Default: env_files/used_env.json. Provide the json file of env ids for conversation generation.",
    )
    parser.add_argument(
        "--experiment-name",
        type=str,
        required=True,
        help="Required. Need the experiment_name, which is the key of the env_file.",
    )
    parser.add_argument(
        "--tag",
        type=str,
        required=True,
        help="Required. Provide a unique tag that will be pushed to REDIS database.",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=2,
        help="Default: 2. Provide the batch size of calling APIs.",
    )
    parser.add_argument(
        "--agent1-model",
        type=str,
        default="gpt-3.5-turbo",
        help="Default: gpt-3.5-turbo. Provide the name of OPENAI model.",
    )
    parser.add_argument(
        "--agent2-model",
        type=str,
        default="gpt-3.5-turbo",
        help="Default: gpt-3.5-turbo. Provide the name of OPENAI model.",
    )
    parser.add_argument(
        "--push-to-db",
        type=str,
        default=True,
        help="Default: True. If you choose False, then the conversations will not be pushed to REDIS database.",
    )
    args = parser.parse_args()

    with open(args.env_file, "r") as f:
        env_ids = json.loads(f.read())[args.experiment_name]

    overwrite_eval_bash(
        eval_script=args.eval_script,
        tag=args.tag,
        env_ids=env_ids,
        batch_size=args.batch_size,
        agent1_model=args.agent1_model,
        agent2_model=args.agent2_model,
        push_to_db=True if args.push_to_db == "True" else False,
    )

    command = f"bash {args.eval_script}"
    subprocess.run(command.split())
```