pass black code style test and pre-commit test
lwaekfjlk committed Mar 13, 2024
1 parent a2223d0 commit 1ebd2d1
Showing 140 changed files with 5,082 additions and 2,973 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -24,4 +24,4 @@ repos:
- repo: https://github.com/kynan/nbstripout
rev: 0.6.0
hooks:
- id: nbstripout
12 changes: 6 additions & 6 deletions README.md
@@ -1,11 +1,11 @@

# Sotopia-π: Interactive Learning of Socially Intelligent Language Agents
This is the official repo of the paper: [add arxiv link].
For highlights of the paper, please see our [website](https://sotopia-dev.vercel.app/projects/sotopia-pi).

![title](imgs/acl2024_teaser.png)

We introduce Sotopia-π, a method that improves the social intelligence of large language models (LLMs) through social interaction. The method involves three steps: (1) automatically generating new social tasks, (2) collecting data from both the expert policy and the agent policy for training, and (3) updating the agent policy based on positive data rated by GPT-4. The training and evaluation environment is based on the [Sotopia](https://github.com/XuhuiZhou/sotopia) framework.

## Preparations
- Install dependencies:
@@ -21,10 +21,10 @@ We introduce Sotopia-π, a method that improves the social intelligence of large
conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
```
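To quickly verify that the configured Redis instance is reachable before running the steps below, a minimal check (our own sketch, not part of the repo's scripts; assumes the `redis` Python package is installed) is:
```python
# Connectivity check for the Redis instance behind REDIS_OM_URL.
# This helper is an assumption for illustration, not part of the repo.
import os

import redis

r = redis.Redis.from_url(os.environ["REDIS_OM_URL"])
print(r.ping())  # prints True if the server is reachable
```
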
## Step 1: Social Task Generation
The first step is to generate synthesized social tasks by sampling keywords from datasets and prompting GPT-4 Turbo to generate corresponding social tasks. For detailed implementation, please refer to [this section](https://github.com/sotopia-lab/sotopia-pi/tree/main/data_generate#social-task-generation).
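As a rough sketch of what this step amounts to (illustrative only; the helper name, prompt wording, and model id are our assumptions, not the repo's actual implementation):
```python
# Illustrative sketch of social task generation: sample an inspirational
# prompt and ask GPT-4 Turbo for a new social task. The helper name, prompt
# text, and model id are assumptions, not the repo's code.
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_social_task(inspirational_prompts: list[str]) -> str:
    seed = random.choice(inspirational_prompts)
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        temperature=0.5,
        messages=[
            {
                "role": "user",
                "content": "Generate a new social task scenario inspired by: "
                + seed,
            }
        ],
    )
    return response.choices[0].message.content
```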

## Step 2: Training Data Collection
The second step is to collect data from expert (GPT-4 vs. GPT-4) as behavior cloning trajectories and from self (our model vs. our model) as self-reinforcement trajectories.
To collect behavior cloning data, run
```bash
cd data_generate
@@ -48,7 +48,7 @@ This step requires (1) filter the collected conversation data based on GPT-4 rat

## Human Evaluation

* We develop a personalized project based on oTree and release the human evaluation project via Prolific.
* Detailed instructions on reproducing the human evaluation are provided [here](https://github.com/sotopia-lab/sotopia-pi/tree/main/human_eval).

## Model Checkpoints
22 changes: 11 additions & 11 deletions data_generate/README.md
@@ -27,11 +27,11 @@ python3 generate_conversations.py --eval-script scripts/generate_conv_sft.sh --e
```

## Agent Performance Evaluation
We evaluate our trained models based on the social tasks in Sotopia. The tag `sotopia_env` in `env_files/used_env.json` represents all 90 social tasks in Sotopia, and the tag `sotopia_hard_env` represents the 14 hard social tasks.

We provide an example script `scripts/eval_sft.sh` of evaluating the trained model (named `custom_model`) and a partner model (GPT-3.5 Turbo by default) under the Sotopia framework. In the script, make sure to modify the `custom_model`'s name (e.g. `checkpoint_improve-0_epoch-20`) and API URL (e.g. `http://0.0.0.0:8106/v1` if the model is deployed on localhost) in the corresponding gin file (e.g. `data_generate/scripts/sotopia_conf/generation_utils_conf/generate_mistral_gpt-3.5-turbo.gin`).

Running the following command runs conversations between the trained model and a partner model, and automatically prompts GPT-4 to provide scores on seven social dimensions.
```bash
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name SFT-round-1 --tag sft_round_1_mistral_gpt-3.5-turbo_test --batch-size 4 --agent1-model custom_model --agent2-model gpt-3.5-turbo --push-to-db True
```
@@ -43,7 +43,7 @@ python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file e

For Sotopia's inspirational prompts, a few cherry-picked examples are included from 6 datasets (`social_iqa`, `social_chem`, `normbank`, `deal-or-no-deal`, `persuation_for_good`, `mindcraft`).

For our inspirational prompts, we include the full set of examples from 3 datasets (`social_iqa`, `social_chem`, `normbank`).

Notice 1: We do not include `deal-or-no-deal` and `mindcraft` because the inspirational prompts within each of those datasets are too similar to one another, which would cause some leakage if we trained on them and tested on the Sotopia ones.

Expand All @@ -52,20 +52,20 @@ Notice2: The reason why we do not include `persuation_for_good` is because we ca

### Explanations for EnvProfile generation:

With the inspirational prompts, we utilize `gpt-4-turbo` to generate EnvProfiles.

Note that our function also allows other OpenAI models and temperatures; the default model is `gpt-4-turbo` and the default temperature is 0.5.
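
A minimal sketch of such a parameterized call (the function name and prompt wording are illustrative assumptions, not the repo's actual API; defaults mirror the ones described above):
```python
# Illustrative EnvProfile generation with a configurable OpenAI model and
# temperature. Function name and prompt text are assumptions, not repo code.
from openai import OpenAI

client = OpenAI()


def generate_env_profile(
    inspirational_prompt: str,
    example_profile: str,
    model: str = "gpt-4-turbo",
    temperature: float = 0.5,
) -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {
                "role": "user",
                "content": (
                    "Here is an example environment profile:\n"
                    + example_profile
                    + "\n\nGenerate a new environment profile inspired by:\n"
                    + inspirational_prompt
                ),
            }
        ],
    )
    return response.choices[0].message.content
```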


### Detailed Steps of generating inspirational prompts

1. We create a new inspirational-prompts CSV under the `env_files` folder, based on the three sources used in SOTOPIA scenario generation: `social_iqa`, `social_chemistry`, and `normbank`. For each source, we make sure duplicates are dropped and there is NO overlap with SOTOPIA.
2. We generate 430 new scenarios, roughly evenly distributed across the three sources. The logic to generate new scenarios is as follows:
<br> a. For a target number of scenarios, we divide that number by three (the number of sources) to get X, the per-source count.
<br> b. For each source, we randomly select X unused prompts, and for each prompt we randomly select an environment profile example currently in the database, then use the OpenAI completion API with the specified model and temperature to generate a new scenario.
<br> c. After generation, we save all used prompts, the corresponding pks, and the generation model into `used_prompts.csv` under `env_files`, so as to track used prompts and avoid future repetition.

3. We also create a sampling function that randomly samples from the current Redis database and filters out SOTOPIA scenarios and already-used scenarios, which are saved under `used_env.json`. This is because we want to avoid generating conversations with the same scenarios, to keep diversity; a minimal sketch of the filtering logic follows below.
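
A minimal sketch of that filtering logic (the helper name is ours; the JSON layout, experiment name mapping to a list of env pks, is inferred from how `generate_conversations.py` reads the file):
```python
# Sketch of step 3: sample unused scenario pks while filtering out ones
# already recorded in used_env.json. Helper name and layout are assumptions.
import json
import random


def sample_unused_envs(
    candidate_pks: list[str],
    used_env_file: str = "env_files/used_env.json",
    n: int = 10,
) -> list[str]:
    with open(used_env_file, "r") as f:
        # used_env.json maps experiment names to lists of env pks
        used = {pk for pks in json.load(f).values() for pk in pks}
    unused = [pk for pk in candidate_pks if pk not in used]
    return random.sample(unused, min(n, len(unused)))
```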


## Setting up Redis database
@@ -133,9 +133,9 @@ To do so, one of the member must first have access to TIGER. Then, follow the st

* 7.2.0-v6 - this is due to version incompatibility. On TIGER, the Redis version is currently 5.0.7, but the dump.rdb we are using is in version 7.2.3. Using the latest redis-stack without specifying a version would lead to incompatibility, so we must specify that redis-stack should run with a newer image version. If the dump.rdb is in version 6.2.12, for example, `redis/redis-stack:latest` is enough.

To check the Redis version, run `redis-cli INFO SERVER` on the command line.

To check whether the server is running successfully, you can either open `http://SERVER:PORT/redis-stack/browser` in a browser, or run `docker ps` on the command line and see whether the container named NAMEYOUWANT is running.


@@ -176,4 +176,4 @@ Link: <https://github.com/RedisJSON/RedisJSON>

The default Redis version may be 7.2.x. However, to deploy on TIGER, we need to use the 6.2.x version of Redis. Therefore, the command to run locally could be:

`docker run -p 6379:6379 --name redis-stack-old redis/redis-stack:6.2.6-v10` instead of using latest. After running locally and saving all data to the Redis DB, we should get a dump.rdb in the folder in version 6.2.6. We can then upload this file to the TIGER server.
2 changes: 1 addition & 1 deletion data_generate/env_files/used_env.json
@@ -527,4 +527,4 @@
"01H7VFHP8AN5643B0NR0NP00VE",
"01H7VFHN7A1ZX5KSMT2YN9RXC4"
]
}
135 changes: 86 additions & 49 deletions data_generate/generate_conversations.py
@@ -1,85 +1,122 @@
```python
import argparse
import json
import re
import subprocess


def overwrite_eval_bash(
    eval_script: str,
    tag: str,
    env_ids: list,
    batch_size: int = 2,
    agent1_model: str = "gpt-3.5-turbo",
    agent2_model: str = "gpt-3.5-turbo",
    push_to_db: bool = True,
) -> None:

    with open(eval_script, "r") as f:
        lines = f.readlines()

    for i in range(len(lines)):
        # change TAG, TAG_TO_CHECK_EXISTING_EPISODES
        if "--gin.TAG_TO_CHECK_EXISTING_EPISODES" in lines[i]:
            pattern = (
                r'(--gin\.TAG_TO_CHECK_EXISTING_EPISODES=")([^"]*)(".*\n)'
            )
            lines[i] = re.sub(pattern, r"\1" + tag + r"\3", lines[i])
        elif "--gin.TAG" in lines[i]:
            pattern = r'(--gin\.TAG=")([^"]*)(".*\n)'
            lines[i] = re.sub(pattern, r"\1" + tag + r"\3", lines[i])
        # change ENV_IDS
        elif "--gin.ENV_IDS" in lines[i]:
            pattern = r"(--gin\.ENV_IDS=).*?(\s*\\)"
            lines[i] = re.sub(
                pattern, r"\1" + json.dumps(env_ids) + r"'" + r"\2", lines[i]
            )
        # change batch size
        elif "--gin.BATCH_SIZE" in lines[i]:
            pattern = r"(--gin\.BATCH_SIZE=)(\d+)"
            lines[i] = re.sub(pattern, r"\g<1>" + str(batch_size), lines[i])
        # change agent models
        elif "--gin.AGENT1_MODEL" in lines[i]:
            pattern = r'(--gin\.AGENT1_MODEL=")([^"]*)(".*\n)'
            lines[i] = re.sub(pattern, r"\1" + agent1_model + r"\3", lines[i])
        elif "--gin.AGENT2_MODEL" in lines[i]:
            pattern = r'(--gin\.AGENT2_MODEL=")([^"]*)(".*\n)'
            lines[i] = re.sub(pattern, r"\1" + agent2_model + r"\3", lines[i])
        # change push to db flag
        elif "--gin.PUSH_TO_DB" in lines[i]:
            pattern = r"(--gin\.PUSH_TO_DB=)(True|False)"
            lines[i] = re.sub(pattern, r"\g<1>" + str(push_to_db), lines[i])

    with open(eval_script, "w") as f:
        f.write("".join(lines))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--eval-script",
        type=str,
        required=True,
        help="Required. Provide template bash file for sotopia evaluation.",
    )
    parser.add_argument(
        "--env-file",
        type=str,
        default="env_files/used_env.json",
        help="Default: env_files/used_env.json. Provide the json file of env ids for conversation generation.",
    )
    parser.add_argument(
        "--experiment-name",
        type=str,
        required=True,
        help="Required. Need the experiment_name, which is the key of the env_file.",
    )
    parser.add_argument(
        "--tag",
        type=str,
        required=True,
        help="Required. Provide a unique tag that will be pushed to REDIS database.",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=2,
        help="Default: 2. Provide the batch size of calling APIs.",
    )
    parser.add_argument(
        "--agent1-model",
        type=str,
        default="gpt-3.5-turbo",
        help="Default: gpt-3.5-turbo. Provide the name of OPENAI model.",
    )
    parser.add_argument(
        "--agent2-model",
        type=str,
        default="gpt-3.5-turbo",
        help="Default: gpt-3.5-turbo. Provide the name of OPENAI model.",
    )
    parser.add_argument(
        "--push-to-db",
        type=str,
        default=True,
        help="Default: True. If you choose False, then the conversations will not be pushed to REDIS database.",
    )
    args = parser.parse_args()

    with open(args.env_file, "r") as f:
        env_ids = json.loads(f.read())[args.experiment_name]

    overwrite_eval_bash(
        eval_script=args.eval_script,
        tag=args.tag,
        env_ids=env_ids,
        batch_size=args.batch_size,
        agent1_model=args.agent1_model,
        agent2_model=args.agent2_model,
        push_to_db=True if args.push_to_db == "True" else False,
    )

    command = f"bash {args.eval_script}"
    subprocess.run(command.split())
```