Sync private repo with this public repo #12

Merged · 8 commits · Jan 7, 2024
21 changes: 5 additions & 16 deletions .github/workflows/mypy.yml
@@ -1,11 +1,5 @@
name: Mypy
on:
push:
branches:
- main
pull_request:
branches:
- main
on: [push]

jobs:
Static-Type-Checking:
@@ -21,15 +15,10 @@ jobs:
python-version: 3.11.2
- name: Install dependencies
run: |
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .[dev]
curl -sSL https://install.python-poetry.org | python3
poetry install --all-extras
- name: Type-checking package with mypy
run: |
# Manually install mypy in the standard way.
pip --quiet install -U mypy
# Log this mypy version for debuggability.
mypy --version
# Run this mypy instance against our main package.
mypy --install-types --non-interactive sotopia
mypy --strict .
poetry run mypy --install-types --non-interactive sotopia
poetry run mypy --strict .
15 changes: 4 additions & 11 deletions .github/workflows/tests.yml
@@ -1,11 +1,5 @@
name: Pytest
on:
push:
branches:
- main
pull_request:
branches:
- main
on: [push]

jobs:
Pytest:
@@ -21,13 +15,12 @@ jobs:
python-version: 3.11.2
- name: Install dependencies
run: |
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .[dev]
curl -sSL https://install.python-poetry.org | python3
poetry install --all-extras
- name: Test with pytest
env: # Or as an environment variable
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
REDIS_OM_URL: ${{ secrets.REDIS_OM_URL }}
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
run: |
pytest
poetry run pytest
7 changes: 4 additions & 3 deletions README.md
@@ -10,11 +10,12 @@


## Installation
This package supports Python 3.11 and above. In one line,
`pip install sotopia`.

This package supports Python 3.11 and above. We recommend using a virtual environment to install this package, e.g. with anaconda3: `conda create -n sotopia python=3.11; conda activate sotopia; conda install -c conda-forge pip`. Then, install the requirements and this package.
Or, to install from source, use a virtual environment, e.g. with anaconda3: `conda create -n sotopia python=3.11; conda activate sotopia; curl -sSL https://install.python-poetry.org | python3`. Then, install the requirements and this package.
```bash
python -m pip install -r requirements.txt # make sure the packages are installed in the specific conda environment
python -m pip install -e .
poetry install
```

An OpenAI API key is required to run the code. Please set the environment variable `OPENAI_API_KEY` to your key. The recommended way is to add the key to the conda environment:
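One common way to add a persistent variable to a conda environment is sketched below; this is an assumption for illustration (the README's own command is collapsed in this diff view), and `<your-openai-key>` is a placeholder.

```shell
# Hypothetical example: store the key in the active conda environment
conda env config vars set OPENAI_API_KEY=<your-openai-key>
# Re-activate the environment so the variable takes effect
conda activate sotopia
```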
5 changes: 5 additions & 0 deletions docs/all_the_issues.md
@@ -0,0 +1,5 @@
# Come here if you encounter any issues

## Missing episodes

A large batch size may cause some episodes to be skipped, because the server may not be able to handle the load. Try reducing the batch size, or use the script in `examples/fix_missing_episodes.py` to fix the missing episodes.
12 changes: 12 additions & 0 deletions docs/examples.md
@@ -0,0 +1,12 @@
# Example Scripts For Using The Library

## Example 1: Evaluating existing episodes

```bash
python examples/evaluate_existing_episodes.py --tag=<tag to upload to the database> --model=<the model used to re-evaluate the existing episodes> --batch_size=<batch size used for evaluation> --push-to-db
```

Run `python examples/evaluate_existing_episodes.py --help` for more information.

## Example 2: Generate script-like episodes
See `docs/simulation_modes.md` for more information.
6 changes: 6 additions & 0 deletions docs/hyperparameters.md
@@ -0,0 +1,6 @@
# Hyperparameters that are used in the simulation

## Tags

- `TAG`: The tag of the simulation. This tag is used to identify the simulation in the database.
- `TAG_TO_CHECK_EXISTING_EPISODES`: Scripts like `examples/experiment_eval.py` check whether episodes with the same tag already exist in the database. If they do, the simulation **will not** be run; this avoids running the same simulation twice. To run the simulation again, change the tag or set `TAG_TO_CHECK_EXISTING_EPISODES` to `None`.
45 changes: 45 additions & 0 deletions docs/simulation_modes.md
@@ -0,0 +1,45 @@
# Different Modes of Simulation

## Simulation Modes

The simulation can be run in different modes. The mode is specified in the configuration file. The following modes are available:

### Sotopia-lite

- `lite`: The simulation runs without the characters' detailed background information, using only their names. To use this mode, set `lite` to `True` in the gin configuration command.
e.g.,
```bash
python examples/experiment_eval.py \
--gin_file sotopia_conf/generation_utils_conf/generate.gin \
--gin_file sotopia_conf/server_conf/server.gin \
--gin_file sotopia_conf/run_async_server_in_batch.gin \
'--gin.ENV_IDS=[]' \
'--gin.AGENT1_MODEL="gpt-3.5-turbo"' \
'--gin.AGENT2_MODEL="gpt-3.5-turbo"' \
'--gin.BATCH_SIZE=5' \
'--gin.TAG="lite_gpt3.5_gpt3.5"' \
'--gin.TAG_TO_CHECK_EXISTING_EPISODES="lite_gpt3.5_gpt3.5"' \
'--gin.PUSH_TO_DB=False' \
'--gin.OMNISCIENT=False' \
'--gin.VERBOSE=False' \
'--gin.LITE=True'
```

### Sotopia-script

- `script`: The simulation has the LLMs generate the entire interaction in one shot, as in a script-writing setting. To use this mode, set `script` to `True` in the gin configuration command.

e.g.,

```bash
python examples/generate_script.py \
--gin_file sotopia_conf/generation_utils_conf/generate.gin \
--gin_file sotopia_conf/run_async_server_in_batch_script.gin \
'--gin.ENV_IDS=[]' \
'--gin.SCRIPT_MODEL="gpt-3.5-turbo"' \
'--gin.BATCH_SIZE=5' \
'--gin.TAG="lite_script_gpt3.5_gpt3.5"' \
'--gin.TAG_TO_CHECK_EXISTING_EPISODES="lite_script_gpt3.5_gpt3.5"' \
'--gin.PUSH_TO_DB=True' \
'--gin.VERBOSE=False'
```
134 changes: 134 additions & 0 deletions examples/evaluate_existing_episode.py
@@ -0,0 +1,134 @@
import asyncio
import logging
import subprocess
from datetime import datetime
from logging import FileHandler

import gin
import typer
from experiment_eval import _iterate_env_agent_combo_not_in_db
from rich import print
from rich.logging import RichHandler
from tqdm import tqdm

from sotopia.agents.llm_agent import Agents
from sotopia.database.logs import AnnotationForEpisode, EpisodeLog
from sotopia.database.persistent_profile import EnvironmentProfile
from sotopia.generation_utils.generate import LLM_Name, agenerate_script
from sotopia.messages.message_classes import (
AgentAction,
Observation,
ScriptBackground,
)
from sotopia.samplers import (
BaseSampler,
ConstraintBasedSampler,
EnvAgentCombo,
)
from sotopia.server import aevaluate_one_episode, arun_one_script

# timestamp, level, logger name, and message
FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"

process = subprocess.Popen(
["git", "rev-parse", "HEAD"], shell=False, stdout=subprocess.PIPE
)
git_head_hash = process.communicate()[0].strip()

logging.basicConfig(
level=15,
format=FORMAT,
datefmt="[%X]",
handlers=[
RichHandler(),
FileHandler(
datetime.now().strftime(
f"./logs/%H_%M_%d_%m_%Y_{str(git_head_hash.decode('utf-8'))}.log"
)
),
],
)
app = typer.Typer()


def run_async_server_in_batch_aevaluate(
batch_size: int = 10,
model: LLM_Name = "gpt-4",
reeval_list: list[str] = [],
tag: str | None = None,
push_to_db: bool = False,
verbose: bool = False,
) -> None:

if not verbose:
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
rich_handler = logger.handlers[0]
logger.removeHandler(rich_handler)

episode_batch: list[EpisodeLog] = []

while True:
for env_pk in tqdm(reeval_list):
episode = EpisodeLog.get(env_pk)
episode_batch.append(episode)
if len(episode_batch) == batch_size:
logging.info(
f"Running batch of {batch_size} episodes: {episode_batch}"
)
for episode in episode_batch:
asyncio.run(
aevaluate_one_episode(
episode=episode,
model=model,
tag=tag,
push_to_db=push_to_db,
)
)
episode_batch = []
else:
if episode_batch:
logging.info(
f"Running batch of {batch_size} episodes: {episode_batch}"
)
for episode in episode_batch:
asyncio.run(
aevaluate_one_episode(
episode=episode,
model=model,
tag=tag,
push_to_db=push_to_db,
)
)
return


annotated_episodes_pks = [
AnnotationForEpisode.get(anno).episode
for anno in AnnotationForEpisode.all_pks()
]
annotated_episodes_pks = list(set(annotated_episodes_pks))


@app.command()
def run_server(
tag: str = typer.Option("reeval_llama2"),
model: LLM_Name = typer.Option("togethercomputer/llama-2-70b-chat"),
batch_size: int = typer.Option(5),
push_to_db: bool = typer.Option(True),
verbose: bool = typer.Option(False),
) -> None:

# Call the function with the specified parameters
run_async_server_in_batch_aevaluate(
tag=tag,
model=model,
batch_size=batch_size,
push_to_db=push_to_db,
verbose=verbose,
reeval_list=annotated_episodes_pks,
)


if __name__ == "__main__":
app()
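The batching logic in `run_async_server_in_batch_aevaluate` above follows a common accumulate-and-flush pattern: collect items until a full batch is reached, process it, and flush any final partial batch at the end. A standalone sketch of that pattern (illustrative only; the helper name is not part of the PR) looks like:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def process_in_batches(
    items: list[T],
    batch_size: int,
    handle_batch: Callable[[list[T]], None],
) -> None:
    # Accumulate items until a full batch is collected, then flush it.
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []
    # Flush the final partial batch, mirroring the script's for/else branch.
    if batch:
        handle_batch(batch)
```

In the script itself, each flush wraps every episode in `asyncio.run(aevaluate_one_episode(...))`; the sketch abstracts that into `handle_batch`.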