Sync private repo with this public repo #12

Merged · 8 commits · Jan 7, 2024
21 changes: 5 additions & 16 deletions .github/workflows/mypy.yml
@@ -1,11 +1,5 @@
name: Mypy
on:
push:
branches:
- main
pull_request:
branches:
- main
on: [push]

jobs:
Static-Type-Checking:
@@ -21,15 +15,10 @@ jobs:
python-version: 3.11.2
- name: Install dependencies
run: |
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .[dev]
curl -sSL https://install.python-poetry.org | python3
poetry install --all-extras
- name: Type-checking package with mypy
run: |
# Manually install mypy in the standard way.
pip --quiet install -U mypy
# Log this mypy version for debuggability.
mypy --version
# Run this mypy instance against our main package.
mypy --install-types --non-interactive sotopia
mypy --strict .
poetry run mypy --install-types --non-interactive sotopia
poetry run mypy --strict .
15 changes: 4 additions & 11 deletions .github/workflows/tests.yml
@@ -1,11 +1,5 @@
name: Pytest
on:
push:
branches:
- main
pull_request:
branches:
- main
on: [push]

jobs:
Pytest:
@@ -21,13 +15,12 @@ jobs:
python-version: 3.11.2
- name: Install dependencies
run: |
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .[dev]
curl -sSL https://install.python-poetry.org | python3
poetry install --all-extras
- name: Test with pytest
env: # Or as an environment variable
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
REDIS_OM_URL: ${{ secrets.REDIS_OM_URL }}
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
run: |
pytest
poetry run pytest
7 changes: 4 additions & 3 deletions README.md
@@ -10,11 +10,12 @@


## Installation
This package supports Python 3.11 and above. In one line,
`pip install sotopia`.

This package supports Python 3.11 and above. We recommend using a virtual environment to install this package, e.g. with anaconda3: `conda create -n sotopia python=3.11; conda activate sotopia; conda install -c conda-forge pip`. Then, install the requirements and this package.
Or, to install from source, use a virtual environment, e.g. with anaconda3: `conda create -n sotopia python=3.11; conda activate sotopia; curl -sSL https://install.python-poetry.org | python3`. Then, install the requirements and this package.
```bash
python -m pip install -r requirements.txt # make sure the packages are installed in the specific conda environment
python -m pip install -e .
poetry install
```

An OpenAI API key is required to run the code. Please set the environment variable `OPENAI_API_KEY` to your key. The recommended way is to add the key to the conda environment:
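One common way to add a persistent variable to a conda environment is sketched below; this is an assumption for illustration (the README's own command is collapsed in this diff view), and `<your-openai-key>` is a placeholder.

```shell
# Hypothetical example: store the key in the active conda environment
conda env config vars set OPENAI_API_KEY=<your-openai-key>
# Re-activate the environment so the variable takes effect
conda activate sotopia
```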
5 changes: 5 additions & 0 deletions docs/all_the_issues.md
@@ -0,0 +1,5 @@
# Come here if you encounter any issues

## Missing episodes

A large batch size may cause some episodes to be skipped, because the server may not be able to handle the load. Try reducing the batch size, or use the script in `examples/fix_missing_episodes.py` to fix the missing episodes.
12 changes: 12 additions & 0 deletions docs/examples.md
@@ -0,0 +1,12 @@
# Example Scripts For Using The Library

## Example 1: Evaluating existing episodes

```bash
python examples/evaluate_existing_episodes.py --tag=<tag to upload to the database> --model=<the model used to re-evaluate the existing episodes> --batch_size=<batch size used for evaluation> --push-to-db
```

Run `python examples/evaluate_existing_episodes.py --help` for more information.

## Example 2: Generate script-like episodes
See `docs/simulation_modes.md` for more information.
6 changes: 6 additions & 0 deletions docs/hyperparameters.md
@@ -0,0 +1,6 @@
# Hyperparameters that are used in the simulation

## Tags

- `TAG`: The tag of the simulation. This tag is used to identify the simulation in the database.
- `TAG_TO_CHECK_EXISTING_EPISODES`: Scripts like `examples/experiment_eval.py` check whether episodes with the same tag already exist in the database. If they do, the simulation **will not** be run; this avoids running the same simulation twice. To run the simulation again, change the tag or set `TAG_TO_CHECK_EXISTING_EPISODES` to `None`.
45 changes: 45 additions & 0 deletions docs/simulation_modes.md
@@ -0,0 +1,45 @@
# Different Modes of Simulation

## Simulation Modes

The simulation can be run in different modes. The mode is specified in the configuration file. The following modes are available:

### Sotopia-lite

- `lite`: The simulation runs without the characters' detailed background information, using only their names. To use this mode, set `lite` to `True` in the gin configuration command.
e.g.,
```bash
python examples/experiment_eval.py \
--gin_file sotopia_conf/generation_utils_conf/generate.gin \
--gin_file sotopia_conf/server_conf/server.gin \
--gin_file sotopia_conf/run_async_server_in_batch.gin \
'--gin.ENV_IDS=[]' \
'--gin.AGENT1_MODEL="gpt-3.5-turbo"' \
'--gin.AGENT2_MODEL="gpt-3.5-turbo"' \
'--gin.BATCH_SIZE=5' \
'--gin.TAG="lite_gpt3.5_gpt3.5"' \
'--gin.TAG_TO_CHECK_EXISTING_EPISODES="lite_gpt3.5_gpt3.5"' \
'--gin.PUSH_TO_DB=False' \
'--gin.OMNISCIENT=False' \
'--gin.VERBOSE=False' \
'--gin.LITE=True'
```

### Sotopia-script

- `script`: The simulation has the LLMs generate the entire interaction in one shot, as in a script-writing setting. To use this mode, set `script` to `True` in the gin configuration command.

e.g.,

```bash
python examples/generate_script.py \
--gin_file sotopia_conf/generation_utils_conf/generate.gin \
--gin_file sotopia_conf/run_async_server_in_batch_script.gin \
'--gin.ENV_IDS=[]' \
'--gin.SCRIPT_MODEL="gpt-3.5-turbo"' \
'--gin.BATCH_SIZE=5' \
'--gin.TAG="lite_script_gpt3.5_gpt3.5"' \
'--gin.TAG_TO_CHECK_EXISTING_EPISODES="lite_script_gpt3.5_gpt3.5"' \
'--gin.PUSH_TO_DB=True' \
'--gin.VERBOSE=False'
```
134 changes: 134 additions & 0 deletions examples/evaluate_existing_episode.py
@@ -0,0 +1,134 @@
import asyncio
import logging
import subprocess
from datetime import datetime
from logging import FileHandler

import gin
import typer
from experiment_eval import _iterate_env_agent_combo_not_in_db
from rich import print
from rich.logging import RichHandler
from tqdm import tqdm

from sotopia.agents.llm_agent import Agents
from sotopia.database.logs import AnnotationForEpisode, EpisodeLog
from sotopia.database.persistent_profile import EnvironmentProfile
from sotopia.generation_utils.generate import LLM_Name, agenerate_script
from sotopia.messages.message_classes import (
AgentAction,
Observation,
ScriptBackground,
)
from sotopia.samplers import (
BaseSampler,
ConstraintBasedSampler,
EnvAgentCombo,
)
from sotopia.server import aevaluate_one_episode, arun_one_script

# timestamp, level, logger name, and message
FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"

process = subprocess.Popen(
["git", "rev-parse", "HEAD"], shell=False, stdout=subprocess.PIPE
)
git_head_hash = process.communicate()[0].strip()

logging.basicConfig(
level=15,
format=FORMAT,
datefmt="[%X]",
handlers=[
RichHandler(),
FileHandler(
datetime.now().strftime(
f"./logs/%H_%M_%d_%m_%Y_{str(git_head_hash.decode('utf-8'))}.log"
)
),
],
)
app = typer.Typer()


def run_async_server_in_batch_aevaluate(
batch_size: int = 10,
model: LLM_Name = "gpt-4",
reeval_list: list[str] = [],
tag: str | None = None,
push_to_db: bool = False,
verbose: bool = False,
) -> None:

if not verbose:
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
rich_handler = logger.handlers[0]
logger.removeHandler(rich_handler)

episode_batch: list[EpisodeLog] = []

while True:
for env_pk in tqdm(reeval_list):
episode = EpisodeLog.get(env_pk)
episode_batch.append(episode)
if len(episode_batch) == batch_size:
logging.info(
f"Running batch of {batch_size} episodes: {episode_batch}"
)
for episode in episode_batch:
asyncio.run(
aevaluate_one_episode(
episode=episode,
model=model,
tag=tag,
push_to_db=push_to_db,
)
)
episode_batch = []
else:
if episode_batch:
logging.info(
f"Running batch of {batch_size} episodes: {episode_batch}"
)
for episode in episode_batch:
asyncio.run(
aevaluate_one_episode(
episode=episode,
model=model,
tag=tag,
push_to_db=push_to_db,
)
)
return


annotated_episodes_pks = [
AnnotationForEpisode.get(anno).episode
for anno in AnnotationForEpisode.all_pks()
]
annotated_episodes_pks = list(set(annotated_episodes_pks))


@app.command()
def run_server(
tag: str = typer.Option("reeval_llama2"),
model: LLM_Name = typer.Option("togethercomputer/llama-2-70b-chat"),
batch_size: int = typer.Option(5),
push_to_db: bool = typer.Option(True),
verbose: bool = typer.Option(False),
) -> None:

# Call the function with the specified parameters
run_async_server_in_batch_aevaluate(
tag=tag,
model=model,
batch_size=batch_size,
push_to_db=push_to_db,
verbose=verbose,
reeval_list=annotated_episodes_pks,
)


if __name__ == "__main__":
app()
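The batching logic in `run_async_server_in_batch_aevaluate` above follows a common accumulate-and-flush pattern: collect items until a full batch is reached, process it, and flush any final partial batch at the end. A standalone sketch of that pattern (illustrative only; the helper name is not part of the PR) looks like:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def process_in_batches(
    items: list[T],
    batch_size: int,
    handle_batch: Callable[[list[T]], None],
) -> None:
    # Accumulate items until a full batch is collected, then flush it.
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []
    # Flush the final partial batch, mirroring the script's for/else branch.
    if batch:
        handle_batch(batch)
```

In the script itself, each flush wraps every episode in `asyncio.run(aevaluate_one_episode(...))`; the sketch abstracts that into `handle_batch`.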