Commit 9e3651b

Merge branch 'demo' into feature/sotopia-demo-ui

XuhuiZhou committed Dec 15, 2024
2 parents 1d743c0 + ab6903a commit 9e3651b
Showing 30 changed files with 1,794 additions and 4,288 deletions.
116 changes: 116 additions & 0 deletions docs/pages/concepts/evaluation_dimension.md
@@ -0,0 +1,116 @@
## Overview

Evaluation dimensions are used to evaluate the quality of social interactions.
In the original Sotopia paper, there are 7 dimensions used to evaluate the quality of social interactions, which we collectively name the `sotopia` evaluation dimensions:
- believability
- relationship
- knowledge
- secret
- social rules
- financial and material benefits
- goal

`SotopiaDimensions` can be used directly without initializing the database. It provides a set of predefined evaluation dimensions that are ready to use for evaluating social interactions. For example:

```python
from sotopia.envs.parallel import ParallelSotopiaEnv
from sotopia.envs.evaluators import EvaluationForTwoAgents, ReachGoalLLMEvaluator, RuleBasedTerminatedEvaluator, SotopiaDimensions

env = ParallelSotopiaEnv(
    env_profile=env_profile,
    model_name=model_names["env"],
    action_order="round-robin",
    evaluators=[
        RuleBasedTerminatedEvaluator(max_turn_number=20, max_stale_turn=2),
    ],
    terminal_evaluators=[
        ReachGoalLLMEvaluator(
            model_names["env"],
            EvaluationForTwoAgents[SotopiaDimensions],  # type: ignore
            # TODO check how to do type annotation
        ),
    ],
)
```


However, we observe that in many use cases people want to evaluate with customized metrics, so we provide a way to build custom evaluation dimensions.
For a quick reference, you can directly check out the `examples/use_custom_dimensions.py`.

### CustomEvaluationDimension
The [`CustomEvaluationDimension`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension.
There are four parameters:
- `name`: the name of the dimension
- `description`: the description of the dimension
- `range_low`: the minimum score of the dimension (an integer)
- `range_high`: the maximum score of the dimension (an integer)
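
For example, a minimal sketch of defining and saving a dimension. This assumes `CustomEvaluationDimension` is imported from `sotopia.database` and persisted with the redis-om style `.save()` (which requires a running Redis instance); the dimension name, description, and range here are purely illustrative.

```python
from sotopia.database import CustomEvaluationDimension

# Illustrative dimension; any name/description/range of your own works the same way.
transactivity = CustomEvaluationDimension(
    name="transactivity",
    description="The extent to which participants build on each other's contributions",
    range_low=0,
    range_high=10,
)
transactivity.save()  # assumes a running Redis instance behind sotopia.database
```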

### CustomEvaluationDimensionList
The [`CustomEvaluationDimensionList`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension list based on the existing dimensions. It helps one to group multiple dimensions together for a specific use case.
There are two parameters:
- `name`: the name of the dimension list
- `dimension_pks`: the primary keys of the dimensions in the dimension list
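
For example, a sketch of grouping saved dimensions into a named list. This again assumes the redis-om style `.pk` and `.save()` interface; the list name and dimension values are illustrative.

```python
from sotopia.database import CustomEvaluationDimension, CustomEvaluationDimensionList

# Save a dimension first (illustrative), then reference it by primary key.
transactivity = CustomEvaluationDimension(
    name="transactivity",
    description="The extent to which participants build on each other's contributions",
    range_low=0,
    range_high=10,
)
transactivity.save()

collaboration_list = CustomEvaluationDimensionList(
    name="collaboration",
    dimension_pks=[transactivity.pk],
)
collaboration_list.save()
```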

### EvaluationDimensionBuilder
The [`EvaluationDimensionBuilder`](/python_API/database/evaluation_dimensions) is a class that can be used to generate a custom evaluation dimension model based on the existing dimensions.


## Usage
### Initialize the database
The default evaluation metric is still `SotopiaDimensions` in `sotopia.envs.evaluators`. There are no `CustomEvaluationDimension` entries in the database by default. To initialize the database, please refer to `examples/use_custom_dimensions.py`.
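
To check what ended up in the database after initialization, here is a small sketch, assuming the same redis-om interface (`all_pks()`, `get()`) that other Sotopia models such as `EnvironmentProfile` expose:

```python
from sotopia.database import CustomEvaluationDimension, CustomEvaluationDimensionList

# List every custom dimension currently stored in Redis.
for pk in CustomEvaluationDimension.all_pks():
    dim = CustomEvaluationDimension.get(pk)
    print(f"{dim.name}: [{dim.range_low}, {dim.range_high}] - {dim.description}")

# And every named dimension list.
for pk in CustomEvaluationDimensionList.all_pks():
    dim_list = CustomEvaluationDimensionList.get(pk)
    print(f"{dim_list.name}: {dim_list.dimension_pks}")
```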


### Use the custom evaluation dimensions
After you initialize your customized evaluation dimensions, you can use any of the methods below:

#### Method 1: Choose dimensions by names
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
        ["transactivity", "verbal_equity"]
    )
)
```

#### Method 2: Directly choose the grouped evaluation dimension list
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
        "sotopia"
    )
)
```

#### Method 3: Build a custom evaluation dimension model temporarily
We provide multiple ways to build a custom evaluation dimension model with `EvaluationDimensionBuilder`, specifically:
- `generate_dimension_model`: build an evaluation dimension model from existing dimension primary keys.
- `generate_dimension_model_from_dict`: build an evaluation dimension model from a list of dictionaries, each specifying the parameters of a `CustomEvaluationDimension` (a usage sketch follows this list). For example:
```json
[
{
"name": "believability",
"description": "The believability of the interaction",
"range_low": 0,
"range_high": 10
},
...
]
```
- `select_existing_dimension_model_by_name`: build an evaluation dimension model from existing dimension names, for example `['believability', 'goal']`.
- `select_existing_dimension_model_by_list_name`: build an evaluation dimension model from an existing `CustomEvaluationDimensionList` name, for example `sotopia`.
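
For instance, here is a hedged sketch of the dictionary-based builder in use. It follows the method name listed above; the exact signature (e.g. whether the argument is positional or the keyword `dimensions=`) is an assumption, so adapt it to your installed version.

```python
from sotopia.database import EvaluationDimensionBuilder

# Build a temporary pydantic evaluation model from dictionaries, without
# saving anything to the database.
evaluation_dimensions = EvaluationDimensionBuilder.generate_dimension_model_from_dict(
    dimensions=[
        {
            "name": "believability",
            "description": "The believability of the interaction",
            "range_low": 0,
            "range_high": 10,
        },
        {
            "name": "goal",
            "description": "How well the agent achieves its social goal",
            "range_low": 0,
            "range_high": 10,
        },
    ]
)
```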


After you get the evaluation dimension model, you can pass it as a parameter to the evaluator, for example:
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
        "sotopia"
    )
)
terminal_evaluators=[
    ReachGoalLLMEvaluator(
        model_names["env"],
        EvaluationForTwoAgents[evaluation_dimensions],  # type: ignore
    ),
],
```
54 changes: 54 additions & 0 deletions docs/pages/python_API/database/evaluation_dimensions.md
@@ -0,0 +1,54 @@
# `evaluation_dimensions.py`

This module provides classes and utilities for defining and managing custom evaluation dimensions within the Sotopia environment. It includes classes for individual dimensions, lists of dimensions, and a builder for creating dimension models.

## Classes

### `CustomEvaluationDimension`

Represents a custom evaluation dimension with specific attributes such as name, description, and score range.

#### Attributes
- `name`: `str`. The name of the dimension.
- `description`: `str`. A brief description of the dimension.
- `range_low`: `int`. The minimum score for the dimension.
- `range_high`: `int`. The maximum score for the dimension.

### `CustomEvaluationDimensionList`

Groups multiple custom evaluation dimensions together.

#### Attributes
- `name`: `str`. The name of the dimension list.
- `dimension_pks`: `list[str]`. A list of primary keys for the dimensions included in the list.

### `EvaluationDimensionBuilder`

Provides utility methods to create and manage evaluation dimension models.

#### Methods
- `create_range_validator(low: int, high: int)`: Creates a validator for score ranges.

**Arguments:**
- `low`: `int`. The minimum score allowed.
- `high`: `int`. The maximum score allowed.

- `build_dimension_model(dimension_ids: list[str])`: Builds a dimension model from primary keys.

**Arguments:**
- `dimension_ids`: `list[str]`. A list of dimension primary keys.

- `build_dimension_model_from_dict(dimensions: list[dict[str, Union[str, int]]])`: Builds a dimension model from a list of dictionaries, one per dimension.

**Arguments:**
- `dimensions`: `list[dict[str, Union[str, int]]]`. A list of dictionaries specifying dimension attributes.

- `select_existing_dimension_model_by_name(dimension_names: list[str])`: Selects a dimension model by dimension names.

**Arguments:**
- `dimension_names`: `list[str]`. A list of dimension names.

- `select_existing_dimension_model_by_list_name(list_name: str)`: Selects a dimension model by list name.

**Arguments:**
- `list_name`: `str`. The name of the dimension list.
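
#### Example

A brief, illustrative usage sketch, assuming the classes are importable from `sotopia.database` and that custom dimensions have already been saved to Redis:

```python
from sotopia.database import CustomEvaluationDimension, EvaluationDimensionBuilder

# Build a pydantic evaluation model from every saved dimension; the resulting
# model can be used as the type argument of EvaluationForTwoAgents.
dimension_pks = list(CustomEvaluationDimension.all_pks())
dimension_model = EvaluationDimensionBuilder.build_dimension_model(dimension_pks)
```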
17 changes: 16 additions & 1 deletion examples/experiment_eval.py
@@ -17,6 +17,7 @@
    EnvAgentComboStorage,
    EnvironmentProfile,
    EpisodeLog,
    EvaluationDimensionBuilder,
)
from sotopia.envs.evaluators import (
    EvaluationForTwoAgents,
@@ -34,6 +35,7 @@
)
from sotopia.server import run_async_server
from sotopia_conf.gin_utils import parse_gin_flags, run
# from sotopia.database import EvaluationDimensionBuilder

_DEFAULT_GIN_SEARCH_PATHS = [
    os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -109,6 +111,18 @@ def _iterate_env_agent_combo_not_in_db(
    tag: str | None = None,
) -> Generator[EnvAgentCombo[Observation, AgentAction], None, None]:
    """We iterate over each environment and return the **first** env-agent combo that is not in the database."""
    # loading evaluation metric
    try:
        evaluation_dimensions = EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
            "sotopia"
        )  # Initialize your customized dimension, please refer to `examples/use_custom_dimensions.py`
    except Exception as e:
        print(
            "No customized evaluation dimensions found, using default SotopiaDimensions",
            e,
        )
        evaluation_dimensions = SotopiaDimensions

    if not env_ids:
        env_ids = list(EnvironmentProfile.all_pks())
    for env_id in env_ids:
@@ -152,7 +166,8 @@ def _iterate_env_agent_combo_not_in_db(
terminal_evaluators=[
ReachGoalLLMEvaluator(
model_names["env"],
EvaluationForTwoAgents[SotopiaDimensions],
EvaluationForTwoAgents[evaluation_dimensions], # type: ignore
# TODO check how to do type annotation
),
],
)
2 changes: 2 additions & 0 deletions examples/experimental/nodes/initial_message_node.py
@@ -18,6 +18,7 @@ def __init__(
        input_tick_channel: str,
        output_channels: list[str],
        env_scenario: str,
        node_name: str,
        redis_url: str = "redis://localhost:6379/0",
    ):
        super().__init__(
@@ -26,6 +27,7 @@
                (output_channel, Text) for output_channel in output_channels
            ],
            redis_url=redis_url,
            node_name=node_name,
        )
        self.env_scenario = env_scenario
        self.output_channels = output_channels
113 changes: 113 additions & 0 deletions examples/experimental/sotopia_original_replica/llm_agent_sotopia.py
@@ -0,0 +1,113 @@
import logging
import sys
from rich.logging import RichHandler

from aact import NodeFactory

from sotopia.experimental.agents.base_agent import BaseAgent
from sotopia.experimental.agents.datamodels import Observation, AgentAction

from sotopia.generation_utils import agenerate
from sotopia.generation_utils.generate import StrOutputParser

# Check Python version
if sys.version_info >= (3, 11):
    pass
else:
    pass

# Configure logging
FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
logging.basicConfig(
    level=logging.WARNING,
    format=FORMAT,
    datefmt="[%X]",
    handlers=[RichHandler()],
)


@NodeFactory.register("llm_agent")
class LLMAgent(BaseAgent[Observation, AgentAction]):
    def __init__(
        self,
        input_channels: list[str],
        output_channel: str,
        query_interval: int,
        agent_name: str,
        node_name: str,
        goal: str,
        model_name: str,
        redis_url: str,
    ):
        super().__init__(
            [(input_channel, Observation) for input_channel in input_channels],
            [(output_channel, AgentAction)],
            redis_url,
            node_name,
        )
        self.output_channel = output_channel
        self.query_interval = query_interval
        self.count_ticks = 0
        self.message_history: list[Observation] = []
        self.name = agent_name
        self.model_name = model_name
        self.goal = goal

    def _format_message_history(self, message_history: list[Observation]) -> str:
        ## TODO: akhatua Fix the mapping of action to be grammatically correct
        return "\n".join(message.to_natural_language() for message in message_history)

    async def aact(self, obs: Observation) -> AgentAction:
        # At the initialization turn (-1), reply with the model name instead of acting.
        if obs.turn_number == -1:
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="none",
                argument=self.model_name,
            )

        self.message_history.append(obs)

        if len(obs.available_actions) == 1 and "none" in obs.available_actions:
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="none",
                argument="",
            )
        elif len(obs.available_actions) == 1 and "leave" in obs.available_actions:
            self.shutdown_event.set()
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="leave",
                argument="",
            )
        else:
            # Otherwise, generate a (possibly empty) utterance from the LLM.
            history = self._format_message_history(self.message_history)
            action: str = await agenerate(
                model_name=self.model_name,
                template="Imagine that you are a friend of the other person. Here is the "
                "conversation between you and them.\n"
                "You are {agent_name} in the conversation.\n"
                "{message_history}\n"
                "and you plan to {goal}.\n"
                "You can choose to interrupt the other person "
                "by saying something, or not to interrupt by outputting nothing. What would you say? "
                "Please only output a sentence or output nothing at all."
                "{format_instructions}",
                input_values={
                    "message_history": history,
                    "goal": self.goal,
                    "agent_name": self.name,
                },
                temperature=0.7,
                output_parser=StrOutputParser(),
            )

            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="speak",
                argument=action,
            )