Test suite v0.4 (#637)
* test: add `__init__` to make `tests/` a package

* test: add llm_event_spy fixture for tests

* test: add VCR.py fixture for HTTP interaction recording

* deps: group integration-testing

* test: add fixture to mock package availability in tests

* test: Add integration tests for OpenAI provider and features

* test: add tests for concurrent API requests handling

* Improve vcr.py configuration

Signed-off-by: Teo <teocns@gmail.com>

* ruff

Signed-off-by: Teo <teocns@gmail.com>

* chore(pyproject): update pytest options and loop scope

* chore(tests): update vcr.py ignore_hosts and options

* pyproject.toml

Signed-off-by: Teo <teocns@gmail.com>

* centralize teardown in conftest.py (clear singletons, end all sessions)

Signed-off-by: Teo <teocns@gmail.com>
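
A minimal sketch of such a teardown, assuming the package exposes `end_all_sessions()` and a `clear_singletons()` helper (names inferred from the message above, not from the diff):

```python
# tests/conftest.py -- sketch only; helper names are assumptions
import pytest

import agentops
from agentops.singleton import clear_singletons  # assumed location of the helper


@pytest.fixture(autouse=True)
def agentops_teardown():
    """Give every test a clean client state."""
    yield
    # Close any sessions a test left open, then reset singleton state
    # so the next test starts from scratch.
    agentops.end_all_sessions()
    clear_singletons()
```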

* change vcr_config scope to session

Signed-off-by: Teo <teocns@gmail.com>
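
With pytest-recording this amounts to a session-scoped `vcr_config` fixture; a minimal sketch, with illustrative header and host filters rather than the exact values used here:

```python
# tests/integration/conftest.py -- illustrative values
import pytest


@pytest.fixture(scope="session")
def vcr_config():
    return {
        # Keep credentials out of the recorded cassettes.
        "filter_headers": [("authorization", "REDACTED"), ("x-api-key", "REDACTED")],
        # Assumed: do not record traffic to the AgentOps backend itself.
        "ignore_hosts": ["api.agentops.ai"],
    }
```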

* integration: auto start agentops session

Signed-off-by: Teo <teocns@gmail.com>

* Move unit tests to dedicated folder (tests/unit)

Signed-off-by: Teo <teocns@gmail.com>

* Isolate vcr_config import into tests/integration

Signed-off-by: Teo <teocns@gmail.com>

* configure pytest to run only unit tests by default, and include integration tests only when explicitly specified.

Signed-off-by: Teo <teocns@gmail.com>

* ci(python-tests): separate jobs for unit and integration tests

* set python-tests timeout to 5 minutes

Signed-off-by: Teo <teocns@gmail.com>

* ruff

Signed-off-by: Teo <teocns@gmail.com>

* Implement jwt fixture, centralize reusable mock_req into conftest.py

Signed-off-by: Teo <teocns@gmail.com>

reauthorize

Signed-off-by: Teo <teocns@gmail.com>
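
A rough sketch of that shape, with assumed endpoint paths and payloads (not taken from the diff):

```python
# tests/unit/conftest.py -- sketch; URLs and payloads are assumed
import pytest
import requests_mock


@pytest.fixture
def jwt() -> str:
    return "test-jwt-token"


@pytest.fixture
def mock_req(jwt):
    """Reusable HTTP mock for the AgentOps API, including re-authorization."""
    with requests_mock.Mocker() as m:
        m.post("https://api.agentops.ai/v2/create_session", json={"status": "success", "jwt": jwt})
        m.post("https://api.agentops.ai/v2/create_events", json={"status": "ok"})
        m.post("https://api.agentops.ai/v2/reauthorize_jwt", json={"status": "success", "jwt": jwt})
        yield m
```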

* ci(python-tests): simplify env management, remove coverage from integration tests

Signed-off-by: Teo <teocns@gmail.com>

* ruff

Signed-off-by: Teo <teocns@gmail.com>

* fix: cassette for test_concurrent_api_requests

Signed-off-by: Teo <teocns@gmail.com>

* Cleanup vcr.py comments

Signed-off-by: Teo <teocns@gmail.com>

* add a `TODO` for removing `vcrpy` git version after its release

* refactor openai assistants response handling for easier testing

* add more keys for different llm providers

* add integration tests for other providers

* remove openai version limitation

* add providers as deps

* chore: add mistralai to test dependencies

* remove `mistral` from dependencies since it's incorrect

* ruff

* re-record cassettes

* tests/fixtures/providers: fall back to `test-api-key` if no provider key is found

All provider fixtures will:
- use the actual API key if it's set in the environment
- fall back to "test-api-key" if no environment variable is found

Signed-off-by: Teo <teocns@gmail.com>
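
A minimal sketch of that fallback pattern (fixture name and environment variable are illustrative):

```python
import os

import pytest


@pytest.fixture
def openai_api_key() -> str:
    # Use the real key when it is set (e.g. when re-recording cassettes);
    # otherwise fall back to a placeholder that still works with recorded cassettes.
    return os.environ.get("OPENAI_API_KEY", "test-api-key")
```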

* set keys for `litellm`

* Improve tests/integration/test_llm_providers.py openai assistants

Signed-off-by: Teo <teocns@gmail.com>

* Make integration tests skip appropriately, regenerate one cassette

Signed-off-by: Teo <teocns@gmail.com>
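
One hedged example of how such a skip can be expressed (the condition actually used in the suite may differ):

```python
import os

import pytest

# Assumed condition: skip live-provider tests when the corresponding key is absent.
requires_anthropic = pytest.mark.skipif(
    not os.environ.get("ANTHROPIC_API_KEY"),
    reason="ANTHROPIC_API_KEY not set",
)


@requires_anthropic
def test_anthropic_completion():
    ...
```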

* explicitly import tests/integration/conftest fixtures

Signed-off-by: Teo <teocns@gmail.com>

* deps: improve dev package versioning

* Make integration tests run with python 3.12

Signed-off-by: Teo <teocns@gmail.com>

* add uv.lock

Signed-off-by: Teo <teocns@gmail.com>

* test concurrent api requests: remove matcher on method, which may have caused an intermittent error

Signed-off-by: Teo <teocns@gmail.com>

* Run static-analysis with python 3.12.2

Signed-off-by: Teo <teocns@gmail.com>

---------

Signed-off-by: Teo <teocns@gmail.com>
Co-authored-by: Pratyush Shukla <ps4534@nyu.edu>
teocns and the-praxs authored Jan 15, 2025
1 parent 81c60c6 commit ae0f11b
Showing 41 changed files with 7,335 additions and 330 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
uv.lock binary
57 changes: 48 additions & 9 deletions .github/workflows/python-tests.yaml
@@ -1,6 +1,19 @@
# :: Use nektos/act to run this locally
# :: Example:
# :: `act push -j python-tests --matrix python-version:3.10 --container-architecture linux/amd64`
# :: `act push -j unit-tests --matrix python-version:3.10 --container-architecture linux/amd64`
#
# This workflow runs two separate test suites:
# 1. Unit Tests (python-tests job):
# - Runs across Python 3.9 to 3.13
# - Located in tests/unit directory
# - Coverage report uploaded to Codecov for Python 3.11 only
#
# 2. Integration Tests (integration-tests job):
# - Runs only on Python 3.12
# - Located in tests/integration directory
# - Longer timeout (15 min vs 10 min for unit tests)
# - Separate cache for dependencies

name: Python Tests
on:
workflow_dispatch: {}
@@ -23,10 +36,12 @@ on:
- 'tests/**/*.ipynb'

jobs:
python-tests:
unit-tests:
runs-on: ubuntu-latest
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
PYTHONUNBUFFERED: "1"

strategy:
matrix:
@@ -49,14 +64,10 @@ jobs:
run: |
uv sync --group test --group dev
- name: Run tests with coverage
timeout-minutes: 10
- name: Run unit tests with coverage
timeout-minutes: 5
run: |
uv run -m pytest tests/ -v --cov=agentops --cov-report=xml
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
PYTHONUNBUFFERED: "1"
uv run -m pytest tests/unit -v --cov=agentops --cov-report=xml
# Only upload coverage report for python3.11
- name: Upload coverage to Codecov
@@ -68,3 +79,31 @@
flags: unittests
name: codecov-umbrella
fail_ci_if_error: true # Should we?

integration-tests:
runs-on: ubuntu-latest
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
PYTHONUNBUFFERED: "1"

steps:
- uses: actions/checkout@v4

- name: Setup UV
uses: astral-sh/setup-uv@v5
continue-on-error: true
with:
python-version: "3.12"
enable-cache: true
cache-suffix: uv-3.12-integration
cache-dependency-glob: "**/pyproject.toml"

- name: Install dependencies
run: |
uv sync --group test --group dev
- name: Run integration tests
timeout-minutes: 5
run: |
uv run pytest tests/integration
2 changes: 1 addition & 1 deletion .github/workflows/static-analysis.yaml
@@ -40,7 +40,7 @@ jobs:
with:
enable-cache: true
cache-dependency-glob: "**/pyproject.toml"
python-version: "3.11.10"
python-version: "3.12.2"

- name: Install packages
run: |
127 changes: 64 additions & 63 deletions agentops/llms/providers/openai.py
@@ -136,6 +136,69 @@ async def async_generator():

return response

def handle_assistant_response(self, response, kwargs, init_timestamp, session: Optional[Session] = None) -> dict:
"""Handle response based on return type"""
from openai.pagination import BasePage

action_event = ActionEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
action_event.session_id = session.session_id

try:
# Set action type and returns
action_event.action_type = (
response.__class__.__name__.split("[")[1][:-1]
if isinstance(response, BasePage)
else response.__class__.__name__
)
action_event.returns = response.model_dump() if hasattr(response, "model_dump") else response
action_event.end_timestamp = get_ISO_time()
self._safe_record(session, action_event)

# Create LLMEvent if usage data exists
response_dict = response.model_dump() if hasattr(response, "model_dump") else {}

if "id" in response_dict and response_dict.get("id").startswith("run"):
if response_dict["id"] not in self.assistants_run_steps:
self.assistants_run_steps[response_dict.get("id")] = {"model": response_dict.get("model")}

if "usage" in response_dict and response_dict["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = response_dict.get("model")
llm_event.prompt_tokens = response_dict["usage"]["prompt_tokens"]
llm_event.completion_tokens = response_dict["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

elif "data" in response_dict:
for item in response_dict["data"]:
if "usage" in item and item["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = self.assistants_run_steps[item["run_id"]]["model"]
llm_event.prompt_tokens = item["usage"]["prompt_tokens"]
llm_event.completion_tokens = item["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

except Exception as e:
self._safe_record(session, ErrorEvent(trigger_event=action_event, exception=e))

kwargs_str = pprint.pformat(kwargs)
response = pprint.pformat(response)
logger.warning(
f"Unable to parse response for Assistants API. Skipping upload to AgentOps\n"
f"response:\n {response}\n"
f"kwargs:\n {kwargs_str}\n"
)

return response

def override(self):
self._override_openai_v1_completion()
self._override_openai_v1_async_completion()
@@ -234,68 +297,6 @@ def _override_openai_assistants_beta(self):
"""Override OpenAI Assistants API methods"""
from openai._legacy_response import LegacyAPIResponse
from openai.resources import beta
from openai.pagination import BasePage

def handle_response(response, kwargs, init_timestamp, session: Optional[Session] = None) -> dict:
"""Handle response based on return type"""
action_event = ActionEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
action_event.session_id = session.session_id

try:
# Set action type and returns
action_event.action_type = (
response.__class__.__name__.split("[")[1][:-1]
if isinstance(response, BasePage)
else response.__class__.__name__
)
action_event.returns = response.model_dump() if hasattr(response, "model_dump") else response
action_event.end_timestamp = get_ISO_time()
self._safe_record(session, action_event)

# Create LLMEvent if usage data exists
response_dict = response.model_dump() if hasattr(response, "model_dump") else {}

if "id" in response_dict and response_dict.get("id").startswith("run"):
if response_dict["id"] not in self.assistants_run_steps:
self.assistants_run_steps[response_dict.get("id")] = {"model": response_dict.get("model")}

if "usage" in response_dict and response_dict["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = response_dict.get("model")
llm_event.prompt_tokens = response_dict["usage"]["prompt_tokens"]
llm_event.completion_tokens = response_dict["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

elif "data" in response_dict:
for item in response_dict["data"]:
if "usage" in item and item["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = self.assistants_run_steps[item["run_id"]]["model"]
llm_event.prompt_tokens = item["usage"]["prompt_tokens"]
llm_event.completion_tokens = item["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

except Exception as e:
self._safe_record(session, ErrorEvent(trigger_event=action_event, exception=e))

kwargs_str = pprint.pformat(kwargs)
response = pprint.pformat(response)
logger.warning(
f"Unable to parse response for Assistants API. Skipping upload to AgentOps\n"
f"response:\n {response}\n"
f"kwargs:\n {kwargs_str}\n"
)

return response

def create_patched_function(original_func):
def patched_function(*args, **kwargs):
@@ -309,7 +310,7 @@ def patched_function(*args, **kwargs):
if isinstance(response, LegacyAPIResponse):
return response

return handle_response(response, kwargs, init_timestamp, session=session)
return self.handle_assistant_response(response, kwargs, init_timestamp, session=session)

return patched_function

57 changes: 36 additions & 21 deletions pyproject.toml
@@ -41,30 +41,47 @@ dependencies = [

[dependency-groups]
test = [
"openai>=1.0.0,<2.0.0",
"langchain",
"openai>=1.0.0",
"anthropic",
"cohere",
"litellm",
"ai21>=3.0.0",
"groq",
"ollama",
"mistralai",
# ;;
# The dependency below is hard to satisfy: it can only be installed on python >=3.10,<3.13.
# CI will fail because all tests automatically pull this dependency group;
# we need a separate group specifically for integration tests, which will run on a pinned 3.1x.
# ------------------------------------------------------------------------------------------------------------------------------------
# "crewai-tools @ git+https://github.com/crewAIInc/crewAI-tools.git@a14091abb24527c97ccfcc8539d529c8b4559a0f; python_version>='3.10'",
# ------------------------------------------------------------------------------------------------------------------------------------
# ;;
"autogen<0.4.0",
"pytest-cov",
"fastapi[standard]",
]

dev = [
# Testing essentials
"pytest>=7.4.0,<8.0.0", # Testing framework with good async support
"pytest-depends", # For testing complex agent workflows
"pytest-asyncio", # Async test support for testing concurrent agent operations
"pytest-mock", # Mocking capabilities for isolating agent components
"pyfakefs", # File system testing
"pytest-recording", # Alternative to pytest-vcr with better Python 3.x support
"vcrpy @ git+https://github.com/kevin1024/vcrpy.git@81978659f1b18bbb7040ceb324a19114e4a4f328",
"pytest>=8.0.0", # Testing framework with good async support
"pytest-depends", # For testing complex agent workflows
"pytest-asyncio", # Async test support for testing concurrent agent operations
"pytest-mock", # Mocking capabilities for isolating agent components
"pyfakefs", # File system testing
"pytest-recording", # Alternative to pytest-vcr with better Python 3.x support
# TODO: Use release version after vcrpy is released with this fix.
"vcrpy @ git+https://github.com/kevin1024/vcrpy.git@5f1b20c4ca4a18c1fc8cfe049d7df12ca0659c9b",
# Code quality and type checking
"ruff", # Fast Python linter for maintaining code quality
"mypy", # Static type checking for better reliability
"types-requests", # Type stubs for requests library

"ruff", # Fast Python linter for maintaining code quality
"mypy", # Static type checking for better reliability
"types-requests", # Type stubs for requests library
# HTTP mocking and environment
"requests_mock>=1.11.0", # Mock HTTP requests for testing agent external communications
"python-dotenv", # Environment management for secure testing

"python-dotenv", # Environment management for secure testing
# Agent integration testing
"pytest-sugar>=1.0.0",
"pdbpp>=0.10.3",
]

# CI dependencies
@@ -89,19 +106,17 @@ constraint-dependencies = [
# For Python ≥3.10 (where autogen-core might be present), use newer versions
"opentelemetry-api>=1.27.0; python_version>='3.10'",
"opentelemetry-sdk>=1.27.0; python_version>='3.10'",
"opentelemetry-exporter-otlp-proto-http>=1.27.0; python_version>='3.10'"
"opentelemetry-exporter-otlp-proto-http>=1.27.0; python_version>='3.10'",
]

[tool.autopep8]
max_line_length = 120

[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function" # WARNING: Changing this may break tests. A `module`-scoped session might be faster, but also unstable.
test_paths = [
"tests",
]
addopts = "--tb=short -p no:warnings"
asyncio_default_fixture_loop_scope = "module" # WARNING: Changing this may break tests. A `module`-scoped session might be faster, but also unstable.
testpaths = ["tests/unit"] # Default to unit tests
addopts = "--tb=short -p no:warnings --import-mode=importlib --ignore=tests/integration" # Ignore integration by default
pythonpath = ["."]
faulthandler_timeout = 30 # Reduced from 60
timeout = 60 # Reduced from 300
Empty file added tests/__init__.py
32 changes: 32 additions & 0 deletions tests/fixtures/event.py
@@ -0,0 +1,32 @@
from collections import defaultdict
from typing import TYPE_CHECKING

import pytest

if TYPE_CHECKING:
from pytest_mock import MockerFixture


@pytest.fixture(scope="function")
def llm_event_spy(agentops_client, mocker: "MockerFixture") -> dict[str, "MockerFixture"]:
"""
Fixture that provides spies on both providers' response handling
These fixtures are reset on each test run (function scope). To use it,
simply pass it as an argument to the test function. Example:
```
def test_my_test(llm_event_spy):
# test code here
llm_event_spy["litellm"].assert_called_once()
```
"""
from agentops.llms.providers.anthropic import AnthropicProvider
from agentops.llms.providers.litellm import LiteLLMProvider
from agentops.llms.providers.openai import OpenAiProvider

return {
"litellm": mocker.spy(LiteLLMProvider(agentops_client), "handle_response"),
"openai": mocker.spy(OpenAiProvider(agentops_client), "handle_response"),
"anthropic": mocker.spy(AnthropicProvider(agentops_client), "handle_response"),
}
26 changes: 26 additions & 0 deletions tests/fixtures/packaging.py
@@ -0,0 +1,26 @@
import builtins
import pytest


@pytest.fixture
def hide_available_pkg(monkeypatch):
"""
Hide the availability of a package by mocking the __import__ function.
Usage:
@pytest.mark.usefixtures('hide_available_pkg')
def test_message():
with pytest.raises(ImportError, match='Install "pkg" to use test_function'):
foo('test_function')
Source:
https://stackoverflow.com/questions/60227582/making-a-python-test-think-an-installed-package-is-not-available
"""
import_orig = builtins.__import__

def mocked_import(name, *args, **kwargs):
if name == "pkg":
raise ImportError()
return import_orig(name, *args, **kwargs)

monkeypatch.setattr(builtins, "__import__", mocked_import)
