Test suite v0.4 (#637)
* test: add `__init__` to make `tests/` a package

* test: add llm_event_spy fixture for tests

* test: add VCR.py fixture for HTTP interaction recording

* deps: group integration-testing

* test: add fixture to mock package availability in tests

* test: Add integration tests for OpenAI provider and features

* test: add tests for concurrent API requests handling

* Improve vcr.py configuration

Signed-off-by: Teo <teocns@gmail.com>

* ruff

Signed-off-by: Teo <teocns@gmail.com>

* chore(pyproject): update pytest options and loop scope

* chore(tests): update vcr.py ignore_hosts and options

* pyproject.toml

Signed-off-by: Teo <teocns@gmail.com>

* centralize teardown in conftest.py (clear singletons, end all sessions)

Signed-off-by: Teo <teocns@gmail.com>
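
A minimal sketch of such a teardown, assuming the package exposes `end_all_sessions()` and a `clear_singletons()` helper (names inferred from the message above, not from the diff):

```python
# tests/conftest.py -- sketch only; helper names are assumptions
import pytest

import agentops
from agentops.singleton import clear_singletons  # assumed location of the helper


@pytest.fixture(autouse=True)
def agentops_teardown():
    """Give every test a clean client state."""
    yield
    # Close any sessions a test left open, then reset singleton state
    # so the next test starts from scratch.
    agentops.end_all_sessions()
    clear_singletons()
```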

* change vcr_config scope to session

Signed-off-by: Teo <teocns@gmail.com>
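
With pytest-recording this amounts to a session-scoped `vcr_config` fixture; a minimal sketch, with illustrative header and host filters rather than the exact values used here:

```python
# tests/integration/conftest.py -- illustrative values
import pytest


@pytest.fixture(scope="session")
def vcr_config():
    return {
        # Keep credentials out of the recorded cassettes.
        "filter_headers": [("authorization", "REDACTED"), ("x-api-key", "REDACTED")],
        # Assumed: do not record traffic to the AgentOps backend itself.
        "ignore_hosts": ["api.agentops.ai"],
    }
```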

* integration: auto start agentops session

Signed-off-by: Teo <teocns@gmail.com>

* Move unit tests to dedicated folder (tests/unit)

Signed-off-by: Teo <teocns@gmail.com>

* Isolate vcr_config import into tests/integration

Signed-off-by: Teo <teocns@gmail.com>

* configure pytest to run only unit tests by default, and include integration tests only when explicitly specified.

Signed-off-by: Teo <teocns@gmail.com>

* ci(python-tests): separate jobs for unit and integration tests

* set python-tests timeout to 5 minutes

Signed-off-by: Teo <teocns@gmail.com>

* ruff

Signed-off-by: Teo <teocns@gmail.com>

* Implement jwt fixture, centralize reusable mock_req into conftest.py

Signed-off-by: Teo <teocns@gmail.com>

reauthorize

Signed-off-by: Teo <teocns@gmail.com>
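
A rough sketch of that shape, with assumed endpoint paths and payloads (not taken from the diff):

```python
# tests/unit/conftest.py -- sketch; URLs and payloads are assumed
import pytest
import requests_mock


@pytest.fixture
def jwt() -> str:
    return "test-jwt-token"


@pytest.fixture
def mock_req(jwt):
    """Reusable HTTP mock for the AgentOps API, including re-authorization."""
    with requests_mock.Mocker() as m:
        m.post("https://api.agentops.ai/v2/create_session", json={"status": "success", "jwt": jwt})
        m.post("https://api.agentops.ai/v2/create_events", json={"status": "ok"})
        m.post("https://api.agentops.ai/v2/reauthorize_jwt", json={"status": "success", "jwt": jwt})
        yield m
```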

* ci(python-tests): simplify env management, remove coverage from integration tests

Signed-off-by: Teo <teocns@gmail.com>

* ruff

Signed-off-by: Teo <teocns@gmail.com>

* fix: cassette for test_concurrent_api_requests

Signed-off-by: Teo <teocns@gmail.com>

* Cleanup vcr.py comments

Signed-off-by: Teo <teocns@gmail.com>

* add a `TODO` for removing `vcrpy` git version after its release

* refactor openai assistants response handling for easier testing

* add more keys for different llm providers

* add integration tests for other providers

* remove openai version limitation

* add providers as deps

* chore: add mistralai to test dependencies

* remove `mistral` from dependencies since it's incorrect

* ruff

* re-record cassettes

* tests/fixtures/providers: fall back to `test-api-key` if no provider key is found

All provider fixtures will:
- use the actual API key if it's set in the environment
- fall back to "test-api-key" if no environment variable is found

Signed-off-by: Teo <teocns@gmail.com>
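
A minimal sketch of that fallback pattern (fixture name and environment variable are illustrative):

```python
import os

import pytest


@pytest.fixture
def openai_api_key() -> str:
    # Use the real key when it is set (e.g. when re-recording cassettes);
    # otherwise fall back to a placeholder that still works with recorded cassettes.
    return os.environ.get("OPENAI_API_KEY", "test-api-key")
```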

* set keys for `litellm`

* Improve tests/integration/test_llm_providers.py openai assistants

Signed-off-by: Teo <teocns@gmail.com>

* Make integration tests skip appropriately, regenerate one cassette

Signed-off-by: Teo <teocns@gmail.com>
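
One hedged example of how such a skip can be expressed (the condition actually used in the suite may differ):

```python
import os

import pytest

# Assumed condition: skip live-provider tests when the corresponding key is absent.
requires_anthropic = pytest.mark.skipif(
    not os.environ.get("ANTHROPIC_API_KEY"),
    reason="ANTHROPIC_API_KEY not set",
)


@requires_anthropic
def test_anthropic_completion():
    ...
```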

* explicitly import tests/integration/conftest fixtures

Signed-off-by: Teo <teocns@gmail.com>

* deps: improve dev package versioning

* Make integration tests run with python 3.12

Signed-off-by: Teo <teocns@gmail.com>

* add uv.lock

Signed-off-by: Teo <teocns@gmail.com>

* test concurrent api requests: remove matcher on method, which may have caused an intermittent error

Signed-off-by: Teo <teocns@gmail.com>

* Run static-analysis with python 3.12.2

Signed-off-by: Teo <teocns@gmail.com>

---------

Signed-off-by: Teo <teocns@gmail.com>
Co-authored-by: Pratyush Shukla <ps4534@nyu.edu>
teocns and the-praxs authored Jan 15, 2025
1 parent 81c60c6 commit ae0f11b
Showing 41 changed files with 7,335 additions and 330 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
uv.lock binary
57 changes: 48 additions & 9 deletions .github/workflows/python-tests.yaml
@@ -1,6 +1,19 @@
# :: Use nektos/act to run this locally
# :: Example:
# :: `act push -j python-tests --matrix python-version:3.10 --container-architecture linux/amd64`
# :: `act push -j unit-tests --matrix python-version:3.10 --container-architecture linux/amd64`
#
# This workflow runs two separate test suites:
# 1. Unit Tests (python-tests job):
# - Runs across Python 3.9 to 3.13
# - Located in tests/unit directory
# - Coverage report uploaded to Codecov for Python 3.11 only
#
# 2. Integration Tests (integration-tests job):
# - Runs only on Python 3.12
# - Located in tests/integration directory
# - Longer timeout (15 min vs 10 min for unit tests)
# - Separate cache for dependencies

name: Python Tests
on:
workflow_dispatch: {}
@@ -23,10 +36,12 @@ on:
- 'tests/**/*.ipynb'

jobs:
python-tests:
unit-tests:
runs-on: ubuntu-latest
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
PYTHONUNBUFFERED: "1"

strategy:
matrix:
@@ -49,14 +64,10 @@ jobs:
run: |
uv sync --group test --group dev
- name: Run tests with coverage
timeout-minutes: 10
- name: Run unit tests with coverage
timeout-minutes: 5
run: |
uv run -m pytest tests/ -v --cov=agentops --cov-report=xml
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
PYTHONUNBUFFERED: "1"
uv run -m pytest tests/unit -v --cov=agentops --cov-report=xml
# Only upload coverage report for python3.11
- name: Upload coverage to Codecov
@@ -68,3 +79,31 @@
flags: unittests
name: codecov-umbrella
fail_ci_if_error: true # Should we?

integration-tests:
runs-on: ubuntu-latest
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AGENTOPS_API_KEY: ${{ secrets.AGENTOPS_API_KEY }}
PYTHONUNBUFFERED: "1"

steps:
- uses: actions/checkout@v4

- name: Setup UV
uses: astral-sh/setup-uv@v5
continue-on-error: true
with:
python-version: "3.12"
enable-cache: true
cache-suffix: uv-3.12-integration
cache-dependency-glob: "**/pyproject.toml"

- name: Install dependencies
run: |
uv sync --group test --group dev
- name: Run integration tests
timeout-minutes: 5
run: |
uv run pytest tests/integration
2 changes: 1 addition & 1 deletion .github/workflows/static-analysis.yaml
@@ -40,7 +40,7 @@ jobs:
with:
enable-cache: true
cache-dependency-glob: "**/pyproject.toml"
python-version: "3.11.10"
python-version: "3.12.2"

- name: Install packages
run: |
127 changes: 64 additions & 63 deletions agentops/llms/providers/openai.py
@@ -136,6 +136,69 @@ async def async_generator():

return response

def handle_assistant_response(self, response, kwargs, init_timestamp, session: Optional[Session] = None) -> dict:
"""Handle response based on return type"""
from openai.pagination import BasePage

action_event = ActionEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
action_event.session_id = session.session_id

try:
# Set action type and returns
action_event.action_type = (
response.__class__.__name__.split("[")[1][:-1]
if isinstance(response, BasePage)
else response.__class__.__name__
)
action_event.returns = response.model_dump() if hasattr(response, "model_dump") else response
action_event.end_timestamp = get_ISO_time()
self._safe_record(session, action_event)

# Create LLMEvent if usage data exists
response_dict = response.model_dump() if hasattr(response, "model_dump") else {}

if "id" in response_dict and response_dict.get("id").startswith("run"):
if response_dict["id"] not in self.assistants_run_steps:
self.assistants_run_steps[response_dict.get("id")] = {"model": response_dict.get("model")}

if "usage" in response_dict and response_dict["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = response_dict.get("model")
llm_event.prompt_tokens = response_dict["usage"]["prompt_tokens"]
llm_event.completion_tokens = response_dict["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

elif "data" in response_dict:
for item in response_dict["data"]:
if "usage" in item and item["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = self.assistants_run_steps[item["run_id"]]["model"]
llm_event.prompt_tokens = item["usage"]["prompt_tokens"]
llm_event.completion_tokens = item["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

except Exception as e:
self._safe_record(session, ErrorEvent(trigger_event=action_event, exception=e))

kwargs_str = pprint.pformat(kwargs)
response = pprint.pformat(response)
logger.warning(
f"Unable to parse response for Assistants API. Skipping upload to AgentOps\n"
f"response:\n {response}\n"
f"kwargs:\n {kwargs_str}\n"
)

return response

def override(self):
self._override_openai_v1_completion()
self._override_openai_v1_async_completion()
@@ -234,68 +297,6 @@ def _override_openai_assistants_beta(self):
"""Override OpenAI Assistants API methods"""
from openai._legacy_response import LegacyAPIResponse
from openai.resources import beta
from openai.pagination import BasePage

def handle_response(response, kwargs, init_timestamp, session: Optional[Session] = None) -> dict:
"""Handle response based on return type"""
action_event = ActionEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
action_event.session_id = session.session_id

try:
# Set action type and returns
action_event.action_type = (
response.__class__.__name__.split("[")[1][:-1]
if isinstance(response, BasePage)
else response.__class__.__name__
)
action_event.returns = response.model_dump() if hasattr(response, "model_dump") else response
action_event.end_timestamp = get_ISO_time()
self._safe_record(session, action_event)

# Create LLMEvent if usage data exists
response_dict = response.model_dump() if hasattr(response, "model_dump") else {}

if "id" in response_dict and response_dict.get("id").startswith("run"):
if response_dict["id"] not in self.assistants_run_steps:
self.assistants_run_steps[response_dict.get("id")] = {"model": response_dict.get("model")}

if "usage" in response_dict and response_dict["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = response_dict.get("model")
llm_event.prompt_tokens = response_dict["usage"]["prompt_tokens"]
llm_event.completion_tokens = response_dict["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

elif "data" in response_dict:
for item in response_dict["data"]:
if "usage" in item and item["usage"] is not None:
llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
if session is not None:
llm_event.session_id = session.session_id

llm_event.model = self.assistants_run_steps[item["run_id"]]["model"]
llm_event.prompt_tokens = item["usage"]["prompt_tokens"]
llm_event.completion_tokens = item["usage"]["completion_tokens"]
llm_event.end_timestamp = get_ISO_time()
self._safe_record(session, llm_event)

except Exception as e:
self._safe_record(session, ErrorEvent(trigger_event=action_event, exception=e))

kwargs_str = pprint.pformat(kwargs)
response = pprint.pformat(response)
logger.warning(
f"Unable to parse response for Assistants API. Skipping upload to AgentOps\n"
f"response:\n {response}\n"
f"kwargs:\n {kwargs_str}\n"
)

return response

def create_patched_function(original_func):
def patched_function(*args, **kwargs):
@@ -309,7 +310,7 @@ def patched_function(*args, **kwargs):
if isinstance(response, LegacyAPIResponse):
return response

return handle_response(response, kwargs, init_timestamp, session=session)
return self.handle_assistant_response(response, kwargs, init_timestamp, session=session)

return patched_function

57 changes: 36 additions & 21 deletions pyproject.toml
@@ -41,30 +41,47 @@ dependencies = [

[dependency-groups]
test = [
"openai>=1.0.0,<2.0.0",
"langchain",
"openai>=1.0.0",
"anthropic",
"cohere",
"litellm",
"ai21>=3.0.0",
"groq",
"ollama",
"mistralai",
# ;;
# The dependency below is hard to satisfy: it can only be installed on python >=3.10,<3.13.
# CI will fail because all tests automatically pull this dependency group;
# we need a separate group specifically for integration tests, which will run on a pinned 3.1x.
# ------------------------------------------------------------------------------------------------------------------------------------
# "crewai-tools @ git+https://github.com/crewAIInc/crewAI-tools.git@a14091abb24527c97ccfcc8539d529c8b4559a0f; python_version>='3.10'",
# ------------------------------------------------------------------------------------------------------------------------------------
# ;;
"autogen<0.4.0",
"pytest-cov",
"fastapi[standard]",
]

dev = [
# Testing essentials
"pytest>=7.4.0,<8.0.0", # Testing framework with good async support
"pytest-depends", # For testing complex agent workflows
"pytest-asyncio", # Async test support for testing concurrent agent operations
"pytest-mock", # Mocking capabilities for isolating agent components
"pyfakefs", # File system testing
"pytest-recording", # Alternative to pytest-vcr with better Python 3.x support
"vcrpy @ git+https://github.com/kevin1024/vcrpy.git@81978659f1b18bbb7040ceb324a19114e4a4f328",
"pytest>=8.0.0", # Testing framework with good async support
"pytest-depends", # For testing complex agent workflows
"pytest-asyncio", # Async test support for testing concurrent agent operations
"pytest-mock", # Mocking capabilities for isolating agent components
"pyfakefs", # File system testing
"pytest-recording", # Alternative to pytest-vcr with better Python 3.x support
# TODO: Use release version after vcrpy is released with this fix.
"vcrpy @ git+https://github.com/kevin1024/vcrpy.git@5f1b20c4ca4a18c1fc8cfe049d7df12ca0659c9b",
# Code quality and type checking
"ruff", # Fast Python linter for maintaining code quality
"mypy", # Static type checking for better reliability
"types-requests", # Type stubs for requests library

"ruff", # Fast Python linter for maintaining code quality
"mypy", # Static type checking for better reliability
"types-requests", # Type stubs for requests library
# HTTP mocking and environment
"requests_mock>=1.11.0", # Mock HTTP requests for testing agent external communications
"python-dotenv", # Environment management for secure testing

"python-dotenv", # Environment management for secure testing
# Agent integration testing
"pytest-sugar>=1.0.0",
"pdbpp>=0.10.3",
]

# CI dependencies
@@ -89,19 +106,17 @@ constraint-dependencies = [
# For Python ≥3.10 (where autogen-core might be present), use newer versions
"opentelemetry-api>=1.27.0; python_version>='3.10'",
"opentelemetry-sdk>=1.27.0; python_version>='3.10'",
"opentelemetry-exporter-otlp-proto-http>=1.27.0; python_version>='3.10'"
"opentelemetry-exporter-otlp-proto-http>=1.27.0; python_version>='3.10'",
]

[tool.autopep8]
max_line_length = 120

[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function" # WARNING: Changing this may break tests. A `module`-scoped session might be faster, but also unstable.
test_paths = [
"tests",
]
addopts = "--tb=short -p no:warnings"
asyncio_default_fixture_loop_scope = "module" # WARNING: Changing this may break tests. A `module`-scoped session might be faster, but also unstable.
testpaths = ["tests/unit"] # Default to unit tests
addopts = "--tb=short -p no:warnings --import-mode=importlib --ignore=tests/integration" # Ignore integration by default
pythonpath = ["."]
faulthandler_timeout = 30 # Reduced from 60
timeout = 60 # Reduced from 300
Empty file added tests/__init__.py
32 changes: 32 additions & 0 deletions tests/fixtures/event.py
@@ -0,0 +1,32 @@
from collections import defaultdict
from typing import TYPE_CHECKING

import pytest

if TYPE_CHECKING:
from pytest_mock import MockerFixture


@pytest.fixture(scope="function")
def llm_event_spy(agentops_client, mocker: "MockerFixture") -> dict[str, "MockerFixture"]:
"""
Fixture that provides spies on both providers' response handling
These fixtures are reset on each test run (function scope). To use it,
simply pass it as an argument to the test function. Example:
```
def test_my_test(llm_event_spy):
# test code here
llm_event_spy["litellm"].assert_called_once()
```
"""
from agentops.llms.providers.anthropic import AnthropicProvider
from agentops.llms.providers.litellm import LiteLLMProvider
from agentops.llms.providers.openai import OpenAiProvider

return {
"litellm": mocker.spy(LiteLLMProvider(agentops_client), "handle_response"),
"openai": mocker.spy(OpenAiProvider(agentops_client), "handle_response"),
"anthropic": mocker.spy(AnthropicProvider(agentops_client), "handle_response"),
}
26 changes: 26 additions & 0 deletions tests/fixtures/packaging.py
@@ -0,0 +1,26 @@
import builtins
import pytest


@pytest.fixture
def hide_available_pkg(monkeypatch):
"""
Hide the availability of a package by mocking the __import__ function.
Usage:
@pytest.mark.usefixtures('hide_available_pkg')
def test_message():
with pytest.raises(ImportError, match='Install "pkg" to use test_function'):
foo('test_function')
Source:
https://stackoverflow.com/questions/60227582/making-a-python-test-think-an-installed-package-is-not-available
"""
import_orig = builtins.__import__

def mocked_import(name, *args, **kwargs):
if name == "pkg":
raise ImportError()
return import_orig(name, *args, **kwargs)

monkeypatch.setattr(builtins, "__import__", mocked_import)
