Cleanup client-side integrations and tests #548

Merged · 2 commits · Mar 12, 2024
32 changes: 0 additions & 32 deletions .github/workflows/integrations.yml
@@ -69,35 +69,3 @@ jobs:
if: steps.cache.outputs.cache-hit != 'true'
- name: Test
run: make test-cpu

# disabled for now, due to memory requirements
test-pixeltable-cpu:
if: false
name: Test Pixeltable CPU (Linux)
needs: [docker-test-cpu]
runs-on: ${{ matrix.os }}
timeout-minutes: 20
strategy:
max-parallel: 1
fail-fast: true
matrix:
os: ["ubuntu-latest"]
python-version: ["3.9"]
defaults:
run:
shell: bash -el {0}
steps:
- name: Checkout git repo
uses: actions/checkout@master
- uses: conda-incubator/setup-miniconda@v2
with:
miniforge-variant: Mambaforge
miniforge-version: latest
auto-update-conda: true
activate-environment: nos-pixeltable-${{ matrix.os }}-${{ matrix.python-version }}
python-version: ${{ matrix.python-version }}
use-mamba: true
- name: Install latest pixeltable and NOS
run: pip install git+https://github.com/mkornacker/pixeltable && pip install -e '.[test]'
- name: Test pixeltable integration
run: NOS_LOGGING_LEVEL=DEBUG pytest -sv tests/integrations/test_pixeltable.py
22 changes: 0 additions & 22 deletions docs/api/common/spec.md
@@ -1,26 +1,4 @@
::: nos.common.spec.FunctionSignature
handler: python
options:
members:
- get_inputs_spec
- get_outputs_spec

::: nos.common.spec.ObjectTypeInfo
handler: python
options:
members:
- __repr__
- is_batched
- batch_size
- base_type
- base_spec

::: nos.common.spec.FunctionSignature
handler: python
options:
members:
- get_inputs_spec
- get_outputs_spec


::: nos.common.spec.ModelSpec
Binary file removed docs/blog/assets/clip_embedding_times.png
Binary file not shown.
Binary file removed docs/blog/assets/clip_speed.png
Binary file not shown.
Binary file removed docs/blog/assets/nos_profile_list.png
Binary file not shown.
Binary file removed docs/blog/assets/reserved_vs_on_demand_first.png
Binary file not shown.
Binary file removed docs/blog/assets/reserved_vs_on_demand_second.png
Binary file not shown.
Binary file removed docs/blog/assets/t4_laion_price.png
Binary file not shown.
@@ -1,19 +1,4 @@
---
date: 2024-02-02
tags:
- integrations
- skypilot
- chatGPT
categories:
- infra
- embeddings
authors:
- sloftin
links:
- posts/05-playing-with-nos-profiler.md
---

# OK Computer, Why are you slow?
# Profiling models with NOS

(Originally published at https://scottloftin.substack.com/p/lets-build-an-ml-sre-bot-with-nos)

@@ -26,7 +11,7 @@ nos profile method --method encode_image
nos profile list
```

<img src="/docs/blog/assets/nos_profile_list.png" width="100%">
<img src="/docs/demos/assets/nos_profile_list.png" width="100%">

We see a breakdown across four different image embedding models, including the method and task (interchangeable in this case, since each CLIP variant supports both image and text embedding as methods), the iterations per second, the GPU memory footprint (how much space the model had to allocate), and finally the GPU utilization, which measures, in a very broad sense, how efficiently we are using the hardware. A few things to note: the image size is fixed at 224x224x1 across all runs, with a batch size of 1. In practice, the iterations per second will depend tremendously on tuning the batch size and image resolution for our target hardware, which will be the subject of a follow-up post. For now, we'll take these numbers at face value and see what we can work out about how to run a large embedding workload. We're going to use Skypilot to deploy the profiler to a Tesla T4 instance on GCP:
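
First, though, a quick back-of-the-envelope sketch of how an iterations-per-second figure turns into a wall-clock estimate. The model names and throughput numbers below are illustrative placeholders, not actual profiler output:

```
# Convert profiled throughput (iterations/second) into runtime estimates.
# NOTE: the throughputs here are made-up placeholders for illustration.
throughputs = {
    "clip-vit-base-patch32": 120.0,   # iterations/second at batch size 1
    "clip-vit-large-patch14": 40.0,
}

num_embeddings = 10_000_000  # 10M images, one embedding each

for model, its_per_sec in throughputs.items():
    hours = num_embeddings / its_per_sec / 3600
    print(f"{model}: ~{hours:,.1f} GPU-hours for {num_embeddings:,} embeddings")
```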

@@ -75,13 +60,13 @@ The OpenAI assistants API is somewhat unstable at the moment, but after a few tr

_Hey InfraBot, can you list the models in the profiling catalog by iterations per second?_

<img src="/docs/blog/assets/clip_speed.png" width="100%">
<img src="/docs/demos/assets/clip_speed.png" width="100%">

Ok, our raw profiling data is slowly becoming more readable. Let’s see how this all scales with the number of embeddings:

_Can you compute how long it would take to generate embeddings with each model for workload sizes in powers of 10, starting at 1000 image embeddings and ending at 10,000,000. Please plot these for each model in a graph._

<img src="/docs/blog/assets/clip_embedding_times.png" width="100%">
<img src="/docs/demos/assets/clip_embedding_times.png" width="100%">

Reasonable: runtime will depend linearly on total embeddings (again, we’re using batch size 1 for illustration purposes).

@@ -113,19 +98,19 @@ Ok, let's add some dollar signs to our plot above:

_Can you compute how much it would cost on a T4 with 1 GPU to generate embeddings with the cheapest model for workloads of powers of 10, starting at 1000 image embeddings and ending at 10,000,000. Please plot these in a graph._

<img src="/docs/blog/assets/t4_laion_price.png" width="100%">
<img src="/docs/demos/assets/t4_laion_price.png" width="100%">

The above looks reasonable assuming a minimum reservation of 1 hour (we aren't doing serverless; we need to pay for the whole instance for the whole hour in our proposed cloud landscape). For 10 million embeddings, the total is something like 13 hours, so assuming an on-demand price of $0.35/hour we have $0.35 * 13 ~= $4.55, pretty close to the graph. But what if we wanted to index something like YouTube, with ~500PB of videos? Ok, maybe not the whole site, but a substantial subset, maybe 10^11 images. If we extrapolate the above, we're looking at $40,000 in compute, which we would probably care about fitting to our workload. In particular, we might go with a reserved rather than an on-demand instance for a ~50% discount, but at what point does that pay off? Unfortunately, at the time of writing Skypilot doesn't seem to include reserved-instance pricing by default, but for a single instance type it's easy enough to track down and feed to InfraBot: a 1-year commitment brings us down to $0.220 per GPU, and a 3-year commitment to $0.160 per GPU. That's still higher than the spot price, of course, but at this scale it's reasonable to assume some SLA that prevents us from halting indexing on preemption. Let's see if we can find a break-even point.
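
Here's a minimal sketch of that break-even calculation, using the rates above. The throughput figure is an assumed placeholder, and we simplify by pricing a reservation as a single instance paid for its full term:

```
# Compare on-demand vs. 1-year reserved T4 pricing across workload sizes.
ON_DEMAND = 0.35        # $/GPU-hour, on-demand
RESERVED_1Y = 0.220     # $/GPU-hour, 1-year commitment
HOURS_PER_YEAR = 365 * 24

throughput = 200.0      # embeddings/second -- assumed placeholder figure

for n in (10**9, 10**10, 10**11):
    hours = n / throughput / 3600
    on_demand_cost = hours * ON_DEMAND
    # A reservation is paid for the full term even if the job finishes early.
    reserved_cost = max(hours, HOURS_PER_YEAR) * RESERVED_1Y
    winner = "reserved" if reserved_cost < on_demand_cost else "on-demand"
    print(f"{n:.0e} embeddings: on-demand ${on_demand_cost:,.0f}, "
          f"reserved ${reserved_cost:,.0f} -> {winner}")
```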

_Can you add the cost to reserve a 1 and 3 year instance? A 1 year reservation is $0.220 per GPU per hour, and a 3 year reservation is $0.160 per GPU per hour._

<img src="/docs/blog/assets/reserved_vs_on_demand_first.png" width="100%">
<img src="/docs/demos/assets/reserved_vs_on_demand_first.png" width="100%">

Looks like we need to go a little further to the right.

_Ok can you do the same plot, but at 10^9, 10^10, and 10^11_

<img src="/docs/blog/assets/reserved_vs_on_demand_second.png" width="100%">
<img src="/docs/demos/assets/reserved_vs_on_demand_second.png" width="100%">

10^10 embeddings at $0.35/hr is about $4,860, so this looks roughly correct. 10 billion embeddings is about 100,000 hours of (low resolution) video at a full 30 FPS, so while it's quite large, it's not completely unheard of for a larger video service.
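
A quick sanity check on that video-hours figure, using the post's 30 FPS, one-embedding-per-frame framing:

```
# 10^10 embeddings, one per frame, at 30 frames/second.
frames = 10**10
hours_of_video = frames / 30 / 3600
print(f"{hours_of_video:,.0f} hours of video")  # ~92,593 -- roughly 100,000
```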

Empty file removed docs/integrations/pixeltable.md
Empty file.
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -24,10 +24,10 @@ nav:
- Starting the server: docs/guides/starting-the-server.md
- Running inference: docs/guides/running-inference.md
- Serving custom models: docs/guides/serving-custom-models.md
# - 🧑‍🏫 Tutorials: docs/blog/-getting-started-with-nos-tutorials.html
- 🤖 Demos:
- Build a Discord image-generation bot: docs/demos/discord-bot.md
- Build a video search engine: docs/demos/video-search.md
- Profiling models with NOS: docs/demos/profiling-models-with-nos.md
- 👩‍💻 API Reference:
- CLI:
- <kbd>nos serve</kbd>: docs/cli/serve.md
14 changes: 0 additions & 14 deletions nos/__init__.py
@@ -1,15 +1 @@
import importlib
import sys

from nos.version import __version__ # noqa: F401

from .client import Client # noqa: F401
from .logging import logger # noqa: F401
from .server import init, shutdown # noqa: F401


def internal_libs_available():
"""Check if the internal module is available."""
from .common.runtime import is_package_available # noqa: F401

return is_package_available("autonomi.nos._internal")
1 change: 0 additions & 1 deletion nos/common/__init__.py
@@ -16,7 +16,6 @@
ModelSpec,
ModelSpecMetadata,
ModelSpecMetadataCatalog,
ObjectTypeInfo,
)
from .tasks import TaskType
from .types import Batch, EmbeddingSpec, ImageSpec, ImageT, TensorSpec, TensorT
145 changes: 1 addition & 144 deletions nos/common/spec.py
@@ -3,7 +3,7 @@
import re
from dataclasses import field
from functools import cached_property
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union, get_args, get_origin
from typing import Any, Callable, Dict, Literal, Optional, Tuple, Union

import humanize
from pydantic import BaseModel, Field, field_validator
@@ -24,121 +24,6 @@
nos_service_pb2 = import_module("nos_service_pb2")


class ObjectTypeInfo:
"""Function signature information.

Parameters:
annotation (Any): Annotation for an input/output.
parameter (inspect.Parameter): Parameter information (optional).

Attributes:
_is_batched (bool): Batched flag.
_batch_size (int): Batch size.
_base_type (Any): Base type (Image.Image, np.ndarray etc).
_base_spec (Any): Base type specification (None, ImageSpec, TensorSpec etc).
"""

def __init__(self, annotation: Any, parameter: inspect.Parameter = None):
"""Initialize the function signature information."""
self.annotation = annotation
self.parameter = parameter
try:
(annotated_cls,) = annotation.__args__
except AttributeError:
annotated_cls = annotation

# Parse Batch annotation
self._is_batched, self._batch_size = False, None
if annotated_cls == Batch:
annotation, batch_size = annotation.__metadata__
self._is_batched, self._batch_size = True, batch_size
try:
(annotated_cls,) = annotation.__args__
except AttributeError:
annotated_cls = annotation

# Parse Tensor/type annotation
if annotated_cls in (TensorT, ImageT):
object_type, object_spec = annotation.__metadata__
else:
try:
(object_type,) = annotation.__metadata__
except AttributeError:
object_type = annotated_cls
object_spec = None

# Parse the base type and spec
self._base_type = object_type
self._base_spec = object_spec

def __repr__(self) -> str:
"""Return the function signature information representation."""
repr = (
f"""{self.__class__.__name__}(is_batched={self._is_batched}, batch_size={self._batch_size}, """
f"""base_type={self._base_type}, base_spec={self._base_spec})"""
)
if self.parameter:
p_repr = f"pname={self.parameter}, ptype={self.parameter.annotation}, pdefault={self.parameter.default}"
repr = f"{repr}, {p_repr}"
return repr

def parameter_name(self) -> str:
"""Return the parameter name."""
return self.parameter.name

def parameter_annotation(self) -> Any:
"""Return the parameter annotation."""
return self.parameter.annotation

def parameter_default(self) -> Any:
"""Return the parameter default."""
return self.parameter.default

def is_batched(self) -> bool:
"""Return the `is_batched` flag.

Returns:
bool: Flag to indicate if batching is enabled.
If true, `batch_size=None` implies dynamic batch size, otherwise `batch_size=<int>`.
"""
return self._is_batched

def batch_size(self) -> int:
"""Return the batch size.

Returns:
int: Batch size. If `None` and `is_batched` is `true`, then batch size is considered dynamic.
"""
return self._batch_size

def base_type(self) -> Any:
"""Return the base type.

Returns:
Any: Base type. Base type here can be simple types (e.g. `str`, `int`, ...) or
complex types with library dependencies (e.g. `np.ndarray`, `PIL.Image.Image` etc).
"""
return self._base_type

def base_spec(self) -> Optional[Union[TensorSpec, ImageSpec, EmbeddingSpec]]:
"""Return the base spec.

Returns:
Optional[Union[TensorSpec, ImageSpec, EmbeddingSpec]]: Base spec.
"""
return self._base_spec


def AnnotatedParameter(
annotation: Any, parameter: inspect.Parameter = None
) -> Union[ObjectTypeInfo, List[ObjectTypeInfo]]:
"""Annotate the parameter for inferring additional metdata."""
# Union of annotated types are converted into set of annotated types.
if get_origin(annotation) == Union:
return [AnnotatedParameter(ann, parameter) for ann in get_args(annotation)]
return ObjectTypeInfo(annotation, parameter)


class FunctionSignature(BaseModel):
"""Function signature that fully describes the remote-model to be executed
including `inputs`, `outputs`, `func_or_cls` to be executed,
@@ -236,34 +121,6 @@ def _decode_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
inputs = FunctionSignature.validate(inputs, self.parameters)
return {k: loads(v) for k, v in inputs.items()}

def get_inputs_spec(self) -> Dict[str, Union[ObjectTypeInfo, List[ObjectTypeInfo]]]:
"""Return the full input function signature specification.

For example, for CLIP's text embedding model, the inputs/output spec is:
```
inputs = {'texts': ObjectTypeInfo(is_batched=True, batch_size=None, base_type=<class 'str'>, base_spec=None)}
outputs = {'embedding': ObjectTypeInfo(is_batched=True, batch_size=None, base_type=<class 'numpy.ndarray'>, base_spec=EmbeddingSpec(shape=(512,), dtype='float32'))}
```
Returns:
Dict[str, Union[ObjectTypeInfo, List[ObjectTypeInfo]]]: Inputs spec.
"""
parameters = self.parameters.copy()
parameters.pop("self", None)
return {k: AnnotatedParameter(self.input_annotations.get(k, p.annotation), p) for k, p in parameters.items()}

def get_outputs_spec(self) -> Dict[str, Union[ObjectTypeInfo, Dict[str, ObjectTypeInfo]]]:
"""Return the full output function signature specification.

Returns:
Dict[str, Union[ObjectTypeInfo, Dict[str, ObjectTypeInfo]]]: Outputs spec.
"""
if self.output_annotations is None:
return AnnotatedParameter(self.return_annotation)
elif isinstance(self.output_annotations, dict):
return {k: AnnotatedParameter(ann) for k, ann in self.output_annotations.items()}
else:
return AnnotatedParameter(self.output_annotations)


class ModelResources(BaseModel):
"""Model resources (device/host memory etc)."""
7 changes: 1 addition & 6 deletions nos/models/__init__.py
@@ -4,7 +4,7 @@
import numpy as np
from PIL import Image

from nos import hub, internal_libs_available
from nos import hub
from nos.common import ImageSpec, TaskType
from nos.common.types import Batch, ImageT

@@ -21,8 +21,3 @@
from .tts import TextToSpeech # noqa: F401
from .whisper import Whisper # noqa: F401
from .yolox import YOLOX # noqa: F401


if internal_libs_available():
# Register internal models with hub
from autonomi.nos._internal import models # noqa: F401, F403
1 change: 0 additions & 1 deletion nos/server/__init__.py
@@ -6,7 +6,6 @@
from typing import List, Optional, Union

import psutil
import rich.status

import docker
import docker.errors