Cleanup client-side integrations and tests #548

Merged · 2 commits · Mar 12, 2024
32 changes: 0 additions & 32 deletions .github/workflows/integrations.yml
@@ -69,35 +69,3 @@ jobs:
if: steps.cache.outputs.cache-hit != 'true'
- name: Test
run: make test-cpu

# disabled for now, due to memory requirements
test-pixeltable-cpu:
if: false
name: Test Pixeltable CPU (Linux)
needs: [docker-test-cpu]
runs-on: ${{ matrix.os }}
timeout-minutes: 20
strategy:
max-parallel: 1
fail-fast: true
matrix:
os: ["ubuntu-latest"]
python-version: ["3.9"]
defaults:
run:
shell: bash -el {0}
steps:
- name: Checkout git repo
uses: actions/checkout@master
- uses: conda-incubator/setup-miniconda@v2
with:
miniforge-variant: Mambaforge
miniforge-version: latest
auto-update-conda: true
activate-environment: nos-pixeltable-${{ matrix.os }}-${{ matrix.python-version }}
python-version: ${{ matrix.python-version }}
use-mamba: true
- name: Install latest pixeltable and NOS
run: pip install git+https://github.com/mkornacker/pixeltable && pip install -e '.[test]'
- name: Test pixeltable integration
run: NOS_LOGGING_LEVEL=DEBUG pytest -sv tests/integrations/test_pixeltable.py
22 changes: 0 additions & 22 deletions docs/api/common/spec.md
@@ -1,26 +1,4 @@
::: nos.common.spec.FunctionSignature
handler: python
options:
members:
- get_inputs_spec
- get_outputs_spec

::: nos.common.spec.ObjectTypeInfo
handler: python
options:
members:
- __repr__
- is_batched
- batch_size
- base_type
- base_spec

::: nos.common.spec.FunctionSignature
handler: python
options:
members:
- get_inputs_spec
- get_outputs_spec


::: nos.common.spec.ModelSpec
Binary file removed docs/blog/assets/clip_embedding_times.png
Binary file not shown.
Binary file removed docs/blog/assets/clip_speed.png
Binary file not shown.
Binary file removed docs/blog/assets/nos_profile_list.png
Binary file not shown.
Binary file removed docs/blog/assets/reserved_vs_on_demand_first.png
Binary file not shown.
Binary file removed docs/blog/assets/reserved_vs_on_demand_second.png
Binary file not shown.
Binary file removed docs/blog/assets/t4_laion_price.png
Binary file not shown.
@@ -1,19 +1,4 @@
---
date: 2024-02-02
tags:
- integrations
- skypilot
- chatGPT
categories:
- infra
- embeddings
authors:
- sloftin
links:
- posts/05-playing-with-nos-profiler.md
---

# OK Computer, Why are you slow?
# Profiling models with NOS

(Originally published at https://scottloftin.substack.com/p/lets-build-an-ml-sre-bot-with-nos)

@@ -26,7 +11,7 @@ nos profile method --method encode_image
nos profile list
```

<img src="/docs/blog/assets/nos_profile_list.png" width="100%">
<img src="/docs/demos/assets/nos_profile_list.png" width="100%">

We see a breakdown across four different image embedding models, including the method and task (interchangeable in this case, since each CLIP variant supports both image and text embedding as methods), the iterations per second, the GPU memory footprint (how much space the model had to allocate), and finally the GPU utilization, which measures, in a very broad sense, how efficiently we are using the hardware. A few things to note: the image size is fixed at 224x224x1 across all runs, with a batch size of 1. In practice, the iterations per second will depend tremendously on tuning the batch size and image resolution for our target hardware, which will be the subject of a follow-up post. For now, we'll take these numbers at face value and see what we can work out about how to run a large embedding workload. We're going to use Skypilot to deploy the profiler to a Tesla T4 instance on GCP:
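
First, though, a quick back-of-the-envelope sketch of how an iterations-per-second figure turns into a wall-clock estimate. The model names and throughput numbers below are illustrative placeholders, not actual profiler output:

```
# Convert profiled throughput (iterations/second) into runtime estimates.
# NOTE: the throughputs here are made-up placeholders for illustration.
throughputs = {
    "clip-vit-base-patch32": 120.0,   # iterations/second at batch size 1
    "clip-vit-large-patch14": 40.0,
}

num_embeddings = 10_000_000  # 10M images, one embedding each

for model, its_per_sec in throughputs.items():
    hours = num_embeddings / its_per_sec / 3600
    print(f"{model}: ~{hours:,.1f} GPU-hours for {num_embeddings:,} embeddings")
```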

@@ -75,13 +60,13 @@ The OpenAI assistants API is somewhat unstable at the moment, but after a few tr

_Hey InfraBot, can you list the models in the profiling catalog by iterations per second?_

<img src="/docs/blog/assets/clip_speed.png" width="100%">
<img src="/docs/demos/assets/clip_speed.png" width="100%">

Ok, our raw profiling data is slowly becoming more readable. Let’s see how this all scales with the number of embeddings:

_Can you compute how long it would take to generate embeddings with each model for workload sizes in powers of 10, starting at 1000 image embeddings and ending at 10,000,000. Please plot these for each model in a graph._

<img src="/docs/blog/assets/clip_embedding_times.png" width="100%">
<img src="/docs/demos/assets/clip_embedding_times.png" width="100%">

Reasonable: runtime will depend linearly on total embeddings (again, we’re using batch size 1 for illustration purposes).

@@ -113,19 +98,19 @@ Ok, let's add some dollar signs to our plot above:

_Can you compute how much it would cost on a T4 with 1 GPU to generate embeddings with the cheapest model for workloads of powers of 10, starting at 1000 image embeddings and ending at 10,000,000. Please plot these in a graph._

<img src="/docs/blog/assets/t4_laion_price.png" width="100%">
<img src="/docs/demos/assets/t4_laion_price.png" width="100%">

The above looks reasonable assuming a minimum reservation of 1 hour (we aren't doing serverless; we need to pay for the whole instance for the whole hour in our proposed cloud landscape). For 10 million embeddings, the total is something like 13 hours, so assuming an on-demand price of $0.35/hour we have $0.35 * 13 ~= $4.55, pretty close to the graph. But what if we wanted to index something like YouTube, with ~500PB of videos? Ok, maybe not the whole site, but a substantial subset, maybe 10^11 images. If we extrapolate the above, we're looking at $40,000 in compute, which we would probably care about fitting to our workload. In particular, we might go with a reserved rather than an on-demand instance for a ~50% discount, but at what point does that pay off? Unfortunately, at the time of writing Skypilot doesn't seem to include reserved-instance pricing by default, but for a single instance type it's easy enough to track down and feed to InfraBot: a 1-year commitment brings us down to $0.220 per GPU, and a 3-year commitment to $0.160 per GPU. That's still higher than the spot price, of course, but at this scale it's reasonable to assume some SLA that prevents us from halting indexing on preemption. Let's see if we can find a break-even point.
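
Here's a minimal sketch of that break-even calculation, using the rates above. The throughput figure is an assumed placeholder, and we simplify by pricing a reservation as a single instance paid for its full term:

```
# Compare on-demand vs. 1-year reserved T4 pricing across workload sizes.
ON_DEMAND = 0.35        # $/GPU-hour, on-demand
RESERVED_1Y = 0.220     # $/GPU-hour, 1-year commitment
HOURS_PER_YEAR = 365 * 24

throughput = 200.0      # embeddings/second -- assumed placeholder figure

for n in (10**9, 10**10, 10**11):
    hours = n / throughput / 3600
    on_demand_cost = hours * ON_DEMAND
    # A reservation is paid for the full term even if the job finishes early.
    reserved_cost = max(hours, HOURS_PER_YEAR) * RESERVED_1Y
    winner = "reserved" if reserved_cost < on_demand_cost else "on-demand"
    print(f"{n:.0e} embeddings: on-demand ${on_demand_cost:,.0f}, "
          f"reserved ${reserved_cost:,.0f} -> {winner}")
```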

_Can you add the cost to reserve a 1 and 3 year instance? A 1 year reservation is $0.220 per GPU per hour, and a 3 year reservation is $0.160 per GPU per hour._

<img src="/docs/blog/assets/reserved_vs_on_demand_first.png" width="100%">
<img src="/docs/demos/assets/reserved_vs_on_demand_first.png" width="100%">

Looks like we need to go a little further to the right.

_Ok can you do the same plot, but at 10^9, 10^10, and 10^11_

<img src="/docs/blog/assets/reserved_vs_on_demand_second.png" width="100%">
<img src="/docs/demos/assets/reserved_vs_on_demand_second.png" width="100%">

10^10 embeddings at $0.35/hr is about $4,860, so this looks roughly correct. 10 billion embeddings is about 100,000 hours of (low resolution) video at a full 30 FPS, so while it's quite large, it's not completely unheard of for a larger video service.
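
A quick sanity check on that video-hours figure, using the post's 30 FPS, one-embedding-per-frame framing:

```
# 10^10 embeddings, one per frame, at 30 frames/second.
frames = 10**10
hours_of_video = frames / 30 / 3600
print(f"{hours_of_video:,.0f} hours of video")  # ~92,593 -- roughly 100,000
```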

Empty file removed docs/integrations/pixeltable.md
Empty file.
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -24,10 +24,10 @@ nav:
- Starting the server: docs/guides/starting-the-server.md
- Running inference: docs/guides/running-inference.md
- Serving custom models: docs/guides/serving-custom-models.md
# - 🧑‍🏫 Tutorials: docs/blog/-getting-started-with-nos-tutorials.html
- 🤖 Demos:
- Build a Discord image-generation bot: docs/demos/discord-bot.md
- Build a video search engine: docs/demos/video-search.md
- Profiling models with NOS: docs/demos/profiling-models-with-nos.md
- 👩‍💻 API Reference:
- CLI:
- <kbd>nos serve</kbd>: docs/cli/serve.md
14 changes: 0 additions & 14 deletions nos/__init__.py
@@ -1,15 +1 @@
import importlib
import sys

from nos.version import __version__ # noqa: F401

from .client import Client # noqa: F401
from .logging import logger # noqa: F401
from .server import init, shutdown # noqa: F401


def internal_libs_available():
"""Check if the internal module is available."""
from .common.runtime import is_package_available # noqa: F401

return is_package_available("autonomi.nos._internal")
1 change: 0 additions & 1 deletion nos/common/__init__.py
@@ -16,7 +16,6 @@
ModelSpec,
ModelSpecMetadata,
ModelSpecMetadataCatalog,
ObjectTypeInfo,
)
from .tasks import TaskType
from .types import Batch, EmbeddingSpec, ImageSpec, ImageT, TensorSpec, TensorT
145 changes: 1 addition & 144 deletions nos/common/spec.py
@@ -3,7 +3,7 @@
import re
from dataclasses import field
from functools import cached_property
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union, get_args, get_origin
from typing import Any, Callable, Dict, Literal, Optional, Tuple, Union

import humanize
from pydantic import BaseModel, Field, field_validator
@@ -24,121 +24,6 @@
nos_service_pb2 = import_module("nos_service_pb2")


class ObjectTypeInfo:
"""Function signature information.

Parameters:
annotation (Any): Annotation for an input/output.
parameter (inspect.Parameter): Parameter information (optional).

Attributes:
_is_batched (bool): Batched flag.
_batch_size (int): Batch size.
_base_type (Any): Base type (Image.Image, np.ndarray etc).
_base_spec (Any): Base type specification (None, ImageSpec, TensorSpec etc).
"""

def __init__(self, annotation: Any, parameter: inspect.Parameter = None):
"""Initialize the function signature information."""
self.annotation = annotation
self.parameter = parameter
try:
(annotated_cls,) = annotation.__args__
except AttributeError:
annotated_cls = annotation

# Parse Batch annotation
self._is_batched, self._batch_size = False, None
if annotated_cls == Batch:
annotation, batch_size = annotation.__metadata__
self._is_batched, self._batch_size = True, batch_size
try:
(annotated_cls,) = annotation.__args__
except AttributeError:
annotated_cls = annotation

# Parse Tensor/type annotation
if annotated_cls in (TensorT, ImageT):
object_type, object_spec = annotation.__metadata__
else:
try:
(object_type,) = annotation.__metadata__
except AttributeError:
object_type = annotated_cls
object_spec = None

# Parse the base type and spec
self._base_type = object_type
self._base_spec = object_spec

def __repr__(self) -> str:
"""Return the function signature information representation."""
repr = (
f"""{self.__class__.__name__}(is_batched={self._is_batched}, batch_size={self._batch_size}, """
f"""base_type={self._base_type}, base_spec={self._base_spec})"""
)
if self.parameter:
p_repr = f"pname={self.parameter}, ptype={self.parameter.annotation}, pdefault={self.parameter.default}"
repr = f"{repr}, {p_repr}"
return repr

def parameter_name(self) -> str:
"""Return the parameter name."""
return self.parameter.name

def parameter_annotation(self) -> Any:
"""Return the parameter annotation."""
return self.parameter.annotation

def parameter_default(self) -> Any:
"""Return the parameter default."""
return self.parameter.default

def is_batched(self) -> bool:
"""Return the `is_batched` flag.

Returns:
bool: Flag to indicate if batching is enabled.
If true, `batch_size=None` implies dynamic batch size, otherwise `batch_size=<int>`.
"""
return self._is_batched

def batch_size(self) -> int:
"""Return the batch size.

Returns:
int: Batch size. If `None` and `is_batched` is `true`, then batch size is considered dynamic.
"""
return self._batch_size

def base_type(self) -> Any:
"""Return the base type.

Returns:
Any: Base type. Base type here can be simple types (e.g. `str`, `int`, ...) or
complex types with library dependencies (e.g. `np.ndarray`, `PIL.Image.Image` etc).
"""
return self._base_type

def base_spec(self) -> Optional[Union[TensorSpec, ImageSpec, EmbeddingSpec]]:
"""Return the base spec.

Returns:
Optional[Union[TensorSpec, ImageSpec, EmbeddingSpec]]: Base spec.
"""
return self._base_spec


def AnnotatedParameter(
annotation: Any, parameter: inspect.Parameter = None
) -> Union[ObjectTypeInfo, List[ObjectTypeInfo]]:
"""Annotate the parameter for inferring additional metdata."""
# Union of annotated types are converted into set of annotated types.
if get_origin(annotation) == Union:
return [AnnotatedParameter(ann, parameter) for ann in get_args(annotation)]
return ObjectTypeInfo(annotation, parameter)


class FunctionSignature(BaseModel):
"""Function signature that fully describes the remote-model to be executed
including `inputs`, `outputs`, `func_or_cls` to be executed,
@@ -236,34 +121,6 @@ def _decode_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
inputs = FunctionSignature.validate(inputs, self.parameters)
return {k: loads(v) for k, v in inputs.items()}

def get_inputs_spec(self) -> Dict[str, Union[ObjectTypeInfo, List[ObjectTypeInfo]]]:
"""Return the full input function signature specification.

For example, for CLIP's text embedding model, the inputs/output spec is:
```
inputs = {'texts': ObjectTypeInfo(is_batched=True, batch_size=None, base_type=<class 'str'>, base_spec=None)}
outputs = {'embedding': ObjectTypeInfo(is_batched=True, batch_size=None, base_type=<class 'numpy.ndarray'>, base_spec=EmbeddingSpec(shape=(512,), dtype='float32'))}
```
Returns:
Dict[str, Union[ObjectTypeInfo, List[ObjectTypeInfo]]]: Inputs spec.
"""
parameters = self.parameters.copy()
parameters.pop("self", None)
return {k: AnnotatedParameter(self.input_annotations.get(k, p.annotation), p) for k, p in parameters.items()}

def get_outputs_spec(self) -> Dict[str, Union[ObjectTypeInfo, Dict[str, ObjectTypeInfo]]]:
"""Return the full output function signature specification.

Returns:
Dict[str, Union[ObjectTypeInfo, Dict[str, ObjectTypeInfo]]]: Outputs spec.
"""
if self.output_annotations is None:
return AnnotatedParameter(self.return_annotation)
elif isinstance(self.output_annotations, dict):
return {k: AnnotatedParameter(ann) for k, ann in self.output_annotations.items()}
else:
return AnnotatedParameter(self.output_annotations)


class ModelResources(BaseModel):
"""Model resources (device/host memory etc)."""
7 changes: 1 addition & 6 deletions nos/models/__init__.py
@@ -4,7 +4,7 @@
import numpy as np
from PIL import Image

from nos import hub, internal_libs_available
from nos import hub
from nos.common import ImageSpec, TaskType
from nos.common.types import Batch, ImageT

@@ -21,8 +21,3 @@
from .tts import TextToSpeech # noqa: F401
from .whisper import Whisper # noqa: F401
from .yolox import YOLOX # noqa: F401


if internal_libs_available():
# Register internal models with hub
from autonomi.nos._internal import models # noqa: F401, F403
1 change: 0 additions & 1 deletion nos/server/__init__.py
@@ -6,7 +6,6 @@
from typing import List, Optional, Union

import psutil
import rich.status

import docker
import docker.errors