Merge pull request #1902 from SeldonIO/master
ci: Merge change for release 1.6.1
RobertSamoilescu authored Sep 10, 2024
2 parents 09637a8 + a5c82d1 commit 507aa06
Showing 42 changed files with 3,845 additions and 1,912 deletions.
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -216,6 +216,7 @@ jobs:
- mllib
- sklearn
- xgboost
- catboost
steps:
- name: Maximize build space
uses: easimon/maximize-build-space@master
87 changes: 85 additions & 2 deletions CHANGELOG.md
@@ -1,11 +1,93 @@
# Changelog


<a name="1.6.0"></a>
## [1.6.0](https://github.com/SeldonIO/MLServer/releases/tag/1.6.0) - 26 Jun 2024

## Overview


### Upgrades
MLServer supports Pydantic V2.

### Features
MLServer supports streaming data to and from your models.

Streaming support is available for both the REST and gRPC servers:
* For the REST server, support is limited to server streaming: the client sends a single request and the server responds with a stream of data.
* For the gRPC server, both client and server streaming are supported: the client sends a stream of requests and the server responds with a stream of data.

See our [docs](https://mlserver.readthedocs.io/en/1.6.0/user-guide/streaming.html) and [example](https://mlserver.readthedocs.io/en/1.6.0/examples/streaming/README.html) for more details.
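On the model side, streaming is exposed through the `predict_stream` method. As a minimal sketch (based on the `TextModel` example from the streaming docs; the token-splitting body shown here is illustrative rather than the exact implementation):

```python
# Minimal streaming-model sketch — assumes the TextModel example from the
# streaming docs; the response-splitting logic is illustrative.
from typing import AsyncIterator

from mlserver import MLModel
from mlserver.codecs import StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


class TextModel(MLModel):
    async def predict_stream(
        self, payloads: AsyncIterator[InferenceRequest]
    ) -> AsyncIterator[InferenceResponse]:
        # Consume the (single) request from the input stream...
        payload = await anext(payloads)
        text = StringCodec.decode_input(payload.inputs[0])[0]

        # ...and stream the prompt back one token at a time.
        for token in text.split(" "):
            yield InferenceResponse(
                model_name=self._settings.name,
                outputs=[
                    StringCodec.encode_output(
                        name="output", payload=[token], use_bytes=True
                    )
                ],
            )
```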

## What's Changed
* fix(ci): fix typo in CI name by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1623
* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1624
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1634
* Fix mlserver_huggingface settings device type by [@geodavic](https://github.com/geodavic) in https://github.com/SeldonIO/MLServer/pull/1486
* fix: Adjust HF tests post-merge of PR [#1486](https://github.com/SeldonIO/MLServer/issues/1486) by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1635
* Update README.md w licensing clarification by [@paulb-seldon](https://github.com/paulb-seldon) in https://github.com/SeldonIO/MLServer/pull/1636
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1642
* fix(ci): optimise disk space for GH workers by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1644
* build: Update maintainers by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1659
* fix: Missing f-string directives by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1677
* build: Add Catboost runtime to Dependabot by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1689
* Fix JSON input shapes by [@ReveStobinson](https://github.com/ReveStobinson) in https://github.com/SeldonIO/MLServer/pull/1679
* build(deps): bump alibi-detect from 0.11.5 to 0.12.0 by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1702
* build(deps): bump alibi from 0.9.5 to 0.9.6 by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1704
* Docs correction - Updated README.md in mlflow to match column names order by [@vivekk0903](https://github.com/vivekk0903) in https://github.com/SeldonIO/MLServer/pull/1703
* fix(runtimes): Remove unused Pydantic dependencies by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1725
* test: Detect generate failures by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1729
* build: Add granularity in types generation by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1749
* Migrate to Pydantic v2 by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1748
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1753
* Revert "build(deps): bump uvicorn from 0.28.0 to 0.29.0" by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1758
* refactor(pydantic): Remaining migrations for deprecated functions by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1757
* Fixed openapi dataplane.yaml by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1752
* fix(pandas): Use Pydantic v2 compatible type by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1760
* Fix Pandas codec decoding from numpy arrays by [@lhnwrk](https://github.com/lhnwrk) in https://github.com/SeldonIO/MLServer/pull/1751
* build: Bump versions for Read the Docs by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1761
* docs: Remove quotes around local TOC by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1764
* Spawn worker in custom environment by [@lhnwrk](https://github.com/lhnwrk) in https://github.com/SeldonIO/MLServer/pull/1739
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1767
* basic contributing guide on contributing and opening a PR by [@bohemia420](https://github.com/bohemia420) in https://github.com/SeldonIO/MLServer/pull/1773
* Inference streaming support by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1750
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1779
* build: Lock GitHub runners' OS by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1765
* Removed text-model form benchmarking by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1790
* Bumped mlflow to 2.13.1 and gunicorn to 22.0.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1791
* Build(deps): Update to poetry version 1.8.3 in docker build by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1792
* Bumped werkzeug to 3.0.3 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1793
* Docs streaming by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1789
* Bump uvicorn 0.30.1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1795
* Fixes for all-runtimes by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1794
* Fix BaseSettings import for pydantic v2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1798
* Bumped preflight version to 1.9.7 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1797
* build: Install dependencies only in Tox environments by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1785
* Bumped to 1.6.0.dev2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1803
* Fix CI/CD macos-huggingface by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1805
* Fixed macos kafka CI by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1807
* Update poetry lock by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1808
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1813
* Fix/macos all runtimes by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1823
* fix: Update stale reviewer in licenses.yml workflow by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1824
* ci: Merge changes from master to release branch by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1825

## New Contributors
* [@paulb-seldon](https://github.com/paulb-seldon) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1636
* [@ReveStobinson](https://github.com/ReveStobinson) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1679
* [@vivekk0903](https://github.com/vivekk0903) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1703
* [@RobertSamoilescu](https://github.com/RobertSamoilescu) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1752
* [@lhnwrk](https://github.com/lhnwrk) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1751
* [@bohemia420](https://github.com/bohemia420) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1773

**Full Changelog**: https://github.com/SeldonIO/MLServer/compare/1.5.0...1.6.0

[Changes][1.6.0]


<a name="1.5.0"></a>
## [1.5.0](https://github.com/SeldonIO/MLServer/releases/tag/1.5.0) - 05 Mar 2024

<!-- Release notes generated using configuration in .github/release.yml at 1.5.0 -->

## What's Changed

* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1592
@@ -427,6 +509,7 @@ To learn more about how to use MLServer directly from the MLflow CLI, check out
[Changes][1.1.0]


[1.6.0]: https://github.com/SeldonIO/MLServer/compare/1.5.0...1.6.0
[1.5.0]: https://github.com/SeldonIO/MLServer/compare/1.4.0...1.5.0
[1.4.0]: https://github.com/SeldonIO/MLServer/compare/1.3.5...1.4.0
[1.3.5]: https://github.com/SeldonIO/MLServer/compare/1.3.4...1.3.5
2 changes: 1 addition & 1 deletion docs/examples/cassava/model/requirements.txt
@@ -1,2 +1,2 @@
tensorflow==2.12.0
tensorflow==2.12.1
tensorflow-hub==0.13.0
2 changes: 1 addition & 1 deletion docs/examples/cassava/requirements.txt
@@ -1,3 +1,3 @@
mlserver==1.3.2
tensorflow==2.12.0
tensorflow==2.12.1
tensorflow-hub==0.13.0
59 changes: 39 additions & 20 deletions docs/examples/streaming/README.ipynb
@@ -42,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [
{
@@ -121,7 +121,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -138,8 +138,7 @@
"{\n",
" \"debug\": false,\n",
" \"parallel_workers\": 0,\n",
" \"gzip_enabled\": false,\n",
" \"metrics_endpoint\": null\n",
" \"gzip_enabled\": false\n",
"}\n"
]
},
@@ -150,8 +149,7 @@
"Note the currently there are three main limitations of the streaming support in MLServer:\n",
"\n",
"- distributed workers are not supported (i.e., the `parallel_workers` setting should be set to `0`)\n",
"- `gzip` middleware is not supported for REST (i.e., `gzip_enabled` setting should be set to `false`)\n",
"- metrics endpoint is not available (i.e. `metrics_endpoint` is also disabled for streaming for gRPC)"
"- `gzip` middleware is not supported for REST (i.e., `gzip_enabled` setting should be set to `false`)"
]
},
{
@@ -163,7 +161,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -227,14 +225,14 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing generate-request.json\n"
"Overwriting generate-request.json\n"
]
}
],
@@ -272,9 +270,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['What']\n",
"[' is']\n",
"[' the']\n",
"[' capital']\n",
"[' of']\n",
"[' France?']\n"
]
}
],
"source": [
"import httpx\n",
"from httpx_sse import connect_sse\n",
@@ -301,9 +312,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['What']\n",
"[' is']\n",
"[' the']\n",
"[' capital']\n",
"[' of']\n",
"[' France?']\n"
]
}
],
"source": [
"import grpc\n",
"import mlserver.types as types\n",
@@ -315,7 +339,7 @@
"inference_request = types.InferenceRequest.parse_file(\"./generate-request.json\")\n",
"\n",
"# need to convert from string to bytes for grpc\n",
"inference_request.inputs[0] = StringCodec.encode_input(\"prompt\", inference_request.inputs[0].data.__root__)\n",
"inference_request.inputs[0] = StringCodec.encode_input(\"prompt\", inference_request.inputs[0].data.root)\n",
"inference_request_g = converters.ModelInferRequestConverter.from_types(\n",
" inference_request, model_name=\"text-model\", model_version=None\n",
")\n",
@@ -338,11 +362,6 @@
"source": [
"Note that for gRPC, the request is transformed into an async generator which is then passed to the `ModelStreamInfer` method. The response is also an async generator which can be iterated over to get the response."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
@@ -361,7 +380,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.12"
}
},
"nbformat": 4,
8 changes: 2 additions & 6 deletions docs/examples/streaming/README.md
@@ -78,8 +78,7 @@ The next step will be to create 2 configuration files:
{
"debug": false,
"parallel_workers": 0,
"gzip_enabled": false,
"metrics_endpoint": null
"gzip_enabled": false
}

```
@@ -88,7 +87,6 @@ Note that there are currently two main limitations of the streaming support in

- distributed workers are not supported (i.e., the `parallel_workers` setting should be set to `0`)
- `gzip` middleware is not supported for REST (i.e., `gzip_enabled` setting should be set to `false`)
- metrics endpoint is not available (i.e. `metrics_endpoint` is also disabled for streaming for gRPC)

#### model-settings.json

@@ -195,7 +193,7 @@ import mlserver.grpc.dataplane_pb2_grpc as dataplane
inference_request = types.InferenceRequest.parse_file("./generate-request.json")

# need to convert from string to bytes for grpc
inference_request.inputs[0] = StringCodec.encode_input("prompt", inference_request.inputs[0].data.__root__)
inference_request.inputs[0] = StringCodec.encode_input("prompt", inference_request.inputs[0].data.root)
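# (`.data.root` is the Pydantic v2 accessor that replaces v1's `.data.__root__`)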
inference_request_g = converters.ModelInferRequestConverter.from_types(
inference_request, model_name="text-model", model_version=None
)
@@ -213,5 +211,3 @@ async with grpc.aio.insecure_channel("localhost:8081") as grpc_channel:
```

Note that for gRPC, the request is transformed into an async generator which is then passed to the `ModelStreamInfer` method. The response is also an async generator which can be iterated over to get the response.
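A sketch of that wrapping step (the helper name is illustrative; see the full example files for the complete flow):

```python
# `inference_request_g` is the converted request from the snippet above.
async def request_stream(request):
    # ModelStreamInfer expects a stream of requests, so wrap the single
    # converted request in an async generator that yields it once.
    yield request

# The response is itself an async generator:
#   async for response in grpc_stub.ModelStreamInfer(request_stream(inference_request_g)):
#       print(response)
```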


3 changes: 1 addition & 2 deletions docs/examples/streaming/settings.json
@@ -2,6 +2,5 @@
{
"debug": false,
"parallel_workers": 0,
"gzip_enabled": false,
"metrics_endpoint": null
"gzip_enabled": false
}
13 changes: 0 additions & 13 deletions docs/examples/streaming/text_model.py
@@ -7,19 +7,6 @@

class TextModel(MLModel):

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
text = StringCodec.decode_input(payload.inputs[0])[0]
return InferenceResponse(
model_name=self._settings.name,
outputs=[
StringCodec.encode_output(
name="output",
payload=[text],
use_bytes=True,
),
],
)

async def predict_stream(
self, payloads: AsyncIterator[InferenceRequest]
) -> AsyncIterator[InferenceResponse]:
18 changes: 17 additions & 1 deletion docs/user-guide/custom.md
@@ -215,7 +215,8 @@ In these cases, to load your custom runtime, MLServer will need access to these
dependencies.

It is possible to load this custom set of dependencies by providing them
through an [environment tarball](../examples/conda/README), whose path can be
through an [environment tarball](../examples/conda/README) or by providing the
path to an already existing Python environment. Both paths can be
specified within your `model-settings.json` file.

```{warning}
@@ -277,6 +278,21 @@ Note that, in the folder layout above, we are assuming that:
}
```

If you want to use an already existing Python environment, you can set the `environment_path` parameter in your `model-settings.json`:

```{code-block} json
---
emphasize-lines: 5
---
{
  "name": "sum-model",
  "implementation": "models.MyCustomRuntime",
  "parameters": {
    "environment_path": "~/micromamba/envs/my-conda-environment"
  }
}
```
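For context, the `models.MyCustomRuntime` referenced above would be a runtime class along these lines (a minimal sketch; the codec choice and the summing behaviour are assumptions based on the `sum-model` name):

```python
# models.py — a hypothetical sketch of the custom runtime referenced above.
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Imports of custom dependencies here resolve against the environment
        # supplied through `environment_path` (or the environment tarball).
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        data = NumpyCodec.decode_input(payload.inputs[0])
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output("total", data.sum(keepdims=True))],
        )
```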

## Building a custom MLServer image

```{note}
1 change: 0 additions & 1 deletion docs/user-guide/streaming.md
@@ -32,4 +32,3 @@ There are two main limitations of the streaming support in MLServer:

- the `parallel_workers` setting should be set to `0` to disable distributed workers (to be addressed in future releases)
- for REST, the `gzip_enabled` setting should be set to `false` to disable GZIP compression, as streaming is not compatible with GZIP compression (see issue [here]( https://github.com/encode/starlette/issues/20#issuecomment-704106436))
- `metrics_endpoint` is also disabled for streaming for gRPC (to be addressed in future releases)