Merge pull request #1902 from SeldonIO/master
ci: Merge change for release 1.6.1
RobertSamoilescu authored Sep 10, 2024
2 parents 09637a8 + a5c82d1 commit 507aa06
Showing 42 changed files with 3,845 additions and 1,912 deletions.
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -216,6 +216,7 @@ jobs:
- mllib
- sklearn
- xgboost
- catboost
steps:
- name: Maximize build space
uses: easimon/maximize-build-space@master
87 changes: 85 additions & 2 deletions CHANGELOG.md
@@ -1,11 +1,93 @@
# Changelog


<a name="1.6.0"></a>
## [1.6.0](https://github.com/SeldonIO/MLServer/releases/tag/1.6.0) - 26 Jun 2024

## Overview


### Upgrades
MLServer supports Pydantic V2.

### Features
MLServer supports streaming data to and from your models.

Streaming support is available for both the REST and gRPC servers:
* For the REST server, support is limited to server streaming: the client sends a single request and the server responds with a stream of data.
* For the gRPC server, both client and server streaming are supported: the client sends a stream of requests and the server responds with a stream of data.

See our [docs](https://mlserver.readthedocs.io/en/1.6.0/user-guide/streaming.html) and [example](https://mlserver.readthedocs.io/en/1.6.0/examples/streaming/README.html) for more details.
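On the model side, streaming is exposed through the `predict_stream` method. As a minimal sketch (based on the `TextModel` example from the streaming docs; the token-splitting body shown here is illustrative rather than the exact implementation):

```python
# Minimal streaming-model sketch — assumes the TextModel example from the
# streaming docs; the response-splitting logic is illustrative.
from typing import AsyncIterator

from mlserver import MLModel
from mlserver.codecs import StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


class TextModel(MLModel):
    async def predict_stream(
        self, payloads: AsyncIterator[InferenceRequest]
    ) -> AsyncIterator[InferenceResponse]:
        # Consume the (single) request from the input stream...
        payload = await anext(payloads)
        text = StringCodec.decode_input(payload.inputs[0])[0]

        # ...and stream the prompt back one token at a time.
        for token in text.split(" "):
            yield InferenceResponse(
                model_name=self._settings.name,
                outputs=[
                    StringCodec.encode_output(
                        name="output", payload=[token], use_bytes=True
                    )
                ],
            )
```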

## What's Changed
* fix(ci): fix typo in CI name by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1623
* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1624
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1634
* Fix mlserver_huggingface settings device type by [@geodavic](https://github.com/geodavic) in https://github.com/SeldonIO/MLServer/pull/1486
* fix: Adjust HF tests post-merge of PR [#1486](https://github.com/SeldonIO/MLServer/issues/1486) by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1635
* Update README.md w licensing clarification by [@paulb-seldon](https://github.com/paulb-seldon) in https://github.com/SeldonIO/MLServer/pull/1636
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1642
* fix(ci): optimise disk space for GH workers by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1644
* build: Update maintainers by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1659
* fix: Missing f-string directives by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1677
* build: Add Catboost runtime to Dependabot by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1689
* Fix JSON input shapes by [@ReveStobinson](https://github.com/ReveStobinson) in https://github.com/SeldonIO/MLServer/pull/1679
* build(deps): bump alibi-detect from 0.11.5 to 0.12.0 by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1702
* build(deps): bump alibi from 0.9.5 to 0.9.6 by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1704
* Docs correction - Updated README.md in mlflow to match column names order by [@vivekk0903](https://github.com/vivekk0903) in https://github.com/SeldonIO/MLServer/pull/1703
* fix(runtimes): Remove unused Pydantic dependencies by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1725
* test: Detect generate failures by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1729
* build: Add granularity in types generation by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1749
* Migrate to Pydantic v2 by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1748
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1753
* Revert "build(deps): bump uvicorn from 0.28.0 to 0.29.0" by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1758
* refactor(pydantic): Remaining migrations for deprecated functions by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1757
* Fixed openapi dataplane.yaml by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1752
* fix(pandas): Use Pydantic v2 compatible type by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1760
* Fix Pandas codec decoding from numpy arrays by [@lhnwrk](https://github.com/lhnwrk) in https://github.com/SeldonIO/MLServer/pull/1751
* build: Bump versions for Read the Docs by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1761
* docs: Remove quotes around local TOC by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1764
* Spawn worker in custom environment by [@lhnwrk](https://github.com/lhnwrk) in https://github.com/SeldonIO/MLServer/pull/1739
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1767
* basic contributing guide on contributing and opening a PR by [@bohemia420](https://github.com/bohemia420) in https://github.com/SeldonIO/MLServer/pull/1773
* Inference streaming support by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1750
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1779
* build: Lock GitHub runners' OS by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1765
* Removed text-model form benchmarking by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1790
* Bumped mlflow to 2.13.1 and gunicorn to 22.0.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1791
* Build(deps): Update to poetry version 1.8.3 in docker build by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1792
* Bumped werkzeug to 3.0.3 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1793
* Docs streaming by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1789
* Bump uvicorn 0.30.1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1795
* Fixes for all-runtimes by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1794
* Fix BaseSettings import for pydantic v2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1798
* Bumped preflight version to 1.9.7 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1797
* build: Install dependencies only in Tox environments by [@jesse-c](https://github.com/jesse-c) in https://github.com/SeldonIO/MLServer/pull/1785
* Bumped to 1.6.0.dev2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1803
* Fix CI/CD macos-huggingface by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1805
* Fixed macos kafka CI by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1807
* Update poetry lock by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1808
* Re-generate License Info by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1813
* Fix/macos all runtimes by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in https://github.com/SeldonIO/MLServer/pull/1823
* fix: Update stale reviewer in licenses.yml workflow by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1824
* ci: Merge changes from master to release branch by [@sakoush](https://github.com/sakoush) in https://github.com/SeldonIO/MLServer/pull/1825

## New Contributors
* [@paulb-seldon](https://github.com/paulb-seldon) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1636
* [@ReveStobinson](https://github.com/ReveStobinson) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1679
* [@vivekk0903](https://github.com/vivekk0903) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1703
* [@RobertSamoilescu](https://github.com/RobertSamoilescu) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1752
* [@lhnwrk](https://github.com/lhnwrk) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1751
* [@bohemia420](https://github.com/bohemia420) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1773

**Full Changelog**: https://github.com/SeldonIO/MLServer/compare/1.5.0...1.6.0

[Changes][1.6.0]


<a name="1.5.0"></a>
## [1.5.0](https://github.com/SeldonIO/MLServer/releases/tag/1.5.0) - 05 Mar 2024

<!-- Release notes generated using configuration in .github/release.yml at 1.5.0 -->

## What's Changed

* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in https://github.com/SeldonIO/MLServer/pull/1592
@@ -427,6 +509,7 @@ To learn more about how to use MLServer directly from the MLflow CLI, check out
[Changes][1.1.0]


[1.6.0]: https://github.com/SeldonIO/MLServer/compare/1.5.0...1.6.0
[1.5.0]: https://github.com/SeldonIO/MLServer/compare/1.4.0...1.5.0
[1.4.0]: https://github.com/SeldonIO/MLServer/compare/1.3.5...1.4.0
[1.3.5]: https://github.com/SeldonIO/MLServer/compare/1.3.4...1.3.5
2 changes: 1 addition & 1 deletion docs/examples/cassava/model/requirements.txt
@@ -1,2 +1,2 @@
tensorflow==2.12.0
tensorflow==2.12.1
tensorflow-hub==0.13.0
2 changes: 1 addition & 1 deletion docs/examples/cassava/requirements.txt
@@ -1,3 +1,3 @@
mlserver==1.3.2
tensorflow==2.12.0
tensorflow==2.12.1
tensorflow-hub==0.13.0
59 changes: 39 additions & 20 deletions docs/examples/streaming/README.ipynb
@@ -42,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [
{
@@ -121,7 +121,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -138,8 +138,7 @@
"{\n",
" \"debug\": false,\n",
" \"parallel_workers\": 0,\n",
" \"gzip_enabled\": false,\n",
" \"metrics_endpoint\": null\n",
" \"gzip_enabled\": false\n",
"}\n"
]
},
@@ -150,8 +149,7 @@
"Note the currently there are three main limitations of the streaming support in MLServer:\n",
"\n",
"- distributed workers are not supported (i.e., the `parallel_workers` setting should be set to `0`)\n",
"- `gzip` middleware is not supported for REST (i.e., `gzip_enabled` setting should be set to `false`)\n",
"- metrics endpoint is not available (i.e. `metrics_endpoint` is also disabled for streaming for gRPC)"
"- `gzip` middleware is not supported for REST (i.e., `gzip_enabled` setting should be set to `false`)"
]
},
{
@@ -163,7 +161,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -227,14 +225,14 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing generate-request.json\n"
"Overwriting generate-request.json\n"
]
}
],
@@ -272,9 +270,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['What']\n",
"[' is']\n",
"[' the']\n",
"[' capital']\n",
"[' of']\n",
"[' France?']\n"
]
}
],
"source": [
"import httpx\n",
"from httpx_sse import connect_sse\n",
@@ -301,9 +312,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['What']\n",
"[' is']\n",
"[' the']\n",
"[' capital']\n",
"[' of']\n",
"[' France?']\n"
]
}
],
"source": [
"import grpc\n",
"import mlserver.types as types\n",
@@ -315,7 +339,7 @@
"inference_request = types.InferenceRequest.parse_file(\"./generate-request.json\")\n",
"\n",
"# need to convert from string to bytes for grpc\n",
"inference_request.inputs[0] = StringCodec.encode_input(\"prompt\", inference_request.inputs[0].data.__root__)\n",
"inference_request.inputs[0] = StringCodec.encode_input(\"prompt\", inference_request.inputs[0].data.root)\n",
"inference_request_g = converters.ModelInferRequestConverter.from_types(\n",
" inference_request, model_name=\"text-model\", model_version=None\n",
")\n",
@@ -338,11 +362,6 @@
"source": [
"Note that for gRPC, the request is transformed into an async generator which is then passed to the `ModelStreamInfer` method. The response is also an async generator which can be iterated over to get the response."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
@@ -361,7 +380,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.12"
}
},
"nbformat": 4,
8 changes: 2 additions & 6 deletions docs/examples/streaming/README.md
@@ -78,8 +78,7 @@ The next step will be to create 2 configuration files:
{
"debug": false,
"parallel_workers": 0,
"gzip_enabled": false,
"metrics_endpoint": null
"gzip_enabled": false
}

```
@@ -88,7 +87,6 @@ Note that there are currently two main limitations of the streaming support in

- distributed workers are not supported (i.e., the `parallel_workers` setting should be set to `0`)
- `gzip` middleware is not supported for REST (i.e., `gzip_enabled` setting should be set to `false`)
- metrics endpoint is not available (i.e. `metrics_endpoint` is also disabled for streaming for gRPC)

#### model-settings.json

@@ -195,7 +193,7 @@ import mlserver.grpc.dataplane_pb2_grpc as dataplane
inference_request = types.InferenceRequest.parse_file("./generate-request.json")

# need to convert from string to bytes for grpc
inference_request.inputs[0] = StringCodec.encode_input("prompt", inference_request.inputs[0].data.__root__)
inference_request.inputs[0] = StringCodec.encode_input("prompt", inference_request.inputs[0].data.root)
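# (`.data.root` is the Pydantic v2 accessor that replaces v1's `.data.__root__`)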
inference_request_g = converters.ModelInferRequestConverter.from_types(
inference_request, model_name="text-model", model_version=None
)
@@ -213,5 +211,3 @@ async with grpc.aio.insecure_channel("localhost:8081") as grpc_channel:
```

Note that for gRPC, the request is transformed into an async generator which is then passed to the `ModelStreamInfer` method. The response is also an async generator which can be iterated over to get the response.
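A sketch of that wrapping step (the helper name is illustrative; see the full example files for the complete flow):

```python
# `inference_request_g` is the converted request from the snippet above.
async def request_stream(request):
    # ModelStreamInfer expects a stream of requests, so wrap the single
    # converted request in an async generator that yields it once.
    yield request

# The response is itself an async generator:
#   async for response in grpc_stub.ModelStreamInfer(request_stream(inference_request_g)):
#       print(response)
```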


3 changes: 1 addition & 2 deletions docs/examples/streaming/settings.json
@@ -2,6 +2,5 @@
{
"debug": false,
"parallel_workers": 0,
"gzip_enabled": false,
"metrics_endpoint": null
"gzip_enabled": false
}
13 changes: 0 additions & 13 deletions docs/examples/streaming/text_model.py
@@ -7,19 +7,6 @@

class TextModel(MLModel):

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
text = StringCodec.decode_input(payload.inputs[0])[0]
return InferenceResponse(
model_name=self._settings.name,
outputs=[
StringCodec.encode_output(
name="output",
payload=[text],
use_bytes=True,
),
],
)

async def predict_stream(
self, payloads: AsyncIterator[InferenceRequest]
) -> AsyncIterator[InferenceResponse]:
18 changes: 17 additions & 1 deletion docs/user-guide/custom.md
@@ -215,7 +215,8 @@ In these cases, to load your custom runtime, MLServer will need access to these
dependencies.

It is possible to load this custom set of dependencies by providing them
through an [environment tarball](../examples/conda/README), whose path can be
through an [environment tarball](../examples/conda/README) or by providing the
path to an already existing Python environment. Both paths can be
specified within your `model-settings.json` file.

```{warning}
@@ -277,6 +278,21 @@ Note that, in the folder layout above, we are assuming that:
}
```

If you want to use an already existing Python environment, you can set the `environment_path` parameter in your `model-settings.json`:

```{code-block} json
---
emphasize-lines: 5
---
{
  "name": "sum-model",
  "implementation": "models.MyCustomRuntime",
  "parameters": {
    "environment_path": "~/micromamba/envs/my-conda-environment"
  }
}
```
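For context, the `models.MyCustomRuntime` referenced above would be a runtime class along these lines (a minimal sketch; the codec choice and the summing behaviour are assumptions based on the `sum-model` name):

```python
# models.py — a hypothetical sketch of the custom runtime referenced above.
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Imports of custom dependencies here resolve against the environment
        # supplied through `environment_path` (or the environment tarball).
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        data = NumpyCodec.decode_input(payload.inputs[0])
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output("total", data.sum(keepdims=True))],
        )
```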

## Building a custom MLServer image

```{note}
1 change: 0 additions & 1 deletion docs/user-guide/streaming.md
@@ -32,4 +32,3 @@ There are two main limitations of the streaming support in MLServer:

- the `parallel_workers` setting should be set to `0` to disable distributed workers (to be addressed in future releases)
- for REST, the `gzip_enabled` setting should be set to `false` to disable GZIP compression, as streaming is not compatible with GZIP compression (see issue [here]( https://github.com/encode/starlette/issues/20#issuecomment-704106436))
- `metrics_endpoint` is also disabled for streaming for gRPC (to be addressed in future releases)