Releases: cortexlabs/cortex
v0.25.0
New features
- Support server-side micro batching for the Python predictor (docs) #1653 #1382 (miguelvr)
- Add timeout configuration for batch jobs (docs) #1712 #1324 (vishalbollu)
- Support batch retries (docs) #1713 #1540 (lapaniku, vishalbollu)
- Support sending failed batches to a dead-letter queue (docs) #1713 #1541 (lapaniku, vishalbollu)
- Support installing the cortex Python client in predictors #1709 #1670 #1206 (RobertLucian)
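
As a rough illustration of what server-side micro batching means, the sketch below gathers queued requests into one batch until either a size limit or a time window is reached. It is not Cortex's implementation; `collect_batch`, `max_batch_size`, and `batch_interval` are names chosen here for illustration only.

```python
import queue
import time


def collect_batch(requests, max_batch_size, batch_interval):
    """Gather up to max_batch_size items, waiting at most batch_interval seconds."""
    batch = []
    deadline = time.monotonic() + batch_interval
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # the time window closed before the batch filled up
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived within the window
    return batch


# Five queued payloads with a batch size of three yield two batches.
pending = queue.Queue()
for payload in ["a", "b", "c", "d", "e"]:
    pending.put(payload)

first = collect_batch(pending, max_batch_size=3, batch_interval=0.05)
second = collect_batch(pending, max_batch_size=3, batch_interval=0.05)
```

The first call returns as soon as the batch is full; the second returns a partial batch once the window expires, which is the latency/throughput trade-off that a batch interval controls.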
Breaking changes
- The `predictor.model_path` field of the realtime API configuration has been moved to `predictor.models.path`. In addition, for the Python predictor type, `predictor.models` has been renamed to `predictor.multi_model_reloading`. Here is the entire API configuration schema.
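
For configurations held as Python dictionaries, the field moves above could be applied with a small helper along these lines. This is a sketch, not part of Cortex: `migrate_predictor_config` is a hypothetical name, the `type: python` check assumes the predictor type is stored under `predictor.type`, and the bucket path is made up. Applying both renames in sequence for the Python predictor is a literal reading of the note, so check the result against the schema.

```python
import copy


def migrate_predictor_config(api_config):
    """Return a copy of api_config with the v0.25.0 field moves applied."""
    migrated = copy.deepcopy(api_config)
    predictor = migrated.get("predictor", {})

    # predictor.model_path -> predictor.models.path
    if "model_path" in predictor:
        predictor.setdefault("models", {})["path"] = predictor.pop("model_path")

    # Python predictor only: predictor.models -> predictor.multi_model_reloading
    if predictor.get("type") == "python" and "models" in predictor:
        predictor["multi_model_reloading"] = predictor.pop("models")

    return migrated


# Hypothetical pre-v0.25.0 configuration for an ONNX API.
old_config = {
    "name": "example-api",
    "kind": "RealtimeAPI",
    "predictor": {"type": "onnx", "model_path": "s3://my-bucket/example/"},
}
new_config = migrate_predictor_config(old_config)
```

The helper copies its input, so the original dictionary can still be used to deploy against an older cluster.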
Bug fixes
- Misc batch reliability improvements #1705 #1718 #1729 (vishalbollu)
Docs
- Reorganize the docs structure #1696 #1701 #1704 #1719 #1675 (ospillinger)
- Add GCP to the contributing guide #1720 #1654 (deliahu)
- Add docs for setting up kubectl on GCP 759b4b1 (deliahu)
Misc
- Parse the request body as a string when content type `text/plain` is specified #1714 (deliahu)
- Support paths to single ONNX files in API configuration #1711 #1686 (RobertLucian)
- Support deploying public S3 models on GCP, and public GCS models on AWS #1694 #1684 (RobertLucian)
- Pre-download docker images when creating GCP clusters #1721 #1658 (deliahu)
- Speed up the validation processes for multi-model APIs #1690 #1663 (RobertLucian)
v0.24.1
v0.24.0
New features
- Add GCP support: our initial release supports all three predictor types (Python, TensorFlow, ONNX), on CPU or GPU, with live reloading, multi-model caching, and cluster autoscaling #1655 #1672 #1667 #1661 #114 #1600 #1602 #1616 #1624 (RobertLucian, deliahu, vishalbollu)
- Add the patch command to the CLI and Python client, which can be used to update an API using only the API configuration (without needing to provide the predictor's Python implementation) #1651 #1666 #1329 (vishalbollu)
- Support deploying predictor Python classes from the Python client #1587 #1617 (see the tutorial for an example) (vishalbollu)
Breaking changes
- The Python client's `deploy()` function has been renamed to `create_api()`, and some of the argument names have changed (docs)
Bug fixes
- Enable CORS for APIs accessed via API Gateway or load balancer #1649 #1234 (RobertLucian, deliahu)
- Fix local TensorFlow models when live reloading is enabled #1668 #1554 (RobertLucian)
- Prevent TensorFlow multi-model caching from attempting to download local models from S3 #1669 #1598 (RobertLucian)
Docs
- Miscellaneous docs improvements (vishalbollu, ospillinger)
Misc
- Improve Python client cross Python version compatibility #1640 (vishalbollu)
- Reinstall TensorFlow and ONNX dependencies when the Python version is overridden #1652 (vishalbollu)
- Terminate container when bootloader script fails #1639 (vishalbollu)
v0.23.0
New features
- Update Python client `deploy()` to accept a Python dictionary for API configuration (previously, only a file path was supported) (docs) #1587 (vishalbollu)
- Show API deployment history in `cortex get API_NAME` command #1544 #1496 (deliahu)
- Add `cortex export API_NAME` and `cortex export API_NAME API_ID` commands to export specific and historical API deployments #1544 #1497 (deliahu)
- Build and push `python-predictor-gpu-slim` image with different combinations of cuda and cudnn (`cuda10.0-cudnn7`, `cuda10.1-cudnn7`, `cuda10.1-cudnn8`, `cuda10.2-cudnn7`, `cuda10.2-cudnn8`, `cuda11.0-cudnn8`, `cuda11.1-cudnn8`) (docs) #1575 #1574 (deliahu)
Bug fixes
- Allow local deployments of public S3 models without requiring AWS credentials #1589 #1588 (RobertLucian)
Docs
- Add guide for avoiding Docker Hub rate limits #1576 (RobertLucian, deliahu)
- Add guide for self-hosting Cortex's Docker images #1579 (RobertLucian, deliahu)
Misc
- Remove API request maximum payload size limit #1583 (deliahu)
- Switch to Quay docker container registry #1578 (deliahu, RobertLucian)
v0.22.1
Bug fixes
- Set the predictor's working directory to the root Cortex project directory #1573 #1572 (deliahu)
- Allow `max_instances` to be updated via `cortex cluster configure` #1568 #1567 (deliahu)
- Gracefully stop the serving container when a multi-processed cron throws an exception #1560 #1552 (RobertLucian)
Docs
- Demonstrate how to make API requests with various payload types (binary, form fields, etc), and show how to access them in `predict()` #1566 (docs)
- Misc docs improvements #1551 #1556 c3dab40 #1557 (deliahu, RobertLucian)
Misc
- Build and upload the Python package/CLI to a public S3 bucket #1562 (vishalbollu)
v0.22.0
New features
- Multi-model caching: serve a collection of models that is collectively bigger than what will fit in memory (via LRU cache eviction) (docs) #1428 #619 (RobertLucian)
- Live reloading: support updating models in running APIs by adding new versions to the model's S3 directory (docs) #1428 #1252 (RobertLucian)
- Inter-process fairness: distribute requests within an API replica evenly across all processes #1526 #839 #1298 (RobertLucian)
- Support requests between APIs within the same cluster (docs) #1503 #1241 (deliahu)
- Allow overriding of CLI install path and config directory (via `$CORTEX_INSTALL_PATH` and `$CORTEX_CLI_CONFIG_DIR`) (docs) #1521 #1222 (deliahu)
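
The LRU eviction idea behind multi-model caching can be sketched in a few lines. This is a generic illustration of the technique, not Cortex's code; `LRUModelCache`, `load_model`, and the stand-in loader are all hypothetical.

```python
from collections import OrderedDict


class LRUModelCache:
    """Keep at most `capacity` models in memory, evicting the least recently used."""

    def __init__(self, capacity, load_model):
        self.capacity = capacity
        self.load_model = load_model  # callable that loads a model by name
        self._models = OrderedDict()  # ordered from least to most recently used

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        model = self.load_model(name)
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)  # evict the least recently used
        return model

    def cached_names(self):
        return list(self._models)


# Stand-in loader that records every load instead of reading a real model.
loads = []


def fake_loader(name):
    loads.append(name)
    return f"model:{name}"


cache = LRUModelCache(capacity=2, load_model=fake_loader)
for requested in ["a", "b", "a", "c"]:
    cache.get(requested)
```

After these four requests, model `b` has been evicted because `a` was used more recently, so a later request for `b` would trigger a reload.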
Breaking changes
- ONNX model paths in API configuration files must now point to a directory containing a single ONNX file, rather than the ONNX file itself. For example, `model_path: s3://cortex-examples/onnx/yolov5-youtube/yolov5s.onnx` becomes `model_path: s3://cortex-examples/onnx/yolov5-youtube`.
- The `--env/-e` flag in all `cortex cluster` commands has been renamed to `--configure-env/-e`, and if not provided, the environment named `aws` will no longer be configured in the `cortex cluster info` command.
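
When many configurations need the ONNX path change, a small string helper may be enough. The sketch below is an assumption-laden convenience, not a Cortex utility: `onnx_model_dir` is a hypothetical name, and it only strips the final path segment when the path ends in `.onnx`.

```python
def onnx_model_dir(model_path):
    """Return the parent directory when model_path points at a single .onnx file."""
    if model_path.lower().endswith(".onnx"):
        # Drop the file name and keep everything before the last "/".
        return model_path.rsplit("/", 1)[0]
    return model_path  # already a directory path; leave it untouched


updated = onnx_model_dir("s3://cortex-examples/onnx/yolov5-youtube/yolov5s.onnx")
```

This only rewrites the path string; the directory itself must still contain exactly one ONNX file, as the note requires.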
Bug fixes
- Fix intermittent failed requests during rolling updates #1526 #814 (RobertLucian)
- Prevent CLI environments from getting overwritten when multiple `cortex cluster` commands are run concurrently #1520 #1410 (deliahu)
Docs
- Add Python client docs #1519 #1502 (deliahu)
- Add guide for running in production #1513 #1464 #1257 (deliahu)
- Add guide for low-cost clusters #1514 #1425 (deliahu)
- Add guide for using a REST API Gateway #1505 #1228 (deliahu)
- Add guide for troubleshooting `cortex cluster down` failures #1515 #1319 (deliahu)
Misc
- Stagger Predictor `__init__()` calls to reduce peak memory consumption #1543 #1450 (RobertLucian)
- Add `--name/-n` and `--region/-r` flags to `cortex cluster info`, `cortex cluster export`, and `cortex cluster down` commands #1492 #1363 (RobertLucian)
- Rename `--env/-e` flag to `--configure-env/-e` in `cortex cluster` commands and update its behavior #1533 #1412 (deliahu)
- Disallow ARM-based instances, which are not currently supported #1536 (deliahu)
- Validate AWS vCPU quota is sufficient for up to `max_instances` instances when running `cortex cluster up` and `cortex cluster configure` #1537 #1461 (deliahu)
v0.21.0
New features
- Add Python client: pypi.org/project/cortex #1449 #684 (vishalbollu)
- Add support for private docker image registries (docs) #1460 #1113 (deliahu)
Bug fixes
- Fix minor BatchAPI bugs #1471 #1468 #1480 #1473 (vishalbollu, RobertLucian)
- Bypass instance limit check if AWS's API doesn't provide quota information (this was blocking cluster creation in `eu-north-1`) #1439 #1438 (deliahu)
Docs
- Add a guide for how to install the CLI on Windows #1476 #715 (RobertLucian)
Misc
- Change default local port from 8888 to 8890 to avoid port conflicts with Jupyter #1456 (vishalbollu)
- Disallow instance types that aren't supported by NLB #1436 #1433 (deliahu)
- Add `--cluster-aws-key` and `--cluster-aws-secret` flags to `cortex cluster configure` command #1404 (deliahu)
- Add `--output` flag to `cortex env list` command #1444 (vishalbollu)
v0.20.0
New features
- Add `cortex cluster export` command to export all APIs running in a cluster (docs) #1368 #1255 (vishalbollu)
- Enable users to specify CIDR ranges for the cluster's VPC (docs) #1388 (vishalbollu)
- Support json output for CLI commands (via `-o/--output json`) #1365 #1161 (vishalbollu)
- Support the nvidia device driver (nvidia-container-toolkit) when running locally #1366 #1223 (vishalbollu)
Breaking changes
- The valid values for `api_gateway` in the cluster configuration file have been changed from `enabled`/`disabled` to `public`/`none` (to match the values for `networking.api_gateway` in the API configuration file).
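
If cluster configurations are generated or stored as dictionaries, the value change above amounts to a two-entry mapping. The helper below is a hypothetical sketch (`migrate_api_gateway` is not a Cortex function), and the `cluster_name` key in the example is made up for illustration.

```python
API_GATEWAY_RENAMES = {"enabled": "public", "disabled": "none"}


def migrate_api_gateway(cluster_config):
    """Return a copy of cluster_config with api_gateway using the new values."""
    migrated = dict(cluster_config)
    value = migrated.get("api_gateway")
    # Only rewrite the two old string values; anything else is left as-is.
    if isinstance(value, str) and value in API_GATEWAY_RENAMES:
        migrated["api_gateway"] = API_GATEWAY_RENAMES[value]
    return migrated


new_cluster_config = migrate_api_gateway(
    {"cluster_name": "example", "api_gateway": "disabled"}
)
```

Values that are already `public` or `none` pass through unchanged, so the helper can be run repeatedly on the same configuration.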
Bug fixes
- Support AWS tags with spaces and valid special characters #1374 #1355 #1380 #1385 #1373 (deliahu)
- Fix tensor shape validation for the TensorFlow predictor #1311 #1310 (RobertLucian)
- Allow `cortex cluster *` commands to be run from within a docker container #1370 #1361 #1325 (deliahu)
New examples
- pytorch/question-generator to generate questions given text and the correct answer (uses transformers and spacy) #1308 (ismaelc)
Docs
- Add documentation for how to install a specific version of the CLI #1386 #1244 (vishalbollu)
- Add sections for overprovisioning and responsiveness to autoscaling docs #1397 (deliahu)
- Add documentation for how to allow IAM users who did not create the cortex cluster to run `cortex cluster *` commands #1392 #1391 (deliahu)
- Add guide for setting up `kubectl` to access the cluster #1344 #1343 (RobertLucian)
Misc
- Update sources of AWS credentials for `cortex cluster *` commands, and improve transparency (docs) #1378 #1229 (vishalbollu)
- Rename cluster `api_gateway` config values to match API config #1335 #1334 (deliahu)
- Set the default value for `networking.api_gateway` in the API configuration to `none` if api gateway is disabled cluster-wide #1337 #1336 (deliahu)
- Support c6g and r6g instances #1332 #809 (deliahu)
- Display autoscaling group activity history when `cortex cluster up` fails #1342 #1340 (deliahu)
- Print debug info if `cortex cluster up` times out #1396 (deliahu)
- Add Inferentia compute statistics to `cortex cluster info` command #1354 #1304 (RobertLucian)
- Disable prompts in `get-cli.sh` if not running interactively #1372 #1371 (deliahu)
- Update `cortex help` output #1398 (deliahu)
v0.19.0
New features
- Support batch APIs docs #1203 #523 (vishalbollu)
- Support traffic splitting (enables A/B testing, multi-armed bandit, etc) docs #1213 #1270 #1132 #275 #1089 (tthebst)
- Support server-side request batching for the TensorFlow Predictor docs #1193 #1060 (RobertLucian)
- Add `post_predict()` method to Predictor interface (runs after the response has been sent) docs #1237 #954 (RobertLucian)
- Support disabling API Gateway cluster-wide docs #1259 #1198 (deliahu)
- Support different CUDA versions for the slim Python Predictor image docs #1263 #923 #1254 (RobertLucian)
- Add additional widgets to the CloudWatch Dashboard (avg in-flight requests per replica, active replicas) docs #1181 (RobertLucian)
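
Traffic splitting, as used for A/B tests, boils down to routing each request to one of several APIs in proportion to configured weights. The sketch below shows that idea with a weighted random choice; it is not how Cortex routes traffic, and `pick_api`, the API names, and the 80/20 weights are all illustrative.

```python
import random
from collections import Counter


def pick_api(weights, rng=random):
    """Choose one API name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Simulate 10,000 requests against a hypothetical 80/20 split.
rng = random.Random(0)  # seeded so the simulation is repeatable
split = {"model-a": 80, "model-b": 20}
counts = Counter(pick_api(split, rng) for _ in range(10_000))
```

Over many requests the observed share of each API approaches its weight, which is what makes a small weight a low-risk way to trial a new model.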
Breaking changes
- `kind` is now a required top-level field for all API configurations. Existing APIs should add `kind: RealtimeAPI`. This release adds support for `kind: BatchAPI` and `kind: TrafficSplitter`.
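
For configurations loaded into Python, adding the now-required field to existing APIs could look like the sketch below. It assumes a configuration file parses to a list of API dictionaries; `ensure_kind` is a hypothetical helper and the API names are made up.

```python
def ensure_kind(api_configs, default_kind="RealtimeAPI"):
    """Return copies of the API configs, adding `kind` where it is missing."""
    updated = []
    for api in api_configs:
        api = dict(api)  # shallow copy so the caller's data is not modified
        api.setdefault("kind", default_kind)
        updated.append(api)
    return updated


existing = [
    {"name": "text-generator"},                   # written before v0.19.0
    {"name": "nightly-job", "kind": "BatchAPI"},  # already declares its kind
]
upgraded = ensure_kind(existing)
```

Only entries without a `kind` are changed, so configurations that already declare `BatchAPI` or `TrafficSplitter` keep their value.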
Bug fixes
- Fix `python_path` config field #1202 (deliahu)
- Fix local TensorFlow deploy from parent directory #1274 (deliahu)
- Improve error response for invalid payloads #1212 #1208 (RobertLucian)
New examples
- onnx/yolov5-youtube #1201 (dsuess)
- Update PyTorch text generator example to use Hugging Face transformers GPT-2 model #1177 (ospillinger)
Docs
- Update tutorial to use the pytorch text-generator example #1278 #1256 (deliahu)
- Improve instructions for updating cluster without downtime #1261 (deliahu)
- Mention API Gateway timeout in 404/503 API responses guide #1264 #1225 (deliahu)
Misc
- Set tags on log groups #1164 #1078 (tthebst)
- Display API metrics in the CLI by API ID (rather than by API name) #1216 (vishalbollu)
- Fix recursive error message for deploy/delete CLI commands #1247 #1218 (RobertLucian)
- Add shell completion to .zshrc file during CLI installation #1265 #1221 (deliahu)
- Handle OOM error when project files are too large #1217 (RobertLucian)
- Display image pull errors #1167 #955 (deliahu)
- Display local Docker image pull error when out of space #1238 #1236 (zouyee)
v0.18.1
Bug fixes
- Fix dynamic axes for ONNX models #1187 #1186 (RobertLucian)
- Fix memory node capacity calculation for multi-api configuration files #1185 (deliahu)
- Check cluster-name tag when choosing load balancer for VPC Link integration #1173 (deliahu)
New guides
- Troubleshooting: API request errors (deliahu)
- Troubleshooting: TensorFlow session in predict() (RobertLucian)
Misc
- Delete API Gateway if `cluster up` fails #1172 (deliahu)
- Move image version verification from serve.py to run.sh #1180 #1183 (vishalbollu)
- Add retries for resource tagging during `cluster up` #1188 (deliahu)
- Use info log level when TensorFlow model is being loaded #1171 (RobertLucian)
- Increase max number of processes per API replica to 100 #1166 (RobertLucian)
- Allow empty cluster config #1179 (deliahu)