This release includes several hotfixes.
New Features
- Max Tokens Limit Option: Added the --max-tokens-limit MAX_TOKENS_LIMIT option. You can now adjust the upper limit of max tokens. If a request exceeds this limit, a pydantic validation error is raised.
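
To illustrate how an upper bound on max tokens can surface as a pydantic error, here is a minimal sketch. It is not the project's actual code: the model name CompletionRequest, the helper build_request_model, and the prompt field are hypothetical, and the sketch only assumes that the limit is enforced as a field constraint.

```python
from pydantic import Field, ValidationError, create_model


def build_request_model(max_tokens_limit: int):
    """Build a request model whose max_tokens may not exceed the given limit.

    Hypothetical helper: the real server may wire the limit in differently.
    """
    return create_model(
        "CompletionRequest",  # hypothetical model name
        prompt=(str, ...),
        max_tokens=(int, Field(default=128, ge=1, le=max_tokens_limit)),
    )


if __name__ == "__main__":
    # Corresponds to starting the server with --max-tokens-limit 500.
    CompletionRequest = build_request_model(500)

    # Within the limit: accepted.
    print(CompletionRequest(prompt="Hello", max_tokens=500).max_tokens)

    # Above the limit: pydantic rejects the request body.
    try:
        CompletionRequest(prompt="Hello", max_tokens=501)
    except ValidationError as exc:
        print("rejected:", type(exc).__name__)
```

Building the model from the configured limit keeps the check inside request validation, so an oversized request is rejected before any generation work starts.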
Enhancements
- Docker Image Update: Removed the PORT environment variable. You can now customize the port using the docker run command and the --port option.
Bug Fixes
- CUDA Memory Error: If a CUDA-related error occurs, a MemoryError is raised to automatically terminate the worker process. A new worker process can then be spawned automatically.
- Unix Lifespan Bug: Fixed a bug where the process pool would not close and a deadlock would occur when terminating the FastAPI app in a Unix environment.
- Langchain Compatibility: Resolved a type conflict that caused a pydantic validation error when using ChatOpenAI in Langchain if the request body contained None values. None values are now ignored.
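
The idea behind the Langchain fix can be sketched as follows. This is an assumption about the general approach, not the actual patch: the function name drop_none_values is hypothetical, and the sketch only handles top-level keys of the request body.

```python
from typing import Any, Dict


def drop_none_values(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request body without keys whose value is None.

    Hypothetical helper: it only filters top-level keys.
    """
    return {key: value for key, value in body.items() if value is not None}


if __name__ == "__main__":
    # A client such as ChatOpenAI may send unset options as explicit nulls.
    body = {"model": "my-model", "max_tokens": None, "stop": None, "temperature": 0.7}
    print(drop_none_values(body))  # {'model': 'my-model', 'temperature': 0.7}
```

Once the None entries are removed, the remaining fields fall back to their defaults during validation instead of failing a type check.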
Usage Example for Docker
docker run -d --name my-container -p 8080:8080 my-image
Usage Example for Max Tokens Limit Option
python -m main --max-tokens-limit 500