Highlights
- Added SGLang worker for vision language models, lower latency and higher throughput #2928
- Vision langauge WebUI #2960
- OpenAI-compatible API server now supports image input #2928
- Added LightLLM worker for higher throughput https://github.com/lm-sys/FastChat/blob/main/docs/lightllm_integration.md
- Added Apple MLX worker #2940
What's Changed
- fix specify local path issue use model from www.modelscope.cn by @liuyhwangyh in #2934
- support openai embedding for topic clustering by @CodingWithTim in #2729
- Remove duplicate API endpoint by @surak in #2949
- Update Hermes Mixtral by @teknium1 in #2938
- Enablement of REST API Usage within Google Colab Free Tier by @ggcr in #2940
- Create a new worker implementation for Apple MLX by @aliasaria in #2937
- feat: support Model Yuan2.0, a new generation Fundamental Large Language Model developed by IEIT System by @cauwulixuan in #2936
- Fix the pooling method of BGE embedding model by @staoxiao in #2926
- SGLang Worker by @BabyChouSr in #2928
- Update mlx_worker to be async by @aliasaria in #2958
- Integrate LightLLM into serve worker by @zeyugao in #2888
- Copy button by @surak in #2963
- feat: train with template by @congchan in #2951
- fix content maybe a str by @zhouzaida in #2968
- Adding download folder information in README by @dheeraj-326 in #2972
- use cl100k_base as the default tiktoken encoding by @bjwswang in #2974
- Update README.md by @merrymercy in #2975
- Fix tokenizer for vllm worker by @Michaelvll in #2984
- update yuan2.0 generation by @wangpengfei1013 in #2989
- fix: tokenization mismatch when training with different templates by @congchan in #2996
- fix: inconsistent tokenization by llama tokenizer by @congchan in #3006
- Fix type hint for play_a_match_single by @MonkeyLeeT in #3008
- code update by @infwinston in #2997
- Update model_support.md by @infwinston in #3016
- Update lightllm_integration.md by @eltociear in #3014
- Upgrade gradio to 4.17 by @infwinston in #3027
- Update MLX integration to use new generate_step function signature by @aliasaria in #3021
- Update readme by @merrymercy in #3028
- Update gradio version in
pyproject.toml
and fix a bug by @merrymercy in #3029 - Update gradio demo and API model providers by @merrymercy in #3030
- Gradio Web Server for Multimodal Models by @BabyChouSr in #2960
- Migrate the gradio server to openai v1 by @merrymercy in #3032
- Update version to 0.2.36 by @merrymercy in #3033
New Contributors
- @teknium1 made their first contribution in #2938
- @ggcr made their first contribution in #2940
- @aliasaria made their first contribution in #2937
- @cauwulixuan made their first contribution in #2936
- @staoxiao made their first contribution in #2926
- @zhouzaida made their first contribution in #2968
- @dheeraj-326 made their first contribution in #2972
- @bjwswang made their first contribution in #2974
- @MonkeyLeeT made their first contribution in #3008
Full Changelog: v0.2.35...v0.2.36