v1.5.0 - Gemma series models supported.
Functionality
- Support Gemma series models, including Gemma and CodeGemma, and the DeepSeek model.
- Llama Converter supports converting a Hugging Face model quantized with `from_quantized_model='gptq'` into xFT-format INT8/INT4 model files.
- Support loading INT4 weights directly from local files.
- Optimize memory usage during Qwen model conversion, particularly for Qwen-72B.
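As a rough sketch of the new GPTQ conversion path: the call below assumes the converter is exposed as `LlamaConvert` in the `xfastertransformer` Python package (as in the project's README) and that `convert` accepts input/output paths plus the `from_quantized_model` parameter named above; the paths are placeholders.

```python
# Hedged sketch: convert an AutoGPTQ-quantized Llama checkpoint into
# xFT-format INT8/INT4 weight files. Paths are illustrative placeholders.
import xfastertransformer as xft

xft.LlamaConvert().convert(
    "/path/to/gptq-llama-hf",     # Hugging Face model quantized by AutoGPTQ
    "/path/to/xft-output",        # destination directory for xFT weight files
    from_quantized_model="gptq",  # tells the converter the source is GPTQ-quantized
)
```

The resulting INT4 weight files can then be loaded directly from the local output directory, per the new direct-loading support.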
Dependency
- Bump `transformers` to 4.38.1 to support Gemma models.
- Add `protobuf` to support new behavior in tokenizer.
Performance
- Update xDNN to release v1.4.5.
- Add GPU kernel library gpuDNN v0.1 to support Intel Arc GPU series.
- Optimize ROPE performance by reducing repeated sin and cos embedding table data.
- Accelerate KVCache copy by increasing parallelism in self attention.
- Accelerate the addreduce operation in long-sequence cases by transposing KVCache and tuning communication.
Bug fixes
- Fix an incorrect computation that should have been done in float but was done in integer.
- Fix disordered timeline.
- Fix a runtime issue in Qwen when seq_length is greater than 32768.
Generated release notes
What's Changed
- [Kernel] Fix the incorrect computing which should be in float, but was in integer by @pujiang2018 in #267
- [Layer] Reduce repeated sin and cos embedding table data to optimize ROPE perf. by @changqi1 in #266
- [Kernel] increase parallelism for KV cache copy in self attention by @pujiang2018 in #268
- [Include] Fix include not work. by @Duyi-Wang in #271
- Issue qwen72b seq length by @a3213105 in #273
- [Common] Unify memory allocation into xft::alloc by @pujiang2018 in #272
- [Timeline] Fix disordered timeline. by @changqi1 in #277
- [model] Add deepseek model. by @marvin-Yu in #274
- [Bug] Fix incorrect context parameter order. by @changqi1 in #280
- [CI] Check for UT status. by @marvin-Yu in #278
- [CMake] Check existence of MKL & oneDNN directory before installation. by @Duyi-Wang in #283
- Add KVCache trans for long sequence && tuned comm for faster Addreduce by @abenmao in #279
- [Dependency] Add protobuf in requirements.txt by @Duyi-Wang in #284
- [xDNN] Release v1.4.5. by @changqi1 in #285
- [CI] Add rls test case. by @marvin-Yu in #286
- [Bug] fix baichuan model test issue. by @marvin-Yu in #287
- [Fix] Fix baichuan2-13 without rope. by @marvin-Yu in #289
- [Tools] Add convert tool for Llama models quantized by AutoGPTQ by @xiangzez in #276
- [Common] Support loading int4 weights by @xiangzez in #275
- [KVCache] KV Cache refactor and related unit test case fix by @pujiang2018 in #290
- [Model] Update isMaster func. by @changqi1 in #292
- [Bug] Fix oneDNN GPU build issue. by @changqi1 in #293
- [UT] add unit test for selfAttention, and a small fix by @pujiang2018 in #294
- [gpuDNN] Add gpuDNN v0.1.0 library files. by @feng-intel in #291
- [UT] MLP unit test case fix by @abenmao in #296
- [Fix] Reduce convert memory usage. by @marvin-Yu in #297
- [ENV] Use Meyers' Singleton Env object. by @Duyi-Wang in #295
- [fix] fix compile issue. by @marvin-Yu in #299
- [Example] Add gemma model config and web demo. by @marvin-Yu in #304
- [Model] Add gemma model support. by @marvin-Yu in #259
- [example] add gemma model support with example. by @marvin-Yu in #307
- Bump transformers from 4.36.0 to 4.38.0 in /examples/web_demo by @dependabot in #308
- Fix timeline compile issue by @xiangzez in #309
- [Build] Fix build issues. by @changqi1 in #310
- [Version] v1.5.0. by @Duyi-Wang in #311
New Contributors
- @feng-intel made their first contribution in #291
Full Changelog: v1.4.0...v1.5.0