v1.5.0 - Gemma series models supported.
Functionality
- Support Gemma series models, including Gemma and CodeGemma, and the DeepSeek model.
- Llama Converter supports converting a Hugging Face model quantized with `from_quantized_model='gptq'` into xFT-format INT8/INT4 model files.
- Support loading INT4 weights directly from local files.
- Optimize memory usage during Qwen model conversion, particularly for Qwen-72B.
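As a rough sketch of the new GPTQ conversion path: the call below assumes the converter is exposed as `LlamaConvert` in the `xfastertransformer` Python package (as in the project's README) and that `convert` accepts input/output paths plus the `from_quantized_model` parameter named above; the paths are placeholders.

```python
# Hedged sketch: convert an AutoGPTQ-quantized Llama checkpoint into
# xFT-format INT8/INT4 weight files. Paths are illustrative placeholders.
import xfastertransformer as xft

xft.LlamaConvert().convert(
    "/path/to/gptq-llama-hf",     # Hugging Face model quantized by AutoGPTQ
    "/path/to/xft-output",        # destination directory for xFT weight files
    from_quantized_model="gptq",  # tells the converter the source is GPTQ-quantized
)
```

The resulting INT4 weight files can then be loaded directly from the local output directory, per the new direct-loading support.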
Dependency
- Bump `transformers` to 4.38.1 to support Gemma models.
- Add `protobuf` to support new behavior in tokenizer.
Performance
- Update xDNN to release v1.4.5.
- Add GPU kernel library gpuDNN v0.1 to support Intel Arc GPU series.
- Optimize ROPE performance by reducing repeated sin and cos embedding table data.
- Accelerate KVCache copy by increasing parallelism in self attention.
- Accelerate the addreduce operation in long-sequence cases by transposing KVCache and tuning communication.
Bug fixes
- Fix an incorrect computation that should have been done in float but was done in integer.
- Fix disordered timeline.
- Fix a runtime issue in Qwen when seq_length is greater than 32768.
Generated release notes
What's Changed
- [Kernel] Fix the incorrect computing which should be in float, but was in integer by @pujiang2018 in #267
- [Layer] Reduce repeated sin and cos embedding table data to optimize ROPE perf. by @changqi1 in #266
- [Kernel] increase parallelism for KV cache copy in self attention by @pujiang2018 in #268
- [Include] Fix include not work. by @Duyi-Wang in #271
- Issue qwen72b seq length by @a3213105 in #273
- [Common] Unify memory allocation into xft::alloc by @pujiang2018 in #272
- [Timeline] Fix disordered timeline. by @changqi1 in #277
- [model] Add deepseek model. by @marvin-Yu in #274
- [Bug] Fix incorrect context parameter order. by @changqi1 in #280
- [CI] Check for UT status. by @marvin-Yu in #278
- [CMake] Check existence of MKL & oneDNN directory before installation. by @Duyi-Wang in #283
- Add KVCache trans for long sequence && tuned comm for faster Addreduce by @abenmao in #279
- [Dependency] Add protobuf in requirements.txt by @Duyi-Wang in #284
- [xDNN] Release v1.4.5. by @changqi1 in #285
- [CI] Add rls test case. by @marvin-Yu in #286
- [Bug] fix baichuan model test issue. by @marvin-Yu in #287
- [Fix] Fix baichuan2-13 without rope. by @marvin-Yu in #289
- [Tools] Add convert tool for Llama models quantized by AutoGPTQ by @xiangzez in #276
- [Common] Support loading int4 weights by @xiangzez in #275
- [KVCache] KV Cache refactor and related unit test case fix by @pujiang2018 in #290
- [Model] Update isMaster func. by @changqi1 in #292
- [Bug] Fix oneDNN GPU build issue. by @changqi1 in #293
- [UT] add unit test for selfAttention, and a small fix by @pujiang2018 in #294
- [gpuDNN] Add gpuDNN v0.1.0 library files. by @feng-intel in #291
- [UT] MLP unit test case fix by @abenmao in #296
- [Fix] Reduce convert memory usage. by @marvin-Yu in #297
- [ENV] Use Meyers' Singleton Env object. by @Duyi-Wang in #295
- [fix] fix compile issue. by @marvin-Yu in #299
- [Example] Add gemma model config and web demo. by @marvin-Yu in #304
- [Model] Add gemma model support. by @marvin-Yu in #259
- [example] add gemma model support with example. by @marvin-Yu in #307
- Bump transformers from 4.36.0 to 4.38.0 in /examples/web_demo by @dependabot in #308
- Fix timeline compile issue by @xiangzez in #309
- [Build] Fix build issues. by @changqi1 in #310
- [Version] v1.5.0. by @Duyi-Wang in #311
New Contributors
- @feng-intel made their first contribution in #291
Full Changelog: v1.4.0...v1.5.0