0.0.50 Mixture of EasyDeL experts
What's Changed
- Optimize mean loss and accuracy calculation by @yhavinga in #100
- Mixtral models are now fully supported and PJIT-compatible
- A wider range of models now supports FlashAttention on TPU
- Qwen 1, Qwen 2, Phi 2, and RoBERTa are newly added models that support FlashAttention on TPU and EasyBIT
- LoRA support for the trainer is now added (`EasyDeLXRapTureConfig`)
- Added EasyDel Serve Engine APIs
- Added Prompter (beta; may be removed in future updates)
- The training process is now 21% faster in 0.0.50 than in 0.0.42
- Transform functions are now automated for all models (except `MosaicMPT`, for which you still have to use the static methods)
- The Trainer APIs have changed and are now faster, more dynamic, and more hackable
- The default JAX version is now 0.4.22, required for `FJFormer` custom Pallas kernels
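As background on the new LoRA support: the core idea behind LoRA can be sketched in a few lines of NumPy. This is a minimal illustration of the technique itself, not EasyDeL's implementation or the `EasyDeLXRapTureConfig` API, and every name and size below is hypothetical:

```python
# Illustrative sketch of the LoRA idea (not EasyDeL's actual code):
# instead of updating a frozen weight matrix W directly, train two small
# low-rank factors A and B so the effective weight is W + (alpha / r) * (B @ A).
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 16, 16, 4, 8   # hypothetical layer sizes, LoRA rank, scaling

W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
A = rng.normal(size=(r, d_out)) * 0.01  # trainable low-rank factor
B = np.zeros((d_in, r))                 # trainable, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus the scaled low-rank update.
    return x @ W + (alpha / r) * (x @ B @ A)

x = rng.normal(size=(2, d_in))
# B is zero-initialized, so before any training the LoRA path is a no-op:
assert np.allclose(lora_forward(x), x @ W)
```

Because only `A` and `B` are trained, the number of trainable parameters drops from `d_in * d_out` to `r * (d_in + d_out)`, which is what makes LoRA fine-tuning cheap.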
New Contributors
Full Changelog: 0.0.42...0.0.50