0.0.50 Mixture of EasyDeL experts
What's Changed
- Optimize mean loss and accuracy calculation by @yhavinga in #100
- Mixtral models are now fully supported and PJIT-compatible
- A wider range of models now supports FlashAttention on TPU
- Qwen 1, Qwen 2, Phi 2, and RoBERTa are newly added models that support FlashAttention on TPU and EasyBIT
- LoRA support for the trainer is now added (`EasyDeLXRapTureConfig`)
- Added EasyDel Serve Engine APIs
- Added Prompter (beta; may be removed in future updates)
- The training process is now 21% faster in 0.0.50 than in 0.0.42
- Transform functions are now automated for all models (except `MosaicMPT`, for which you still have to use the static methods)
- The Trainer APIs have changed and are now faster, more dynamic, and more hackable
- The default JAX version is now 0.4.22, required for `FJFormer` custom Pallas kernels
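As background on the new LoRA support: the core idea behind LoRA can be sketched in a few lines of NumPy. This is a minimal illustration of the technique itself, not EasyDeL's implementation or the `EasyDeLXRapTureConfig` API, and every name and size below is hypothetical:

```python
# Illustrative sketch of the LoRA idea (not EasyDeL's actual code):
# instead of updating a frozen weight matrix W directly, train two small
# low-rank factors A and B so the effective weight is W + (alpha / r) * (B @ A).
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 16, 16, 4, 8   # hypothetical layer sizes, LoRA rank, scaling

W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
A = rng.normal(size=(r, d_out)) * 0.01  # trainable low-rank factor
B = np.zeros((d_in, r))                 # trainable, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus the scaled low-rank update.
    return x @ W + (alpha / r) * (x @ B @ A)

x = rng.normal(size=(2, d_in))
# B is zero-initialized, so before any training the LoRA path is a no-op:
assert np.allclose(lora_forward(x), x @ W)
```

Because only `A` and `B` are trained, the number of trainable parameters drops from `d_in * d_out` to `r * (d_in + d_out)`, which is what makes LoRA fine-tuning cheap.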
New Contributors
Full Changelog: 0.0.42...0.0.50