Releases: TransformerLensOrg/TransformerLens

v2.9.1

19 Nov 14:34
3267a43

Minor dependency update addressing a change in an external dependency.

Full Changelog: v2.9.0...v2.9.1

v2.9.0

16 Nov 00:28
dc19c08

Lots of accuracy improvements! A number of models now behave closer to how they behave in Transformers, and a new internal configuration option has been added for ease of use!

What's Changed

  • Fix the bug where attention_mask and past_kv_cache could not work together by @yzhhr in #772
  • Set prepend_bos to false by default for Bloom model family by @degenfabian in #775
  • Fix Bloom-family models producing incorrect outputs when use_past_kv_cache is set to True by @degenfabian in #777

Full Changelog: v2.8.1...v2.9.0

v2.8.1

26 Oct 21:12
8f482fc

New notebook for comparing models, and a bug fix for handling newer LLaMA models!

What's Changed

  • Logit comparator tool by @curt-tigges in #765
  • Add support for NTK-by-Part Rotary Embedding & set correct rotary base for Llama-3.1 series by @Hzfinfdu in #764

Full Changelog: v2.8.0...v2.8.1

v2.8.0

22 Oct 00:32
b6e19d6

What's Changed

  • add transformer diagram by @akozlo in #749
  • Demo colab compatibility by @bryce13950 in #752
  • Add support for Mistral-Nemo-Base-2407 model by @ryanhoangt in #751
  • Fix the bug where the tokenize_and_concatenate function did not work for small datasets by @xy-z-code in #725
  • added new block for recent diagram, and colab compatibility notebook by @bryce13950 in #758
  • Add warning and halt execution for incorrect T5 model usage by @vatsalrathod16 in #757
  • New issue template for reporting model compatibility by @bryce13950 in #759
  • Add configurations for Llama 3.1 models(Llama-3.1-8B and Llama-3.1-70B) by @vatsalrathod16 in #761

Full Changelog: v2.7.1...v2.8.0

v2.7.1

04 Oct 23:12
1d8b1d8

Full Changelog: v2.7.0...v2.7.1

v2.7.0

26 Sep 23:56
395b237

Llama 3.2 support! The test_prompt function can now accept multiple prompts, and a minor typo has been fixed.

Full Changelog: v2.6.0...v2.7.0

v2.6.0

13 Sep 13:29
e64888d

Another nice little feature update! You can now ungroup the grouped-query attention head component via a new config parameter, ungroup_grouped_query_attention!
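As a rough illustration of what "ungrouping" grouped-query attention means, here is a minimal NumPy sketch (illustrative only; the shapes and variable names are assumptions, not the TransformerLens internals): each key/value head shared by several query heads is repeated so every query head gets its own copy.

```python
# Illustrative sketch of ungrouping grouped-query attention (GQA).
# Names and shapes are assumptions for demonstration, not library internals.
import numpy as np

n_heads, n_kv_heads, d_head = 8, 2, 4   # 4 query heads share each KV head
k_grouped = np.random.randn(n_kv_heads, d_head)

# Ungrouping repeats each KV head so every query head has its own copy,
# recovering the standard multi-head (n_heads, d_head) layout.
k_ungrouped = np.repeat(k_grouped, n_heads // n_kv_heads, axis=0)

print(k_ungrouped.shape)  # (8, 4)
```

This trades memory for a layout that matches ordinary multi-head attention, which can make per-head analysis simpler.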

Full Changelog: v2.5.0...v2.6.0

v2.5.0

10 Sep 17:04
be334fb

Nice little release! It adds a new parameter, first_n_layers, that lets you specify how many layers of a model to load.
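Conceptually, loading only the first n layers amounts to keeping just the weights whose layer index falls below the cutoff. A minimal plain-Python sketch (the "blocks.{i}" key pattern and stand-in values are illustrative assumptions, not the actual loading code):

```python
# Illustrative sketch: keep only state-dict entries for the first n layers.
# Key names and values are stand-ins, not the real loading implementation.
state_dict = {f"blocks.{i}.attn.W_Q": i for i in range(12)}  # 12-layer model

first_n_layers = 4
truncated = {
    k: v for k, v in state_dict.items()
    if int(k.split(".")[1]) < first_n_layers  # layer index from the key
}

print(len(truncated))  # 4
```

Loading fewer layers like this is handy when you only study early-layer activations and want to save memory.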

Full Changelog: v2.4.1...v2.5.0

v2.4.1

05 Sep 17:26
dd8c1e0

Small change to code usage, but a huge improvement in memory consumption! TransformerLens now needs almost half the memory it previously required to load, thanks to a change in how models are loaded.

What's Changed

  • Removed einsum causing an error when use_attn_result is enabled by @oliveradk in #660
  • revised loading to recycle state dict by @bryce13950 in #706

Full Changelog: v2.4.0...v2.4.1

v2.4.0

14 Aug 01:11
cb5017a

Nice little update! It gives users a bit more control over attention masks and adds a new demo.

What's Changed

Full Changelog: v2.3.1...v2.4.0