Releases: TransformerLensOrg/TransformerLens
v2.9.1
A minor dependency update to address a change in an external dependency.
What's Changed
- added typeguard dependency by @bryce13950 in #786
Full Changelog: v2.9.0...v2.9.1
v2.9.0
Lots of accuracy improvements! A number of models now behave closer to how they behave in Transformers, and a new internal configuration option has been added for more ease of use!
What's Changed
- Fix the bug that `attention_mask` and `past_kv_cache` cannot work together by @yzhhr in #772 (see the sketch after this list)
- Set prepend_bos to false by default for Bloom model family by @degenfabian in #775
- Fix that models from the Bloom family produce incorrect outputs if `use_past_kv_cache` is set to True by @degenfabian in #777
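A minimal sketch of the now-working combination from #772, assuming the standard `HookedTransformer` forward signature and the `HookedTransformerKeyValueCache` helper; the toy inputs and all-ones mask are illustrative only.

```python
import torch
from transformer_lens import HookedTransformer
from transformer_lens.past_key_value_caching import HookedTransformerKeyValueCache

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens(["Hello world", "Hi"])  # padded batch

# Build an explicit attention mask (all-ones here for brevity; in practice it
# would zero out padding positions) and an empty KV cache.
attention_mask = torch.ones_like(tokens)
past_kv_cache = HookedTransformerKeyValueCache.init_cache(
    model.cfg, model.cfg.device, batch_size=tokens.shape[0]
)

# Before #772, passing both arguments together misbehaved.
logits = model(tokens, attention_mask=attention_mask, past_kv_cache=past_kv_cache)
```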
New Contributors
- @yzhhr made their first contribution in #772
- @degenfabian made their first contribution in #775
Full Changelog: v2.8.1...v2.9.0
v2.8.1
A new notebook for comparing models, and a bug fix for dealing with newer LLaMA models!
What's Changed
- Logit comparator tool by @curt-tigges in #765
- Add support for NTK-by-Part Rotary Embedding & set correct rotary base for Llama-3.1 series by @Hzfinfdu in #764
Full Changelog: v2.8.0...v2.8.1
v2.8.0
What's Changed
- add transformer diagram by @akozlo in #749
- Demo colab compatibility by @bryce13950 in #752
- Add support for `Mistral-Nemo-Base-2407` model by @ryanhoangt in #751
- Fix the bug where the `tokenize_and_concatenate` function was not working for small datasets by @xy-z-code in #725 (usage sketch after this list)
- added new block for recent diagram, and colab compatibility notebook by @bryce13950 in #758
- Add warning and halt execution for incorrect T5 model usage by @vatsalrathod16 in #757
- New issue template for reporting model compatibility by @bryce13950 in #759
- Add configurations for Llama 3.1 models (Llama-3.1-8B and Llama-3.1-70B) by @vatsalrathod16 in #761
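For the `tokenize_and_concatenate` fix, here is a hedged usage sketch; the dataset name is just an example of a small text dataset, and keyword defaults may differ by version.

```python
from datasets import load_dataset
from transformer_lens import HookedTransformer
from transformer_lens.utils import tokenize_and_concatenate

model = HookedTransformer.from_pretrained("gpt2")

# A small text dataset, the case that #725 fixed.
raw = load_dataset("NeelNanda/pile-10k", split="train")
tokens_dataset = tokenize_and_concatenate(
    raw, model.tokenizer, max_length=128, column_name="text"
)
print(tokens_dataset["tokens"].shape)  # (n_rows, 128)
```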
New Contributors
- @akozlo made their first contribution in #749
- @ryanhoangt made their first contribution in #751
- @xy-z-code made their first contribution in #725
- @vatsalrathod16 made their first contribution in #757
Full Changelog: v2.7.1...v2.8.0
v2.7.1
What's Changed
- Updated broken Slack link by @neelnanda-io in #742
- `from_pretrained` has correct return type (i.e. `HookedSAETransformer.from_pretrained` returns `HookedSAETransformer`) by @callummcdougall in #743
- Avoid warning in `utils.download_file_from_hf` by @albertsgarde in #739
New Contributors
- @albertsgarde made their first contribution in #739
Full Changelog: v2.7.0...v2.7.1
v2.7.0
Llama 3.2 support! There is also new compatibility added to the `test_prompt` function to allow for multiple prompts, as well as a minor typo fix.
What's Changed
- Typo hooked encoder by @bryce13950 in #732
- `utils.test_prompt` compares multiple prompts by @callummcdougall in #733 (see the sketch after this list)
- Model llama 3.2 by @bryce13950 in #734
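A hedged sketch of `utils.test_prompt`: the single prompt/answer form is long-standing, while the list form below assumes the #733 change accepts a list of candidates; check the docstring for the exact interface.

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Long-standing single prompt/answer usage: prints the rank and probability
# of the expected answer under the model.
utils.test_prompt("The capital of France is", " Paris", model)

# Assumption: after #733, several candidates can be compared in one call;
# the exact list form here is illustrative.
utils.test_prompt("The capital of France is", [" Paris", " Lyon"], model)
```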
Full Changelog: v2.6.0...v2.7.0
v2.6.0
Another nice little feature update! You now have the ability to ungroup the grouped query attention head component through a new config parameter, `ungroup_grouped_query_attention`!
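A minimal sketch of the new flag, assuming it lives on `HookedTransformerConfig` and is read at attention forward time; the model choice and the way the flag is toggled here are illustrative, so check the docs for the supported entry point.

```python
from transformer_lens import HookedTransformer

# Any model using grouped query attention (model choice is illustrative).
model = HookedTransformer.from_pretrained("meta-llama/Llama-3.1-8B")

# Assumption: toggling the config flag makes hook_k/hook_v expose one
# key/value head per query head instead of one per KV group.
model.cfg.ungroup_grouped_query_attention = True

_, cache = model.run_with_cache("Hello")
print(cache["blocks.0.attn.hook_k"].shape)  # KV heads expanded per the flag
```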
What's Changed
- Ungrouping GQA by @hannamw & @FlyingPumba in #713
Full Changelog: v2.5.0...v2.6.0
v2.5.0
Nice little release! It adds a new parameter named `first_n_layers` that lets you specify how many layers of a model you want to load.
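A minimal sketch, assuming `first_n_layers` is passed straight to `from_pretrained` (per #717):

```python
from transformer_lens import HookedTransformer

# Load only the first two transformer blocks of GPT-2 small; handy for
# cheap experiments on early-layer behaviour.
model = HookedTransformer.from_pretrained("gpt2", first_n_layers=2)
print(model.cfg.n_layers)  # 2
```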
What's Changed
- Fix typo in bug issue template by @JasonGross in #715
- HookedTransformerConfig docstring: `weight_init_mode` => `init_mode` by @JasonGross in #716
- Allow loading only first n layers by @joelburget in #717
Full Changelog: v2.4.1...v2.5.0
v2.4.1
A little update to code usage, but a huge update for memory consumption! TransformerLens now needs almost half the memory it previously required to boot, thanks to a change in how TransformerLens models are loaded.
What's Changed
- Removed einsum causing error when `use_attn_result` is enabled by @oliveradk in #660
- revised loading to recycle state dict by @bryce13950 in #706
New Contributors
- @oliveradk made their first contribution in #660
Full Changelog: v2.4.0...v2.4.1
v2.4.0
Nice little update! This gives users a bit more control over attention masks (see the sketch below) and adds a new demo.
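A hedged sketch of passing an explicit attention mask to the forward pass; the helper used to build the mask and the prompts are illustrative.

```python
from transformer_lens import HookedTransformer
from transformer_lens.utils import get_attention_mask

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens(["A longer prompt here", "Short"])  # right-padded batch

# Mark real tokens with 1 and padding with 0, then pass the mask explicitly
# to the forward pass (the behaviour that #699 tightened up).
attention_mask = get_attention_mask(model.tokenizer, tokens, prepend_bos=True)
logits = model(tokens, attention_mask=attention_mask)
```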
What's Changed
- Improve attention masking by @UFO-101 in #699
- add a demo for Patchscopes and Generation with Patching by @HenryCai11 in #692
New Contributors
- @HenryCai11 made their first contribution in #692
Full Changelog: v2.3.1...v2.4.0