Switch to beartype #325

Merged: 12 commits merged into TransformerLensOrg:main on Aug 3, 2023

Conversation

@dkamm (Contributor) commented Jun 16, 2023

Description

Switching from typeguard to beartype due to an incompatibility with jaxtyping on the latest typeguard version.

Notes:

  • typeguard is still in the lockfile because jaxtyping requires it
  • I had to use forward references in these spots to get beartype to work (see the sketch after this list):
    • devices.py - this is a circular import solely due to type hints
    • utils.py - this is a true circular import that we might want to remove
    • FactoredMatrix.py - needed for methods returning Union["FactoredMatrix", ...]
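
For illustration, here is a minimal sketch of that forward-reference pattern (not the PR's actual code; the scale() method name is hypothetical). The string literal defers evaluation of the annotation, so beartype resolves the class at call time, after it is fully defined:

from typing import Union

import torch


class FactoredMatrix:
    # Illustrative stand-in for transformer_lens.FactoredMatrix; scale() is a
    # made-up method used only to show the annotation pattern.
    def scale(self, c: float) -> Union["FactoredMatrix", torch.Tensor]:
        # A bare FactoredMatrix in this annotation would typically raise
        # NameError, since the class name is not bound until the class
        # statement finishes executing; the string literal avoids that.
        ...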

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

N/A

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@dkamm (Contributor, Author) commented Jun 16, 2023

@jbloomAus can you rerun the CI? I think it failed due to an unrelated issue with connecting to Hugging Face.

@dkamm (Contributor, Author) commented Jun 17, 2023

@jbloomAus thanks! This is ready for review. Feel free to change the composition_scores type hint, as I'm not sure how it works.

@jbloomAus (Collaborator) commented:

Thanks David, will review in the next couple of days :)

@jbloomAus (Collaborator) commented:

@dkamm, looks great! I just want to double-check/compare the equivalent error messages, and also have a summary here of why we're switching. IIRC it's because we had testing issues and then install issues with typeguard? It feels like a long story, but you have the most context. Once that's written, I'll share it with Neel just to be sure he's cool with it.

@jbloomAus added the seen_by_maintainers label ("Confirms that a maintainer is aware of this card.") on Jun 20, 2023
@dkamm (Contributor, Author) commented Jun 28, 2023

@jbloomAus

We're switching because there have been issues getting the newer versions of typeguard to work on this codebase, for various reasons:

  • Typeguard 3: you have to add @typeguard_ignore to class properties, and it doesn't work with the jaxtyping pytest plugin.
  • Typeguard 4: incompatible with jaxtyping because of how it treats annotations as forward references in its AST transformation.

In contrast, the latest version of beartype works with the small number of changes described above.

None of these changes preclude switching to typeguard 4 once that issue is resolved, but it's not a trivial issue to solve, so I figured this is the best option right now.
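
As a rough sketch of what the beartype wiring can look like: jaxtyping's import hook can apply beartype to annotated callables in a package. The install_import_hook call below is jaxtyping's documented API, but the conftest.py placement is an assumption rather than necessarily how this repo configures it:

# conftest.py (illustrative): ask jaxtyping's import hook to wrap annotated
# callables in transformer_lens with beartype.beartype, so shape and dtype
# annotations are checked whenever the test suite calls them.
from jaxtyping import install_import_hook

hook = install_import_hook("transformer_lens", "beartype.beartype")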

@jbloomAus (Collaborator) commented:

Thanks @dkamm!

@alan-cooney This seems like a good thing to me; are you able to verify/double-check, please?

@alan-cooney (Collaborator) left a comment:

Thanks for this PR - nice work!

Just to check: have you verified that providing an incorrect jaxtyping type causes beartype to throw an error (when running pytest)?

Otherwise, just one question to confirm in the review, where I think I'm missing something.

transformer_lens/FactoredMatrix.py (review thread resolved)
@alan-cooney (Collaborator) left a comment:

One small thing to check - otherwise good to go.

Thanks again for this!
Alan

transformer_lens/FactoredMatrix.py (review thread resolved)
transformer_lens/utils.py (review thread resolved)
@jbloomAus (Collaborator) commented:

@dkamm, just pinging you to see if you've had a chance to action @alan-cooney's suggestions.

@dkamm (Contributor, Author) commented Jul 26, 2023

Just added it! Sorry I missed it earlier.

alan-cooney (comment marked as outdated)

@alan-cooney (Collaborator) left a comment:

Sorry, it's actually out of sync with the latest changes to master. Are you able to update it and then ping me? I'll get it merged right away.

Comment on lines 639 to 646 (original lines marked with -, replacements with +):

  self,
- q: Float[torch.Tensor, "batch pos head_index d_head"],
- k: Float[torch.Tensor, "batch pos head_index d_head"],
+ q: Float[torch.Tensor, "batch q_pos head_index d_head"],
+ k: Float[torch.Tensor, "batch k_pos head_index d_head"],
  past_kv_pos_offset,
  ) -> Tuple[
- Float[torch.Tensor, "batch pos head_index d_head"],
- Float[torch.Tensor, "batch pos head_index d_head"],
+ Float[torch.Tensor, "batch q_pos head_index d_head"],
+ Float[torch.Tensor, "batch k_pos head_index d_head"],
  ]:
@dkamm (Contributor, Author) commented:

@alan-cooney @jbloomAus I had to change this type hint to get one of the tests to pass (test_hooked_transformer::test_dtypes). It failed because this method got passed q and k with different pos dimensions, and jaxtyping checks for that as configured. I'm not familiar enough with this method and its usage to say whether q and k should be allowed to have different pos dimensions.

@alan-cooney (Collaborator) commented:

Are you sure it's actually getting passed different-sized queries and keys, or are the defined type signatures at this point just different?

I think this may be an issue with Beartype getting confused about the size of one of these tensors at a previous point (perhaps in the if past_kv_cache_entry is not None: section)?

I'm not very familiar with rotary embeddings either, but as far as I can tell the q and k passed in here should be the same size (and they should also be the same size afterwards, as they're immediately dot-producted).

@dkamm (Contributor, Author) commented:

> Are you sure it's actually getting passed different-sized queries and keys, or are the defined type signatures at this point just different?

Yep, I confirmed that they were different sizes.

> I think this may be an issue with Beartype getting confused about the size of one of these tensors at a previous point (perhaps in the if past_kv_cache_entry is not None: section)?

Hmm, I don't think this is the case. Beartype/jaxtyping should only look at the arguments passed to the call in this case.

Here's a minimal script to reproduce (you'll have to change the type hints back):

from jaxtyping import install_import_hook
with install_import_hook("transformer_lens", "beartype.beartype"):
    from transformer_lens import HookedTransformer
    model = HookedTransformer.from_pretrained("EleutherAI/pythia-70m")
    _ = model.generate("Hello, World!")

Actually, reading the code more, I think my change is correct and we should allow different sizes in the pos dim. The reason is that when we're generating a completion, after we feed through the initial input, we use only the latest token as the query vector and use the kv cache for the key vectors to compute attention scores. It's fine for q and k to have different sizes in the pos dim because they get dotted over the d_head dim. Finally, past_kv_pos_offset is just there so we get the right positional embedding for q.
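
A tiny sketch (with made-up sizes, not the library's code) of why the mismatch is fine: the score contraction is over d_head, so q_pos and k_pos are free to differ, as they do when a single new query token attends over the cached prefix:

import torch

batch, head_index, d_head = 1, 8, 64
q = torch.randn(batch, 1, head_index, d_head)   # only the newest token's query
k = torch.randn(batch, 12, head_index, d_head)  # keys for the full cached prefix

# One score per (query position, key position) pair; the dot product is over d_head.
scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / d_head**0.5
print(scores.shape)  # torch.Size([1, 8, 1, 12])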

@alan-cooney (Collaborator) commented:

Makes sense, thanks!

@dkamm (Contributor, Author) commented Aug 1, 2023

@alan-cooney I've updated the branch with the latest changes from main. There's one new comment that requires review.

@alan-cooney merged commit 10d2f8a into TransformerLensOrg:main on Aug 3, 2023 (4 checks passed)
@alan-cooney (Collaborator) commented:

Thanks for the PR - merged now!
