
ForwardContext is None with gradient checkpointing enabled #732

Open
calpt opened this issue Aug 9, 2024 · 0 comments

Labels: bug Something isn't working

calpt commented Aug 9, 2024

Environment info

  • adapters version: latest main

Information

Model I am using (Bert, XLNet ...): any

Language I am using the model on (English, Chinese ...): any

Adapter setup I am using (if any): Affects all adapter methods that rely on ForwardContext: ReFT, Prefix-Tuning, Prompt Tuning, Fusion, Parallel composition

To reproduce

When enabling gradient checkpointing before adapter training, i.e.:

model.gradient_checkpointing_enable()

the ForwardContext will not be set correctly during the forward/backward passes. As a result, all functionality that depends on the ForwardContext does not work together with gradient checkpointing. This affects some adapter types (ReFT, prompt tuning and prefix tuning currently do not work with gradient checkpointing) but not others (LoRA, bottleneck adapters), and it also affects composition blocks such as Fusion and Parallel.

E.g., this will throw the following error:

AttributeError: 'NoneType' object has no attribute 'output_adapter_gating_scores'
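
Most likely this is because gradient checkpointing re-runs the checkpointed layer forwards during the backward pass, after the surrounding ForwardContext has already exited (this root-cause reading is an assumption and not verified in detail). A minimal, library-independent sketch of that effect, using a plain contextvars variable as a stand-in for ForwardContext:

```python
# Sketch only: a contextvars variable standing in for adapters' ForwardContext.
import contextvars

import torch
from torch.utils.checkpoint import checkpoint

_forward_ctx = contextvars.ContextVar("forward_ctx", default=None)


def layer(x):
    # On the recomputation triggered by backward(), the outer context has
    # already been reset, so this prints None the second time around.
    print("context inside checkpointed segment:", _forward_ctx.get())
    return x * x


x = torch.randn(4, requires_grad=True)

token = _forward_ctx.set("active-forward-context")
try:
    y = checkpoint(layer, x, use_reentrant=False)  # prints the active context
finally:
    _forward_ctx.reset(token)

y.sum().backward()  # recomputation prints None -> same situation as the AttributeError above
```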

Also see #677.

To reproduce, try training ReFT using the QLoRA Llama notebook with gradient checkpointing enabled.
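
For a smaller self-contained reproduction, something along these lines should raise a similar ForwardContext-related error (a sketch; the checkpoint name, adapter name and the choice of prefix tuning are placeholders, any ForwardContext-dependent method should behave the same):

```python
import torch

import adapters
from adapters import PrefixTuningConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # any supported model should do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

adapters.init(model)
model.add_adapter("prefix", config=PrefixTuningConfig())
model.train_adapter("prefix")

# Enabling gradient checkpointing before training is what triggers the bug.
model.gradient_checkpointing_enable()
# Ensure the checkpointed segments are actually recomputed during backward()
# so the missing ForwardContext is actually hit.
model.enable_input_require_grads()
model.train()

batch = tokenizer("gradient checkpointing test", return_tensors="pt")
outputs = model(**batch, labels=torch.tensor([0]))
# The ForwardContext-related error is raised from the recomputed forward
# inside backward().
outputs.loss.backward()
```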

calpt added the bug label Aug 9, 2024
lenglaender self-assigned this Oct 26, 2024