Conflict between Lightning and Huggingface Transformers (device_map). #17878
richarddwang started this conversation in General
-
How can I use this function?
-
Question: how can I make this work with a PEFT model? As of now I am getting this error:
-
Hugging Face will, in certain cases, add an `AlignDevicesHook`, intercept the `nn.Module` forward calls, and send the input tensors to devices decided automatically by its own logic. In many cases this conflicts with the `devices` we set for `L.Trainer`.
For example, this happens when the model is loaded with a `device_map` (e.g. `device_map="auto"`).
In these cases Hugging Face silently sends the input tensors to cuda:0 or cuda:1, which conflicts with `L.Trainer(devices=[2])` sending the model weights to cuda:2. All of this results in errors about tensors being on different devices.
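For concreteness, here is a minimal sketch of how the hooks end up on the model; the model name ("gpt2") and the device indices are illustrative assumptions, not taken from this thread:

```python
from transformers import AutoModelForCausalLM

# Loading with a device_map makes transformers/accelerate attach AlignDevicesHook
# instances to submodules; each hook moves that submodule's inputs to the device
# accelerate picked for it, regardless of where the caller placed them.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")

# accelerate stores its hook on the module as `_hf_hook`; list which submodules got one.
hooked = [name for name, module in model.named_modules() if hasattr(module, "_hf_hook")]
print(hooked[:5])

# If Lightning later runs with L.Trainer(devices=[2]), it moves the LightningModule
# and each batch to cuda:2, while these hooks keep routing tensors back to the
# device_map devices (cuda:0, cuda:1, ...), producing the device-mismatch error.
```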
This conflict makes it hard to train Hugging Face transformers with Lightning in many cases. Currently I manually remove the Hugging Face hook that sends tensors to a different device, and I wonder if there is a better way to use Lightning and Hugging Face together...
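A minimal sketch of that kind of hook removal, assuming the hooks come from `accelerate` (not necessarily the exact snippet used above):

```python
from accelerate.hooks import remove_hook_from_submodules
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")

# Recursively strip the accelerate hooks (including AlignDevicesHook) from all
# submodules, so forward() no longer re-routes inputs to the device_map devices
# and Lightning is free to move the model to the device(s) in Trainer(devices=...).
remove_hook_from_submodules(model)
```

Note that if the `device_map` actually sharded or offloaded the weights across devices, removing the hooks alone is not enough; the weights also need to be moved to the single device Lightning will use.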