Replies: 1 comment
-
You just need to wrap the transformer layer in a custom nn.Module and set the engine's has_attention_mask.
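For something like llama or mistral, that wrapping might look roughly like the sketch below. This is a minimal, untested sketch, not a drop-in implementation: the class name DecoderLayerPipe, the (hidden_states, attention_mask) packing convention, and the keyword arguments accepted by the Hugging Face layer are assumptions that depend on your transformers version.

```python
import torch.nn as nn


class DecoderLayerPipe(nn.Module):
    """Hypothetical adapter: makes a Hugging Face decoder layer consume and
    produce plain tensors, which is what DeepSpeed's pipeline engine passes
    between stages."""

    def __init__(self, hf_layer: nn.Module):
        super().__init__()
        # e.g. a LlamaDecoderLayer / MistralDecoderLayer instance
        self.layer = hf_layer

    def forward(self, inputs):
        # The pipeline engine hands each stage a tuple of tensors; unpack it.
        hidden_states, attention_mask = inputs
        # NOTE: depending on your transformers version, the layer may also
        # need position_ids / rotary embeddings threaded through as extra
        # tensors in this tuple.
        layer_outputs = self.layer(hidden_states, attention_mask=attention_mask)
        # The HF layer returns a tuple; keep only the hidden-states tensor
        # and forward the mask unchanged to the next stage.
        return layer_outputs[0], attention_mask
```

You would then stack these wrappers into the pipeline (e.g. via deepspeed.pipe.LayerSpec and PipelineModule) so that only plain tensors ever cross stage boundaries, with the engine's has_attention_mask handling mentioned above covering the mask tensor.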
-
Hi there. I'm working on using pipeline parallelism in DeepSpeed with models from transformers (like llama or mistral). I found that the pipeline engine in DeepSpeed only supports sending/receiving tensors between layers, while the output of a transformers model's layer is a tuple of tensors. How can I use pipeline parallelism with Hugging Face models? Is there an elegant way to apply patches?
Thanks for any suggestions.