layer normalization after Linear #1150
We're not aware of any kernel fusions that would help this pattern, so there's no harm in simply putting a LayerNorm module after the linear:

```python
linear = te.Linear(...)
norm = te.LayerNorm(...)
y = norm(linear(x))
```

The same thing could also be expressed with the operation-based API (see #1033).
If we do find some kernel fusions, we would probably implement them using the operation-based API instead of implementing a new module.
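For reference, an operation-based version of the snippet above might look like the following. This is a sketch against the API proposed in #1033; the exact names (`te.ops.Sequential`, `te.ops.Linear`, `te.ops.LayerNorm`) and constructor signatures are assumptions, not verified against a released version:

```python
# Sketch only -- assumes the fusible-operations API from #1033.
import transformer_engine.pytorch as te

model = te.ops.Sequential(
    te.ops.Linear(hidden_in, hidden_out),  # hypothetical signature
    te.ops.LayerNorm(hidden_out),          # hypothetical signature
)
y = model(x)
```

The point of the ops API is that adjacent operations in the `Sequential` can later be fused into a single kernel without changing user code.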
Hello, @timmoon10. I am working on a two-layer MLP setup within a tensor parallelism context (e.g., tp=2), which consists of a TEColumnParallelLinear followed by a TERowParallelLinear. I'm looking to perform a LayerNorm operation right after the output from TEColumnParallelLinear, and importantly, I want each tensor parallel split (tp slice) to have its own independent LayerNorm. Could you provide any suggestions on whether there is an existing module that supports this, or how one might go about adding this functionality? Thank you for your assistance!
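One way to reason about the "independent LayerNorm per tp slice" semantics: a column-parallel linear shards its output along the hidden dimension, so a per-shard LayerNorm normalizes over only the local slice and needs no cross-rank communication. A minimal NumPy sketch of the arithmetic (shapes and the shard split are illustrative, not taken from any TE or Megatron API):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last axis, as LayerNorm does (no affine params here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
tp = 2
hidden = 8  # full hidden size; each rank holds hidden // tp columns
y_full = rng.normal(size=(4, hidden))  # stand-in for ColumnParallelLinear output

# Each TP rank normalizes its own shard independently -- no all-gather needed.
shards = np.split(y_full, tp, axis=-1)
y_sharded = np.concatenate([layer_norm(s) for s in shards], axis=-1)

# Note this is NOT the same as one LayerNorm over the full hidden dimension,
# which would require gathering statistics across ranks:
y_global = layer_norm(y_full)
```

Since per-shard statistics differ from global statistics, `y_sharded` and `y_global` generally disagree; the per-slice variant is only appropriate if that is the normalization you actually want mathematically.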
Is there any module for layer normalization after linear transformation? Thanks.