layer normalization after Linear #1150
We're not aware of any kernel fusions that would help this pattern, so there's no harm in simply putting a LayerNorm module after the linear:

```python
linear = te.Linear(...)
norm = te.LayerNorm(...)
y = norm(linear(x))
```

The same thing could also be expressed with the operation-based API (see #1033).
If we do find some kernel fusions, we would probably implement them using the operation-based API instead of implementing a new module.
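For reference, an operation-based version of the snippet above might look like the following. This is a sketch against the API proposed in #1033; the exact names (`te.ops.Sequential`, `te.ops.Linear`, `te.ops.LayerNorm`) and constructor signatures are assumptions, not verified against a released version:

```python
# Sketch only -- assumes the fusible-operations API from #1033.
import transformer_engine.pytorch as te

model = te.ops.Sequential(
    te.ops.Linear(hidden_in, hidden_out),  # hypothetical signature
    te.ops.LayerNorm(hidden_out),          # hypothetical signature
)
y = model(x)
```

The point of the ops API is that adjacent operations in the `Sequential` can later be fused into a single kernel without changing user code.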
Hello, @timmoon10. I am working on a two-layer MLP setup within a tensor parallelism context (e.g., tp=2), which consists of a TEColumnParallelLinear followed by a TERowParallelLinear. I'm looking to perform a LayerNorm operation right after the output from TEColumnParallelLinear, and importantly, I want each tensor parallel split (tp slice) to have its own independent LayerNorm. Could you provide any suggestions on whether there is an existing module that supports this, or how one might go about adding this functionality? Thank you for your assistance!
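One way to reason about the "independent LayerNorm per tp slice" semantics: a column-parallel linear shards its output along the hidden dimension, so a per-shard LayerNorm normalizes over only the local slice and needs no cross-rank communication. A minimal NumPy sketch of the arithmetic (shapes and the shard split are illustrative, not taken from any TE or Megatron API):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last axis, as LayerNorm does (no affine params here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
tp = 2
hidden = 8  # full hidden size; each rank holds hidden // tp columns
y_full = rng.normal(size=(4, hidden))  # stand-in for ColumnParallelLinear output

# Each TP rank normalizes its own shard independently -- no all-gather needed.
shards = np.split(y_full, tp, axis=-1)
y_sharded = np.concatenate([layer_norm(s) for s in shards], axis=-1)

# Note this is NOT the same as one LayerNorm over the full hidden dimension,
# which would require gathering statistics across ranks:
y_global = layer_norm(y_full)
```

Since per-shard statistics differ from global statistics, `y_sharded` and `y_global` generally disagree; the per-slice variant is only appropriate if that is the normalization you actually want mathematically.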
Is there any module for layer normalization after linear transformation? Thanks.