Actions: NVIDIA/TransformerEngine

Deploy nightly docs

197 workflow run results

Use fused implementation of RoPE in MultiHeadAttention (#658)
Deploy nightly docs #350: Commit 8d62d5c pushed by ksivaman
February 15, 2024 19:06 1m 43s main
[PyTorch] Add Float8Tensor option to avoid updating transpose cache w…
Deploy nightly docs #349: Commit 1e78094 pushed by timmoon10
February 15, 2024 18:53 1m 34s main
Use arguments instead of env vars for TP comm overlap (#649)
Deploy nightly docs #348: Commit bdf1afe pushed by timmoon10
February 14, 2024 21:45 1m 22s main
Support GEMM-GELU fusion with split AG overlap (#661)
Deploy nightly docs #347: Commit a174985 pushed by timmoon10
February 12, 2024 22:55 1m 44s main
Implement fused kernel for FP8 scale update (#593)
Deploy nightly docs #346: Commit a950061 pushed by timmoon10
February 8, 2024 22:03 1m 35s main
Update example to use new TE_DType path (#660)
Deploy nightly docs #345: Commit 379c1ee pushed by ptrendx
February 8, 2024 21:22 1m 45s main
[PyTorch] Fix pipeline parallel execution by using cloned scale inver…
Deploy nightly docs #344: Commit 91d52ac pushed by ksivaman
February 8, 2024 20:51 1m 43s main
[common] Added new unfused softmax cuda kernel to support causal atte…
Deploy nightly docs #343: Commit d9eb199 pushed by timmoon10
February 8, 2024 18:32 1m 55s main
[C++/PyTorch] Add alibi_slopes support (#608)
Deploy nightly docs #342: Commit 94de051 pushed by cyanguwa
February 8, 2024 18:02 1m 34s main
[PyTorch] Refactor caching of cumulative sequence lengths (#630)
Deploy nightly docs #341: Commit da30634 pushed by timmoon10
February 6, 2024 04:06 1m 32s main
[common][pyTorch]Add zero_centered_gamma option to RMSNorm (#631)
Deploy nightly docs #340: Commit d68028c pushed by ksivaman
February 3, 2024 06:42 1m 8s main
Recomputation fixes with native fp8 (#646)
Deploy nightly docs #339: Commit 5b155fb pushed by ksivaman
February 3, 2024 04:36 1m 14s main
Update cudnn-frontend to 1.0.3 to fix cuDNN v9 SDPA NaNs (#650)
Deploy nightly docs #338: Commit 2aee059 pushed by ksivaman
February 3, 2024 04:36 2m 2s main
[JAX] Support SP + RoPE + GeLU (#602)
Deploy nightly docs #337: Commit ce163f9 pushed by denera
February 2, 2024 18:31 1m 49s main
[JAX] Fix unfused GQA performance (#643)
Deploy nightly docs #336: Commit 29b0c9c pushed by denera
February 1, 2024 17:17 1m 20s main
Update FindCUDNN.cmake for cuDNN 9 (#640)
Deploy nightly docs #335: Commit e2803b1 pushed by ksivaman
January 31, 2024 16:20 1m 54s main
Fused rope compute in fp32 (#645)
Deploy nightly docs #334: Commit 70bd26e pushed by ksivaman
January 31, 2024 16:19 1m 20s main
[PyTorch] Do not allocate FP8 workspace buffers when params are FP8 (…
Deploy nightly docs #333: Commit 8641ab7 pushed by ksivaman
January 31, 2024 16:18 1m 39s main
Update readme about integration with Sagemaker model parallel library…
Deploy nightly docs #332: Commit b5e13a1 pushed by ptrendx
January 30, 2024 20:39 3m 9s main
[Paddle] Replace paddle.fluid imports with paddle.base (#633)
Deploy nightly docs #331: Commit 8d3b62d pushed by timmoon10
January 30, 2024 18:30 1m 13s main
Add user to TE CI (#639)
Deploy nightly docs #330: Commit ef4b1d1 pushed by timmoon10
January 30, 2024 18:24 1m 25s main
Fixed offloading for PyT version/ Added Attention activation offloadi…
Deploy nightly docs #329: Commit 44574de pushed by ptrendx
January 30, 2024 00:00 1m 37s main
[JAX] Custom Op Workspace Tensors from XLA Buffers (#532)
Deploy nightly docs #328: Commit 4077ccc pushed by denera
January 29, 2024 18:03 1m 15s main
[Paddle] Support GQA (#595)
Deploy nightly docs #327: Commit bd7fd0a pushed by timmoon10
January 26, 2024 23:29 1m 10s main
[PyTorch] Fix MultiheadAttention docstring (#634)
Deploy nightly docs #326: Commit e531cd2 pushed by timmoon10
January 26, 2024 21:59 1m 39s main