Issues: NVIDIA/TransformerEngine
- #1213: Passing nonexistent argument when flash_attn version is >= 2.5.7 (opened Sep 27, 2024 by MaciejBalaNV)
- #1207 [bug]: No option to change FP8 status in graphed module after using "make_graphed_callables" (opened Sep 26, 2024 by MaciejBalaNV)
- #1197: [PyTorch] Fused cuDNN attention kernel and sliding window attention (opened Sep 23, 2024 by Marks101)
- #1195: [PyTorch] Fused cuDNN attention kernel not properly handling strides (opened Sep 23, 2024 by Marks101)
- #1190: [PyTorch] FP8 and activation checkpointing cause training instabilities (opened Sep 18, 2024 by Marks101)
- #1165 [bug]: AssertionError: Outputs not close enough in tensor in test_numerics.py (opened Sep 6, 2024 by sirutBuasai)
- #1159: AssertionError: Device compute capability 8.9 or higher required for FP8 execution (opened Sep 5, 2024 by kamrul-NSL)
- #1047: [PyTorch] Bug in FP8 buffer update causing training instabilities (opened Jul 26, 2024 by Marks101)