[Benchmarks][Upstream PyTorch 2.5] Triton and XeTLA softmax performance degrades in comparison with torch 2.1 / ipex 2.1 test proxies #2106
Comments
@ESI-SYD what is the root cause of this issue? Can you pinpoint it to a particular

@anmyachev, to proceed further with the analysis / triaging, please create a minimal reproducer for the Triton kernel path.
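A minimal reproducer for the Triton kernel path would run the softmax kernel on an Intel GPU; since that environment is not available here, the sketch below uses NumPy on CPU purely to illustrate the harness structure a reproducer might follow (shape sweep, warmup, best-of-N timing, GB/s reporting). All shapes and repetition counts are illustrative assumptions, not taken from the benchmark suite.

```python
# Hypothetical skeleton for a minimal softmax benchmark reproducer.
# The real issue concerns Triton/XeTLA kernels on Intel GPU; NumPy on
# CPU is used here only to show the structure of the harness.
import time
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax (the op under test).
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bench(fn, x, warmup=3, rep=10):
    for _ in range(warmup):
        fn(x)
    times = []
    for _ in range(rep):
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    return min(times)  # best-of-N reduces scheduling noise

if __name__ == "__main__":
    # Illustrative shape sweep: small to large rows/columns.
    for n_rows, n_cols in [(1024, 256), (1024, 4096), (4096, 16384)]:
        x = np.random.rand(n_rows, n_cols).astype(np.float32)
        t = bench(softmax, x)
        gbps = 2 * x.nbytes * 1e-9 / t  # read + write traffic
        print(f"softmax {n_rows}x{n_cols}: {t*1e3:.3f} ms, {gbps:.1f} GB/s")
```

An actual reproducer would replace `softmax` with the Triton kernel launch and use device-side timing rather than host wall-clock.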
There are two main differences in the benchmark timing method after applying the draft
At the moment, the degradation of the absolute numbers has been fixed. The geometric mean difference is ~2% (between #1 and #2), which I believe can be considered within the margin of error.
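The ~2% figure above is a geometric mean over per-case ratios between the two runs. As a small sketch (with made-up timings, since the actual benchmark numbers are not reproduced here), the comparison can be computed like this:

```python
# Sketch: comparing two benchmark runs via the geometric mean of
# per-case time ratios. The timings below are hypothetical.
import math

def geomean(values):
    # Geometric mean via log-space averaging (robust for ratios).
    return math.exp(sum(math.log(v) for v in values) / len(values))

run_1 = [0.12, 0.50, 2.10, 8.40]   # hypothetical times, run #1 (ms)
run_2 = [0.13, 0.49, 2.05, 8.60]   # hypothetical times, run #2 (ms)

ratios = [b / a for a, b in zip(run_1, run_2)]
diff_pct = (geomean(ratios) - 1.0) * 100
print(f"geomean ratio difference: {diff_pct:+.2f}%")
```

The geometric mean is the conventional aggregate for benchmark ratios because it treats a 2x speedup and a 2x slowdown symmetrically.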
The new approach to measuring performance is less precise and is more influenced by the operations performed in the benchmarked functions before and after the kernel is launched. This influence is stronger where the kernel execution time is very small. For example, for the first combinations of

To sum up: for large dimensions the new benchmarking method is suitable and tells us that there is no degradation with upstream PyTorch; however, for small dimensions it cannot be used reliably, and we have to wait for a working Kineto + Intel GPU PTI solution.
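The effect described above can be illustrated with a toy calculation (all numbers are made up): a fixed host-side cost per measured call inflates the wall-clock result by a much larger relative amount when the kernel itself is short, which is why the small-dimension cases are the unreliable ones.

```python
# Illustration (hypothetical numbers) of why wall-clock timing is less
# reliable for short kernels: a fixed host-side overhead per call
# dominates the measurement when the kernel time is small.
overhead_ms = 0.05  # assumed launch/sync overhead per measured call

for kernel_ms in (0.01, 0.1, 1.0, 10.0):
    measured = kernel_ms + overhead_ms
    error_pct = overhead_ms / kernel_ms * 100
    print(f"kernel {kernel_ms:>5.2f} ms -> measured {measured:>5.2f} ms "
          f"(~{error_pct:.0f}% overestimate)")
```

Device-side timing (e.g. via Kineto + Intel GPU PTI, once available) sidesteps this by excluding the host-side portion entirely.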
The Triton/XeTLA ratios stay the same except for attention, where the XeTLA attention absolute numbers degraded. Both the Triton and XeTLA softmax cases degraded, so the Triton/XeTLA ratio did not change. Details: #1905 (comment)