Change default benchmark mode to upstream PyTorch #2298

Merged · 1 commit into main on Sep 23, 2024

Conversation

@anmyachev (Contributor) commented on Sep 19, 2024

Current state (https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10950264922 vs https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10949253321):

  • Softmax: triton geomean diff: 13%, xetla geomean diff: 8%, ratio geomean diff: 5%
    • The degradation is largest for small inputs. For example, at N=256 the mean measured time, from which teraflops are derived, goes from 0.0055 ms to 0.0116 ms. The difference of about 5-6 microseconds is roughly the host-side time needed to launch the kernel and cannot be avoided. One option is to increase the data volume, which reduces the relative impact of this fixed overhead (see the sketch after this list).
  • FA advanced: triton geomean diff: 2%, xetla geomean diff: 3%, ratio geomean diff: 2%. Corrected numbers: triton geomean diff: -2.3%, xetla geomean diff: -4%, ratio (triton/xetla) geomean diff: 1.7%
  • GEMM advanced: triton geomean diff: 2%, xetla geomean diff: 2%, ratio geomean diff: 4%. Corrected numbers: triton geomean diff: -4.1%, xetla geomean diff: -2.7%, ratio (triton/xetla) geomean diff: -1.4%
  • Float conversion microbenchmark: BF16 geomean diff: << 1%, FP16 geomean diff: << 1%
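
To illustrate the point about small inputs, here is a minimal sketch of how a fixed host-side launch cost distorts TFLOPS derived from measured milliseconds. The per-call work is a hypothetical placeholder, not a number from the benchmark suite:

```python
# Hypothetical illustration: a fixed host-side launch cost dominates
# TFLOPS derived from measured time when the kernel itself is short.
flops_per_call = 1e9  # assumed work per call (placeholder, not from the suite)

for label, ms in [("0.0055 ms (before)", 0.0055), ("0.0116 ms (after)", 0.0116)]:
    tflops = flops_per_call / (ms * 1e-3) / 1e12
    print(f"{label}: {tflops:.1f} TFLOPS")

# The ~6 us gap between the two measurements roughly halves the
# reported TFLOPS at N=256; for large shapes the same fixed gap
# would be negligible.
```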

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@anmyachev (Contributor, Author)

Let's move on. This change can be easily rolled back if necessary.

anmyachev merged commit 782aecf into main on Sep 23, 2024 (4 checks passed), then deleted the amyachev/change-default-bench-mode branch on September 23, 2024 at 16:26.
@whitneywhtsang (Contributor) commented on Sep 24, 2024

@anmyachev Should the ratio diff be recalculated for FA as well?

> FA advanced: triton geomean diff: 2%, xetla geomean diff: 3%, ratio geomean diff: 2%.
Do you know why the degradation of Triton is much higher than XeTLA for Softmax?
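
For reference, a minimal sketch of how the geomean diffs above could be recomputed, assuming the reported percentage is the relative change between the geometric means of the two runs (the timing values below are placeholders, not the actual CI data):

```python
import math

# Placeholder timings in ms; NOT the actual CI numbers.
baseline  = [0.90, 1.10, 2.05]   # e.g. previous (IPEX) mode
candidate = [0.92, 1.07, 2.01]   # e.g. upstream PyTorch mode

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Relative change of the geometric means, as a percentage.
diff_pct = (geomean(candidate) / geomean(baseline) - 1) * 100
print(f"geomean diff: {diff_pct:+.1f}%")
```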

@anmyachev (Contributor, Author) commented on Sep 24, 2024

> Do you know why the degradation of Triton is much higher than XeTLA for Softmax?

Most likely because of the different time spent on the host: in the case of Triton there is more Python code involved, which is slower than C++.

@whitneywhtsang since GEMM on Triton became worse than expected, should we roll back this change, or is the regression acceptable?

UPD: on overhead in Triton, see triton-lang/triton#3166.

@etiotto (Contributor) commented on Sep 25, 2024

> > Do you know why the degradation of Triton is much higher than XeTLA for Softmax?
>
> Most likely because of the different time spent on the host: in the case of Triton there is more Python code involved, which is slower than C++.
>
> @whitneywhtsang since GEMM on Triton became worse than expected, should we roll back this change, or is the regression acceptable?
>
> UPD: on overhead in Triton, see triton-lang/triton#3166.

We should report performance numbers with IPEX on by default, because without IPEX the timing taken by upstream PyTorch is not precise (it captures more than just the kernel time). IMHO we should revert this change.
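
For context, a rough sketch of the two measurement styles under discussion, assuming an XPU-enabled PyTorch build whose event API mirrors torch.cuda (the benchmark suite's actual measurement path may differ):

```python
# Hedged sketch: device-event timing vs wall-clock timing.
# Assumes an XPU-enabled PyTorch build; event API mirrors torch.cuda.
import time
import torch

x = torch.randn(4096, 256, device="xpu")
fn = lambda: torch.softmax(x, dim=-1)
fn()                      # warm up (compilation, caches)
torch.xpu.synchronize()

# Event-based timing brackets the work on the device timeline,
# so it mostly excludes host-side launch overhead.
start = torch.xpu.Event(enable_timing=True)
end = torch.xpu.Event(enable_timing=True)
start.record()
fn()
end.record()
torch.xpu.synchronize()
print(f"event time:      {start.elapsed_time(end):.4f} ms")

# Wall-clock timing includes Python/driver launch overhead on the
# host, which is the imprecision being discussed for short kernels.
t0 = time.perf_counter()
fn()
torch.xpu.synchronize()
print(f"wall-clock time: {(time.perf_counter() - t0) * 1e3:.4f} ms")
```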
