flip memory coalescing for last dim case #10310
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10310/
CI failed when running job: cuda-module. PR label automerge has been removed
…nc/oneflow into optimize_flip_last_dim
CI failed when running job: cpu-module. PR label automerge has been removed
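This PR optimizes flip for the dim = -1 case, where global memory accesses could not be coalesced.
The implementation applies the flip first and writes the result to shared memory, then writes from shared memory (shm) to global memory in sequential order, so that these accesses can be coalesced.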
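As a rough illustration of the idea described above, here is a minimal standalone sketch, not the kernel from this PR. It assumes a row-major 2D `float` tensor of shape `[rows, cols]`; the names `FlipLastDimShared` and `kTile`, the one-block-per-tile launch layout, and the tile width are all hypothetical choices made for the example. Each block reads a contiguous chunk of a row, stores it reversed in shared memory, then writes the tile out so that consecutive threads touch consecutive global addresses.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kTile = 256;  // hypothetical tile width, equal to threads per block

// Sketch only: flips the last dimension of a row-major [rows, cols] matrix.
// One block handles up to kTile consecutive output columns of a single row.
// Using blockIdx.y for the row limits this sketch to rows <= 65535.
__global__ void FlipLastDimShared(const float* in, float* out, int cols) {
  __shared__ float tile[kTile];
  const int tid = threadIdx.x;
  const size_t row_base = static_cast<size_t>(blockIdx.y) * cols;
  const int t0 = blockIdx.x * kTile;    // first output column of this tile
  const int n = min(kTile, cols - t0);  // valid elements in this tile
  const int src0 = cols - t0 - n;       // first source column mirrored by this tile

  // Read the source chunk in ascending order and reverse it inside shared memory.
  if (tid < n) { tile[n - 1 - tid] = in[row_base + src0 + tid]; }
  __syncthreads();  // reached by every thread: it sits outside the branches

  // Thread tid writes output column t0 + tid, so the global store is sequential.
  if (tid < n) { out[row_base + t0 + tid] = tile[tid]; }
}

int main() {
  const int rows = 4, cols = 1000;  // cols is deliberately not a multiple of kTile
  std::vector<float> h_in(rows * cols), h_out(rows * cols);
  for (int i = 0; i < rows * cols; ++i) { h_in[i] = static_cast<float>(i); }

  const size_t bytes = h_in.size() * sizeof(float);
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_out, bytes);
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  const dim3 grid((cols + kTile - 1) / kTile, rows);
  FlipLastDimShared<<<grid, kTile>>>(d_in, d_out, cols);
  if (cudaGetLastError() != cudaSuccess) { std::printf("launch failed\n"); return 1; }
  cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);

  // Compare against a straightforward host-side flip of each row.
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      if (h_out[r * cols + c] != h_in[r * cols + (cols - 1 - c)]) {
        std::printf("mismatch at (%d, %d)\n", r, c);
        return 1;
      }
    }
  }
  std::printf("ok\n");
  return 0;
}
```

In this sketch the reversal happens entirely in the shared-memory index (`n - 1 - tid`), so both the global load and the global store walk addresses in ascending thread order. The actual kernel in the PR may stage and index the data differently, for example to support arbitrary dtypes and leading dimensions.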
Comparison:
torch kernel:
oneflow kernel (after optimization):
oneflow kernel (before optimization):