Performance of small sums could be improved #921

Open
seberg opened this issue Mar 29, 2023 · 1 comment

@seberg

seberg commented Mar 29, 2023

I don't have a concrete, high-priority issue that needs solving, but it may surprise users that avoiding the CUB segmented sum is much faster here.

The following CuPy code uses CUB by default on newer versions (to make sure, set CUPY_ACCELERATORS=cub):

import cupy as cp
from cupyx.profiler import benchmark

x = cp.ones((1000000, 2))

# Sum over the last axis, which has only two elements:
benchmark(lambda: x.sum(-1), n_repeat=100)
# GPU time spent: 1927.055 us

# Do the sum manually:
benchmark(lambda: x[..., 0] + x[..., 1], n_repeat=100)
# GPU time spent: 56.361 us

That makes the built-in sum about 34 times slower than what would be close to optimal.

Now, as a NumPy dev, I accept that NumPy is also still bad at this: by about a factor of 10! CuPy without CUB was good at it, though.

But, maybe there is an easy win here that would remove the surprise of having to rewrite the code.
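Until then, the manual rewrite above can be wrapped in a small helper. This is only a sketch: `small_axis_sum` and its `max_unroll` threshold are hypothetical names, not part of CuPy or NumPy, and the cutoff of 8 is a guess rather than a measured value. The helper relies only on `x.shape`, `x[..., i]`, `+`, and `x.sum(-1)`, so it should accept both NumPy and CuPy arrays.

```python
import functools
import operator


def small_axis_sum(x, max_unroll=8):
    """Sum over the last axis, unrolling the reduction when that axis is short.

    Hypothetical helper: `max_unroll` is an assumed threshold, not a tuned one.
    """
    n = x.shape[-1]
    if 0 < n <= max_unroll:
        # Elementwise adds of strided slices, as in x[..., 0] + x[..., 1].
        return functools.reduce(operator.add, (x[..., i] for i in range(n)))
    # Long (or empty) last axis: fall back to the library reduction.
    return x.sum(-1)
```

One caveat: for small integer or boolean dtypes, `sum` promotes the result to a wider integer type, while adding slices keeps the input dtype, so the two paths can return different dtypes.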

@gevtushenko
Collaborator

The following PR partially addresses the issue. In the offline discussion, we concluded that providing an overload that takes a single segment size would be preferable. This overload would significantly reduce temporary storage size and improve performance.
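To illustrate why a single segment size helps, here is a plain-Python sketch of the idea. It is not CUB's API and says nothing about how the proposed overload would be implemented; both function names are made up for this example. The general form needs one begin/end offset pair per segment, whereas with a fixed size the offsets follow from the segment index, so no offset arrays have to be stored or read.

```python
def segmented_sum(values, begin_offsets, end_offsets):
    """General form: one (begin, end) offset pair per segment."""
    return [sum(values[b:e]) for b, e in zip(begin_offsets, end_offsets)]


def fixed_size_segmented_sum(values, segment_size):
    """Hypothetical fixed-size form: segment i covers
    values[i * segment_size : (i + 1) * segment_size]."""
    if segment_size <= 0 or len(values) % segment_size != 0:
        raise ValueError("len(values) must be a multiple of a positive segment_size")
    num_segments = len(values) // segment_size
    return [
        sum(values[i * segment_size:(i + 1) * segment_size])
        for i in range(num_segments)
    ]


# Eight ones split into four segments of two, like a tiny x.sum(-1).
values = [1.0] * 8
offsets = list(range(0, len(values) + 1, 2))  # explicit offsets for the general form
general = segmented_sum(values, offsets[:-1], offsets[1:])
fixed = fixed_size_segmented_sum(values, 2)
```

For the (1000000, 2) array in the report, the general form would need on the order of a million offsets just to describe segments that all have length two.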

@jarmak-nv jarmak-nv transferred this issue from NVIDIA/cub Nov 8, 2023