Investigate if reducing registers by modifying code generation is possible #268

Open
denisalevi opened this issue Feb 15, 2022 · 1 comment


The COBAHH example uses more than 32 registers per thread even in single precision, which reduces the theoretical occupancy of the stateupdate kernel (see #266). I'm wondering whether register usage can easily be reduced by changing how the code is generated. Currently, the generated code contains many intermediate variables (the lio variables). I guess this is optimized for C++ performance, i.e. to generate code with as few arithmetic operations as possible. Is there a way to instead minimize the number of intermediate results that have to stay in registers? On the GPU, reaching 100% theoretical occupancy would be much more important than reducing the number of arithmetic operations.
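
To make the 32-register threshold concrete, here is a rough sketch of how register usage alone caps theoretical occupancy. The hardware numbers are assumptions, not taken from this issue: 65,536 registers per SM, at most 64 resident warps (2,048 threads) per SM, and registers allocated per warp in units of 256. These hold for several NVIDIA architectures but not all. The function name `theoretical_occupancy` is made up for illustration, and limits from shared memory or block size are ignored.

```python
def theoretical_occupancy(regs_per_thread,
                          regs_per_sm=65536,      # assumed register file size per SM
                          max_warps_per_sm=64,    # assumed: 2048 threads / 32
                          warp_size=32,
                          alloc_unit=256):        # assumed per-warp allocation granularity
    """Upper bound on occupancy when registers are the only limiting resource."""
    regs_per_warp = regs_per_thread * warp_size
    # Round up to the next multiple of the allocation unit.
    regs_per_warp = -(-regs_per_warp // alloc_unit) * alloc_unit
    # Number of warps whose registers fit into the register file.
    resident_warps = min(max_warps_per_sm, regs_per_sm // regs_per_warp)
    return resident_warps / max_warps_per_sm


for regs in (32, 33, 40, 64):
    print(regs, theoretical_occupancy(regs))
```

Under these assumptions, 32 registers per thread is the largest count that still allows every warp slot to be filled, and anything above it drops occupancy below 100%.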


denisalevi commented Apr 5, 2022

Try disabling the loop-invariant optimizations. They make sense for C++, where constants shared by all indices of a loop are precomputed once to reduce computation time inside the loop. They make no sense on the GPU, where each thread computes those constants itself, and they likely increase register usage.

See https://github.com/denisalevi/brian2cuda-paper/issues/21
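
As a toy illustration of the two code shapes being compared (plain Python, not actual generated code): the hoisted variant keeps the precomputed factor in a named variable, which is what the lio variables do, while the inlined variant recomputes it at the point of use. The function names, the `_lio_1` name, and the decay-style update are hypothetical. Brian2 documents a `codegen.loop_invariant_optimisations` preference for switching the hoisting off; whether Brian2CUDA honours it is an assumption that would need checking.

```python
import math


def update_hoisted(v, dt, tau):
    # Loop-invariant style: the scalar factor is computed once and kept
    # alive for the whole update (hypothetical name `_lio_1`).
    _lio_1 = math.exp(-dt / tau)
    return [x * _lio_1 for x in v]


def update_inlined(v, dt, tau):
    # Inlined style: the factor is recomputed where it is used, so no
    # extra intermediate has to stay live between statements.
    return [x * math.exp(-dt / tau) for x in v]


v = [1.0, 2.0, 3.0]
print(update_hoisted(v, 0.1, 10.0) == update_inlined(v, 0.1, 10.0))
```

In a kernel with one element per thread, each thread runs the "hoisted" line itself, so nothing is saved across threads. The open question is whether the compiler keeps such values in registers longer than needed.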

denisalevi added the easy label Apr 5, 2022