[nvptx-run] Add --verbose/-v #27
base: master
Conversation
Consider test.c:

```c
int main (int argc, char **argv)
{
  printf ("argc: %u\n", argc);
  return 0;
}
```

such that we have:

```
$ nvptx-none-run a.out
argc: 1
$ nvptx-none-run a.out bla
argc: 2
```

Given that the usage indicates that the program separates the nvptx options and the program arguments:

```
$ nvptx-none-run --help
Usage: nvptx-none-run [option...] program [argument...]
```

I'd expect:

```
$ nvptx-none-run a.out bla -V
argc: 3
```

but instead we get:

```
$ ./run.sh a.out bla -V
nvtpx-none-run (nvptx-tools) 1.0
<COPYRIGHT>
$
```

Fix this by calling getopt_long with optstring starting with '+'.
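The '+' fix mentioned above can be sketched as follows. This is a minimal, hypothetical example, not the actual nvptx-run source; the `parse_options` helper and its signature are illustrative:

```c
/* Minimal sketch (not the actual nvptx-run source) of stopping option
   scanning at the first non-option argument: a leading '+' in the
   getopt_long optstring makes "prog a.out bla -V" treat "bla" and "-V"
   as program arguments instead of nvptx-run options.  */
#include <getopt.h>
#include <stdio.h>

/* Parse the runner's own options; return the index in argv of the
   program to run (the first non-option argument).  */
static int
parse_options (int argc, char **argv, int *verbose)
{
  static const struct option long_options[] = {
    { "verbose", no_argument, NULL, 'v' },
    { NULL, 0, NULL, 0 }
  };
  int o;

  optind = 1;      /* Reset in case of repeated calls.  */
  *verbose = 0;
  /* The leading '+' stops scanning at the first non-option argument,
     instead of permuting argv (the default GNU behavior).  */
  while ((o = getopt_long (argc, argv, "+v", long_options, NULL)) != -1)
    if (o == 'v')
      *verbose = 1;
  return optind;
}
```

With this, `parse_options` leaves everything from the program name onward untouched, so `-V` after `a.out` reaches the nvptx program rather than being consumed as a version request.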
Add a --verbose flag to nvptx-run, such that we have:

```
$ gcc ~/hello.c
$ nvptx-none-run -v ./a.out
Total device memory: 4242604032 (3.95 GiB)
Initial free device memory: 4222156800 (3.93 GiB)
Program args reservation (effective): 1048576 (1.00 MiB)
Set stack size limit: 131072 (128.00 KiB)
Stack size limit reservation (estimated): 1342177280 (1.25 GiB)
Stack size limit reservation (effective): 1423966208 (1.32 GiB)
Free device memory: 2797142016 (2.60 GiB)
Set heap size limit: 268435456 (256.00 MiB)
hello
```
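The human-readable sizes in that output ("131072 (128.00 KiB)") can be produced by a small helper along these lines. This is a hypothetical sketch; the `format_val` name and signature are assumptions, not the PR's actual `report_val` code:

```c
/* Hypothetical sketch of a report_val-style helper: format a byte
   count both raw and rounded to a binary unit, as in the verbose
   output line "Set stack size limit: 131072 (128.00 KiB)".  */
#include <stddef.h>
#include <stdio.h>

static void
format_val (char *buf, size_t bufsize, const char *msg, size_t val)
{
  static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
  double v = (double) val;
  int u = 0;

  /* Scale down by 1024 until the value fits the current unit.  */
  while (v >= 1024.0 && u < 4)
    {
      v /= 1024.0;
      u++;
    }
  snprintf (buf, bufsize, "%s: %zu (%.2f %s)", msg, val, v, units[u]);
}
```

The actual tool prints to stderr; formatting into a buffer here just makes the sketch easy to check.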
Note: contains "[nvptx-run] Fix greedy option parsing" to avoid merge conflict.
@vries, thanks. I have a few questions, please have a look.
```c
size_t free_mem;
size_t dummy;
```
Should `dummy` move inside the `if (verbose)`?
```c
r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);
fatal_unless_success (r, "could not set stack limit");

r = cuMemGetInfo (&free_mem, &dummy);
```
Actually, doesn't `dummy` here (when given a better name) make obsolete the earlier `cuDeviceTotalMem` call?

`cuMemGetInfo`: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0
`cuDeviceTotalMem`: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gc6a0d6551335a3780f9f3c967a0fde5d

Or is the distinction intentional: total amount of memory available for allocation by the CUDA context vs. total amount of memory available on the device?
```c
/* Set stack size limit to 0 to get more accurate free_mem. */
r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);
```
From `cuCtxSetLimit` (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a) I can't easily tell the rationale here. So, should we add more commentary for this, or point to an external URL if that makes sense?
```c
size_t free_mem_update;
r = cuMemGetInfo (&free_mem_update, &dummy);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Program args reservation (effective)",
	    free_mem - free_mem_update);
```
Doesn't this difference computation implicitly assume that nothing else is using the GPU concurrently? (Which is a wrong assumption?) Or, does every process/CUDA context always have available all the GPU memory -- I don't remember the details, and have not yet looked that up.
```c
size_t free_mem_update;
r = cuMemGetInfo (&free_mem_update, &dummy);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Stack size limit reservation (effective)",
	    free_mem - free_mem_update);
```
Same concern as above.