[nvptx-run] Add --verbose/-v #27

vries · 2020-10-13T14:35:53Z

No description provided.

Consider test.c:
...
int main (int argc, char **argv)
{
  printf ("argc: %u\n", argc);
  return 0;
}
...
such that we have:
...
$ nvptx-none-run a.out
argc: 1
$ nvptx-none-run a.out bla
argc: 2
...

Given that the usage indicates that the program separates the nvptx options and the program arguments:
...
$ nvptx-none-run --help
Usage: nvptx-none-run [option...] program [argument...]
...
I'd expect:
...
$ nvptx-none-run a.out bla -V
argc: 3
...
but instead we get:
...
$ ./run.sh a.out bla -V
nvtpx-none-run (nvptx-tools) 1.0
<COPYRIGHT>
$
...

Fix this by calling getopt_long with optstring starting with '+'.
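For readers unfamiliar with the '+' prefix, here is a minimal sketch of the behavior the fix relies on. It assumes GNU getopt_long semantics (argument permutation by default, and a rescan when optind is set to 0). The option table and the helper name first_unparsed_arg are hypothetical and are not taken from nvptx-run.c.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table, loosely modeled on the usage text above;
   not copied from nvptx-run.c.  */
static const struct option long_options[] = {
  { "help", no_argument, NULL, 'h' },
  { "version", no_argument, NULL, 'V' },
  { NULL, 0, NULL, 0 }
};

/* Parse ARGV using OPTSTRING and return the index of the first argument
   that getopt_long did not consume.  *SAW_VERSION is set to 1 if -V or
   --version was consumed as an option, and to 0 otherwise.  */
int
first_unparsed_arg (int argc, char **argv, const char *optstring,
		    int *saw_version)
{
  int o;

  assert (saw_version != NULL);
  *saw_version = 0;
  /* Assumption: glibc-style reset, so the helper can be called repeatedly.  */
  optind = 0;
  while ((o = getopt_long (argc, argv, optstring, long_options, NULL)) != -1)
    if (o == 'V')
      *saw_version = 1;
  return optind;
}
```

With the argument vector { "nvptx-none-run", "a.out", "bla", "-V" }, the optstring "+hV" should stop parsing at "a.out", so "-V" stays a program argument. With "hV", GNU getopt_long permutes the vector and consumes "-V" as an option, which is the greedy behavior described above.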

Add a --verbose flag to nvptx-run, such that we have:
...
$ gcc ~/hello.c
$ nvptx-none-run -v ./a.out
Total device memory: 4242604032 (3.95 GiB)
Initial free device memory: 4222156800 (3.93 GiB)
Program args reservation (effective): 1048576 (1.00 MiB)
Set stack size limit: 131072 (128.00 KiB)
Stack size limit reservation (estimated): 1342177280 (1.25 GiB)
Stack size limit reservation (effective): 1423966208 (1.32 GiB)
Free device memory: 2797142016 (2.60 GiB)
Set heap size limit: 268435456 (256.00 MiB)
hello
...
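As an illustration of the line format shown above, the following sketch produces "label: bytes (scaled unit)" strings. The helper name format_val is hypothetical and this is not the patch's report_val. Truncating to two decimals is an assumption, chosen only because it reproduces the sample figures (rounding would give 1.33 GiB and 2.61 GiB for two of them).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the patch's report_val): write
   "LABEL: VAL (X.YY UNIT)" into BUF, scaling VAL to the largest binary
   unit that keeps the integer part non-zero.  The fraction is truncated
   to two decimals, not rounded.  Returns the snprintf result.  */
int
format_val (char *buf, size_t buflen, const char *label,
	    unsigned long long val)
{
  static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
  unsigned shift = 0;
  size_t u = 0;

  assert (buf != NULL && label != NULL);

  /* Move to the next unit while the value is at least 1024 of the
     current one and a larger unit is available.  */
  while (u + 1 < sizeof units / sizeof units[0]
	 && (val >> (shift + 10)) != 0)
    {
      shift += 10;
      u++;
    }

  unsigned long long whole = val >> shift;
  unsigned long long rem = val - (whole << shift);
  /* REM is below 2^40, so REM * 100 cannot overflow 64 bits.  */
  unsigned long long frac = (rem * 100) >> shift;

  return snprintf (buf, buflen, "%s: %llu (%llu.%02llu %s)",
		   label, val, whole, frac, units[u]);
}
```

Integer arithmetic is used here so that the result does not depend on floating-point rounding.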

vries · 2020-10-13T14:37:40Z

Note: contains "[nvptx-run] Fix greedy option parsing" to avoid merge conflict.

tschwinge

@vries, thanks. I have a few questions, please have a look.

tschwinge · 2020-11-18T14:37:04Z

nvptx-run.c

+
+  size_t free_mem;
+  size_t dummy;


Should dummy move inside the if (verbose)?

tschwinge · 2020-11-18T14:43:00Z

nvptx-run.c

+      r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);
+      fatal_unless_success (r, "could not set stack limit");
+
+      r = cuMemGetInfo (&free_mem, &dummy);


Actually, doesn't dummy here (when given a better name) make obsolete the earlier cuDeviceTotalMem call?

cuMemGetInfo: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0

cuDeviceTotalMem: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gc6a0d6551335a3780f9f3c967a0fde5d

Or is the distinction between the total amount of memory available for allocation by the CUDA context and the total amount of memory available on the device intentional?

tschwinge · 2020-11-18T14:45:31Z

nvptx-run.c

+      /* Set stack size limit to 0 to get more accurate free_mem.  */
+      r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);


From the cuCtxSetLimit documentation (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a), I can't easily tell the rationale here.

So, should we add more commentary for this, or point to an external URL if that makes sense?

tschwinge · 2020-11-18T14:55:38Z

nvptx-run.c

+      size_t free_mem_update;
+      r = cuMemGetInfo (&free_mem_update, &dummy);
+      fatal_unless_success (r, "could not get free memory");
+      report_val (stderr, "Program args reservation (effective)",
+		  free_mem - free_mem_update);


Doesn't this difference computation implicitly assume that nothing else is using the GPU concurrently? (Which is a wrong assumption?) Or, does every process/CUDA context always have available all the GPU memory -- I don't remember the details, and have not yet looked that up.
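To make the concern concrete: both readings are size_t, so if free memory grows between the two cuMemGetInfo calls (for example because another process releases memory), free_mem - free_mem_update wraps around to a huge value. A minimal sketch of a guarded difference follows. The helper name reservation_delta is hypothetical and not part of the patch, and whether such a guard is wanted is a design question for the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: compute BEFORE - AFTER for
   two free-memory readings.  If AFTER is larger than BEFORE (free memory
   grew in between), store 0 in *DELTA and return false instead of letting
   the unsigned subtraction wrap around.  */
bool
reservation_delta (size_t before, size_t after, size_t *delta)
{
  assert (delta != NULL);
  if (after > before)
    {
      *delta = 0;
      return false;
    }
  *delta = before - after;
  return true;
}
```

This only detects the case where the difference would go negative. It cannot tell whether a plausible-looking positive difference was distorted by concurrent GPU use.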

tschwinge · 2020-11-18T14:56:33Z

nvptx-run.c

+      size_t free_mem_update;
+      r = cuMemGetInfo (&free_mem_update, &dummy);
+      fatal_unless_success (r, "could not get free memory");
+      report_val (stderr, "Stack size limit reservation (effective)",
+		  free_mem - free_mem_update);


Same concern as above.

vries added 2 commits October 13, 2020 10:42

vries changed the title ~~Verbose 2~~ [nvptx-run] Add --verbose/-v Oct 13, 2020

tschwinge reviewed Nov 18, 2020

tschwinge mentioned this pull request Oct 11, 2022

nvptx-none-run: CU_LIMIT_STACK_SIZE #8

Open
