Now the launch overhead is more than 99%: `cuLaunchKernel` accounts for 99.11% of the time spent in CUDA API calls.
➜ modules git:(GPUdemo) ✗ nvprof julia QCBMS.jl
==22279== NVPROF is profiling process 22279, command: julia QCBMS.jl
==22279== Profiling application: julia QCBMS.jl
==22279== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   70.36%  77.0104s    810000  95.074us  74.113us  279.72us  ptxcall_simple_kernel_2
                   28.96%  31.6927s    720000  44.017us  32.896us  113.19us  ptxcall_simple_kernel_3
                    0.68%  748.96ms     10000  74.895us  72.801us  79.361us  ptxcall_anonymous23_1
                    0.00%  1.1371ms         4  284.27us  1.7600us  1.0389ms  [CUDA memcpy HtoD]
      API calls:   99.11%  90.5692s   1540000  58.811us  6.5610us  9.6723ms  cuLaunchKernel
                    0.43%  389.37ms   1540034     252ns     145ns  649.65us  cuCtxGetCurrent
                    0.23%  210.94ms         1  210.94ms  210.94ms  210.94ms  cuCtxCreate
                    0.14%  129.13ms         1  129.13ms  129.13ms  129.13ms  cuCtxDestroy
                    0.07%  65.987ms         3  21.996ms  47.171us  65.891ms  cuModuleUnload
                    0.01%  13.700ms        27  507.41us  439.26us  724.08us  cuMemAlloc
                    0.00%  2.5056ms         3  835.19us  348.68us  1.7719ms  cuModuleLoadDataEx
                    0.00%  1.4557ms         4  363.94us  43.000us  1.1706ms  cuMemcpyHtoD
                    0.00%  36.489us         8  4.5610us  3.6320us  8.1710us  cuDeviceGetPCIBusId
                    0.00%  15.972us        30     532ns     167ns  2.4170us  cuDeviceGetAttribute
                    0.00%  9.0610us         9  1.0060us     283ns  4.6000us  cuDeviceGet
                    0.00%  3.2120us         3  1.0700us  1.0430us  1.0890us  cuModuleGetFunction
                    0.00%  2.6260us         3     875ns     707ns  1.0060us  cuCtxGetDevice
                    0.00%  2.4400us         1  2.4400us  2.4400us  2.4400us  cuDriverGetVersion
                    0.00%  2.0020us         2  1.0010us     282ns  1.7200us  cuDeviceGetCount
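As a rough sanity check on the profile above, the launch numbers can be cross-checked with a few lines of arithmetic. This is only a sketch: the constants are copied by hand from the nvprof output, the variable names are made up for illustration, and the 99.11% figure is a share of CUDA API call time as nvprof reports it, not of total wall-clock time.

```python
# Figures copied by hand from the nvprof output above (not measured here).
kernel_calls = {
    "ptxcall_simple_kernel_2": 810_000,
    "ptxcall_simple_kernel_3": 720_000,
    "ptxcall_anonymous23_1": 10_000,
}
avg_launch_us = 58.811    # average time per cuLaunchKernel call
launch_total_s = 90.5692  # total time nvprof attributes to cuLaunchKernel
launch_share = 0.9911     # cuLaunchKernel share of all API call time

# Every kernel invocation goes through one cuLaunchKernel call.
total_launches = sum(kernel_calls.values())

# Launch count times average launch cost should reproduce the reported total.
estimated_launch_s = total_launches * avg_launch_us * 1e-6

# Total API time implied by the reported share.
api_total_s = launch_total_s / launch_share

print(f"launches:             {total_launches}")
print(f"estimated launch time: {estimated_launch_s:.2f} s")
print(f"implied API total:     {api_total_s:.2f} s")
```

The launch count matches the 1540000 `cuLaunchKernel` calls in the profile, so the API-side time appears to be dominated by the sheer number of small kernel launches rather than by any single slow call.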
does not make sense.