Merge kernels #3

GiggleLiu · 2018-11-20T14:08:47Z

Now the launch overhead is more than 99%

➜  modules git:(GPUdemo) ✗ nvprof julia QCBMS.jl
==22279== NVPROF is profiling process 22279, command: julia QCBMS.jl
==22279== Profiling application: julia QCBMS.jl
==22279== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   70.36%  77.0104s    810000  95.074us  74.113us  279.72us  ptxcall_simple_kernel_2
                   28.96%  31.6927s    720000  44.017us  32.896us  113.19us  ptxcall_simple_kernel_3
                    0.68%  748.96ms     10000  74.895us  72.801us  79.361us  ptxcall_anonymous23_1
                    0.00%  1.1371ms         4  284.27us  1.7600us  1.0389ms  [CUDA memcpy HtoD]
      API calls:   99.11%  90.5692s   1540000  58.811us  6.5610us  9.6723ms  cuLaunchKernel
                    0.43%  389.37ms   1540034     252ns     145ns  649.65us  cuCtxGetCurrent
                    0.23%  210.94ms         1  210.94ms  210.94ms  210.94ms  cuCtxCreate
                    0.14%  129.13ms         1  129.13ms  129.13ms  129.13ms  cuCtxDestroy
                    0.07%  65.987ms         3  21.996ms  47.171us  65.891ms  cuModuleUnload
                    0.01%  13.700ms        27  507.41us  439.26us  724.08us  cuMemAlloc
                    0.00%  2.5056ms         3  835.19us  348.68us  1.7719ms  cuModuleLoadDataEx
                    0.00%  1.4557ms         4  363.94us  43.000us  1.1706ms  cuMemcpyHtoD
                    0.00%  36.489us         8  4.5610us  3.6320us  8.1710us  cuDeviceGetPCIBusId
                    0.00%  15.972us        30     532ns     167ns  2.4170us  cuDeviceGetAttribute
                    0.00%  9.0610us         9  1.0060us     283ns  4.6000us  cuDeviceGet
                    0.00%  3.2120us         3  1.0700us  1.0430us  1.0890us  cuModuleGetFunction
                    0.00%  2.6260us         3     875ns     707ns  1.0060us  cuCtxGetDevice
                    0.00%  2.4400us         1  2.4400us  2.4400us  2.4400us  cuDriverGetVersion
                    0.00%  2.0020us         2  1.0010us     282ns  1.7200us  cuDeviceGetCount

The text was updated successfully, but these errors were encountered:

GiggleLiu · 2024-01-15T08:50:58Z

does not make sense.

GiggleLiu closed this as completed Jan 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge kernels #3

Merge kernels #3

GiggleLiu commented Nov 20, 2018

GiggleLiu commented Jan 15, 2024

Merge kernels #3

Merge kernels #3

Comments

GiggleLiu commented Nov 20, 2018

GiggleLiu commented Jan 15, 2024