Merge kernels #3

Closed
GiggleLiu opened this issue Nov 20, 2018 · 1 comment

Comments

@GiggleLiu
Member

Kernel launch overhead is now more than 99% of the API call time: cuLaunchKernel accounts for 99.11% across 1,540,000 launches.

➜  modules git:(GPUdemo) ✗ nvprof julia QCBMS.jl
==22279== NVPROF is profiling process 22279, command: julia QCBMS.jl
==22279== Profiling application: julia QCBMS.jl
==22279== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   70.36%  77.0104s    810000  95.074us  74.113us  279.72us  ptxcall_simple_kernel_2
                   28.96%  31.6927s    720000  44.017us  32.896us  113.19us  ptxcall_simple_kernel_3
                    0.68%  748.96ms     10000  74.895us  72.801us  79.361us  ptxcall_anonymous23_1
                    0.00%  1.1371ms         4  284.27us  1.7600us  1.0389ms  [CUDA memcpy HtoD]
      API calls:   99.11%  90.5692s   1540000  58.811us  6.5610us  9.6723ms  cuLaunchKernel
                    0.43%  389.37ms   1540034     252ns     145ns  649.65us  cuCtxGetCurrent
                    0.23%  210.94ms         1  210.94ms  210.94ms  210.94ms  cuCtxCreate
                    0.14%  129.13ms         1  129.13ms  129.13ms  129.13ms  cuCtxDestroy
                    0.07%  65.987ms         3  21.996ms  47.171us  65.891ms  cuModuleUnload
                    0.01%  13.700ms        27  507.41us  439.26us  724.08us  cuMemAlloc
                    0.00%  2.5056ms         3  835.19us  348.68us  1.7719ms  cuModuleLoadDataEx
                    0.00%  1.4557ms         4  363.94us  43.000us  1.1706ms  cuMemcpyHtoD
                    0.00%  36.489us         8  4.5610us  3.6320us  8.1710us  cuDeviceGetPCIBusId
                    0.00%  15.972us        30     532ns     167ns  2.4170us  cuDeviceGetAttribute
                    0.00%  9.0610us         9  1.0060us     283ns  4.6000us  cuDeviceGet
                    0.00%  3.2120us         3  1.0700us  1.0430us  1.0890us  cuModuleGetFunction
                    0.00%  2.6260us         3     875ns     707ns  1.0060us  cuCtxGetDevice
                    0.00%  2.4400us         1  2.4400us  2.4400us  2.4400us  cuDriverGetVersion
                    0.00%  2.0020us         2  1.0010us     282ns  1.7200us  cuDeviceGetCount
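
As a rough sanity check on the profile above, the per-launch numbers can be recomputed from the reported totals. This is only a sketch: the figures are copied by hand from the nvprof output, the 99.11% is cuLaunchKernel's share of time spent in CUDA API calls (not of wall-clock time), and `estimated_launch_time` with its `merge_factor` parameter is a hypothetical helper that assumes the per-launch cost stays constant when kernels are merged.

```python
# Figures copied from the nvprof output above: (total time in seconds, calls).
kernels = {
    "ptxcall_simple_kernel_2": (77.0104, 810000),
    "ptxcall_simple_kernel_3": (31.6927, 720000),
    "ptxcall_anonymous23_1": (0.74896, 10000),
}
launch_api_time, launch_api_calls = 90.5692, 1540000  # cuLaunchKernel row

total_launches = sum(calls for _, calls in kernels.values())
total_gpu_time = sum(seconds for seconds, _ in kernels.values())

# Average host-side cost of one launch vs. average device time per kernel.
avg_launch_us = launch_api_time / launch_api_calls * 1e6
avg_kernel_us = total_gpu_time / total_launches * 1e6


def estimated_launch_time(merge_factor: int) -> float:
    """Hypothetical estimate of total cuLaunchKernel time (seconds) if every
    `merge_factor` consecutive kernels were fused into a single launch.
    Assumes the cost per launch does not change, which is an assumption."""
    return launch_api_time / merge_factor


print(f"kernel launches:        {total_launches}")
print(f"avg launch API time:    {avg_launch_us:.3f} us")
print(f"avg kernel device time: {avg_kernel_us:.2f} us")
print(f"launch time if 10 kernels were merged: {estimated_launch_time(10):.2f} s")
```

The kernel call counts sum to the cuLaunchKernel call count, so every launch in the profile comes from one of the three kernels listed. Under the constant-cost assumption, fewer launches translate directly into less time in cuLaunchKernel.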
@GiggleLiu
Member Author

This does not make sense.
