nvortexCuda

Simple n-body program to test CUDA performance

Compile and run

As long as CUDA is installed and nvcc is in your PATH, you should be able to run:

make
./nvCuda01.bin
./nvCuda02.bin
./nvCuda03.bin

Description

This repository contains a few progressive examples of a compute-only n-body calculation of the Biot-Savart influence of N desingularized vortices on one another.

Program nvCuda01.cu is the simplest implementation. On the CPU side, the program parallelizes with a basic OpenMP parallel for loop over the target particles. On the GPU side, we use CUDA without unified or pinned memory (full transfers), with one target particle per "thread."
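As a rough illustration of the CPU-side approach described above, here is a minimal sketch of a direct summation with an OpenMP parallel for loop over the target particles. It is not taken from nvCuda01.cu: the `Particles` struct, the function name `nbody_direct`, the 2D formulation, and the particular desingularization (adding both squared core radii to the squared distance) are assumptions made for illustration, and the constant 1/(2*pi) factor is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle storage (2D for brevity).
struct Particles {
    std::vector<float> x, y;  // positions
    std::vector<float> s;     // circulation (strength)
    std::vector<float> r;     // core radius used for desingularization
};

// Direct O(N^2) summation of the velocity induced by all sources on each target.
// The outer loop over targets is shared among threads; each thread owns its
// target's accumulators, so no synchronization is needed.
void nbody_direct(const Particles& src, const Particles& tgt,
                  std::vector<float>& u, std::vector<float>& v) {
    assert(u.size() == tgt.x.size() && v.size() == tgt.x.size());
    const std::ptrdiff_t nt = static_cast<std::ptrdiff_t>(tgt.x.size());
    const std::size_t ns = src.x.size();

    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < nt; ++i) {
        float usum = 0.0f, vsum = 0.0f;
        for (std::size_t j = 0; j < ns; ++j) {
            const float dx = tgt.x[i] - src.x[j];
            const float dy = tgt.y[i] - src.y[j];
            // Assumed desingularization: squared radii keep the denominator nonzero.
            const float d2 = dx * dx + dy * dy
                           + src.r[j] * src.r[j] + tgt.r[i] * tgt.r[i];
            const float f = src.s[j] / d2;
            usum -= f * dy;  // velocity is perpendicular to the separation vector
            vsum += f * dx;
        }
        u[i] = usum;
        v[i] = vsum;
    }
}
```

Without an OpenMP flag (for example `-fopenmp` with GCC or Clang) the pragma is ignored and the loop simply runs serially, so the sketch compiles either way.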

Program nvCuda02.cu speeds this up considerably. The CPU now uses omp simd to vectorize the inner loop over source particles. The GPU uses shared memory to load blocks of source particles in a coalesced manner before all threads operate on that block. This program represents the "80" part of the "80-20 rule": that you can go most of the way with some simple methods.
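The CPU-side change described above can be sketched as follows: the inner loop over source particles carries an `omp simd` directive with a reduction on the two accumulators. This is a hedged sketch, not the code in nvCuda02.cu; the function name `vel_on_target`, its parameter list, the 2D kernel, and the omitted 1/(2*pi) factor are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Velocity induced on a single target by ns source particles (hypothetical 2D kernel).
// Sources are passed as separate contiguous arrays so the compiler can issue
// unit-stride vector loads.
void vel_on_target(std::size_t ns,
                   const float* sx, const float* sy,
                   const float* ss, const float* sr,
                   float tx, float ty, float tr,
                   float* u, float* v) {
    assert(u != nullptr && v != nullptr);
    float usum = 0.0f, vsum = 0.0f;

    // The reduction clause tells the compiler that the accumulators may be
    // kept as per-lane partial sums and combined after the loop.
    #pragma omp simd reduction(+:usum, vsum)
    for (std::size_t j = 0; j < ns; ++j) {
        const float dx = tx - sx[j];
        const float dy = ty - sy[j];
        const float d2 = dx * dx + dy * dy + sr[j] * sr[j] + tr * tr;
        const float f = ss[j] / d2;
        usum -= f * dy;
        vsum += f * dx;
    }
    *u = usum;
    *v = vsum;
}
```

Because a SIMD reduction changes the order of the additions, results can differ in the last few bits from a strictly serial loop.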

Program nvCuda03.cu adds some enhancements in an attempt to eke out even more performance, though only on the GPU side. First, we moved the GPU timers so that they no longer count allocation and deallocation, in order to be more consistent with the CPU timers. Second, we now break the computation up along the source-particle dimension to allow for greater concurrency, which requires atomicAdd to write results back to main GPU memory. Third, we added support for multi-GPU systems.
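To show why splitting along the source-particle dimension calls for atomic updates, here is a CPU analog written with OpenMP. It is only an analogy to the GPU scheme described above, not the CUDA code in nvCuda03.cu: the function name `nbody_source_blocked`, the `block` parameter, and the 2D kernel without the 1/(2*pi) factor are assumptions for illustration. Each block of sources computes a partial sum for every target, and several blocks may add to the same target at once, so the final accumulation uses `omp atomic` where the GPU version would use atomicAdd.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// u and v must be zeroed by the caller; every source block adds into them.
void nbody_source_blocked(const std::vector<float>& sx, const std::vector<float>& sy,
                          const std::vector<float>& ss, const std::vector<float>& sr,
                          const std::vector<float>& tx, const std::vector<float>& ty,
                          const std::vector<float>& tr,
                          std::vector<float>& u, std::vector<float>& v,
                          std::size_t block) {
    assert(block > 0 && u.size() == tx.size() && v.size() == tx.size());
    const std::size_t ns = sx.size();
    const std::size_t nt = tx.size();
    const std::ptrdiff_t nblocks =
        static_cast<std::ptrdiff_t>((ns + block - 1) / block);

    // Parallelism is now over blocks of sources rather than over targets.
    #pragma omp parallel for
    for (std::ptrdiff_t b = 0; b < nblocks; ++b) {
        const std::size_t j0 = static_cast<std::size_t>(b) * block;
        const std::size_t j1 = std::min(j0 + block, ns);
        for (std::size_t i = 0; i < nt; ++i) {
            float usum = 0.0f, vsum = 0.0f;
            for (std::size_t j = j0; j < j1; ++j) {
                const float dx = tx[i] - sx[j];
                const float dy = ty[i] - sy[j];
                const float d2 = dx * dx + dy * dy + sr[j] * sr[j] + tr[i] * tr[i];
                const float f = ss[j] / d2;
                usum -= f * dy;
                vsum += f * dx;
            }
            // Different source blocks write to the same target: update atomically.
            #pragma omp atomic
            u[i] += usum;
            #pragma omp atomic
            v[i] += vsum;
        }
    }
}
```

The trade-off is more independent work items at the cost of contended writes, and the summation order (and therefore the rounding) depends on which block finishes first.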

Program nvCuda04.cu saves one flop per inner loop by presquaring the target radius, but adds six more by performing Kahan summation on the accumulators. This further reduces the errors inherent in summing large arrays of numbers, but seems incompatible with the omp simd clause.
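For readers unfamiliar with Kahan (compensated) summation, the sketch below shows the standard algorithm next to a naive sum. The function names `naive_sum` and `kahan_sum` are hypothetical and the code is not taken from nvCuda04.cu. Each compensated accumulation costs four floating-point operations instead of one, that is, three extra per accumulator.

```cpp
#include <cassert>
#include <vector>

// Plain accumulation: small terms are lost once the sum grows large.
float naive_sum(const std::vector<float>& vals) {
    float sum = 0.0f;
    for (float x : vals) sum += x;
    return sum;
}

// Kahan summation: c carries the rounding error of the previous addition
// and feeds it back into the next term.
float kahan_sum(const std::vector<float>& vals) {
    assert(!vals.empty());
    float sum = 0.0f;
    float c = 0.0f;                 // running compensation
    for (float x : vals) {
        const float y = x - c;      // apply the correction to the new term
        const float t = sum + y;    // low-order bits of y may be lost here
        c = (t - sum) - y;          // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```

For example, summing 1.0f followed by one million copies of 1e-8f leaves the naive result stuck at 1.0f, while the compensated sum comes out close to the exact value of 1.01. Note that the correction step is algebraically zero, so aggressive floating-point optimizations such as `-ffast-math` may remove it; the loop-carried dependence on `c` is also why the method does not vectorize as a simple reduction.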

Other codes

If you want to see how other libraries and methodologies improve performance on this problem, look at some of my other repositories:

nvortexVc - using Vc for explicit vectorization
onbody - using CPU treecodes with better order of operations
Omega2D - a complete 2D vortex methods simulator
