Introduction to heterogeneous systems
- GPU architecture
- CUDA
- Programming and execution models
- Memory organization in CUDA
- Task-level parallelism: streams, events, dynamic parallelism
- Tools: CUDA compiler, profiler, and debugger
Optimizing parallel patterns in CUDA with some case studies:
- Convolution
- Scan
- Stencil
- Merge
- Graph traversal
- Extra: Electrostatic Potential Map and GPU Computation
Heterogeneous systems
- Architectural aspects
- Software aspects (programmability and runtime resource management)
Brief overview of OpenACC and OpenCL
CSE 599I: Accelerated Computing - Programming GPUs (tschmidt23.github.io)