High Performance Computing for Weather and Climate Course
Day 1: Single-core performance, stencil program, performance metrics, memory hierarchy, memory bandwidth, peak floating-point performance, arithmetic intensity, roofline model, array storage in memory, data-locality optimizations (blocking, fusion, inlining)
Day 2: Shared-memory parallelism, OpenMP, speedup, Amdahl's Law, parallelization, synchronization, variable scoping (private, shared)
Day 3: Distributed-memory parallelism, MPI, message passing, point-to-point communication, deadlock, non-blocking communication, gather/scatter operations, domain decomposition, halo points and halo updates
Day 4: Graphics Processing Units, hybrid node architecture, high-level GPU programming with CuPy, data management and offload model, synchronization, vectorization, platform-agnostic code, low-level GPU programming
Day 5: High-level programming models, domain-specific languages, GT4Py, performance portability, abstraction for stencil computations