RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications - Computational Artifact

This is the computational artifact for the paper "RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications" submitted to the ICPP'24 conference.

Authors: Simon Schwitanski, Yussur Mustafa Oraji, Cornelius Pätzold, Joachim Jenke, Felix Tomski, Matthias S. Müller, High Performance Computing, RWTH Aachen University

Repository Structure

  • RMASanitizer: Source code of RMASanitizer (for a detailed explanation, see below)
  • MUST-RMA: Source code of MUST-RMA
  • classification_quality: Results of RMASanitizer, MUST-RMA and PARCOACH-{dynamic,static} on RMARaceBench (Table 3, Section 6.1)
    • The exact version of RMARaceBench used is also included in this repo.
  • overhead_evaluation: Results of the large-scale experiment on RMASanitizer and MUST-RMA on the different proxy apps considered in the paper (Figure 10, Section 6.2)
  • overhead_evaluation/plots: Resulting plots (Figure 10, Section 6.2)
  • classification_quality.sh: Script to reproduce classification quality results (Table 3, Section 6.1)
  • overhead_submit.sh / overhead_result.sh: Scripts to reproduce the overhead results (Figure 10, Section 6.2)

Reproducing Results

The results of this artifact can be reproduced on an HPC cluster or using ChameleonCloud.

Optional: Starting a ChameleonCloud Node

For simplified execution of the experiments, we provide a ChameleonCloud script that sets up a machine suitable for running all evaluations. The script can be executed with

./reserve_chameleon_node.sh

It will start up a properly configured node (compute_zen3) on the CHI@TACC infrastructure using OpenStack. The image used for the node is ubuntu2204-rmasan (CHI@TACC).
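If the reservation succeeds, the lease and the instance become visible through the OpenStack CLI. A minimal check, assuming the CHI@TACC credentials are already sourced in the shell and the Blazar client plugin is installed (the names shown are whatever reserve_chameleon_node.sh chose):

# List active Blazar leases and running instances (names depend on the script)
openstack reservation lease list
openstack server list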

If the OpenStack client is not installed on the executing system, a corresponding Docker image can be built and executed with

./start_openstack_image.sh
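Conceptually, this wraps the OpenStack CLI in a container. The following sketch illustrates the idea only; the image name and mounted directory are illustrative and not the actual contents of start_openstack_image.sh:

# Build a small image containing the OpenStack CLI (illustrative only)
docker build -t openstack-cli - <<'EOF'
FROM python:3.11-slim
RUN pip install --no-cache-dir python-openstackclient python-blazarclient
EOF

# Run it with the current directory (containing the Chameleon credentials) mounted
docker run --rm -it -v "$PWD:/work" -w /work openstack-cli bash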

After reserving the node, the script automatically connects to it via SSH. On the node, the following script (already present on the machine) downloads the artifact:

./bootstrap.sh
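The exact contents of bootstrap.sh ship with the node image; conceptually it fetches this repository onto the node, roughly along these lines (illustrative sketch, not the script itself):

# Download the artifact onto the node and enter it
git clone https://github.com/RWTH-HPC/rmasanitizer-artifact.git
cd rmasanitizer-artifact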

Classification Quality Results

To reproduce the results, run

./classification_quality.sh

The results will be available in the folder cq-results-YYMMDD-HHMMSS for further investigation. The files in the summaries folder can be used to compare the reproduced results with the reference results (see classification_quality). The script will also print out the summarized results to the command line.
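One convenient way to compare a reproduced run against the reference data is a recursive diff of the two summaries folders. The timestamped folder name below is only an example, and the reference path assumes the summaries live directly under classification_quality; adjust it to the actual layout of that folder:

# Compare reproduced summaries against the reference results (paths are examples)
diff -ru classification_quality/summaries/ cq-results-240101-120000/summaries/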

Overhead Study

To reproduce the results of the small-scale experiment, run

./overhead_evaluation_chameleon.sh

The results will be available in the folder perf-results-YYMMDD-HHMMSS. In particular, the resulting plotted PNG file is contained within the folder. For reference, the paper results (large-scale experiment) can be found at overhead_evaluation/plots/results_largescale.png and the ChameleonCloud results (small-scale experiment) can be found at overhead_evaluation/plots/results_smallscale.png.

Qualitatively, the results of the small-scale experiment are similar to those of the large-scale experiment in the paper: RMASanitizer has a significantly smaller slowdown than MUST-RMA. With an increasing number of processes, the slowdown of RMASanitizer (and also of MUST-RMA) increases.
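To place the reproduced plot next to the reference plots mentioned above, it is enough to locate the generated PNG (the folder timestamp is again an example):

# Locate the plot generated by the small-scale run
find perf-results-*/ -name '*.png'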

Optional: Copy results back

To access the result files from outside the container, the script

./copy_results_to_objectstorage.sh

copies the results to the ChameleonCloud object storage. The files can then be viewed with the ChameleonCloud object storage file viewer.
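The uploaded files can also be listed from the command line instead of the web viewer. The container name below is a placeholder for whatever copy_results_to_objectstorage.sh creates:

# List uploaded result files in the object store (container name is illustrative)
openstack container list
openstack object list rmasan-results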

Input Sizes

We provide input sizes for a large-scale experiment (M), as done in the paper, and for a small-scale experiment (S). The input sizes can also be set in the overhead_submit.sh script; see the sketch after the two lists below.

The input sizes of the large-scale experiment are:

  • PRK_Stencil: 1000 iterations, 48*10^6 elements per processor, weak scaling
  • NPB BT-RMA: Class D (408 x 408 x 408), strong scaling
  • LULESH: 20^3, 8000 elements per processor, weak scaling
  • miniMD: 400 timesteps, LJ, 260000 atoms per processor, weak scaling
  • PRK_Stencil_shmem: 1000 iterations, 48*10^6 elements per processor, weak scaling
  • NPB BT-SHMEM: Class D (408 x 408 x 408), strong scaling
  • CFD-Proxy: 1000 iterations, F6 airplane mesh, strong scaling

The input sizes of the small-scale experiment are:

  • PRK_Stencil: 100 iterations, 32*10^6 elements per processor, weak scaling
  • NPB BT-RMA: Class C (162 x 162 x 162), strong scaling
  • LULESH: 12^3, 1728 elements per processor, weak scaling
  • miniMD: 40 timesteps, LJ, 260000 atoms per processor, weak scaling
  • PRK_Stencil_shmem: 100 iterations, 32*10^6 elements per processor, weak scaling
  • NPB BT-SHMEM: Class C (162 x 162 x 162), strong scaling
  • CFD-Proxy: 1000 iterations, F6 airplane mesh, strong scaling
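The size is selected via the JUBE tag (S or M) that is forwarded to the jube run commands shown in the next section. A minimal sketch of switching it in overhead_submit.sh, assuming the script exposes a variable along these lines (the variable name is illustrative):

# Input size tag forwarded to the jube run calls (see "JUBE Commands" below)
SIZE_TAG=S   # set to M for the large-scale inputs used in the paper
jube run overhead_evaluation/jube/<benchmark>/<benchmark>.xml --tag "$SIZE_TAG" ignorelist memusage rebuild_source tsan-opt chameleon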

JUBE Commands

Large-Scale Experiment (Paper)

The following JUBE commands are used for each benchmark (input size: M):

# Run benchmark with MUST-RMA
jube run overhead_evaluation/jube/<benchmark>/<benchmark>.xml -o MUST-RMA/ --tag M ignorelist pnmpi memusage rebuild_source must-rma 

# Run benchmark with RMASanitizer
jube run overhead_evaluation/jube/<benchmark>/<benchmark>.xml -o RMASanitizer/ --tag M ignorelist pnmpi memusage rebuild_source tsan-opt

Small-Scale Experiment (Chameleon)

The following JUBE commands are used for each benchmark (input size: S):

# Run benchmark with MUST-RMA
jube run overhead_evaluation/jube/<benchmark>/<benchmark>.xml -o MUST-RMA/ --tag S ignorelist memusage rebuild_source must-rma chameleon

# Run benchmark with RMASanitizer
jube run overhead_evaluation/jube/<benchmark>/<benchmark>.xml -o RMASanitizer/ --tag S ignorelist memusage rebuild_source tsan-opt chameleon
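After the runs complete, the measurements are collected with JUBE's standard post-processing commands; in the artifact this is presumably wrapped by overhead_result.sh, and the lines below only illustrate the underlying JUBE workflow (directory names match the -o arguments above):

# Collect the measurements and print the result table for the benchmark directory
jube analyse MUST-RMA/
jube result MUST-RMA/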

Software Architecture of RMASanitizer

RMASanitizer has been integrated as a tool within the MPI correctness checking framework MUST. It utilizes ThreadSanitizer for RMA race detection. The software components mentioned in Section 5 and Figure 9 of the paper can be found in the following directories:
