DeepLearningProfiling

This repository includes three components:

Scripts for profiling, post-processing and Roofline plotting are added on top of the original repositories. Some of the profiling scripts are based on:

The new hierarchical Roofline methodology, based on Nsight Compute from CUDA 11, derives time, FLOPs and bytes from the following metrics:

  • Time: sm__cycles_elapsed.avg / sm__cycles_elapsed.avg.per_second
  • FLOPs:
    sm__sass_thread_inst_executed_op_dadd_pred_on.sum
    + 2 x sm__sass_thread_inst_executed_op_dfma_pred_on.sum
    + sm__sass_thread_inst_executed_op_dmul_pred_on.sum
    + sm__sass_thread_inst_executed_op_fadd_pred_on.sum
    + 2 x sm__sass_thread_inst_executed_op_ffma_pred_on.sum
    + sm__sass_thread_inst_executed_op_fmul_pred_on.sum
    + sm__sass_thread_inst_executed_op_hadd_pred_on.sum
    + 2 x sm__sass_thread_inst_executed_op_hfma_pred_on.sum
    + sm__sass_thread_inst_executed_op_hmul_pred_on.sum
    + 512 x sm__inst_executed_pipe_tensor.sum
  • Bytes: dram__bytes.sum (device memory), lts__t_bytes.sum (L2 cache), and l1tex__t_bytes.sum (L1 cache), one for each level of the memory hierarchy
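
The formulas above can be sketched in Python as follows. This is a minimal illustration, not code from this repository: the function names (`kernel_time`, `kernel_flops`, `roofline_point`), the level labels `HBM`, `L2` and `L1`, and the numbers in `example` are all assumptions made for the example. It takes a dictionary that maps Nsight Compute metric names to values for one kernel and returns the run time, the FLOP count, and the arithmetic intensity at each memory level.

```python
# Weight applied to each FLOP-counting metric, copied from the formula above:
# add and mul count once, fma counts twice, a tensor instruction counts 512.
FLOP_WEIGHTS = {
    "sm__sass_thread_inst_executed_op_dadd_pred_on.sum": 1,
    "sm__sass_thread_inst_executed_op_dfma_pred_on.sum": 2,
    "sm__sass_thread_inst_executed_op_dmul_pred_on.sum": 1,
    "sm__sass_thread_inst_executed_op_fadd_pred_on.sum": 1,
    "sm__sass_thread_inst_executed_op_ffma_pred_on.sum": 2,
    "sm__sass_thread_inst_executed_op_fmul_pred_on.sum": 1,
    "sm__sass_thread_inst_executed_op_hadd_pred_on.sum": 1,
    "sm__sass_thread_inst_executed_op_hfma_pred_on.sum": 2,
    "sm__sass_thread_inst_executed_op_hmul_pred_on.sum": 1,
    "sm__inst_executed_pipe_tensor.sum": 512,
}

# Byte-counting metric per memory level (the level labels are illustrative).
BYTE_METRICS = {
    "HBM": "dram__bytes.sum",
    "L2": "lts__t_bytes.sum",
    "L1": "l1tex__t_bytes.sum",
}


def kernel_time(metrics):
    """Run time in seconds: elapsed cycles divided by cycles per second."""
    return (metrics["sm__cycles_elapsed.avg"]
            / metrics["sm__cycles_elapsed.avg.per_second"])


def kernel_flops(metrics):
    """Weighted sum of the instruction counts; missing metrics count as 0."""
    return sum(w * metrics.get(name, 0) for name, w in FLOP_WEIGHTS.items())


def roofline_point(metrics):
    """Time, FLOPs, FLOP/s, and arithmetic intensity (FLOPs/byte) per level."""
    t = kernel_time(metrics)
    flops = kernel_flops(metrics)
    point = {"time_s": t, "flops": flops, "flops_per_s": flops / t}
    for level, name in BYTE_METRICS.items():
        nbytes = metrics.get(name, 0)
        # Guard against a level that reported no traffic.
        point["ai_" + level] = flops / nbytes if nbytes else float("inf")
    return point


# Made-up values for a single kernel, for illustration only.
example = {
    "sm__cycles_elapsed.avg": 1000.0,
    "sm__cycles_elapsed.avg.per_second": 1.0e9,
    "sm__sass_thread_inst_executed_op_ffma_pred_on.sum": 100,
    "sm__sass_thread_inst_executed_op_fadd_pred_on.sum": 50,
    "sm__inst_executed_pipe_tensor.sum": 2,
    "dram__bytes.sum": 400,
    "lts__t_bytes.sum": 800,
    "l1tex__t_bytes.sum": 1600,
}
point = roofline_point(example)
```

Each kernel then yields one point per memory level, with `ai_<level>` on the x-axis and `flops_per_s` on the y-axis of a hierarchical Roofline chart.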

Some notes on the file structure:

  • TF-xxx and PT-xxx contain the results for all 12 metrics for different configurations
  • code-xxx contains the source code, either unchanged from the original repository or modified
  • scripts-xxx contains the job scripts run on Cori, for different configurations
  • genJobScripts-xx.ipynb generates the Slurm job scripts for the scripts-xxx folders
  • plotRoofline.py contains the roofline() function for plotting
  • postprocess-xx.ipynb processes the results in the TF-xxx and PT-xxx folders and puts them in the relevant pandas DataFrames (dfcsfw and dfcsbw, for example)