cyanguwa/DeepLearningProfiling

DeepLearningProfiling

This repository includes three components:

Scripts for profiling, post-processing and Roofline plotting are added on top of the original repositories. Some of the profiling scripts are based on:

The new hierarchical Roofline methodology, based on Nsight Compute from CUDA 11, collects the following metrics:

  • Time: sm__cycles_elapsed.avg / sm__cycles_elapsed.avg.per_second
  • FLOPs:
      sm__sass_thread_inst_executed_op_dadd_pred_on.sum
      + 2 x sm__sass_thread_inst_executed_op_dfma_pred_on.sum
      + sm__sass_thread_inst_executed_op_dmul_pred_on.sum
      + sm__sass_thread_inst_executed_op_fadd_pred_on.sum
      + 2 x sm__sass_thread_inst_executed_op_ffma_pred_on.sum
      + sm__sass_thread_inst_executed_op_fmul_pred_on.sum
      + sm__sass_thread_inst_executed_op_hadd_pred_on.sum
      + 2 x sm__sass_thread_inst_executed_op_hfma_pred_on.sum
      + sm__sass_thread_inst_executed_op_hmul_pred_on.sum
      + 512 x sm__inst_executed_pipe_tensor.sum
  • Bytes: dram__bytes.sum (device memory), lts__t_bytes.sum (L2 cache), and l1tex__t_bytes.sum (L1 cache)
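
The formulas above can be sketched in Python as follows. This is a minimal illustration, not the code in the post-processing notebooks: it assumes the metrics for one kernel have already been parsed into a plain dictionary mapping metric name to numeric value. The function names (`kernel_time`, `kernel_flops`, `roofline_point`) and the level labels `"DRAM"`, `"L2"` and `"L1"` are hypothetical and chosen only for this example.

```python
# Hypothetical helper names; only the metric names and weights come from the list above.
_BASE = "sm__sass_thread_inst_executed_op_"

# Weight applied to each metric when counting FLOPs:
# add and mul count as 1, fma counts as 2, a tensor pipe instruction as 512.
FLOP_WEIGHTS = {}
for precision in ("d", "f", "h"):  # double, single, half
    FLOP_WEIGHTS[f"{_BASE}{precision}add_pred_on.sum"] = 1
    FLOP_WEIGHTS[f"{_BASE}{precision}fma_pred_on.sum"] = 2
    FLOP_WEIGHTS[f"{_BASE}{precision}mul_pred_on.sum"] = 1
FLOP_WEIGHTS["sm__inst_executed_pipe_tensor.sum"] = 512

# One byte counter per memory level (labels are assumptions for this sketch).
BYTE_METRICS = {
    "DRAM": "dram__bytes.sum",
    "L2": "lts__t_bytes.sum",
    "L1": "l1tex__t_bytes.sum",
}


def kernel_time(metrics):
    """Run time in seconds: elapsed cycles divided by cycles per second."""
    return metrics["sm__cycles_elapsed.avg"] / metrics["sm__cycles_elapsed.avg.per_second"]


def kernel_flops(metrics):
    """Weighted sum of the instruction counters; missing counters count as 0."""
    return sum(weight * metrics.get(name, 0) for name, weight in FLOP_WEIGHTS.items())


def roofline_point(metrics):
    """Return time, FLOPs, FLOP/s and arithmetic intensity per memory level."""
    time_s = kernel_time(metrics)
    flops = kernel_flops(metrics)
    intensity = {
        level: flops / metrics[name]
        for level, name in BYTE_METRICS.items()
        if metrics.get(name)  # skip levels with no (or zero) byte count
    }
    return {
        "time_s": time_s,
        "flops": flops,
        "flops_per_s": flops / time_s,
        "arithmetic_intensity": intensity,
    }
```

Calling `roofline_point()` on one kernel's metrics gives a FLOP/s value and one arithmetic intensity (FLOPs per byte) for each memory level, which are the coordinates of a point on a hierarchical Roofline chart.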

Some notes on the file structure:

  • TF-xxx and PT-xxx contain the results for all 12 metrics for different configurations
  • code-xxx contains the source code, same as the original repository or modified
  • scripts-xxx contains the job scripts run on Cori, for different configurations
  • genJobScripts-xx.ipynb generates the Slurm job scripts for the scripts-xxx folders
  • plotRoofline.py contains the roofline() function for plotting
  • postprocess-xx.ipynb processes the results in the TF-xxx and PT-xxx folders and puts them into the relevant Pandas DataFrames, for example dfcsfw and dfcsbw
