We provide the code and scripts to replicate the Lotus experiment results on the CloudLab testbed, using a c4130 node available in the Wisconsin cluster.

The following experiments target an Intel processor with 4x V100 GPUs. They are performed on the ImageNet dataset for the image classification task. We focus on a single configuration because the same process applies to every other configuration. Note that the figures generated by these experiments correspond to one configuration of the figures in the paper.

We have set up the software dependencies (CUDA, cuDNN, Intel VTune, Anaconda) and the ImageNet dataset on the c4130 node. The instructions for doing so are in the SETUP.md file.

Installation steps

Clone this repository

# Clone into the work directory below because some scripts use absolute paths
cd /mydata/iiswc24
# The command below will take some time
git clone --depth 1 --recurse-submodules https://github.com/rajveerb/lotus.git -b iiswc24ae
cd lotus

Create a conda environment

conda create -n lotus python=3.10 -y
conda activate lotus

Install itt-python using build instructions below:

pushd code/itt-python
export ITT_LIBRARY_DIR=/opt/intel/oneapi/vtune/latest/lib64/
export ITT_INCLUDE_DIR=/opt/intel/oneapi/vtune/latest/include
python setup.py install
# Check if installed
pip list | grep "itt"
popd

Install PyTorch (LotusTrace):

sudo apt install -y g++
bash install_lotustrace.sh
# Sanity check
pip list | grep "torch" | grep "2.0.0a0"

Install torchvision:

bash install_torchvision.sh
# Sanity check
pip list | grep "torchvision" | grep "0.15.1a0"

Install the packages below:

conda install ipykernel pandas=2.0.3 -y
pip install matplotlib==3.9.0 natsort==8.4.0 seaborn==0.13.2
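
The individual `pip list | grep` checks above can also be run in one pass. The sketch below is a hypothetical helper, not part of the repository: it compares installed distributions against the versions named in the install commands, and it assumes that prefix matching is appropriate because the source builds of torch and torchvision may carry a suffix after `2.0.0a0` and `0.15.1a0`.

```python
from importlib import metadata

# Versions copied from the install commands and sanity checks above.
# They are treated as prefixes (an assumption), since source builds may append a suffix.
EXPECTED = {
    "torch": "2.0.0a0",
    "torchvision": "0.15.1a0",
    "pandas": "2.0.3",
    "matplotlib": "3.9.0",
    "natsort": "8.4.0",
    "seaborn": "0.13.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Return {package: found_version_or_None} for every package that does not match."""
    mismatches = {}
    for name, prefix in expected.items():
        found = lookup(name)
        if found is None or not found.startswith(prefix):
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, found in problems.items():
        print(f"{name}: expected {EXPECTED[name]}*, found {found}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated `lotus` conda environment; any package it lists is either missing or at a different version than the one used in this guide.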

Experiment steps

Get the mapping logs for the preprocessing operations:

# Activate VTune; this command fails with an error if VTune is already activated
source /opt/intel/oneapi/setvars.sh
# Sanity check
vtune --version
bash code/image_classification/LotusMap/Intel/LotusMap.sh

Generate the JSON file with the mapping information by running all cells in code/image_classification/LotusMap/Intel/logsToMapping.ipynb.

You have now obtained the mapping (mapping_funcs.json) using LotusMap (Table 1)!

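
To confirm that the notebook actually wrote a usable file, a quick load is enough. The sketch below makes no assumption about the schema of mapping_funcs.json; it only reports the top-level JSON type and entry count. The path is taken from the `--mapping_file` argument used in the Fig 5 (c) step, and `summarize_mapping` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Path as passed to --mapping_file in the Fig 5 (c) command of this guide.
MAPPING_FILE = Path("code/image_classification/LotusMap/Intel/mapping_funcs.json")


def summarize_mapping(path):
    """Load a JSON file and return (top-level type name, number of top-level entries)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    if MAPPING_FILE.exists():
        kind, size = summarize_mapping(MAPPING_FILE)
        print(f"{MAPPING_FILE}: top-level {kind} with {size} entries")
    else:
        print(f"{MAPPING_FILE} not found; run all cells of the notebook first")
```
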
Run the image classification pipeline experiment, which varies the batch size and the number of GPUs with LotusTrace enabled:
```
bash scripts/cloudlab/LotusTrace_imagenet.sh
```
Note: the number of DataLoader workers equals the number of GPUs in this experiment.

Run the commands below to reproduce the observations in "High variance in Preprocessing Time" (Fig 4 (a)) and the accompanying statistics:

python code/image_classification/analysis/LotusTrace_imagenet_vary_batch_and_gpu/preprocessing_time_stats.py\
 --remove_outliers\
 --data_dir lotustrace_result/b512_gpu4/\
 --output_file lotustrace_result/preprocessing_time_stats.log
python code/image_classification/analysis/LotusTrace_imagenet_vary_batch_and_gpu/box_plot_preprocessing_time.py\
 --remove_outliers\
 --data_dir lotustrace_result/b512_gpu4\
 --output_file lotustrace_result/box_plot_preprocessing_time.png
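
This guide does not describe which rule `--remove_outliers` applies. As an illustration of what such a step typically does, the sketch below filters a list of durations with the common 1.5 x IQR rule and then summarizes the rest. The rule, the function names, and the sample numbers are all assumptions made for the example; they are not taken from the Lotus scripts.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (a common rule, assumed here)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def summarize(values):
    """Return basic statistics for a non-empty list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical per-batch preprocessing times; units are arbitrary.
    times = [10, 11, 12, 13, 14, 15, 16, 17, 1000]
    print("raw     :", summarize(times))
    print("filtered:", summarize(remove_outliers_iqr(times)))
```

Comparing the raw and filtered summaries shows how strongly a single extreme value can inflate the mean and standard deviation, which is why the flag matters when reading the reported variance.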

Run the command below to reproduce the observations in "Significant wait time" (Fig 4 (b), (c)) and the accompanying statistics:

python code/image_classification/analysis/LotusTrace_imagenet_vary_batch_and_gpu/delay_and_wait_time_stats_and_plot.py\
 --sort_criteria duration\
 --data_dir lotustrace_result/b512_gpu4\
 --fig_dir lotustrace_result/figures\
 --output_file lotustrace_result/delay_and_wait_time_stats_and_plot.log

Run the visualization script for Fig 2:

python code/visualize_LotusTrace/visualization_augmenter.py\
 --coarse\
 --lotustrace_trace_dir lotustrace_result/b512_gpu4\
 --custom_log_prefix lotustrace_log\
 --output_lotustrace_viz_file lotustrace_result/viz_file.lotustrace

Open the file in the Chrome trace viewer: navigate to chrome://tracing in Google Chrome, load viz_file.lotustrace, and inspect the trace.
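
If the viewer shows nothing or rejects the file, it can help to check that the file parses at all. The sketch below assumes the .lotustrace file follows Chrome's Trace Event Format in its JSON form, either a top-level array of events or an object with a `traceEvents` key. That assumption is inferred from the file being opened in chrome://tracing; the helper names are hypothetical.

```python
import json
import os
from collections import Counter

# Output path from the visualization command above.
VIZ_FILE = "lotustrace_result/viz_file.lotustrace"


def load_trace_events(path):
    """Return the list of events from a Trace Event Format JSON file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        # Object form: events live under the "traceEvents" key.
        return data.get("traceEvents", [])
    # Array form: the file itself is the list of events.
    return data


def count_by_phase(events):
    """Count events by their "ph" (phase) field, e.g. "X" for complete events."""
    return Counter(ev.get("ph", "?") for ev in events if isinstance(ev, dict))


if __name__ == "__main__":
    if os.path.exists(VIZ_FILE):
        events = load_trace_events(VIZ_FILE)
        print(f"{len(events)} events; by phase: {dict(count_by_phase(events))}")
    else:
        print(f"{VIZ_FILE} not found; run the visualization script first")
```

Note that the viewer also tolerates an array without its closing bracket, which `json.load` does not, so a parse error here does not always mean the viewer will fail.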

Run the commands below to execute the image classification pipeline and generate the hardware performance numbers for Fig 5:
```
source /opt/intel/oneapi/setvars.sh
bash scripts/cloudlab/LotusTrace_imagenet_vtune.sh
```
Follow the steps below to export the hardware performance numbers to a CSV file (this must be done manually):
```
# The command below prints a link; open it in a browser and log in to the VTune GUI (set any password you like)
vtune-backend --web-port 8080 --data-directory ./vtune_mem_access_vary_dataloader/b1024_gpu4_dataloader20
```
- Navigate to the Microarchitecture Exploration tab
- Group by Source Function / Function / Call Stack
- Select all cells, copy them, and paste them into a CSV file named code/image_classification/analysis/combine_lotus/lotustrace_uarch/b1024_gpu4_dataloader20.csv
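
Because this export is a manual copy and paste, a quick read-only check of the pasted file can catch truncated or misaligned rows before plotting. The sketch below guesses the delimiter from the header line (tab or comma; which one the VTune GUI produces is not stated here, so both are handled) and reports the row count and the distinct row widths. It does not modify the file, and `inspect_table` is a hypothetical helper.

```python
import csv
import io
import os

# Target file named in the last step of the list above.
UARCH_CSV = (
    "code/image_classification/analysis/combine_lotus/"
    "lotustrace_uarch/b1024_gpu4_dataloader20.csv"
)


def inspect_table(path):
    """Return the guessed delimiter, number of non-empty rows, and distinct row widths."""
    with open(path, newline="", encoding="utf-8") as fh:
        text = fh.read()
    first_line = text.splitlines()[0] if text.strip() else ""
    # Pick whichever of tab or comma appears more often in the header line.
    delimiter = "\t" if first_line.count("\t") > first_line.count(",") else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]
    widths = sorted({len(row) for row in rows})
    return {"delimiter": delimiter, "rows": len(rows), "widths": widths}


if __name__ == "__main__":
    if os.path.exists(UARCH_CSV):
        print(inspect_table(UARCH_CSV))
    else:
        print(f"{UARCH_CSV} not found; paste the VTune cells into it first")
```

A single value in `widths` means every row has the same number of columns; more than one value usually points to a partial selection or a stray line break in the paste.
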
Plot Fig 5 (a) by running the code/image_classification/analysis/combine_lotus/elapsed_time_plot.ipynb notebook. The plot appears at the bottom of the notebook.

Plot Fig 5 (b) by running the code/image_classification/analysis/combine_lotus/per_python_func_plot_vary_dataloaders.ipynb notebook. The plot appears at the bottom of the notebook.

Plot Fig 5 (c) by running the command below:

python code/image_classification/analysis/combine_lotus/hw_event_analyzer.py\
 --mapping_file code/image_classification/LotusMap/Intel/mapping_funcs.json\
 --uarch_dir code/image_classification/analysis/combine_lotus/lotustrace_uarch\
 --combined_hw_events code/image_classification/analysis/combine_lotus/combined_lotustrace_uarch.csv\
 --cpp_hw_events_plot_dir code/image_classification/analysis/combine_lotus/cpp_hw_events_figs

Check out the code/image_classification/analysis/combine_lotus/cpp_hw_events_figs directory for the plots.

Plot Fig 5 (e)-(h) by running the code/image_classification/analysis/combine_lotus/c_to_python_analyser.ipynb notebook. The plots are saved in the code/image_classification/analysis/combine_lotus/mapped_python_figs directory.

That completes the LotusTrace experiment on the ImageNet dataset for the image classification task!
