
DLRM v1 Training

Best known configurations for DLRM v1 training with Intel® Extension for PyTorch.

Model Information

| Use Case | Framework | Model Repo                                | Branch/Commit/Tag | Optional Patch |
|----------|-----------|-------------------------------------------|-------------------|----------------|
| Training | PyTorch   | https://github.com/facebookresearch/dlrm  | -                 | -              |

Pre-Requisite

  • Installation of PyTorch and Intel Extension for PyTorch

  • Build and installation of PyTorch + IPEX + TorchVision, Jemalloc and TCMalloc

  • Installation of oneccl-bind-pt (if running distributed)

  • Set jemalloc and tcmalloc preload for better performance

    Both jemalloc and tcmalloc should be built as described in the General setup section.

    export LD_PRELOAD="<path to the jemalloc directory>/lib/libjemalloc.so":"<path to the tcmalloc directory>/lib/libtcmalloc.so":$LD_PRELOAD
    export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"

  • Set IOMP preload for better performance

    pip install packaging intel-openmp
    export LD_PRELOAD=<path to the intel-openmp directory>/lib/libiomp5.so:$LD_PRELOAD

  • Set ENV to use fp16 AMX if you are using a supported platform

    export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX_FP16
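
As a quick sanity check after completing the steps above, a snippet like the following (an illustrative sketch, not part of the official setup) confirms that PyTorch and Intel Extension for PyTorch import cleanly and that the preload libraries are in place:

    # Verify torch and IPEX are importable and print their versions
    python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"
    # LD_PRELOAD should list libjemalloc.so, libtcmalloc.so and libiomp5.so
    echo $LD_PRELOAD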

Prepare Dataset

Criteo Terabyte Dataset

The Criteo Terabyte Dataset is used to run DLRM. To download the dataset, you will need to visit the Criteo website and accept their terms of use: https://labs.criteo.com/2013/12/download-terabyte-click-logs/. Copy the download URL into the command below as the <download url> and replace <dir/to/save/dlrm_data> with the path where you want to download and save the dataset.

export DATASET_DIR=<dir/to/save/dlrm_data>
mkdir ${DATASET_DIR} && cd ${DATASET_DIR}
curl -O <download url>/day_{$(seq -s , 0 23)}.gz
gunzip day_*.gz

The raw data will be automatically preprocessed and saved as day_*.npz files in DATASET_DIR the first time DLRM is run. On subsequent runs, the scripts will automatically use the preprocessed data.
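
Before kicking off training, a quick check along these lines (an illustrative sketch, not part of the official scripts) confirms the download and extraction completed:

    # All 24 day files should be present after gunzip
    ls ${DATASET_DIR}/day_* | wc -l   # expect 24 (day_0 through day_23)
    # The uncompressed logs are on the order of 1 TB, so confirm disk headroom
    du -sh ${DATASET_DIR}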

Training

  1. git clone https://github.com/IntelAI/models.git
  2. cd models/models_v2/pytorch/dlrm/training/cpu
  3. Create virtual environment venv and activate it:
    python3 -m venv venv
    . ./venv/bin/activate
    
  4. Install general model requirements
    pip install -r requirements.txt
    
  5. Install the latest CPU versions of torch, torchvision and intel_extension_for_pytorch.
  6. Setup required environment parameters

| Parameter | export command |
|-----------|----------------|
| DISTRIBUTED (leave unset if training single-node) | export DISTRIBUTED=true |
| NODE (leave unset if training single-node) | export NODE=2 |
| NUM_CCL_WORKER (leave unset if training single-node) | export NUM_CCL_WORKER=4 |
| HOSTFILE (leave unset if training single-node) | export HOSTFILE=<your host file> |
| OUTPUT_DIR | export OUTPUT_DIR=$PWD |
| DATASET_DIR | export DATASET_DIR=<path-to-dlrm_data> or <path-to-preprocessed-data> |
| BATCH_SIZE | export BATCH_SIZE=10000 |
| PRECISION | export PRECISION=fp32 <specify the precision to run: fp32, bf32 or bf16> |
| NUM_BATCH | export NUM_BATCH=<10000 for test performance and 50000 for testing convergence trend> |
| (optional) Compile model with PyTorch Inductor backend | export TORCH_INDUCTOR=1 |
  7. Run run_model.sh. A consolidated single-node example is sketched below.
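
Putting the table above together, a single-node fp32 performance run might look like the following sketch (DISTRIBUTED, NODE, NUM_CCL_WORKER and HOSTFILE stay unset for single-node training; the values below are the ones suggested in the table, not the only valid choices):

    # Illustrative single-node run with the suggested defaults
    export OUTPUT_DIR=$PWD
    export DATASET_DIR=<path-to-dlrm_data>
    export BATCH_SIZE=10000
    export PRECISION=fp32
    export NUM_BATCH=10000   # 10000 for performance; 50000 to watch the convergence trend
    bash run_model.sh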

Output

Single-node output will typically look like:

accuracy 76.215 %, best 76.215 %
dlrm_inf latency:  0.11193203926086426  s
dlrm_inf avg time:  0.007462135950724284  s, ant the time count is : 15
dlrm_inf throughput:  4391235.996821996  samples/s

Final results of the training run can be found in the results.yaml file.

results:
 - key: throughput
   value: 4391236.0
   unit: inst/s
 - key: latency
   value: 0.007462135950724283
   unit: s
 - key: accuracy
   value: 76.215
   unit: accuracy
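
To consume these metrics programmatically, a one-liner like this (a sketch assuming results.yaml is written to ${OUTPUT_DIR} with the layout shown above) pulls out the throughput value:

    # Extract the throughput value from results.yaml
    grep -A1 "key: throughput" ${OUTPUT_DIR}/results.yaml | awk '/value/ {print $2}'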