The following sections describe how to evaluate a given model on a given dataset. As an example, we use the config `iclrw/cough/v9.7/adamW_1e-4.yml`, but the steps hold for any other config. The configs used as part of our ICLRW submission are listed in the table below.
Given a model checkpoint and its corresponding config file, you can evaluate it on a given dataset:
- Checkpoint: `assets/models/iclrw/cough/v9.7/adamW_1e-4/checkpoints/113_ckpt.pth.tar`
- Corresponding config: `configs/experiments/iclrw/cough/v9.7/adamW_1e-4.yml`
- Dataset: `wiai-facility | v9.7 | test`
- Copy the model checkpoint into the appropriate output folder (run inside docker):

```bash
# copies from assets/models/ckpt_path/ to /output/experiments/ckpt_path/
ckpt_path=experiments/iclrw/cough/v9.7/adamW_1e-4/checkpoints/113_ckpt.pth.tar
python training/copy_model_ckpts.py -p $ckpt_path --dst_prefix experiments
```
- Run the forward pass and store metrics:

```bash
cfg=iclrw/cough/v9.7/adamW_1e-4.yml
python evaluation/inference.py -v $cfg -e 113 -dn wiai-facility -dv v9.7 -m test --at softmax -t 0.1317
```
The results are printed to the terminal, with the key metric being AUC-ROC. The threshold passed via `-t` (here `0.1317`) is this model's ILA threshold from the table below. Explanation of the args:
- `-v`: experiment version (config file)
- `-u`: the user who trained the model; no need to pass this when the config file is in the `configs/` folder
- `-e`: epoch/checkpoint number of the trained model
- `-dn`: dataset name
- `-dv`: dataset version (name of the stored `.yml` file)
- `-m`: mode, one of train/test/val
- `--at`: point of the outputs where aggregation is applied, e.g. after `softmax`
- `-t`: threshold at which the model is evaluated for the given mode
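For instance, if the config was trained under another user's account and is not present in your `configs/` folder, you would additionally pass `-u`. The username below is a placeholder:

```bash
# hypothetical username -- replace with the account that trained the model
python evaluation/inference.py -v $cfg -u some-user -e 113 -dn wiai-facility -dv v9.7 -m test --at softmax -t 0.1317
```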
| Model | Dataset | Config file | W&B link | Best val AUC/epoch/threshold | ILA threshold |
|---|---|---|---|---|---|
| Cough | V9.4 | `experiments/iclrw/cough/v9.4/adamW_1e-4_cyclic.yml` | Link | 0.6558/38/0.1565 | 0.2827 |
| Cough | V9.7 | `experiments/iclrw/cough/v9.7/adamW_1e-4.yml` | Link | 0.6293/113/0.06858 | 0.1317 |
| Cough | V9.8 | `experiments/iclrw/cough/v9.8/adamW_1e-4.yml` | Link | 0.789/47/0.1604 | 0.2170 |
| Context | V9.4 | `experiments/iclrw/context/v9.4/context-neural.yml` | Link | 0.6849/9/0.2339 | 0.2339 |
| Context | V9.7 | `experiments/iclrw/context/v9.7/context-neural.yml` | Link | 0.6054/31/0.2069 | 0.2069 |
| Context | V9.8 | `experiments/iclrw/context/v9.8/context-neural.yml` | Link | 0.6484/44/0.2282 | 0.2282 |
Note: the W&B links may not work for you since they are within the Wadhwani AI W&B account.
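For example, to evaluate the cough model trained on V9.8 with the values from the table above (checkpoint at epoch 47, ILA threshold 0.2170), after copying its checkpoint as in the steps earlier:

```bash
cfg=iclrw/cough/v9.8/adamW_1e-4.yml
python evaluation/inference.py -v $cfg -e 47 -dn wiai-facility -dv v9.8 -m test --at softmax -t 0.2170
```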
If you ran training with the following config, you can run evaluation as follows:
- Corresponding config: `configs/experiments/iclrw/cough/v9.7/adamW_1e-4.yml`
- Dataset: `wiai-facility | v9.7 | test`
- Run the forward pass and store metrics:

```bash
cfg=iclrw/cough/v9.7/adamW_1e-4.yml
python evaluation/inference.py -v $cfg -e 113 -dn wiai-facility -dv v9.7 -m test --at softmax
```
Note: here, you do not need to copy the checkpoint, since checkpoints are saved during training itself. You also do not need to explicitly pass `-t` (threshold), since it is picked up from the validation-set logs saved during training.
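For example, to evaluate the same checkpoint on the validation split instead of the test split, only the `-m` flag changes:

```bash
python evaluation/inference.py -v $cfg -e 113 -dn wiai-facility -dv v9.7 -m val --at softmax
```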
- Before running evaluation on an ensemble of predictions, you need to run inference for the individual models; follow the steps above.
- Create a meta config for ensembling the models. In this example, we ensemble a cough-based model and a context-based model with ensembling weights of 0.5 each:
```yaml
models:
  cough:
    version: experiments/iclrw/cough/v9.7/adamW_1e-4.yml
    epoch: 113
    user: null
    weight: 0.5
    agg_method: max
  context:
    version: experiments/iclrw/context/v9.7/context-neural.yml
    epoch: 31
    user: null
    weight: 0.5
    agg_method: max
data:
  mode: test
```
- Run the ensembling to see the result metrics:

```bash
python evaluation/ensemble.py -c experiments/iclrw/ensemble/cough_context_v9.7.yml
```
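For intuition, score-level ensembling with these weights amounts to a weighted average of the per-model scores. Below is a minimal Python sketch of that idea, not the actual `evaluation/ensemble.py` implementation; the score arrays and the use of scikit-learn are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-patient positive-class scores from the two models,
# already aggregated (agg_method, e.g. max over a patient's files) and softmax-ed.
cough_scores = np.array([0.80, 0.10, 0.55, 0.30])
context_scores = np.array([0.60, 0.20, 0.70, 0.10])
labels = np.array([1, 0, 1, 0])

# Weighted average of scores; the weights mirror the meta config above.
weights = {"cough": 0.5, "context": 0.5}
ensemble_scores = weights["cough"] * cough_scores + weights["context"] * context_scores

# The key metric reported by the pipeline is AUC-ROC.
print("AUC-ROC:", roc_auc_score(labels, ensemble_scores))

# Binarising at a threshold gives predictions at a chosen operating point.
threshold = 0.5  # placeholder; the pipeline derives thresholds from validation logs
predictions = (ensemble_scores >= threshold).astype(int)
print("Predictions:", predictions)
```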