conformal-alignment

This repo implements the Conformal Alignment procedure for question answering and radiology report generation. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. It is guaranteed that, on average, a prescribed fraction of the selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution.
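
The selection step can be summarized as follows (a minimal Python sketch with assumed variable names, not the repo's exact code): conformal p-values are computed against the calibration units whose outputs fail the alignment criterion, and a Benjamini-Hochberg step picks the data-dependent threshold so that the FDR stays below a target level alpha.

import numpy as np

def conformal_select(cal_scores, cal_aligned, test_scores, alpha=0.1):
    # cal_scores: predicted alignment scores on calibration units (assumed names)
    # cal_aligned: boolean ground-truth alignment status of calibration units
    # test_scores: predicted alignment scores on new units
    null_scores = cal_scores[~cal_aligned]  # calibration units that are NOT aligned
    n = len(null_scores)
    # conformal p-value: how extreme each test score is among the null scores
    pvals = np.array([(1 + np.sum(null_scores >= s)) / (n + 1) for s in test_scores])
    # Benjamini-Hochberg at level alpha over the m test p-values
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
    selected = np.zeros(m, dtype=bool)
    if passed.size > 0:
        selected[order[:passed[-1] + 1]] = True
    return selected  # True = output certified as trustworthy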

Answer generation in question answering and the computation of confidence/uncertainty scores follow the implementation in https://github.com/zlin7/UQ-NLG.
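
For intuition, one common uncertainty score in UQ-NLG-style pipelines is the length-normalized predictive entropy over answers sampled for the same question; the sketch below is illustrative (the input format is an assumption, not the repo's exact data layout).

import numpy as np

def predictive_entropy(seq_log_probs):
    # seq_log_probs: list of (total_log_prob, num_tokens) pairs,
    # one per sampled answer to the same question (assumed format)
    normalized = [lp / n_tok for lp, n_tok in seq_log_probs]
    # higher value = the model is more uncertain about its answer
    return -float(np.mean(normalized))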

Question answering (qa/)

Dataset and LLM preparation

This repo supports the TriviaQA and CoQA datasets, which are prepared in qa/pipeline/generate.py. The two large language models (LLMs) used in the paper are OPT-13B and LLaMA-2-13B-chat.

Answer generation

Specify the LLM and dataset in use, e.g., model='llama-2-13b-chat-hf' and dataset='triviaqa', then use the following command to generate answers batch by batch (idx is the index of each batch).

python3 -m pipeline.generate --model $model --dataset $dataset --batch_size 20 --idx $SGE_TASK_ID
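
Each array-job task generates answers for one batch. As a rough illustration (a hypothetical helper, assuming a 1-based $SGE_TASK_ID; the actual slicing lives in pipeline/generate.py), the batch index maps to a slice of the dataset like this:

def batch_slice(idx, batch_size, n_total):
    # idx: 1-based batch index from --idx (e.g. $SGE_TASK_ID)
    start = (idx - 1) * batch_size
    end = min(start + batch_size, n_total)
    return range(start, end)  # dataset rows handled by this task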

Score extraction

After the generation step, use the following command to obtain self-evaluation scores and uncertainty/confidence scores.

python3 -m dataeval.load_run --batch_size $bsize --data $data --model $model --idx $SGE_TASK_ID

Conformal Alignment implementation

The script _fdr.py implements the Conformal Alignment procedure and computes the false discovery rate (FDR) and power of the selection.

python3 -m _fdr.py --data "triviaqa" --model "llama-2-13b-chat-hf" --N 2000 --split_pr 0.5 --split_pr_tune 0.2
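
For reference, the two reported metrics for a single run can be computed as below (a sketch with assumed variable names; _fdr.py averages these over repeated random splits):

import numpy as np

def fdr_and_power(selected, aligned):
    # selected: boolean array, units certified by Conformal Alignment
    # aligned: boolean array, ground-truth alignment status
    n_selected = selected.sum()
    fdr = (selected & ~aligned).sum() / max(n_selected, 1)      # false discoveries among selections
    power = (selected & aligned).sum() / max(aligned.sum(), 1)  # aligned units that get selected
    return fdr, power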

notebook/run_qa.ipynb reproduces figures and examples in the paper.

Chest X-ray (CXR) report generation (cxr/)

Dataset and LLM preparation

CXR image preprocessing and vision-language model fine-tuning in the notebook cxr/vlm_finetune.ipynb follow the implementation in Conformal Language Modeling, in which a Vision Transformer (ViT) pretrained on ImageNet-21k serves as the image encoder and GPT2 as the text decoder.
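
In Hugging Face transformers, pairing such an encoder and decoder looks roughly like the sketch below (the checkpoints named here are illustrative; the actual fine-tuning is in cxr/vlm_finetune.ipynb):

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, GPT2TokenizerFast

# pair a ViT encoder (pretrained on ImageNet-21k) with a GPT2 decoder
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # image encoder
    "gpt2",                               # text decoder
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# GPT2 has no dedicated pad token; reuse EOS and tell the model how to start decoding
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id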

The MIMIC-CXR dataset requires credentialed access; see the PhysioNet project page.

Report generation

After specifying the fine-tuned model (model='trained') and dataset (data='cxr'), use the following commands to generate and concatenate outputs (bnum is the number of batches and bsize is the batch size).

python3 -m pipeline.generate --idx $SGE_TASK_ID --batch_size $bsize
python3 -m pipeline.generate_encode --num_batch $bnum
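
Conceptually, the concatenation step merges the per-batch generation files into a single file; the sketch below uses hypothetical file names and formats, not the repo's actual layout:

import pickle

num_batches = 10  # e.g. the value passed as $bnum
outputs = []
for b in range(1, num_batches + 1):
    with open(f"outputs/batch_{b}.pkl", "rb") as f:  # hypothetical per-batch file
        outputs.extend(pickle.load(f))
with open("outputs/all_generations.pkl", "wb") as f:  # hypothetical merged file
    pickle.dump(outputs, f)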

Score extraction

After the generation step, use the following command to obtain self-evaluation scores and uncertainty/confidence scores.

python3 -m dataeval.load_run --idx $SGE_TASK_ID --batch_size $bsize --data $data --model $model

Conformal Alignment implementation

As in the QA task, the script _fdr.py implements the Conformal Alignment procedure and computes the FDR and power.

python3 -m _fdr.py --data "cxr" --model "trained" --N 2000 --split_pr 0.5 --split_pr_tune 0.2

notebook/run_cxr.ipynb presents examples of report generation using the fine-tuned model and also reproduces figures and examples in the paper.
