SGSeg: Enabling Text-free Inference in Language-guided Segmentation of Chest X-rays via Self-guidance
Shuchang Ye, Mingyuan Meng, Mingjian Li, Dagan Feng, Jinman Kim
TL;DR: The first language-guided segmentation framework enabling text-free inference.
This is a simplified implementation of SGSeg, where the localization-enhanced report generation module is replaced by a classification-based text synthesis module.
The main required dependencies and their versions are:
torch==2.0.1
pytorch_lightning==1.9.0
torchmetrics==1.3.0.post0
transformers==4.24.0
ultralytics==8.1.15
numpy==1.24.3
pandas==2.0.3
pillow==9.4.0
monai==1.0.1
einops==0.7.0
nltk==3.8.1
To evaluate the synthesized text, download the WordNet corpus, which NLTK's METEOR implementation relies on:
>>> import nltk
>>> nltk.download("wordnet")
- QaTa-COV19 Dataset (images & segmentation masks): see Kaggle: https://www.kaggle.com/datasets/aysendegerli/qatacov19-dataset
- QaTa-COV19 Text Annotations (from a third party): see the related content in LViT: https://github.com/HUANGLIZI/LViT
- The pre-processed and cleaned version of the QaTa dataset is available at ./data/QaTa
To evaluate the performance of our model:
- Specify the path of the pretrained model in the checkpoint_path parameter in config/training.yaml
- Run the evaluation:
python evaluate.py
The evaluation is conducted on the test set. The results are summarized in three tables:
- Segmentation metrics: loss, segmentation accuracy, Dice similarity coefficient, MIoU
- Detection metrics: detection accuracy, detection recall, detection precision, detection F1-score
- Text generation metrics: BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE, METEOR
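As a rough illustration of how several of the metrics listed above are commonly defined, the sketch below computes the Dice similarity coefficient, foreground IoU, and precision/recall/F1 from flat binary labels in plain Python. The function names (`confusion_counts`, `dice_and_iou`, `precision_recall_f1`) are hypothetical, and this is not the code used by evaluate.py, which may differ in averaging and smoothing details. MIoU would additionally average the IoU over classes.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives for 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def dice_and_iou(pred_mask, true_mask):
    """Dice = 2TP / (2TP + FP + FN); IoU = TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred_mask, true_mask)
    dice_denominator = 2 * tp + fp + fn
    iou_denominator = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    dice = 2 * tp / dice_denominator if dice_denominator else 1.0
    iou = tp / iou_denominator if iou_denominator else 1.0
    return dice, iou


def precision_recall_f1(pred_labels, true_labels):
    """Image-level detection scores from 0/1 predictions and ground truth."""
    tp, fp, fn = confusion_counts(pred_labels, true_labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Flattened toy masks: one pixel agrees, one is a false positive, one is missed.
    print(dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0]))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

Real masks are 2-D arrays; flattening them to a single list of pixels before calling these helpers gives the same counts.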
- To fine-tune our pretrained model, specify the path of the pretrained model in the checkpoint_path parameter in config/training.yaml. Or, to train our model from scratch, set the checkpoint_path parameter in config/training.yaml to None.
- Customize the following parameters in config/training.yaml to adjust the training process:
train_batch_size - the number of samples processed per training step (batch)
image_size - tuple of (H, W)
min_epochs - minimum number of training epochs (unaffected by the validation metric)
max_epochs - maximum number of training epochs
patience - the number of epochs to wait before stopping training if the validation metric has not improved
- Run
python train.py
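The interaction between min_epochs, max_epochs and patience can be sketched as a small simulation. This is an illustrative sketch only, not the training loop in train.py: the function name `epochs_trained` is hypothetical, and it assumes that a higher validation metric is better and that early stopping cannot end training before min_epochs.

```python
def epochs_trained(val_scores, min_epochs, max_epochs, patience):
    """Return how many epochs run before training stops.

    val_scores holds one validation metric per epoch (assumed: higher is better).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    completed = 0
    # zip() caps the loop at max_epochs (or at the end of val_scores).
    for epoch, score in zip(range(1, max_epochs + 1), val_scores):
        completed = epoch
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        # Early stopping only takes effect once min_epochs has been reached.
        if epochs_without_improvement >= patience and epoch >= min_epochs:
            break
    return completed


if __name__ == "__main__":
    scores = [0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
    print(epochs_trained(scores, min_epochs=1, max_epochs=10, patience=2))
    print(epochs_trained(scores, min_epochs=5, max_epochs=10, patience=2))
    print(epochs_trained(scores, min_epochs=1, max_epochs=3, patience=2))
```

With the toy scores above, the metric stops improving after the second epoch, so the three calls differ only in which limit ends the run first: patience, min_epochs, or max_epochs.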
- BERT model: download the pre-trained CXR-BERT and ConvNeXt models.
CXR-BERT-specialized: see https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized/tree/main
ConvNeXt-tiny: see https://huggingface.co/facebook/convnext-tiny-224/tree/main
Download the file 'pytorch_model.bin' from each page to './lib/BiomedVLP-CXR-BERT-specialized/' and './lib/convnext-tiny-224', respectively.
Or just use the models online:
from transformers import AutoModel, AutoTokenizer
url = "microsoft/BiomedVLP-CXR-BERT-specialized"
tokenizer = AutoTokenizer.from_pretrained(url, trust_remote_code=True)
model = AutoModel.from_pretrained(url, trust_remote_code=True)
Please set the bert_type parameter in config/training.yaml to the path of the BERT model (default: microsoft/BiomedVLP-CXR-BERT-specialized)
- RTDETR model: download the rtdetr-l model weights from https://github.com/ultralytics/assets/releases/download/v8.1.0/rtdetr-l.pt and place them as follows:
weights/rtdetr-l.pt
Otherwise, the weights are downloaded automatically when training or evaluating.
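The choice between the local weights file and the automatic download can be sketched as follows. This is a minimal sketch under assumptions: the helper `pick_rtdetr_weights` is hypothetical and not part of this repository, and it assumes that passing the bare file name to the ultralytics loader is what triggers the automatic download.

```python
from pathlib import Path


def pick_rtdetr_weights(local_path="weights/rtdetr-l.pt"):
    """Prefer a manually downloaded weights file; otherwise return the bare name.

    Assumption: the bare name "rtdetr-l.pt" lets ultralytics fetch the weights itself.
    """
    path = Path(local_path)
    if path.is_file():
        return str(path)
    return path.name


if __name__ == "__main__":
    print(pick_rtdetr_weights())
```

The returned string could then be passed to the `RTDETR` class of the `ultralytics` package, for example `RTDETR(pick_rtdetr_weights())`.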
We thank LViT for its contributions to integrating language into medical image segmentation and for providing annotated descriptive reports. We also thank LanGuideMedSeg for developing a simple yet efficient method for fusing text and images in UNet.
@InProceedings{10.1007/978-3-031-72111-3_23,
author="Ye, Shuchang
and Meng, Mingyuan
and Li, Mingjian
and Feng, Dagan
and Kim, Jinman",
title="Enabling Text-Free Inference in Language-Guided Segmentation of Chest X-Rays via Self-guidance",
booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024",
year="2024",
publisher="Springer Nature Switzerland",
address="Cham",
pages="242--252",
isbn="978-3-031-72111-3"
}