This repository contains source code for PL-Marker++, information extraction model for radiology reports used in our paper "A Novel Corpus of Annotated Medical Imaging Reports and Information Extraction Results Using BERT-based Language Models" (Accepted @ LREC-COLING2024, repo: https://github.com/uw-bionlp/CAMIR).
Example input, output data are not included (will be added upon IRB approval). PL-Marker++, which is the augmented version of PL-Marker, provides the classification of subtypes for extracted entities.
Original PL-Marker implementation can be found at https://github.com/thunlp/PL-Marker
- Download current directory
- Unzip transformers.zip
- Download all models from https://drive.google.com/drive/u/0/folders/1eyaqjrMNUJLxAIHxiYrqX4cCapxgZjPj and put in the same directory
- Create 2 seperate Conda environments (both using python=3.8.18) using mspert_req.txt (mspert) and plmarker_req.txt (plmarker). conda create -n mspert_test python=3.8.18 conda activate mspert_test pip install -r ./mspert_req.txt
conda create -n plmarker_test python=3.8.18 conda activate plmarker_test pip install -r ./plmarker_req.txt pip install --editable ./transformers
- Input radiology reports should be located in ./sample_data using .txt file format
- sample.txt is randomly selected from mtsamples radiology report (open-source radiology reports)
- bash ./run_plmarker.sh -> This shell script includes entity extraction, subtype extraction and relation extraction.
- Final output file with entity, subtype and relation information is "./example_input_ent_pred_test_normalized_with_RE.json"
- Output with only entity extraction can be found in "./incidentaloma_models/PL-Marker-incidentaloma-bertbase-45/example_input_ent_pred_test.json"
- Output with entity+subtype extraction can be found in "./example_input_ent_pred_test_normalized.json"
- All predictions are performed in sentence-level.
- Namu Park (npark95@uw.edu)