PL-Marker++

This repository contains the source code for PL-Marker++, an information extraction model for radiology reports used in our paper "A Novel Corpus of Annotated Medical Imaging Reports and Information Extraction Results Using BERT-based Language Models" (accepted at LREC-COLING 2024; repo: https://github.com/uw-bionlp/CAMIR).

Example input and output data are not included (they will be added upon IRB approval). PL-Marker++ is an augmented version of PL-Marker that additionally classifies the subtypes of extracted entities.

The original PL-Marker implementation can be found at https://github.com/thunlp/PL-Marker

Step 1. Download repo and required models

Download this repository
Unzip transformers.zip
Download all models from https://drive.google.com/drive/u/0/folders/1eyaqjrMNUJLxAIHxiYrqX4cCapxgZjPj and place them in the same directory
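The unzip step can also be scripted. The following is a minimal sketch using only Python's standard library; the helper name `unzip_archive` is hypothetical and not part of this repository, and it assumes transformers.zip sits in the directory you run it from.

```python
import zipfile
from pathlib import Path


def unzip_archive(archive, dest="."):
    """Extract `archive` into `dest` and return the sorted top-level names it contained.

    Hypothetical helper for illustration; not part of this repository.
    """
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError(f"{archive} not found; download the repository first")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        # Keep only the first path component of every archive member.
        return sorted({name.split("/")[0] for name in zf.namelist() if name})


# Example call (run from the repository root):
#   unzip_archive("transformers.zip")
```

The returned list is a quick way to confirm which top-level folder the archive produced before running the editable install in Step 2.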

Step 2. Create virtual environments

Create 2 separate Conda environments (both using python=3.8.18), one from mspert_req.txt (mspert) and one from plmarker_req.txt (plmarker).

conda create -n mspert_test python=3.8.18
conda activate mspert_test
pip install -r ./mspert_req.txt

conda create -n plmarker_test python=3.8.18
conda activate plmarker_test
pip install -r ./plmarker_req.txt
pip install --editable ./transformers

Step 3. Put radiology reports in "sample_data" folder

Input radiology reports should be placed in ./sample_data as .txt files
sample.txt is a randomly selected radiology report from MTSamples (open-source radiology reports)
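Before running the pipeline, it can help to confirm that the input folder contains what you expect. The sketch below lists the .txt files in ./sample_data; `list_reports` is a hypothetical helper written for this illustration, not a script shipped with the repository.

```python
from pathlib import Path


def list_reports(folder="./sample_data"):
    """Return the .txt files directly inside `folder`, sorted by name.

    Hypothetical helper for illustration; not part of this repository.
    """
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} does not exist")
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix == ".txt")


# Example call (run from the repository root):
#   for report in list_reports():
#       print(report.name)
```

Files with other extensions are simply left out of the returned list, which makes stray files easy to spot.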

Step 4. Run shell script

bash ./run_plmarker.sh
This shell script runs entity extraction, subtype extraction, and relation extraction.
The final output file, with entity, subtype, and relation information, is "./example_input_ent_pred_test_normalized_with_RE.json"
- Output with entity extraction only can be found in "./incidentaloma_models/PL-Marker-incidentaloma-bertbase-45/example_input_ent_pred_test.json"
- Output with entity and subtype extraction can be found in "./example_input_ent_pred_test_normalized.json"
All predictions are made at the sentence level.
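After the script finishes, the three output files listed above can be checked with a short script. This is a minimal sketch, not part of the repository: the helpers `load_records` and `summarize` are hypothetical, and because the output schema is not documented here, the code only assumes that each file is either a single JSON document or JSON Lines (one JSON value per line) and makes no assumption about field names.

```python
import json
from pathlib import Path

# Paths taken from the Step 4 description above.
OUTPUT_FILES = [
    "./incidentaloma_models/PL-Marker-incidentaloma-bertbase-45/example_input_ent_pred_test.json",
    "./example_input_ent_pred_test_normalized.json",
    "./example_input_ent_pred_test_normalized_with_RE.json",
]


def load_records(path):
    """Load a prediction file as a list of JSON values.

    Assumption: the file is one JSON document or JSON Lines.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(path):
    """Return a one-line status string for an output file."""
    path = Path(path)
    if not path.is_file():
        return f"{path}: missing"
    return f"{path}: {len(load_records(path))} record(s)"


# Example call (run from the repository root):
#   for output_file in OUTPUT_FILES:
#       print(summarize(output_file))
```

A "missing" line for the first file but not the others, or the reverse, indicates which stage of the pipeline did not complete.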

Contact

Namu Park (npark95@uw.edu)

