Official repository of "Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data"

RichardObi/mammo_dp

In MICCAI 2024 Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care.

Getting Started

Datasets

If you prefer to use our processed dataset of extracted malignant and benign masses directly, you can find our train, validation, and test datasets in dataset16062024.

If you would like to set up your own data processing pipeline, you can find the CBIS-DDSM Dataset used in this study on The Cancer Imaging Archive (TCIA). The Breast Cancer Digital Repository (BCDR) Dataset, which was used as an external test set in this study, is available upon request at the BCDR Website.

Synthetic Data

You can find the synthetic data used in this study in the folder extension/synthetic_data/cbis-ddsm.

If you would prefer to generate your own synthetic data using our MCGAN model, you can do so via the medigan library, which loads the model weights used in this study from Zenodo and generates malignant and benign masses.

To generate the masses, simply run:

# install medigan (run this line in a shell, not in Python)
pip install medigan

# import medigan and initialize Generators (Python)
from medigan import Generators
generators = Generators()

# generate 1000 samples with model 8 (00008_C-DCGAN_MMG_MASSES).
# Also, auto-install required model dependencies.
generators.generate(model_id='00008_C-DCGAN_MMG_MASSES', num_samples=1000, install_dependencies=True)

Running Experiments

Classification Code

  • Script to create an environment and run all experiments reported in the paper.
  • Configs to run the different Swin Transformer experiments.
  • Config description Excel file explaining the different dbr experiments alongside the respective experimental results.
  • Code to train, validate, and test our Swin Transformer classification model with or without differentially private stochastic gradient descent (DP-SGD).
  • CBIS-DDSM train-test splits and BCDR external test set. Final dataset with splits is also available here.
  • Paths to the original datasets after downloading them locally.
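
For readers unfamiliar with the differentially private training mentioned above, here is a minimal sketch of a single DP-SGD update in plain Python. It is illustrative only and is not the training code of this repository: the function names (`clip_gradient`, `dp_sgd_step`) and all parameters are hypothetical, and a real run would typically rely on a dedicated library with a privacy accountant. The sketch shows the two ingredients that distinguish DP-SGD from ordinary SGD: each per-sample gradient is clipped to a maximum L2 norm, and Gaussian noise scaled to that norm is added to the summed gradient before averaging.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a per-sample gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update (hypothetical helper, not the repo's code).

    weights:          list of floats (model parameters)
    per_sample_grads: one gradient list per sample in the batch
    noise_multiplier: noise std is noise_multiplier * max_norm
    rng:              a random.Random instance used to draw the noise
    """
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    batch_size = len(clipped)
    new_weights = []
    for i, w in enumerate(weights):
        grad_sum = sum(g[i] for g in clipped)
        # Gaussian noise calibrated to the clipping norm
        noise = rng.gauss(0.0, noise_multiplier * max_norm)
        new_weights.append(w - lr * (grad_sum + noise) / batch_size)
    return new_weights


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [1.0, 1.0]
    grads = [[3.0, 4.0], [0.0, 0.0]]  # toy per-sample gradients
    print(dp_sgd_step(weights, grads, lr=1.0, max_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

Setting `noise_multiplier` to 0 reduces the step to SGD with per-sample gradient clipping, which gives no formal privacy guarantee; larger values give stronger privacy at the cost of noisier updates.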

Synthesis Code

  • Script to create an environment and train the Malignancy-Conditioned GAN (MCGAN), which was used, for example, to create the synthetic data reported in the paper.
  • Config to define the setup and hyperparameters for an MCGAN training run.
  • Code to start an MCGAN training run.
  • Code and checkpoint for setting up and running MCGAN inference locally (by running the __init__.py file).
  • FRD metric used in the paper to evaluate the synthetic data based on radiomics imaging biomarker variability between real and synthetic image distributions.
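
To give a rough intuition for a distribution-level comparison of real and synthetic images such as FRD, the sketch below computes a Fréchet distance between two sets of feature vectors under a simplifying assumption: each set is modelled as a Gaussian with diagonal covariance. This is not the FRD implementation linked above. The function names are hypothetical, the feature vectors stand in for radiomics features that would have to be extracted beforehand, and the diagonal-covariance simplification is our own choice to keep the example dependency-free.

```python
import math


def mean_and_std(features):
    """Per-dimension mean and population standard deviation of feature vectors."""
    n = len(features)
    dims = len(features[0])
    means = [sum(row[d] for row in features) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n)
        for d in range(dims)
    ]
    return means, stds


def diagonal_frechet_distance(real_features, synthetic_features):
    """Squared Fréchet distance between two diagonal-covariance Gaussians.

    With diagonal covariances the general formula
        ||mu_r - mu_s||^2 + Tr(S_r + S_s - 2 (S_r S_s)^(1/2))
    reduces to the sum over dimensions of
        (mu_r - mu_s)^2 + (sigma_r - sigma_s)^2.
    """
    mu_r, sd_r = mean_and_std(real_features)
    mu_s, sd_s = mean_and_std(synthetic_features)
    return sum(
        (mr - ms) ** 2 + (sr - ss) ** 2
        for mr, ms, sr, ss in zip(mu_r, mu_s, sd_r, sd_s)
    )


if __name__ == "__main__":
    # toy stand-ins for per-image radiomics feature vectors
    real = [[0.0, 1.0], [2.0, 3.0]]
    synthetic = [[1.0, 1.0], [1.0, 3.0]]
    print(diagonal_frechet_distance(real, synthetic))
```

A distance of 0 means the two feature sets have identical per-dimension means and standard deviations; larger values indicate that the synthetic features drift further from the real ones.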

Summary

poster presentation

Reference

Please consider citing our work if you found it useful for your research:

@article{osuala2024enhancing,
  title={{Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data}},
  author={Richard Osuala and Daniel M. Lang and Anneliese Riess and Georgios Kaissis and Zuzanna Szafranowska and Grzegorz Skorupko and Oliver Diaz and Julia A. Schnabel and Karim Lekadir},
  journal={arXiv preprint arXiv:2407.12669},
  url={https://arxiv.org/abs/2407.12669},
  year={2024}
  }

Acknowledgements

This repository borrows and extends the code from the mammo_gans repository.