Welcome to the repository for the survey paper, "Generative Models for Medical Data Synthesis: A Systematic Review". This repository provides links to the papers and code referenced in the survey, along with detailed summaries of the evaluation methods used across various data modalities.
Generative models such as GANs, VAEs, diffusion models, and LLMs are now widely used to synthesize medical data, including:
- EHR (Electronic Health Records) for tabular data.
- Signals such as ECG and PPG.
- Imaging data, including dermoscopic, mammographic, ultrasound, CT, MRI, and X-ray images.
- Text for clinical notes and radiology reports.
This repository is structured to:
- Provide easy access to the papers and code repositories.
- Highlight evaluation methods for generative models in each modality.

Contents:
- Synthesis applications and purpose of synthesis
- Electronic Health Records (EHR)
- Signals
- Images
- Text
- Evaluation Metrics and Techniques
- Contributing
- License
Generative models in medical data synthesis can be broadly categorized into unconditional and conditional models:
- Unconditional Models: These models take a random variable as input and generate data without additional context or guidance.
- Conditional Models: These models incorporate external information, such as images, text, semantic maps, class labels, attributes, or signals, to guide the generation process.
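The interface difference between the two families can be sketched with a toy example. It assumes one common conditioning strategy (used, for example, in conditional GANs): concatenating a one-hot class label to the noise vector. The names `toy_generator`, `unconditional_sample`, and `conditional_sample` are hypothetical, and the single linear layer merely stands in for a trained network.

```python
import random


def one_hot(label: int, num_classes: int) -> list[float]:
    """Encode a class label (e.g. 1 = "arrhythmia") as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def toy_generator(inputs: list[float], weights: list[list[float]]) -> list[float]:
    """Hypothetical stand-in for a trained generator: a single linear layer."""
    for row in weights:
        if len(row) != len(inputs):
            raise ValueError("each weight row must match the input length")
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def unconditional_sample(latent_dim: int, weights: list[list[float]],
                         rng: random.Random) -> list[float]:
    # Input is random noise only: G(z).
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return toy_generator(z, weights)


def conditional_sample(latent_dim: int, label: int, num_classes: int,
                       weights: list[list[float]], rng: random.Random) -> list[float]:
    # Noise is concatenated with the condition: G(z, c).
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return toy_generator(z + one_hot(label, num_classes), weights)
```

In the conditional case the weight rows are longer by `num_classes`, so the same noise can produce different outputs depending on the requested label; real models learn these weights rather than fixing them by hand.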
Data Type | Synthesis Application | Description | Examples |
---|---|---|---|
EHR | Longitudinal EHR | Medical codes from multiple patient visits. | Patient diagnostic history across hospital visits. |
EHR | Aggregated EHR | Longitudinal data condensed into a single row. | Summary of all patient visits in a single record. |
EHR | Time-dependent EHR | Time-series readings from a single patient visit. | Vital signs recorded during a hospital stay. |
EHR | Snapshot EHR | A single snapshot focusing on specific patient attributes. | Demographic details with selected health metrics. |
Imaging & Signals | Inter-modal Translation | Converts data from one modality to another. | CT to MRI, ECG to PPG. |
Imaging & Signals | Intra-modal Translation | Translates data within the same modality. | T1-weighted MRI to T2-weighted MRI, single-lead ECG to 12-lead ECG. |
Imaging & Signals | Class or Semantic Map Synthesis | Generates data conditioned on class labels or segmentation masks. | Brain MRI with a tumor generated from a tumor mask; ECG labeled as "arrhythmia." |
Imaging & Signals | Attribute-based Synthesis | Generates data based on patient-specific attributes (e.g., age, sex, BMI). | Personalized synthetic brain MRIs or ECG signals. |
Imaging & Signals | Text-based Synthesis | Conditions synthetic data generation on clinical text. | Generating X-rays or ECG signals from textual descriptions such as "moderate bilateral pleural effusion." |
Text | NLP Enhancement | Improves tasks such as named entity recognition (NER), information extraction, summarization, and question answering. | Improving NER on clinical notes with synthetic text. |
Text | Text Augmentation | Generates additional clinical notes, discharge summaries, or reports. | Augmenting patient reports when real data is limited. |
Text | Text De-identification | Removes or replaces personally identifiable information (PII) while preserving utility and privacy. | Replacing names, addresses, or diagnoses in clinical notes. |
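To make the difference between longitudinal and aggregated EHR concrete, here is a minimal sketch, assuming each visit is represented as a list of diagnosis codes. The function name `aggregate_longitudinal`, the code vocabulary, and the count-based summary are hypothetical choices for illustration, not the representation used by any particular surveyed paper.

```python
from collections import Counter


def aggregate_longitudinal(visits: list[list[str]], vocabulary: list[str]) -> dict[str, int]:
    """Condense one patient's visit sequence (longitudinal EHR) into a single row.

    Each column counts how many times a code appears across all visits;
    'num_visits' keeps a trace of the sequence length that is otherwise lost.
    """
    counts = Counter(code for visit in visits for code in visit)
    row = {"num_visits": len(visits)}
    row.update({code: counts.get(code, 0) for code in vocabulary})
    return row
```

For example, three visits `[["I10"], ["I10", "E11"], []]` with vocabulary `["I10", "E11", "J45"]` become one record with two `I10` occurrences, one `E11`, and no `J45`. The visit order is discarded, which is exactly what separates aggregated from longitudinal EHR.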
The repository contains 12 tables categorized as follows:
Table Number | Category | Modality |
---|---|---|
1 | Tabular Data | Electronic Health Records (EHR) |
2-3 | Signals | ECG, PPG, and other physiological signals |
4-11 | Images | Dermoscopic, mammographic, ultrasound, CT, MRI, X-ray, and other imaging |
12 | Text | Clinical notes and radiology reports |
Each table lists the application, model type, architecture or technology, paper link, code repository, and evaluation methods; most tables also give the publication date.
Application | Type | Technology | Paper | Code | Evaluation |
---|---|---|---|---|---|
Patient Demographics Gen. | GAN | Tabular GAN | Paper | Code | Fidelity (MSE) |
Disease Progression Model | VAE | Bayesian VAE | Paper | Code | Fidelity (KL Divergence), Privacy (k-Anonymity) |
Type | Application | Architecture | Paper Link | Code Link | Evaluation | Date |
---|---|---|---|---|---|---|
GAN | Intra-translation | Bi-LSTM and CNN | arXiv | Code | F | 2023-09 |
DM | Class-conditional | DSAT-ECG | Paper | | U, F | 2023-09 |
Other | Unconditional | Bi-LSTM and CNN | Paper | | U | 2023-08 |
DM | Inter-translation | Region-Disentangled Diffusion Model (RDDM) | arXiv | Code | U, F, C | 2023-08 |
DM | Conditioning on other ECG statements; prior knowledge | SSSD-ECG | Paper | Code | U, Q | 2023-06 |
DM | Class-conditional | DDPM-based: DiffECG | arXiv | | U, F, Q | 2023-06 |
GAN | Intra-translation | StarGAN v2 | arXiv | Original | U, Q | 2023-06 |
DM | Unconditional | Image-based: DDPM | arXiv | Code | U, F | 2023-05 |
GAN | Unconditional | LSTM-based: TS-GAN | Paper | | U, F, Q | 2023-04 |
GAN, VAE | Class-conditional | CVAE, CWGAN | Paper | | U | 2023-04 |
VAE, GAN | Text-to-signal | Auto-TTE | arXiv | Code | U, F, D, Q | 2023-03 |
GAN, AE | Inter-translation | Classical GAN, adversarial AE, modality transfer GAN | Paper | | U, F, Q | 2023-02 |
GAN | Class-conditional | WGAN-GP-based: AC-WGAN-GP | arXiv | Code | U | 2022-11 |
GAN | Clinical knowledge | WGAN-GP-based: CardiacGen | arXiv | Code | U, F, C, P | 2022-11 |
GAN | Unconditional | Classic GAN, DC-DC GAN, BiLSTM-DC, AE/VAE-DC, WGAN | arXiv | Code | U, F, Q, C | 2022-08 |
GAN | Unconditional | Image-based: TTS-GAN | Paper | Code | F, Q | 2022-06 |
GAN | Conditioning on other ECG statements; prior knowledge | Conditional GAN | Paper | | U, Q | 2022-05 |
VAE | Specific subject characteristics | cVAE | Paper | | F | 2022-04 |
GAN, VAE | Class-conditional | PHYSIOGAN | arXiv | Code | U, F, D | 2022-04 |
GAN | Class-conditional | DCCGAN (deep convolutional conditional GAN) | Paper | | U, F, Q | 2022-02 |
GAN | Unconditional | WaveGAN, Pulse2Pulse | Paper | Code | P | 2021-11 |
GAN | Unconditional | Composite GAN: LSTM-GAN and DCGAN | Paper | | U | 2021-08 |
GAN | Unconditional | LSTM-based: BiLSTM | Paper | | F | 2021-06 |
Type | Application | Architecture | Paper Link | Code Link | Evaluation | Date |
---|---|---|---|---|---|---|
DM | Unconditional | LDM | Paper | Code | F | 2023-10 |
DM | Class-conditional | DDPM | Paper | | U, F, C | 2023-08 |
DM | Conditioned on STFT spectrograms | DiffEEG | arXiv | | U, F | 2023-06 |
VAE | Unconditional | Causal recurrent VAE (CRVAE) | arXiv | Code | U, F, Q | 2023-01 |
GAN | Class-conditional | Conditional Wasserstein GAN | Paper | | U | 2022-03 |
GAN | Unconditional | Temporal GAN (TGAN) | Paper | | U, Q | 2022-02 |
Language Model | Unconditional | GPT2 | Paper | Code | U | 2021-02 |
This section includes papers for dermoscopic, mammographic, ultrasound, CT, MRI, X-ray, and multi-modal imaging data.
Type | Application | Architecture | Paper Link | Code Link | Evaluation | Date |
---|---|---|---|---|---|---|
GAN | Unconditional | PGAN | Paper | | U, F | 2023-10 |
Diffusion Model | Text-to-image | LDM, Stable Diffusion, fine-tuned Stable Diffusion | arXiv | | U | 2023-08 |
GAN | Unconditional | StyleGAN2-ADA | arXiv | Code | F, Q | 2023-03 |
Diffusion Model | Text-to-image | LDM | arXiv | | U, Q | 2023-01 |
Diffusion Model | Text-to-image | DALL-E 2 | arXiv | | U | 2022-11 |
GAN | Majority-to-minority conversion | CycleGAN | Paper | | U, F | 2022-09 |
GAN | Class-conditional | StyleGAN2-ADA | Paper | Code | U, F | 2022-09 |
GAN | Class-conditional | StyleGAN2-ADA | arXiv | Code | U, F, D, Q, P | 2022-08 |
GAN | Unconditional | StyleGAN2 | Paper | | U, F | 2022-04 |
GAN | Class-conditional | cGAN | Paper | | U | 2021-12 |
GAN | Unconditional | SLA-StyleGAN | Paper | | U, F, Q | 2021-01 |
Type | Application | Architecture | Paper Link | Code Link | Evaluation | Date |
---|---|---|---|---|---|---|
GAN | Class-conditional | cGAN | Paper | | U | 2024-01 |
Diffusion Model | Text-to-image | Fine-tuned Stable Diffusion | Paper | Code | F, Q | 2023-06 |
GAN | Intra-translation | CycleGAN | Paper | Code | U, F, Q, P | 2023-01 |
GAN | Intra-translation | Complete representation GAN (CR-GAN) | Paper | Original | F, P | 2022-11 |
GAN | Intra-translation | Pix2Pix | Paper | | U, F, P | 2022-11 |
GAN | Intra-translation | pGAN variant | Paper | | U | 2022-06 |
GAN | Class-conditional | ROImammoGAN | Paper | | F, P | 2022-04 |
GAN | Intra-translation | HRGAN, based on CycleGAN | Paper | | U | 2022-04 |
GAN | Unconditional | DCGAN, WGAN-GP | Paper | | U, P | 2022-03 |
GAN | Intra-translation | Pix2Pix | Paper | | U, P | 2021-12 |
GAN | Intra-translation | DCGAN, InfillingGAN | Paper | Original | U, F, P | 2021-04 |
Type | Application | Architecture | Paper Link | Code Link | Evaluation | Date |
---|---|---|---|---|---|---|
GAN | Inter-translation | ApGAN | Paper | | U, F, Q, C | 2023-10 |
VAE | Inter-translation | MHVAE | Paper | Code | F | 2023-10 |
GAN | Class-conditional | GAN-CA | Paper | Code | U | 2023-08 |
GAN | Class-conditional | Phased GAN | Paper | | U | 2023-07 |
Diffusion Model | Class-conditional | DDPM | Paper | Code | U | 2023-05 |
Diffusion Model and GAN | Unconditional | DSR-GAN, TB-GAN | arXiv | Code | F | 2023-04 |
GAN | Intra-translation | U-Net-based generator | Paper | Code | U, F, Q | 2023-02 |
GAN | Inter-translation | CycleGAN-based | Paper | | U | 2022-12 |
GAN | Inter-translation | 3D Pix2Pix | Paper | | U, Q | 2022-09 |
GAN | Unconditional | StyleGAN2 variants | Paper | | U, F | 2022-07 |
GAN | Intra-translation | spGAN | Paper | Code | U, F | 2022-04 |
GAN | Inter-translation | CycleGAN | Paper | | U, F | 2022-02 |
GAN | Intra-translation | Pix2Pix-based | Paper | | F, Q | 2022-01 |
GAN | Intra-translation | PSFFGAN | Paper | | F, Q | 2022-01 |
H:GAN,VAE | Unconditional | Improved α-WGAN-GP | Paper | | U, F | 2021-11 |
GAN | Unconditional | StyleGAN2-ADA | Paper | Code | U, F | 2021-11 |
GAN | Unconditional | StackGAN | Paper | | U, F, Q | 2021-07 |
GAN | Unconditional | TripleGAN | Paper | | U, Q | 2021-02 |
Application | Type | Modality | Technology | Paper | Code | Evaluation |
---|---|---|---|---|---|---|
Lesion Segmentation | GAN | Dermoscopic | DCGAN | Paper | Code | Fidelity (IoU), Diversity (IS), Clinical Review |
Application | Type | Modality | Technology | Paper | Code | Evaluation |
---|---|---|---|---|---|---|
Lesion Segmentation | GAN | Dermoscopic | DCGAN | Paper | Code | Fidelity (IoU), Diversity (IS), Clinical Review |
Application | Type | Modality | Technology | Paper | Code | Evaluation |
---|---|---|---|---|---|---|
Lesion Segmentation | GAN | Dermoscopic | DCGAN | Paper | Code | Fidelity (IoU), Diversity (IS), Clinical Review |
Application | Type | Modality | Technology | Paper | Code | Evaluation |
---|---|---|---|---|---|---|
Lesion Segmentation | GAN | Dermoscopic | DCGAN | Paper | Code | Fidelity (IoU), Diversity (IS), Clinical Review |
Type | Application | Architecture | Paper Link | Code Link | Evaluation | Date |
---|---|---|---|---|---|---|
DM | Conditioned on other images | Conditional DM | Paper | | U, F, P | 2023-11 |
GAN and DM | Unconditional | StyleGAN, DDPM | arXiv | | F, P | 2023-10 |
DM | Text-guided conditioning | EMIT-Diff | arXiv | | U, F, P | 2023-10 |
DM | Conditioned on other diffusion models | TPDM | arXiv | Code | F, P | 2023-09 |
GAN | Combining tabular and imaging data | αGAN and CTGAN | arXiv | | U, Q, P | 2023-08 |
VAE, DDPM | Unconditional | VQ-GAN followed by DDPM | Paper | Code | U, D, Q, C, P | 2023-05 |
DM | Unconditional | MT-DDPM | Paper | | U, F, D, Q, P | 2023-05 |
DM | Attribute-conditional | DDPM | Paper | | U, P | 2023-04 |
DM | Text-guided conditioning | Fine-tuned Stable Diffusion | arXiv | | U, F, P | 2023-03 |
DM | Class- or label-conditional | Based on Stable Diffusion | Paper | Code | U, F, D | 2022-12 |
GAN | Class- or label-conditional | HA-GAN | Paper | Code | U, F, D, C | 2022-08 |
GAN | Class- or label-conditional | Multiple GANs | Paper | | U, F, Q, P | 2021-05 |
Application | Type | Technology | Paper | Code | Evaluation |
---|---|---|---|---|---|
Clinical Note Gen. | LLM | GPT-3 | Paper | Code | Language Coherence (BLEU), Privacy (k-Anonymity) |
Generative models in medical data synthesis are evaluated using a variety of metrics:
- Fidelity (F):
  - Fidelity of synthetic data to real data, measured using metrics such as:
    - MSE, IoU, Dice score, PSNR.
    - Expert review for clinical validity.
- Diversity (D):
  - Diversity of generated data, measured using metrics such as:
    - FID, Inception Score (IS), variance measures.
- Clinical Validity (C):
  - Reviewed by domain experts for clinical significance and utility.
- Privacy (P):
  - Privacy-preserving evaluation methods such as:
    - k-anonymity, differential privacy (DP).
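Several of the quantitative metrics above have short textbook definitions. The following is a minimal, dependency-free sketch of MSE, PSNR, Dice, IoU, a single-split Inception Score, and the k of a k-anonymous table, assuming flattened inputs (lists of numbers or 0/1 mask values) rather than the image or signal arrays a real pipeline would use. The function names are hypothetical and this is not the evaluation code of any surveyed paper.

```python
import math
from collections import Counter
from typing import Hashable, Mapping, Sequence


def _check_pair(a: Sequence, b: Sequence) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Mean squared error between two flattened signals or images (fidelity)."""
    _check_pair(real, synthetic)
    return sum((r - s) ** 2 for r, s in zip(real, synthetic)) / len(real)


def psnr(real: Sequence[float], synthetic: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value is the largest possible sample value."""
    error = mse(real, synthetic)
    if error == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / error)


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice score 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    _check_pair(mask_a, mask_b)
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 1.0 if total == 0 else 2.0 * inter / total  # two empty masks agree perfectly


def iou(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Intersection over union |A ∩ B| / |A ∪ B| for flattened binary masks."""
    _check_pair(mask_a, mask_b)
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return 1.0 if union == 0 else inter / union


def inception_score(class_probs: Sequence[Sequence[float]]) -> float:
    """IS = exp(mean over samples of KL(p(y|x) || p(y))), single split (diversity).

    class_probs[i] is a classifier's probability vector for synthetic sample i.
    """
    n = len(class_probs)
    num_classes = len(class_probs[0])
    marginal = [sum(p[c] for p in class_probs) / n for c in range(num_classes)]
    mean_kl = sum(
        sum(pc * math.log(pc / mc) for pc, mc in zip(p, marginal) if pc > 0)
        for p in class_probs
    ) / n
    return math.exp(mean_kl)


def k_anonymity(records: Sequence[Mapping[str, Hashable]],
                quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing all quasi-identifier values (privacy)."""
    if not records:
        raise ValueError("records must be non-empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())
```

For instance, two masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` overlap in one pixel, giving a Dice score of 0.5 and an IoU of 1/3. FID is not shown because it requires feature statistics from a pretrained Inception network; likewise, the Inception Score sketch takes class probabilities as given instead of computing them with that network, and uses one split rather than the usual averaged splits.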
We welcome contributions to keep this repository updated with new research and implementations. You can contribute by:
- Adding new papers or implementations.
- Proposing enhancements to the structure or content.
- Submitting pull requests for corrections.
Please follow our Contribution Guidelines.
This repository is licensed under the MIT License.
We thank all researchers and practitioners contributing to advancements in medical data synthesis through generative models. This repository is part of our commitment to promote collaboration and open research in this field.