Our project uses Synthea™, an open-source tool developed by the MITRE Corporation, to create synthetic hospital discharge data. Synthea™ uses research-based models to generate rich medical histories for synthetic patients. We extract the hospital visits from those histories and build datasets that match the format of the administrative data available to healthcare organizations. This synthetic data lets students and researchers explore patient records without privacy concerns and develop analyses that hospitals can then run on their own real data. Our goal is to make it easier for hospitals, public health officials, and researchers to collaborate and gain insights from administrative hospital data while keeping patient information private.
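As a rough illustration of the "extract the hospital visits" step, the sketch below filters inpatient encounters out of Synthea's CSV export. It assumes the CSV exporter is enabled and that the encounters file carries an `ENCOUNTERCLASS` column; the function name `inpatient_encounters` and the sample rows are hypothetical, and the project's actual extraction logic may differ.

```python
import csv
import io


def inpatient_encounters(csv_file):
    """Return the rows of a Synthea-style encounters CSV that are inpatient stays.

    Assumption: the file has an ENCOUNTERCLASS column, as in Synthea's CSV export.
    """
    reader = csv.DictReader(csv_file)
    return [
        row
        for row in reader
        if (row.get("ENCOUNTERCLASS") or "").lower() == "inpatient"
    ]


# Hypothetical sample rows, trimmed to a few columns for illustration.
sample = io.StringIO(
    "Id,START,STOP,PATIENT,ENCOUNTERCLASS\n"
    "e1,2023-01-15T08:00:00Z,2023-01-18T12:00:00Z,p1,inpatient\n"
    "e2,2023-02-01T09:00:00Z,2023-02-01T09:30:00Z,p1,ambulatory\n"
)
visits = inpatient_encounters(sample)
```

With a real export, the same function could be called on an open file handle for the encounters file instead of the in-memory sample.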
Version 1
- Project Timeline: Dec 2022 - Feb 2023
- Team Members: Riley Kwong (@rileeki), Ann Epstein (@aepstein27), Lisa Snortheim (@lisasnortheim), Stefanie Eddy
- v1 of the database is the first iteration and serves as a proof of concept. The overall format is in place, but many of the fields are left blank.
Version 2
- Project Timeline: Spring 2023 - Fall 2023
- Team Members: Riley Kwong (@rileeki), Travis Haussler (@travishaussler), Ian Tang (@iqtang)
- v2 of the database populated more of the fields that were still empty in v1, explored basic SNOMED-to-ICD mappings, and added fixed-width output translations.
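
To show what a fixed-width output translation involves, here is a minimal sketch that pads or truncates each field to a set width and joins the results into one line. The field names, widths, and padding rules in `LAYOUT` are hypothetical and are not taken from the HCAI specification or from this project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    width: int
    align: str = "left"  # "left" pads with trailing spaces, "right" zero-pads on the left


# Hypothetical layout for illustration only; not the HCAI record format.
LAYOUT = [
    FieldSpec("patient_id", 10),
    FieldSpec("admit_date", 8),
    FieldSpec("principal_dx", 7),
    FieldSpec("total_charges", 8, align="right"),
]


def to_fixed_width(record, layout=LAYOUT):
    """Render one record as a single fixed-width line."""
    parts = []
    for spec in layout:
        raw = record.get(spec.name)
        value = "" if raw is None else str(raw)
        value = value[: spec.width]  # truncate values that exceed the field width
        if spec.align == "right" and value:
            parts.append(value.rjust(spec.width, "0"))
        else:
            # Missing values become blank fields of the correct width.
            parts.append(value.ljust(spec.width))
    return "".join(parts)


line = to_fixed_width(
    {
        "patient_id": "P001",
        "admit_date": "20230115",
        "principal_dx": "J189",
        "total_charges": 12500,
    }
)
```

Keeping the layout in a list of specs means an unpopulated field still occupies its full width, so every line has the same length regardless of how many fields are filled.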
Version 3
- Project Timeline: Fall 2023 - TBD
- Team Members: TBD
- v3 finalizes the fields in use and documents the fields that cannot be populated, along with the reasons why. The goal of this version is to produce data that can support actual field experiments and analysis.
Explore our synthetic hospital discharge data with our downloadable dataset. This dataset contains synthetic patient records in the format that California hospitals use to submit abstracted patient records to the California Department of Health Care Access and Information (HCAI).
The Summary Statistic Workbook provides an overview of the synthetic hospital discharge data generated by our project. It includes aggregate statistics on patient demographics and fill rates for each field. The workbook will be available for download in Excel format. (Coming soon!)
- Want to learn more about the project's scope and objectives? Check out the project charter.
- Want to run the program to create your own database? The production guide is coming soon...
- Ready to contribute? Check out the developer guide to see our coding standards and get started contributing.
- Have questions or feedback? Join the discussion on our GitHub Discussions page. Or, you can use the GitHub issues page to report bugs and request new features.
- Looking for internal project management documents? Those are stored in this Google Drive folder.
- Project management: https://github.com/orgs/orchid-initiative/projects/1
- Synthea™: https://synthetichealth.github.io/synthea/
- Wiki: https://github.com/orchid-initiative/synthetic-database-project/wiki
- Slack channel (permissions required): synthetic_discharges
- AWS (permissions required): https://rileeki.signin.aws.amazon.com/console
- Note: the Synthea .jar file is required in the folder, but it is too large to add to our free GitHub account. See the setup instructions: https://github.com/synthetichealth/synthea/wiki/Basic-Setup-and-Running