feat/sample data for demos #96
Not sure this is the best repo for this. It will definitely need to be easily accessible in the sprout repo.
I made some comments on the data files. Could you also add a README or some file with instructions on what you did to get that data?
This data we don't need. You can delete this.
I brought it over so that we would have a small data set that we can use to work on primary and foreign keys. I agree that we don't need that type of data, but the rest of the files are too big to bring over.
There are a lot of variables in here that we don't need at all, like name, address, driver's license, SSN, passport, race, birthplace, lat, and lon. Basically any really sensitive/specific "personally identifying" information, since Sprout isn't designed around those use cases.
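For illustration, here is a minimal sketch of how those columns could be dropped with Python's standard `csv` module. The column names (`SSN`, `DRIVERS`, `FIRST`, and so on) are assumptions based on the Synthea data dictionary and should be checked against the actual file headers; `drop_columns` is a hypothetical helper, not something in this PR.

```python
import csv
import io

# Assumed column names; verify against the headers of the real patients file.
SENSITIVE_COLUMNS = {
    "SSN", "DRIVERS", "PASSPORT", "FIRST", "LAST",
    "ADDRESS", "RACE", "BIRTHPLACE", "LAT", "LON",
}

def drop_columns(src, dst, drop=SENSITIVE_COLUMNS):
    """Copy CSV rows from src to dst, omitting any column named in drop."""
    reader = csv.DictReader(src)
    kept = [name for name in (reader.fieldnames or []) if name not in drop]
    # extrasaction="ignore" lets us pass full rows and write only kept columns.
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Tiny in-memory example with made-up values.
example_in = io.StringIO("Id,SSN,GENDER\n1,999-00-0000,F\n")
example_out = io.StringIO()
drop_columns(example_in, example_out)
```

With real files, `src` and `dst` would be file objects opened with `newline=""`.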
This is fairly ok, but much too big of a dataset. Most likely, the ID and Encounter variables are taking up most of the space. Is there any way to use smaller/simpler ID formats? It looks like UUIDs or GUIDs are used, which are excessive for our purposes here.
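One possible approach, sketched here as a post-processing step on the exported CSVs rather than a change to how the data are generated, is to map each long ID to a small integer. The helper `shorten_ids` and the column names `Id` and `PATIENT` are assumptions for illustration. Sharing one mapping across files keeps foreign keys consistent with the primary keys they refer to.

```python
import csv
import io

def shorten_ids(src, dst, column, mapping):
    """Rewrite `column` so each distinct value becomes a small integer.

    `mapping` is shared across calls so the same original ID gets the same
    integer in every file, which keeps foreign keys pointing at the right row.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # setdefault assigns the next integer the first time an ID is seen.
        row[column] = mapping.setdefault(row[column], len(mapping) + 1)
        writer.writerow(row)

# Made-up IDs standing in for UUIDs.
id_map = {}
patients_out = io.StringIO()
shorten_ids(io.StringIO("Id,GENDER\naaa-1,F\nbbb-2,M\n"), patients_out, "Id", id_map)
encounters_out = io.StringIO()
shorten_ids(io.StringIO("PATIENT,CODE\nbbb-2,x\naaa-1,y\n"), encounters_out, "PATIENT", id_map)
```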
Unfortunately no. The only way to bring it down is to go down in number of participants. I suspect that if we ran it with only 1000 patients we'd get a more manageable data set.
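If regenerating the data is not convenient, a rough alternative is to trim the existing files: keep the first N patients and then keep only the rows in the other tables that refer to them. This is only a sketch; the helpers and the column names `Id` and `PATIENT` are assumptions, and the result is not the same as a fresh run with fewer patients.

```python
import csv
import io

def keep_first_patients(src, dst, limit, id_column="Id"):
    """Copy the first `limit` patient rows and return the set of kept IDs."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    kept = set()
    for row in reader:
        if len(kept) >= limit:
            break
        kept.add(row[id_column])
        writer.writerow(row)
    return kept

def filter_by_patient(src, dst, kept_ids, column="PATIENT"):
    """Copy only rows whose foreign key refers to a kept patient."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[column] in kept_ids:
            writer.writerow(row)

# Made-up data: keep one patient, then drop encounters for everyone else.
small_patients = io.StringIO()
kept = keep_first_patients(io.StringIO("Id,GENDER\na,F\nb,M\n"), small_patients, limit=1)
small_encounters = io.StringIO()
filter_by_patient(io.StringIO("Id,PATIENT\ne1,b\ne2,a\n"), small_encounters, kept)
```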
This has been superseded by the creation of the Data repo.
@lwjohnst86 will you remove the branch associated with this, please?
Yes for sure!
This PR adds three CSV files created with Synthea for demonstrating Seedcase (Sprout for now).
Link to data description: https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary