feat/sample data for demos #96

K-Beicher · 2024-01-10T14:28:06Z

This PR adds three CSV files generated with Synthea for demonstrating Seedcase (Sprout for now).

Link to data description: https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary

lwjohnst86

Not sure this is the best repo for this. It will definitely need to be easily accessible in the sprout repo.

I made some comments on the data files. Could you also add a README or some file with instructions on what you did to get that data?

lwjohnst86 · 2024-01-11T10:44:49Z

synthea_data/organizations.csv

This data we don't need. You can delete this.

K-Beicher · 2024-01-11

I brought it over so that we would have a small data set to use when working on primary and foreign keys. I agree that we don't need that type of data, but the rest of the files are too big to bring over.

lwjohnst86 · 2024-01-11T10:46:57Z

synthea_data/patients.csv

There are a lot of variables in here that we don't need at all, like name, address, driver's license, SSN, passport, race, birthplace, lat, and lon. Basically any really sensitive/specific "personally identifying" information, since Sprout isn't designed around those use cases.
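A minimal sketch of how those columns could be stripped before the file is committed. The column names below are assumed from the Synthea CSV data dictionary linked in the PR description and should be checked against the actual header; `drop_columns` and the output file name are hypothetical, not part of this PR.

```python
import csv
import io

# Assumed Synthea header names for the fields listed in the comment above
# (name, address, driver's license, SSN, passport, race, birthplace, lat, lon).
SENSITIVE_COLUMNS = {
    "SSN", "DRIVERS", "PASSPORT",
    "PREFIX", "FIRST", "LAST", "SUFFIX", "MAIDEN",
    "RACE", "BIRTHPLACE", "ADDRESS", "LAT", "LON",
}


def drop_columns(src, dst, columns=SENSITIVE_COLUMNS):
    """Copy CSV rows from src to dst, omitting the named columns.

    Returns the list of column names that were kept.
    """
    reader = csv.DictReader(src)
    kept = [name for name in (reader.fieldnames or []) if name not in columns]
    # extrasaction="ignore" lets us pass full rows; dropped keys are skipped.
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# Possible usage (paths are illustrative):
# with open("synthea_data/patients.csv", newline="") as src, \
#         open("patients_trimmed.csv", "w", newline="") as dst:
#     drop_columns(src, dst)
```

Using only the standard library keeps this free of extra dependencies; the same filter is a one-liner in pandas if that is already in use.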

lwjohnst86 · 2024-01-11T10:48:52Z

synthea_data/conditions.csv

This is fairly OK, but the dataset is much too big. Most likely, the ID and Encounter variables are taking up most of the space. Is there any way to use smaller/simpler ID formats? It looks like UUIDs or GUIDs are used, which are excessive for our purposes here.

K-Beicher · 2024-01-11

Unfortunately no. The only way to bring the size down is to reduce the number of patients. I suspect that if we ran it with only 1000 patients we'd get a more manageable data set.
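Per the reply above, Synthea itself does not offer a shorter ID format, so any shrinking of the IDs would have to happen as a post-processing step. A rough sketch of that idea, assuming the IDs only need to be unique within the demo data: each distinct UUID is replaced with a small integer. `compact_ids` is a hypothetical helper, and the column names in the usage comment are assumptions to verify against the file header.

```python
import csv
import io


def compact_ids(src, dst, id_columns, mapping=None):
    """Rewrite the given columns so each distinct value becomes a small integer.

    Pass the same `mapping` dict when processing several files so that a
    foreign key in one file still matches the primary key in another.
    """
    mapping = {} if mapping is None else mapping
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames or [], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for column in id_columns:
            value = row[column]
            if value:  # leave empty cells empty
                # First sighting gets the next integer; repeats reuse it.
                row[column] = mapping.setdefault(value, len(mapping) + 1)
        writer.writerow(row)
    return mapping


# Possible usage (paths and column names are illustrative):
# with open("synthea_data/conditions.csv", newline="") as src, \
#         open("conditions_compact.csv", "w", newline="") as dst:
#     ids = compact_ids(src, dst, ["PATIENT", "ENCOUNTER"])
```

The trade-off is that the files no longer match raw Synthea output, and every file that references a patient or encounter has to go through the same mapping, which is why the function returns it.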

K-Beicher · 2024-01-15T08:16:44Z

This has been superseded by the creation of the Data repo.

K-Beicher · 2024-01-15T08:17:27Z

@lwjohnst86 will you remove the branch associated with this, please?

lwjohnst86 · 2024-01-15T08:23:57Z

Yes for sure!

add three csv files

6c7653f

K-Beicher requested a review from a team as a code owner January 10, 2024 14:28

github-actions bot assigned K-Beicher Jan 10, 2024

K-Beicher linked an issue Jan 10, 2024 that may be closed by this pull request

Potential example dataset to use for testing/demo'ing seedcase #49

Closed

signekb changed the title ~~add three csv files~~ feat/sample data for demos Jan 10, 2024

lwjohnst86 requested changes Jan 11, 2024

K-Beicher closed this Jan 15, 2024
