Align datasets to models #46
During the TA1 working group there were comments about this, along with some other conversation points.

Implementing this as an endpoint requires the generation of embeddings over models and datasets, which will first be addressed by this TDS issue, so it is currently blocked.
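As a rough sketch of what "embeddings over models and datasets" could mean here: concatenate each element's descriptive text and run it through an embedding model. Everything below is hypothetical — `embed` is a trivial deterministic stand-in, not the real TDS embedding pipeline, and the field names are illustrative.

```python
# Hypothetical sketch: one embedding per model element / dataset feature.
# `embed` is a toy deterministic stand-in for a real sentence-embedding
# model; the dict field names are illustrative, not a real schema.
def embed(text: str) -> list[float]:
    """Toy stand-in: character-code statistics, NOT a real embedding model."""
    codes = [ord(c) for c in text.lower()]
    n = len(codes) or 1
    return [sum(codes) / n, len(set(codes)) / n, float(n)]

def embed_fields(obj: dict, fields: tuple[str, ...]) -> list[float]:
    """Concatenate an object's descriptive fields, then embed the text."""
    text = " ".join(str(obj.get(f, "")) for f in fields)
    return embed(text)

model_state = {"name": "infected", "description": "infected population"}
vec = embed_fields(model_state, ("name", "description"))
print(len(vec))  # 3-dimensional toy embedding
```

The same `embed_fields` call would be applied to dataset feature records, so model elements and data features end up in one comparable vector space.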
Challenge
How can we automatically align models to datasets? Specifically, how can we most effectively align elements of a model to features within datasets?
Currently, models and datasets are profiled separately by MIT and SKEMA. Both datasets and models end up having (optional) groundings which, for each feature of the data or model, tie it to an element in the TA2 Domain Knowledge Graph (DKG). As far as I know, DKG code lives here.

For example, a model may have a compartment called `infected` which is grounded to one DKG term. Let's say there is a dataset that has the feature `infections` which is grounded to another DKG term. There is no intersection between these groundings, but clearly there is a relationship between the `infected` compartment in the model and the `infections`
feature in the dataset. This makes it potentially challenging to identify relevant data to use for model calibration/simulation, since for calibration you must match data to specific model compartments/elements.

Potential Solutions
An `/align_data_to_model` endpoint which, for a given `model_id`, attempts to find relevant data features on a model-element-to-data-feature basis. For example, an SIR model's `susceptible`, `infected`, and `recovered` compartments would be automatically matched and ranked against features (potentially from multiple datasets) based on groundings or whatever other information we can efficiently use.

The first approach will fit best inside TDS and is something we may want to do anyway. Vector/semantic search over content besides papers seems quite useful. We could even support semantic code search, which would be potentially very useful.
The second approach will fit best inside this repository since it mirrors some of the existing endpoints (e.g. aligning a model to its paper).
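To make the matching step behind either approach concrete, here is a minimal, hypothetical sketch: exact grounding intersection fails for related-but-distinct terms (the `infected` vs. `infections` case above), so the sketch falls back to ranking by cosine similarity over embeddings. All identifiers, vectors, and names are illustrative stand-ins, not real DKG terms or the real TDS embedding pipeline.

```python
import math

# Hypothetical groundings: placeholder identifiers, not real DKG terms.
model_groundings = {"infected": {"placeholder:0001"}}
dataset_groundings = {"cases": {"placeholder:0002"},
                      "recoveries": {"placeholder:0003"}}

# Exact grounding overlap is empty even though the concepts are related,
# which is exactly the alignment problem described in the Challenge.
exact = model_groundings["infected"] & dataset_groundings["cases"]
assert exact == set()

# Toy vectors standing in for embeddings a real model would produce
# over element/feature names, descriptions, and groundings.
compartment_vecs = {"susceptible": [0.9, 0.1, 0.0],
                    "infected":    [0.1, 0.9, 0.1],
                    "recovered":   [0.0, 0.2, 0.9]}
feature_vecs = {"cases":      [0.2, 0.8, 0.1],
                "recoveries": [0.1, 0.1, 0.9]}

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_features(compartment: str) -> list[tuple[str, float]]:
    """Rank every dataset feature against one model compartment."""
    scores = {f: cosine(compartment_vecs[compartment], v)
              for f, v in feature_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_features("infected"))  # "cases" ranks first
```

An actual endpoint would do this per model element across all candidate dataset features and return the ranked matches; the toy vectors here just show the ranking mechanics.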
Considerations
It is likely that we will need multiple examples of models and datasets for testing and development. Here is an example model in the format often referred to as an AMR: ASKEM Model Representation. Here is an example data card, but note that this data card is not in the canonical dataset format for TDS. We can generate/pull some in the appropriate format, but for now at least this helps give a sense of how DKG groundings roughly appear for data.
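For test fixtures, a rough sketch of how a grounded model state and a grounded dataset feature might sit side by side. Field names and identifiers here are illustrative guesses, not the canonical AMR or TDS schemas.

```python
import json

# Illustrative fixtures only: field names and identifiers are
# hypothetical, not the canonical AMR / TDS schemas.
model_fixture = {
    "name": "SIR",
    "states": [
        {"id": "I", "name": "infected",
         "grounding": {"identifiers": {"placeholder": "0000001"}}},
    ],
}
dataset_fixture = {
    "name": "case_counts",
    "features": [
        {"name": "infections",
         "grounding": {"identifiers": {"placeholder": "0000002"}}},
    ],
}

# If both sides expose groundings in the same shape, alignment code can
# treat model states and dataset features uniformly.
print(json.dumps(model_fixture["states"][0]["grounding"], indent=2))
```

Keeping a handful of paired fixtures like this in the repo would let the alignment logic be developed and tested before the real embedding pipeline lands.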