Model card #7

Merged
merged 5 commits into from
Aug 3, 2023

brandomr
Contributor

This addresses #6. The docstring is descriptive:

Profile model with MIT's profiling service. This takes in a paper and code artifact and updates a model (AMR) with the profiled metadata card. It requires that the paper has been extracted with /pdf_to_text and the code has been converted to an AMR with /code_to_amr.

NOTE: if the paper has not been extracted and the model has not been created from code, this WILL fail.

Expects:

  • model_id: the id of the model to profile
  • paper_artifact_id: the id of the paper artifact
  • code_artifact_id: the id of the code artifact

Returns:

  • the AMR associated with model_id will be enriched with a new description, and its metadata will gain a field called card, so it will look like:
 },
  "metadata": {
    "card": {
      "DESCRIPTION": "Mathematical model for the spread of COVID-19 in different communities.",
      "AUTHOR_INST": "UNKNOWN",
      "AUTHOR_AUTHOR": "Ian Cooper, Argha Mondal, Chris G. Antonopoulos",
      "AUTHOR_EMAIL": "UNKNOWN",
      "DATE": "UNKNOWN",
      "SCHEMA": "The SIR model with surge periods to accommodate new epicentres of the virus.",
      "PROVENANCE": "The model is based on the well-known susceptible-infected-removed (SIR) model and is used to investigate the spread of COVID-19 within communities.",
      "DATASET": "Data from China, South Korea, India, Australia, USA, Italy, and the state of Texas in the USA.",
      "COMPLEXITY": "UNKNOWN",
      "USAGE": "The model can provide insights into the time evolution of the spread of the virus and can be used to assess the impact of intervention measures.",
      "LICENSE": "UNKNOWN"
    }
  }
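
As a rough illustration of how a consumer might inspect the resulting card, the sketch below parses a trimmed copy of the fragment above and lists the fields the profiler left as "UNKNOWN". The helper name `unknown_card_fields` is hypothetical and not part of this PR; only the `metadata.card` layout is taken from the example.

```python
import json

# Trimmed copy of the example card above, wrapped in braces so it parses.
amr = json.loads("""
{
  "metadata": {
    "card": {
      "DESCRIPTION": "Mathematical model for the spread of COVID-19 in different communities.",
      "AUTHOR_INST": "UNKNOWN",
      "AUTHOR_AUTHOR": "Ian Cooper, Argha Mondal, Chris G. Antonopoulos",
      "AUTHOR_EMAIL": "UNKNOWN",
      "DATE": "UNKNOWN",
      "LICENSE": "UNKNOWN"
    }
  }
}
""")


def unknown_card_fields(amr: dict) -> list[str]:
    """Return the card keys whose value is the literal string "UNKNOWN".

    Hypothetical helper: an AMR without metadata or a card yields [].
    """
    card = amr.get("metadata", {}).get("card", {})
    return [key for key, value in card.items() if value == "UNKNOWN"]


print(unknown_card_fields(amr))
# → ['AUTHOR_INST', 'AUTHOR_EMAIL', 'DATE', 'LICENSE']
```

A check like this could flag cards that need manual completion after profiling.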

…ontainer name for the extraction rest container to avoid conflict with TDS
@YohannParis
Member

Should we ditch the code_artifact_id parameter?

  • If the model doesn't have one in its metadata, TDS returns a 404 error explaining that the model has no code associated with it.
  • hmi-client could then offer to let the user upload or paste one
  • hmi-server sends the request to TDS with a model_id
  • TDS creates a code artifact associated with the model
  • hmi-server can now re-send the original request to TDS
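
The flow in the list above could be sketched roughly as follows. Everything here is an assumption made for illustration: `FakeTDS`, `CodeNotFound`, and `profile_with_retry` are hypothetical names, and the in-memory class only stands in for the real services, whose APIs are not shown in this thread.

```python
class CodeNotFound(Exception):
    """Stands in for the 404 that TDS would return in this proposal."""


class FakeTDS:
    """In-memory stand-in for TDS; not the real service API."""

    def __init__(self) -> None:
        self.code_by_model: dict[str, str] = {}

    def profile(self, model_id: str) -> dict:
        # Fail when the model has no associated code artifact.
        if model_id not in self.code_by_model:
            raise CodeNotFound(f"model {model_id} has no associated code")
        return {"model_id": model_id, "profiled": True}

    def attach_code(self, model_id: str, code: str) -> None:
        # Create a code artifact associated with the model.
        self.code_by_model[model_id] = code


def profile_with_retry(tds: FakeTDS, model_id: str, ask_user_for_code) -> dict:
    """Try to profile; on a missing-code error, collect code and retry once."""
    try:
        return tds.profile(model_id)
    except CodeNotFound:
        # The client would prompt the user to upload or paste code here.
        tds.attach_code(model_id, ask_user_for_code())
        return tds.profile(model_id)


tds = FakeTDS()
result = profile_with_retry(tds, "model-1", lambda: "def sir(): ...")
print(result)
```

The retry is limited to a single attempt so that a second failure surfaces to the caller instead of looping.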

@brandomr
Contributor Author

@YohannParis we could definitely do something like this but it implies that we will have a provenance relation between the code and the AMR ("AMR is extracted from the code"). We'd then need to check the provenance graph for such a relation to identify the correct code artifact to submit to MIT.

I can implement the lookup but will also need to implement setting provenance correctly on the /code_to_amr endpoint. I'll work on both of these and can then remove code_artifact_id.

If there is no code associated with the AMR, I'll test sending a placeholder string like "No code associated with model" to MIT.

Does that sound reasonable?

@tangopapatime

Sorry, just sliding into this conversation so I can track this feature :). Also, for what it's worth, your latest proposal sounds reasonable to me.

@YohannParis
Member

It all sounds reasonable, but I remembered you talked about doing provenance later, all at once, so I don't want to move faster on this. It was just an idea. Let's keep it bare-bones for now and improve the workflow later.

@brandomr
Contributor Author

I actually think this is a very reasonable and good use case for provenance. I'll do a bit of exploration by Wednesday and will report back--either with an update to this PR or by merging it as is.

@brandomr
Contributor Author

brandomr commented Aug 3, 2023

Updates using Provenance

Now, when a model is created from code with /code_to_amr, the extraction service adds a provenance relationship in TDS between the code artifact and the created model.

When the /profile_model endpoint (MIT model card) is called, only a paper artifact ID and the model ID need to be provided. It is assumed that the model was created from code, so the code artifact can be fetched via a new provenance query (requires DARPA-ASKEM/data-service#298).

Graceful failure

If no associated code artifact is found via the provenance query, a "blank" code snippet is created: No available code associated with model. After some testing, this seems to work just fine, as most of the information MIT uses apparently comes from the paper itself.
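
A minimal sketch of this fallback, assuming the provenance query can be modeled as a function that returns the code text or nothing. The names `code_text_for_model` and `find_code` are hypothetical; only the placeholder string comes from the description above.

```python
from typing import Callable, Optional

# Placeholder text quoted in the description above.
NO_CODE_PLACEHOLDER = "No available code associated with model"


def code_text_for_model(
    model_id: str,
    find_code: Callable[[str], Optional[str]],
) -> str:
    """Return the model's code, or the placeholder when none is linked.

    `find_code` is a hypothetical stand-in for the provenance query; it
    should return the code text, or None when no code artifact is found.
    """
    code = find_code(model_id)
    return code if code else NO_CODE_PLACEHOLDER


# Toy lookup table used in place of a real provenance query.
fake_provenance = {"model-1": "def sir(s, i, r): ..."}

print(code_text_for_model("model-1", fake_provenance.get))
print(code_text_for_model("model-2", fake_provenance.get))
```

Treating an empty string the same as a missing artifact keeps the profiling request from being sent with no code text at all.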

Utility

This is a good test case for how and when we might leverage provenance, so it is worth considering, but it does require updates to TDS to support the new provenance relationship between artifacts and models.

@brandomr brandomr merged commit 44a4c17 into main Aug 3, 2023
2 checks passed
@brandomr brandomr deleted the model-card branch August 3, 2023 16:44
3 participants