From 2e53df78e580131046dc8db7f7638063db1f5045 Mon Sep 17 00:00:00 2001
From: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Date: Thu, 25 Jul 2024 18:56:25 +0100
Subject: [PATCH] Adding metadata schema to the code base itself (#7409)

Fixes #7303 #6959.

### Description

This adds the schema file to the code base (although it may belong
elsewhere). The changes implement a number of new things:

* Moved definitions into a `$defs` section per the JSON schema standard
* Permits multiple input arguments and return results from networks with
arbitrary names using the `patternProperties` mechanism
* Allows inputs and outputs to be numbers, booleans, or strings in
addition to tensors
* Outputs after post processing can be specified with the
`post_processed_outputs` section if they are significantly changed with
the post-process transforms defined in scripts
* Multiple network IO formats can be specified in addition to
`network_data_format`; these must follow the pattern
`<name>_data_format`
* `required_packages_version` added in addition to
`optional_packages_version`
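
As a rough illustration of the `<name>_data_format` naming rule above,
the sketch below collects the matching top-level keys from a metadata
dictionary. The regular expression, the `data_format_sections` helper,
and the `localiser_data_format` key are assumptions made for this
example and are not taken from the schema file itself:

```python
import re

# Hypothetical pattern for `<name>_data_format` keys; the pattern used
# by the actual schema may differ.
DATA_FORMAT_KEY = re.compile(r"^[A-Za-z0-9_]+_data_format$")


def data_format_sections(metadata):
    """Return the sorted top-level keys that follow `<name>_data_format`."""
    return sorted(key for key in metadata if DATA_FORMAT_KEY.match(key))


# Illustrative metadata only; the empty dictionaries stand in for real
# format specifiers.
metadata = {
    "required_packages_version": {"nibabel": "3.2.1"},
    "optional_packages_version": {},
    "network_data_format": {
        "inputs": {},
        "outputs": {},
        "post_processed_outputs": {},
    },
    "localiser_data_format": {"inputs": {}, "outputs": {}},
}

sections = data_format_sections(metadata)
```

Here `sections` holds both the primary `network_data_format` entry and
the secondary `localiser_data_format` entry, while the package version
dictionaries are ignored.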

#7253 depends on this schema change.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com>
---
 docs/source/mb_specification.rst | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/docs/source/mb_specification.rst b/docs/source/mb_specification.rst
index cedafa0d23..56d660e35c 100644
--- a/docs/source/mb_specification.rst
+++ b/docs/source/mb_specification.rst
@@ -63,12 +63,12 @@ This file contains the metadata information relating to the model, including wha
 * **monai_version**: version of MONAI the bundle was generated on, later versions expected to work.
 * **pytorch_version**: version of Pytorch the bundle was generated on, later versions expected to work.
 * **numpy_version**: version of Numpy the bundle was generated on, later versions expected to work.
-* **optional_packages_version**: dictionary relating optional package names to their versions, these packages are not needed but are recommended to be installed with this stated minimum version.
+* **required_packages_version**: dictionary relating required package names to their versions. These are packages in addition to the base requirements of MONAI which this bundle absolutely needs. For example, if the bundle must load Nifti files the Nibabel package will be required.
 * **task**: plain-language description of what the model is meant to do.
 * **description**: longer form plain-language description of what the model is, what it does, etc.
 * **authors**: state author(s) of the model.
 * **copyright**: state model copyright.
-* **network_data_format**: defines the format, shape, and meaning of inputs and outputs to the model, contains keys "inputs" and "outputs" relating named inputs/outputs to their format specifiers (defined below).
+* **network_data_format**: defines the format, shape, and meaning of inputs and outputs to the (primary) model. It contains the keys "inputs" and "outputs", which relate named inputs/outputs to their format specifiers (defined below). An optional "post_processed_outputs" key states the format of "outputs" after postprocessing transforms are applied; this describes the final output from the bundle if it differs from the raw network output. These keys can also relate to primitive values (number, string, boolean) instead of the tensor format specified below.
 
 Tensor format specifiers are used to define input and output tensors and their meanings, and must be a dictionary containing at least these keys:
 
@@ -89,6 +89,8 @@ Optional keys:
 * **data_source**: description of where training/validation can be sourced.
 * **data_type**: type of source data used for training/validation.
 * **references**: list of published referenced relating to the model.
+* **supported_apps**: list of supported applications which use bundles, e.g. 'monai-label' would be present if the bundle is compatible with MONAI Label applications.
+* **\*_data_format**: defines the format, shape, and meaning of inputs and outputs to additional models which are secondary to the main model. This contains the same sort of information as **network_data_format** but describes networks providing secondary functionality, e.g. a localisation network used to identify an ROI in an image for cropping before data is sent to the primary network of this bundle.
 
 The format for tensors used as inputs and outputs can be used to specify semantic meaning of these values, and later is used by software handling bundles to determine how to process and interpret this data. There are various types of image data that MONAI is uses, and other data types such as point clouds, dictionary sequences, time signals, and others. The following list is provided as a set of supported definitions of what a tensor "format" is but is not exhaustive and users can provide their own which would be left up to the model users to interpret:
 
@@ -124,7 +126,7 @@ An example JSON metadata file:
       "monai_version": "0.9.0",
       "pytorch_version": "1.10.0",
       "numpy_version": "1.21.2",
-      "optional_packages_version": {"nibabel": "3.2.1"},
+      "required_packages_version": {"nibabel": "3.2.1"},
       "task": "Decathlon spleen segmentation",
       "description": "A pre-trained model for volumetric (3D) segmentation of the spleen from CT image",
       "authors": "MONAI team",