This document describes the data import JSON format as well as how to use it to import arbitrary data into the MIQA application.
The project contains a trivial sample dataset, located in the sample_data
directory. The sample.json
file in that directory is a descriptor file
that describes the organization of the sample dataset in sufficient detail
for MIQA to be able to import it. By looking at the layout of the sample_data
directory structure while examining the sample.json
file, you should get a
good idea of the meaning of the various keys in the descriptor file. However,
there is also a schema for the data import description format located in
server/miqa_server/schema/data_import.py
, and the schema specifies the
allowed syntax. The schema is also used to validate the json file before it
can be used to import a dataset into the MIQA application.
Data to be imported into MIQA can be thought of as existing in a session. A
session is composed of one or more experiments, and each experiment is made up
of one or more scans. A scan is either a 3D image file, or may have a time
component as well (e.g. FMRI). In either case, the heavy data associated with
a scan is usually one or more nifti files (.nii.gz
).
The first thing to note about the data import format is that it contains a
top level key data_root
, which should point to a root location on the file
system underneath which all the experiments and scans are housed. Usually
the json description file lives there as well, but that is not required.
The next thing to notice is that the scans
key is a list corresponding to
individual scans, and that each scan has a path
which should be the
relative path under data_root
where the images for that scan live. By
setting path
to the empty string, you're telling MIQA that all the scans
live together in the data_root
directory.
Within a scan, there are two methods for describing the images that make up
that scan: images
and imagePattern
. Using imagePattern
is more
efficient when MIQA is importing the data for the first time, but has some
constraints which prevent you from using it in some cases. images
is a
little less efficient, but works in all cases.
If the nifti files within a scan are named in such a way that sorting them
alphabetically results in the properly ordered sequence of images, and if you
can write a simple regular expression to find them all, then you should use
imagePattern
.
For example, if you have 12 images named as follows:
image01.nii.gz
image02.nii.gz
image03.nii.gz
image04.nii.gz
image05.nii.gz
image06.nii.gz
image07.nii.gz
image08.nii.gz
image09.nii.gz
image10.nii.gz
image11.nii.gz
Then your scan
item in the JSON descriptor can specify:
"imagePattern": "nii\\.gz"
Which means that any file in the path
directory that has "nii.gz" somewhere
in the filename belongs to the scan. The imagePattern
key gives you the full
power of regular expressions, in case a more complicated expression is required
to pick out the images associated with a particular scan.
Note that in the example filenames above, there is proper left zero padding in
the image numbers so that if the filenames are sorted alphabetically, they'll
come out in the order shown above, which is likely the right order for a time
sequence of images. Note also that without that left zero padding, image10.nii.gz
and image11.nii.gz
would come right after image1.nii.gz
and right before
image2.nii.gz
.
Consider the case where those same images are named another way, however:
one.nii.gz
two.nii.gz
three.nii.gz
four.nii.gz
five.nii.gz
six.nii.gz
seven.nii.gz
eight.nii.gz
nine.nii.gz
ten.nii.gz
eleven.nii.gz
In a situation like the one above, even though the same imagePattern
could be
used to pick out the images associated with the scan, alphabetically sorting the
files by name would not result in the proper ordering of the images. In this
situation, you should use images
instead of imagePattern
to describe the
images which make up the scan.
In cases like the one described above, when you cannot pick out the images
associated with a scan from within a directory using a pattern match, or when
the image order would be incorrect using alphabetical sorting, you can simply
provided a list of filenames using the images
key:
"images": [
"one.nii.gz",
"two.nii.gz",
"three.nii.gz",
"four.nii.gz"
]
This lets you specify unambiguously the images you want from the directory as well as the order in which they belong.
The JSON format also allows you to provide lists of sites
and experiments
,
each instance of which is identified by an id
field. Then scans can refer
to those sites or experiments using site_id
and experiment_id
keys.
If you have arbitrary data in the form of key/value pairs you want to associate
with a scan, those can be specified using the user_fields
key within the
scan
. For example:
"user_fields": {
"MQy:VRX": "0.94",
"MQy:VRY": "0.94",
"MQy:VRZ": "1.2",
"MQy:ROWS": "256"
}
These values will be picked up by the import process and added as individual metadata items to the folder containing the scan images.