
Remote data (deprecated)

Brian Wandell edited this page Aug 31, 2024 · 2 revisions

The RDT methods have been deprecated.

See mrtInstallSampleData and related functions for the new approach, which is based on OSF downloads and the OSF API.

We still need to write a page for OSF Sample Data.

Deprecated

For several reasons, including tutorials and unit testing, it is useful to have access to specific data sets. The MRI data needed for these purposes can be too large to store in a git repository. Instead, we use cloud storage and interact with the data using the Matlab Remote Data Toolbox.

Data are stored on an Amazon machine running an Archiva server. You can browse the vistasoft repository. Scripts illustrating how to use the Remote Data Toolbox (RDT) are in the vistasoft/rdt directory.

Data used by vistasoft and related programs (e.g., AFQ) can be downloaded from the server freely. You must have permission to upload new data to the server. Contact Brian Wandell if you feel you need permission.

The Remote Data Toolbox seems to require MATLAB R2013b or later.

Quick start

Here is a simple way to perform downloads.

To download the data in a matlab file (the file "mrInit_params.mat" in the directory vistadata/functional/mrBOLD_01), use these commands:

rd = RdtClient('vistasoft')                           % Create RDT object
rd.crp('/vistadata/functional/mrBOLD_01');            % Sets the default directory
data = rd.readArtifact('mrInit_params','type','mat'); % Retrieves the matlab data

The variables in the matlab file "mrInit_params" are returned as a struct, data.
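A struct whose fields are the stored variables is the same shape that Matlab's built-in load returns when called with an output argument. The sketch below illustrates that shape with a temporary local mat-file. The variable names (a, b) are made up for illustration, and the code does not call the RDT or assume anything about the contents of mrInit_params.

```matlab
% Illustration only: a local mat-file stands in for a downloaded artifact.
% The variable names a and b are hypothetical.
a = 1;
b = 'text';
tmpFile = [tempname '.mat'];     % temporary file name
save(tmpFile, 'a', 'b');         % write two variables to the mat-file

data  = load(tmpFile);           % variables come back as fields of a struct
names = fieldnames(data);        % cell array of the stored variable names
disp(names);

delete(tmpFile);                 % clean up the temporary file
```

The same fieldnames(data) call can be applied to the struct returned by rd.readArtifact with 'type','mat' to see which variables were downloaded.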

To download a non-matlab file (the file "t1.nii.gz" in the directory vistadata/diffusion/sampleData/t1), use these commands:

rd = RdtClient('vistasoft');    % If you already ran this, no need to run it again.
rd.crp('/vistadata/diffusion/sampleData/t1');
fname = rd.readArtifact('t1.nii','type','gz','destinationFolder',pwd);

The file t1.nii.gz is placed in the working directory (pwd), but you can specify another directory, for example:

fname = rd.readArtifact('t1.nii','type','gz','destinationFolder',tempdir);

In either case, you can then read the data using:

ni = niftiRead(fname);   % niftiRead is the vistasoft nifti reader

The RDT Explained

Initializing the RDT object

The RDT object for vistasoft data is created this way:

rd = RdtClient('vistasoft')

There are several different repositories on this site in addition to the one for vistasoft, so you need to specify the repository name as the argument.

The returned rd object stores the repository settings in its fields and, as a Matlab object, includes a set of methods that interact with the remote data from the Matlab command line. Creating the object with the 'vistasoft' argument populates the fields of the rd object with the appropriate values.

The repository values are specified in a json file stored in the vistasoft/rdt directory. These can be changed, but most users will not need to log in or change these values.

As you access the remote data, one parameter you will want to change is the remote directory. More about this below.

Directories

The vistasoft files are stored in a collection of directories. You can list the full path of all the vistasoft directories using this command:

rd.listRemotePaths

The files that are used for vistasoft testing are in the vistadata directory. These used to be stored in the vistadata SVN repository, which is now deprecated.

Files used for validation are stored in the repository inside the directory 'validate'. Files for different types of validation are contained in sub-folders, such as 'validate/fmri'.

You can change the current remote path (crp) using the command:

rd.crp('/vistadata')

Inspecting the Vistasoft data repository

You can see the files in the vistasoft repository using a web browser. When you use this command

rd.openBrowser

the browser will open to the current remote path. You will be able to see the various files and directory structure. Files can be downloaded by clicking on them in the browser.

I will explain this command another time: rd.openBrowser('fancy',true)

Artifacts

The data are stored on the server as 'artifacts'. Typically, an artifact is a single file along with metadata generated by the Archiva Maven server to describe the file. Artifacts are stored with version names (strings). It can take a while to get used to the syntax of the directory tree where the data are stored.

For example, there is a file stored inside of 'vistadata/validate/fmri' called 'epi01.nii.gz'. You can see the directory containing this artifact at this link. Notice the complexity. The principal file had two extensions (.nii.gz), so Archiva renamed it and called the artifact epi01.nii. The file type is 'gz'. The name of the file in the artifact is changed. There is a version number ('1') in the link. And Archiva generated many associated metadata files to make sure it didn't screw up.

Multiple files in an artifact

The RDT typically stores files with the same base name but different extensions as a single artifact. Thus, if you have a tif-file and a mat-file with the same name, they will be stored in the same artifact. To retrieve the different files, you must specify the 'type'.

File types and file names

To continue with the epi01 example, the original file has a concatenated .nii and .gz extension (.nii.gz). The file type is 'gz'. The Archiva database adjusts the file name for storage. To retrieve the file from the server, and have it be returned with its original name, you must use the flag 'destinationFolder', as in

fname = rd.readArtifact(artifactID,'type','gz','destinationFolder',folderName);
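The naming convention described above (the last extension becomes the type and the rest becomes the artifact name) can be sketched with Matlab's built-in fileparts. This is only an illustration of the convention using the epi01 example; it is not a call into the RDT and is not claimed to be how the RDT or Archiva derives the names internally.

```matlab
% Split a file name into the artifact ID and type described above.
% Uses only Matlab's built-in fileparts; no RDT call is made here.
fileName = 'epi01.nii.gz';

% fileparts treats only the final extension as the extension,
% so the name keeps the '.nii' part.
[~, artifactID, ext] = fileparts(fileName);

artifactType = ext(2:end);   % drop the leading dot from the extension

fprintf('artifact ID: %s, type: %s\n', artifactID, artifactType);
```

The resulting artifactID and artifactType are the values you would pass as the first argument and the 'type' argument of rd.readArtifact, assuming rd already points at the directory that holds the artifact.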

Listing the remote data

artifacts = rd.listArtifacts

Data types

Downloading mat-files

Downloading other files (e.g., nii.gz)

Downloading image files

Downloading json files
