Group Surrogate Data Generating Model (GSDGM) and Multivariate Time-series Ensemble Similarity Score (MTESS) Toolbox for MATLAB (a Python version is also available)
The GSDGM and MTESS Toolbox is a tool for surrogate data generation and multivariate time-series similarity analysis. Three GSDGM algorithms have been implemented to generate group surrogate data: 1) Vector Auto-Regression (VAR) surrogate (R. Liégeois et al., 2017), 2) Principal Component VAR (PCVAR) surrogate, and 3) Vector Auto-Regressive Deep Neural Network (VARDNN) surrogate (T. Okuno and A. Woodward, 2021). The VAR, PCVAR and VARDNN surrogates all try to fit the common features of the whole data. Thus, the GSDGM approach has the distinctive ability to generate a group centroid multivariate time-series. This allows us to build a biologically plausible, standardized model of human brain dynamics based on a large human rs-fMRI dataset from the Human Connectome Project.
MTESS quantifies the similarity between two multivariate time-series. It is composed of the normalized distance or cosine similarity of seven basic statistical properties (standard deviation, auto-correlation, partial auto-correlation, correlation matrix, partial correlation matrix, cross-correlation matrix and partial cross-correlation matrix) together with multivariate kurtosis. By considering each of these statistical properties, it is possible to identify which of them differ between two multivariate time-series using an "MTESS radar chart". The overall similarity between two multivariate time-series is expressed by a single MTESS value in the range [0, 5] (0: not similar, 5: similar), and the similarity of each univariate time-series, which we consider as a node of a network, is expressed as a Node MTESS value.
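As a rough illustration of what "cosine similarity of a statistical property" means, the following MATLAB sketch compares the correlation matrices of synthetic time-series. It is not the toolbox implementation: the mixing matrix A, the variable names and the choice to compare only the upper triangle are assumptions made for this example, and the normalization that maps the properties onto the [0, 5] MTESS range is not reproduced.

```matlab
% Illustrative only: cosine similarity of ONE property (the correlation matrix).
nodes = 8;
len   = 500;
A = eye(nodes) + 0.8 * diag(ones(nodes - 1, 1), 1);  % hypothetical mixing, couples neighbouring nodes

X = A * randn(nodes, len);   % two series that share the same coupling
Y = A * randn(nodes, len);
Z = randn(nodes, len);       % independent nodes, no coupling

% corrcoef expects observations in rows, so transpose <nodes> x <length>
Cx = corrcoef(X');
Cy = corrcoef(Y');
Cz = corrcoef(Z');

% Compare only the off-diagonal upper triangle (the diagonal is always 1)
mask   = triu(true(nodes), 1);
cosSim = @(a, b) dot(a, b) / (norm(a) * norm(b));

simXY = cosSim(Cx(mask), Cy(mask));   % expected to be high: same correlation structure
simXZ = cosSim(Cx(mask), Cz(mask));   % expected to be much lower: unrelated structure
fprintf('X vs Y: %.3f, X vs Z: %.3f\n', simXY, simXZ);
```

MTESS combines several such per-property comparisons, which is why the radar chart can show which property is responsible for a low score.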
The combination of GSDGM and MTESS allows us to perform fingerprint analysis of a group of multivariate time-series. The GSDGM can generate the group centroid time-series and MTESS can quantify normality and abnormality based on a comparison between the group centroid and each individual multivariate time-series.
Command line tools
name | description |
---|---|
mtess | Calculate and plot MTESS for a group of multivariate time-series data. |
gsdgm | Generate a group surrogate model (VAR, PCVAR, VARDNN surrogate) and (multivariate time-series) group surrogate data. |
surrogate | Generate univariate and multivariate time-series surrogate data by Random Gaussian (RG), Random Shuffling (RS), Fourier Transform (FT), Amplitude Adjusted FT (AAFT) (J. Theiler et al., 1992), Iterated AAFT (IAAFT) (T. Schreiber and A. Schmitz, 1996), VAR, PCVAR, VARDNN surrogate. |
surrotest | Hypothesis testing based on the Monte Carlo method (Linearity test, Gaussian distribution test, Independent and Identically Distributed (I.I.D) test) (J. Theiler and D. Prichard, 1996). |
nii2roisig | Extract fMRI ROI time-series data from NIfTI files (.nii, .nii.gz) with the ROI atlas file (.nii, .nii.gz). |
Requirements
- MATLAB R2019b or later
- Deep Learning Toolbox ver12.1 or later
- Fuzzy Logic Toolbox ver2.6 or later
- Econometrics Toolbox ver5.3 or later
- Parallel Computing Toolbox ver7.1 or later
- VARDNN Toolbox
Please download the VARDNN Toolbox and add it to the MATLAB path ("Add Path") before using the GSDGM and MTESS Toolbox.
Installation
- Download the zip files of this Toolbox and the VARDNN Toolbox.
- Extract the zip files under your working directory <work_path>.
- Start MATLAB and add the extracted directories to the path ("Add Path"), i.e. <work_path>/vardnn-master and <work_path>/mtess-master.
- Move to the <work_path>/mtess-master directory and run the following demos.
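The "Add Path" steps can also be done from the MATLAB command window. The following is a minimal sketch, assuming both zip files were extracted directly under the working directory. The variable workPath is a placeholder for <work_path>, and adding subfolders with genpath is an assumption rather than a documented requirement.

```matlab
% Placeholder for <work_path>; replace with your own working directory.
workPath = pwd;
toolboxDirs = {'vardnn-master', 'mtess-master'};

for k = 1:numel(toolboxDirs)
    d = fullfile(workPath, toolboxDirs{k});
    if isfolder(d)
        addpath(genpath(d));   % add the directory and its subfolders (assumption)
    else
        warning('Directory not found: %s', d);
    end
end

% Move to the toolbox directory so that relative paths such as data/... resolve.
mtessDir = fullfile(workPath, 'mtess-master');
if isfolder(mtessDir)
    cd(mtessDir);
end
```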
Demo 1
The first demo shows the calculation of MTESS among time-series data and figure output.
(Copy and paste this command line. The demo data is included in the GSDGM and MTESS Toolbox.)
>> mtess --showinsig --showmat --showsig --showprop --shownode data/cx-8x500-demo-surrogate.mat
...
output mat file : results/cx-8x500-demo-surrogate_mtess.mat
This demo calculates MTESS and generates the following figures:
The "--showinsig" option shows each multivariate time-series.
The "--showmat" option shows the MTESS matrix and the statistical property matrices.
The "--showsig" option shows time-series plots comparing the first data vs. the others.
The "--showprop" option shows the MTESS polar chart of the first data vs. the others.
The "--shownode" option shows the Node MTESS of the first data vs. the others.
Finally, the "mtess" command saves the calculation result in .mat format (default) or in .csv format (with the "--format 0" option).
Demo 2
The second demo shows the generation of surrogate data, then the calculation of MTESS between the original time-series data and the surrogate data.
>> surrogate -s -f -a -v data/demo-original-8x500.csv
surrogate sample : 1
output mat file : results/demo-original-8x500_var_multi_all.mat
output mat file : results/demo-original-8x500_rs_multi_all.mat
output mat file : results/demo-original-8x500_ft_multi_all.mat
output mat file : results/demo-original-8x500_aaft_multi_all.mat
>> mtess --showmat --showprop --showsig data/demo-original-8x500.csv results/demo-original-8x500_rs_multi_all.mat results/demo-original-8x500_var_multi_all.mat results/demo-original-8x500_ft_multi_all.mat results/demo-original-8x500_aaft_multi_all.mat
...
output mat file : results/demo-original-8x500_mtess.mat
The "surrogate" command reads a single multivariate time-series (8x500) file and generates surrogate data (8x500) with the (multivariate) RS, VAR, FT and AAFT surrogate algorithms. Then, the "mtess" command reads the original time-series .csv file and the surrogate time-series .mat files, and calculates the multivariate time-series similarity.
As can be seen in this result, the VAR, FT and AAFT surrogate data (cell numbers 3-5) show a higher MTESS (similarity) to the original time-series data (cell number 1) than the RS surrogate data (cell number 2) does.
Demo 3
The third demo shows the calculation of (VAR) group surrogate data based on the Human Connectome Project S500 male rs-fMRI data (132 ROIs). Then MTESS is calculated amongst some samples of rs-fMRI time-series data and group surrogate data.
(Caution: Downloading and MTESS calculation will take some time.)
>> gsdgm --surrnum 2 --showsig http://fmridata.s322.xrea.com/hcp-s500m-var1.mat
downloading http://fmridata.s322.xrea.com/hcp-s500m-var1.mat ...
save cache file : data/cache/fmridata_s322_xrea_com-hcp-s500m-var1.mat
surrogate sample : 1
surrogate sample : 2
output mat file : results/fmridata_s322_xrea_com-hcp-s500m-var1_gsd_var.mat
>> mtess --showmat --showforce --showdend ward --showprop --cache data/demo-fmri-132x1190s.mat results/fmridata_s322_xrea_com-hcp-s500m-var1_gsd_var.mat
...
output mat file : results/demo-fmri-132x1190s_mtess.mat
Other GSDGM .mat files (male, female, other algorithms) can be found on Zenodo.
Demo 4
The fourth demo shows the creation of a group surrogate data generating model (GSDGM) from sample rs-fMRI time-series data (132 ROIs). Then MTESS is calculated amongst the group of original time-series data and group surrogate data.
(Caution: MTESS calculation will take some time.)
>> gsdgm -v --lag 1 --surrnum 2 --showsig --showras data/demo-fmri-132x1190s.mat
output group surrogate model file : results/demo-fmri-132x1190s_gsm_var.mat
surrogate sample : 1
surrogate sample : 2
output mat file : results/demo-fmri-132x1190s_gsd_var.mat
>> mtess --showmat --showprop --showdend ward --cache --showforce data/demo-fmri-132x1190s.mat results/demo-fmri-132x1190s_gsd_var.mat
...
save cache of demo-fmri-132x1190s-gsd-var-1
save cache of demo-fmri-132x1190s-gsd-var-2
output mat file : results/demo-fmri-132x1190s_mtess.mat
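Conceptually, a VAR group surrogate with lag 1 fits one auto-regressive model to all subjects at once and then simulates new time-series from it. The following is a minimal sketch of that general idea on synthetic data, not the toolbox code: the matrix Atrue, the variable names, the least-squares fit and the independent Gaussian noise per node are all assumptions made for this example.

```matlab
% Synthetic group: 3 "subjects" with 4 nodes each; lengths may differ.
nodes = 4;
Atrue = 0.5 * eye(nodes) + 0.2 * diag(ones(nodes - 1, 1), 1);  % hypothetical dynamics
lens  = [300 250 280];
CX = cell(1, numel(lens));
for s = 1:numel(lens)
    X = zeros(nodes, lens(s));
    for t = 2:lens(s)
        X(:, t) = Atrue * X(:, t - 1) + randn(nodes, 1);
    end
    CX{s} = X;
end

% Stack (past, present) pairs of ALL subjects and fit a single VAR(1) model.
past = [];
pres = [];
for s = 1:numel(CX)
    past = [past, CX{s}(:, 1:end - 1)]; %#ok<AGROW>
    pres = [pres, CX{s}(:, 2:end)];     %#ok<AGROW>
end
past1 = [past; ones(1, size(past, 2))];  % extra row of ones for the intercept
W = pres / past1;                        % least squares: pres ~ W * past1
R = pres - W * past1;                    % residuals of the group fit

% Simulate one surrogate series driven by Gaussian noise.
len   = 300;
sigma = std(R, 0, 2);                    % per-node noise scale (residual covariance ignored)
S = zeros(nodes, len);
S(:, 1) = CX{1}(:, 1);
for t = 2:len
    S(:, t) = W * [S(:, t - 1); 1] + sigma .* randn(nodes, 1);
end
```

Because the model is fitted to the pooled data, the simulated series reflects the features shared by the whole group rather than those of any single subject.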
Demo 5
Linearity test:
- Use FT or AAFT surrogate ("surrogate -f" or "-a" command) to generate 399 surrogate data samples from the original time-series file.
- Use the "-l" option of the "surrotest" command for carrying out a linearity test. The original time-series and surrogate data files must be specified for this command.
>> surrogate -f --surrnum 399 data/demo-original-8x500.csv
output mat file : results/demo-original-8x500_ft_multi_all.mat
>> surrotest -l --showsig --showrank data/demo-original-8x500.csv results/demo-original-8x500_ft_multi_all.mat
significantly not linear (1 / 8)
output mat file : results/demo-original-8x500_linear_test.mat
The plot design for the linearity test was inspired by J. Theiler and D. Prichard (1996).
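FT surrogates suit a linearity test because they keep the amplitude spectrum of a signal while randomizing its phases. The following is a minimal univariate sketch of this standard phase-randomization technique; it is not the toolbox implementation, and the AR(1) test signal and variable names are assumptions.

```matlab
% Hypothetical test signal: a univariate AR(1) process as a row vector.
n = 500;
x = zeros(1, n);
for t = 2:n
    x(t) = 0.8 * x(t - 1) + randn();
end

F    = fft(x);
half = floor((n - 1) / 2);               % number of positive-frequency bins
phi  = 2 * pi * rand(1, half);           % random phases

% Rotate the positive frequencies and mirror them as complex conjugates
% so that the inverse transform is real. DC (and Nyquist for even n) stay as is.
F(2:half + 1)          = F(2:half + 1) .* exp(1i * phi);
F(n:-1:n - half + 1)   = conj(F(2:half + 1));

s = real(ifft(F));                       % FT surrogate of x

% The amplitude spectrum is preserved up to rounding error.
ampErr = max(abs(abs(fft(s)) - abs(fft(x))));
fprintf('max amplitude difference: %g\n', ampErr);
```

A common multivariate extension adds the same random phases to every node, which also preserves the cross-spectra between nodes.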
Gaussian distribution test:
- Use RG surrogate ("surrogate -g" command) to generate 399 surrogate data samples from the original time-series file.
- Use the "-g" option of the "surrotest" command to carry out a Gaussian distribution test. The original time-series and surrogate data files must be specified for this command.
>> surrogate -g --surrnum 399 data/demo-original-8x500.csv
output mat file : results/demo-original-8x500_rg_multi_all.mat
>> surrotest -g --showsig --showrank data/demo-original-8x500.csv results/demo-original-8x500_rg_multi_all.mat
significantly not gaussian (5 / 8)
output mat file : results/demo-original-8x500_gaussian_test.mat
Independent and Identically Distributed (I.I.D) test:
- Use RS surrogate ("surrogate -s" command) to generate 399 surrogate data samples from the original time-series file.
- Use the "-i" option of the "surrotest" command to carry out an I.I.D test. The original time-series and surrogate data files must be specified for this command.
>> surrogate -s --surrnum 399 data/demo-original-8x500.csv
output mat file : results/demo-original-8x500_rs_multi_all.mat
>> surrotest -i --showsig --showrank data/demo-original-8x500.csv results/demo-original-8x500_rs_multi_all.mat
significantly not I.I.D (8 / 8)
output mat file : results/demo-original-8x500_iid_test.mat
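All three tests follow the same Monte Carlo pattern: a statistic of the original time-series is ranked among the same statistic computed on the surrogates. The sketch below shows one common rank-based convention for turning that rank into a p-value. The statistic values, the significance level alpha and the exact p-value formulas are assumptions for illustration; the statistics actually used by "surrotest" are not reproduced here.

```matlab
numSurr = 399;                 % surrogate sample number, as in "--surrnum 399"
alpha   = 0.05;                % significance level (assumed)

statSurr = randn(1, numSurr);  % placeholder statistics of the surrogates
statOrig = 4.0;                % placeholder statistic of the original series

total = numSurr + 1;                       % surrogates plus the original
r = 1 + sum(statSurr < statOrig);          % rank of the original, 1..total (ties ignored)

pBottom = r / total;                       % original unusually small
pTop    = (total - r + 1) / total;         % original unusually large
pBoth   = min(1, 2 * min(pBottom, pTop));  % two-sided

isSignificant = pBoth <= alpha;
fprintf('rank %d of %d, two-sided p = %.4f\n', r, total, pBoth);
```

Under this convention, 399 surrogates plus the original give 400 values, so a two-sided test at the 5% level rejects the null hypothesis when the original ranks among the 10 lowest or the 10 highest values.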
Demo 6
This demo shows the extraction of rs-fMRI time-series data (132 ROIs). Then MTESS is calculated amongst the group of rs-fMRI ROI time-series data.
(Caution: Data for this demo is not included in the toolbox.)
>> nii2roisig -a D:/work/conn/rois/atlas.nii --showsig --transform 1 *_rfMRI_REST1_LR.nii.gz
checking atlas space size ...
...
processing : 101309_rfMRI_REST1_LR
output mat file : results/100307_rfMRI_REST1_LR_all.mat
>> mtess --showmat --showforce --showdend ward --showprop results/100307_rfMRI_REST1_LR_all.mat
...
output mat file : results/100307_rfMRI_REST1_LR_all_mtess.mat
mtess command
>> mtess -h
usage: mtess [options] file1.mat file2.mat ...
--range type input group value range (default:"auto", sigma:<num>, full:<num> or <min>:<max>)
--pcc type Partial Cross-Correlation algorithm 0:auto, 1:PCC, 2:SV-PCC, 3:PC-PCC, 4:[] (default:0)
--aclag num time lag <num> for Auto Correlation (default:5)
--paclag num time lag <num> for Partial Auto Correlation (default:13)
--cclag num time lag <num> for Cross Correlation (default:2)
--pcclag num time lag <num> for Partial Cross Correlation (default:4)
--lambda num ridge regression param <num> for Partial Cross Correlation (default:0)
--outpath path output files <path> (default:"results")
--format type save file format <type> 0:csv, 1:mat (default:1)
--transform type input signal transform <type> 0:raw, 1:sigmoid (default:0)
--transopt num signal transform option <num> (for type 1:centroid value)
--showinsig show input time-series data of <filename>.csv
--showinras show raster plot of input time-series data of <filename>.csv
--showmat show result MTESS matrix
--showsig show 1 vs. others node signals
--showprop show result polar chart of 1 vs. others MTESS statistical properties
--shownode show result line plot of 1 vs. others node MTESS
--showdend algo show dendrogram of <algo> hierarchical clustering based on MTESS matrix. see MATLAB linkage method option.
--showforce show force weight effect graph based on MTESS matrix
--cache use cache file for MTESS calculation (low memory mode)
--cachepath path cache files <path> (default:"results/cache")
-v, --version show version number
-h, --help show command line help
The input .mat file should include input cell data. The node count must be the same within the group, whereas time-series length does not have to be the same.
name | cell | description |
---|---|---|
CX | {<nodes> x <length>} x <cell number> | group of multivariate time-series |
names | {'data name string'} x <cell number> | names of each time-series data |
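A group input file in this format can be built with a few lines of MATLAB. The following is a minimal sketch using random placeholder data; the file name my-group.mat, the subject names and the 1 x <cell number> orientation of the cell arrays are assumptions.

```matlab
numSubjects = 3;
nodes   = 8;                       % must be identical for every entry
lengths = [500 480 520];           % time-series lengths may differ

CX    = cell(1, numSubjects);
names = cell(1, numSubjects);
for k = 1:numSubjects
    CX{k}    = randn(nodes, lengths(k));      % placeholder data: <nodes> x <length>
    names{k} = sprintf('subject%02d', k);     % hypothetical naming scheme
end

if ~isfolder('data')
    mkdir('data');
end
save(fullfile('data', 'my-group.mat'), 'CX', 'names');
```

The saved file can then be passed to the command in the same way as the demo data, for example "mtess --showmat data/my-group.mat".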
The output .mat file includes the following matrix data:
name | matrix | description |
---|---|---|
MTS | <cell number> x <cell number> | MTESS matrix (2D) |
MTSp | <cell number> x <cell number> x 8 | MTESS statistical property matrix (3D) |
nMTS | <cell number> x <cell number> x <nodes> | Node MTESS matrix (3D) |
nMTSp | <cell number> x <cell number> x <nodes> x 8 | Node MTESS statistical property matrix (4D) |
Similarities are generated for the following 8 statistical properties: mean, standard deviation, DFT amplitude, correlation, partial correlation, cross-correlation and partial cross-correlation.
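The output file can be inspected directly in MATLAB. The sketch below assumes that Demo 1 has been run, so that results/cx-8x500-demo-surrogate_mtess.mat exists; it only relies on the variable names and dimensions listed in the table above and makes no assumption about the order of the statistical properties.

```matlab
resultFile = fullfile('results', 'cx-8x500-demo-surrogate_mtess.mat');

if exist(resultFile, 'file')
    S = load(resultFile);
    n = size(S.MTS, 1);                       % number of compared time-series
    fprintf('%d time-series compared\n', n);
    if n >= 2
        disp(S.MTS(1, 2:end));                % MTESS of the first data vs. the others
        disp(squeeze(S.MTSp(1, 2, :))');      % per-property values, cell 1 vs. cell 2
        disp(squeeze(S.nMTS(1, 2, :))');      % Node MTESS, cell 1 vs. cell 2
    end
else
    fprintf('Run Demo 1 first to create %s\n', resultFile);
end
```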
gsdgm command
>> gsdgm -h
model training : gsdgm [options] file1.mat file2.mat ...
surrogate data : gsdgm [options] file_gsm_<type>.mat
-v, --var output Vector Auto-Regression (VAR) group surrogate model (<filename>_gsm_var.mat)
-p, --pcvar output Principal Component VAR (PCVAR) group surrogate model (<filename>_gsm_pcvar.mat)
-d, --vardnn output VAR Deep Neural Network (VARDNN) group surrogate model (<filename>_gsm_vardnn.mat)
--lag num time lag <num> for VAR, PCVAR, VARDNN surrogate model (default:3)
--noise type noise type for VAR, PCVAR, VARDNN surrogate model (default:"gaussian" or "residuals")
--outpath path output files <path> (default:"results")
--transform type input training signal transform <type> 0:raw, 1:sigmoid (default:0)
--transopt num signal transform option <num> (for type 1:centroid value)
--format type output surrogate data file format <type> 0:csv, 1:mat (default:1)
--surrnum num output surrogate sample number <num> (default:1)
--siglen num output time-series length <num> (default:same as input time-series)
--range type output surrogate value range (default:"auto", sigma:<num>, full:<num>, <min>:<max> or "none")
--pcrate num principal component variance rate <num> for PCVAR surrogate (default:0.99)
--epoch num VARDNN surrogate training epoch number <num> (default:1000)
--showinsig show input time-series data of <filename>.csv
--showinras show raster plot of input time-series data of <filename>.csv
--showsig show output surrogate time-series data
--showras show raster plot of output surrogate time-series data
--version show version number
-h, --help show command line help
The input .mat file should include input cell data described as follows. The node count must be the same within the group, whereas the time-series length does not have to be the same.
name | cell | description |
---|---|---|
CX | {<nodes> x <length>} x <cell number> | group of multivariate time-series |
names | {'data name string'} x <cell number> | names of each time-series data |
The output (group surrogate model) .mat file includes the following struct data:
name | type | description |
---|---|---|
net | struct | struct of group surrogate model |
gRange | struct | struct of group range information |
name | string | name of group surrogate model |
The output (group surrogate data) .mat file includes the following cell data:
name | cell | description |
---|---|---|
CX | {<nodes> x <length>} x <cell number> | group of multivariate time-series |
names | {'data name string'} x <cell number> | names of each time-series data |
surrogate command
>> surrogate -h
usage: surrogate [options] filename.csv ...
-g, --rg output Random Gaussian (RG) surrogate (<filename>_rg_<variate>_<num>.csv)
-s, --rs output Random Shuffling (RS) surrogate (<filename>_rs_<variate>_<num>.csv)
-f, --ft output Fourier Transform (FT) surrogate (<filename>_ft_<variate>_<num>.csv)
-a, --aaft output Amplitude Adjusted FT (AAFT) surrogate (<filename>_aaft_<variate>_<num>.csv)
-i, --iaaft output Iterated AAFT (IAAFT) surrogate (<filename>_iaaft_<variate>_<num>.csv)
-v, --var output Vector Auto-Regression (VAR) surrogate (<filename>_var_<variate>_<num>.csv)
-p, --pcvar output Principal Component VAR (PCVAR) surrogate (<filename>_pcvar_<variate>_<num>.csv)
-d, --vardnn output VAR Deep Neural Network (VARDNN) surrogate (<filename>_vardnn_<variate>_<num>.csv)
-l, --lazy output Lazy Learning (LL) surrogate (<filename>_lazy_<variate>_<num>.csv)
--multi output multivariate surrogate (default:on)
--uni output univariate surrogate (default:off)
--noise type noise type for VAR, PCVAR, VARDNN, LL surrogate (default:"gaussian")
--surrnum num output surrogate sample number <num> (default:1)
--outpath path output files <path> (default:"results")
--format type save file format <type> 0:csv, 1:mat(each), 2:mat(all) (default:2)
--transform type input signal transform <type> 0:raw, 1:sigmoid (default:0)
--transopt num signal transform option <num> (for type 1:centroid value)
--lag num time lag <num> for VAR, PCVAR, VARDNN, LL (default:3)
--epoch num VARDNN training epoch number <num> (default:1000)
--l2 num VARDNN training L2Regularization <num> (default:0.05)
--nn num <num>-nearest neighbor for Lazy Learning (default:2)
--showsig show input time-series data of <filename>.csv
--nocache do not use cache file for VARDNN training
--version show version number
-h, --help show command line help
The input .csv or .mat file should include time-series matrix data in the following format:
name | matrix | description |
---|---|---|
X | <nodes> x <length> | multivariate time-series |
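A single-subject input file can be prepared as follows. This is a minimal sketch with a synthetic signal; the file names are placeholders, and the CSV layout (one row per node, one column per time point, no header) is an assumption based on the <nodes> x <length> format above.

```matlab
nodes = 8;
len   = 500;

% Placeholder signal: each node follows an independent AR(1) process.
X = zeros(nodes, len);
for t = 2:len
    X(:, t) = 0.7 * X(:, t - 1) + randn(nodes, 1);
end

if ~isfolder('data')
    mkdir('data');
end
save(fullfile('data', 'my-series.mat'), 'X');          % .mat input with variable X
writematrix(X, fullfile('data', 'my-series.csv'));     % .csv input (writematrix needs R2019a or later)
```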
The output (surrogate data) .mat file includes the following cell data:
name | cell | description |
---|---|---|
CX | {<nodes> x <length>} x <cell number> | group of multivariate time-series |
names | {'data name string'} x <cell number> | names of each time-series data |
surrotest command
>> surrotest -h
usage: surrotest [options] <original>.csv surrogate.mat ...
-g, --gaussian output Gaussian distribution test (<original>_gauss_test.csv)
-l, --linear output Linearity test (<original>_linear_test.csv)
-i, --iid output I.I.D test (<original>_iid_test.csv)
--side num bottom-side(1), both-side(2), top-side(3) (default:2)
--outpath path output files <path> (default:"results")
--format type save file format <type> 0:csv, 1:mat (default:1)
--showsig show input time-series data of <original>.csv
--showrank show rank result of <original>.csv
-v, --version show version number
-h, --help show command line help
The input .mat file should include input cell data described as follows. The node count must be the same within the group.
name | cell | description |
---|---|---|
CX | {<nodes> x <length>} x <cell number> | group of multivariate time-series |
names | {'data name string'} x <cell number> | names of each time-series data |
The output .mat file includes the following matrix data:
name | matrix | description |
---|---|---|
P | <nodes> x 1 | P-value result |
Rank | <nodes> x 1 | Rank value result |
nii2roisig command
>> nii2roisig -h
usage: nii2roisig [options] -a atlas.nii file1.nii ...
-a, --atlas file ROI atlas nifti <file>
--outpath path output files <path> (default:"results")
--format type save file format <type> 0:csv, 1:mat(each), 2:mat(all) (default:2)
--transform type output signal transform <type> 0:raw, 1:sigmoid (default:0)
--transopt num signal transform option <num> (for type 1:centroid value)
--showsig show output time-series data of <original>.csv
--showras show raster plot of output time-series data of <original>.csv
--nocache do not use cache file for conversion
-v, --version show version number
-h, --help show command line help
Inputs are an ROI atlas file (.nii or .nii.gz) and rs-fMRI files (.nii or .nii.gz). All files should have data in the same physical space (e.g. MNI space).
The output .mat file includes the following cell data:
name | cell | description |
---|---|---|
CX | {<nodes> x <length>} x <cell number> | group of multivariate time-series |
names | {'data name string'} x <cell number> | names of each time-series data |
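The general idea of ROI time-series extraction can be sketched on synthetic arrays. The example below averages the voxels of each atlas label at every time point; whether "nii2roisig" uses a plain mean is an assumption, and the array sizes, label layout and names are made up for illustration. It also assumes that the atlas and the fMRI volume share the same voxel grid.

```matlab
% Synthetic stand-ins: a 4-D volume (x, y, z, time) and a 3-D integer atlas.
V = randn(4, 4, 4, 20);
atlas = zeros(4, 4, 4);        % 0 = background
atlas(1:2, :, :)   = 1;        % ROI 1
atlas(3:4, 1:2, :) = 2;        % ROI 2

labels    = unique(atlas(atlas > 0));
numFrames = size(V, 4);
voxels    = reshape(V, [], numFrames);       % <voxels> x <time>

X = zeros(numel(labels), numFrames);         % <nodes> x <length>
for k = 1:numel(labels)
    inRoi   = atlas(:) == labels(k);
    X(k, :) = mean(voxels(inRoi, :), 1);     % mean signal of the ROI per frame
end

% Wrap the result in the cell format used by the other commands.
CX    = {X};
names = {'synthetic-subject'};
```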
Citation
If you find the GSDGM and MTESS Toolbox useful in your research, please cite it as follows:
Takuto Okuno, Junichi Hata, Yawara Haga, Kanako Muta, Hiromichi Tsukada, Ken Nakae, Hideyuki Okano, Alexander Woodward (2023) "Group Surrogate Data Generating Models and Similarity Quantification of Multivariate Time-Series: A Resting-State fMRI Study", NeuroImage, 120329