
MPI cluster


How to run Noise2Void on the MPI cluster:

1: Setting it up on your account:

1. Log into the cluster

2. Install Noise2Void; this has become a lot easier. Type: pip3 install --user n2v
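If you want to verify the installation, a minimal check is to import the package from Python (this check is an extra suggestion, not part of the original instructions):

python3 -c "import n2v"   # finishes without an error if the installation worked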

2: Running the scripts from an interactive node

1. Load the CUDA module. Type: module load cuda/9.0.176

2. Get an interactive node. Type: srun --exclude=r02n01 -p gpu --gres=gpu:1 --time 0-1:00:00 --mem-per-cpu 256000 --export=ALL --pty /bin/bash

3. Run the training script (here we use Romina's data): trainN2V.py --dataPath='/lustre/projects/juglab/n2vScripts/data/Romina/' --fileName='MTfmi_25deg_181020_SD4-001-hyper.tif' --dims=TZYX --patchSizeZ=16 --batchSize=8

Tip: Training will run for a couple of hours. If you only want to quickly try the method, you can reduce the computation time at the cost of reduced quality by using --stepsPerEpoch=5.

The --dataPath should point to a directory with images. With --fileName you can choose which files to open as training and validation data. If you use --fileName=*.tif, which is the default, all tifs in the directory will be used. With --dims you have to specify the order of the dimensions in your images, where --dims=ZYX is a single-channel 3D stack. A 2D image would be --dims=YX, a multi-channel stack --dims=ZYXC, and a multi-channel image --dims=YXC. Currently time series or movies are not supported, so I cropped one frame from Romina's 4D data to make it work. This will change very soon; it is already implemented. --validationFraction=2 specifies that 2% of our patches are used as validation data and the rest for training. A short overview of the --dims values is given after this list; for a quick description of the other parameters you can run python3 trainN2V.py -h.

4. Run the prediction script: predictN2V.py --dataPath='/lustre/projects/juglab/n2vScripts/data/Romina/' --fileName='MTfmi_25deg_181020_SD4-001-hyper.tif' --dims=TZYX --tile=3

The flags --dataPath and --fileName work as above, specifying which images should be processed. --dims=ZYX is as above. With --output=PATH you can specify a path in which your results will be saved; the file names will correspond to the original file names with "_N2V" appended. With --tile=3 you can choose to divide your image into 3x3 tiles to make it fit into GPU memory. If the tiling is set too small, the script will try to find an appropriate tiling by itself; this can take a bit longer. You can run predictN2V.py without parameters to get a description of all parameters. A consolidated sketch of a complete interactive session is shown below.
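For quick reference, the --dims values described above correspond to the following image types (only the cases mentioned in this section are listed):

--dims=YX      # single 2D image
--dims=YXC     # multi-channel 2D image
--dims=ZYX     # single-channel 3D stack
--dims=ZYXC    # multi-channel 3D stack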
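Putting the steps of this section together, a complete interactive session could look roughly like this (paths and file names are the example values from above; adjust them to your own data):

# 1. load the CUDA module on the login node
module load cuda/9.0.176

# 2. request an interactive GPU node (this drops you into a shell on that node)
srun --exclude=r02n01 -p gpu --gres=gpu:1 --time 0-1:00:00 --mem-per-cpu 256000 --export=ALL --pty /bin/bash

# 3. train (add --stepsPerEpoch=5 for a quick, lower-quality test run)
trainN2V.py --dataPath='/lustre/projects/juglab/n2vScripts/data/Romina/' --fileName='MTfmi_25deg_181020_SD4-001-hyper.tif' --dims=TZYX --patchSizeZ=16 --batchSize=8

# 4. predict
predictN2V.py --dataPath='/lustre/projects/juglab/n2vScripts/data/Romina/' --fileName='MTfmi_25deg_181020_SD4-001-hyper.tif' --dims=TZYX --tile=3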

3: Running the scripts as a job

1. Load the CUDA module. Type: module load cuda/9.0.176

2. Create a text file (name it e.g. trainn2v.sh) that holds the command and all its parameters. To train a network as above the file should look like this:

#!/bin/sh

python3 trainN2V.py --dataPath='/lustre/projects/juglab/n2vScripts/data/Romina/' --fileName='MTfmi_25deg_181020_SD4-001.tif' --dims=ZYX --validationFraction=2 --patchSizeZ=16 --batchSize=8

To submit a job for prediction you would have to change the file accordingly (see above).

3. Then submit the job: sbatch -p gpu --gres=gpu:1 --time 0-0:25:00 --mem-per-cpu 256000 --export=ALL -o out.txt trainn2v.sh

With the -o parameter you can specify the path of the log file that will be created. The --time parameter (format days-hours:minutes:seconds) specifies how much time the job will be granted. An equivalent job script that carries these options as #SBATCH directives is sketched after this list.

4. You can check which of your jobs are running by typing: squeue -u <USERNAME>

You can cancel a job using: scancel <JOBID>

More useful commands can be found at: https://isugenomics.github.io/bioinformatics-workbook/Appendix/HPC/SLURM/slurm-cheatsheat.html
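For reference, the sbatch options above can usually be embedded in the job script itself as #SBATCH directives, which keeps the submit command short. This is only a minimal sketch: the directives simply mirror the flags used above, and loading the CUDA module inside the script (instead of before submission) is an assumption that your shell setup allows it.

#!/bin/sh
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --time=0-0:25:00
#SBATCH --mem-per-cpu=256000
#SBATCH -o out.txt

# Load CUDA inside the job; alternatively load it before submitting and keep --export=ALL.
module load cuda/9.0.176

python3 trainN2V.py --dataPath='/lustre/projects/juglab/n2vScripts/data/Romina/' --fileName='MTfmi_25deg_181020_SD4-001.tif' --dims=ZYX --validationFraction=2 --patchSizeZ=16 --batchSize=8

With such a file the job can be submitted with just: sbatch trainn2v.sh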