SpatialNF is a collection of Nextflow DSL2 pipelines for analyzing spatial transcriptomics data.
We offer pipelines for:
- basic processing of spatial transcriptomics data
- identification of spatially variable genes
- segmentation-free analysis
- label-transfer from single cell RNAseq to spatial transcriptomics data
SpatialNF is implemented in the VSN-framework: https://github.com/vib-singlecell-nf/vsn-pipelines.
SpatialNF can be used for FISH-based data such as MERFISH or Molecular Cartography and for sequencing-based data such as 10X Visium. Because SpatialNF does not offer automated segmentation pipelines, FISH-based data must be segmented in advance and converted into a supported data format.
For all data types, input data should contain raw counts. SpatialNF supports the following data formats:
Data type | Description |
---|---|
AnnData `.h5ad` | AnnData object should contain an `.obsm` entry `'X_spatial'` or `'spatial'` storing coordinates of segmented cells or spots. Currently, our Docker images only support anndata <= 0.78. |
10X Spaceranger output | The `outs` folder should contain the default 10X Spaceranger output |
Spatial CSV files | A folder containing a coordinate file `coords.csv` and a count matrix file `matrix.csv` |
Coordinates CSV files | A CSV file containing the coordinates of one transcript per row |
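
Before starting a run, it can help to confirm that an AnnData input carries one of the two coordinate entries named in the table. The following is a minimal sketch, not part of SpatialNF: `find_spatial_key` is a hypothetical helper, and the stand-in object only mimics the `.obsm` mapping of a real AnnData object.

```python
from types import SimpleNamespace

# Keys named in the table above; how SpatialNF itself looks them up is not shown here.
SPATIAL_KEYS = ("X_spatial", "spatial")


def find_spatial_key(adata):
    """Return the first supported coordinate key found in adata.obsm, or None."""
    for key in SPATIAL_KEYS:
        if key in adata.obsm:
            return key
    return None


# Stand-in for demonstration only. With anndata installed, you would pass
# the object returned by anndata.read_h5ad(...) instead.
demo = SimpleNamespace(obsm={"spatial": [[1952.9, 3899.5], [1946.5, 4047.9]]})
print(find_spatial_key(demo))
```

A return value of `None` would indicate that neither coordinate entry is present and the file needs to be converted first.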
A coordinate CSV file `coords.csv` contains three columns: an ID for the `parent_cell`, and `X` and `Y` coordinates:
parent_cell,X,Y
1,1952.8673508171798,3899.5127328012163
2,1946.5086419753086,4047.905679012346
3,1952.432242022379,3966.5445503522587
4,1963.7581227436824,4089.649097472924
5,1988.4492753623188,4047.0072463768115
...
A count matrix CSV file `matrix.csv` contains a column with `parent_cell` IDs matching the IDs in `coords.csv` and one column for each transcript:
parent_cell,Act79B,Act88F,AkhR,AstC-R2,Awh,CCAP-R,CG32121,Cralbp,FASN2
1,0,0,1,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,31
3,0,0,1,0,0,0,0,0,34
4,0,0,0,0,0,0,0,0,7
5,0,0,0,0,0,0,0,0,1
7,0,0,0,0,0,0,0,0,4
8,0,0,0,0,0,0,0,0,1
9,0,0,2,0,0,0,0,0,0
11,0,0,0,0,0,0,0,1,0
...
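
Since the two files are linked only through the `parent_cell` column, a quick consistency check can catch mismatched IDs early. This is a minimal sketch using the standard library: `read_ids` and `compare_ids` are hypothetical helpers, the demo rows are a shortened excerpt of the examples above, and whether SpatialNF requires every cell to appear in both files is an assumption that is not stated here.

```python
import csv
import io


def read_ids(handle, id_column="parent_cell"):
    """Collect the values of the ID column from an open CSV file."""
    return [row[id_column] for row in csv.DictReader(handle)]


def compare_ids(coords_handle, matrix_handle):
    """Return (IDs found only in coords.csv, IDs found only in matrix.csv)."""
    coords_ids = set(read_ids(coords_handle))
    matrix_ids = set(read_ids(matrix_handle))
    return sorted(coords_ids - matrix_ids), sorted(matrix_ids - coords_ids)


# Shortened excerpt of the example files (fewer rows and gene columns).
coords_csv = io.StringIO(
    "parent_cell,X,Y\n"
    "1,1952.8673508171798,3899.5127328012163\n"
    "2,1946.5086419753086,4047.905679012346\n"
    "3,1952.432242022379,3966.5445503522587\n"
)
matrix_csv = io.StringIO(
    "parent_cell,Act79B,AkhR,FASN2\n"
    "1,0,1,0\n"
    "2,0,0,31\n"
)
only_coords, only_matrix = compare_ids(coords_csv, matrix_csv)
print(only_coords, only_matrix)
```

For real files, open them with `open(path, newline="")` and pass the handles in place of the `io.StringIO` objects.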
Coordinates CSV files contain the coordinates of one detected transcript per row. They should include a header row and `x` and `y` columns; in the example below, the first column holds the gene name:
gene,x,y
Rora,755,935
Rora,829,574
Rora,1071,1941
...
Slc17a7,2110,1458
Slc17a7,2110,1873
Slc17a7,2111,302
Counts per gene will be collated in a grid. The bin size can be specified with `binsize`.
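
To illustrate what collating counts in a grid means, the sketch below bins transcripts into square tiles and counts them per gene. It is an assumption-laden illustration, not SpatialNF's implementation: `bin_transcripts` is a hypothetical function, the grid origin is assumed to be (0, 0), and `binsize` is assumed to be in the same units as the `x` and `y` columns.

```python
import csv
import io
from collections import Counter


def bin_transcripts(handle, binsize):
    """Count transcripts per (bin_x, bin_y, gene) for square bins of width binsize."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Integer division maps a coordinate to the index of its grid bin.
        bin_x = int(float(row["x"]) // binsize)
        bin_y = int(float(row["y"]) // binsize)
        counts[(bin_x, bin_y, row["gene"])] += 1
    return counts


# A few rows taken from the example above.
demo = io.StringIO(
    "gene,x,y\n"
    "Rora,755,935\n"
    "Rora,829,574\n"
    "Slc17a7,2110,1458\n"
    "Slc17a7,2110,1873\n"
)
counts = bin_transcripts(demo, binsize=1000)
for (bin_x, bin_y, gene), n in sorted(counts.items()):
    print(bin_x, bin_y, gene, n)
```

A smaller `binsize` gives finer spatial resolution but fewer transcripts per bin, so the choice trades resolution against sparsity.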
These pipelines assemble spatial transcriptomics data into AnnData and SCope (https://scope.aertslab.org) compatible loom files. They perform QC, filtering and clustering of the data.
Pipeline / entry point | Description |
---|---|
single_sample | Process samples separately |
multi_sample | Compile and process samples together |
When running `multi_sample` pipelines, add `file_concatenator` to `utils` in the config file to combine the input files:
utils {
...
file_concatenator {
join = 'outer'
off = 'h5ad'
}
...
}
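
The sketch below illustrates what an outer join of samples typically means, assuming `join = 'outer'` carries its usual meaning (keep the union of genes across samples). It is not the `file_concatenator` implementation: `outer_join` is a hypothetical function, the sample names and counts are made up, and filling missing genes with zero is an assumption.

```python
def outer_join(samples):
    """Combine per-sample {cell: {gene: count}} tables over the union of genes."""
    genes = sorted(
        {gene for cells in samples.values() for row in cells.values() for gene in row}
    )
    combined = {}
    for sample, cells in samples.items():
        for cell, row in cells.items():
            # Prefix the cell ID with the sample name to keep IDs unique.
            # Genes absent from a sample are filled with 0 (an assumption).
            combined[f"{sample}_{cell}"] = [row.get(gene, 0) for gene in genes]
    return genes, combined


# Made-up toy input: two samples measuring different gene sets.
samples = {
    "s1": {"c1": {"Rora": 2}},
    "s2": {"c1": {"Rora": 1, "Slc17a7": 3}},
}
genes, combined = outer_join(samples)
print(genes)
print(combined)
```

An inner join would instead keep only the genes shared by all samples, which here would drop `Slc17a7`.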
For detecting spatially variable genes, we implemented a pipeline using SpatialDE, including its automatic expression histology (AEH) approach for identifying spatial patterns.
Pipeline / entry point | Description |
---|---|
single_sample_spatialde | Run single_sample pipeline and identify spatially variable genes and spatial patterns |
multi_sample_spatialde | Run multi_sample pipeline and identify spatially variable genes and spatial patterns |
spatialde | Only run SpatialDE pipeline; input should be an AnnData object created by SpatialNF |
For label-transfer from scRNA-seq to spatial transcriptomics data, we offer pipelines for spot-based or segmented data using Tangram and SpaGE. In addition, for segmentation-free label-transfer, SpatialNF contains a spage2vec pipeline. Optionally, Squidpy can be used to compute enrichment of co-localized labels. Tangram can also be used to project gene expression from single-cell data; note that this overwrites the count matrix.
The reference scRNA-seq data should be a processed and filtered AnnData object (`.h5ad`) and contain raw counts, as well as the annotation as an `obs` entry.
Pipeline / entry point | Description |
---|---|
single_sample_tangram | Run single_sample pipeline and Tangram for label-transfer. |
multi_sample_tangram | Run multi_sample pipeline and Tangram for label-transfer. |
tangram | Only run Tangram pipeline; input should be an AnnData object created by SpatialNF |
single_sample_spage | Run single_sample pipeline and SpaGE for label-transfer. |
multi_sample_spage | Run multi_sample pipeline and SpaGE for label-transfer. |
spage2vec_spage_label_transfer | Perform neighborhood embedding analysis and label transfer with spage2vec. |
Initial configs can be generated with a `nextflow config` command. See the `SpatialNF/examples` folder for pipeline configuration files.
For example, to generate a config for mouse 10X Spaceranger data and the `single_sample_spatialde` workflow:
nextflow config SpatialNF/main.nf \
-profile mm10,tenx,singularity,single_sample,spatialde > single_sample_spatialde_aeh.config
After changing file names and parameters in the config, the pipeline can be run with the following command:
nextflow -C single_sample_spatialde_aeh.config run SpatialNF/main.nf \
-entry single_sample_spatialde \
-with-report report.html \
-with-trace \
-resume
SpatialNF generates AnnData (`.h5ad`) and SCope (https://scope.aertslab.org) compatible loom (`.loom`) files.
Output data are written to `out/data/`, including intermediate data files in `out/data/intermediate/`.
Reports are stored as Jupyter notebooks and HTML files in `out/notebooks/`.
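
After a run, the output folders described above can be inspected with a few lines of Python. This is a minimal sketch under the assumption that the output directory is named `out`, as in the paths above; `list_outputs` is a hypothetical helper and not part of SpatialNF.

```python
from pathlib import Path


def list_outputs(out_dir="out"):
    """Group files under <out_dir>/data and <out_dir>/notebooks by file extension."""
    out = Path(out_dir)
    found = {}
    for sub in ("data", "notebooks"):
        folder = out / sub
        if not folder.is_dir():
            continue  # Folder is absent, e.g. before the first run.
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                rel = path.relative_to(out).as_posix()
                found.setdefault(path.suffix, []).append(rel)
    return found


# Prints an empty dict if no "out" folder exists in the working directory.
print(list_outputs("out"))
```

The returned dictionary maps each extension (for example `.h5ad`, `.loom`, `.ipynb` or `.html`) to the relative paths of the matching files, including those under `data/intermediate/`.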
SpatialNF pipelines can be run locally, or each step can be submitted separately as a job to an HPC cluster.
Resource limits and parameters can be specified in the `process` section of a config file.
SpatialNF requires Nextflow and Singularity, which is used to run the Docker containers. Currently, only Nextflow version 21.04 is supported. A compatible Nextflow binary can be downloaded here: https://github.com/nextflow-io/nextflow/releases/download/v21.04.0/nextflow-21.04.0-all
We currently provide Docker images on Docker Hub under a free license. In case these Docker images become unavailable in the future, they can be rebuilt from the `Dockerfile` in the workflow-specific subfolders in the `src` directory.
All pipelines use SCANPY:
Wolf, Angerer, & Theis (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biol., 19:1-5. https://github.com/scverse/scanpy
SpatialDE:
Svensson, Teichmann, & Stegle (2018). SpatialDE: identification of spatially variable genes. Nat. Methods, 15:343-346. https://github.com/Teichlab/SpatialDE
Tangram:
Biancalani, Scalia, Buffoni, Avasthi, Lu, Sanger, ... & Regev (2021). Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods, 18:1352-1362. https://github.com/broadinstitute/Tangram
SpaGE:
Abdelaal, Mourragui, Mahfouz, & Reinders (2020). SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res., 48:e107-e107. https://github.com/tabdelaal/SpaGE
spage2vec:
Partel & Waehlby (2021). Spage2vec: Unsupervised representation of localized spatial gene expression signatures. FEBS J., 288:1859-1870. https://github.com/wahlby-lab/spage2vec
Squidpy:
Palla, Spitzer, Klein, Fischer, Schaar, Kuemmerle, ... & Theis (2022). Squidpy: a scalable framework for spatial omics analysis. Nat. Methods, 19:171-178. https://github.com/scverse/squidpy