A set of tools for the visualization, processing and analysis (supervised, unsupervised, image alignment, etc.) of Spatial Transcriptomics datasets.
The package is compatible with the output format of the data generated with the ST Pipeline (https://github.com/jfnavarro/st_pipeline).
MIT License, see LICENSE file.
See AUTHORS file.
For bugs, feedback or help, you can contact Jose Fernandez Navarro (jc.fernandez.navarro@gmail.com).
The input format is a matrix of counts (tab delimited) where spot ids are row names and the genes are column names. Additionally, some scripts may require a spot coordinates file where the spot and pixel coordinates are defined for each spot id (tab delimited).
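As an illustration of the input format described above, the following minimal sketch parses a tab-delimited matrix of counts using only the Python standard library. The function name read_counts is hypothetical and not part of the package, and the sketch assumes that the first header cell is empty (the row-name column), which may not hold for every file.

```python
import csv
import io


def read_counts(handle):
    """Parse a tab-delimited matrix of counts into (spots, genes, rows)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # assumption: the first header cell is the row-name column
    spots, rows = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        spots.append(record[0])  # spot id is the row name
        rows.append([float(value) for value in record[1:]])
    return spots, genes, rows


# Tiny in-memory example: two spots (rows) and two genes (columns).
example = "\tActb\tApoe\n10x20\t5\t0\n11x20\t2\t3\n"
spots, genes, rows = read_counts(io.StringIO(example))
```

To read a real file, pass an open file handle, for example open("matrix_counts.tsv", newline="").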
Before you install the ST Analysis package we recommend that you create a Python 3 virtual environment. We recommend Anaconda.
The ST Analysis package is only compatible with Python 3.
The following instructions install the ST Analysis package with Python 3.6 and Anaconda:
git clone https://github.com/jfnavarro/st_analysis.git
cd st_analysis
python setup.py install
A set of scripts (described below) will then be available in your system, or in the environment of your choice if you chose to work in a specific environment.
Note that if you want to use align_sections.py you will have to install the st_tissue_recognition library.
Note that you can always type script_name.py --help to get more information about how a script works and its parameters.
To cluster spots together based on their expression profiles, you can run:
unsupervised.py --counts matrix_counts.tsv --normalization REL --num-clusters 5 --clustering KMeans --dimensionality tSNE --use-log-scale
The script can be given one or several datasets (matrices of counts). It offers multiple normalization and filtering options. It performs dimensionality reduction and then clusters the spots together in the manifold space, with multiple options available for both steps. It generates a scatter plot of the clustered spots in a 2D or 3D manifold and writes the computed clusters/labels per spot to a tab-delimited file.
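The idea of normalizing and then clustering spots can be sketched in a few lines of plain Python. This is not the code used by unsupervised.py: it assumes that REL means dividing each count by the total of its spot, it skips the dimensionality reduction step entirely, and it swaps in a bare-bones Lloyd's k-means with a deterministic start instead of the library implementations the script offers. The names relative_log and kmeans are hypothetical.

```python
import math


def relative_log(rows):
    """Divide each count by its spot total (assumed meaning of REL), then log1p."""
    normalized = []
    for row in rows:
        total = sum(row) or 1.0  # avoid dividing by zero for empty spots
        normalized.append([math.log1p(value / total) for value in row])
    return normalized


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(point) for point in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Four spots, two genes: spots 1 and 3 are dominated by the first gene.
counts = [[9, 1], [1, 9], [8, 2], [2, 8]]
labels = kmeans(relative_log(counts), k=2)
```

In this toy example the spots dominated by the same gene end up with the same label.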
To know more about the parameters you can type --help
You can train a classifier with the expression profiles of a set of spots where you know the class (cluster) and then predict the class of the spots of a new dataset of the same tissue. For that you can use the following script:
supervised.py --train-data data_matrix.tsv --test-data data_matrix.tsv --train-classes train_classes.txt --test-classes test_classes.txt
This will generate some statistics and a file with the predicted classes/clusters for each spot. The script offers several options for normalization and for the classification settings and algorithms. The test/train classes file should look like:
SPOT1 1
SPOT2 1
SPOT3 2
where 1, 1 and 2 are the spot classes (clusters).
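A classes file like the one above can be read into a spot-to-class mapping with a few lines of Python. This is only a sketch: read_classes is a hypothetical helper, not part of the package, and it assumes the two columns are separated by whitespace (tabs or spaces).

```python
import io


def read_classes(handle):
    """Map each spot id to its class label; incomplete lines are skipped."""
    classes = {}
    for line in handle:
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) < 2:
            continue
        spot, label = fields[0], fields[1]
        classes[spot] = label
    return classes


example = "SPOT1\t1\nSPOT2\t1\nSPOT3\t2\n"
classes = read_classes(io.StringIO(example))
```

The labels are kept as strings so that non-numeric class names would also work.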
To know more about the parameters you can type --help
NOTE: there is a version that uses a GPU and neural networks (supervised_torch.py).
Use the script data_plotter.py to visualize ST data. You can use different thresholds for filtering and different normalization and visualization options. The script can plot clusters as well as gene sets. It generates one image for each gene given in the --show-genes option (one sub-image for each input dataset). The script needs one or more matrices of counts where the spots are rows and the genes are columns.
data_plotter.py --cutoff 2 --show-genes Actb Apoe --counts data_matrix.tsv --normalization REL
This will generate, for each of the genes Actb and Apoe, a scatter plot of the spots that contain the gene with an expression higher than 2.
More info if you type --help
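The effect of the cutoff can be pictured with a small sketch that keeps only the spots where a gene is expressed above a threshold. The helper spots_above_cutoff is hypothetical, and whether data_plotter.py applies the cutoff before or after normalization is not stated here, so this example simply works on the values it is given.

```python
def spots_above_cutoff(spots, genes, rows, gene, cutoff):
    """Return (spot, value) pairs where `gene` is expressed above `cutoff`."""
    column = genes.index(gene)
    return [
        (spot, row[column])
        for spot, row in zip(spots, rows)
        if row[column] > cutoff  # strictly higher than the cutoff
    ]


spots = ["10x20", "11x20", "12x20"]
genes = ["Actb", "Apoe"]
rows = [[5.0, 0.0], [2.0, 3.0], [0.0, 4.0]]
kept = spots_above_cutoff(spots, genes, rows, gene="Actb", cutoff=2)
```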
filter_genes_matrix.py --counts data_matrix.tsv --filter-genes Malat1 Actb
keep_genes_matrix.py --counts data_matrix.tsv --keep-genes Malat1 Actb
More info if you type --help
An index corresponding to each matrix given in the input (same order) will be appended to the spot ids of the merged matrix.
merge_counts.py --counts data_matrix1.tsv data_matrix2.tsv
More info if you type --help
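The merging idea can be sketched as follows. This is not the implementation of merge_counts.py: the "<spot>_<index>" id scheme, the 1-based index and the choice to fill genes missing from a matrix with 0 are all assumptions made for illustration, and merge_counts here is a hypothetical function working on plain dictionaries.

```python
def merge_counts(matrices):
    """Merge {spot: {gene: count}} matrices into one, in input order."""
    # Union of all genes across the inputs, sorted for a stable column order.
    genes = sorted({gene for m in matrices for row in m.values() for gene in row})
    merged = {}
    for index, matrix in enumerate(matrices, start=1):
        for spot, row in matrix.items():
            # Hypothetical id scheme "<spot>_<index>"; the real script may differ.
            merged[f"{spot}_{index}"] = {gene: row.get(gene, 0) for gene in genes}
    return genes, merged


first = {"10x20": {"Actb": 5, "Apoe": 1}}
second = {"10x20": {"Actb": 2, "Malat1": 7}}
genes, merged = merge_counts([first, second])
```

Appending the index keeps spot ids unique even when two datasets share the same spot coordinates, as in this example.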
This script will merge Spatial Transcriptomics datasets into one (matrices of counts, spot coordinates and HE images). The matrices of counts will be merged as in the previous script. The HE images will be stitched together and the spot coordinates will be merged. An index corresponding to each dataset will be appended to the spot ids.
merge_datasets.py --counts data_matrix1.tsv data_matrix2.tsv --coordinates spots1.txt spots2.txt --images image1.jpg --images image2.jpg
More info if you type --help
If you have multiple sections (datasets) of the same tissue you may want to align them so they all have the same orientation and angle. This enables better visualizations.
The script align_sections.py takes as input a list of matrices of counts, spot coordinates and HE images corresponding to the datasets that must be aligned. It will output a list of aligned matrices of counts, aligned spot coordinates and aligned HE images. The script supports different algorithms for the image detection and alignment process. The first dataset is used as a reference.
align_sections.py --counts data_matrix1.tsv data_matrix2.tsv --coordinates spots1.txt spots2.txt --images image1.jpg --images image2.jpg
More info if you type --help
This script takes as input a list of matrices of counts, a file with the reduced coordinates of the spots (2D) and a meta-file (spots and variables). The script will generate a list of scatter plots where each variable is plotted onto the 2D manifold of the datasets. The script can also plot the expression of genes if they are given as input. It supports different normalization, filtering and visualization options.
dimredu_plotter.py --counts data_matrix.tsv --dim-redu-file dimred.txt --meta-file meta.tsv --show-genes Actb Apoe
More info if you type --help
This script will transform a dataset in Visium format to the standard ST format (matrix of counts, spot coordinates and HE image).
visiumToST.py --help
convert_spot_coordinates.py --help