scGCO is a method to identify genes demonstrating position-dependent differential expression patterns, also known as spatially viable genes, using the powerful graph cuts algorithm. ScGCO can analyze spatial transcriptomics data generated by diverse technologies, including but not limited to single-cell RNA-sequencing, or in situ FISH based methods.What's more, scGCO can easy scale to millions of cells.
This repository contains source codes of scGCO, and tutorials on running the program.
The primary implementation is as a Python 3 package, and can be installed from the command line by
pip install scGCO
scGCO has been tested on Ubuntu Linux (18.04.1), Mac OS X (10.14.1) and Windows(Windows 7 Professional).
MIT Licence, see LICENSE file.
See AUTHORS file.
For bugs, feedback or help please contact Wanwan Feng fengwanwan2023@gmail.com.
The following codes demonstrate the typical data analysis flow of scGCO.
The tutorial has also been provided as a Jupyter Notebook in the Tutorial directory (scGCO_starmap.ipynb)
The entire process should only take 1-2 minutes on a typical desktop computer with 8 cores.
The required matrix format is the ST data format, a matrix of counts where spot coordinates are row names and the gene names are column names. This default matrix format (.TSV ) is split by tab.
As an example, let’s analyze spatially variable gene expression in Mouse Olfactory Bulb using a data set published in Ståhl et al 2016.
from scGCO import *
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
- Step 1-5: perform genome-scale identification of spatially variably genes.
- Step 6-7: visualize and save identified spatial variably genes.
- Step 8: perform graph cuts on a single genes to visualize its spatial patterns.
j=11
unary_scale_factor=100
label_cost=10
algorithm='expansion'
ff = 'README_file/Rep'+str(j)+'_MOB_count_matrix-1.tsv'
locs,data, noiseInd=read_spatial_expression(ff,sep='\t',num_exp_genes=0.01, num_exp_spots=0.05, min_expression=1)
data_norm = normalize_count_cellranger(data)
print('Rep{}_processing: {}'.format(j,data_norm.shape))
raw data dim: (262, 16218)
Rep11_processing: (259, 12522)
exp= data_norm.iloc[:,0]
cellGraph= create_graph_with_weight(locs, exp)
fig, ax= plt.subplots(1,1,figsize=(5,5)) #, dpi=300)
ax.set_aspect('equal')
exp= data_norm.iloc[:,0].values
cellGraph = create_graph_with_weight(locs, exp)
ax.scatter(locs[:,0], locs[:,1], s=1, color='black')
for i in np.arange(cellGraph.shape[0]):
x = (locs[int(cellGraph[i,0]), 0], locs[int(cellGraph[i,1]), 0])
y = (locs[int(cellGraph[i,0]), 1], locs[int(cellGraph[i,1]), 1])
ax.plot(x, y, color='black', linewidth=0.5)
plt.title('CellGraph')
Text(0.5, 1.0, 'CellGraph')
t0=time.time()
gmmDict= multiGMM(data_norm)
print('GMM time(s): ', time.time()-t0)
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:13<00:00, 9.15s/it]
GMM time(s): 74.15773129463196
# store_gmm(gmmDict,fileName='')
t0= time.time()
result_df= identify_spatial_genes(locs, data_norm,
cellGraph ,gmmDict)
print('Running time: {} seconds'.format(time.time()-t0))
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:51<00:00, 13.97s/it]
Running time: 112.93512773513794 seconds
We perform to write and save scGCO output results with write_result_to_csv functions for cross-platform.
Meanwhile, When reread these results we should use read_result_to_dataframe functions.
write_result_to_csv(result_df,'../../results/MouseOB/scGCO_results/Rep{}_results_df'.format(j))
result_df=read_result_to_dataframe('../../results/MouseOB/scGCO_results/Rep11_result_df.csv')
print(result_df.shape)
(12522, 269)
- default: 0.05
select genes demonstrating significant spatial variability
fdr_cutoff=0.05
fdr_df=result_df.sort_values('fdr').loc[result_df.fdr<fdr_cutoff,]
print(fdr_df.shape)
(333, 269)
Visualize some identified genes.
# visualize top genes
visualize_spatial_genes(fdr_df.iloc[0:10,], locs, data_norm,cellGraph ,point_size=0.2)
# save top genes to pdf
multipage_pdf_visualize_spatial_genes(fdr_df.iloc[0:10,], locs, data_norm,cellGraph,point_size=0) #,
# fileName='../../results//top10_genes.pdf')