GitHub - fengwanwan/scGCO: Single-cell Graph Cuts Optimization

Single-cell Graph Cuts Optimization

(scGCO)

Overview

scGCO is a method to identify genes demonstrating position-dependent differential expression patterns, also known as spatially variable genes, using the powerful graph cuts algorithm. scGCO can analyze spatial transcriptomics data generated by diverse technologies, including but not limited to single-cell RNA sequencing and in situ FISH-based methods. In addition, scGCO can scale to millions of cells.

Repo Contents

This repository contains the source code of scGCO and tutorials on running the program.

Installation Guide

The primary implementation is as a Python 3 package, and can be installed from the command line by

pip install scGCO

scGCO has been tested on Ubuntu Linux (18.04.1), Mac OS X (10.14.1) and Windows (Windows 7 Professional).

License

MIT License; see the LICENSE file.

Authors

See AUTHORS file.

Contact

For bugs, feedback or help, please contact Wanwan Feng (fengwanwan2023@gmail.com).

Example usage of scGCO

The following code demonstrates the typical data analysis flow of scGCO.

The tutorial is also provided as a Jupyter Notebook in the Tutorial directory (scGCO_starmap.ipynb).

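The entire process should take only a few minutes on a typical desktop computer with 8 cores; in the run shown below, the GMM step took about 74 s and the main identification step about 113 s.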

Input Format

The required input is a count matrix in ST data format: spot coordinates are the row names and gene names are the column names. By default, the matrix is stored as a tab-separated (.tsv) file.
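The layout can be illustrated with a toy file. This is a minimal sketch that assumes each row name encodes the spot coordinates as an `XxY` string (for example `16.92x9.015`), which is how the Ståhl et al. count matrices are commonly distributed; `load_st_matrix` is a hypothetical helper for inspecting such a file, not part of scGCO, whose own reader is `read_spatial_expression`.

```python
import io

import numpy as np
import pandas as pd


def load_st_matrix(source):
    """Read a tab-separated spot-by-gene count matrix (hypothetical helper).

    Assumes every row name encodes the spot position as 'XxY'.
    Returns an (n_spots, 2) coordinate array and the count DataFrame.
    """
    counts = pd.read_csv(source, sep="\t", index_col=0)
    # split '1.0x2.0' into [1.0, 2.0] for every spot
    locs = np.array([[float(v) for v in name.split("x")] for name in counts.index])
    return locs, counts


# toy matrix: two spots (rows) by two genes (columns), tab-separated
toy_tsv = "\tGeneA\tGeneB\n1.0x2.0\t5\t0\n3.5x4.0\t0\t7\n"
locs, counts = load_st_matrix(io.StringIO(toy_tsv))
```

Here `locs` holds the two coordinate pairs and `counts` the 2 × 2 count table; a real file would be passed by path instead of `io.StringIO`.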

As an example, let's analyze spatially variable gene expression in the mouse olfactory bulb (MOB) using a data set published in Ståhl et al. (2016).

Identify spatial genes with scGCO

from scGCO import *
import time
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# Jupyter magic; remove this line when running as a plain Python script
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

Tutorial with Rep11 of MOB

This is a step-by-step instruction on running the main functionalities of scGCO.

Steps 1-5: perform genome-scale identification of spatially variable genes.
Steps 6-7: visualize and save the identified spatially variable genes.
Step 8: perform graph cuts on a single gene to visualize its spatial pattern.

Step 1:

Read in raw data and perform standard normalization.

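j = 11  # replicate number (Rep11)
# graph-cut hyperparameters; they are not passed to the calls shown below
unary_scale_factor = 100
label_cost = 10
algorithm = 'expansion'
ff = 'README_file/Rep{}_MOB_count_matrix-1.tsv'.format(j)
# read counts and spot coordinates, filtering out lowly expressed genes and spots
locs, data, noiseInd = read_spatial_expression(ff, sep='\t', num_exp_genes=0.01,
                                               num_exp_spots=0.05, min_expression=1)

data_norm = normalize_count_cellranger(data)
print('Rep{}_processing: {}'.format(j, data_norm.shape))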

raw data dim: (262, 16218)
Rep11_processing: (259, 12522)
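The normalization step can be pictured with a small sketch. Cell Ranger-style normalization scales every spot so that its total count equals the median total across spots; this sketch assumes that is the core of `normalize_count_cellranger`, but does not claim to reproduce it exactly (for instance, whether it also log-transforms is not stated here, so no log step is applied). `median_library_size_normalize` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def median_library_size_normalize(counts):
    """Scale each spot (row) so its total count equals the median spot total.

    Spots with a zero total would yield inf/NaN, so filter them out first.
    """
    totals = counts.sum(axis=1)
    factors = totals / totals.median()
    # divide every row by its own scaling factor
    return counts.div(factors, axis=0)


toy = pd.DataFrame([[1, 3], [2, 6], [5, 5]], columns=["GeneA", "GeneB"])
toy_norm = median_library_size_normalize(toy)
```

The toy spots have totals 4, 8 and 10 (median 8), so after scaling every row sums to 8.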

Step 2:

Create an undirected graph whose edges connect neighboring spatial spots/cells.

# create_graph_with_weight takes the spot coordinates and an expression vector
# (here the first gene); the graph only needs to be built once
exp = data_norm.iloc[:, 0].values
cellGraph = create_graph_with_weight(locs, exp)

# plot the spots and the edges of the cell graph
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.set_aspect('equal')
ax.scatter(locs[:, 0], locs[:, 1], s=1, color='black')
for i in np.arange(cellGraph.shape[0]):
    # columns 0 and 1 of each row are the indices of the two spots joined by an edge
    x = (locs[int(cellGraph[i, 0]), 0], locs[int(cellGraph[i, 1]), 0])
    y = (locs[int(cellGraph[i, 0]), 1], locs[int(cellGraph[i, 1]), 1])
    ax.plot(x, y, color='black', linewidth=0.5)

ax.set_title('CellGraph')
plt.show()
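To see what such a neighbor graph looks like as data, here is a sketch that connects each spot to its k nearest neighbors and returns the edges in the same two-column index layout that the plotting loop reads from `cellGraph`. The k-nearest-neighbor rule is a swapped-in, easy-to-follow technique, not necessarily the construction used by `create_graph_with_weight`; `knn_edges` and the default `k` are hypothetical.

```python
import numpy as np


def knn_edges(locs, k=4):
    """Undirected k-nearest-neighbor edges as an (n_edges, 2) array of spot indices."""
    locs = np.asarray(locs, dtype=float)
    # dense pairwise Euclidean distances: fine for small examples, O(n^2) memory
    diff = locs[:, None, :] - locs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # store each edge once as (smaller index, larger index)
    edges = {tuple(sorted((i, int(j)))) for i in range(len(locs)) for j in nbrs[i]}
    return np.array(sorted(edges), dtype=int).reshape(-1, 2)


toy_edges = knn_edges(np.array([[0, 0], [1, 0], [5, 0], [6, 0]]), k=1)
```

With `k=1`, the two close pairs of toy spots are linked and nothing else, giving the edges `[[0, 1], [2, 3]]`. For thousands of spots, a KD-tree query would replace the dense distance matrix.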

Step 3:

Gene expression classification via Gaussian mixture modeling

# fit Gaussian mixture models to the normalized expression data
t0 = time.time()
gmmDict = multiGMM(data_norm)
print('GMM time(s): ', time.time() - t0)

100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:13<00:00,  9.15s/it]


GMM time(s):  74.15773129463196

# optional: save the fitted models for later reuse
# store_gmm(gmmDict, fileName='')
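The idea of classifying a gene's expression with a Gaussian mixture can be shown on one gene. The sketch below fits a two-component 1-D mixture by expectation-maximization (EM) and labels every spot with its most likely component. It is a from-scratch illustration, not `multiGMM`: the fixed number of components, the quantile initialization and `gmm_1d` itself are assumptions made for clarity.

```python
import numpy as np


def gmm_1d(x, n_components=2, n_iter=200):
    """Fit a 1-D Gaussian mixture by EM.

    Returns the most likely component of each value and the fitted component means.
    """
    x = np.asarray(x, dtype=float)
    # deterministic initialization: means at evenly spaced quantiles of the data
    means = np.quantile(x, np.linspace(0, 1, n_components + 2)[1:-1])
    var = np.full(n_components, x.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log(weight * Gaussian density) for each (value, component) pair
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - means) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return resp.argmax(axis=1), means


toy_expr = [0.0, 0.05, 0.1, 0.15, 0.2, 9.9, 10.0, 10.05, 10.1, 10.2]
toy_labels, toy_means = gmm_1d(toy_expr)
```

On the toy values, the five low and five high expression levels fall into separate components with means near 0.1 and 10.05. Applied to real data, the input would be one column of `data_norm`, e.g. `gmm_1d(data_norm.iloc[:, 0].values)`.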

Step 4:

Run the main scGCO function to identify genes with non-random spatial variability.

t0 = time.time()
result_df = identify_spatial_genes(locs, data_norm, cellGraph, gmmDict)
print('Running time: {} seconds'.format(time.time() - t0))

100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:51<00:00, 13.97s/it]


Running time: 112.93512773513794 seconds
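The graph-cut principle behind this step can be illustrated on its simplest case. The sketch below minimizes a two-label energy, a per-spot unary cost for each label plus a constant penalty for every graph edge whose endpoints receive different labels, by computing an s-t minimum cut with the Edmonds-Karp max-flow algorithm. It is not scGCO's implementation: the tutorial's parameters (`unary_scale_factor`, `label_cost`, `algorithm='expansion'`) point to a multi-label solver, and alpha-expansion handles multiple labels by solving a sequence of binary cuts of this kind. `binary_graph_cut` and the toy costs are hypothetical.

```python
from collections import deque

import numpy as np


def binary_graph_cut(unary, edges, smoothness):
    """Minimize sum_i unary[i, L_i] + smoothness * #{(i, j) in edges : L_i != L_j}
    over labels L_i in {0, 1} via an s-t minimum cut. Returns (labels, min energy)."""
    unary = np.asarray(unary, dtype=float)
    n = unary.shape[0]
    s, t = n, n + 1
    cap = np.zeros((n + 2, n + 2))
    cap[s, :n] = unary[:, 1]  # cut when the spot ends on the sink side (label 1)
    cap[:n, t] = unary[:, 0]  # cut when the spot stays on the source side (label 0)
    for i, j in edges:
        cap[i, j] += smoothness
        cap[j, i] += smoothness
    flow = 0.0
    while True:
        # breadth-first search for a shortest augmenting path in the residual graph
        parent = np.full(n + 2, -1)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in np.nonzero(cap[u] > 1e-12)[0]:
                if parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        # bottleneck residual capacity along the path
        bottleneck, v = np.inf, t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v], v])
            v = parent[v]
        # push flow along the path and update residual capacities
        v = t
        while v != s:
            u = parent[v]
            cap[u, v] -= bottleneck
            cap[v, u] += bottleneck
            v = u
        flow += bottleneck
    # spots still reachable from the source in the last search take label 0
    labels = np.where(parent[:n] != -1, 0, 1)
    return labels, flow


# three spots in a chain 0-1-2; unary[i] = [cost of label 0, cost of label 1]
toy_labels, toy_energy = binary_graph_cut([[0, 10], [1, 0], [10, 0]], [(0, 1), (1, 2)], 2)
```

By the max-flow/min-cut theorem, the returned flow equals the minimum energy. In the toy chain, the middle spot's weak preference for label 1 wins, giving labels `[0, 1, 1]` at energy 2; raising the smoothness penalty to 100 pushes all three spots to label 1.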

To save the scGCO results in a cross-platform format, write them with write_result_to_csv.

To load saved results again, use read_result_to_dataframe.

# the writer receives the path prefix without '.csv'; the reader receives the full file name
out_prefix = '../../results/MouseOB/scGCO_results/Rep{}_result_df'.format(j)
write_result_to_csv(result_df, out_prefix)

result_df = read_result_to_dataframe(out_prefix + '.csv')
print(result_df.shape)

(12522, 269)

Step 5:

Select genes with significant non-random spatial patterns using an FDR cutoff (default: 0.05).

# keep genes demonstrating significant spatial variability, sorted by FDR
fdr_cutoff = 0.05
fdr_df = result_df[result_df.fdr < fdr_cutoff].sort_values('fdr')

print(fdr_df.shape)

(333, 269)
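For readers unfamiliar with FDR values, one standard way to turn per-gene p-values into FDR-adjusted values is the Benjamini-Hochberg procedure. This sketch assumes, without confirmation from this tutorial, that the `fdr` column is of that kind; `benjamini_hochberg` is a hypothetical helper, not an scGCO function.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(k) * m / k for the k-th smallest p-value
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity: q_(k) = min over j >= k of p_(j) * m / j
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


toy_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
toy_selected = toy_q < 0.05
```

With a 0.05 cutoff only the first toy gene is kept (q = 0.04), although two others have raw p-values below 0.05; this is the multiple-testing correction that a cutoff on FDR rather than on raw p-values provides.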

Step 6:

Visualize some of the identified genes.

# visualize the top 10 genes
visualize_spatial_genes(fdr_df.iloc[0:10], locs, data_norm, cellGraph, point_size=0.2)

# save the top 10 genes to a multi-page PDF;
# optionally pass fileName='../../results/top10_genes.pdf' to choose the output file
multipage_pdf_visualize_spatial_genes(fdr_df.iloc[0:10], locs, data_norm, cellGraph, point_size=0)
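For a quick look at a single gene outside scGCO's plotting helpers, a plain matplotlib sketch colors each spot by its expression. `plot_gene`, the viridis color map and the default point size are hypothetical choices; unlike `visualize_spatial_genes`, this sketch does not draw the graph-cut segmentation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_gene(locs, expression, gene_name, point_size=20):
    """Scatter the spots at their coordinates, colored by expression of one gene."""
    locs = np.asarray(locs, dtype=float)
    fig, ax = plt.subplots(figsize=(4, 4))
    points = ax.scatter(locs[:, 0], locs[:, 1], c=expression, s=point_size, cmap="viridis")
    ax.set_aspect("equal")
    ax.set_title(gene_name)
    fig.colorbar(points, ax=ax, label="normalized expression")
    return fig, ax


fig, ax = plot_gene(np.array([[0, 0], [1, 1], [2, 0]]), np.array([1.0, 2.0, 3.0]), "GeneA")
```

With the tutorial's objects, a call such as `plot_gene(locs, data_norm.iloc[:, 0].values, data_norm.columns[0])` would show the first gene in the normalized matrix.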

