sc-PHENIX Imputation

What is sc-PHENIX

sc-PHENIX was developed to improve the imputation of scRNA-seq data while avoiding over-smoothing. In benchmarking terms, it belongs to the smoothing-based imputation methods. It obtains the low-dimensional manifold with UMAP (Uniform Manifold Approximation and Projection) and takes M^t (the exponentiated Markov matrix) from diffusion maps. Both techniques are manifold learning methods, which belong to nonlinear dimensionality reduction, a subfield of machine learning. Our approach improves on the popular method MAGIC by integrating UMAP into the imputation process. As a result, the computed M^t better reflects a denoised cell neighborhood that captures local, continuum and global data structures. The advantage of preserving these structures is that, compared to MAGIC, sc-PHENIX shares gene expression among more accurate nearest-neighbor cells on the manifold described by M^t. This yields more biological insight while mitigating the risk of spuriously over-smoothing data across distinct cell phenotypes. For more information, see our preprint (https://doi.org/10.1101/2022.06.09.495525) or our peer-reviewed publication (https://doi.org/10.3390/biology13070512).

What you need to know first

Users need working knowledge of the pandas and numpy libraries, which implies familiarity with Python. Any free course, or a Coursera or Udemy course, can help you learn these libraries quickly; new users should first work through an introduction to the basics.

sc-PHENIX relies mainly on UMAP; for more information on how to use UMAP, see the UMAP documentation. We suggest setting n_components (the number of UMAP dimensions) to more than 3: the embedding is used for the diffusion process rather than for visualization, and more dimensions capture the data structure better.

The important parameters of the sc_PHENIX function are:

knn and decay: sc-PHENIX uses an adaptive kernel to construct the Markov matrix. knn is the number of nearest neighbors used to compute the kernel bandwidth, and decay is the decay rate of the kernel tails. We recommend a knn value small enough to avoid over-smoothing into other clusters, but not so small that it breaks the connectivity of the data as a graph.

t: For the diffusion process, t (the diffusion time) is the power to which the Markov matrix is raised. It sets the level of diffusion.
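To make the roles of knn, decay and t concrete, here is a minimal numpy sketch of an adaptive-kernel Markov matrix followed by diffusion. It assumes a MAGIC-style kernel (bandwidth set to the distance of the knn-th neighbor, tails shaped by decay); the helper names `adaptive_markov_matrix` and `diffuse` and the toy data are hypothetical and are not taken from sc_PHENIX.py, whose internals may differ.

```python
import numpy as np

def adaptive_markov_matrix(coords, knn, decay):
    """Row-stochastic Markov matrix from an adaptive-bandwidth kernel (illustrative)."""
    # Pairwise Euclidean distances between all cells (n x n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Per-cell bandwidth: distance to the knn-th nearest neighbor.
    # Column 0 of each sorted row is the cell itself (distance 0).
    sigma = np.sort(dist, axis=1)[:, knn]
    # Kernel whose tails fall off faster as `decay` grows.
    kernel = np.exp(-(dist / sigma[:, None]) ** decay)
    # Symmetrize, then row-normalize into transition probabilities.
    kernel = (kernel + kernel.T) / 2.0
    return kernel / kernel.sum(axis=1, keepdims=True)

def diffuse(markov, expression, t):
    """Impute by averaging expression over t-step random-walk neighbors."""
    markov_t = np.linalg.matrix_power(markov, t)  # this is M^t
    return markov_t @ expression

rng = np.random.default_rng(0)
coords = rng.normal(size=(30, 5))                          # stand-in for UMAP coordinates
expression = rng.poisson(1.0, size=(30, 4)).astype(float)  # toy cells x genes counts

M = adaptive_markov_matrix(coords, knn=5, decay=2.0)
imputed = diffuse(M, expression, t=3)
```

Because every row of M^t is a probability distribution, each imputed value is a weighted average over cells. Raising t or knn spreads that average over more cells, which is where the over-smoothing risk comes from.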

The knn and t values need to be large enough to build a connected graph within each class, yet small enough to avoid over-smoothing gene expression into other distinct phenotypes.
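One rough way to explore this trade-off before imputing is to count the connected components of the k-nearest-neighbor graph for a few knn values. The sketch below is only an illustration on toy coordinates: `knn_components` is a hypothetical helper, not part of sc-PHENIX, and it uses a plain symmetrized kNN graph rather than the adaptive kernel.

```python
import numpy as np

def knn_components(coords, knn):
    """Number of connected components of the symmetrized kNN graph."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the knn nearest neighbors of each point (column 0 is the point itself).
    neighbors = np.argsort(dist, axis=1)[:, 1:knn + 1]
    adjacency = np.zeros((n, n), dtype=bool)
    adjacency[np.repeat(np.arange(n), knn), neighbors.ravel()] = True
    adjacency |= adjacency.T  # treat edges as undirected
    # Flood fill to count components.
    unvisited = set(range(n))
    components = 0
    while unvisited:
        stack = [unvisited.pop()]
        components += 1
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adjacency[node]):
                nb = int(nb)
                if nb in unvisited:
                    unvisited.remove(nb)
                    stack.append(nb)
    return components

# Two toy "phenotypes": five points near 0 and five points near 100 on a line.
coords = np.array([0, 1, 2, 3, 4, 100, 101, 102, 103, 104], dtype=float)[:, None]

small_knn = knn_components(coords, knn=2)  # each group connected, groups kept apart
large_knn = knn_components(coords, knn=5)  # neighbors now reach into the other group
```

In this toy case the smaller knn keeps the two groups as separate components, while the larger knn links them, which is the situation where expression would start to be shared across distinct phenotypes.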

If you want to use sc-PHENIX on Colab, please make sure to download and install UMAP first.

1) install umap

Put this in a Colab cell and run it to install UMAP with pip:

!pip install umap-learn

2) import libraries

Then, in another cell, import the libraries used to download the script from our GitHub repository, along with pandas, numpy and a visualization library (use whichever visualization library you prefer).

import requests
import os
import urllib.request
import pandas as pd
import numpy as np
import seaborn as sns

3) download sc-PHENIX python script

In another cell, download sc-PHENIX:

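# Download the sc-PHENIX script into the current working directory
url_sc_phenix = 'https://raw.githubusercontent.com/resendislab/sc-PHENIX/main/sc-PHENIX%20tutorial%20colab/sc_PHENIX.py'
urllib.request.urlretrieve(url_sc_phenix, 'sc_PHENIX.py')
# Confirm that sc_PHENIX.py is now present
print(os.listdir())
!ls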

4) import sc-PHENIX and reduce dimensionality with PCA

Then, in another cell, import sc-PHENIX and run PCA:

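from sc_PHENIX import run_pca, sc_PHENIX
# `data` is your expression matrix, loaded beforehand (for example as a pandas DataFrame)
# Reduce the expression matrix to 500 principal components
pca_data = run_pca(data, n_components=500, random_state=1)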

5) import umap and reduce PCA space into a UMAP space

import umap
# UMAP parameters: reduce the 500 PCA dimensions to 50 UMAP dimensions
fit = umap.UMAP(n_components=50, n_neighbors=10, verbose=True, metric='cosine', random_state=42)
# u_no_3 holds the 50-dimensional UMAP coordinates passed to sc-PHENIX
%time u_no_3 = fit.fit_transform(pca_data)
# By default the UMAP output space is Euclidean; this can be changed with output_metric.

6) impute with sc-PHENIX

# Impute using the UMAP coordinates; metric='euclidean' matches UMAP's default output space
neuro_phenix = sc_PHENIX(data, u_no_3, t=15, metric='euclidean', knn=15, decay=500)
neuro_phenix
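After imputing, a quick sanity check is to compare how sparse the matrix is before and after. The sketch below assumes that both matrices can be converted with `np.asarray`, which the source does not confirm for the output of sc_PHENIX; `zero_fraction` is a hypothetical helper and the two small arrays are made-up stand-ins for `data` and `neuro_phenix`.

```python
import numpy as np

def zero_fraction(matrix):
    """Fraction of entries that are exactly zero."""
    values = np.asarray(matrix, dtype=float)
    return float((values == 0).mean())

# Made-up stand-ins for the raw and imputed matrices (cells x genes).
raw_toy = np.array([[0.0, 3.0],
                    [0.0, 0.0]])
imputed_toy = np.array([[0.5, 2.0],
                        [0.2, 1.1]])

raw_zeros = zero_fraction(raw_toy)          # 3 of 4 entries are zero
imputed_zeros = zero_fraction(imputed_toy)  # no entries are zero
```

On real data you would call `zero_fraction(data)` and `zero_fraction(neuro_phenix)`. A large drop in zeros is expected after imputation, but if distinct phenotypes start to look alike, that may indicate knn or t is set too high.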

sc-PHENIX is available in colab

