A novel self-supervised adversarial autoencoder, proposed to extract biologically relevant genes from cancer transcriptomes.

NeuroSyd/latent-space-discovery

Extracting Biologically Relevant Genes using AFExNet from Cancer Transcriptomes [Paper]

License: CC BY 4.0

In this project, we introduce AFExNet, a neural-network-based adversarial autoencoder (AAE) model that extracts biologically relevant features from RNA-Seq data. We also developed a method named TopGene to find highly interactive genes from the latent space. Combined with TopGene, AFExNet identifies important genes that could be useful for finding cancer biomarkers.
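As a rough illustration of how genes might be ranked from a trained encoder, the sketch below scores each gene by the total absolute weight connecting it to the latent nodes. This is not the TopGene method, whose details are not given in this README. It is a generic weight-magnitude heuristic, and the function name `rank_genes_by_weight`, the single `(n_genes, n_latent)` weight matrix, and the toy gene names are all assumptions made for the example.

```python
import numpy as np


def rank_genes_by_weight(weight_matrix, gene_names, top_k=5):
    """Rank genes by total absolute weight across latent nodes.

    weight_matrix: array of shape (n_genes, n_latent), assumed here to hold
    the weights connecting each input gene to each latent node.
    """
    weights = np.asarray(weight_matrix, dtype=float)
    if weights.shape[0] != len(gene_names):
        raise ValueError("one row of weights is required per gene")
    # Score each gene by the sum of absolute weights over all latent nodes.
    scores = np.abs(weights).sum(axis=1)
    # Sort by descending score and keep the top_k genes.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(gene_names[i], float(scores[i])) for i in order]


# Toy example with made-up weights and placeholder gene names.
toy_weights = np.array([
    [0.1, -0.2],
    [0.9, 0.8],
    [-0.5, 0.4],
])
toy_genes = ["GENE_A", "GENE_B", "GENE_C"]
top = rank_genes_by_weight(toy_weights, toy_genes, top_k=2)
```

A real encoder usually has several layers, so a single weight matrix is a simplification. The heuristic only shows the general idea of reading gene importance off learned weights.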


Getting Started

The instructions below will get a copy of the project up and running on your local machine for development and testing.

Prerequisites

The following libraries are required to reproduce this project:

  1. Keras (2.0.6)

  2. Keras-adversarial (0.0.3)

  3. Tensorflow (1.13.1)

  4. Scikit-Learn (0.20.3)

  5. Numpy (1.16.3)

  6. Imbalanced-Learn (0.4.3)

Supports both Python 2.5.0 and Python 3.5.6
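To confirm that an environment matches the pinned versions, a small check like the one below can help. It is a minimal sketch rather than part of the repository. The PyPI distribution names in `REQUIRED` and the helper names `installed_version` and `find_mismatches` are assumptions. `importlib.metadata` requires Python 3.8 or newer, so on older interpreters the lookup returns `None` for every package.

```python
# Pinned versions copied from the prerequisites list above.
# The PyPI distribution names used as keys are assumptions.
REQUIRED = {
    "Keras": "2.0.6",
    "keras-adversarial": "0.0.3",
    "tensorflow": "1.13.1",
    "scikit-learn": "0.20.3",
    "numpy": "1.16.3",
    "imbalanced-learn": "0.4.3",
}


def installed_version(dist_name):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(REQUIRED).items()):
        print("{}: want {}, found {}".format(name, wanted, found or "not found"))
```

The `lookup` parameter lets the comparison logic be exercised with a plain dictionary instead of the real environment.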

Directory Layout

├── results
│   ├── saved_results
│   │   ├── Gene_Analysis_Breast_Cancer.xlsx
│   │   ├── Gene_Analysis_UCEC.xlsx
│   ├── AAE
│   │   ├── aae_encoded.tsv
│   │   ├── aae_sorted_gene.tsv
│   │   ├── aae_weight_distribution.png
│   │   ├── aae_weight_matrix
│   ├── PCA
│   ├── ... # add LDA, SVD etc
├── data
│   ├── data will be stored here
├── feature_extraction
│   ├── AAE
│   │   ├── aae_encoder.h5
│   │   ├── aae_decoder.h5
│   │   ├── aae_discriminator.h5
│   │   ├── aae_history.csv
│   ├── PCA
│   ├── VAE
│   ├── ...
├── README.md
├── figures
│   ├── saved_figures
│   │   ├── Olfactory__Transduction_pathway.png
└── .gitignore

Usage

Run the following to extract features using the different autoencoders:

main.py

Run the following to extract features using PCA, NMF, FastICA, ICA, RBM, etc.:

main_pca.py
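For readers unfamiliar with the baseline methods, the sketch below shows what PCA-style feature extraction looks like on an expression matrix of shape (samples, genes). It is not the code in `main_pca.py`. It uses NumPy's SVD directly, and the function name `pca_encode`, the synthetic data, and the choice of three components are assumptions made for illustration.

```python
import numpy as np


def pca_encode(expression, n_components):
    """Project samples onto the top principal components.

    expression: array of shape (n_samples, n_genes).
    Returns (encoded, components) with shapes
    (n_samples, n_components) and (n_components, n_genes).
    """
    x = np.asarray(expression, dtype=float)
    # Center each gene (column) at zero mean before decomposition.
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    encoded = centered.dot(components.T)
    return encoded, components


# Synthetic stand-in for an expression matrix (20 samples x 50 genes).
rng = np.random.RandomState(0)
toy_expression = rng.rand(20, 50)
encoded, components = pca_encode(toy_expression, n_components=3)
```

Each row of `components` assigns one loading per gene, so the absolute loadings can be inspected in the same way as autoencoder weights.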

Gene ontology analysis of molecular function was performed using DAVID 6.7: https://david-d.ncifcrf.gov/

More on gene ontology: http://geneontology.org/docs/ontology-documentation/

Proposed Architecture

weight_analysis_aae

Datasets

Breast Invasive Carcinoma (BRCA)

| Molecular Subtype | Number of Patients | Label |
| --- | --- | --- |
| Luminal A | 304 | 0 |
| Luminal B | 121 | 1 |
| Basal & Triple Negative | 137 | 2 |
| HER2-Enriched | 43 | 3 |

| Total Number of Samples (Patients) | Total Number of Features (Genes) |
| --- | --- |
| 605 | 20439 |
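The subtype counts are uneven: the largest class (Luminal A) has about seven times as many patients as the smallest (HER2-Enriched). The sketch below quantifies that imbalance using the counts from the BRCA table. The helper name `class_summary` is hypothetical, and the inverse-frequency weighting is a common heuristic, not something this README states the project uses.

```python
# Patient counts per subtype label, copied from the BRCA table above.
BRCA_COUNTS = {0: 304, 1: 121, 2: 137, 3: 43}


def class_summary(counts):
    """Return total samples, per-label shares, and inverse-frequency weights.

    The weight n_samples / (n_classes * count) is a common heuristic for
    weighting classes; it is used here only as an illustration.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    shares = {label: n / float(total) for label, n in counts.items()}
    weights = {label: total / float(n_classes * n) for label, n in counts.items()}
    return total, shares, weights


total, shares, weights = class_summary(BRCA_COUNTS)
# Ratio between the largest and smallest class sizes.
imbalance_ratio = max(BRCA_COUNTS.values()) / float(min(BRCA_COUNTS.values()))
```

Imbalanced-Learn appears in the prerequisites, which suggests some form of resampling is applied, though the README does not say which.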

Validation Data

Uterine Corpus Endometrial Carcinoma (UCEC)

| Molecular Subtype | Number of Patients | Label |
| --- | --- | --- |
| Copy Number High | 60 | 0 |
| Copy Number Low | 90 | 1 |
| Hyper Mutated (MSI) | 64 | 2 |
| Ultra Mutated (POLE) | 16 | 3 |

| Total Number of Samples (Patients) | Total Number of Features (Genes) |
| --- | --- |
| 230 | 20482 |

Contribution

Contributions that improve this project are very welcome. When contributing to this repository, please open a clean pull request.

Acknowledgments
