Extracting Biologically Relevant Genes using AFExNet from Cancer Transcriptomes [Paper]

In this project, we introduce a neural-network-based adversarial autoencoder (AAE) model, AFExNet, to extract biologically relevant features from RNA-Seq data. We also developed a method named TopGene to find highly interactive genes from the latent space. AFExNet, in combination with TopGene, finds important genes that could be useful for identifying cancer biomarkers.
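The TopGene procedure itself is not described in this README. As a rough illustration of the general idea of ranking genes by how strongly they connect to latent nodes, here is a minimal sketch. It assumes a hypothetical weight matrix with one row per gene and one column per latent node, and scores each gene by the sum of its absolute weights. The function name, the scoring rule, and the toy numbers are all assumptions for illustration, not the authors' implementation.

```python
def rank_genes_by_weight(weights, gene_names, top_k=3):
    """Rank genes by the sum of absolute weights linking them to latent nodes.

    weights: list of rows, one row per gene, one column per latent node.
    This scoring rule is an illustrative assumption, not the TopGene method.
    """
    if len(weights) != len(gene_names):
        raise ValueError("one weight row is required per gene")
    scores = [sum(abs(w) for w in row) for row in weights]
    # Sort gene indices from highest to lowest score.
    order = sorted(range(len(gene_names)), key=lambda i: scores[i], reverse=True)
    return [(gene_names[i], scores[i]) for i in order[:top_k]]


# Toy, made-up numbers: 4 hypothetical genes x 2 latent nodes.
toy_weights = [
    [0.10, -0.20],
    [0.90, 0.40],
    [-0.70, 0.05],
    [0.00, 0.30],
]
toy_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
top = rank_genes_by_weight(toy_weights, toy_genes, top_k=2)
```

With these toy values, the two genes with the largest absolute weights come first in `top`. A real analysis would load the encoder weights saved by the project instead of a hand-written list.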

Getting Started

The following instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

The following libraries are required to reproduce this project:

Keras (2.0.6)
Keras-adversarial (0.0.3)
Tensorflow (1.13.1)
Scikit-Learn (0.20.3)
Numpy (1.16.3)
Imbalanced-Learn (0.4.3)

Supports both Python 2.5.0 and Python 3.5.6
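Because the versions above are pinned exactly, it can help to compare an environment against them before running anything. The sketch below is one possible way to do that with plain dictionaries. The `REQUIRED` dict and `find_mismatches` helper are hypothetical names, the package keys are assumed spellings of the distribution names, and how the installed versions are collected (for example from `pip freeze` output) is left to the reader.

```python
# Pinned versions copied from the prerequisites list; the dict name and the
# exact distribution-name spellings are assumptions for illustration.
REQUIRED = {
    "Keras": "2.0.6",
    "keras-adversarial": "0.0.3",
    "tensorflow": "1.13.1",
    "scikit-learn": "0.20.3",
    "numpy": "1.16.3",
    "imbalanced-learn": "0.4.3",
}


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (required, found)} for every package whose installed
    version is missing (found is None) or differs from the pinned one."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Example input: pretend only two packages are installed, one at the wrong version.
    example = {"numpy": "1.16.3", "tensorflow": "2.0.0"}
    for name, (wanted, found) in sorted(find_mismatches(example).items()):
        print("{}: need {}, found {}".format(name, wanted, found))
```

An empty result from `find_mismatches` means every pinned package was found at the listed version.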

Directory Layout

├── results
│   ├── saved_results
│   │   ├── Gene_Analysis_Breast_Cancer.xlsx
│   │   ├── Gene_Analysis_UCEC.xlsx
│   ├── AAE
│   │   ├── aae_encoded.tsv
│   │   ├── aae_sorted_gene.tsv
│   │   ├── aae_weight_distribution.png
│   │   ├── aae_weight_matrix
│   ├── PCA
│   ├── ... # add LDA, SVD etc
├── data
│   ├── data will be stored here
├── feature_extraction
│   ├── AAE
│   │   ├── aae_encoder.h5
│   │   ├── aae_decoder.h5
│   │   ├── aae_discriminator.h5
│   │   ├── aae_history.csv
│   ├── PCA
│   ├── VAE
│   ├── ...
├── README.md
├── figures
│   ├── saved_figures
│   │   ├── Olfactory__Transduction_pathway.png
└── .gitignore

Usage

Run the following script to extract features using the different autoencoders:

main.py

Run the following script to extract features with PCA, NMF, FastICA, ICA, RBM, and similar methods:

main_pca.py

Gene ontology analysis of molecular function was performed using DAVID 6.7: https://david-d.ncifcrf.gov/

More on gene ontology: http://geneontology.org/docs/ontology-documentation/

Proposed Architecture

Datasets

cBioPortal - Cancer Genomics Datasets
Breast Invasive Carcinoma (TCGA, Cell 2015) - Clinical information is used to label various molecular subtypes

Breast Invasive Carcinoma (BRCA)

Molecular Subtypes	Number of Patients	Label
Luminal A	304	0
Luminal B	121	1
Basal & Triple Negative	137	2
HER2-Enriched	43	3

Total Number of Samples (Patients)	Total Number of Features (Genes)
605	20439
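The subtype table can be written down as a label mapping, which also makes the class imbalance easy to see: Luminal A is about half of the 605 samples, while the HER2-enriched class is roughly 7%. The sketch below only restates the numbers from the table. The names `BRCA_SUBTYPES` and `class_shares` are hypothetical and do not come from the project code.

```python
# Subtype -> (integer label, number of patients), copied from the BRCA table.
BRCA_SUBTYPES = {
    "Luminal A": (0, 304),
    "Luminal B": (1, 121),
    "Basal & Triple Negative": (2, 137),
    "HER2-Enriched": (3, 43),
}


def class_shares(subtypes):
    """Return {label: fraction of all patients} for a subtype table."""
    total = sum(count for _, count in subtypes.values())
    return {label: count / float(total) for label, count in subtypes.values()}


total_patients = sum(count for _, count in BRCA_SUBTYPES.values())
shares = class_shares(BRCA_SUBTYPES)

for name, (label, count) in sorted(BRCA_SUBTYPES.items(), key=lambda kv: kv[1][0]):
    print("{} (label {}): {} patients, {:.1%}".format(name, label, count, shares[label]))
```

Class proportions like these are a common input when deciding whether to resample or weight classes before training a classifier on the extracted features.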

Details about Molecular Subtypes of Breast Cancer

Validation Data

Uterine Corpus Endometrial Carcinoma (TCGA, Nature 2013) - Clinical information is used to label various molecular subtypes.

Uterine Corpus Endometrial Carcinoma (UCEC)

Molecular Subtypes	Number of Patients	Label
Copy Number High	60	0
Copy Number Low	90	1
Hyper Mutated (MSI)	64	2
Ultra Mutated (POLE)	16	3

Total Number of Samples (Patients)	Total Number of Features (Genes)
230	20482

Details about Molecular Subtypes of Endometrial Cancer

Contribution

If you want to contribute to this project and make it better, your help is very welcome. When contributing to this repository, please make a clean pull request.

Acknowledgments

The proposed architecture is inspired by https://github.com/bstriner/keras-adversarial

