A more extensive and easily applicable deep graph attention model for multi-omics biomarker discovery. The original manuscript is available here. The major updates in GOAT version 2.0 are:
- Random walk positional encoding of genes in the gene-gene interaction graph is added to reflect the global structure of the graph (GOAT_v2 model in goat/model.py).
- The base library for the GNN implementation was changed from PyG to DGL (https://www.dgl.ai/).
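To illustrate what a random walk positional encoding captures, here is a minimal sketch of the common formulation: for each node, the probability of returning to itself after 1, 2, ..., k steps of a random walk. It is written in plain Python so it runs without DGL. The function name `random_walk_pe`, the toy graph, and the gene labels are assumptions made for illustration and are not taken from goat/model.py, whose implementation may differ.

```python
def random_walk_pe(adjacency, k):
    """Return, for each node, [P_ii, (P^2)_ii, ..., (P^k)_ii] with P = D^-1 A."""
    n = len(adjacency)
    # Row-normalise the adjacency matrix into a transition matrix P.
    transition = []
    for row in adjacency:
        degree = sum(row)
        transition.append([a / degree if degree else 0.0 for a in row])

    encodings = [[] for _ in range(n)]
    power = [row[:] for row in transition]  # holds the current power of P
    for _ in range(k):
        # Record the return probability of every node at this walk length.
        for i in range(n):
            encodings[i].append(power[i][i])
        # Advance to the next power by multiplying with P once more.
        power = [
            [sum(power[i][m] * transition[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return encodings


# Toy path graph geneA - geneB - geneC (hypothetical example data).
toy_adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
toy_pe = random_walk_pe(toy_adjacency, k=3)
for gene, pe in zip(["geneA", "geneB", "geneC"], toy_pe):
    print(gene, pe)
```

In this toy graph the middle gene always returns after two steps, so its encoding is [0.0, 1.0, 0.0], while the two end genes get [0.0, 0.5, 0.0]. Nodes in structurally different positions therefore receive different encodings even when their omics features are identical.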
Create conda environment.
conda create --name goat python=3.9.19
conda activate goat
conda update -n base -c defaults conda
pip install --upgrade pip
Install required packages.
pip install -r requirements.txt
Install GOAT2.0.
pip install -e .
A gene-gene interaction network from the STRING database (https://string-db.org) and a gene list to filter the network are required. The omics data (patient x gene) and the patient labels (patient x label) should be stored in the data directory specified in the configuration file. The following script generates a pickle file that is used to build the custom dataset object in './goat/dataset.py', which inherits from the PyTorch Dataset class (torch.utils.data.Dataset).
python preprocessing/preprocessing.py -taskConfig ./configs/tasks/TCGA-LUAD_TMB.yaml
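The filtering step described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of preprocessing/preprocessing.py: the helper names `filter_network` and `index_edges`, the dictionary keys, and the toy values are all hypothetical, and the layout of the real pickle file is not documented here.

```python
import pickle


def filter_network(edges, gene_list):
    """Keep only edges whose two endpoints are both in gene_list."""
    keep = set(gene_list)
    return [(u, v) for u, v in edges if u in keep and v in keep]


def index_edges(edges, gene_list):
    """Map gene symbols to column indices of the patient x gene matrix."""
    index = {gene: i for i, gene in enumerate(gene_list)}
    return [(index[u], index[v]) for u, v in edges]


# Hypothetical toy inputs, not real STRING or TCGA data.
string_edges = [("TP53", "MDM2"), ("TP53", "EGFR"), ("EGFR", "KRAS"), ("MDM2", "BRCA1")]
gene_list = ["TP53", "EGFR", "KRAS"]
omics = [[0.1, 2.3, 1.7], [0.9, 0.4, 3.2]]  # 2 patients x 3 genes
labels = [0, 1]  # one label per patient

filtered = filter_network(string_edges, gene_list)
payload = {
    "genes": gene_list,
    "edges": index_edges(filtered, gene_list),
    "omics": omics,
    "labels": labels,
}

# Round-trip through pickle in memory; a real script would write to a file.
restored = pickle.loads(pickle.dumps(payload))
print(restored["edges"])
```

Indexing the edges by gene position keeps the graph aligned with the columns of the omics matrix, so a dataset class can pair each patient's feature row with the same shared graph.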
You can specify model hyper-parameters in configs/models/model_*.yaml. Available models are MLP, GOAT, and GOAT_v2.
You can specify datasets and data splits in configs/tasks/*.yaml.
python ./demo/test_on_in_distribution_dataset.py -train True -modelConfig configs/models/model_GOAT.yaml -taskConfig configs/tasks/TCGA-LUAD_TMB.yaml -outDir result_test
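To compare several models on the same task, the command above can be assembled once per model configuration. The sketch below uses only the flags shown in that command and discovers configuration files with a glob instead of hard-coding file names. The helper `build_command` and the per-model output directory naming are assumptions for illustration. The commands are printed, not executed.

```python
import glob
import shlex
from pathlib import Path


def build_command(model_config, task_config, out_dir, train=True):
    """Assemble the demo command using only the flags shown above."""
    return [
        "python", "./demo/test_on_in_distribution_dataset.py",
        "-train", str(train),
        "-modelConfig", model_config,
        "-taskConfig", task_config,
        "-outDir", out_dir,
    ]


task = "configs/tasks/TCGA-LUAD_TMB.yaml"
# Discover the model configs that are actually present in the repository.
model_configs = sorted(glob.glob("configs/models/model_*.yaml"))
for config in model_configs:
    # One output directory per model; this naming scheme is an assumption.
    out_dir = "result_" + Path(config).stem
    print(shlex.join(build_command(config, task, out_dir)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` from the repository root.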
@article{jeong2023goat,
title={GOAT: Gene-level biomarker discovery from multi-Omics data using graph ATtention neural network for eosinophilic asthma subtype},
author={Jeong, Dabin and Koo, Bonil and Oh, Minsik and Kim, Tae-Bum and Kim, Sun},
journal={Bioinformatics},
volume={39},
number={10},
pages={btad582},
year={2023},
publisher={Oxford University Press}
}