Here, we present the code for our two-step framework for analyzing multi-modal data, which captures the interplay between the different information layers. The framework consists of inferring a multi-modal network and then embedding its nodes into a low-dimensional space, enabling effective exploration of similarities between nodes and data modalities.
Code for the manuscript 'Network Embedding across Multiple Tissues Elucidates Multi-modal Context of Host Factors Important for COVID-19 Infection' by Yue Hu, Ghalia Rehawi, Lambert Moyon, Christoph Ogris, Janine Knauer-Arloth, Florian Bittner, Annalisa Marsico, Nikola S. Mueller (2022).
The analysis uses openly available Genotype-Tissue Expression (GTEx) data from https://gtexportal.org/home/, complemented by confidential genotype and phenotype data that cannot be disclosed here. Access to the confidential data can be granted by application through the NCBI dbGaP portal (https://dbgap.ncbi.nlm.nih.gov/).
- Install Anaconda and Snakemake. (We used conda 4.11.0; snakemake-minimal=5.32.1=py_0)
- Snakemake automatically creates the conda environment with all required packages at each step.
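
The installation step above could be sketched as follows. This is a minimal sketch rather than the authors' exact setup: the environment name and the `CONDA_BIN` override are hypothetical, the build string (`py_0`) is omitted, and only the pinned `snakemake-minimal` version comes from the note above. The channel order follows the usual bioconda recommendation.

```shell
# Assumption: CONDA_BIN and ENV_NAME are illustrative names, not part of the repository.
CONDA_BIN="${CONDA_BIN:-conda}"            # conda executable to call
ENV_NAME="${ENV_NAME:-snakemake-5.32.1}"   # hypothetical environment name

# Create a dedicated environment holding the Snakemake version reported above.
create_snakemake_env() {
    "$CONDA_BIN" create -y -n "$ENV_NAME" \
        -c conda-forge -c bioconda \
        "snakemake-minimal=5.32.1"
}
```

Calling `create_snakemake_env` and then `conda activate "$ENV_NAME"` should make the `snakemake` command available in the current shell.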
The Snakemake workflow manager was used for the different steps of the analysis:
- Preprocessing
- Polygenic Risk Score (PRS) calculation
- KiMONo
- Network embedding
- Analysis & Figures
The Snakemake files and the code scripts share the same directory structure. Instructions for executing the Snakemake workflows are provided in each directory as a README.txt file.
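
As a rough illustration of how a single step might be launched, the helper below performs a Snakemake dry run inside one step directory. It is a sketch under assumptions: the function name `run_step` and the `CORES` variable are hypothetical, and it assumes each step directory contains a file named `Snakefile`. The README.txt in each directory remains the authoritative source for the actual invocation.

```shell
# Assumption: CORES is an illustrative setting, not defined by the repository.
CORES="${CORES:-1}"

# Dry-run the workflow in one step directory (path given as first argument).
run_step() {
    step_dir="$1"
    if [ ! -f "$step_dir/Snakefile" ]; then
        # Assumed file name; the step may use a different layout.
        echo "skip $step_dir: no Snakefile"
        return 0
    fi
    # --use-conda lets Snakemake build the conda environments for the step;
    # -n only lists the jobs that would run, without executing them.
    (cd "$step_dir" && snakemake --use-conda --cores "$CORES" -n)
}
```

For example, `run_step path/to/step-directory` lists the planned jobs for that step; removing `-n` from the `snakemake` call would execute them.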