PathIntegrate Python package for pathway-based multi-omics data integration
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway.
- Pathway-based multi-omics data integration using PathIntegrate Multi-View and Single-View models
- Multi-View model: Integrates multiple omics datasets using a shared pathway-based latent space
- Single-View model: Integrates multi-omics data into one set of multi-omics pathway scores and applies an SKlearn-compatible predictive model
- Pathway importance
- Sample prediction
- NEW unsupervised SingleView models (dimensionality reduction and clustering in the pathway space)
- SKlearn-like API for easy integration into existing pipelines
- Support for multiple pathway databases, including KEGG, Reactome, PathBank, and custom GMT files
- Support for multiple pathway scoring methods available via the sspa package
- Cytoscape Network Viewer app for visualizing pathway-based multi-omics data integration results
At least 8BG RAM recommended. PathIntegrate models can run on a Google Colab notebook server (see walkthrough tutorial below with example data).
PathIntegrate has been tested on MacOs, Windows 10 and Linux. Python 3.10 or higher is required. Python dependencies are listed in the requirements.txt file.
pip install PathIntegrate
Please see our Quickstart guide on Google Colab
Full documentation and function reference for PathIntegrate can be found via our ReadTheDocs page
If you use PathIntegrate in your research, please consider citing our paper:
@article{Wieder2024,
author = {Cecilia Wieder and Juliette Cooke and Clement Frainay and Nathalie Poupin and Russell Bowler and Fabien Jourdan and Katerina J. Kechris and Rachel P.J. Lai and Timothy Ebbels},
doi = {10.1371/JOURNAL.PCBI.1011814},
issue = {3},
journal = {PLOS Computational Biology},
month = {3},
pages = {e1011814},
pmid = {38527092},
publisher = {Public Library of Science},
title = {PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration},
volume = {20},
url = {https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011814},
year = {2024},
}
Check out the following papers to see how PathIntegrate has been used in research:
GNU GPL v3
- Jude Popham @judepops