GRPM System

The GRPM (Gene-Rsid-Pmid-Mesh) System is a tool for integrating and analyzing genetic polymorphism data associated with specific biomedical subjects. It comprises five modules that cover data retrieval, merging, analysis, and integration of GWAS data.

medrxiv Manuscript DOI

Overview

Introduction

The GRPM System is a Python framework for building a comprehensive dataset of human genetic polymorphisms associated with nutrition. By combining data from multiple sources and using MeSH terms as an organizing framework, the workflow lets researchers search the genetic literature for variants significantly associated with a specific biomedical subject. The resource was developed mainly to assist nutritionists in investigating gene-diet interactions and implementing personalized nutrition interventions.

Graphical Abstract

Modules

The GRPM System comprises five modules that together support the integration and analysis of genetic polymorphism data associated with nutrition.

To try out the GRPM System, run each module separately by clicking "Open in Colab". Be careful to import all necessary dependencies and files. A Google Drive folder sync option is available.

Each Jupyter notebook includes the code to download and install the requirements it needs to run.

The modules are as follows:

| No. | Notebook | Module | Description |
| --- | --- | --- | --- |
| 1 | Open In Colab | Dataset Builder | Retrieves data from the LitVar and PubMed databases and merges them into CSV format. |
| 2 | Open In Colab | MeSH Selection for Retrieval | Uses NLP to define a coherent MeSH term list for information retrieval over the whole GRPM Dataset. |
| 3 | Open In Colab | GRPM Dataset MeSH Query | Uses MeSH terms to query the GRPM Dataset. It extracts the subset of matched entities and produces a data report. |
| 4 | Open In Colab | GRPM Data Analyzer | Analyzes the retrieved data and calculates survey metrics. Data visualization through matplotlib and seaborn. |
| 5 | Open In Colab | GRPM-GWAS Data Integration | Integrates GWAS data, associating GWAS phenotypes and potential risk/effect alleles with the GRPM Dataset. |

GRPM System: Integrating Genetic Polymorphism Data with PMIDs and MeSH Terms to Retrieve Genes and rsIDs for Biomedical Research Fields. GRPM Dataset: pcg, protein-coding genes; rna, RNA genes; pseudo, pseudogenes; in parentheses, dataset shape.

These modules provide a comprehensive framework for researchers and nutritionists to explore genetic polymorphism data and gain insights into gene-diet interactions and personalized nutrition interventions.
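As a rough illustration of the idea behind Modules 1 and 3, the sketch below joins gene-rsID-PMID links with PMID-MeSH annotations and then filters the result by a MeSH term list. It is a toy under stated assumptions, not the notebooks' actual code: the field names (`gene`, `rsid`, `pmid`, `mesh`), the helper functions, and all values are hypothetical, and it uses only the standard library rather than pandas so that it runs anywhere.

```python
from collections import defaultdict

# Toy stand-ins for the two sources that get merged (all values are made up).
# gene -> rsID -> PMID links, in the shape a variant-literature lookup might give
litvar_rows = [
    {"gene": "GENE_A", "rsid": "rs0001", "pmid": "1001"},
    {"gene": "GENE_A", "rsid": "rs0002", "pmid": "1002"},
    {"gene": "GENE_B", "rsid": "rs0003", "pmid": "1002"},
    {"gene": "GENE_B", "rsid": "rs0003", "pmid": "1003"},
]
# PMID -> MeSH terms, in the shape PubMed records might provide
pubmed_mesh = {
    "1001": ["Diet", "Obesity"],
    "1002": ["Vitamin D"],
    "1003": ["Neoplasms"],
}


def build_grpm(litvar_rows, pubmed_mesh):
    """Join on PMID: one output row per gene-rsid-pmid-mesh combination."""
    rows = []
    for row in litvar_rows:
        for mesh in pubmed_mesh.get(row["pmid"], []):
            rows.append({**row, "mesh": mesh})
    return rows


def query_by_mesh(grpm_rows, mesh_terms):
    """Keep rows whose MeSH term is in the list; group matched rsIDs by gene."""
    wanted = set(mesh_terms)
    matched = [r for r in grpm_rows if r["mesh"] in wanted]
    by_gene = defaultdict(set)
    for r in matched:
        by_gene[r["gene"]].add(r["rsid"])
    return matched, {gene: sorted(rsids) for gene, rsids in by_gene.items()}


grpm = build_grpm(litvar_rows, pubmed_mesh)
matched, report = query_by_mesh(grpm, ["Diet", "Vitamin D"])
print(len(grpm), "rows in the toy dataset;", len(matched), "match the MeSH list")
print(report)
```

The per-gene grouping at the end stands in for the kind of summary a data report might contain; the real report format is defined in the Module 3 notebook.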

Updates

The GRPM Dataset available on Zenodo is a snapshot of LitVar1. LitVar1 is now deprecated and has been fully replaced by LitVar2. Module 1 (Dataset Builder) has been updated to retrieve data from LitVar2. The subsequent modules in the pipeline remain functional and can be tested using the original version of the GRPM Dataset available on Zenodo.

Installation

To install GRPM System, clone the repository to your local machine:

git clone https://github.com/johndef64/GRPM_system.git

Alternatively, run each module separately in Google Colab, mounting Google Drive to save your progress.
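A minimal sketch of how a notebook could pick a working folder is shown below. It relies on `google.colab.drive.mount`, which is only available inside Colab, and falls back to a local folder elsewhere. The function name `get_workdir` and the folder names `grpm_workdir` and `GRPM_system` are assumptions for illustration; the notebooks may organize their files differently.

```python
import os


def get_workdir(local_dir="grpm_workdir", drive_subdir="GRPM_system"):
    """Return a folder for saving results; uses Google Drive when run in Colab."""
    try:
        from google.colab import drive  # importable only inside Colab
    except ImportError:
        # Not in Colab: keep results in a local folder (hypothetical name)
        workdir = local_dir
    else:
        # In Colab: mount Drive so files survive the end of the session
        drive.mount("/content/drive")
        workdir = os.path.join("/content/drive/MyDrive", drive_subdir)
    os.makedirs(workdir, exist_ok=True)
    return workdir


workdir = get_workdir()
print("Saving progress to:", workdir)
```

Mounting Drive prompts for authorization the first time it runs in a Colab session.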

Usage

Detailed instructions for each module of the GRPM System are provided in the corresponding Jupyter notebook in the repository. Follow those instructions and install the Python packages specified for each module.

Requirements

GRPM System has the following requirements:

  • Python 3.9 or above
  • pandas
  • requests
  • biopython
  • nbib
  • beautifulsoup
  • openai
  • matplotlib
  • seaborn
  • nltk
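
The following sketch checks a local environment against the list above. It is only an illustration: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and the import names are the ones these packages are commonly imported under (for example `Bio` for biopython and `bs4` for Beautiful Soup).

```python
import sys
from importlib.util import find_spec

# Listed package -> module name used at import time (they differ for some)
REQUIRED = {
    "pandas": "pandas",
    "requests": "requests",
    "biopython": "Bio",
    "nbib": "nbib",
    "beautifulsoup": "bs4",
    "openai": "openai",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "nltk": "nltk",
}


def missing_packages(required=REQUIRED):
    """Return the listed packages whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


python_ok = sys.version_info >= (3, 9)
print("Python 3.9 or above:", python_ok)
print("Missing packages:", missing_packages() or "none")
```

Anything reported as missing can then be installed with pip, or by running the installation cell at the top of the relevant notebook.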