This repository is home to WebSTR-API, the RESTful API and backend for WebSTR, a portal of human genome-wide variation in Short Tandem Repeats (STRs). Our goal is to make large STR genotype datasets usable by the broader genomics community by facilitating open access to this data.
WebSTR is the result of a collaboration between two scientific groups: Maria Anisimova’s Lab and Melissa Gymrek’s Lab.
Source code for the WebSTR web portal can be found here: https://github.com/gymrek-lab/webstr
All available endpoints are described in automatically generated documentation, which includes Python code examples and can be accessed here: http://webstr-api.ucsd.edu/docs
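As a quick way to see which endpoints exist without opening the browser, the sketch below lists them from the machine-readable schema. It assumes the API is built with FastAPI and serves its OpenAPI schema at the framework's default `/openapi.json` path; that URL is an assumption, not something this README confirms. The helper names `list_endpoints` and `fetch_schema` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: FastAPI's default schema location on the public server.
OPENAPI_URL = "http://webstr-api.ucsd.edu/openapi.json"


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in http_methods:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)


def fetch_schema(url=OPENAPI_URL, timeout=10):
    """Download and decode the OpenAPI schema (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `list_endpoints(fetch_schema())` should then give one `(method, path)` pair per documented operation, provided the schema URL assumption holds.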
Yes, you can deploy your own instance: WebSTR-API can be deployed on any container-based service using the provided Docker file.
Yes! Running WebSTR-API locally is possible, and we encourage it if you would like to add your own data to WebSTR or perform any advanced analysis on it.
Install and configure PostgreSQL on your machine and create an empty database called strdb. We provide an sql_dump backup of the current version of the database on request. Restore the database from this backup.
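The database step above can be sketched in Python by wrapping the standard PostgreSQL command-line tools. This is a minimal sketch under assumptions: the dump file path is whatever you receive from us, the `postgres` user is the default mentioned below, and whether the dump is plain SQL or a `pg_dump` archive is not stated here, so both cases are covered. The function names are hypothetical.

```python
import subprocess


def restore_commands(dump_path, db_name="strdb", user="postgres", plain_sql=True):
    """Build the commands that create an empty database and restore a dump into it."""
    create = ["createdb", "-U", user, db_name]
    if plain_sql:
        # Plain-text SQL dumps are replayed with psql.
        restore = ["psql", "-U", user, "-d", db_name, "-f", dump_path]
    else:
        # Custom or tar format archives (pg_dump -Fc / -Ft) need pg_restore.
        restore = ["pg_restore", "-U", user, "-d", db_name, dump_path]
    return [create, restore]


def restore_database(dump_path, **options):
    """Run each command in order, stopping at the first failure."""
    for command in restore_commands(dump_path, **options):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print and inspect the commands before running them against a real server.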
a) Set up python3 and virtualenv on your machine. For Mac, follow instructions here. You can also use conda; in this case, follow these instructions to create a conda env. Conda is preferred for newer M1/M2 Macs and for infrastructures that already use conda. Activate your environment.
b) Create a new virtual env and install all the requirements with the following command:
pip install -r requirements.txt
Set the DATABASE_URL environment variable to point to your database:
export DATABASE_URL="postgres://postgres:YOURPASSWORD@localhost:5432/strdb"
Note that this uses the default user postgres. If you created your database under a different user, adjust this variable accordingly.
Optional: add this line to ~/.bashrc and restart your terminal.
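Before starting the server, it can help to confirm that DATABASE_URL is set and points where you expect. The sketch below uses only the standard library to split the URL into its parts and mask the password. It does not connect to the database, and the helper names are hypothetical.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a PostgreSQL URL into its parts, masking the password."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": "***" if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def database_settings_from_env(environ=os.environ):
    """Read DATABASE_URL from the environment and describe it."""
    url = environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return describe_database_url(url)
```

If you created the database under a different user, the `user` field in the result is the first thing to check.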
Run the following command from the root folder of this repo:
uvicorn strAPI.main:app --host=0.0.0.0 --port=${PORT:-5000} --reload
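Once uvicorn is running, a small check like the one below can confirm that the local server responds. It is a sketch that assumes the local instance serves its documentation page at `/docs`, as the public server does, and it mirrors the `${PORT:-5000}` fallback from the command above. The function names are hypothetical.

```python
import os
from urllib.error import URLError
from urllib.request import urlopen


def local_docs_url(environ=os.environ):
    """Mirror ${PORT:-5000}: fall back to 5000 when PORT is unset or empty."""
    port = environ.get("PORT") or "5000"
    return f"http://localhost:{port}/docs"


def server_is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False
```

Calling `server_is_up(local_docs_url())` from a second terminal returns `False` rather than raising if the server has not started yet.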
We recommend starting by getting it to work locally on your machine from a ready sql_dump that we provide upon request; see the instructions above. We also provide Python scripts that use the ORM (an abstraction layer on top of the database) to import new data into the database. Explore the "database_setup" directory for the different import utilities.
- If you would like to add a new genome assembly, see the add_genomes utility. Example usage:
python add_genomes.py -d PATH_TO_DB
Modify the script according to your data.
You will also need to import a GTF file corresponding to this assembly using gtf_to_sql.py.
Genes, transcripts and exons currently available for the hg38 (GRCh38.p2) assembly have been imported from ENCODE.
- To add a new reference panel description and study cohort, use add_panels_and_cohorts.py.
- If you would like to import a new reference panel, we recommend making a CSV file that matches the structure of the repeats table and importing it directly into SQL to save time. Alternatively, see the insert_repeats.py and import_data_ensembltrs.py utilities, which we wrote for repeats data that comes in different formats. Feel free to contact us for more details if you would like to make your own reference STR panel.