GAMBIT

A C++ tool for Gene-based Analysis with oMniBus, Integrative Tests

Implements several gene-based test forms (quadratic: weighted sum of Zsq, linear: weighted sum of Z, and maximum Zsq) to aggregate GWAS single-variant summary statistics cross-referenced with variant- or region-based functional annotations
Calculates annotation-stratified gene-based tests (e.g., TWAS/PrediXcan tests using eSNPs, gene-based tests using only coding variants, and gene-based tests using enhancer-to-target-gene maps), and omnibus tests by combining p-values for each gene
Inputs: GWAS association summary statistics file (chromosome, position, ref/alt allele, and z-score or beta-hat + se), annotation files, and LD reference panel

GWAS Summary Statistics

GWAS summary statistics files can be specified via --gwas my_summary_stats.txt.gz. Input files must be ordered by chromosome and genomic position, with input fields as shown below:

#CHR  POS     REF  ALT  SNP_ID      N         ZSCORE   ANNO
1     721290  C    G    rs12565286  58663.62  0.86661  Intergenic
1     752566  G    A    rs3094315   57135     0.5521   Intergenic
1     775659  A    G    rs2905035   54570     1.12098  Intron:LOC643837
1     777122  A    T    rs2980319   54570     1.11906  Exon:LOC643837

The first four fields and ZSCORE are required, while SNP_ID, ANNO and N (effective sample size) are optional.
See format_gwas_summary_stats.sh for annotating GWAS summary statistics files using EPACTS/TabAnno.
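
If a summary statistics file carries beta-hat and se instead of a z-score, the ZSCORE column can be derived as beta-hat / se before running GAMBIT. The sketch below illustrates that conversion and a simple ordering check. The function names are hypothetical and not part of GAMBIT, and the ordering check assumes only that each chromosome forms one contiguous block with non-decreasing positions (the README does not state a required chromosome order).

```python
def zscore_from_beta(beta, se):
    """Wald z-score: effect estimate divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def is_position_sorted(rows):
    """Check (chromosome, position) pairs for the assumed ordering.

    Returns True if every chromosome appears as a single contiguous block
    and positions never decrease within a block.
    """
    seen = set()
    last_chrom, last_pos = None, -1
    for chrom, pos in rows:
        if chrom != last_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one started
            seen.add(chrom)
            last_chrom, last_pos = chrom, pos
        elif pos < last_pos:
            return False  # position went backwards within a chromosome
        else:
            last_pos = pos
    return True
```

For example, is_position_sorted([("1", 721290), ("1", 752566)]) accepts the first two rows of the example file above.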

Annotation-Stratified Gene-Based Tests

Gene-Based Analysis with Regulatory Elements

To compute gene-based tests using regulatory element annotations, specify an annotation bed file with regulatory-element-to-target-gene weights via --anno-bed my_reg_elems.txt.gz, formatted as below:

#CHR  START   END     CLASS     ELEMENT_ID          TARGET_GENES                     ANNO
chr1  567400  567600  Enhancer  chr1:567400:567600  MIB2:4.12|CPTP:2.53|GLTPD1:2.53  .
chr1  568000  568200  Enhancer  chr1:568000:568200  ATAD3A:2.75                      .
chr1  758600  758800  Enhancer  chr1:758600:758800  C1orf170:2.57|PERM1:2.57         .
chr1  769200  769400  Enhancer  chr1:769200:769400  C1orf170:3.36|PERM1:3.36         .

Association tests for individual regulatory elements are reported in *.stratified_out.txt files, and gene-based p-values (aggregating across regulatory elements for each gene) in *.summary_out.txt files.
Aggregation Methods for Regulatory Elements. By default, GAMBIT aggregates test statistics across variants in regulatory elements using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT or HMP to combine single-variant p-values, specify --no-skat and a p-value combination method via --pcomb.
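
As an illustration of the TARGET_GENES field in the annotation bed file above, the following sketch splits one field value into gene/weight pairs. The function name is hypothetical, and treating "." or an empty string as "no targets" is an assumption rather than documented GAMBIT behavior.

```python
def parse_target_genes(field):
    """Split 'GENE:weight|GENE:weight' into a {gene: weight} dict."""
    targets = {}
    if field in ("", "."):  # assumed placeholder for "no target genes"
        return targets
    for entry in field.split("|"):
        # rsplit keeps any ':' inside the gene name intact
        gene, weight = entry.rsplit(":", 1)
        targets[gene] = float(weight)
    return targets
```

Applied to the first example row, this yields the weights 4.12, 2.53 and 2.53 for MIB2, CPTP and GLTPD1.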

Gene-Based Analysis with Coding and Other Annotated Variants

To compute gene-based tests using coding and other variants, GAMBIT relies on the ANNO field in GWAS summary statistics and an annotation hierarchy definitions file specified via --anno-defs my_defs.txt, formatted as below:

#CLASS    SUBCLASS          ANNO_TERMS
Coding    Protein_Altering  Nonsynonymous,Start_Loss,Stop_Gain,Stop_Loss,CodonGain,CodonLoss,Frameshift
Coding    Splice_Site       Essential_Splice_Site,Normal_Splice_Site
Coding    Exon_Other        Exon,Synonymous
UTR       UTR3              Utr3
UTR       UTR5              Utr5

The ANNO_TERMS field specifies a comma-separated list of annotation terms (matching terms from the GWAS summary statistics file's ANNO field), and CLASS and SUBCLASS determine the annotation hierarchy and classes reported in output files.
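
The mapping from annotation terms to classes can be pictured as a lookup table. The sketch below is not GAMBIT's implementation: the function names are hypothetical, the definitions file is assumed to be whitespace-delimited with exactly three columns, and ANNO values are assumed to have the form Term or Term:Gene as in the example summary statistics rows.

```python
def build_term_lookup(defs_lines):
    """Map each annotation term to its (CLASS, SUBCLASS) pair."""
    lookup = {}
    for line in defs_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        anno_class, subclass, terms = line.split()
        for term in terms.split(","):
            lookup[term] = (anno_class, subclass)
    return lookup


def classify(anno, lookup):
    """Return (CLASS, SUBCLASS, gene) for an ANNO value, or None if unmapped."""
    term, _, gene = anno.partition(":")
    if term not in lookup:
        return None
    anno_class, subclass = lookup[term]
    return anno_class, subclass, gene
```

With the definitions shown above, "Exon:LOC643837" falls under Coding / Exon_Other, while "Intergenic" has no matching term and is left unclassified.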
Gene-Based Test Output. Test statistics stratified by gene and annotation subclass are provided in *.stratified_out.txt files, and gene-based p-values (aggregating across annotation classes for each gene) in *.summary_out.txt files.
Variant Aggregation Methods. By default, GAMBIT aggregates test statistics across variants using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT or HMP to combine single-variant p-values, specify --no-skat and a p-value combination method via --pcomb.
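
For readers unfamiliar with ACAT, the sketch below shows the weighted Cauchy combination described in Liu et al. (2019): each p-value is mapped to a Cauchy variate, the variates are averaged with weights, and the result is converted back to a p-value. This is a minimal illustration, not GAMBIT's code. The function name is hypothetical, p-values are assumed to lie strictly between 0 and 1, and no special handling of extremely small p-values is included.

```python
import math


def acat(pvalues, weights=None):
    """Weighted Cauchy combination of p-values (ACAT)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("pvalues and weights must have the same length")
    total_weight = sum(weights)
    # tan((0.5 - p) * pi) maps a uniform p-value to a standard Cauchy variate
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for p, w in zip(pvalues, weights)
    ) / total_weight
    # Upper-tail probability of the standard Cauchy distribution
    return 0.5 - math.atan(statistic) / math.pi
```

A single p-value is returned unchanged (up to rounding), and identical p-values combine to that same value, which is a quick sanity check on the transform.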

TWAS Analysis

To compute TWAS/PrediXcan gene-based tests using GAMBIT, specify an eWeight file via --eweights my_eWeights.txt.gz, formatted as below:

##TISSUE_IDS=0:Adipose_Subcutaneous,1:Adipose_Visceral_Omentum,2:Adrenal_Gland,3:Artery_Aorta
#CHR  POS     RSID       REF  ALT  BETAS
1     752566  rs3094315  G    A    C1orf159=3.92e-02@0|UBE2J2=-1.49e-02@0|FAM87B=2.75e-01@1;1.25e-01@2;1.17e-01@3
1     752721  rs3131972  A    G    LINC00115=1.15e-01@0;1.75e-02@3;4.90e-02@4|RP11-206L10.8=3.21e-02@1
1     754182  rs3131969  A    G    LINC00115=-2.1e-02@1|RP5-857K21.2=-8.27e-02@2|RP11-206L10.9=-1.11e-01@2
1     760912  rs1048488  C    T    C1orf159=3.35e-04@0|TTLL10=-1.4e-02@3|FAM87B=1.75e-01@1;1.12e-01@2;9.51e-02@3|SAMD11=-1.27e-02@2

The BETAS field format is eGene_A=Weight_A1@Tissue_A1;Weight_A2@Tissue_A2|eGene_B=Weight_B1@Tissue_B1, and labels for tissue IDs can be specified in the header.
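
To make the BETAS layout concrete, the following sketch parses the tissue header and one BETAS value into nested dictionaries. The function names are hypothetical and the code only reflects the format string given above, not GAMBIT's own parser.

```python
def parse_tissue_header(line):
    """Parse '##TISSUE_IDS=0:Name,1:Name' into {0: 'Name', 1: 'Name'}."""
    _, _, body = line.strip().partition("=")
    labels = {}
    for item in body.split(","):
        tissue_id, _, name = item.partition(":")
        labels[int(tissue_id)] = name
    return labels


def parse_betas(field):
    """Parse a BETAS value into {gene: {tissue_id: weight}}."""
    result = {}
    for gene_entry in field.split("|"):       # one entry per eGene
        gene, _, weights = gene_entry.partition("=")
        per_tissue = {}
        for item in weights.split(";"):       # one item per tissue
            weight, _, tissue_id = item.partition("@")
            per_tissue[int(tissue_id)] = float(weight)
        result[gene] = per_tissue
    return result
```

For the first example row, FAM87B carries weights in tissues 1, 2 and 3, while C1orf159 and UBE2J2 each carry a single weight in tissue 0.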
Subsetting tissues. To restrict analysis to a subset of tissues/cell-types, specify a comma-separated list of tissues following the --tissues flag. By default, GAMBIT includes all tissues/cell-types present in the eWeight file.
Tissue Aggregation for Omnibus tests. GAMBIT reports both single-tissue TWAS/PrediXcan results and omnibus test results that aggregate across all specified tissues/cell-types for each eGene. Omnibus p-values for multi-tissue TWAS/PrediXcan analysis can be calculated using 1) the maximum single-tissue test statistic, evaluated against the joint distribution of single-tissue statistics, 2) the sum of squared single-tissue z-scores (analogous to SKAT), or 3) p-value combination (PCOMB) via ACAT or HMP [default]. The omnibus method for multi-tissue analysis is specified via --tissue-aggreg (PCOMB, MinP, SKAT, or ALL), and the p-value combination method via --pcomb (ACAT or HMP).
Single-tissue and omnibus test output. Gene-based tests and p-values for each eGene-tissue pair are reported in *.stratified_out.txt files, and omnibus p-values (aggregating across all tissues for each eGene) in *.summary_out.txt files.
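
As a rough picture of how single-tissue p-values can be combined with HMP, the sketch below computes the raw weighted harmonic mean of a set of p-values. It is deliberately incomplete: the asymptotically exact procedure of Wilson (2019) additionally calibrates this statistic, and that step is omitted here. The function name is hypothetical and nothing here is taken from GAMBIT's source.

```python
def harmonic_mean_p(pvalues, weights=None):
    """Raw weighted harmonic mean of p-values (uncalibrated HMP statistic)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("pvalues and weights must have the same length")
    if any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    # sum(w) / sum(w / p): small p-values dominate the denominator
    return sum(weights) / sum(w / p for p, w in zip(pvalues, weights))
```

Because small p-values dominate the sum, one strongly associated tissue pulls the combined value down even when the other tissues show no signal.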

dTSS-Weighted Gene-Based Tests

To incorporate un-annotated regulatory variants in gene-based analysis, GAMBIT implements a dTSS (distance to Transcription Start Site) weighted gene-based test, which aggregates all single-variant p-values within a specified window from each gene's TSS using weighted ACAT or HMP and assigns higher weight to variants nearer the TSS using an exponential decay function.
To compute dTSS-weighted gene-based tests, specify a TSS bed file via --tss-bed my_tss_bed.bed.gz, formatted as below:

#CHR  START   END     SYMBOL      GENE             GENE_ANNO
1     11868   11869   DDX11L1     ENSG00000223972  transcribed_unprocessed_pseudogene
1     62947   62948   OR4G11P     ENSG00000240361  transcribed_unprocessed_pseudogene
1     69090   69091   OR4F5       ENSG00000186092  protein_coding
1     131024  131025  CICP27      ENSG00000233750  processed_pseudogene

Window size. The window size for dTSS-weighted gene-based tests can be modified by specifying --tss-window BASEPAIRS (500 Kbp by default).
dTSS decay function. The relative weight assigned to variants nearer to or farther from the TSS can be modified by specifying --tss-alpha ALPHA, where alpha=0 gives all variants equal weight, and larger values give more weight to variants nearer the TSS. --tss-alpha also accepts a comma-separated list of alpha values, in which case GAMBIT computes global test p-values across all specified values (individual p-values are reported in the INFO output field). By default, GAMBIT uses dTSS alpha values 1e-4,5e-5,1e-5,5e-6.
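
The exact decay formula is not given here, so the sketch below assumes the simplest exponential form, weight = exp(-alpha * |position - TSS|), which is consistent with alpha=0 giving equal weights. It also assumes the window applies symmetrically on both sides of the TSS. The function name and both assumptions are illustrative only and should be checked against the GAMBIT source.

```python
import math


def dtss_weights(variant_positions, tss_position, window=500_000, alpha=1e-5):
    """Return {position: weight} for variants within `window` bp of the TSS.

    Assumed weight form: exp(-alpha * |position - TSS|).
    """
    weights = {}
    for pos in variant_positions:
        distance = abs(pos - tss_position)
        if distance <= window:  # assumed symmetric window around the TSS
            weights[pos] = math.exp(-alpha * distance)
    return weights
```

Under these assumptions, with alpha=1e-5 a variant 100 Kbp from the TSS receives a weight of exp(-1), roughly 0.37 relative to a variant at the TSS itself.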

Methods References

Statistical methods implemented in GAMBIT:

Sequence Kernel Association Test (SKAT): Wu et al. (2011), AJHG
TWAS: Gusev et al. (2016), Nat Genet
PrediXcan: Gamazon et al. (2015), Nat Genet and Barbeira et al. (2018), Nat Comm
Aggregated Cauchy Association Test (ACAT): Liu et al. (2019), AJHG and Liu and Xie (2018), arXiv
Asymptotically exact Harmonic Mean P-value (HMP): Wilson (2019), PNAS

Software References

Libraries and resources used or adapted in GAMBIT:

PDF, CDF, and quantile functions
Hopscotch hashing
- tsl library, Tessil et al.
Tabix and HTSLIB
- tabixpp, ekg et al.
- htslib, samtools team
Matrix libraries
- Eigen matrix library

Feedback and bug reports

Feel free to contact Corbin Quick (corbinq@gmail.com) with questions, bug reports, or feedback.
