gps_cpp is a C++ library which provides an implementation of the genome-wide pairwise-association signal sharing test (the 'GPS test') invented by Li et al.
This project's directory structure was borrowed from a sample project provided by Henry Schreiner here.
We compute the GPS test statistic for p-values from a pair of GWAS using the computeGpsCLI application, and generate null realisations of the statistic with the permuteTraitsCLI application. You can see these programs in use in the snakemake pipeline for our publication 'Accurate detection of shared genetic architecture from GWAS summary statistics in the small-sample context' here, and applied to actual discovery (rather than merely evaluating the test's performance) in the pipeline for the paper 'Leveraging pleiotropy identifies common-variant associations with selective IgA deficiency' here.
computeGpsCLI expects a tab-separated, uncompressed file containing two columns of p-values whose labels correspond to those passed as the colLabelA and colLabelB command-line arguments.
The essential command-line arguments are as follows:
--inputFile/-i: path to input file
--outputFile/-o: path to output file
--colLabelA/-a: label of first p-value column
--colLabelB/-b: label of second p-value column
--traitA/-c: label of first trait for output file
--traitB/-d: label of second trait for output file
In addition, the following arguments are optional:
--logFile/-g: path to file in which to log input file values which could not be read
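To make the expected input format concrete, here is a minimal sketch of reading two labelled p-value columns from a tab-separated stream and logging rows that cannot be read. It is not the code gps_cpp uses (the library relies on rapidcsv); the function names splitTabs, parsePvalue and readPvaluePairs are hypothetical, and logging the whole offending row is an assumption about what the log might contain.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of gps_cpp): split one line on tab characters.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, '\t')) fields.push_back(field);
    return fields;
}

// Hypothetical helper: parse a p-value in [0, 1]; returns false on any failure.
bool parsePvalue(const std::string& text, double& out) {
    try {
        std::size_t used = 0;
        out = std::stod(text, &used);
        return used == text.size() && out >= 0.0 && out <= 1.0;
    } catch (const std::exception&) {
        return false;  // e.g. "NA" or an empty field
    }
}

// Read the two labelled columns; rows that cannot be read are written to `log`.
std::vector<std::pair<double, double>> readPvaluePairs(
    std::istream& in, const std::string& colLabelA,
    const std::string& colLabelB, std::ostream& log) {
    assert(!colLabelA.empty() && !colLabelB.empty());
    std::string line;
    if (!std::getline(in, line)) throw std::runtime_error("empty input");

    // Locate the requested columns in the header row.
    const std::vector<std::string> header = splitTabs(line);
    std::size_t idxA = header.size(), idxB = header.size();
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == colLabelA) idxA = i;
        if (header[i] == colLabelB) idxB = i;
    }
    if (idxA == header.size() || idxB == header.size())
        throw std::runtime_error("column label not found in header");

    std::vector<std::pair<double, double>> pairs;
    while (std::getline(in, line)) {
        const std::vector<std::string> fields = splitTabs(line);
        double a = 0.0, b = 0.0;
        if (idxA < fields.size() && idxB < fields.size() &&
            parsePvalue(fields[idxA], a) && parsePvalue(fields[idxB], b)) {
            pairs.emplace_back(a, b);
        } else {
            log << line << '\n';  // keep a record of the unreadable row
        }
    }
    return pairs;
}
```

With a header of SNP, P.A and P.B (illustrative labels), calling readPvaluePairs(in, "P.A", "P.B", log) would return one pair per readable row and write the remaining rows to log.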
permuteTraitsCLI expects a tab-separated, uncompressed file containing two columns of p-values whose labels correspond to those passed as the colLabelA and colLabelB command-line arguments.
The work of running the permutations is mapped over the cores provided (with the number of cores specified by the --cores argument).
Note that the Perisic and Posse ecdf algorithm is used for this program as it is the fastest of the three we considered.
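For readers unfamiliar with the role of the bivariate ecdf, the sketch below shows a naive O(n^2) evaluation of the marginal and bivariate ecdfs at each observed point, and how they might be combined into the statistic. This is not the Perisic and Posse algorithm and not the gps_cpp implementation. The formula used, D = max_i sqrt(n / ln n) |F_uv(u_i, v_i) - F_u(u_i) F_v(v_i)| / sqrt(F_u F_v - F_u^2 F_v^2), is our reading of the definition in Li et al. and should be checked against the paper; the names EcdfValues, naiveEcdf and gpsStatistic are hypothetical, as is the choice to skip points where the denominator vanishes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Empirical cdfs evaluated at each observed point (u_i, v_i).
struct EcdfValues {
    std::vector<double> fu;   // F_u(u_i)
    std::vector<double> fv;   // F_v(v_i)
    std::vector<double> fuv;  // F_uv(u_i, v_i)
};

// Naive O(n^2) evaluation; a hypothetical stand-in for a faster algorithm.
EcdfValues naiveEcdf(const std::vector<double>& u, const std::vector<double>& v) {
    assert(u.size() == v.size());
    const std::size_t n = u.size();
    EcdfValues e{std::vector<double>(n), std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t cu = 0, cv = 0, cuv = 0;
        for (std::size_t j = 0; j < n; ++j) {
            const bool inU = u[j] <= u[i];
            const bool inV = v[j] <= v[i];
            if (inU) ++cu;
            if (inV) ++cv;
            if (inU && inV) ++cuv;
        }
        e.fu[i] = static_cast<double>(cu) / static_cast<double>(n);
        e.fv[i] = static_cast<double>(cv) / static_cast<double>(n);
        e.fuv[i] = static_cast<double>(cuv) / static_cast<double>(n);
    }
    return e;
}

// Assumed form of the GPS statistic (see the caveats in the text above).
double gpsStatistic(const std::vector<double>& u, const std::vector<double>& v) {
    assert(u.size() == v.size());
    assert(u.size() > 1);  // ln(n) must be positive
    const EcdfValues e = naiveEcdf(u, v);
    const double n = static_cast<double>(u.size());
    const double scale = std::sqrt(n / std::log(n));
    double best = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        const double prod = e.fu[i] * e.fv[i];
        const double var = prod - prod * prod;
        if (var <= 0.0) continue;  // denominator vanishes; skipped in this sketch
        best = std::max(best, scale * std::fabs(e.fuv[i] - prod) / std::sqrt(var));
    }
    return best;
}
```

The quadratic cost of the naive version is what makes a faster ecdf algorithm worthwhile when the statistic must be recomputed for every permutation.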
The essential command-line arguments are as follows:
--inputFile/-i: path to input file
--outputFile/-o: path to output file
--colLabelA/-a: label of first p-value column
--colLabelB/-b: label of second p-value column
--draws/-d: the number of permutations to generate
In addition, the following arguments are optional:
--cores/-n: number of cores
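As an illustration of how the draws and cores arguments interact, the sketch below splits a number of permutation draws across worker threads, each with its own random number generator. It is a sketch under assumptions rather than the permuteTraitsCLI implementation: the name permuteOverCores, the Statistic alias, the seed parameter, and the choice to shuffle the second column against the first are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Any function mapping two equal-length columns to a single test statistic.
using Statistic =
    std::function<double(const std::vector<double>&, const std::vector<double>&)>;

// Hypothetical sketch: compute `draws` null realisations using `cores` threads.
std::vector<double> permuteOverCores(const std::vector<double>& u,
                                     const std::vector<double>& v,
                                     std::size_t draws, std::size_t cores,
                                     const Statistic& statistic,
                                     unsigned seed = 42) {
    assert(u.size() == v.size());
    assert(cores > 0);
    std::vector<double> results(draws);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < cores; ++c) {
        workers.emplace_back([&, c]() {
            std::mt19937 rng(seed + static_cast<unsigned>(c));  // one generator per thread
            std::vector<double> shuffled = v;                   // thread-local copy
            // Each thread handles draws c, c + cores, c + 2 * cores, ...
            for (std::size_t d = c; d < draws; d += cores) {
                std::shuffle(shuffled.begin(), shuffled.end(), rng);
                results[d] = statistic(u, shuffled);  // each draw writes its own slot
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}
```

Because every draw writes to a distinct element of results, no locking is needed; the statistic passed in must itself be safe to call from several threads at once.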
The C++ code in this repository suffices to compute the GPS statistic, but not to obtain a p-value for it. We've included a simple command-line R script in the R directory, which we use to compute a GPS test p-value from null realisations of the GPS test statistic. The script fits a generalised extreme value distribution (GEVD) to the null realisations and reports a p-value plus the GEVD parameter estimates and their standard errors.
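Once the GEVD parameters have been estimated, the final step can be sketched as evaluating the distribution's upper tail at the observed statistic. The snippet below uses the standard GEVD cdf with location mu, scale sigma and shape xi; it does not perform the fit, it is not a translation of the R script, and treating the p-value as the upper-tail probability is an assumption here. The function names gevCdf and gevUpperTailPvalue are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard GEVD cdf: exp(-(1 + xi * z)^(-1 / xi)) with z = (x - mu) / sigma,
// reducing to the Gumbel form exp(-exp(-z)) as xi approaches 0.
double gevCdf(double x, double mu, double sigma, double xi) {
    assert(sigma > 0.0);
    const double z = (x - mu) / sigma;
    if (std::fabs(xi) < 1e-12) return std::exp(-std::exp(-z));  // Gumbel limit
    const double base = 1.0 + xi * z;
    // Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
    if (base <= 0.0) return xi > 0.0 ? 0.0 : 1.0;
    return std::exp(-std::pow(base, -1.0 / xi));
}

// Assumed p-value: probability of a null statistic at least as large as `observed`.
double gevUpperTailPvalue(double observed, double mu, double sigma, double xi) {
    return 1.0 - gevCdf(observed, mu, sigma, xi);
}
```

For very small p-values, 1.0 - cdf loses precision; a production version would compute the tail directly, for example with std::expm1.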
gps_cpp is built with CMake. With gps_cpp as your working directory:
mkdir build
cd build
cmake ..
make
gps_cpp depends on Boost, specifically the multi_index library. Earlier I required that this be installed on the user's machine prior to the gps_cpp build, but I now (October '24) use the FetchContent feature of CMake to download Boost and obtain multi_index (a header-only library). This should save you quite the headache of installing Boost if you're not familiar with this sort of thing (and even if you are!).
gps_cpp also depends on the rapidcsv, Catch2, and CLI11 libraries, but these should be downloaded and built as part of the build process (again using FetchContent). See the CMakeLists.txt files for more details.
The Catch2 tests can be run from the top-level gps_cpp directory with ./build/test/testGps; it's necessary to run them from this working directory as several tests depend on data files in the gps_cpp/test/data directory.
There is also a test make target, so you can run make test with build as your working directory. Note that this will not rebuild the tests even if their dependencies have changed (this is apparently a long-standing issue/design decision with CMake) and its output is less informative.
A docker image containing the program can be found here and can be obtained with:
docker pull twillis209/gps-cpp:latest
This code is licensed under the Lesser GPL. Earlier versions also licensed components from the StOpt library. These have since been removed as we've moved to a different algorithm for computing the bivariate ecdf.