Provides both CPU and GPU functions for the Stetson J statistic (Stetson 1996).
Requires:
- CUDA-enabled device
- CUDA Toolkit >= 7.5
- A suitable C++ compiler (e.g. `g++`)
- Clone this repository
- Edit the Makefile accordingly
  - `ARCH` -- the Compute Capability number for your device, e.g. 5.2 becomes `52`
  - `REAL_TYPE` -- you can switch between double and single precision by editing this Makefile variable to either `double` or `float`.
  - `CUDA_VERSION` -- make sure you have the CUDA Toolkit installed; this code was developed using version 7.5
  - The `BLOCK_SIZE` is also something to play around with but is optional to change.
- Run `make` to generate a binary that you can run for testing. This is mainly for my own debugging purposes, but you can inspect the code to get an idea of how to call the `stetson_j_cpu` and `stetson_j_gpu` functions.
usage: ./CudaStet <filename> <{s, l}> <skiprows>
filename : path to datafile with either a list of filenames
or a path to a single file containing 'x y yerr'
on each line
s : single file
l : list of files
skiprows : number of rows to skip in the datafile
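As an illustration of the input format described above, here is a minimal sketch of reading `x y yerr` rows while skipping a number of leading rows. The names `LightCurve` and `read_lightcurve` are hypothetical and are not taken from this repository; the actual parser in the code may behave differently (for example, in how it handles malformed lines).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one data file (not a type from this repository).
struct LightCurve {
    std::vector<double> x, y, err;
};

// Reads whitespace-separated "x y yerr" rows from `in`,
// discarding the first `skiprows` lines (e.g. a header).
// Lines that do not start with three numbers are silently ignored.
LightCurve read_lightcurve(std::istream& in, int skiprows) {
    LightCurve lc;
    std::string line;

    // Skip the requested number of leading rows.
    for (int i = 0; i < skiprows && std::getline(in, line); ++i) {
    }

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        double x, y, e;
        if (fields >> x >> y >> e) {
            lc.x.push_back(x);
            lc.y.push_back(y);
            lc.err.push_back(e);
        }
    }
    return lc;
}
```

The resulting vectors could then be passed (via `.data()` and `.size()`) to functions that expect raw pointers and a length.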
Arguments (in order):
- `real_type *x`: values for independent variable
- `real_type *y`: values for dependent variable
- `real_type *err`: values for uncertainty of dependent variable
- `weight_type WEIGHTING`: must be one of
  - `CONSTANT`: all pairs of observations are weighted equally
  - `EXP`: pairs of observations are exponentially suppressed by their distance in `x` (uses `exp(-|t1 - t2| / median(dt))`). See Zhang et al. 2003 for a real-world application of this weighting scheme. Note, however, that Zhang et al. 2003 used the mean instead of the median.
- `int N`: number of datapoints
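To make the role of the arguments above concrete, here is a small reference sketch of a pairwise Stetson J computation in plain C++. It is not the code in this repository: the function name `stetson_j_sketch` is hypothetical, the enum only mirrors the `CONSTANT` and `EXP` names above, and several details are assumptions (an inverse-variance weighted mean, all pairs `i < j` contributing, `x` sorted in ascending order, and `median(dt)` taken over consecutive spacings in `x`).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirrors the names used above; the repository's own definition may differ.
enum weight_type { CONSTANT, EXP };

// Median of consecutive spacings in x (assumes x is sorted, size >= 2).
double median_dt(const std::vector<double>& x) {
    std::vector<double> dt;
    for (std::size_t i = 1; i < x.size(); ++i) dt.push_back(x[i] - x[i - 1]);
    std::sort(dt.begin(), dt.end());
    const std::size_t m = dt.size();
    return (m % 2) ? dt[m / 2] : 0.5 * (dt[m / 2 - 1] + dt[m / 2]);
}

// Hypothetical O(N^2) reference: J = sum(w * sgn(P) * sqrt|P|) / sum(w),
// with P = delta_i * delta_j over all pairs i < j.
double stetson_j_sketch(const std::vector<double>& x,
                        const std::vector<double>& y,
                        const std::vector<double>& err,
                        weight_type weighting) {
    const std::size_t n = x.size();
    assert(n >= 2 && y.size() == n && err.size() == n);

    // Assumption: inverse-variance weighted mean of y.
    double sw = 0.0, swy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double w = 1.0 / (err[i] * err[i]);
        sw += w;
        swy += w * y[i];
    }
    const double mean = swy / sw;
    const double bias = std::sqrt(static_cast<double>(n) / (n - 1.0));
    const double dt = (weighting == EXP) ? median_dt(x) : 1.0;

    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double di = bias * (y[i] - mean) / err[i];
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dj = bias * (y[j] - mean) / err[j];
            const double p = di * dj;
            const double sgn = (p > 0.0) - (p < 0.0);
            const double w = (weighting == EXP)
                                 ? std::exp(-std::fabs(x[i] - x[j]) / dt)
                                 : 1.0;
            num += w * sgn * std::sqrt(std::fabs(p));
            den += w;
        }
    }
    return num / den;
}
```

With `CONSTANT`, every pair contributes equally; with `EXP`, pairs far apart in `x` are down-weighted, so the statistic is dominated by observations close together in time. A constant series gives J = 0 under either scheme.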
- Be careful with single precision. With the exponential weighting scheme, the accuracy for ~50,000 datapoints is about 10^(-3) for both the CPU and GPU variants. With the constant weighting scheme, however, the CPU variant is very inaccurate (off by factors of 100 or more) at 50,000 datapoints, while the GPU variant remains accurate to about 10^(-3).
- Python bindings with Swig
- Addition of Stetson K and L variability indices
- Adding configure script to simplify the install process
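Regarding the single-precision caveat noted above: one plausible contributor (an assumption, not something this README states) is accumulation order. 50,000 datapoints give about 1.25e9 pairs, and a `float` accumulator that adds 1.0 at a time stops growing at 2^24 = 16,777,216. A tree-style (pairwise) sum, which is similar in spirit to a parallel reduction, does not hit this limit in the same way. The sketch below only demonstrates that general floating-point effect; the function names are hypothetical and it does not reproduce the code in this repository.

```cpp
#include <cstddef>

// Naive left-to-right accumulation in single precision,
// as in a simple sequential loop.
float naive_sum(float w, std::size_t count) {
    float s = 0.0f;
    for (std::size_t i = 0; i < count; ++i) s += w;
    return s;
}

// Pairwise (tree) summation: split the range in half, sum each half,
// then add the two partial sums. Partial sums stay comparable in size,
// so far fewer small-plus-huge additions occur.
float pairwise_sum(float w, std::size_t count) {
    if (count <= 1024) return naive_sum(w, count);
    const std::size_t half = count / 2;
    return pairwise_sum(w, half) + pairwise_sum(w, count - half);
}
```

For example, summing a weight of `1.0f` twenty million times with `naive_sum` stalls at 16,777,216, while `pairwise_sum` returns 20,000,000. Compiling with `REAL_TYPE` set to `double` avoids the issue at these sizes.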
Some tests run on an Ubuntu 14.04 desktop with an i7-5930K overclocked to 4.5GHz and a 980 Ti graphics card. For these timing tests, `CudaStet` was compiled with single precision, using the `-O3` optimization flag for `g++` and `--use_fast_math` for `nvcc`.
Read each row as `N dt`, where `N` is the number of data points and `dt` is the execution time in seconds.
timing : (CPU, EXPONENTIAL)
10 9.000e-06
1009 3.612e-02
2008 9.482e-02
3007 1.367e-01
4006 1.922e-01
5005 2.986e-01
6004 4.134e-01
7003 4.819e-01
8002 6.265e-01
9001 7.888e-01
timing : (GPU, EXPONENTIAL)
10 1.648e-01
1009 1.351e-03
2008 2.490e-03
3007 3.450e-03
4006 4.739e-03
5005 5.822e-03
6004 7.672e-03
7003 8.494e-03
8002 9.271e-03
9001 1.066e-02
timing : (CPU, CONSTANT)
10 1.000e-06
1009 5.004e-03
2008 1.877e-02
3007 3.833e-02
4006 6.095e-02
5005 9.584e-02
6004 1.385e-01
7003 1.869e-01
8002 2.532e-01
9001 2.829e-01
timing : (GPU, CONSTANT)
10 1.740e-04
1009 9.020e-04
2008 1.727e-03
3007 3.200e-03
4006 4.346e-03
5005 4.290e-03
6004 5.602e-03
7003 7.210e-03
8002 7.446e-03
9001 8.685e-03
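The tables above can be compared row by row to get a CPU-to-GPU speedup. The following is a minimal sketch of that arithmetic; the `Timing` struct and `speedups` function are hypothetical helpers, not part of this repository. Using the values listed above for N = 9001, the ratio is roughly 74x for the exponential weighting (7.888e-01 / 1.066e-02) and roughly 33x for the constant weighting (2.829e-01 / 8.685e-03).

```cpp
#include <cstddef>
#include <vector>

// One row of a timing table: number of points and wall time in seconds.
struct Timing {
    int n;
    double seconds;
};

// Returns cpu[i].seconds / gpu[i].seconds for each matching row.
// Assumes both tables list the same N values in the same order.
std::vector<double> speedups(const std::vector<Timing>& cpu,
                             const std::vector<Timing>& gpu) {
    std::vector<double> out;
    for (std::size_t i = 0; i < cpu.size() && i < gpu.size(); ++i) {
        out.push_back(cpu[i].seconds / gpu[i].seconds);
    }
    return out;
}
```

Ratios for the smallest inputs should be read with care, since at N = 10 the run time is likely dominated by fixed overheads rather than by the computation itself.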