Scripts to perform pairwise t-tests on TREC run files.

Requirements:

- R
- gdeval.pl (note: this is a fork that adds the options -k <cutoff> and -j <max judgment> to the original trec-web/trec-web-2013)
- trec_eval
- rbp_eval

There are two bash scripts to run. First run pairwise-eval.sh to evaluate the TREC run files. Then run pairwise-ttest.sh to compute statistical significance.
The bash scripts assume that rbp_eval, gdeval.pl and trec_eval can be found via your PATH environment variable.
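
Before running the scripts, it can help to confirm that the tools are actually reachable. The following is a minimal sketch, not part of the scripts themselves: the function name check_tools is hypothetical, and checking for an executable named R is an assumption about how R is invoked.

```shell
# Hypothetical helper: report every listed tool that cannot be found on PATH.
# Returns 0 when all tools are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v prints the resolved path, or fails if the tool is not found
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in this README (the executable name "R" is an assumption)
check_tools rbp_eval gdeval.pl trec_eval R \
    || echo "install the missing tools or extend PATH" >&2
```

If a tool is reported missing, add its directory to PATH before running pairwise-eval.sh.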
To compute a pairwise t-test of all run files in the runs directory for NDCG@10 using foo.qrels (which contains the relevance judgments), run the following:

    ./pairwise-eval.sh ndcg 10 foo.qrels runs/*.run
    ./pairwise-ttest.sh runs/*.run.ndcg10 > result.txt
    cat result.txt

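
For readers unfamiliar with the test, here is a rough sketch of the statistic a paired t-test is based on: the mean of the per-topic score differences divided by its standard error. It is only an illustration and not how pairwise-ttest.sh is implemented. The function name paired_t and the input format (one "topic score" pair per line) are assumptions, and the sketch prints the t statistic and degrees of freedom without a p-value.

```shell
# Hypothetical helper: paired_t FILE_A FILE_B
# Each file is assumed to hold lines of the form "topic score".
# Prints the paired t statistic and degrees of freedom over shared topics.
paired_t() {
    awk '
        NR == FNR { a[$1] = $2; next }   # first file: remember the score per topic
        ($1 in a) {                      # second file: pair scores by topic
            d = a[$1] - $2               # per-topic difference
            n++; sum += d; sumsq += d * d
        }
        END {
            if (n < 2) exit 1            # need at least two paired topics
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)  # sample variance of differences
            if (var <= 0) exit 1         # t is undefined when all differences are equal
            printf "t=%.4f df=%d\n", mean / sqrt(var / n), n - 1
        }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   paired_t a.scores b.scores
```

Pairing by topic matters because the test compares two systems on the same queries, which is why topics missing from either file are skipped here.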
The pairwise-eval.sh script can compute ERR, NDCG, RBP and MAP. gdeval.pl is used for ERR and NDCG, rbp_eval for RBP, and trec_eval for MAP.
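
The metric-to-tool mapping described above can be sketched as a small lookup. This is an illustration only, not the script's actual code: the function name tool_for_metric is hypothetical, and apart from ndcg (used in the example command) the lowercase metric names are assumed.

```shell
# Hypothetical helper: print the evaluation tool this README associates with a metric.
tool_for_metric() {
    case "$1" in
        err|ndcg) echo "gdeval.pl" ;;   # ERR and NDCG
        rbp)      echo "rbp_eval" ;;    # RBP
        map)      echo "trec_eval" ;;   # MAP
        *)        echo "unknown metric: $1" >&2; return 1 ;;
    esac
}

tool_for_metric ndcg
```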