Paper | Installation | To Run | Results
FOLK is a claim verification model that leverages Large Language Models (LLMs) to verify complex claims and generate explanations without requiring annotated evidence.
If our code or data helps you in your research, please kindly cite us:
@article{wang2023explainable,
title={Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models},
author={Wang, Haoran and Shu, Kai},
journal={arXiv preprint arXiv:2310.05253},
year={2023}
}
Install the conda environment from the environment.yml file:
conda env create -n folk --file environment.yml
conda activate folk
Please add your OpenAI and SerpApi keys to the keys.py file.
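Both keys are read from keys.py. A minimal sketch of what that file might contain (the variable names below are assumptions for illustration; keep whatever names the scripts actually import):

# keys.py -- variable names here are hypothetical placeholders
OPENAI_API_KEY = "sk-..."     # your OpenAI API key
SERPAPI_API_KEY = "..."       # your SerpApi key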
To decompose claims:
python decompose.py \
--dataset ["hover", "feverous", "scifact"] \
--hover_num_hop ["two", "three", "four"] \
--feverous_challenge ["numerical", "reasoning", "table"] \
--prompt_strategy ["direct", "cot", "self-ask", "logic"] \
--model ["llama-7b", "llama-13b", "llama-30b", "text-davinci"] \
--version "DEFINE YOUR VERSION" \
--max_token 1024 \
--temperature 0.7
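For example, to decompose three-hop HoVer claims with the logic prompt strategy and text-davinci (the same configuration evaluated below):

python decompose.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0" \
--max_token 1024 \
--temperature 0.7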
To ground answers:
python grounding.py \
--dataset ["hover", "feverous", "scifact"] \
--hover_num_hop ["two", "three", "four"] \
--feverous_challenge ["numerical", "reasoning", "table"] \
--prompt_strategy ["direct", "cot", "self-ask", "logic"] \
--model ["llama-7b", "llama-13b", "llama-30b", "text-davinci"] \
--version "DEFINE YOUR VERSION"
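A concrete run matching the decomposition example above:

python grounding.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0"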
To make predictions:
python aggregate.py \
--dataset ["hover", "feverous", "scifact"] \
--hover_num_hop ["two", "three", "four"] \
--feverous_challenge ["numerical", "reasoning", "table"] \
--prompt_strategy ["direct", "cot", "self-ask", "logic"] \
--model ["llama-7b", "llama-13b", "llama-30b", "text-davinci"] \
--version "DEFINE YOUR VERSION" \
--max_token 1024 \
--temperature 0.7
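And the matching prediction run:

python aggregate.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0" \
--max_token 1024 \
--temperature 0.7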
To evaluate:
python evaluation.py \
--dataset hover \
--hover_num_hop three \
--prompt_strategy logic \
--model text-davinci \
--version "V1.0"
The experiment results reported in Table 2 of the paper are listed in the Final_Results folder. To evaluate the results, execute the following script:
./results.sh
The ProgramFC baseline is contained in the ProgramFC folder. The code is modified from the original repo to process the datasets used in the paper.