```bash
conda create -n seakr python=3.10
conda activate seakr
pip install beir==1.0.1 spacy==3.7.2 aiofiles tenacity
python -m spacy download en_core_web_sm
```
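Optionally, verify that the spaCy model downloaded correctly (the sentence is arbitrary):

```python
# Optional sanity check that en_core_web_sm installed correctly.
import spacy

nlp = spacy.load("en_core_web_sm")
print([t.text for t in nlp("Elasticsearch stores the Wikipedia passages.")])
```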
We modify vLLM to extract the uncertainty measures:
```bash
cd vllm_uncertainty
pip install -e .
```
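For intuition, the flavor of eigenvalue-based score that the `--eigen_threshold` flag below thresholds can be sketched as follows. This is a toy illustration over hypothetical hidden states, not the patched vLLM's actual code; `eigen_uncertainty` and `alpha` are names we made up here:

```python
# Hedged sketch: a Gram-matrix eigenvalue uncertainty score over the
# hidden states of k sampled generations. The real measure lives inside
# the patched vLLM; this toy version only conveys the idea.
import numpy as np

def eigen_uncertainty(hidden: np.ndarray, alpha: float = 1e-3) -> float:
    """hidden: (k, d) matrix, one hidden-state vector per sampled answer."""
    k = hidden.shape[0]
    # Center the vectors so the score reflects spread, not magnitude.
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Regularized Gram matrix over the k samples.
    gram = centered @ centered.T + alpha * np.eye(k)
    # Mean log-eigenvalue: more disagreement between samples -> higher score.
    eigvals = np.linalg.eigvalsh(gram)
    return float(np.mean(np.log(eigvals)))

# Consistent samples score low; scattered samples score high.
rng = np.random.default_rng(0)
consistent = np.tile(rng.normal(size=4096), (10, 1)) + 0.01 * rng.normal(size=(10, 4096))
scattered = rng.normal(size=(10, 4096))
print(eigen_uncertainty(consistent), eigen_uncertainty(scattered))
```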
Following DRAGIN, we use the Wikipedia dump and Elasticsearch to build the retriever:
```bash
mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd data/dpr
gzip -d psgs_w100.tsv.gz
popd
```
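The decompressed dump is a tab-separated file with `id`, `text`, and `title` columns plus a header row; you can peek at it like this:

```python
# Peek at the DPR passage dump: one ~100-word passage per line.
import csv

with open("data/dpr/psgs_w100.tsv", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    header = next(reader)   # ['id', 'text', 'title']
    first = next(reader)
    print(header)
    print(first[0], first[2], first[1][:80])
```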
```bash
cd data
wget -O elasticsearch-7.17.9.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.9-linux-x86_64.tar.gz # download Elasticsearch
tar zxvf elasticsearch-7.17.9.tar.gz
rm elasticsearch-7.17.9.tar.gz
cd elasticsearch-7.17.9
nohup bin/elasticsearch & # run Elasticsearch in background
```
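Before building the index, it helps to wait until the cluster reports yellow or green health. A small Python poll, assuming the default port 9200 (pass your own port if you changed it):

```python
# Wait for Elasticsearch to become reachable before indexing.
import time
import requests

def wait_for_es(port: int = 9200, timeout: int = 120) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            health = requests.get(
                f"http://localhost:{port}/_cluster/health", timeout=5
            ).json()
            if health.get("status") in ("yellow", "green"):
                print("Elasticsearch is up:", health["status"])
                return
        except requests.ConnectionError:
            pass
        time.sleep(3)
    raise TimeoutError("Elasticsearch did not start in time")

wait_for_es()
```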
```bash
python build_wiki_index.py --data_path $YOUR_WIKIPEDIA_TSV_PATH --index_name wiki --port $YOUR_ELASTIC_SERVICE_PORT
```
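Once the index is built, you can sanity-check it with a BM25 query. The field name `txt` below is an assumption inherited from BEIR's Elasticsearch wrapper; verify it against `build_wiki_index.py`:

```python
# Hedged sketch: BM25 query against the freshly built "wiki" index.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # use your service port
resp = es.search(
    index="wiki",
    body={"query": {"match": {"txt": "capital of France"}}, "size": 3},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```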
For the multi-hop QA datasets, we use the same files as DRAGIN. We provide a packed version here: multihop_data.zip; download and unzip it into the data/multihop_data folder.
We use an asynchronous reasoning engine to accelerate multi-hop reasoning (a conceptual sketch follows the commands below).
```bash
python main_multihop.py \
--n_shot 10 \
--retriever_port $YOUR_ELASTIC_SERVICE_PORT \
--dataset_name twowikihop \
--eigen_threshold -6.0 \
--save_dir "outputs/twowikihop" \
--model_name_or_path $YOUR_MODEL_CHECKPOINT_PATH \
--served_model_name llama2-7b-chat \
--max_reasoning_steps 7 \
--max_docs 5
```

```bash
python main_multihop.py \
--n_shot 10 \
--retriever_port $YOUR_ELASTIC_SERVICE_PORT \
--dataset_name hotpotqa \
--eigen_threshold -6.0 \
--save_dir "outputs/hotpotqa" \
--model_name_or_path $YOUR_MODEL_CHECKPOINT_PATH \
--served_model_name llama2-7b-chat \
--max_reasoning_steps 7 \
--max_docs 5
```

```bash
python main_multihop.py \
--n_shot 10 \
--retriever_port $YOUR_ELASTIC_SERVICE_PORT \
--dataset_name iirc \
--eigen_threshold -6.0 \
--save_dir "outputs/iirc" \
--model_name_or_path $YOUR_MODEL_CHECKPOINT_PATH \
--served_model_name llama2-7b-chat \
--max_reasoning_steps 7 \
--max_docs 5
```
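The asynchronous idea is to run many reasoning chains concurrently against the served model instead of one question at a time. The sketch below is not this repository's actual engine; `Step`, `generate_step`, and `retrieve` are stand-ins that only mimic latency:

```python
# Hedged sketch of concurrent multi-hop reasoning with asyncio.
import asyncio
import random
from dataclasses import dataclass

@dataclass
class Step:
    answer: str
    query: str
    uncertain: bool
    is_final: bool

async def generate_step(question: str, context: list[str]) -> Step:
    await asyncio.sleep(0.1)  # stands in for a vLLM server call
    return Step("...", question, random.random() < 0.5, random.random() < 0.3)

async def retrieve(query: str, k: int) -> list[str]:
    await asyncio.sleep(0.05)  # stands in for an Elasticsearch query
    return [f"doc:{query}"] * k

async def answer_one(question: str, max_steps: int = 7, max_docs: int = 5) -> str:
    context: list[str] = []
    step = Step("", question, False, False)
    for _ in range(max_steps):                # --max_reasoning_steps
        step = await generate_step(question, context)
        if step.uncertain:                    # e.g. eigen score above threshold
            context += await retrieve(step.query, k=max_docs)  # --max_docs
        if step.is_final:
            break
    return step.answer

async def main() -> None:
    questions = [f"q{i}" for i in range(32)]
    sem = asyncio.Semaphore(16)               # bound concurrent chains
    async def bounded(q: str) -> str:
        async with sem:
            return await answer_one(q)
    answers = await asyncio.gather(*(bounded(q) for q in questions))
    print(len(answers), "answers")

asyncio.run(main())
```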
We provide a Jupyter notebook, eval_multihop.ipynb, for evaluation; just replace the output JSON Lines file name with your own.
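For a quick score outside the notebook, a minimal exact-match / token-F1 pass over a JSON Lines output might look like the following. The field names `prediction` and `ground_truth` and the file path are assumptions; check them against your actual output and rely on the notebook for official numbers:

```python
# Hedged sketch: EM / token-F1 over a JSON Lines output file.
import json
import re
import string
from collections import Counter

def normalize(s: str) -> str:
    s = re.sub(r"\b(a|an|the)\b", " ", s.lower())
    s = "".join(ch for ch in s if ch not in string.punctuation)
    return " ".join(s.split())

def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

em = f1 = n = 0
with open("outputs/twowikihop/output.jsonl") as f:  # assumed path
    for line in f:
        rec = json.loads(line)
        pred, gold = rec["prediction"], rec["ground_truth"]  # assumed fields
        em += normalize(pred) == normalize(gold)
        f1 += token_f1(pred, gold)
        n += 1
print(f"EM {em / n:.3f}  F1 {f1 / n:.3f}")
```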
For the single-hop QA datasets, the original files are from DPR. We provide a packed version containing the top-10 retrieved documents: singlehop_data.zip. Download and unzip it into the data folder.
```bash
python main_simpleqa.py \
--dataset_name tq \
--model_name_or_path $YOUR_MODEL_CHECKPOINT_PATH \
--selected_intermediate_layer 15 \
--output_dir $OUTPUT_DIR
```
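For intuition about `--selected_intermediate_layer`, the sketch below pulls the hidden states of one intermediate layer with the Hugging Face transformers API. The checkpoint name is a placeholder, and the repository's patched vLLM extracts these states server-side rather than like this:

```python
# Hedged sketch: grab layer-15 hidden states from a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"  # needs accelerate
)

inputs = tok("Who wrote Hamlet?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding layer; index 15 is the 15th block.
layer15 = out.hidden_states[15]  # (batch, seq_len, hidden_dim)
print(layer15.shape)
```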
You can evaluate the output in the eval_singlehop.ipynb notebook.