KoRC is a Knowledge-oriented Reading Comprehension benchmark for deep text understanding.
This repo implements several baselines for the benchmark:
- Chain-of-thought prompting with text-davinci-002 or GLM-130B
- Seq2Seq-MRC. It directly generates the answer from the given document and question using sequence-to-sequence language models such as BART, T5, and Flan-T5.
- RAG (Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks)
- EmbedKGQA
- TransferNet
Instructions for running each model are given in its README file.
Before trying them, first download the dataset and unzip it into the folder ./dataset.
The file tree should look like this:

```
.
+-- dataset
|   +-- train.json
|   +-- valid.json
|   +-- small_iid_test.json
|   +-- small_ood_test.json
+-- Seq2Seq-MRC
|   +-- README.md
|   +-- train.py
|   +-- ...
+-- RAG
+-- EmbedKGQA
...
```
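As a quick sanity check after unzipping, you can verify that the splits load. This is a minimal sketch, not part of the repo: it only assumes each split file holds a JSON array of examples (the exact per-example schema is not documented here, and the `load_split` helper name is our own).

```python
import json
from pathlib import Path

def load_split(path):
    """Load one KoRC split, assuming the file is a JSON array of examples."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {path}")
    return data

if __name__ == "__main__":
    # Print the size of each split that is present under ./dataset
    for split in ["train", "valid", "small_iid_test", "small_ood_test"]:
        p = Path("dataset") / f"{split}.json"
        if p.exists():
            print(split, len(load_split(p)))
```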
For environment setup, use the KORC.yml file to create a conda environment with all dependencies installed.
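Concretely, the setup is the usual two conda commands. Note that the environment name `korc` below is an assumption; check the `name:` field inside KORC.yml for the actual name.

```shell
# Create the environment from the provided spec file
conda env create -f KORC.yml
# Activate it ("korc" is assumed -- see the name: field in KORC.yml)
conda activate korc
```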