M.Sc. in Language Technology M915-NLG_NLU Assignment 2: Development and Comparison of N-shot QA systems on SQuAD v1 with a pretrained generative language model.
This repo lets you experiment with zero-, one-, and few-shot learning schemes for question answering with LLMs. Flan-T5 Small is used here, due to its reported few-shot capabilities.
- Zero-shot: no instructions or in-context examples
- One-shot: a single demonstration via the prompt template
- Few-shot: 9 randomly sampled demonstrations after finetuning
- Exp_Few-shot: 11 demonstrations drawn from a classification of questions into 11 question types ("What", "How many", "How", "Whom", "Whose", "Who", "When", "Which", "Where", "Why", "Be/Do/etc.", "Other")
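The question-type classification behind the Exp_Few-shot scheme can be sketched as follows. This is a minimal illustration, not the repo's actual code: the type labels come from the list above, while the matching logic (longest prefix first, auxiliary verbs mapped to "Be/Do/etc.") is an assumption.

```python
# Coarse question-type classifier for picking one demonstration per type.
# Labels follow the list in the README; the matching rules are a sketch.

QUESTION_TYPES = [
    "How many", "Whom", "Whose",  # longer / more specific prefixes first
    "What", "How", "Who", "When", "Which", "Where", "Why",
]
AUX_VERBS = {"be", "is", "are", "was", "were", "do", "does", "did",
             "can", "could", "will", "would", "should", "has", "have", "had"}

def classify_question(question: str) -> str:
    """Map a question to one of the coarse question types."""
    q = question.strip().lower()
    for qtype in QUESTION_TYPES:
        if q.startswith(qtype.lower()):
            return qtype
    words = q.split()
    if words and words[0] in AUX_VERBS:
        return "Be/Do/etc."
    return "Other"
```

Checking "How many" before "How" (and "Whom"/"Whose" before "Who") avoids the shorter prefix shadowing the longer one.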
Evaluation on the official dev set with the standard QA metrics: Exact Match (EM) and F1 score.
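For reference, a minimal sketch of the two metrics, following the answer normalization of the official SQuAD v1 evaluation script (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace):

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Note that the official script takes the maximum of each metric over all reference answers of an example and averages over the dev set.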
- Increase the gap between the k-values of shots (e.g. 10, 20, 30) to check for larger differences in the evaluation metrics
- Instead of random sampling, use k demonstrations from one single Wikipedia article (e.g. Super Bowl, Nikola_Tesla, or any other specific document) in order to see if the model picks up more patterns from one specific document and learns to discern patterns based on the posed question. (Pay attention to overfitting!)
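The single-article sampling idea can be sketched as below, on toy data. The field names ("title", "question", "answer") mirror SQuAD's layout, but the records and the helper name are illustrative, not part of the repo:

```python
import random

def demos_from_article(dataset, title, k, seed=0):
    """Sample k demonstrations whose source article matches `title`."""
    pool = [ex for ex in dataset if ex["title"] == title]
    if len(pool) < k:
        raise ValueError(f"Only {len(pool)} examples for article {title!r}")
    rng = random.Random(seed)  # fixed seed for reproducible demonstrations
    return rng.sample(pool, k)

# Toy stand-in for SQuAD records grouped by article title.
dataset = [
    {"title": "Nikola_Tesla", "question": "Where was Tesla born?", "answer": "Smiljan"},
    {"title": "Nikola_Tesla", "question": "When did Tesla die?", "answer": "1943"},
    {"title": "Super_Bowl_50", "question": "Who won Super Bowl 50?", "answer": "Denver Broncos"},
]
demos = demos_from_article(dataset, "Nikola_Tesla", k=2)
```

With the real dataset, the same filter applies over the `title` field of the SQuAD examples; a fixed seed keeps the demonstration set stable across runs, which matters when comparing against the random-sampling baseline.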
- Try larger versions of Flan-T5 or any other LLM (e.g. a decoder-only model like GPT-2)