Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
nlp evaluation bias bias-detection llm llms llm-evaluation llms-benchmarking llm-as-judge llm-as-a-judge llm-as-evaluator
Updated Feb 16, 2024 - Jupyter Notebook