xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
benchmark
regex
reliability
evaluation
dataset
gpt
phi
large-language-models
llm
open-compass
chatglm
qwen
lm-evaluation
xfinder
reliable-evaluation
key-answer-extraction
Updated Oct 15, 2024 - Python