Using an LLM as a Response Judge
Some metrics cannot be defined objectively. An LLM judge is particularly useful for these more subjective or complex criteria. We care about correctness, faithfulness, and relevance.
Answer Correctness - Is the generated answer correct compared to the reference, and does it thoroughly answer the user's query?
Answer Relevancy - Is the generated answer relevant and comprehensive?
Answer Faithfulness - Is the generated answer factually consistent with the context document?
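The three criteria above could be wired into a judge roughly as follows. This is a minimal sketch, not a definitive implementation: the 1 to 5 scale, the prompt wording, the `Score: N` reply format, and the names `CRITERIA`, `build_judge_prompt`, `parse_score`, and `judge` are all assumptions made for illustration. The `llm` argument stands in for whatever function sends a prompt to a model and returns its text reply.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical criterion keys mapped to the questions listed above.
CRITERIA: Dict[str, str] = {
    "correctness": (
        "Is the generated answer correct compared to the reference, "
        "and does it thoroughly answer the user's query?"
    ),
    "relevancy": "Is the generated answer relevant and comprehensive?",
    "faithfulness": (
        "Is the generated answer factually consistent with the context document?"
    ),
}


def build_judge_prompt(
    criterion: str,
    query: str,
    answer: str,
    reference: str = "",
    context: str = "",
) -> str:
    """Assemble a grading prompt for one criterion (assumed format)."""
    parts = [
        "You are grading a generated answer.",
        f"Criterion: {CRITERIA[criterion]}",
        f"Query: {query}",
        f"Generated answer: {answer}",
    ]
    # Only include the optional fields that were actually supplied.
    if reference:
        parts.append(f"Reference answer: {reference}")
    if context:
        parts.append(f"Context document: {context}")
    parts.append(
        "Reply with 'Score: N' where N is an integer from 1 to 5, "
        "then a one-sentence reason."
    )
    return "\n".join(parts)


def parse_score(reply: str) -> Optional[int]:
    """Extract the 1-5 score from a judge reply, or None if it is missing."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge(
    llm: Callable[[str], str],
    criterion: str,
    query: str,
    answer: str,
    **fields: str,
) -> Optional[int]:
    """Run one judgment; `llm` is any prompt-in, text-out callable."""
    return parse_score(llm(build_judge_prompt(criterion, query, answer, **fields)))
```

Returning `None` for an unparseable reply, rather than a default score, keeps malformed judge output from silently skewing an average.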
Benchmarks:
Evaluation Methodologies
Coding Ability
Confabulation Rate
Context Length
Creative Writing
Pop Culture
Reasoning
Role Playing
Summarization
Tool Calling
Toxicity Testing
Vibes