
Feature Tracker: Evaluation Benchmarks #237

Open
1 of 27 tasks
rmusser01 opened this issue Sep 8, 2024 · 1 comment
Assignees
Labels: enhancement (New feature or request), Feature-Addition

Comments


rmusser01 commented Sep 8, 2024

Title.

Benchmarks:
Evaluation Methodologies

  • G-Eval
  • QAG

Coding Ability

  • Aider Benchmark
  • CodeMMLU

Confabulation Rate

  • TruthfulQA
  • f

Context Length

  • Ruler
  • InfiniteBench
  • Babilong
  • LongICLBench
  • HelloBench
  • Snorkel Working Memory Test

Creative Writing

  • EQ Bench
  • f

Pop Culture

  • f

Reasoning

  • MMLU Pro
  • ARC public

Role Playing

  • Conversational Relevancy
  • Role Adherence
  • Knowledge Retention
  • Conversation Completeness

Summarization

  • DeepEval
  • Salesforce
  • F

Tool Calling

Toxicity Testing

  • DeepEval
  • f

Vibes

  • AidanBench
  • f
@rmusser01 rmusser01 added this to the Continual-Improvements milestone Sep 8, 2024
@rmusser01 (Owner, Author) commented:

Using an LLM as a Response Judge

Some metrics cannot be scored with simple objective checks; an LLM judge is particularly useful for these more subjective or complex criteria. We care about correctness, faithfulness, and relevance.

    Answer Correctness - Is the generated answer correct compared to the reference, and does it thoroughly answer the user's query?
    Answer Relevancy - Is the generated answer relevant and comprehensive?
    Answer Faithfulness - Is the generated answer factually consistent with the context document?
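A minimal sketch of how these judged metrics might be wired up. All names here (`JUDGE_PROMPT`, `build_judge_prompt`, `parse_judge_reply`) are hypothetical, the 1-5 scale is an assumption, and the actual chat-model call is deliberately omitted since this tracker does not fix a model API:

```python
import re

# Hypothetical prompt template for an LLM judge. The model call itself is
# left out; only prompt construction and reply parsing are sketched here.
JUDGE_PROMPT = """You are an impartial judge.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Rate the candidate's {criterion} on a 1-5 scale.
Respond in exactly this format:
SCORE: <1-5>
REASON: <one sentence>"""


def build_judge_prompt(question: str, reference: str, candidate: str,
                       criterion: str = "answer correctness") -> str:
    """Fill the template for one (question, reference, candidate) triple."""
    return JUDGE_PROMPT.format(question=question, reference=reference,
                               candidate=candidate, criterion=criterion)


def parse_judge_reply(reply: str) -> tuple[int, str]:
    """Extract the 1-5 score and the one-line rationale from a judge reply."""
    score = re.search(r"SCORE:\s*([1-5])", reply)
    if score is None:
        raise ValueError("judge reply missing SCORE line")
    reason = re.search(r"REASON:\s*(.+)", reply)
    return int(score.group(1)), reason.group(1).strip() if reason else ""
```

The same prompt builder could serve correctness, relevancy, or faithfulness by swapping the `criterion` argument; the strict `SCORE:`/`REASON:` format makes the reply machine-parseable regardless of which model does the judging.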

@rmusser01 rmusser01 self-assigned this Nov 17, 2024
@rmusser01 rmusser01 added enhancement New feature or request Feature-Addition labels Nov 17, 2024
@rmusser01 rmusser01 pinned this issue Nov 17, 2024