llm-as-judge

Here are 2 public repositories matching this topic...

Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"

nlp evaluation bias bias-detection llm llms llm-evaluation llms-benchmarking llm-as-judge llm-as-a-judge llm-as-evaluator

Evaluate translations either with a self-hosted embedder or by using ChatGPT as an LLM-as-judge.
