Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
The prompt engineering, prompt management, and prompt evaluation tool for Kotlin.
Evaluating LLMs with Multiple Problems at Once: A New Paradigm for Probing LLM Capabilities
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET.
The prompt engineering, prompt management, and prompt evaluation tool for Go.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
The prompt engineering, prompt management, and prompt evaluation tool for Python.
LLM Security Platform Docs
Generative agents: computational software agents that simulate believable human behavior using OpenAI LLM models. The main focus is a game, "Werewolves of Miller's Hollow", that aims to replicate human-like behavior.
Realign is an AI testing and simulation framework for multi-turn AI applications. It simulates user interactions, evaluates AI performance, and generates adversarial scenarios to test LLM vulnerabilities.
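The shape of such a simulation-and-evaluation loop can be sketched in a few lines of plain Python. This is not Realign's actual API: simulate_user, call_app, and is_helpful are hypothetical stand-ins for a simulated-user policy, the application under test, and an evaluator.

```python
# Illustrative sketch only, NOT Realign's API: simulate_user, call_app,
# and is_helpful are hypothetical stand-ins.

def simulate_user(history):
    """Simulated-user policy: here, a scripted sequence of messages."""
    script = ["How do I reset my password?", "That didn't work.", "Thanks!"]
    turn = len(history) // 2  # two history entries per completed turn
    return script[turn] if turn < len(script) else None

def call_app(history, user_msg):
    """Stand-in for the multi-turn AI application under test."""
    return f"(app reply to: {user_msg})"

def is_helpful(reply):
    """Stand-in evaluator; a real one might use an LLM judge."""
    return reply.startswith("(app reply")

def run_simulation(max_turns=5):
    history, scores = [], []
    for _ in range(max_turns):
        user_msg = simulate_user(history)
        if user_msg is None:  # simulated user has finished the dialogue
            break
        reply = call_app(history, user_msg)
        history += [("user", user_msg), ("assistant", reply)]
        scores.append(is_helpful(reply))
    return history, sum(scores) / len(scores)

transcript, pass_rate = run_simulation()
print(f"turns={len(transcript) // 2}, pass_rate={pass_rate:.2f}")
```

An adversarial variant would swap the scripted user for a policy that deliberately probes the application with hostile or off-distribution inputs.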
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Code for "Prediction-Powered Ranking of Large Language Models" (arXiv, 2024).
An open source library for asynchronous querying of LLM endpoints
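The core pattern behind such a library is fanning requests out concurrently with asyncio rather than awaiting each endpoint in turn. Below is a minimal sketch, not this library's API: query_endpoint and the endpoint URLs are illustrative stand-ins for a real HTTP call (e.g. via aiohttp or httpx).

```python
# Hedged sketch of asynchronous fan-out to multiple LLM endpoints.
import asyncio

async def query_endpoint(url: str, prompt: str) -> str:
    # Stand-in for a real POST request; simulate network latency instead.
    await asyncio.sleep(0.1)
    return f"{url} -> completion for {prompt!r}"

async def main():
    endpoints = [
        "https://api.example-a.test/v1",  # hypothetical endpoints
        "https://api.example-b.test/v1",
    ]
    # asyncio.gather issues all requests concurrently instead of serially,
    # so total latency approaches that of the slowest endpoint.
    results = await asyncio.gather(
        *(query_endpoint(url, "Summarize async IO.") for url in endpoints)
    )
    for result in results:
        print(result)

asyncio.run(main())
```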
🎯 A free LLM evaluation toolkit for assessing factual accuracy, contextual understanding, tone, and more, so you can measure how well your LLM applications perform.
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
A benchmark comparing Russian-language ChatGPT analogues: Saiga, YandexGPT, and Gigachat.
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
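A multi-aspect, GPT-based judge typically prompts a model with a rubric and parses structured scores back. The sketch below assumes the OpenAI Python SDK (v1+); the aspects, prompt wording, and choice of gpt-4o-mini are illustrative assumptions, not this repository's actual implementation.

```python
# Hedged sketch of multi-aspect LLM-as-judge scoring; rubric and prompt
# are illustrative, not this repository's implementation.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASPECTS = ["factual accuracy", "coherence", "helpfulness"]  # assumed rubric

def judge(question: str, answer: str) -> dict:
    prompt = (
        "Rate the answer on each aspect from 1 (poor) to 5 (excellent). "
        f"Aspects: {', '.join(ASPECTS)}. Respond as a JSON object mapping "
        "each aspect to an integer score and a one-sentence rationale.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

print(judge("What is the capital of France?", "Paris is the capital."))
```

Asking for a rationale alongside each score is what makes the assessment interpretable: a failing score comes with the judge's stated reason.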
Generate ideal question-answers for testing RAG
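One common recipe for this is to chunk the corpus and have a model write a question answerable only from each chunk, keeping the chunk as the ground-truth context. A hedged sketch follows; the chunking strategy and the generate_question stand-in (which a real pipeline would replace with an LLM call) are assumptions.

```python
# Hedged sketch of synthetic QA generation for RAG testing.

def chunk(text: str, size: int = 200) -> list[str]:
    """Naive fixed-size chunking; real pipelines split on structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def generate_question(passage: str) -> str:
    """Stand-in: a real implementation would prompt an LLM, e.g.
    'Write one question answerable solely from this passage.'"""
    return f"What does the following passage state? ({passage[:40]}...)"

def build_eval_set(corpus: str) -> list[dict]:
    # Each item pairs a question with the chunk that grounds its answer,
    # so retrieval can be scored by whether it surfaces that chunk.
    return [
        {"question": generate_question(c), "ground_truth_context": c}
        for c in chunk(corpus)
    ]

corpus = "RAG systems retrieve supporting passages before generation. " * 10
for qa in build_eval_set(corpus)[:2]:
    print(qa["question"])
```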