Skip to content

Latest commit

 

History

History
28 lines (21 loc) · 2.7 KB

CHANGELOG.md

File metadata and controls

28 lines (21 loc) · 2.7 KB

Change Log

v1.0 (2024-04-26)

🚀 Release Highlight

💅 Features

  • Functionality - Supported Evaluations: InspectorRAGet supports any human and automatic evaluations in a unified interface, from numerical metrics such as F1 and Rouge-L, to LLM-as-a-judge metrics such as faithfulness or answer relevance, to any human evaluation metrics, whether numerical or categorical.
  • Integration - Experiment Runner: Python notebook demonstrating how to run experiments using Hugging Face assets (datasets, models, and metric evaluators) and output the results in the JSON format expected by InspectorRAGet
  • Functionality - Input Validator: Validate input file on load and return informative error messages if any data is missing or incorrectly formatted
  • Views - Data Characteristics: New view! Displays several informative visualizations of the input data
  • Views - Predictions: New view! Sortable list of the input text and all corresponding model responses, filterable on all available data enrichments
  • Views - Performance Overview: Aggregate values show mean, std and agreement levels, for all human and automatic metrics
    • Functionality - Aggregate: Aggregator function can be specified for each metric independently
  • Views - Model Behavior: Filter evaluations on any available enrichments and analyze per-metric performance across all models; display a sortable and filtered table of all tasks, with access to ever tasks Detail view
  • Views - Task Detail: Examine all detailed task information including the context(s), input text, all model outputs, and all evaluation metric values
    • Functionality - Sympathetic Highlighting: In the Task Detail view, click to highlight the common text between the response and the context(s)
    • Functionality - Annotate: In the Task Detail view, flag a task instance, or add comments to individual components, which will persist for the rest of the session
    • Functionality - Copy to Clipboard: In the Task Detail view, instantly copy all individual task detail information in text, JSON or LaTeX format
  • Views - Model Comparator: Calculate statistical similarity between the distributions of two models on a given metric; display a table of all tasks with different model aggregate scores, with access to every task's Detail view
  • Views - Metric Behavior: Show correlations of any selected metrics, optionally for a subset of all models, to provide insights on metric definitions and relationships
  • Functionality - Export: Export the evaluation file, including all new annotations of comments and flag, to save your enriched file for future analysis
  • Functionality - Dark Mode: Switch the entire UI to dark mode!