How to Get Realtime Insights and Analytics from your Evaluation

This tutorial walks you through getting real-time insights and analytics from your human evaluation.

Glossary

  • SLaM: A framework for human evaluation of language models across different tasks. It is designed to be flexible and easy to use, and it is built using jaclang.
  • Human Evaluation: The process of evaluating language models by asking humans to pick the best output from a given set of outputs. The identity of the model behind each output is hidden from the evaluators. The results show how well each model performs and allow different models to be compared on a given task.
  • Task: The task is the specific problem that the language model is trying to solve. For example, the task could be to generate a summary of a given text, or to generate a response to a given prompt.
  • Language Model: A model trained on a large corpus of text to generate text similar to the text in that corpus.
  • Win: The output that the human evaluator selects as the better of the 2 outputs for a given question.
  • Loss: The output that the human evaluator does not select as the better of the 2 outputs for a given question.
  • Tie: The human evaluator judges the 2 outputs to be nearly the same for a given question.
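
To make the Win, Loss, and Tie definitions concrete, here is a minimal Python sketch of how pairwise judgments could be tallied per model. The record layout `(model_a, model_b, preferred)` and the function names `tally_outcomes` and `win_rate` are assumptions made for illustration; they are not SLaM's own data schema or API.

```python
from collections import Counter, defaultdict


def tally_outcomes(comparisons):
    """Count wins, losses, and ties per model.

    Each comparison is a hypothetical (model_a, model_b, preferred) tuple,
    where preferred is "a", "b", or "tie". This layout is an assumption
    for illustration, not SLaM's own format.
    """
    tallies = defaultdict(Counter)
    for model_a, model_b, preferred in comparisons:
        if preferred == "tie":
            # A tie is recorded for both models.
            tallies[model_a]["tie"] += 1
            tallies[model_b]["tie"] += 1
        elif preferred == "a":
            tallies[model_a]["win"] += 1
            tallies[model_b]["loss"] += 1
        elif preferred == "b":
            tallies[model_b]["win"] += 1
            tallies[model_a]["loss"] += 1
        else:
            raise ValueError(f"unexpected preference: {preferred!r}")
    return {model: dict(counts) for model, counts in tallies.items()}


def win_rate(counts):
    """Fraction of a model's comparisons that ended in a win."""
    wins = counts.get("win", 0)
    total = wins + counts.get("loss", 0) + counts.get("tie", 0)
    return wins / total if total else 0.0


# Three hypothetical judgments between two anonymized models.
comparisons = [
    ("model_x", "model_y", "a"),
    ("model_x", "model_y", "tie"),
    ("model_y", "model_x", "a"),
]
tallies = tally_outcomes(comparisons)
```

Counting ties in the denominator is one possible convention; excluding them would give a win rate over decisive comparisons only.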

Prerequisites

Follow the steps in the README to install SLaM and its dependencies. Also make sure that your human evaluation is set up and is either still running or has already ended.

Insights and Analytics