[feature] add cli log10 completions benchmark_models to compare a logged completion with other models (#141)


* create log10 completions report to compare models from ids or tags

* generate report and also do autoprompt analysis with --analyze_prompt

* minor: enable mistral models

* add and check supported models for gpt, claude-3, and mistral
log the completions

* updates:
- cli docs
- skip failed completions
- remove double log10 patch
- add pandas in deps
- fix autofeedback import issue

* format

* allow to benchmark the same model

* add an example to test that multiple log10 loads only log once

* fix splitting of models and completion ids with a trailing comma or multiple commas

* apply same for tags

* minor: skip id if its kind is not chat

* fix tags if not found, don't return any completions

* minor - add a warning if log10 load patched module

* minor clean up example

* move log10_load_twice.py to tests
wenzhe-log10 authored Apr 17, 2024
1 parent d61ca40 commit ac2e177
Showing 10 changed files with 539 additions and 10 deletions.
93 changes: 90 additions & 3 deletions cli_docs.md
@@ -82,6 +82,64 @@ output (only showing part of the full raw output):
}
```

You can load a logged completion's prompt messages and rerun them against other LLMs with [`log10 completions benchmark_models`](#log10-completions-benchmark_models).
For instance,
```bash
log10 completions benchmark_models --ids 25572f3c-c2f1-45b0-9de8-d96be4c4e544 --models=gpt-3.5-turbo,mistral-small-latest,claude-3-haiku-20240307
```
output
```
Running gpt-3.5-turbo
Running mistral-small-latest
Running claude-3-haiku-20240307
completion_id: 25572f3c-c2f1-45b0-9de8-d96be4c4e544
original_request:
{
  "model": "gpt-4-0125-preview",
  "messages": [
    {
      "role": "system",
      "content": "Summarize the article in 30 words."
    },
    {
      "role": "user",
      "content": "\"Story of Your Life\" is a science fiction novella by American writer Ted Chiang, first published in Starlight 2 in 1998, and in 2002 in Chiang's collection of short stories, Stories of Your Life and Others. Its major themes are language and determinism. \"Story of Your Life\" won the 2000 Nebula Award for Best Novella, as well as the 1999 Theodore Sturgeon Award. It was nominated for the 1999 Hugo Award for Best Novella. The novella has been translated into Italian, Japanese, French and German.[1] A film adaptation of the story, Arrival, was conceived and adapted by Eric Heisserer. Titled and directed by Denis Villeneuve, it was released in 2016. It stars Amy Adams, Jeremy Renner, and Forest Whitaker and was nominated for eight Academy Awards, including Best Picture and Best Adapted Screenplay; it won the award for Best Sound Editing.[2][3][4] The film also won the 2017 Ray Bradbury Award for Outstanding Dramatic Presentation and the Hugo Award for Best Dramatic Presentation."
    }
  ],
  "temperature": 0.2
}
╭─────────────────────────┬───────────────────────────────────────────────────────┬──────────────────────────────────┬───────────────╮
│ Model                   │ Content                                               │ Total Token Usage (Input/Output) │ Duration (ms) │
├─────────────────────────┼───────────────────────────────────────────────────────┼──────────────────────────────────┼───────────────┤
│ gpt-4-0125-preview      │ "Story of Your Life" by Ted Chiang explores language  │ 323 (255/68)                     │ 2527          │
│                         │ and determinism, winning the 2000 Nebula and 1999     │                                  │               │
│                         │ Theodore Sturgeon Awards. Adapted into the film       │                                  │               │
│                         │ "Arrival" by Denis Villeneuve in 2016, it received    │                                  │               │
│                         │ critical acclaim and multiple awards, including an    │                                  │               │
│                         │ Academy Award for Best Sound Editing.                 │                                  │               │
├─────────────────────────┼───────────────────────────────────────────────────────┼──────────────────────────────────┼───────────────┤
│ gpt-3.5-turbo           │ "Story of Your Life" is a science fiction novella by  │ 295 (255/40)                     │ 2345          │
│                         │ Ted Chiang, exploring themes of language and          │                                  │               │
│                         │ determinism. It won awards and was adapted into the   │                                  │               │
│                         │ film Arrival in 2016.                                 │                                  │               │
├─────────────────────────┼───────────────────────────────────────────────────────┼──────────────────────────────────┼───────────────┤
│ mistral-small-latest    │ "Story of Your Life" is a Ted Chiang novella          │ 342 (282/60)                     │ 2087          │
│                         │ exploring language and determinism, winning Nebula    │                                  │               │
│                         │ and Sturgeon Awards. It was adapted into the 2016     │                                  │               │
│                         │ film "Arrival," which received multiple Academy Award │                                  │               │
│                         │ nominations and won for Best Sound Editing.           │                                  │               │
├─────────────────────────┼───────────────────────────────────────────────────────┼──────────────────────────────────┼───────────────┤
│ claude-3-haiku-20240307 │ "Story of Your Life" is a science fiction novella by  │ 320 (274/46)                     │ 1944          │
│                         │ Ted Chiang, exploring themes of language and          │                                  │               │
│                         │ determinism, winning multiple awards and inspiring a  │                                  │               │
│                         │ film adaptation, Arrival, which was critically        │                                  │               │
│                         │ acclaimed.                                            │                                  │               │
╰─────────────────────────┴───────────────────────────────────────────────────────┴──────────────────────────────────┴───────────────╯
```

You can also filter completions by tags with `--tags`, write the report to a markdown file with `--file` (or `-f`), and run our prompt analyzer (auto-prompt) with `--analyze_prompt`.
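
To illustrate what a markdown report could contain, here is a minimal sketch that renders benchmark results as a markdown table with the same columns as the terminal output above. The `Result` class, its field names, and `to_markdown` are hypothetical names chosen for this example; they are assumptions and not taken from the log10 source, whose actual report layout may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record; fields mirror the columns of the CLI table.
    model: str
    content: str
    input_tokens: int
    output_tokens: int
    duration_ms: int


def to_markdown(results: list[Result]) -> str:
    """Render results as a markdown table (assumed layout, for illustration)."""
    lines = [
        "| Model | Content | Total Token Usage (Input/Output) | Duration (ms) |",
        "| --- | --- | --- | --- |",
    ]
    for r in results:
        # Total usage is the sum of input and output tokens.
        total = r.input_tokens + r.output_tokens
        usage = f"{total} ({r.input_tokens}/{r.output_tokens})"
        # Escape pipes and flatten newlines so the text stays in one cell.
        content = r.content.replace("|", "\\|").replace("\n", " ")
        lines.append(f"| {r.model} | {content} | {usage} | {r.duration_ms} |")
    return "\n".join(lines) + "\n"


report = to_markdown([Result("gpt-3.5-turbo", "A short summary.", 255, 40, 2345)])
print(report)
```

The usage cell follows the `total (input/output)` convention shown in the table, so 255 input and 40 output tokens render as `295 (255/40)`.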


### Feedback Tasks and Feedback
To start adding feedback, first define a feedback task with [`log10 feedback-task create`](#log10-feedback-task-create). Then you can add feedback to a logged completion with [`log10 feedback create`](#log10-feedback-create). For more details, see [log10's user documentation](https://log10.io/docs/feedback/feedback#add-feedback).

@@ -132,10 +190,39 @@ Usage: log10 completions [OPTIONS] COMMAND [ARGS]...

Manage logs from completions i.e. logs from users

Options:
--help Show this message and exit.

Commands:
-  download  Download completions to a jsonl file
-  get       Get a completion by id
-  list      List completions
+  benchmark_models  Compare completions using different models and...
+  download          Download completions to a jsonl file
+  get               Get a completion by id
+  list              List completions
```

#### log10 completions benchmark_models
```bash
log10 completions benchmark_models --help
Usage: log10 completions benchmark_models [OPTIONS]

  Compare completions using different models and generate report

Options:
  --ids TEXT            Completion IDs. Separate multiple ids with commas.
  --tags TEXT           Filter completions by specific tags. Separate multiple
                        tags with commas.
  --limit TEXT          Specify the maximum number of completions to retrieve
                        filtered by tags.
  --offset TEXT         Set the starting point (offset) from where to begin
                        fetching completions filtered by tags.
  --models TEXT         Comma separated list of models to compare
  --temperature FLOAT   Temperature
  --max_tokens INTEGER  Max tokens
  --top_p FLOAT         Top p
  --analyze_prompt      Run prompt analyzer on the messages.
  -f, --file TEXT       Specify the filename for the report in markdown
                        format.
  --help                Show this message and exit.
```
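
The commit message notes that comma-separated values for `--ids`, `--models`, and `--tags` should tolerate a trailing comma or repeated commas. A minimal sketch of that kind of splitting follows; `split_csv` is a hypothetical helper written for this example, and the actual log10 implementation may handle it differently.

```python
from __future__ import annotations


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option value, dropping empty entries.

    Hypothetical helper for illustration; not taken from the log10 source.
    """
    # Strip whitespace around each item and skip empty items produced by
    # a trailing comma or by consecutive commas.
    return [item.strip() for item in value.split(",") if item.strip()]


models = split_csv("gpt-3.5-turbo,,mistral-small-latest,")
print(models)
```

Filtering out empty strings means an input such as `a,,b,` yields two entries instead of four, so no request is issued for an empty model name or id.
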
#### log10 completions download
3 changes: 2 additions & 1 deletion log10/__main__.py
@@ -1,6 +1,6 @@
import click

-from log10.completions.completions import download_completions, get_completion, list_completions
+from log10.completions.completions import benchmark_models, download_completions, get_completion, list_completions
from log10.feedback.autofeedback import auto_feedback_icl
from log10.feedback.feedback import create_feedback, download_feedback, get_feedback, list_feedback
from log10.feedback.feedback_task import create_feedback_task, get_feedback_task, list_feedback_task
@@ -39,6 +39,7 @@ def feedback_task():
completions.add_command(list_completions, "list")
completions.add_command(get_completion, "get")
completions.add_command(download_completions, "download")
+completions.add_command(benchmark_models, "benchmark_models")

cli.add_command(feedback)
feedback.add_command(create_feedback, "create")