Actions: openai/evals

Showing runs from all workflows
511 workflow runs

Add skill acquisition eval
Run new evals #2232: Pull request #1497 opened by inwaves
March 19, 2024 08:25 2m 21s inwaves:andrei/updates-20240319
Add skill acquisition eval
Run unit tests #1664: Pull request #1497 opened by inwaves
March 19, 2024 08:25 2m 26s inwaves:andrei/updates-20240319
Error Recovery Eval
Run new evals #2231: Pull request #1485 synchronize by ojaffe
March 19, 2024 08:15 6m 35s ojaffe:ollie/error_recovery
Error Recovery Eval
Run unit tests #1663: Pull request #1485 synchronize by ojaffe
March 19, 2024 08:15 2m 19s ojaffe:ollie/error_recovery
Add Human-Relative MLAgentBench
Run new evals #2230: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:31 3m 43s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1662: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:31 9m 4s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2229: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:13 3m 55s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1661: Pull request #1496 synchronize by danesherbs
March 19, 2024 07:13 3m 43s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2228: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:32 6m 24s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1660: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:32 3m 41s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2227: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:25 3m 29s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1659: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:25 4m 44s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1658: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:02 3m 30s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2226: Pull request #1496 synchronize by danesherbs
March 19, 2024 06:02 3m 26s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run unit tests #1657: Pull request #1496 opened by danesherbs
March 19, 2024 05:57 2m 24s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2225: Pull request #1496 opened by danesherbs
March 19, 2024 05:57 2m 6s danesherbs:dane/add-mlab-v2
Can't Do That Anymore Eval (#1487)
Run unit tests #1656: Commit f72afb9 pushed by JunShern
March 19, 2024 04:04 2m 38s main
Bugged Tools Eval (#1486)
Run unit tests #1655: Commit ad377e4 pushed by JunShern
March 19, 2024 04:00 2m 35s main
Add Function Deduction eval
Run unit tests #1652: Pull request #1492 opened by james-aung
March 15, 2024 18:25 2m 19s james-aung:function-deduction
Add Function Deduction eval
Run new evals #2223: Pull request #1492 opened by james-aung
March 15, 2024 18:25 2m 17s james-aung:function-deduction
Add In-Context RL eval
Run unit tests #1651: Pull request #1491 opened by james-aung
March 15, 2024 18:24 2m 5s james-aung:incontext-rl
Add In-Context RL eval
Run new evals #2222: Pull request #1491 opened by james-aung
March 15, 2024 18:24 2m 5s james-aung:incontext-rl
Already Said That Eval
Run unit tests #1650: Pull request #1490 synchronize by thesofakillers
March 15, 2024 14:22 2m 23s thesofakillers:ast
Already Said That Eval
Run new evals #2221: Pull request #1490 synchronize by thesofakillers
March 15, 2024 14:22 2m 29s thesofakillers:ast
Track the Stat Eval
Run unit tests #1649: Pull request #1489 opened by thesofakillers
March 15, 2024 14:06 2m 51s thesofakillers:tts