Actions: openai/evals

Run new evals

187 workflow runs

Remove global OpenAI client initialization
Run new evals #2276: Pull request #1539 opened by michaelAlvarino
July 21, 2024 17:04 2m 13s michaelAlvarino:main
Added Quran Eval & Simple Fact Model-Graded Definition
Run new evals #2271: Pull request #1511 synchronize by sakher
June 20, 2024 14:13 2m 22s sakher:quran-eval
Fix problematic sample in Schelling Point
Run new evals #2270: Pull request #1534 opened by JunShern
May 22, 2024 23:04 4m 38s jun/schellingpoint-fix
eval pattern-concat-logic
Run new evals #2258: Pull request #1508 synchronize by natanaelwf
May 9, 2024 13:18 2m 25s natanaelwf:pattern-concat-logic
eval pattern-concat-logic
Run new evals #2252: Pull request #1508 opened by natanaelwf
March 28, 2024 13:44 3m 34s natanaelwf:pattern-concat-logic
Updates on existing solvers and bugged tool eval
Run new evals #2251: Pull request #1506 synchronize by ojaffe
March 27, 2024 16:44 3m 37s ojaffe:ollie/updates_270324
Updates on existing solvers and bugged tool eval
Run new evals #2250: Pull request #1506 opened by ojaffe
March 27, 2024 16:37 3m 41s ojaffe:ollie/updates_270324
Add Gemini Solver
Run new evals #2249: Pull request #1503 synchronize by ojaffe
March 26, 2024 11:39 3m 55s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2248: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 3m 36s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2247: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 46s ojaffe:ollie/add_gemini_solver
TogetherSolver
Run new evals #2246: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 7m 10s thesofakillers:together_solver
Add Human-Relative MLAgentBench
Run new evals #2245: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 45s danesherbs:dane/add-mlab-v2
Add Multi-Step Web Tasks
Run new evals #2244: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:40 3m 50s danesherbs:dane/add-multi-step-web-tasks
Add Multi-Step Web Tasks
Run new evals #2243: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:21 3m 44s danesherbs:dane/add-multi-step-web-tasks
Add In-Context RL eval
Run new evals #2242: Pull request #1491 synchronize by james-aung
March 19, 2024 14:27 2m 21s james-aung:incontext-rl
Add Function Deduction eval
Run new evals #2241: Pull request #1492 synchronize by james-aung
March 19, 2024 14:09 2m 7s james-aung:function-deduction
Add 20 questions eval
Run new evals #2240: Pull request #1499 opened by inwaves
March 19, 2024 11:13 2m 14s inwaves:andrei/add-20-questions
AnthropicSolver
Run new evals #2239: Pull request #1498 opened by thesofakillers
March 19, 2024 10:26 2m 9s thesofakillers:anthropic-solver
Identifying Variables Eval
Run new evals #2238: Pull request #1488 synchronize by thesofakillers
March 19, 2024 09:58 2m 23s thesofakillers:idvars
Track the Stat Eval
Run new evals #2237: Pull request #1489 synchronize by thesofakillers
March 19, 2024 09:38 2m 17s thesofakillers:tts