Actions: openai/evals

Showing runs from all workflows
511 workflow runs

Add Gemini Solver
Run new evals #2249: Pull request #1503 synchronize by ojaffe
March 26, 2024 11:39 3m 55s ojaffe:ollie/add_gemini_solver
Unified create_retrying for all solvers
Run unit tests #1697: Pull request #1501 synchronize by ojaffe
March 26, 2024 11:35 10m 15s ojaffe:ollie/unify_retrying
Add info about logging and link to logviz (#1480)
Run unit tests #1696: Commit ac44aae pushed by etr2460
March 25, 2024 15:53 9m 36s main
Log model and usage stats in record.sampling (#1449)
Run unit tests #1695: Commit 9b2e1b1 pushed by etr2460
March 25, 2024 15:52 9m 58s main
Address sporadic hanging of evals on certain samples (#1482)
Run unit tests #1694: Commit bfe3925 pushed by etr2460
March 25, 2024 15:51 9m 5s main
TogetherSolver (#1502)
Run unit tests #1693: Commit 5805c20 pushed by JunShern
March 22, 2024 09:50 8m 40s main
Add Gemini Solver
Run unit tests #1692: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 14m 41s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2248: Pull request #1503 synchronize by ojaffe
March 21, 2024 10:36 3m 36s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run new evals #2247: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 46s ojaffe:ollie/add_gemini_solver
Add Gemini Solver
Run unit tests #1691: Pull request #1503 opened by ojaffe
March 21, 2024 10:32 3m 55s ojaffe:ollie/add_gemini_solver
TogetherSolver
Run unit tests #1690: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 3m 42s thesofakillers:together_solver
TogetherSolver
Run new evals #2246: Pull request #1502 opened by thesofakillers
March 21, 2024 10:25 7m 10s thesofakillers:together_solver
Unified create_retrying for all solvers
Run unit tests #1689: Pull request #1501 opened by ojaffe
March 21, 2024 08:49 3m 49s ojaffe:ollie/unify_retrying
AnthropicSolver (#1498)
Run unit tests #1688: Commit e30e141 pushed by JunShern
March 21, 2024 04:15 3m 48s main
Add Human-Relative MLAgentBench (#1496)
Run unit tests #1687: Commit 4f97ce6 pushed by JunShern
March 21, 2024 03:47 4m 56s main
Add Human-Relative MLAgentBench
Run unit tests #1686: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 37s danesherbs:dane/add-mlab-v2
Add Human-Relative MLAgentBench
Run new evals #2245: Pull request #1496 synchronize by danesherbs
March 21, 2024 03:36 3m 45s danesherbs:dane/add-mlab-v2
Add Multi-Step Web Tasks (#1500)
Run unit tests #1685: Commit 5b84993 pushed by JunShern
March 21, 2024 03:35 2m 27s main
Add Multi-Step Web Tasks
Run unit tests #1684: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:40 2m 21s danesherbs:dane/add-multi-step-web-tasks
Add Multi-Step Web Tasks
Run new evals #2244: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:40 3m 50s danesherbs:dane/add-multi-step-web-tasks
Add Multi-Step Web Tasks
Run new evals #2243: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:21 3m 44s danesherbs:dane/add-multi-step-web-tasks
Add Multi-Step Web Tasks
Run unit tests #1683: Pull request #1500 synchronize by danesherbs
March 21, 2024 02:21 2m 18s danesherbs:dane/add-multi-step-web-tasks
Add In-Context RL eval (#1491)
Run unit tests #1682: Commit ff994b5 pushed by JunShern
March 19, 2024 14:59 5m 56s main
Add In-Context RL eval
Run new evals #2242: Pull request #1491 synchronize by james-aung
March 19, 2024 14:27 2m 21s james-aung:incontext-rl
Add In-Context RL eval
Run unit tests #1681: Pull request #1491 synchronize by james-aung
March 19, 2024 14:27 2m 10s james-aung:incontext-rl