Docs · Website · Twitter · Discord · Quickstart · Online Playground
This repo contains benchmarks for tscircuit system prompts used for automatically generating tscircuit code.
Run `bun run benchmark` to select and run a benchmark. A single prompt takes
about 10-15 seconds to run with `sonnet`. The benchmarks run against a set of
samples (see the `tests/samples` directory). When you change a prompt, you must
run the benchmark for that prompt to update the benchmark snapshot. This is how
we record degradation or improvement in response quality. Each sample is run
5 times, and two tests are run:
1. Does the output from the prompt compile?
2. Does the output produce the expected circuit?

The benchmark shows the percentage of samples that pass (1) and (2).
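
As a rough illustration of how the two pass percentages could be derived, here is a minimal sketch. It is not the repo's actual scoring code: the `RunResult` shape and the `scoreRuns` function are hypothetical names, and it assumes that every run counts equally and that a run only passes the second test if it also compiles.

```typescript
// Hypothetical shape of a single benchmark run. These names are illustrative
// and are not taken from the repo.
interface RunResult {
  sample: string
  compiles: boolean // test 1: does the output compile?
  matchesExpected: boolean // test 2: does it produce the expected circuit?
}

interface BenchmarkScore {
  compilePercent: number
  expectedCircuitPercent: number
}

// Assumption: each run counts equally toward the percentages.
function scoreRuns(runs: RunResult[]): BenchmarkScore {
  if (runs.length === 0) {
    return { compilePercent: 0, expectedCircuitPercent: 0 }
  }
  const percent = (count: number) => (100 * count) / runs.length
  return {
    compilePercent: percent(runs.filter((r) => r.compiles).length),
    // Assumption: a run that fails to compile cannot pass the circuit test.
    expectedCircuitPercent: percent(
      runs.filter((r) => r.compiles && r.matchesExpected).length,
    ),
  }
}

// One made-up sample run 5 times: 4 runs compile, 3 also match the expected circuit.
const exampleRuns: RunResult[] = [
  { sample: "example-sample", compiles: true, matchesExpected: true },
  { sample: "example-sample", compiles: true, matchesExpected: true },
  { sample: "example-sample", compiles: true, matchesExpected: true },
  { sample: "example-sample", compiles: true, matchesExpected: false },
  { sample: "example-sample", compiles: false, matchesExpected: false },
]

const exampleScore = scoreRuns(exampleRuns)
console.log(exampleScore)
```

Comparing these two numbers against the values stored in the previous snapshot is one way to see whether a prompt change improved or degraded response quality.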