Prompt Benchmarks

Docs · Website · Twitter · Discord · Quickstart · Online Playground

This repo contains benchmarks for tscircuit system prompts used for automatically generating tscircuit code.

Running Benchmarks

Use bun run benchmark to select and run a benchmark. A single prompt takes about 10-15 seconds to run with Sonnet. The benchmarks run against a set of samples (see the tests/samples directory). When you change a prompt, you must rerun the benchmark for that prompt to update the benchmark snapshot; this is how we record degradation or improvement in response quality. Each sample is run 5 times, and two tests are applied to each output:

1. Does the output from the prompt compile?
2. Does the output produce the expected circuit?

The benchmark shows the percentage of samples that pass (1) and (2).
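
As a rough illustration of how such percentages could be computed, the sketch below aggregates per-run results into two pass rates. It is not taken from this repo: the RunResult shape and the passPercentage and summarize helpers are hypothetical names, and it assumes every run of every sample counts equally and that a run can only pass test (2) if it also passes test (1).

```typescript
// Hypothetical shape for one run of one sample (not the repo's actual type).
interface RunResult {
  compiles: boolean // test (1): did the generated code compile?
  matchesExpectedCircuit: boolean // test (2): did it produce the expected circuit?
}

// Percentage (0-100) of runs for which `check` returns true.
function passPercentage(
  runs: RunResult[],
  check: (run: RunResult) => boolean,
): number {
  if (runs.length === 0) return 0
  const passed = runs.filter(check).length
  return (passed * 100) / runs.length
}

// Flattens the runs of all samples and reports one percentage per test.
function summarize(runsBySample: { [sampleName: string]: RunResult[] }) {
  let allRuns: RunResult[] = []
  for (const sampleName of Object.keys(runsBySample)) {
    allRuns = allRuns.concat(runsBySample[sampleName])
  }
  return {
    compilePct: passPercentage(allRuns, (r) => r.compiles),
    // Assumption: a run that fails to compile cannot match the expected circuit.
    expectedCircuitPct: passPercentage(
      allRuns,
      (r) => r.compiles && r.matchesExpectedCircuit,
    ),
  }
}
```

Comparing these two numbers before and after a prompt change is one way to read a snapshot diff: a drop in either percentage suggests the change degraded response quality.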

