chore(deps): bump crate-ci/typos from 1.25.0 to 1.26.0 (#978)
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/typos/compare/v1.25.0...v1.26.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot[bot] authored Oct 14, 2024
1 parent 1b0d6f8 commit 1e783df
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion .github/workflows/QualityCheck.yml
@@ -16,4 +16,4 @@ jobs:
 - name: Checkout Actions Repository
   uses: actions/checkout@v4
 - name: Check spelling
-  uses: crate-ci/typos@v1.25.0
+  uses: crate-ci/typos@v1.26.0
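
The Dependabot metadata in the commit message labels this update `version-update:semver-minor`. As a rough illustration of what that label means for the bump from 1.25.0 to 1.26.0, the sketch below compares two version strings component by component. The helper name `bump_type` is hypothetical, and the code assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffix; it is not how Dependabot itself classifies updates.

```python
def bump_type(old, new):
    """Name the first version component that differs between two versions.

    Assumes plain "MAJOR.MINOR.PATCH" strings (an assumption for this sketch).
    """
    old_parts = [int(part) for part in old.split(".")]
    new_parts = [int(part) for part in new.split(".")]
    for label, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return label
    return "none"


print(bump_type("1.25.0", "1.26.0"))
```

For the versions in this commit only the middle component changes, which matches the `semver-minor` label.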

1 comment on commit 1e783df

@github-actions

Lux Benchmarks

Benchmark suite Current: 1e783df Previous: 1b0d6f8 Ratio (Current / Previous)
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 410479.5 ns 412250 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 322979 ns 244083 ns 1.32
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 243583 ns 322041 ns 0.76
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 740125 ns 739625 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43310 ns 43576 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1312625 ns 1368688 ns 0.96
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2418334 ns 1198625 ns 2.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 16373020.5 ns 13918417 ns 1.18
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 958000 ns 929312.5 ns 1.03
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 190740 ns 190464 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1378500 ns 1348750 ns 1.02
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 2610979.5 ns 1282083 ns 2.04
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 16066041 ns 13837312.5 ns 1.16
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 967958 ns 987250 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1773750 ns 1655917 ns 1.07
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1093875 ns 1089000 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1520104 ns 1532499.5 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2458417 ns 2439708 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209499 ns 211500 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12121583 ns 12136437.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8834833 ns 8847479 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9223542 ns 9240938 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 17972771 ns 17956208 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1903079 ns 1905747 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17300562 ns 17305250 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13987625 ns 13985416 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14513146 ns 14505584 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21072834 ns 21107833 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250439208 ns 249894083 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148115625 ns 148856208 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 117228750 ns 115718875 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 104041542 ns 101619125 ns 1.02
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5463821 ns 5485492 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1224682250 ns 1228009625 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 933837625 ns 931338167 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 835803479 ns 829169479 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 628560812 ns 628483479 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 35032007 ns 38151835 ns 0.92
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1141719792 ns 1134889125 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 983678666.5 ns 992066062.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1377974646 ns 1309459854 ns 1.05
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 746244021 ns 745440771 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1114917 ns 1092042 ns 1.02
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1628542 ns 1645709 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 4086771 ns 3466333 ns 1.18
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 959792 ns 957250 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 272035 ns 270549.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2981354.5 ns 2979042 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4115937.5 ns 4110542 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 9608958 ns 10529229 ns 0.91
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3297500.5 ns 3308833 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1076584 ns 1070477 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2355125 ns 2350792 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1453000 ns 1364187.5 ns 1.07
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1602646 ns 1709000 ns 0.94
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3770125 ns 3666666.5 ns 1.03
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 215196 ns 210396 ns 1.02
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 20246500 ns 20275459 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16965833.5 ns 16981437 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 18330417 ns 18162375 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 26150209 ns 26198500 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1980657 ns 1979369 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 44324250 ns 46206895.5 ns 0.96
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 41015042 ns 41017187.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 41295750 ns 41176208.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 47634416 ns 47588917 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4656667 ns 4669000 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2867250 ns 2603916 ns 1.10
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2754917 ns 2999833 ns 0.92
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 7179750 ns 7252188 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 515735.5 ns 517525.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 40447166.5 ns 40878729.5 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 33885499.5 ns 33994250 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 34257187.5 ns 33958333 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51082812.5 ns 51263292 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3174195 ns 3013320.5 ns 1.05
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 109744583 ns 113392541.5 ns 0.97
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 135227938 ns 136850541 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 270381750 ns 250011854.5 ns 1.08
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 95391167 ns 95314208 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 270563333 ns 270234083 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 161054417 ns 157676542 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 125340042 ns 128100708 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 146582812.5 ns 144520145.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7052057 ns 7091283 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1502349770.5 ns 1503173291.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1201703584 ns 1201978125 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1090436625 ns 1103595666.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1030635583 ns 1028790125.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 33863530 ns 33654931 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 2004525437 ns 2089411062.5 ns 0.96
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1793970792 ns 1851532083 ns 0.97
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 2094682166.5 ns 2117297604.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1594796917 ns 1605439208 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 1816417 ns 2066438 ns 0.88
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2535417 ns 3005354 ns 0.84
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 9580729.5 ns 7102958.5 ns 1.35
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2124083 ns 2151875 ns 0.99
lenet(28, 28, 1, 128)/forward/GPU/CUDA 265598 ns 270072.5 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9396125 ns 9657334 ns 0.97
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 11490250 ns 11945459 ns 0.96
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 25636708 ns 23020875 ns 1.11
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 10456812.5 ns 10467750 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1095109 ns 1095059 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 381007729.5 ns 381251625 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 283558854 ns 309062375 ns 0.92
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 264714708 ns 241236375 ns 1.10
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 179954521 ns 180294333.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4874412 ns 4847355 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1154043958 ns 1146004375 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 991918083 ns 966522375 ns 1.03
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 1078324541 ns 1026283833 ns 1.05
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 668069084 ns 662156542 ns 1.01
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 16315510 ns 17798543 ns 0.92
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1054520.5 ns 1050458 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1957562.5 ns 1656750 ns 1.18
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 6624334 ns 6491250 ns 1.02
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1352146 ns 1312792 ns 1.03
lenet(28, 28, 1, 64)/forward/GPU/CUDA 267010 ns 270319.5 ns 0.99
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6499937.5 ns 6504813 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13781958 ns 13132417 ns 1.05
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 20923250 ns 19754250 ns 1.06
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 5707062.5 ns 5741521 ns 0.99
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1115597.5 ns 1124270 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70442792 ns 70469479 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43467103.5 ns 43706291.5 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39734999.5 ns 39518625 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 35200125 ns 35367542 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1845136 ns 1851430 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 356138708 ns 356004604 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270050583 ns 270290792 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254207104 ns 254164750 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 271696541.5 ns 271950333.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 16499812 ns 16539357 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 395249958 ns 395899500 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 396501625 ns 372060292 ns 1.07
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 738492916.5 ns 713782625 ns 1.03
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 447067000 ns 447779125 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1189294541 ns 1190490459 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 689030520.5 ns 832670062.5 ns 0.83
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 650962625 ns 629944291 ns 1.03
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 681961562 ns 681507396 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12470086 ns 12475051 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3681028375 ns 3708044854 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2822971000 ns 2828581542 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2698825750 ns 2698925958 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 2121646854.5 ns 2137669604.5 ns 0.99
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49909051 ns 49415932 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3408458 ns 3423125 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2063208 ns 2078500 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2518458 ns 2518458 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 4888750 ns 4870375 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 580004.5 ns 586699.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25958666 ns 25989500 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18964292 ns 19069958.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19447166.5 ns 19259312 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 36745416.5 ns 36800833 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3191777 ns 2993892 ns 1.07
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 55195125 ns 54216125 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 81683979.5 ns 83642959 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 174851250 ns 174413208.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 42883916.5 ns 42857708.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1788312.5 ns 1784458 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1100250 ns 1095646 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1558396 ns 1575292 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2464688 ns 2364687 ns 1.04
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 215197 ns 216504.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12518625 ns 12531833 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9205333 ns 9200375 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9628104 ns 9626292 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18331625 ns 18391667 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1949026.5 ns 1950268 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17616875 ns 17650333.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14310166 ns 14301166 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14557291.5 ns 14560250.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21449812.5 ns 21506145.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70367541.5 ns 70470354 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43412916.5 ns 43665542 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39742938 ns 39582249.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 35448542 ns 35175625 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1795063 ns 1838843 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 360004208 ns 360077895.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 346542937 ns 349062958.5 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 307664333.5 ns 305213917 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 463480458 ns 462206583 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13962488.5 ns 13925027 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 418770999.5 ns 417720542 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 421592709 ns 426193583 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 780166249.5 ns 717833375.5 ns 1.09
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 393782854 ns 394045333.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1880375 ns 1908458 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1570562.5 ns 1382145.5 ns 1.14
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1246416.5 ns 1574208 ns 0.79
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2596208.5 ns 2658583 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 564741 ns 567560 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 9321042 ns 9263291 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 13025292 ns 15741709 ns 0.83
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 33090166 ns 30677874.5 ns 1.08
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 6518396.5 ns 6782125 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1351683.5 ns 1355856 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 22256291 ns 23068125 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 27788229 ns 28298875 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 54815104 ns 49366125 ns 1.11
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 15723000 ns 15664541 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 660437.5 ns 787000 ns 0.84
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 564125.5 ns 613416 ns 0.92
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1067959 ns 1014937.5 ns 1.05
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 68833 ns 67541.5 ns 1.02
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 48015 ns 47213.5 ns 1.02
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1518999.5 ns 1547187.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1050917 ns 1017917 ns 1.03
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1571000 ns 1412645.5 ns 1.11
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 325084 ns 321542 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 216110 ns 211309 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1555895.5 ns 1571042 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1060292 ns 1020042 ns 1.04
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1624541 ns 1402125.5 ns 1.16
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 374750 ns 343812 ns 1.09
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3421708 ns 3408000.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2057375 ns 2049583.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2472729 ns 2491583.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 4540646 ns 4842271 ns 0.94
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 585099 ns 580126 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24053333 ns 24112333.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17186833 ns 17188792 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17114833.5 ns 17119042 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 35115834 ns 34987687 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3096781.5 ns 2894570.5 ns 1.07
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 53599104 ns 52602166 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 80093333 ns 83256812 ns 0.96
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 172009854 ns 173355916.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 42254666 ns 42228833 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 249876333.5 ns 250172041.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148299229 ns 148659167 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116785208 ns 115831270.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 106758125 ns 106484375 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5452339 ns 5471067 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1100542291 ns 1103002500 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 855735416.5 ns 857541375 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831274375 ns 826884708.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 738168166.5 ns 740474770.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 32317772.5 ns 35136266 ns 0.92
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1001895729 ns 1006767188 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 966598875 ns 974529458 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1307543687 ns 1286053500 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 738405458 ns 727101250 ns 1.02
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1230583 ns 1308583 ns 0.94
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 962250 ns 664854.5 ns 1.45
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 796604 ns 906375 ns 0.88
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2036541 ns 2049458 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 567146.5 ns 565223.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5691500 ns 5804687.5 ns 0.98
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6401396 ns 8913625 ns 0.72
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 25408000 ns 24320125 ns 1.04
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 3697229 ns 3694792 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1332396 ns 1307349 ns 1.02
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 9370333 ns 9459208 ns 0.99
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 13058291 ns 15996021 ns 0.82
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 32481708 ns 31660167 ns 1.03
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 4424396 ns 4429208.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 390896 ns 433416.5 ns 0.90
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 458604 ns 466208 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2946292 ns 1932812 ns 1.52
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 54375 ns 54000 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28214 ns 27617 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 360312.5 ns 370958.5 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 439417 ns 459083 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 5063292 ns 4366749.5 ns 1.16
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 190708 ns 193875 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 219423.5 ns 216603.5 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 632709 ns 684292 ns 0.92
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 711770.5 ns 731125 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 5249812.5 ns 4502166 ns 1.17
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 429750 ns 435458 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 335333.5 ns 377416 ns 0.89
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 393604 ns 405042 ns 0.97
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 765792 ns 718500 ns 1.07
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 13458 ns 12834 ns 1.05
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28223 ns 27924.5 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 286125 ns 303979.5 ns 0.94
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 310708 ns 340916.5 ns 0.91
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 733437.5 ns 858875 ns 0.85
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 25916 ns 26333 ns 0.98
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 209427 ns 206665 ns 1.01
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 302000 ns 320916.5 ns 0.94
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 328375 ns 355500 ns 0.92
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 842791.5 ns 900792 ns 0.94
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 28333 ns 28875 ns 0.98
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 602432125 ns 603792041 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 430731937.5 ns 430597750 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 392016750 ns 375897687.5 ns 1.04
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 322757833 ns 321301750 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7676293 ns 7676185 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2003927916.5 ns 2002056937.5 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1623931938 ns 1637403750 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1626427584 ns 1658326812.5 ns 0.98
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 1179210042 ns 1181133416 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 27131071 ns 27018077.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 523645.5 ns 527292 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 450709 ns 402500 ns 1.12
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 2446250 ns 1773874.5 ns 1.38
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 219187.5 ns 217896 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47774.5 ns 47539 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1875042 ns 1972750 ns 0.95
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2602792 ns 1830041 ns 1.42
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 16587416.5 ns 14502542 ns 1.14
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 1501583 ns 1511084 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 226318.5 ns 222835 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 2982667 ns 3104000 ns 0.96
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 5736062.5 ns 5000208 ns 1.15
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 17019146 ns 15174146 ns 1.12
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2470812.5 ns 2515479.5 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1498583 ns 1599584 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1193771 ns 933250 ns 1.28
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1029042 ns 1233959 ns 0.83
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2235875 ns 2349500 ns 0.95
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 572216 ns 564727.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5950125 ns 5989584 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 4653916 ns 8876479.5 ns 0.52
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 27167500 ns 25076041 ns 1.08
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 3927896 ns 3931104 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1342658.5 ns 1312718 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 11627667 ns 11659958.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 14277520.5 ns 18499562.5 ns 0.77
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 36899542 ns 34871271.5 ns 1.06
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 6331458.5 ns 6354542 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2333 ns 4666.5 ns 0.50
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2166 ns 2625 ns 0.83
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3333 ns 4333 ns 0.77
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2646 ns 2292 ns 1.15
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 25097 ns 24932 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7333 ns 7209 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7125 ns 9792 ns 0.73
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7375 ns 7375 ns 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7250 ns 7208 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 189428.5 ns 190569.5 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8167 ns 8167 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8250 ns 8416 ns 0.98
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8542 ns 8375 ns 1.02
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6083 ns 5917 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10667 ns 10437.5 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 14041.5 ns 13583 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 11125 ns 11104.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7333 ns 7250 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25251 ns 24757 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 21917 ns 21708 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 21708.5 ns 21625 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 21750 ns 21750 ns 1
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 21916 ns 21709 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 198645 ns 195121 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 53625 ns 57500 ns 0.93
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 53500 ns 53500 ns 1
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 53625 ns 53583 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 54583 ns 55083 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28395.5 ns 28583 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28667 ns 28667 ns 1
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28417 ns 29000 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46084 ns 46334 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26326 ns 25674 ns 1.03
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 224125 ns 227125 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 272959 ns 276125 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4409500 ns 4228416.5 ns 1.04
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 65708 ns 63084 ns 1.04
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 170084 ns 166940.5 ns 1.02
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 240562 ns 246687 ns 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 290792 ns 293708 ns 0.99
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 4409209 ns 4174375 ns 1.06
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 71541 ns 68833 ns 1.04
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1708.5 ns 1979.5 ns 0.86
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1792 ns 2042 ns 0.88
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2541.5 ns 2583.5 ns 0.98
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1917 ns 2000 ns 0.96
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23384 ns 22856 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5292 ns 5416 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5291 ns 5291 ns 1
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5459 ns 5375 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5208.5 ns 5291 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 173533 ns 171204 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 7417 ns 7500 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 7500 ns 7542 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 7708 ns 7750 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5625 ns 5708 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 81107833 ns 80930834 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49783792 ns 48596833 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 43745208 ns 45693208 ns 0.96
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 56305270.5 ns 56260583.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2634961 ns 2631409 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 620785875 ns 622112500 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 429264250 ns 426582750 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 416731125 ns 411799708 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 507694646.5 ns 506749771 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 15139001 ns 15162045 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 871599625 ns 882246666 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 839558208.5 ns 844291292 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1206593209 ns 1135779771 ns 1.06
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 921408813 ns 925012854.5 ns 1.00
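
The Ratio column in the rows above appears to be the current timing divided by the previous timing, so values below 1 mean the current commit ran faster. The following is a minimal sketch of that arithmetic in Python, using three rows copied from the table. The helper names `ratio` and `flag_slower` and the 1.10 cutoff are hypothetical and chosen for illustration only; they are not taken from the workflow's configuration.

```python
# Sketch only: recompute the Ratio column (current / previous) for a few rows.
# Timings are in nanoseconds, copied from the table above.
ROWS = [
    ("mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)", 4653916.0, 8876479.5),
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)", 2646.0, 2292.0),
    ("Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)",
     1206593209.0, 1135779771.0),
]


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous, rounded to two decimals like the table."""
    return round(current_ns / previous_ns, 2)


def flag_slower(rows, cutoff: float = 1.10):
    """Return names of rows whose ratio exceeds the (hypothetical) cutoff."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > cutoff]


if __name__ == "__main__":
    for name, cur, prev in ROWS:
        print(f"{ratio(cur, prev):.2f}  {name}")
    print("slower than cutoff:", flag_slower(ROWS))
```

Because a single run on shared CI hardware can be noisy, large swings such as the 0.52 ratio on the 4-thread `mlp7layer_bn` row are more plausibly measurement variance than an effect of a spell-checker version bump.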

This comment was automatically generated by a workflow using github-action-benchmark.
