Commit

ci(buildkite): machines are back up
avik-pal committed Sep 21, 2024
1 parent 9967e02 commit e23b1a7
Showing 1 changed file with 3 additions and 6 deletions.
9 changes: 3 additions & 6 deletions .buildkite/documentation.yml
@@ -33,12 +33,9 @@ steps:
 env:
 TUTORIAL_BACKEND_GROUP: "CPU"
 agents:
-# FIXME: enable these once the servers are back up
-# queue: "juliaecosystem"
-# os: "linux"
-# arch: "x86_64"
-queue: "juliagpu"
-cuda: "*"
+queue: "juliaecosystem"
+os: "linux"
+arch: "x86_64"
 artifact_paths:
 - "docs/src/tutorials/beginner/**/*"
 - "docs/src/tutorials/intermediate/**/*"

1 comment on commit e23b1a7

@github-actions
Contributor

Lux Benchmarks
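
Each row in the table that follows gives a benchmark name, the current and previous timings in nanoseconds, and a ratio. Judging from the reported values, the ratio appears to be the current timing divided by the previous one, rounded to two decimals, so values above 1 are slowdowns. A minimal sketch for parsing a row and rechecking that ratio, assuming the flattened whitespace-separated layout shown here (`parse_row` and `recomputed_ratio` are illustrative names, not part of the benchmark tooling):

```python
import re

# One flattened row looks like: "<name> <current> ns <previous> ns <ratio>".
# The name itself contains spaces, so the numeric fields are anchored at the end.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<current>\d+(?:\.\d+)?) ns\s+"
    r"(?P<previous>\d+(?:\.\d+)?) ns\s+(?P<ratio>\d+(?:\.\d+)?)$"
)


def parse_row(line):
    """Split one benchmark row into name, timings (ns), and reported ratio."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return {
        "name": match["name"],
        "current_ns": float(match["current"]),
        "previous_ns": float(match["previous"]),
        "ratio": float(match["ratio"]),
    }


def recomputed_ratio(row):
    """Assumed definition: current / previous, rounded to two decimals."""
    return round(row["current_ns"] / row["previous_ns"], 2)


row = parse_row(
    "Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 3604 ns 2458 ns 1.47"
)
print(row["name"], recomputed_ratio(row))
```

For the sample row the recomputed value matches the reported 1.47. Sub-microsecond benchmarks such as this one swing widely between runs, so a large ratio on a tiny timing is more likely noise than a regression.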

Benchmark suite Current: e23b1a7 Previous: 4f29928 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 414542 ns 414750 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 243375 ns 243729.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 244500 ns 243645.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 740166 ns 739937.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 44280.5 ns 44131.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1298541.5 ns 1277770.5 ns 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 1240562 ns 1251833 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 16503791 ns 16532875 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2208500 ns 2259416 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 208333 ns 211816 ns 0.98
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1353521 ns 1353417 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 1293417 ns 1287562.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 16423250 ns 16470396.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2228104 ns 2246000 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1657875 ns 1755729 ns 0.94
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1092375 ns 1021000.5 ns 1.07
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1539021 ns 1537666 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3020458.5 ns 2999834 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210061.5 ns 209878 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12139208 ns 12143375 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8813375 ns 8839500 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9256875 ns 9220916.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18601500 ns 18588542 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1487808 ns 1491282 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17301000 ns 17297167 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13890083 ns 13998334 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14536416 ns 14528812.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21849875 ns 21846291.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250662791.5 ns 250636687.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148483145.5 ns 148810208 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116244708 ns 116894000 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447941542 ns 447336459 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5468492 ns 5498322 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1220473417 ns 1223363500 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 928051958 ns 932727167 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 828338104 ns 835497354 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1629213792 ns 1631111709 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31128714 ns 31309225 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1068598834 ns 1143243667 ns 0.93
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 965131583 ns 994946937.5 ns 0.97
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1298869062.5 ns 1312863792 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1731451000 ns 1733454958 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1105521 ns 1116500 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1504458.5 ns 1643667 ns 0.92
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3588000 ns 3643542 ns 0.98
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 785542 ns 789041 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 270147 ns 269726 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2989521 ns 2991666 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4100375 ns 4148959 ns 0.99
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 10725834 ns 11609792 ns 0.92
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3151334 ns 3148729 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1127024.5 ns 1125192 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2273083 ns 2335333.5 ns 0.97
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1320687.5 ns 1299062.5 ns 1.02
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1566750 ns 1557416 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4217125 ns 4213104 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209825 ns 210272 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19419958 ns 19407291 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16062459 ns 16096667 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17207666.5 ns 17317666.5 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25925499.5 ns 25907750 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1590537 ns 1592153 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 33976167 ns 34310312.5 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30847604 ns 30986792 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31068500 ns 31273833 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36660479 ns 36596250 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4532334 ns 4515667 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2536917 ns 2556459 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2708417 ns 2688249.5 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8394875.5 ns 8389334 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 425554 ns 427314.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 39076750 ns 39047458 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32039312.5 ns 32181166.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32300542 ns 32260292 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51878792 ns 52002833 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2625717.5 ns 2620622.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 89157937.5 ns 89392542 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 110310354.5 ns 115571416.5 ns 0.95
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 221196292 ns 230633541.5 ns 0.96
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74661583.5 ns 74339646 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 268645541 ns 268599084 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 155966250 ns 156359333 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 123152709 ns 123836375 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485576375 ns 485309084 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7017925 ns 7055046.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1469993416.5 ns 1473572104 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1172293917 ns 1171549667 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1071179125 ns 1064310166.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2008263562.5 ns 2002969750 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34758889.5 ns 34640088 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1722939167 ns 1719792625 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1515630729 ns 1528780229.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1805980375 ns 1913538000 ns 0.94
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2204894250 ns 2212588749.5 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2101917 ns 2096854 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2855250 ns 3045708 ns 0.94
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 8250875 ns 7809458 ns 1.06
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2316458.5 ns 2327583 ns 1.00
lenet(28, 28, 1, 128)/forward/GPU/CUDA 271499 ns 272243 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9314958 ns 9676145.5 ns 0.96
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 12005750.5 ns 12104584 ns 0.99
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 24338916.5 ns 25834750.5 ns 0.94
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11759333 ns 11757916.5 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1189529 ns 1202175.5 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 379542666.5 ns 381271021 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 310121896 ns 310142458.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 270228604.5 ns 259645854 ns 1.04
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 452462041.5 ns 452979375.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4858112 ns 4824218 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1161116458 ns 1158461917 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 936045042 ns 943496166 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 1039056250 ns 962405792 ns 1.08
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1397951750 ns 1401166834 ns 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 17884006 ns 17976769 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1057792 ns 1054125 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1665375 ns 1661875 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 4671500 ns 5315041.5 ns 0.88
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1297417 ns 1312541 ns 0.99
lenet(28, 28, 1, 64)/forward/GPU/CUDA 269688.5 ns 262820 ns 1.03
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6411041 ns 6268084 ns 1.02
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13166167 ns 13117917 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 18369000 ns 19113646 ns 0.96
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 5854395.5 ns 6086916.5 ns 0.96
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1228485 ns 1202851 ns 1.02
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70564395.5 ns 70511500 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43714083.5 ns 43790645.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39753208 ns 39687083 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132540542 ns 132685124.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1943140 ns 1858084 ns 1.05
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 355335375 ns 356003687.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270403333 ns 270657292 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 253291937.5 ns 253711000 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534663375 ns 535180938 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12307495 ns 12300153.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 395656250 ns 396495000 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 373284833 ns 373274250 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 655973250 ns 728479375 ns 0.90
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 711770458 ns 713118958 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1188878875 ns 1190615875 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 830603770.5 ns 832981520.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 640453979 ns 636784729 ns 1.01
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1769157145.5 ns 1772366146 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12306601 ns 12314533 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3632733895.5 ns 3626681875 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2812753583 ns 2823362334 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2711988875 ns 2696862625 ns 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5018496208 ns 5012508375 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 50053860 ns 49738182 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3404250 ns 3411709 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2081562.5 ns 2072167 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2527791.5 ns 2511937 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6026500 ns 6036208.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 313980.5 ns 339526 ns 0.92
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 26041958 ns 26069125 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18880958 ns 19056209 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19381417 ns 19089979 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39366250 ns 39346459 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2467954 ns 2459884 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54391666.5 ns 54485937.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 79414959 ns 83611520.5 ns 0.95
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 173499479 ns 172934000 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45644334 ns 45625250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1779541 ns 1782458 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1103458.5 ns 1105812.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1565229.5 ns 1567000 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3034833 ns 3042542 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 212435.5 ns 211807 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12548145.5 ns 12573896 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9176604 ns 9226667 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9628291.5 ns 9578916 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 19022333 ns 19028833 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1541164.5 ns 1529077 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17655271 ns 17667042 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14328958 ns 14347562.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14577375 ns 14567083 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22195583.5 ns 22199146 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70632792 ns 70625625 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43626937.5 ns 43726917 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39727333 ns 39746812.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132702083.5 ns 132787937.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1875633.5 ns 1934732 ns 0.97
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 359948021 ns 359903500 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 346896729.5 ns 348164021 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 305342083 ns 304529250 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 725230792 ns 723383792 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13377006 ns 13382889 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 420544646 ns 421402083.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 420636999.5 ns 425694708 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 764717937 ns 747909395.5 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 716168625 ns 716447375 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1511645.5 ns 1592833 ns 0.95
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1154625 ns 1158167 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1163583 ns 1146250 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2456083 ns 2412542 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 583442.5 ns 572386.5 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8867000 ns 8854896 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 13888042 ns 13602562.5 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 33278833 ns 33345229.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9863333 ns 9874291.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1464876 ns 1430613 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 16574334 ns 16524209 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 22600145.5 ns 23380666 ns 0.97
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 44879979.5 ns 43658750 ns 1.03
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 13139812.5 ns 13137667 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 828583.5 ns 824166.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 420291.5 ns 570124.5 ns 0.74
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1049375 ns 1063916 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 724875 ns 725458.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47459.5 ns 47937 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1513208 ns 1459250 ns 1.04
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 954458 ns 1049437 ns 0.91
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1716520.5 ns 1395208 ns 1.23
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2271209 ns 2260625 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 238389 ns 238994 ns 1.00
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1546208 ns 1530500 ns 1.01
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1060229.5 ns 1089333 ns 0.97
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1489458.5 ns 1620292 ns 0.92
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2241416.5 ns 2253083 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3400042 ns 3403625 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2070874.5 ns 2061291.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2520375 ns 2484792 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6012875 ns 6026312.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 288345 ns 284269 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24060000 ns 24093750 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17205708 ns 17201292 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17116750 ns 17041500 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37647979.5 ns 37570375 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2410484 ns 2411977 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52867250 ns 52911625 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 80422375 ns 84393791 ns 0.95
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 170489625 ns 172819250 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44608687.5 ns 44615937.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250519729 ns 250487334 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148647875 ns 148602334 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116284938 ns 116391208 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447812229.5 ns 448074791.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5466666 ns 5454241 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1101838792 ns 1105117875 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 857350166.5 ns 858058396 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 827927395.5 ns 825075479.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1752721167 ns 1753955542 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28896656 ns 28910957.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1027872958 ns 1030979062.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 949678541 ns 972989292 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1283911125 ns 1286035166 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1723765709 ns 1723177166.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1101708 ns 1140750 ns 0.97
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 680396 ns 760750 ns 0.89
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 667396 ns 752167 ns 0.89
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2049895.5 ns 2053417 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 572066.5 ns 562591 ns 1.02
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5888125 ns 5876834 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 8353229 ns 8974250 ns 0.93
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 25738625.5 ns 25959750 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7117458 ns 7106396 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1386537.5 ns 1411580 ns 0.98
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 9689104 ns 9670166.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 15038229.5 ns 16148166 ns 0.93
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 32959125 ns 33000792 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 7631000 ns 7621875 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 512854 ns 516896 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 285500 ns 415479.5 ns 0.69
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 3290708.5 ns 2957791.5 ns 1.11
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 90000 ns 89500 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28008 ns 28198 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 381292 ns 380583 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 433083.5 ns 444083.5 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4497542 ns 4683416 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 258500 ns 258979.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 224122.5 ns 227826.5 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 411708.5 ns 413416 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 463834 ns 475458 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4857917 ns 4631791 ns 1.05
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 271354.5 ns 271583 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 464854.5 ns 462958 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 219854 ns 355875 ns 0.62
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 760000 ns 767000.5 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 53292 ns 53917 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28360 ns 28301 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 340833 ns 339959 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 326666 ns 341521 ns 0.96
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 697604 ns 898375 ns 0.78
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151625 ns 151708 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 210056 ns 212644.5 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 354167 ns 355000 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 340541 ns 356709 ns 0.95
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 612458 ns 944500 ns 0.65
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 151208 ns 151167 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 601611250 ns 603130416 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 429098979 ns 428986854 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 392612937.5 ns 386662562 ns 1.02
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 871912417 ns 871726083.5 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7031843.5 ns 7027236 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2003215979.5 ns 2003136437 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1588632104 ns 1606958687.5 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1645858395.5 ns 1550423687 ns 1.06
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2622754667 ns 2625941250 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 26077633.5 ns 25917847 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 531041.5 ns 520000 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 392562.5 ns 394895.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 3112916 ns 2701958 ns 1.15
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 869500 ns 866188 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47171 ns 47079 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1751750 ns 1772187.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1762291.5 ns 1781709 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 16309167 ns 16286125 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2771167 ns 2723250 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 251324 ns 248319.5 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1848417 ns 1850645.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 1852416 ns 1848146 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 16667979 ns 16689875 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2787916 ns 2754291 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1351458 ns 1469521 ns 0.92
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1027312 ns 1034625 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 931875 ns 988249.5 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2324458 ns 2212416.5 ns 1.05
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 584909.5 ns 574726 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5897250 ns 5868937.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 8354604.5 ns 9178042 ns 0.91
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 26379334 ns 27617875 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7333291 ns 7341854.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1385897 ns 1351520 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 11681167 ns 11650895.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 18190500 ns 18290208 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 38237709 ns 38510270.5 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 9556291 ns 9545666 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2584 ns 2583 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 3604 ns 2458 ns 1.47
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3542 ns 3250 ns 1.09
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2417 ns 4562.5 ns 0.53
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24985 ns 24500.5 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7208 ns 6833 ns 1.05
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7083 ns 6875 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7334 ns 7292 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7167 ns 7166.5 ns 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 216583.5 ns 209627.5 ns 1.03
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8250 ns 8084 ns 1.02
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8458.5 ns 8166 ns 1.04
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8666 ns 8520.5 ns 1.02
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 5875 ns 6020.5 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 9937.5 ns 10042 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 13041.5 ns 14396 ns 0.91
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10375 ns 9625 ns 1.08
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7334 ns 7333.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25394 ns 24458 ns 1.04
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 19792 ns 19792 ns 1
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 19833 ns 19708 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 20042 ns 20125 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 19875 ns 19875 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 236401.5 ns 229625 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23500 ns 23562.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23500 ns 23542 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23875 ns 23791 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21416 ns 21520.5 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28583.5 ns 27000 ns 1.06
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28625 ns 28416.5 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28895.5 ns 28188 ns 1.03
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46292 ns 46083 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26179 ns 25611 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 224187.5 ns 224666 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 270084 ns 278416 ns 0.97
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4123000 ns 3900375.5 ns 1.06
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145500 ns 145292 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 212922.5 ns 211892 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 242145.5 ns 243417 ns 0.99
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 287834 ns 295959 ns 0.97
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 4006583 ns 4528416.5 ns 0.88
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 145875 ns 145875 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2000 ns 2667 ns 0.75
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1500 ns 1791 ns 0.84
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2458 ns 2708 ns 0.91
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 2041 ns 1959 ns 1.04
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23181 ns 23071 ns 1.00
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5375 ns 5125 ns 1.05
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5000 ns 5083 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5417 ns 5333 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 4917 ns 5125 ns 0.96
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 277300.5 ns 266994 ns 1.04
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 7458 ns 7500 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 7375 ns 7500 ns 0.98
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 7792 ns 7625 ns 1.02
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5125 ns 5125 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79972458 ns 80068250 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 47857917 ns 47839854.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 43307917 ns 43348791 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151540125 ns 151521792 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2710162 ns 2715083 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 662506875 ns 665235792 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 410576167 ns 410381834 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 397618416.5 ns 394582542 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 683832250 ns 682653250 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14567626 ns 14595495 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 714189229 ns 712441042 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 665454667 ns 680663916 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1013219583 ns 1031283708 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 1002665583 ns 997418875 ns 1.01

This comment was automatically generated by workflow using github-action-benchmark.
