e23b1a7
Lux Benchmarks

| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)` | 414542 ns | 414750 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)` | 243375 ns | 243729.5 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)` | 244500 ns | 243645.5 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s)` | 740166 ns | 739937.5 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA` | 44280.5 ns | 44131.5 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)` | 1298541.5 ns | 1277770.5 ns | 1.02 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)` | 1240562 ns | 1251833 ns | 0.99 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)` | 16503791 ns | 16532875 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s)` | 2208500 ns | 2259416 ns | 0.98 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA` | 208333 ns | 211816 ns | 0.98 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s)` | 1353521 ns | 1353417 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s)` | 1293417 ns | 1287562.5 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s)` | 16423250 ns | 16470396.5 ns | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s)` | 2228104 ns | 2246000 ns | 0.99 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 1657875 ns | 1755729 ns | 0.94 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1092375 ns | 1021000.5 ns | 1.07 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1539021 ns | 1537666 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 3020458.5 ns | 2999834 ns | 1.01 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 210061.5 ns | 209878 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 12139208 ns | 12143375 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 8813375 ns | 8839500 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 9256875 ns | 9220916.5 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 18601500 ns | 18588542 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1487808 ns | 1491282 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 17301000 ns | 17297167 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 13890083 ns | 13998334 ns | 0.99 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 14536416 ns | 14528812.5 ns | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 21849875 ns | 21846291.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 250662791.5 ns | 250636687.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 148483145.5 ns | 148810208 ns | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 116244708 ns | 116894000 ns | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 447941542 ns | 447336459 ns | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 5468492 ns | 5498322 ns | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1220473417 ns | 1223363500 ns | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 928051958 ns | 932727167 ns | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 828338104 ns | 835497354 ns | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 1629213792 ns | 1631111709 ns | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 31128714 ns | 31309225 ns | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1068598834 ns | 1143243667 ns | 0.93 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 965131583 ns | 994946937.5 ns | 0.97 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1298869062.5 ns | 1312863792 ns | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 1731451000 ns | 1733454958 ns | 1.00 |
| `lenet(28, 28, 1, 32)/forward/CPU/2 thread(s)` | 1105521 ns | 1116500 ns | 0.99 |
| `lenet(28, 28, 1, 32)/forward/CPU/4 thread(s)` | 1504458.5 ns | 1643667 ns | 0.92 |
| `lenet(28, 28, 1, 32)/forward/CPU/8 thread(s)` | 3588000 ns | 3643542 ns | 0.98 |
| `lenet(28, 28, 1, 32)/forward/CPU/1 thread(s)` | 785542 ns | 789041 ns | 1.00 |
| `lenet(28, 28, 1, 32)/forward/GPU/CUDA` | 270147 ns | 269726 ns | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s)` | 2989521 ns | 2991666 ns | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s)` | 4100375 ns | 4148959 ns | 0.99 |
| `lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s)` | 10725834 ns | 11609792 ns | 0.92 |
| `lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s)` | 3151334 ns | 3148729 ns | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/GPU/CUDA` | 1127024.5 ns | 1125192 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 2273083 ns | 2335333.5 ns | 0.97 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1320687.5 ns | 1299062.5 ns | 1.02 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1566750 ns | 1557416 ns | 1.01 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 4217125 ns | 4213104 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 209825 ns | 210272 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 19419958 ns | 19407291 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 16062459 ns | 16096667 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 17207666.5 ns | 17317666.5 ns | 0.99 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 25925499.5 ns | 25907750 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1590537 ns | 1592153 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 33976167 ns | 34310312.5 ns | 0.99 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 30847604 ns | 30986792 ns | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 31068500 ns | 31273833 ns | 0.99 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 36660479 ns | 36596250 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 4532334 ns | 4515667 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2536917 ns | 2556459 ns | 0.99 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2708417 ns | 2688249.5 ns | 1.01 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 8394875.5 ns | 8389334 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 425554 ns | 427314.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 39076750 ns | 39047458 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 32039312.5 ns | 32181166.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 32300542 ns | 32260292 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 51878792 ns | 52002833 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 2625717.5 ns | 2620622.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 89157937.5 ns | 89392542 ns | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 110310354.5 ns | 115571416.5 ns | 0.95 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 221196292 ns | 230633541.5 ns | 0.96 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 74661583.5 ns | 74339646 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 268645541 ns | 268599084 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 155966250 ns | 156359333 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 123152709 ns | 123836375 ns | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 485576375 ns | 485309084 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 7017925 ns | 7055046.5 ns | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1469993416.5 ns | 1473572104 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 1172293917 ns | 1171549667 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 1071179125 ns | 1064310166.5 ns | 1.01 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 2008263562.5 ns | 2002969750 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 34758889.5 ns | 34640088 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1722939167 ns | 1719792625 ns | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 1515630729 ns | 1528780229.5 ns | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1805980375 ns | 1913538000 ns | 0.94 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 2204894250 ns | 2212588749.5 ns | 1.00 |
| `lenet(28, 28, 1, 128)/forward/CPU/2 thread(s)` | 2101917 ns | 2096854 ns | 1.00 |
| `lenet(28, 28, 1, 128)/forward/CPU/4 thread(s)` | 2855250 ns | 3045708 ns | 0.94 |
| `lenet(28, 28, 1, 128)/forward/CPU/8 thread(s)` | 8250875 ns | 7809458 ns | 1.06 |
| `lenet(28, 28, 1, 128)/forward/CPU/1 thread(s)` | 2316458.5 ns | 2327583 ns | 1.00 |
| `lenet(28, 28, 1, 128)/forward/GPU/CUDA` | 271499 ns | 272243 ns | 1.00 |
| `lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s)` | 9314958 ns | 9676145.5 ns | 0.96 |
| `lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s)` | 12005750.5 ns | 12104584 ns | 0.99 |
| `lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s)` | 24338916.5 ns | 25834750.5 ns | 0.94 |
| `lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s)` | 11759333 ns | 11757916.5 ns | 1.00 |
| `lenet(28, 28, 1, 128)/zygote/GPU/CUDA` | 1189529 ns | 1202175.5 ns | 0.99 |
| `vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s)` | 379542666.5 ns | 381271021 ns | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s)` | 310121896 ns | 310142458.5 ns | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s)` | 270228604.5 ns | 259645854 ns | 1.04 |
| `vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s)` | 452462041.5 ns | 452979375.5 ns | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/GPU/CUDA` | 4858112 ns | 4824218 ns | 1.01 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s)` | 1161116458 ns | 1158461917 ns | 1.00 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s)` | 936045042 ns | 943496166 ns | 0.99 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s)` | 1039056250 ns | 962405792 ns | 1.08 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s)` | 1397951750 ns | 1401166834 ns | 1.00 |
| `vgg16(32, 32, 3, 32)/zygote/GPU/CUDA` | 17884006 ns | 17976769 ns | 0.99 |
| `lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)` | 1057792 ns | 1054125 ns | 1.00 |
| `lenet(28, 28, 1, 64)/forward/CPU/4 thread(s)` | 1665375 ns | 1661875 ns | 1.00 |
| `lenet(28, 28, 1, 64)/forward/CPU/8 thread(s)` | 4671500 ns | 5315041.5 ns | 0.88 |
| `lenet(28, 28, 1, 64)/forward/CPU/1 thread(s)` | 1297417 ns | 1312541 ns | 0.99 |
| `lenet(28, 28, 1, 64)/forward/GPU/CUDA` | 269688.5 ns | 262820 ns | 1.03 |
| `lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s)` | 6411041 ns | 6268084 ns | 1.02 |
| `lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s)` | 13166167 ns | 13117917 ns | 1.00 |
| `lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s)` | 18369000 ns | 19113646 ns | 0.96 |
| `lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s)` | 5854395.5 ns | 6086916.5 ns | 0.96 |
| `lenet(28, 28, 1, 64)/zygote/GPU/CUDA` | 1228485 ns | 1202851 ns | 1.02 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 70564395.5 ns | 70511500 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 43714083.5 ns | 43790645.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 39753208 ns | 39687083 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 132540542 ns | 132685124.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 1943140 ns | 1858084 ns | 1.05 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 355335375 ns | 356003687.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 270403333 ns | 270657292 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 253291937.5 ns | 253711000 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 534663375 ns | 535180938 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 12307495 ns | 12300153.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 395656250 ns | 396495000 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 373284833 ns | 373274250 ns | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 655973250 ns | 728479375 ns | 0.90 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 711770458 ns | 713118958 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s)` | 1188878875 ns | 1190615875 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s)` | 830603770.5 ns | 832981520.5 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s)` | 640453979 ns | 636784729 ns | 1.01 |
| `vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s)` | 1769157145.5 ns | 1772366146 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/GPU/CUDA` | 12306601 ns | 12314533 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s)` | 3632733895.5 ns | 3626681875 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s)` | 2812753583 ns | 2823362334 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s)` | 2711988875 ns | 2696862625 ns | 1.01 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s)` | 5018496208 ns | 5012508375 ns | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/GPU/CUDA` | 50053860 ns | 49738182 ns | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 3404250 ns | 3411709 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2081562.5 ns | 2072167 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2527791.5 ns | 2511937 ns | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 6026500 ns | 6036208.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 313980.5 ns | 339526 ns | 0.92 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 26041958 ns | 26069125 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 18880958 ns | 19056209 ns | 0.99 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 19381417 ns | 19089979 ns | 1.02 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 39366250 ns | 39346459 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 2467954 ns | 2459884 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 54391666.5 ns | 54485937.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 79414959 ns | 83611520.5 ns | 0.95 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 173499479 ns | 172934000 ns | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 45644334 ns | 45625250 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 1779541 ns | 1782458 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1103458.5 ns | 1105812.5 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1565229.5 ns | 1567000 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 3034833 ns | 3042542 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 212435.5 ns | 211807 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 12548145.5 ns | 12573896 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 9176604 ns | 9226667 ns | 0.99 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 9628291.5 ns | 9578916 ns | 1.01 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 19022333 ns | 19028833 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1541164.5 ns | 1529077 ns | 1.01 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 17655271 ns | 17667042 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 14328958 ns | 14347562.5 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 14577375 ns | 14567083 ns | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 22195583.5 ns | 22199146 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 70632792 ns | 70625625 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 43626937.5 ns | 43726917 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 39727333 ns | 39746812.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 132702083.5 ns | 132787937.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 1875633.5 ns | 1934732 ns | 0.97 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 359948021 ns | 359903500 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 346896729.5 ns | 348164021 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 305342083 ns | 304529250 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 725230792 ns | 723383792 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 13377006 ns | 13382889 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 420544646 ns | 421402083.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 420636999.5 ns | 425694708 ns | 0.99 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 764717937 ns | 747909395.5 ns | 1.02 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 716168625 ns | 716447375 ns | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s)` | 1511645.5 ns | 1592833 ns | 0.95 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s)` | 1154625 ns | 1158167 ns | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s)` | 1163583 ns | 1146250 ns | 1.02 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s)` | 2456083 ns | 2412542 ns | 1.02 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA` | 583442.5 ns | 572386.5 ns | 1.02 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s)` | 8867000 ns | 8854896 ns | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s)` | 13888042 ns | 13602562.5 ns | 1.02 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s)` | 33278833 ns | 33345229.5 ns | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s)` | 9863333 ns | 9874291.5 ns | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA` | 1464876 ns | 1430613 ns | 1.02 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s)` | 16574334 ns | 16524209 ns | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s)` | 22600145.5 ns | 23380666 ns | 0.97 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s)` | 44879979.5 ns | 43658750 ns | 1.03 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)` | 13139812.5 ns | 13137667 ns | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s)` | 828583.5 ns | 824166.5 ns | 1.01 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s)` | 420291.5 ns | 570124.5 ns | 0.74 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s)` | 1049375 ns | 1063916 ns | 0.99 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s)` | 724875 ns | 725458.5 ns | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA` | 47459.5 ns | 47937 ns | 0.99 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s)` | 1513208 ns | 1459250 ns | 1.04 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s)` | 954458 ns | 1049437 ns | 0.91 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s)` | 1716520.5 ns | 1395208 ns | 1.23 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s)` | 2271209 ns | 2260625 ns | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA` | 238389 ns | 238994 ns | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s)` | 1546208 ns | 1530500 ns | 1.01 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s)` | 1060229.5 ns | 1089333 ns | 0.97 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s)` | 1489458.5 ns | 1620292 ns | 0.92 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s)` | 2241416.5 ns | 2253083 ns | 0.99 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 3400042 ns | 3403625 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2070874.5 ns | 2061291.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2520375 ns | 2484792 ns | 1.01 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 6012875 ns | 6026312.5 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 288345 ns | 284269 ns | 1.01 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 24060000 ns | 24093750 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 17205708 ns | 17201292 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 17116750 ns | 17041500 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 37647979.5 ns | 37570375 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 2410484 ns | 2411977 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 52867250 ns | 52911625 ns | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 80422375 ns | 84393791 ns | 0.95 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 170489625 ns | 172819250 ns | 0.99 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 44608687.5 ns | 44615937.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 250519729 ns | 250487334 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 148647875 ns | 148602334 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 116284938 ns | 116391208 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 447812229.5 ns | 448074791.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 5466666 ns | 5454241 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1101838792 ns | 1105117875 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 857350166.5 ns | 858058396 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 827927395.5 ns | 825075479.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 1752721167 ns | 1753955542 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 28896656 ns | 28910957.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1027872958 ns | 1030979062.5 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 949678541 ns | 972989292 ns | 0.98 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1283911125 ns | 1286035166 ns | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 1723765709 ns | 1723177166.5 ns | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s)` | 1101708 ns | 1140750 ns | 0.97 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s)` | 680396 ns | 760750 ns | 0.89 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s)` | 667396 ns | 752167 ns | 0.89 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s)` | 2049895.5 ns | 2053417 ns | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA` | 572066.5 ns | 562591 ns | 1.02 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s)` | 5888125 ns | 5876834 ns | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s)` | 8353229 ns | 8974250 ns | 0.93 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s)` | 25738625.5 ns | 25959750 ns | 0.99 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s)` | 7117458 ns | 7106396 ns | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA` | 1386537.5 ns | 1411580 ns | 0.98 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s)` | 9689104 ns | 9670166.5 ns | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s)` | 15038229.5 ns | 16148166 ns | 0.93 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s)` | 32959125 ns | 33000792 ns | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s)` | 7631000 ns | 7621875 ns | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s)` | 512854 ns | 516896 ns | 0.99 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)` | 285500 ns | 415479.5 ns | 0.69 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)` | 3290708.5 ns | 2957791.5 ns | 1.11 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s)` | 90000 ns | 89500 ns | 1.01 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA` | 28008 ns | 28198 ns | 0.99 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s)` | 381292 ns | 380583 ns | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s)` | 433083.5 ns | 444083.5 ns | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s)` | 4497542 ns | 4683416 ns | 0.96 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s)` | 258500 ns | 258979.5 ns | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA` | 224122.5 ns | 227826.5 ns | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s)` | 411708.5 ns | 413416 ns | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s)` | 463834 ns | 475458 ns | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s)` | 4857917 ns | 4631791 ns | 1.05 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s)` | 271354.5 ns | 271583 ns | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s)` | 464854.5 ns | 462958 ns | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s)` | 219854 ns | 355875 ns | 0.62 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s)` | 760000 ns | 767000.5 ns | 0.99 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s)` | 53292 ns | 53917 ns | 0.99 |
| `Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA` | 28360 ns | 28301 ns | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s)` | 340833 ns | 339959 ns | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s)` | 326666 ns | 341521 ns | 0.96 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)` | 697604 ns | 898375 ns | 0.78 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s)` | 151625 ns | 151708 ns | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA` | 210056 ns | 212644.5 ns | 0.99 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s)` | 354167 ns | 355000 ns | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s)` | 340541 ns | 356709 ns | 0.95 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)` | 612458 ns | 944500 ns | 0.65 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s)` | 151208 ns | 151167 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s)` | 601611250 ns | 603130416 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s)` | 429098979 ns | 428986854 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s)` | 392612937.5 ns | 386662562 ns | 1.02 |
| `vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s)` | 871912417 ns | 871726083.5 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/GPU/CUDA` | 7031843.5 ns | 7027236 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s)` | 2003215979.5 ns | 2003136437 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s)` | 1588632104 ns | 1606958687.5 ns | 0.99 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s)` | 1645858395.5 ns | 1550423687 ns | 1.06 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s)` | 2622754667 ns | 2625941250 ns | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/GPU/CUDA` | 26077633.5 ns | 25917847 ns | 1.01 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s)` | 531041.5 ns | 520000 ns | 1.02 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s)` | 392562.5 ns | 394895.5 ns | 0.99 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s)` | 3112916 ns | 2701958 ns | 1.15 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s)` | 869500 ns | 866188 ns | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA` | 47171 ns | 47079 ns | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s)` | 1751750 ns | 1772187.5 ns | 0.99 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s)` | 1762291.5 ns | 1781709 ns | 0.99 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s)` | 16309167 ns | 16286125 ns | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s)` | 2771167 ns | 2723250 ns | 1.02 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA` | 251324 ns | 248319.5 ns | 1.01 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s)` | 1848417 ns | 1850645.5 ns | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s)` | 1852416 ns | 1848146 ns | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s)` | 16667979 ns | 16689875 ns | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s)` | 2787916 ns | 2754291 ns | 1.01 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s)` | 1351458 ns | 1469521 ns | 0.92 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s)` | 1027312 ns | 1034625 ns | 0.99 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s)` | 931875 ns | 988249.5 ns | 0.94 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s)` | 2324458 ns | 2212416.5 ns | 1.05 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA` | 584909.5 ns | 574726 ns | 1.02 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s)` | 5897250 ns | 5868937.5 ns | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)` | 8354604.5 ns | 9178042 ns | 0.91 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s)` | 26379334 ns | 27617875 ns | 0.96 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s)` | 7333291 ns | 7341854.5 ns | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA` | 1385897 ns | 1351520 ns | 1.03 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s)` | 11681167 ns | 11650895.5 ns | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s)` | 18190500 ns | 18290208 ns | 0.99 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s)` | 38237709 ns | 38510270.5 ns | 0.99 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s)` | 9556291 ns | 9545666 ns | 1.00 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)` | 2584 ns | 2583 ns | 1.00 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s)` | 3604 ns | 2458 ns | 1.47 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s)` | 3542 ns | 3250 ns | 1.09 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)` | 2417 ns | 4562.5 ns | 0.53 |
| `Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA` | 24985 ns | 24500.5 ns | 1.02 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s)` | 7208 ns | 6833 ns | 1.05 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s)` | 7083 ns | 6875 ns | 1.03 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s)` | 7334 ns | 7292 ns | 1.01 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s)` | 7167 ns | 7166.5 ns | 1.00 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA` | 216583.5 ns | 209627.5 ns | 1.03 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s)` | 8250 ns | 8084 ns | 1.02 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s)` | 8458.5 ns | 8166 ns | 1.04 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s)` | 8666 ns | 8520.5 ns | 1.02 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s)` | 5875 ns | 6020.5 ns | 0.98 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s)` | 9937.5 ns | 10042 ns | 0.99 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s)` | 13041.5 ns | 14396 ns | 0.91 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s)` | 10375 ns | 9625 ns | 1.08 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s)` | 7334 ns | 7333.5 ns | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA` | 25394 ns | 24458 ns | 1.04 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s)` | 19792 ns | 19792 ns | 1 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s)` | 19833 ns | 19708 ns | 1.01 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s)` | 20042 ns | 20125 ns | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s)` | 19875 ns | 19875 ns | 1 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA` | 236401.5 ns | 229625 ns | 1.03 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s)` | 23500 ns | 23562.5 ns | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s)` | 23500 ns | 23542 ns | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s)` | 23875 ns | 23791 ns | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s)` | 21416 ns | 21520.5 ns | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s)` | 28583.5 ns | 27000 ns | 1.06 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s)` | 28625 ns | 28416.5 ns | 1.01 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s)` | 28895.5 ns | 28188 ns | 1.03 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s)` | 46292 ns | 46083 ns | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA` | 26179 ns | 25611 ns | 1.02 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s)` | 224187.5 ns | 224666 ns | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s)` | 270084 ns | 278416 ns | 0.97 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s)` | 4123000 ns | 3900375.5 ns | 1.06 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s)` | 145500 ns | 145292 ns | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA` | 212922.5 ns | 211892 ns | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s)` | 242145.5 ns | 243417 ns | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s)` | 287834 ns | 295959 ns | 0.97 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)` | 4006583 ns | 4528416.5 ns | 0.88 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s)` | 145875 ns | 145875 ns | 1 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)` | 2000 ns | 2667 ns | 0.75 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)` | 1500 ns | 1791 ns | 0.84 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)` | 2458 ns | 2708 ns | 0.91 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)` | 2041 ns | 1959 ns | 1.04 |
| `Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA` | 23181 ns | 23071 ns | 1.00 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)` | 5375 ns | 5125 ns | 1.05 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)` | 5000 ns | 5083 ns | 0.98 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)` | 5417 ns | 5333 ns | 1.02 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)` | 4917 ns | 5125 ns | 0.96 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA` | 277300.5 ns | 266994 ns | 1.04 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)` | 7458 ns | 7500 ns | 0.99 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)` | 7375 ns | 7500 ns | 0.98 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)` | 7792 ns | 7625 ns | 1.02 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)` | 5125 ns | 5125 ns | 1 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 79972458 ns | 80068250 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 47857917 ns | 47839854.5 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 43307917 ns | 43348791 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 151540125 ns | 151521792 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 2710162 ns | 2715083 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 662506875 ns | 665235792 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 410576167 ns | 410381834 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 397618416.5 ns | 394582542 ns | 1.01 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 683832250 ns | 682653250 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 14567626 ns | 14595495 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 714189229 ns | 712441042 ns | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 665454667 ns | 680663916 ns | 0.98 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 1013219583 ns | 1031283708 ns | 0.98 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 1002665583 ns | 997418875 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.