chore: bump compat for JLD2 to 0.5 for package ImageNet, (keep existing compat) (#886)

Co-authored-by: CompatHelper Julia <compathelper_noreply@julialang.org>
d381969
Lux Benchmarks
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 411750 ns | 409792 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 322541 ns | 322250 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 323000 ns | 243583 ns | 1.33 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 741125 ns | 739625 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43519 ns | 44053 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1352646 ns | 1353834 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2410520.5 ns | 2426458 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 14567000 ns | 16512459 ns | 0.88 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2198750 ns | 2191083.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 207751 ns | 209370 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1399542 ns | 1454375 ns | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 873625 ns | 908458 ns | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 1692500 ns | 1834875 ns | 0.92 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2212958 ns | 2240458.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1771458.5 ns | 1748562.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1103187.5 ns | 1089395.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1508541 ns | 1512729 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2942874.5 ns | 3013750 ns | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209185 ns | 208817.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12143291.5 ns | 12152041.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8815750 ns | 8814875 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9216667 ns | 9198917 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18607583.5 ns | 18613479 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1485488 ns | 1488013.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17281166 ns | 17304750 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13953458 ns | 13952770.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14500417 ns | 14533958 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21857458 ns | 21843833.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250208625 ns | 250399541.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148288167 ns | 148350083 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116176854 ns | 117130083 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447713791 ns | 450838083 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5477076 ns | 5478039 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1219737541 ns | 1223340875 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 929484250 ns | 931640292 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 831918520.5 ns | 831594354.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1635891459 ns | 1647325416 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31213208 ns | 31506744.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1133213417 ns | 1144335875 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 980873625 ns | 995382583.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1340231687.5 ns | 1322398292 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1731769583 ns | 1739450208 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1090875 ns | 1068417 ns | 1.02 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1577729 ns | 1603458.5 ns | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 4079000 ns | 3760063 ns | 1.08 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 780666 ns | 782062 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 260126.5 ns | 261189.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2947896 ns | 3001979 ns | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4119187.5 ns | 4127958 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 10901042 ns | 10894833 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3160729 ns | 3233270.5 ns | 0.98 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1100430 ns | 1128601 ns | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2330791.5 ns | 2312312.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1427875 ns | 1427541.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1672479 ns | 1552396 ns | 1.08 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4208291.5 ns | 4205417 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 207691 ns | 207575 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19390833 ns | 19386792 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16081417 ns | 16057458 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17319916.5 ns | 17256291 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25940458 ns | 25860208 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1590182 ns | 1590086 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34308604.5 ns | 34375666 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30637541.5 ns | 30899458.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31121708 ns | 31158000 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36558542 ns | 36246917 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4536583.5 ns | 4546167 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2768292 ns | 2772584 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2919709 ns | 2682438 ns | 1.09 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8393437.5 ns | 8378667 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 427507 ns | 420456 ns | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38925624.5 ns | 38885979.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32072729 ns | 32074313 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32202792 ns | 32239667 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 52013208 ns | 51823708 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2621720 ns | 2618884 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 81565208 ns | 82643500 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 112402208 ns | 112560458 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 218212166 ns | 185039874.5 ns | 1.18 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74389500 ns | 73747708 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 267897375 ns | 268204791.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 159400292 ns | 159374708 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 127046375 ns | 123950416.5 ns | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485507209 ns | 485039833 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 6981458 ns | 7043693 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1471695708.5 ns | 1468109979 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1169932458 ns | 1174089583 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1072471604 ns | 1065212458.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2003667895.5 ns | 2013851104.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34727316 ns | 34531403 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1695553417 ns | 1695591000 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1467223249.5 ns | 1493306146 ns | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1864132334 ns | 1801755584 ns | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2207761250 ns | 2201440812.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1831895.5 ns | 1806792 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2554979 ns | 2531562 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7371125 ns | 7672666 ns | 0.96 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2471937.5 ns | 2462833 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 275322 ns | 266951 ns | 1.03 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9374562.5 ns | 9343333 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 11397062.5 ns | 11495750 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 25149625 ns | 26058854.5 ns | 0.97 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11776917 ns | 11770625 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1196044.5 ns | 1165407 ns | 1.03 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 380496500 ns | 379821291 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 283000250 ns | 284431333.5 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 248274541.5 ns | 276993833.5 ns | 0.90 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453217104.5 ns | 453499125 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4853395.5 ns | 4933427 ns | 0.98 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1146433292 ns | 1154735042 ns | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 939675875 ns | 934566458 ns | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 910277166 ns | 1022641417 ns | 0.89 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1399365083 ns | 1392634541 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17839841 ns | 18839648 ns | 0.95 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1050625 ns | 1047667 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1869750 ns | 1906208 ns | 0.98 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5510375 ns | 6506020.5 ns | 0.85 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1400416.5 ns | 1385270.5 ns | 1.01 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 275552.5 ns | 268224 ns | 1.03 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6502291.5 ns | 6461437 ns | 1.01 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13774167 ns | 13802959 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 21393292 ns | 21722625 ns | 0.98 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6034209 ns | 6091083 ns | 0.99 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1255073 ns | 1208321 ns | 1.04 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70518437 ns | 70468396 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43453209 ns | 43613625 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39518750 ns | 39889875 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132645125 ns | 132854895.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1884124 ns | 1872456 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 355363583 ns | 355307875 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 269480917 ns | 270273125 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 254468500.5 ns | 254197770.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 535126104 ns | 534390229.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12296923 ns | 12309296.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 395233833 ns | 395284167 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 392713625 ns | 394804354.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 727646187 ns | 701196333.5 ns | 1.04 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 710088708 ns | 711179875 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1187138125 ns | 1186639833 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 689654395.5 ns | 689274542 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 631849125 ns | 640237249.5 ns | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1771680916.5 ns | 1775678646 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12312221 ns | 12314528 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3672801604 ns | 3680556646 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2818472834 ns | 2857162417 ns | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2766537959 ns | 2854405625 ns | 0.97 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5065721375 ns | 5145784083 ns | 0.98 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49571605.5 ns | 49808957 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3419354 ns | 3409479 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2060937.5 ns | 2065084 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2506603.5 ns | 2479917 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6010687.5 ns | 6015479 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 342327.5 ns | 341120 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25984833 ns | 25925021 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18848354 ns | 18915667 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19326937.5 ns | 19134125.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39348166 ns | 39216437.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2475985 ns | 2468869 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 55415292 ns | 55378250 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 81482791 ns | 81111916 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 171451083 ns | 174313958.5 ns | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45362833 ns | 45500125 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1776500 ns | 1779417 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1083459 ns | 1092250 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1560333 ns | 1547583 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3034625 ns | 3037625 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 213929 ns | 212275 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12513666 ns | 12533437.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9181542 ns | 9199000 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9611229.5 ns | 9578167 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19017125 ns | 18975812.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1536683 ns | 1533549 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17622541 ns | 17619125 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14289271 ns | 14239459 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14490562 ns | 14500521 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22192084 ns | 22180250 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70486875 ns | 70496583.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43536166 ns | 43594834 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39571167 ns | 39807625 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132732875.5 ns | 132718979 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1880179 ns | 1947710 ns | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360038375 ns | 360073791 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 344199854 ns | 345868042 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 304988917 ns | 302741792 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 722793084 ns | 725319167 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13374232 ns | 13371028 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 417916708 ns | 419555417 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 427335417 ns | 418148437.5 ns | 1.02 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 724586812.5 ns | 710077458.5 ns | 1.02 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 714886416 ns | 715636334 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1666375 ns | 1661042 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1345812.5 ns | 1277792 ns | 1.05 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1349500 ns | 1134813 ns | 1.19 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2306688 ns | 2433292 ns | 0.95 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 549983 ns | 584506.5 ns | 0.94 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8943250 ns | 9020542 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12857458 ns | 12869000 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 30380396 ns | 32651417 ns | 0.93 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9862708 ns | 9805792 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1482094 ns | 1428291 ns | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17149708.5 ns | 18111583 ns | 0.95 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 17069292 ns | 17253354 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 30182000 ns | 26535354 ns | 1.14 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14403750.5 ns | 14356792 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 673250.5 ns | 710208 ns | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 534416 ns | 599312.5 ns | 0.89 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1039625 ns | 912395.5 ns | 1.14 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 726250 ns | 725791 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48750 ns | 47816 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1562021 ns | 1582187.5 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1008583.5 ns | 973833 ns | 1.04 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1379479 ns | 1835187.5 ns | 0.75 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2291979 ns | 2183125 ns | 1.05 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 242615.5 ns | 236731.5 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1558791.5 ns | 1600083 ns | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1056625 ns | 1053041.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1418667 ns | 1388771 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2225042 ns | 2256062 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3409916 ns | 3409541.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2058979 ns | 2060229 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2482708 ns | 2482875 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6018000 ns | 5998167 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 286378 ns | 286197 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24041250 ns | 24038625 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17174104 ns | 17258666.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17053959 ns | 17123396 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37566250.5 ns | 37487104 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2404563 ns | 2409477.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 53636896 ns | 54679729 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 81399459 ns | 84538542 ns | 0.96 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 168265249.5 ns | 157339000 ns | 1.07 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44475917 ns | 44498708 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 249926041.5 ns | 250028813 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148262292 ns | 147930708 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116062417 ns | 116617291 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448317354.5 ns | 454228375 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5340767 ns | 5443404 ns | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1102986708 ns | 1101896208 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 854720791.5 ns | 855324125.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 827927812 ns | 839930250.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1754387958 ns | 1774005666 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28314924 ns | 29278014 ns | 0.97 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1008984916.5 ns | 1013677520.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 945336417 ns | 922761000 ns | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1185625042 ns | 1320593542 ns | 0.90 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1728502750 ns | 1744904771 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1263833 ns | 1230812.5 ns | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 905542 ns | 967417 ns | 0.94 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 960083 ns | 669125 ns | 1.43 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1944125 ns | 2028541 ns | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 575792 ns | 558507.5 ns | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5733792 ns | 6006292 ns | 0.95 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6291604.5 ns | 6899417 ns | 0.91 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 24260958 ns | 25958937 ns | 0.93 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7123042 ns | 7102312 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1397495 ns | 1368625 ns | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 10348333 ns | 10886750 ns | 0.95 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 9719437.5 ns | 9389042 ns | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 16565333.5 ns | 17293854.5 ns | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8814042 ns | 7443459 ns | 1.18 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 402292 ns | 352104 ns | 1.14 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 413417 ns | 409416.5 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2055375 ns | 3455917 ns | 0.59 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 88333 ns | 88750 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28130 ns | 27682 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 349375 ns | 392604 ns | 0.89 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 401500 ns | 399000 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4847312.5 ns | 4557125 ns | 1.06 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258625 ns | 258875 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 226757.5 ns | 221053 ns | 1.03 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 380917 ns | 422125 ns | 0.90 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 431667 ns | 429208 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4832875 ns | 4755354 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 271250 ns | 270916 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 344833 ns | 305104 ns | 1.13 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 351917 ns | 348458 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 596333.5 ns | 635625 ns | 0.94 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 53083 ns | 54250 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28544 ns | 27950 ns | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 297000 ns | 355958 ns | 0.83 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 275291.5 ns | 274500 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 721584 ns | 753208.5 ns | 0.96 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151833 ns | 151667 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 211261.5 ns | 205458.5 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 308750 ns | 372292 ns | 0.83 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 291083 ns | 288521 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 599833 ns | 798979 ns | 0.75 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151084 ns | 150792 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 602960542 ns | 602253459 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 427926645.5 ns | 430857604 ns | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 374874229 ns | 392009125 ns | 0.96 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 872331042 ns | 877215958 ns | 0.99 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7026315 ns | 7028016 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2010620020.5 ns | 1996302145.5 ns | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1601555416.5 ns | 1609994521 ns | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1543795499.5 ns | 1565616166.5 ns | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2638535458 ns | 2641861333 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25732530.5 ns | 25992958 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 521479.5 ns | 536791.5 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 436917 ns | 435250 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 2359583.5 ns | 2792250 ns | 0.85 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 883459 ns | 865125 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 48099 ns | 47701 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1859625 ns | 1900167 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2566167 ns | 2798208 ns | 0.92 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14721812.5 ns | 16325500 ns | 0.90 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2772708 ns | 2771604 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 258195.5 ns | 248374 ns | 1.04 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1930708 ns | 1976729 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 5055042 ns | 5051583 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 14919271 ns | 16501146 ns | 0.90 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2790500 ns | 2698083.5 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1571728.5 ns | 1614854 ns | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1175187.5 ns | 1236833 ns | 0.95 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1199667 ns | 1069583 ns | 1.12 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2372709 ns | 2226209 ns | 1.07 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 547236.5 ns | 577670 ns | 0.95 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 6028333 ns | 5930562.5 ns | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 4710562.5 ns | 6880833 ns | 0.68 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24475354 ns | 26135520.5 ns | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7087875 ns | 7284792 ns | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1396100 ns | 1356112 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 12455375 ns | 12782291 ns | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 12268625 ns | 11955834 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 21774562.5 ns | 21105833.5 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10955542 ns | 10667312.5 ns | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2375 ns | 2334 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 4562.5 ns | 4792 ns | 0.95 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3000 ns | 3625 ns | 0.83 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2667 ns | 2375 ns | 1.12 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25691 ns | 24681 ns | 1.04 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7083 ns | 7333 ns | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7291 ns | 7250 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7416 ns | 7167 ns | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7083 ns | 7291 ns | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 217637.5 ns | 209372.5 ns | 1.04 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8083 ns | 8333 ns | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8291 ns | 8292 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8667 ns | 8500 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6042 ns | 6000 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10396 ns | 10312.5 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 12625 ns | 14125 ns | 0.89 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10667 ns | 10687.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 9562.5 ns | 7167 ns | 1.33 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25234.5 ns | 24485 ns | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 19834 ns | 19958 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 20167 ns | 20041.5 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 20167 ns | 19833 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 20125 ns | 20000 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 238032.5 ns | 229359 ns | 1.04 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23292 ns | 23395.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23667 ns | 23750 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 23750 ns | 23542 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21416 ns | 21333 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28792 ns | 28875 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29167 ns | 28750 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28896 ns | 29083 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46917 ns | 46041 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26423 ns | 25546 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 233583 ns | 221812.5 ns | 1.05 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 286187.5 ns | 279708 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4211271 ns | 4417417 ns | 0.95 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145625 ns | 145625 ns | 1 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 219292 ns | 211875.5 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 347958 ns | 332875 ns | 1.05 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 331959 ns | 321125 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 870041 ns | 562312.5 ns | 1.55 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 168125 ns | 161625 ns | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1875 ns | 2083 ns | 0.90 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 2750 ns | 2125 ns | 1.29 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2417 ns | 3875 ns | 0.62 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 2208.5 ns | 1709 ns | 1.29 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23633 ns | 22559 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5125 ns | 5334 ns | 0.96 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5166 ns | 5437.5 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5500 ns | 5458 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5334 ns | 5417 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 265711 ns | 254509.5 ns | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11167 ns | 11708 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11291.5 ns | 11416 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11458 ns | 11416 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 6917 ns | 6750 ns | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79845791 ns | 79881458 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49133292 ns | 49107667 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 45028500 ns | 43180145.5 ns | 1.04 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151579083 ns | 151771375 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2681579.5 ns | 2680326.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 605176584 ns | 662703292 ns | 0.91 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 410334917 ns | 414205958 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 400305958 ns | 397227958 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 683219833 ns | 688889667 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14579057 ns | 14602708 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 711709958.5 ns | 715248166.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 670438250 ns | 686640708 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1044946792 ns | 1044047896 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 997944375 ns | 994524042 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
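
For each benchmark, the three reported numbers are consistent with a current time, a previous time, and the quotient current / previous, so a ratio above 1 means the current run was slower. The following is a minimal sketch of how that ratio could be recomputed and large slowdowns picked out. It is not the code github-action-benchmark runs; the names `ratio` and `flag_slowdowns` and the 1.3 threshold are hypothetical choices for illustration, and the three sample rows are copied from the results above.

```python
def ratio(current_ns, previous_ns):
    """Current time divided by previous time; above 1 means slower now."""
    return current_ns / previous_ns


def flag_slowdowns(rows, threshold=1.3):
    """Return (name, rounded ratio) for rows whose ratio exceeds threshold.

    The 1.3 default is an illustrative assumption, not a value taken from
    the benchmark workflow's configuration.
    """
    flagged = []
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r > threshold:
            flagged.append((name, round(r, 2)))
    return flagged


# Three rows taken from the results: (name, current ns, previous ns).
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)", 323000, 243583),
    ("Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)", 2055375, 3455917),
    ("Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)", 870041, 562312.5),
]

for name, r in flag_slowdowns(rows):
    print(f"{name}: {r:.2f}")
```

Most of the largest swings in either direction occur in the 8-thread CPU rows, which may reflect run-to-run noise rather than an effect of this commit, so any flagged row is worth re-measuring before treating it as a regression.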