Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 3 changed files with 1 addition and 11 deletions.
4f29928
Lux Benchmarks
Benchmark suite | Current | Previous | Ratio (current / previous)
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 414750 ns | 414167 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 243729.5 ns | 243812.5 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 243645.5 ns | 243375 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 739937.5 ns | 739750 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 44131.5 ns | 43608.5 ns | 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1277770.5 ns | 1274750 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1251833 ns | 1257604 ns | 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16532875 ns | 16232709 ns | 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2259416 ns | 2193229 ns | 1.03
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 211816 ns | 205508.5 ns | 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1353417 ns | 1311791 ns | 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 1287562.5 ns | 1296000 ns | 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 16470396.5 ns | 16564750 ns | 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2246000 ns | 2236917 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1755729 ns | 1656771 ns | 1.06
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1021000.5 ns | 1101167 ns | 0.93
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1537666 ns | 1519083 ns | 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2999834 ns | 2996500 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209878 ns | 206771 ns | 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12143375 ns | 12074917 ns | 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8839500 ns | 8846125 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9220916.5 ns | 9185812.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18588542 ns | 18620646 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1491282 ns | 1506641 ns | 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17297167 ns | 17279459 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13998334 ns | 14009229.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14528812.5 ns | 14468291.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21846291.5 ns | 21873146 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250636687.5 ns | 252162083.5 ns | 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148810208 ns | 148884583 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116894000 ns | 116232875 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447336459 ns | 447534666 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5498322 ns | 5465296 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1223363500 ns | 1230946875 ns | 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 932727167 ns | 931953750 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 835497354 ns | 826867750.5 ns | 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1631111709 ns | 1631748667 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31309225 ns | 31362804 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1143243667 ns | 1146184875 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 994946937.5 ns | 997853916.5 ns | 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1312863792 ns | 1329065916.5 ns | 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1733454958 ns | 1736617187.5 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1116500 ns | 1111541.5 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1643667 ns | 1663917 ns | 0.99
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3643542 ns | 3634917 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 789041 ns | 788500 ns | 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA | 269726 ns | 262430.5 ns | 1.03
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2991666 ns | 2981646 ns | 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4148959 ns | 4151854.5 ns | 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 11609792 ns | 10487312.5 ns | 1.11
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3148729 ns | 3265083 ns | 0.96
lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1125192 ns | 1131749 ns | 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2335333.5 ns | 2342791 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1299062.5 ns | 1260000 ns | 1.03
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1557416 ns | 1539542 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4213104 ns | 4176916 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210272 ns | 208157.5 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19407291 ns | 19392625 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16096667 ns | 16105895.5 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17317666.5 ns | 17329250 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25907750 ns | 25905125 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1592153 ns | 1607984 ns | 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34310312.5 ns | 34168604 ns | 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30986792 ns | 30734292 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31273833 ns | 30891041.5 ns | 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36596250 ns | 36714750 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4515667 ns | 4532000 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2556459 ns | 2546584 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2688249.5 ns | 2675583.5 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8389334 ns | 8386333 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 427314.5 ns | 419971 ns | 1.02
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 39047458 ns | 38621250 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32181166.5 ns | 32144146 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32260292 ns | 32234313 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 52002833 ns | 51925709 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2620622.5 ns | 2628667 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89392542 ns | 89245375 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 115571416.5 ns | 115663979 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 230633541.5 ns | 223717000 ns | 1.03
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74339646 ns | 74519062.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268599084 ns | 270237667 ns | 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 156359333 ns | 156197542 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 123836375 ns | 123423271 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485309084 ns | 485408250 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7055046.5 ns | 7027939 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1473572104 ns | 1473080062.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1171549667 ns | 1168760792 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1064310166.5 ns | 1063953145.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2002969750 ns | 2006090104 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34640088 ns | 34772934.5 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1719792625 ns | 1719270959 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1528780229.5 ns | 1530344979 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1913538000 ns | 1879104875 ns | 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2212588749.5 ns | 2217620458 ns | 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2096854 ns | 2066124.5 ns | 1.01
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3045708 ns | 3080917 ns | 0.99
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7809458 ns | 7964834 ns | 0.98
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2327583 ns | 2511771 ns | 0.93
lenet(28, 28, 1, 128)/forward/GPU/CUDA | 272243 ns | 272286 ns | 1.00
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9676145.5 ns | 9629792 ns | 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12104584 ns | 12051208 ns | 1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 25834750.5 ns | 23782666.5 ns | 1.09
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11757916.5 ns | 11321791 ns | 1.04
lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1202175.5 ns | 1192316.5 ns | 1.01
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 381271021 ns | 379182875 ns | 1.01
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 310142458.5 ns | 311332270.5 ns | 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 259645854 ns | 260260313 ns | 1.00
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 452979375.5 ns | 450681833 ns | 1.01
vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4824218 ns | 4857816 ns | 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1158461917 ns | 1151703750 ns | 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 943496166 ns | 938427709 ns | 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 962405792 ns | 943142791 ns | 1.02
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1401166834 ns | 1396853084 ns | 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17976769 ns | 17794579 ns | 1.01
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1054125 ns | 1048833 ns | 1.01
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1661875 ns | 1655208.5 ns | 1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5315041.5 ns | 4851812 ns | 1.10
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1312541 ns | 1291167 ns | 1.02
lenet(28, 28, 1, 64)/forward/GPU/CUDA | 262820 ns | 278270.5 ns | 0.94
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6268084 ns | 6497104 ns | 0.96
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13117917 ns | 13086396 ns | 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 19113646 ns | 18753875 ns | 1.02
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6086916.5 ns | 5891208.5 ns | 1.03
lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1202851 ns | 1253158.5 ns | 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70511500 ns | 70556458 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43790645.5 ns | 44452167 ns | 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39687083 ns | 39837500 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132685124.5 ns | 132581125 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1858084 ns | 1865473 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356003687.5 ns | 356767520.5 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270657292 ns | 272336833 ns | 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 253711000 ns | 255661771 ns | 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 535180938 ns | 534829208.5 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12300153.5 ns | 12304649 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 396495000 ns | 395040042 ns | 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 373274250 ns | 370401500 ns | 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 728479375 ns | 693812291 ns | 1.05
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 713118958 ns | 711246750 ns | 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1190615875 ns | 1188023709 ns | 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 832981520.5 ns | 835256562.5 ns | 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 636784729 ns | 638885750 ns | 1.00
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1772366146 ns | 1768729250 ns | 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12314533 ns | 12316863.5 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3626681875 ns | 3627838020.5 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2823362334 ns | 2824735750 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2696862625 ns | 2694929167 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5012508375 ns | 5002434750 ns | 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49738182 ns | 49730192 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3411709 ns | 3432375.5 ns | 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2072167 ns | 2078583 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2511937 ns | 2530500 ns | 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6036208.5 ns | 6020833 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 339526 ns | 339043.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 26069125 ns | 25844354 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19056209 ns | 18918770.5 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19089979 ns | 19719959 ns | 0.97
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39346459 ns | 39362209 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2459884 ns | 2460010 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54485937.5 ns | 54493625 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 83611520.5 ns | 84184417 ns | 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172934000 ns | 173059688 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45625250 ns | 45573959 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1782458 ns | 1783437.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1105812.5 ns | 1098584 ns | 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1567000 ns | 1563624.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3042542 ns | 3028979 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 211807 ns | 212147.5 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12573896 ns | 12574667 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9226667 ns | 9223854 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9578916 ns | 9681958 ns | 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19028833 ns | 18996416 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1529077 ns | 1525057 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17667042 ns | 17650833 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14347562.5 ns | 14332292 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14567083 ns | 14552750 ns | 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22199146 ns | 22194208 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70625625 ns | 70637271 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43726917 ns | 44500249.5 ns | 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39746812.5 ns | 40038333 ns | 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132787937.5 ns | 132595500 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1934732 ns | 1878861 ns | 1.03
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 359903500 ns | 361106062 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 348164021 ns | 349644938 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 304529250 ns | 304116708.5 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 723383792 ns | 723634000 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13382889 ns | 13382866.5 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 421402083.5 ns | 419845083.5 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 425694708 ns | 427670459 ns | 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 747909395.5 ns | 765524104 ns | 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 716447375 ns | 715822875 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1592833 ns | 1591792 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1158167 ns | 1165292 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1146250 ns | 1150479.5 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2412542 ns | 2435375 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 572386.5 ns | 580934.5 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8854896 ns | 8855583 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13602562.5 ns | 13566583 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 33345229.5 ns | 33371313 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9874291.5 ns | 9856250 ns | 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1430613 ns | 1447660.5 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 16524209 ns | 16614333.5 ns | 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 23380666 ns | 22957687.5 ns | 1.02
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 43658750 ns | 45530875 ns | 0.96
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 13137667 ns | 13137979 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 824166.5 ns | 830833 ns | 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 570124.5 ns | 515458 ns | 1.11
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1063916 ns | 1061583 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725458.5 ns | 723895.5 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47937 ns | 48058.5 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1459250 ns | 1549792 ns | 0.94
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1049437 ns | 1043458 ns | 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1395208 ns | 1717459 ns | 0.81
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2260625 ns | 2249729 ns | 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 238994 ns | 235968.5 ns | 1.01
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1530500 ns | 1556416 ns | 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1089333 ns | 1068292 ns | 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1620292 ns | 1707875 ns | 0.95
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2253083 ns | 2224354 ns | 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3403625 ns | 3404875 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2061291.5 ns | 2061708 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2484792 ns | 2526583 ns | 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6026312.5 ns | 6005458 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 284269 ns | 284654 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24093750 ns | 24057375 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17201292 ns | 17188917 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17041500 ns | 17108854 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37570375 ns | 37589750 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2411977 ns | 2418683.5 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52911625 ns | 52962291.5 ns | 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 84393791 ns | 85344416 ns | 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172819250 ns | 171244354 ns | 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44615937.5 ns | 44652208.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250487334 ns | 251293750 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148602334 ns | 148493709 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116391208 ns | 116314333.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448074791.5 ns | 447949229.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5454241 ns | 5446386 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1105117875 ns | 1103974709 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 858058396 ns | 855630395.5 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 825075479.5 ns | 831750854.5 ns | 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1753955542 ns | 1754110584 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28910957.5 ns | 28887646 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1030979062.5 ns | 1030795771 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 972989292 ns | 973527459 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1286035166 ns | 1276835833 ns | 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1723177166.5 ns | 1741435895.5 ns | 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1140750 ns | 1102104.5 ns | 1.04
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 760750 ns | 764333 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 752167 ns | 784979 ns | 0.96
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2053417 ns | 1957854 ns | 1.05
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 562591 ns | 563252 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5876834 ns | 5885125 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 8974250 ns | 9085895.5 ns | 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 25959750 ns | 26897042 ns | 0.97
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7106396 ns | 7099083 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1411580 ns | 1415829 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9670166.5 ns | 9699771 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 16148166 ns | 15967729 ns | 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 33000792 ns | 32771687.5 ns | 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7621875 ns | 7633666 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 516896 ns | 514458 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 415479.5 ns | 384604.5 ns | 1.08
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2957791.5 ns | 3059459 ns | 0.97
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 89500 ns | 87833 ns | 1.02
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28198 ns | 28219 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 380583 ns | 381812.5 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 444083.5 ns | 447750 ns | 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4683416 ns | 4678459 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258979.5 ns | 258375 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 227826.5 ns | 228924.5 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 413416 ns | 410916.5 ns | 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 475458 ns | 479208 ns | 0.99
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4631791 ns | 4649000 ns | 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 271583 ns | 270833 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 462958 ns | 461250.5 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 355875 ns | 322625 ns | 1.10
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 767000.5 ns | 768834 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 53917 ns | 52875 ns | 1.02
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28301 ns | 28278 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 339959 ns | 342333 ns | 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 341521 ns | 347625 ns | 0.98
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 898375 ns | 396687 ns | 2.26
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151708 ns | 151250 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 212644.5 ns | 212495 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 355000 ns | 356000 ns | 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 356709 ns | 362937.5 ns | 0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 944500 ns | 740771 ns | 1.28
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151167 ns | 150875 ns | 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 603130416 ns | 601061209 ns | 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 428986854 ns | 430671250 ns | 1.00
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 386662562 ns | 383040583 ns | 1.01
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 871726083.5 ns | 870727020.5 ns | 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7027236 ns | 7032100 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2003136437 ns | 2000504228.5 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1606958687.5 ns | 1604685125 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1550423687 ns | 1652458646 ns | 0.94
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2625941250 ns | 2626165250 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25917847 ns | 25934443 ns | 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 520000 ns | 526333 ns | 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 394895.5 ns | 400458.5 ns | 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 2701958 ns | 3022187.5 ns | 0.89
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 866188 ns | 868667 ns | 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47079 ns | 47967.5 ns | 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1772187.5 ns | 1757062.5 ns | 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1781709 ns | 1694333 ns | 1.05
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16286125 ns | 16312334 ns | 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2723250 ns | 2651375 ns | 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 248319.5 ns | 257253 ns | 0.97
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1850645.5 ns | 1894750.5 ns | 0.98
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 1848146 ns | 1834625 ns | 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16689875 ns | 16537333 ns | 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2754291 ns | 2736604.5 ns | 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1469521 ns | 1496021 ns | 0.98
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1034625 ns | 931750 ns | 1.11
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 988249.5 ns | 1059667 ns | 0.93
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2212416.5 ns | 2319292 ns | 0.95
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 574726 ns | 585808.5 ns | 0.98
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5868937.5 ns | 5882458 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 9178042 ns | 8563167 ns | 1.07
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 27617875 ns | 26031937 ns | 1.06
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7341854.5 ns | 7331479 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1351520 ns | 1393892 ns | 0.97
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11650895.5 ns | 11701667 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 18290208 ns | 18292896 ns | 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 38510270.5 ns | 39864875 ns | 0.97
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9545666 ns | 9527500 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2583 ns | 2750 ns | 0.94
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2458 ns | 2334 ns | 1.05
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3250 ns | 3292 ns | 0.99
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 4562.5 ns | 2583 ns | 1.77
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24500.5 ns | 24864 ns | 0.99
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 6833 ns | 7041 ns | 0.97
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 6875 ns | 7166 ns | 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7292 ns | 7250 ns | 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7166.5 ns | 7083 ns | 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 209627.5 ns | 216254.5 ns | 0.97
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8084 ns | 8250 ns | 0.98
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8166 ns | 8459 ns | 0.97
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8520.5 ns | 8542 ns | 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6020.5 ns | 5834 ns | 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10042 ns | 10479.5 ns | 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 14396 ns | 13062.5 ns | 1.10
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 9625 ns | 10500 ns | 0.92
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7333.5 ns | 7500 ns | 0.98
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24458 ns | 25125 ns | 0.97
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 19792 ns | 19916 ns | 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 19708 ns | 19917 ns | 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 20125 ns | 20270.5 ns | 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 19875 ns | 20000 ns | 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 229625 ns | 238014.5 ns | 0.96
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23562.5 ns | 23541 ns | 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23542 ns | 23584 ns | 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 23791 ns | 23917 ns | 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21520.5 ns | 21333 ns | 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 27000 ns | 28687.5 ns | 0.94
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28416.5 ns | 28458 ns | 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28188 ns | 28750 ns | 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46083 ns | 46041 ns | 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25611 ns | 26166 ns | 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 224666 ns | 224416 ns | 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 278416 ns | 277458 ns | 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 3900375.5 ns | 3940416 ns | 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145292 ns | 145375 ns | 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 211892 ns | 215900.5 ns | 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 243417 ns | 241916.5 ns | 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 295959 ns | 294834 ns | 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4528416.5 ns | 4072750 ns | 1.11
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145875 ns | 145500 ns | 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 2667 ns | 1750 ns | 1.52
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1791 ns | 1709 ns | 1.05
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2708 ns | 2833 ns | 0.96
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1959 ns | 1792 ns | 1.09
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23071 ns | 23320 ns | 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5125 ns | 5250 ns | 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5083 ns | 5084 ns | 1.00
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5333 ns | 5375 ns | 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5125 ns | 5250 ns | 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 266994 ns | 273997 ns | 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 7500 ns | 7500 ns | 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7500 ns | 7458 ns | 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7625 ns | 7625 ns | 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5125 ns | 5125 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 80068250 ns | 79922000 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 47839854.5 ns | 48869292 ns | 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 43348791 ns | 43653750 ns | 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151521792 ns | 151454541 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2715083 ns | 2718779 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 665235792 ns | 663985416 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 410381834 ns | 413249125 ns | 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 394582542 ns | 397260000 ns | 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 682653250 ns | 684524000 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14595495 ns | 14579213 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 712441042 ns | 713434583.5 ns | 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 680663916 ns | 675522709 ns | 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1031283708 ns | 997663125 ns | 1.03
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 997418875 ns | 999548041 ns | 1.00
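The ratio reported for each benchmark is the current time divided by the previous time, so values above 1.00 mean the benchmark got slower. For example, the Dense(128 => 128, relu) zygote row on 8 threads gives 898375 ns / 396687 ns, or about 2.26. Below is a minimal Python sketch that recomputes the ratio from the two timings and flags rows at or above a cutoff. The 1.5 threshold and the helper names `ratio` and `regressions` are hypothetical and for illustration only; this comment does not show the workflow's actual alert settings.

```python
from typing import Iterable, List, Tuple

# Hypothetical alert threshold for illustration only: flag a row when the
# current time is at least 1.5x the previous time.
ALERT_THRESHOLD = 1.5

def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time; above 1.0 means slower."""
    return current_ns / previous_ns

def regressions(rows: Iterable[Tuple[str, float, float]],
                threshold: float = ALERT_THRESHOLD) -> List[Tuple[str, float]]:
    """Return (name, ratio rounded to 2 places) for rows at or above threshold."""
    flagged = []
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r >= threshold:
            flagged.append((name, round(r, 2)))
    return flagged

# (name, current ns, previous ns) copied from the benchmark table.
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 414750, 414167),
    ("Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)", 898375, 396687),
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)", 4562.5, 2583),
    ("Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)", 2667, 1750),
]

for name, r in regressions(rows):
    print(f"{name}: {r:.2f}x")
```

Using a ratio instead of an absolute difference keeps the cutoff meaningful across rows whose timings range from microseconds to seconds.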
This comment was automatically generated by workflow using github-action-benchmark.
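The CPU rows repeat each benchmark at 1, 2, 4, and 8 threads. One common way to read them is strong-scaling speedup, S(N) = T(1) / T(N), and parallel efficiency, S(N) / N. The benchmark workflow does not report these metrics. The Python sketch below computes them from the current timings for two suites, chosen only for illustration; the helper names are hypothetical. With these numbers, the large Conv((3, 3), 64 => 64, relu) forward pass reaches roughly a 3.8x speedup on 8 threads, while the small lenet(28, 28, 1, 32) forward pass gets slower as threads are added.

```python
from typing import Dict

def speedup(t_1: float, t_n: float) -> float:
    """Strong-scaling speedup S(N) = T(1) / T(N); below 1.0 means more threads were slower."""
    return t_1 / t_n

def efficiency(t_1: float, t_n: float, n_threads: int) -> float:
    """Parallel efficiency E(N) = S(N) / N; 1.0 would be perfect linear scaling."""
    return speedup(t_1, t_n) / n_threads

# Current CPU forward timings (ns) from the benchmark table, keyed by thread count.
suites: Dict[str, Dict[int, float]] = {
    "Conv((3, 3), 64 => 64, relu) forward": {
        1: 447336459, 2: 250636687.5, 4: 148810208, 8: 116894000},
    "lenet(28, 28, 1, 32) forward": {
        1: 789041, 2: 1116500, 4: 1643667, 8: 3643542},
}

for label, times in suites.items():
    t_1 = times[1]
    for n in (2, 4, 8):
        print(f"{label}, {n} threads: speedup {speedup(t_1, times[n]):.2f}, "
              f"efficiency {efficiency(t_1, times[n], n):.2f}")
```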