Commit
Showing 1 changed file with 48 additions and 0 deletions.
e9e8587
Lux Benchmarks
| Benchmark suite | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 411375 | 412166.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 243709 | 323500 | 0.75 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 322646 | 320875 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740375 | 741250.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43950.5 | 44423 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1337249.5 | 1321270.5 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1273791 | 2464833 | 0.52 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 14174812.5 | 19238396 | 0.74 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2272771 | 2195417 | 1.04 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 209419.5 | 207553 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1400291.5 | 1425917 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 887625 | 932500 | 0.95 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 1538917 | 10322250 | 0.15 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2206208 | 2213895.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1771500 | 1661895.5 | 1.07 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1090645.5 | 1070020.5 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1522125 | 1434166.5 | 1.06 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3029417 | 2827416 | 1.07 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209813.5 | 209087 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12102479 | 12123333 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8818646 | 8828083 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9205917 | 9265083 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18584146.5 | 18585042 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1488578 | 1486549 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17292708 | 17281708 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13982625 | 13944083 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14552250 | 14497834 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21819500 | 21831187 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250090042 | 250414312.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 149083000 | 148327625 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115757187.5 | 121777833 | 0.95 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447196708 | 447079250 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5471357 | 5479239 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1218132458 | 1224472334 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 981733583 | 981860834 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 848222166.5 | 866559896 | 0.98 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1760366667 | 1786183250 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31426620.5 | 31141432 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1029120292 | 1136093208 | 0.91 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1005929416.5 | 995263750 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1279847771 | 4093347604.5 | 0.31 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1729728562.5 | 1730318458 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1070979 | 1099208 | 0.97 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1650166.5 | 1634791.5 | 1.01 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3527667 | 10471458.5 | 0.34 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 783771 | 793749.5 | 0.99 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 273446 | 275033.5 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 3010792 | 3015812.5 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4162729 | 4182500 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 11403792 | 18339084 | 0.62 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3309916.5 | 3167500 | 1.04 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1192098 | 1197294.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2334145.5 | 2300000 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1313125 | 1436021 | 0.91 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1658000 | 1622666.5 | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4216333 | 4204812 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209877 | 210264.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19385042 | 19585749.5 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16097792 | 16074458 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17355959 | 17045417 | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25923083 | 25852083 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1591766 | 1597607 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34160834 | 34485000 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30761000 | 31047541 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31144521 | 31432771 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36207166 | 36566250 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4524604.5 | 4524583 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2549416 | 2763250 | 0.92 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2909937.5 | 2932625 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8392709 | 8382708 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 424154 | 427520 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 39098708.5 | 38953500 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32083041.5 | 32105354.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32252375 | 32548104.5 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51940146 | 51899750 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2623770.5 | 2626030 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 82296833.5 | 89000896 | 0.92 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 114970542 | 113802417 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 219951812.5 | 1309568792 | 0.17 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 73552333.5 | 74097062 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268540834 | 268319292 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 156204958 | 159133500 | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 126610208 | 133094396 | 0.95 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485345125 | 484827333 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7026886 | 7006400 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1482105791 | 1476195458 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1163867041 | 1132135875 | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1075361354 | 1088449687.5 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2005616229 | 2000320208.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34552058 | 34779703 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1690575542 | 1685598500 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1481039354 | 1538044750 | 0.96 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1858159000 | 4332338646 | 0.43 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2198122146 | 2208580041 | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1874667 | 2076083 | 0.90 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2564645.5 | 2992583 | 0.86 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7967395.5 | 14339541 | 0.56 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2510895.5 | 2435937.5 | 1.03 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 274477 | 272995 | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9545521 | 9695167 | 0.98 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 11597645.5 | 12099333 | 0.96 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 25177208 | 37454270.5 | 0.67 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11843625 | 11819458 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1271782 | 1260390 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 387397208.5 | 381383625 | 1.02 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 307140208 | 286352875 | 1.07 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 238501416.5 | 273227083 | 0.87 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453568833.5 | 452262167 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4856791 | 4961226.5 | 0.98 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1348016042 | 1283646541 | 1.05 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 954731500 | 1000220875 | 0.95 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 902430042 | 967608250 | 0.93 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1436227042 | 1517743458 | 0.95 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17707051 | 20595575 | 0.86 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1406270.5 | 1395833 | 1.01 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1689375 | 2080292 | 0.81 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5724417 | 12485458.5 | 0.46 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1362208 | 1302667 | 1.05 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 274629.5 | 269163.5 | 1.02 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6793229 | 6772625 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13240979 | 12497500 | 1.06 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 20057958 | 35063125 | 0.57 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6132021 | 6112062.5 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1337532 | 1304323.5 | 1.03 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70499708 | 70519875 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43790125 | 43552854.5 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39477041 | 40665042 | 0.97 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132604313 | 132440791 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1860034 | 1881079 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 392007437 | 383762458 | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 295848708 | 296140812.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 280677917 | 285609291 | 0.98 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534601479 | 534397854 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12292612 | 12301453.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 416452583 | 408556958 | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 375603417 | 400978979 | 0.94 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 739467187 | 2804634541 | 0.26 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 710966083 | 711262542 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1207602875 | 1187451625 | 1.02 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 827285146 | 687900354.5 | 1.20 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 626243583 | 675103208 | 0.93 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1871309417 | 1861428958 | 1.01 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12310335.5 | 12318124 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3551769562 | 3592131166.5 | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2825829208 | 2766931750 | 1.02 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2685591583 | 2824402500 | 0.95 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 4971858334 | 4976486417 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49471030 | 49597364 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3412375 | 3424958 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2057459 | 2061667 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2495666 | 2433145.5 | 1.03 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6041833 | 6019834 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 291550 | 292066 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25761312.5 | 25565333 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18421458 | 18594958.5 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18746750 | 19135937.5 | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39078208 | 38817750 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2478018 | 2470565.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54288604.5 | 54013709 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 80339812 | 79042812.5 | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172422791.5 | 1231143917 | 0.14 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45738917 | 45483208 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1785146.5 | 1777063 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1089812.5 | 1089250 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1563333 | 1458375 | 1.07 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3036041.5 | 3025333.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 213535 | 213113 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12483792 | 12533917 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9199624.5 | 9196875 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9583333 | 9657604 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18985687.5 | 18966125 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1543618 | 1540833 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17577042 | 17644917 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14335042 | 14332041.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14575208 | 14718041 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22223083 | 22160020.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70534292 | 70550708.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43780458 | 43600063 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39542083 | 40717729 | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132641021 | 132485291.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1884243 | 1949867 | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 359089500 | 359581209 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 289136354 | 290937584 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 286223375 | 290441875 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 621740812.5 | 618706270.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13381466.5 | 13347845 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 426322958.5 | 418387125 | 1.02 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 433456542 | 422322583 | 1.03 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 700179458.5 | 2895773375 | 0.24 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 717066417 | 716354500 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1546604 | 1599208 | 0.97 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1018625 | 1232938 | 0.83 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1239542 | 1235167 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2344354 | 2315292 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 590077 | 545112 | 1.08 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8789625 | 8862792 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13433958 | 12950812.5 | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 30660000 | 58793854.5 | 0.52 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9813916.5 | 9809334 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1433625.5 | 1492967 | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17774292 | 17734000 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 16943020.5 | 17361750 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 29389541.5 | 77291708 | 0.38 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14403021 | 12986750 | 1.11 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 793729.5 | 790937.5 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 652583 | 498875 | 1.31 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1033604.5 | 3817250 | 0.27 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 729000 | 725042 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47334 | 48794 | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1545896 | 1520854 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1028188 | 1049917 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1490062.5 | 11421458 | 0.13 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2291333.5 | 2273792 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 233057 | 234942.5 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1740750 | 1695771 | 1.03 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1245833 | 1271708.5 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 2032792 | 11370292 | 0.18 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2317334 | 2291667 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3401083.5 | 3401646 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2039458 | 2056833.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2508479.5 | 2425042 | 1.03 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6024958 | 5995666 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 283499 | 285706.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23993958 | 24118875 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17161937.5 | 17258875 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17120500.5 | 17604666.5 | 0.97 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37516042 | 37430416 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2407574.5 | 2402296 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52415917 | 52414125 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 80120833 | 83928292 | 0.95 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 170100749.5 | 1219142792 | 0.14 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44596958 | 44392708 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 249959541 | 250407625 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148554375 | 148238917 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115651042 | 121351666 | 0.95 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447689729.5 | 447459458 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5447214 | 5336731 | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1138134750 | 1129539417 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 879402625.5 | 882674958 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 806396250 | 815371709 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1745958208 | 1744905709 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 29364193 | 28378262 | 1.03 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1044290250 | 1064772187.5 | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 968104500 | 964672209 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1275516417 | 3904894687.5 | 0.33 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1730982917 | 1742644958 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1288812 | 1302229 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 756416 | 969520.5 | 0.78 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 959708 | 945000 | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2062479 | 1957604.5 | 1.05 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 585184 | 576593 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5766917 | 5874604 | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 8702542 | 6492542 | 1.34 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 24377042 | 49631083.5 | 0.49 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7093250 | 6374042 | 1.11 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1392456 | 1416373 | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 10756875 | 10778396 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 9739500 | 9918667 | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 17286625 | 61084708 | 0.28 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8840896 | 7443125 | 1.19 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 466250 | 483500 | 0.96 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 473729 | 368667 | 1.28 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2043167 | 4189542 | 0.49 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 88875 | 88833 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28086 | 29045 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 366958 | 379541.5 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 426458 | 446708 | 0.95 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4380541 | 12533396 | 0.35 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 261333 | 265208 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 223772 | 227079.5 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 685500 | 707459 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 693562.5 | 726271 | 0.95 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 1081875 | 6598416.5 | 0.16 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 445124.5 | 446958 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 412208.5 | 426312.5 | 0.97 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 412083 | 303666.5 | 1.36 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 743708.5 | 2289542 | 0.32 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 53833.5 | 53354 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28451 | 28741 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 316042 | 334125 | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 306854 | 342916 | 0.89 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 383125 | 6128625 | 0.06251402231332477 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 153375 | 157292 | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 209063 | 211268 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 381500 | 400375 | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 374500 | 410291.5 | 0.91 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 928500 | 5806792 | 0.16 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 174125 | 174625 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 610538959 | 603293416 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 428730667 | 425511812 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 372122895.5 | 412612646 | 0.90 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 876687334 | 872101687 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7021716 | 7026013.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2064097437.5 | 2054393875 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1567205042 | 1618390313 | 0.97 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1600946021 | 1720878584 | 0.93 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2761837666 | 2755903459 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25734048.5 | 25904879.5 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 520166.5 | 521083 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 401500 | 435709 | 0.92 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 1846521 | 6187563 | 0.30 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 867666.5 | 868791.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47339 | 48586 | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1916250 | 1892687.5 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1752834 | 2331750 | 0.75 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14787833 | 18962458.5 | 0.78 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2737958 | 2772834 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 251262 | 250849 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 3223791 | 2728041 | 1.18 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 2284333.5 | 2319791.5 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 3794125 | 12893125 | 0.29 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3375125 | 3385499.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1489937.5 | 1496750 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1019958 | 1181458.5 | 0.86 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1178083 | 1208500 | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2338125 | 2224187.5 | 1.05 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 585370 | 589325.5 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5755625 | 5771979 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 7937604.5 | 6453250 | 1.23 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24739167 | 51827709 | 0.48 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7285791 | 7242187 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1357578.5 | 1343442.5 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 12561104 | 12776250 | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 11774166 | 12078916.5 | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 20486917 | 60709375.5 | 0.34 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10798187.5 | 10444333.5 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2750 | 2625 | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2458 | 2667 | 0.92 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3375 | 2833.5 | 1.19 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 3521 | 5229.5 | 0.67 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24874 | 25112 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 8875 | 8916 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 8541 | 8833 | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 8625 | 8917 | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 8834 | 8792 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 209705.5 | 209631.5 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 16792 | 16562.5 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 16542 | 16542 | 1 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 16875 | 16667 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 10917 | 10709 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10125 | 11563 | 0.88 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 14834 | 15666 | 0.95 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10916 | 13250 | 0.82 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7584 | 7458 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24720 | 25221 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 22750 | 22729.5 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 22500 | 22458 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 22541 | 22541 | 1 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 22291 | 22584 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 230541.5 | 232899 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 52666.5 | 52375 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 52250 | 52542 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 52250 | 52625 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 43792 | 44042 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 29167 | 28166.5 | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28750 | 29333 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28021 | 29417 | 0.95 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46375 | 46334 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25582 | 25984 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 209292 | 208979 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 271250 | 267604.5 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4055167 | 4061417 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 147958 | 154708 | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 217500 | 224895 | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 305437.5 | 311208 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 305812.5 | 297208 | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 476375 | 666000 | 0.72 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 160917 | 161917 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1750 | 1959 | 0.89 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1792 | 2000 | 0.90 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2625 | 2625 | 1 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1917 | 2208 | 0.87 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 22679 | 23260 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 7645.5 | 7125 | 1.07 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 7667 | 7333 | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 7812.5 | 7792 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 7833 | 7542 | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 265749 | 274053.5 | 0.97 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11542 | 11833.5 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11542 | 11458 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11542 | 11375 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 7125 | 7167 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 80203833 | 79918583 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 47894084 | 49163917 | 0.97 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 44850250 | 44855000 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151606708 | 151319874.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2675189 | 2719014 | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 679504291 | 601785792 | 1.13 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 410648500 | 411474666 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 396007395.5 | 397225187.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 692029833 | 684946583 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14599746.5 | 14617212 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 693561354.5 | 685314624.5 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 670877709 | 667050292 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 978990875 | 953815417 | 1.03 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 994723417 | 997319750 | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
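
The last figure of each entry is consistent with the first timing divided by the second; for example, 243709 / 323500 ≈ 0.75, so a value below 1 means the first timing is the smaller one. As a rough illustration, the Python sketch below recomputes that ratio for three entries copied from the results above and marks any entry whose ratio exceeds a cutoff. The function names and the 1.15 cutoff are hypothetical choices made for this example, not settings taken from the workflow that produced the comment.

```python
def ratio(first_ns, second_ns):
    """Return the first timing divided by the second timing."""
    return first_ns / second_ns


def flag_slowdowns(entries, threshold=1.15):
    """Return (name, ratio rounded to 2 places, exceeds_threshold) per entry.

    `threshold` is a hypothetical cutoff chosen only for this illustration.
    """
    report = []
    for name, first_ns, second_ns in entries:
        r = ratio(first_ns, second_ns)
        report.append((name, round(r, 2), r > threshold))
    return report


# Three (name, first timing, second timing) entries copied from the results.
ENTRIES = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)",
     243709, 323500),
    ("vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s)",
     827285146, 687900354.5),
    ("Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)",
     383125, 6128625),
]

for name, rounded, slower in flag_slowdowns(ENTRIES):
    status = "slower" if slower else "ok"
    print(f"{rounded:.2f}  {status:6}  {name}")
```

With this cutoff only the vgg16 entry (ratio 1.20) is marked as slower; the other two have ratios of 0.75 and 0.06.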