fix: update GravitationalWaveform tutorial
Showing 2 changed files with 13 additions and 13 deletions.
92e8469
Lux Benchmarks
| Benchmark suite | Current (ns) | Previous (ns) | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 415000 | 415000 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 244167 | 244000 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 243917 | 244395.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740083 | 741625 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43793 | 43299 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1280333 | 1269375 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1268791 | 1242021 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16455125 | 16399916 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2193625.5 | 2241187 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 205231 | 204002 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1311917 | 1349042 | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 1301792 | 1294500 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 16522625 | 16764541 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2229625 | 2232041.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1672666 | 1752375 | 0.95 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1078166 | 1093646 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1511041.5 | 1492959 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2994458 | 3024125 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 207884 | 207489.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12154146 | 12156687.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8856791 | 8836167 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9297792 | 9199229.5 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18579708 | 18606000 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1492665 | 1487095.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17297396 | 17315979 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13998833 | 13972625 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14511000 | 14444750 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21839416 | 21832209 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250544729 | 250603708 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148581208 | 148666500 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116355916.5 | 116680792 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447348667 | 446998834 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5449372 | 5475042 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1226769166 | 1221297083 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 930331417 | 932803916 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 829560312.5 | 830875312.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1631272125 | 1633169958 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31620503.5 | 31270992 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1143568125 | 1142742084 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 993275583.5 | 992312021 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1332092333.5 | 1338745271 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1732940916.5 | 1731363313 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1119875 | 1118896 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1650333 | 1648500 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3433334 | 3550750 | 0.97 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 782354 | 779895.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 263984.5 | 263872 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2986166 | 2989959 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4134521 | 4148875 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 9684479 | 10113125 | 0.96 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3141166 | 3158708.5 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1099110 | 1091990.5 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2222125 | 2343042 | 0.95 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1310979 | 1334167 | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1561042 | 1566042 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4207458 | 4207541 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208127 | 208150 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19407062.5 | 19429625 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16092937.5 | 16109333 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17317479 | 17422500 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25877354.5 | 25879770.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1588570 | 1587958 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34283042 | 34137833 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 31029667 | 30953770.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31324334 | 31126834 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36972625 | 36603292 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4535728.5 | 4553458 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2550437.5 | 2548333 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2682521 | 2680479 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8376542 | 8392292 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 420059 | 419498 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38787729 | 39173771 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32133646 | 32069583 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32252916 | 32250333 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51916459 | 51921166 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2624143 | 2616748 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 88908791 | 89448084 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 114840750 | 115210625 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 227998375 | 223006041 | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74777479 | 74834709 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 269000958 | 268830916 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 156605625 | 155940083 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 123282250 | 123481354.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485266417 | 485017458 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7007944 | 7045661 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1477600500.5 | 1468709770.5 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1177860417 | 1170852875 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1059255604.5 | 1060936145.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2001527437.5 | 2003258354 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34509709 | 34686788.5 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1725457125 | 1717082916 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1535708771 | 1544192313 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1892793750 | 1911836416 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2208396292 | 2210230895.5 | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2072875 | 2101000.5 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3011791 | 3069792 | 0.98 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 8320459 | 7782562.5 | 1.07 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2450499.5 | 2470375 | 0.99 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 268533.5 | 266658.5 | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9519292 | 9672229 | 0.98 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12095020.5 | 12092375 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 24991500 | 24297291.5 | 1.03 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11770084 | 11738333 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1173232 | 1162784.5 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 383052437.5 | 379589354 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 311828042 | 309610249.5 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 269993541.5 | 271697750 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 452443833.5 | 452186458.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4865362.5 | 4826362 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1155538583 | 1153829542 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 936810083 | 937182416 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 959183583 | 946442459 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1397577000 | 1402726625 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 19191910 | 17871272 | 1.07 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1053520.5 | 1060583.5 | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1668459 | 1665104 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5692083 | 5684333 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1396104.5 | 1293250 | 1.08 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 270444.5 | 266056.5 | 1.02 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6494584 | 6298000 | 1.03 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13134333 | 13078042 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 19522667 | 19625729.5 | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6062833 | 6036063 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1205114.5 | 1205857 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70593167 | 70540979 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43687500 | 43826125 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39756500 | 39756625 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132546521 | 132576562.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1861025.5 | 1928339 | 0.97 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356256979 | 355456104 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270180000 | 270074458 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 253147750 | 254482208.5 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 535028854 | 534597416.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12303646 | 12296862 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 400021667 | 395449625 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 374059625 | 371290834 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 723689958.5 | 729009728.5 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 712462250 | 711332250 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1195955667 | 1188343167 | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 833640041.5 | 829330374.5 | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 641220229.5 | 641218812.5 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1769113729 | 1770400208.5 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12497145 | 12316126 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3639556520.5 | 3615525250 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2825360333 | 2829654958 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2702765709 | 2706402500 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5019640833 | 5029947166 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49951471 | 49544504.5 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3421500 | 3431458 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2074979 | 2061979 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2545666 | 2539292 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6030125 | 6030625 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 343299 | 290543 | 1.18 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 26132666.5 | 25968458 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19030500 | 18987208 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19345021 | 19553000 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39337834 | 39349375 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2467033.5 | 2461393.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54504542 | 54481083 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 81980333 | 83276583.5 | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 173279167 | 179067500 | 0.97 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45606041 | 45593208 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1787396 | 1786750 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1095125 | 1105896 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1559166 | 1574250 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3050791 | 3030083 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 213819 | 212590.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12546291 | 12547583.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9225062.5 | 9224833 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9642333.5 | 9634083 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19019500 | 19000834 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1532922 | 1540163 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17668667 | 17638166.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14332167 | 14340604.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14597000 | 14590667 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22175750.5 | 22206375 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70541417 | 70473374.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43674667 | 43776770.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39704500 | 39709833 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132649271 | 132558937.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1938611 | 1941367.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 361084062.5 | 360278666 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 347061583.5 | 348229917 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 305013375 | 304100458 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 723885708 | 724915208 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13388921 | 13369766.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 425519667 | 420187917 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 427658750 | 421516541 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 736440729.5 | 709368500.5 | 1.04 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 715989083 | 716565750 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1596542 | 1595104.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1135916 | 1155875 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1138166.5 | 1141042 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2412708 | 2398709 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 587435 | 589817.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8847312 | 8847459 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13684021 | 13657208.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 32863792 | 33215000 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9875083 | 9861333.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1416297.5 | 1439685.5 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 16549687.5 | 16590416.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 22946333.5 | 23377417 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 47499854 | 48606521 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 13135792 | 13167729 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 827646 | 827750 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 514125 | 572417 | 0.90 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1076104 | 1030375 | 1.04 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725021 | 723625 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47722 | 47526 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1531958 | 1545916 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1005542 | 1016104.5 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1422834 | 1496709 | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2290271 | 2249771.5 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 235161 | 233321.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1550625 | 1542666 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1063666.5 | 1081042 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1456541 | 1462375 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2260042 | 2263999.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3417917 | 3411417 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2065041 | 2072375 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2482708 | 2519729 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6009500 | 6003959 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 284432 | 286692.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24080042 | 24108583 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17195500 | 17195125 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17121125 | 17134396 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37501854 | 37539229 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2416353 | 2409084 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52890167 | 52901854 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 84990875 | 84174146 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 173811125 | 176459666.5 | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44527208 | 44549125 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250510875 | 250550666 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148711500 | 148903458 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116106354 | 116179937.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447706104 | 447642729 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5473947 | 5446946 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1104910333 | 1105006292 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 852696229 | 855140500 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 828124666.5 | 828516104.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1753883208 | 1750786541 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 29129663 | 28975639.5 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1027987062.5 | 1027977458.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 967528166 | 971875209 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1323494083.5 | 1322670249.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1721562854.5 | 1720633478.5 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1199000 | 1103938 | 1.09 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 722000 | 681708 | 1.06 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 723333.5 | 780771 | 0.93 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2059938 | 2053729 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 566089.5 | 565890 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5883354 | 5865542 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 9012521 | 8901292 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 26898459 | 27017958 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7112042 | 7111541 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1371381.5 | 1352222 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9684083 | 9684916.5 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 16051250 | 16130125 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 33056542 | 34128937.5 | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7626499.5 | 7630583 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 522916.5 | 519541 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 390125.5 | 425396 | 0.92 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 3390917 | 2668499.5 | 1.27 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 89292 | 88542 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28324 | 27675 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 380812.5 | 379958 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 444875 | 444000 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 5040083.5 | 4753729.5 | 1.06 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 259041 | 258583 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 219450.5 | 218086 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 411083 | 413521 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 475270.5 | 474959 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4889250 | 4525333 | 1.08 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 271084 | 273666 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 465208.5 | 466645.5 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 318584 | 359083.5 | 0.89 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 778771 | 903250 | 0.86 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 54354.5 | 53375 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28220 | 27956 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 340333 | 339958.5 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 341958 | 341541 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 734125 | 663834 | 1.11 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151417 | 151770.5 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 205814.5 | 204249 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 351792 | 354416 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 356604.5 | 356167 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 935583 | 638708.5 | 1.46 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151000 | 151083 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 606312458 | 601408417 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 430997020.5 | 429624958.5 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 382921125 | 381216438 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 871105000 | 871782875 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7038469 | 7027859.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2005974042 | 1999886687.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1610239562.5 | 1620871562.5 | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1558401520.5 | 1551986813 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2631627625 | 2627061917 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26000726 | 26164340 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 539604 | 532083 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 396875 | 394416.5 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 3106167 | 2876833 | 1.08 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 866292 | 866292 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47775 | 47203 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1813250 | 1837916 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1736667 | 1743166.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16480542 | 16426958 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2648000 | 2767000 | 0.96 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 246886 | 245777 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1867042 | 1958146 | 0.95 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 1816500 | 1839417 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16523458 | 16382583 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2741770.5 | 2793000 | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1439604.5 | 1483854.5 | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 934625 | 1015145.5 | 0.92 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1053375.5 | 1027417 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2331625 | 2341250 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 580680 | 585975.5 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5896895.5 | 5894542 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 8530979 | 8413959 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 26479875.5 | 25732583.5 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7269958 | 7339459 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1365923.5 | 1337532.5 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11687917 | 11684084 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 18462792 | 18235479.5 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 39354708.5 | 38621167 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9551562.5 | 9550958 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 4541.5 | 2854.5 | 1.59 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 3000 | 2542 | 1.18 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3333 | 3458 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 4750 | 2833 | 1.68 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25041 | 24549 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7333.5 | 7208.5 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7208 | 7292 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7187.5 | 7312.5 | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7208 | 7166 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 213760.5 | 208459 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8500 | 8416 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8333 | 8583 | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8459 | 8542 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6167 | 6042 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10375 | 10604 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13833 | 13125 | 1.05 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 11229.5 | 11375 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 9250 | 7750 | 1.19 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25667 | 24844 | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 20041 | 19833 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 19917 | 20250 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 20083 | 20125 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 19584 | 19875 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 233795.5 | 227881 | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23833 | 23583 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23541.5 | 23959 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 23750 | 23916 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21333 | 21250 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28542 | 28750 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28542 | 28625 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28750 | 28834 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46083 | 46792 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26413 | 25949 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 227625 | 227833.5 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 277333 | 271458 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 3752584 | 4319479 | 0.87 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145792 | 145375 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 215287 | 205742 | 1.05 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 246083 | 247458.5 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 294959 | 289334 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4140167 | 4121125 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145458 | 145417 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 3875 | 2250 | 1.72 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1792 | 2125 | 0.84 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2291.5 | 2292 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1958 | 2084 | 0.94 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23326 | 22754 | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5333 | 5458 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5125 | 5209 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5250 | 5375 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5125 | 5250 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 246332 | 246015 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 7625 | 7625 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7416 | 7584 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7770.5 | 7625 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5250 | 5209 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 80124625 | 79946667 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 47921000 | 47907313 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 43331166.5 | 43314937.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151470167 | 151418166 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2687344 | 2713892 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 672319791 | 607476458 | 1.11 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 413871833 | 410727459 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 397456333.5 | 397021709 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 687252833 | 682137042 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14598552.5 | 14584860 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 695248479.5 | 713299583 | 0.97 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 677318208 | 679401291 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 996212291 | 1004795584 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 997847458 | 1000525250 | 1.00 |
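The Ratio column is consistent with Current ÷ Previous for these smaller-is-better timings. For example, 4541.5 / 2854.5 ≈ 1.59 for `Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)`, so a value above 1 means the new commit was slower. The sketch below recomputes that column and flags large slowdowns. It is illustrative only. The helper names `ratio` and `flag_slowdowns` and the 1.5 cutoff are hypothetical, and they are not part of Lux or github-action-benchmark.

```python
# Hypothetical helpers (not part of Lux or github-action-benchmark).
# Assumption: Ratio = current / previous for timings in ns (smaller is better),
# rounded to two decimals; a value above 1.0 means the current commit is slower.


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals."""
    return round(current_ns / previous_ns, 2)


def flag_slowdowns(rows, threshold=1.5):
    """Return benchmark names whose ratio exceeds `threshold` (hypothetical cutoff)."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > threshold]


# Three rows copied from the table above: (name, current ns, previous ns).
rows = [
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)", 4541.5, 2854.5),
    ("Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)", 935583, 638708.5),
    ("Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)", 1792, 2125),
]

for name, cur, prev in rows:
    print(f"{name}: {ratio(cur, prev)}")
print(flag_slowdowns(rows))
```

The recomputed ratios (1.59, 1.46 and 0.84) match the table. With the 1.5 cutoff, only the first row is flagged. Small kernels such as `Dense(16 => 16, ...)` run in a few microseconds, so their ratios tend to be noisier than those of the large convolution benchmarks.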
This comment was automatically generated by workflow using github-action-benchmark.