0af6fd2
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/115730
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
To add them here just re-invoke and the PR will be updated.
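As an illustration of the tip above, a re-invocation comment with release notes might look like the sketch below. The `@JuliaRegistrator register` command and the `Release notes:` marker are taken from the text above; the heading and bullet beneath them are placeholder content, not the notes for this release.

```
@JuliaRegistrator register

Release notes:

## Breaking changes

- (placeholder) describe the change here
```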
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:
Lux Benchmarks
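Each entry in this report gives two timings in nanoseconds followed by a ratio. The ratios are consistent with dividing the first timing by the second and rounding to two decimals. A minimal Python sketch of that check, using three rows copied from the report, follows; the helper name `ratio` is illustrative and not part of any tool.

```python
def ratio(first_ns: float, second_ns: float) -> str:
    """Return first/second formatted to two decimals, as the report appears to do."""
    return f"{first_ns / second_ns:.2f}"


# Three rows copied from the report: (first timing, second timing) in ns.
rows = {
    "Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)": (414541, 415000),
    "Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA": (41982, 43793),
    "Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)": (1346458, 1280333),
}

for name, (first_ns, second_ns) in rows.items():
    print(name, ratio(first_ns, second_ns))
```

Under this reading, a ratio below 1 means the first timing is smaller than the second, and a ratio above 1 means it is larger.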
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 414541 ns | 415000 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 243125 ns | 244167 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 243417 ns | 243917 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 739334 ns | 740083 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 41982 ns | 43793 ns | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1346458 ns | 1280333 ns | 1.05 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1259125 ns | 1268791 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16338395.5 ns | 16455125 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2194459 ns | 2193625.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 189479 ns | 205231 ns | 0.92 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1355250 ns | 1311917 ns | 1.03 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 1262583 ns | 1301792 ns | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 16497166 ns | 16522625 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2228292 ns | 2229625 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1762584 ns | 1672666 ns | 1.05 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1092041 ns | 1078166 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1568312.5 ns | 1511041.5 ns | 1.04 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2959792 ns | 2994458 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 205112 ns | 207884 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12142250.5 ns | 12154146 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8835812 ns | 8856791 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9242459 ns | 9297792 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18585833 ns | 18579708 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1492755 ns | 1492665 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17309458 ns | 17297396 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13985458 ns | 13998833 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14508333.5 ns | 14511000 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21837437.5 ns | 21839416 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250564062.5 ns | 250544729 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148884708 ns | 148581208 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116487583.5 ns | 116355916.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447218375 ns | 447348667 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5458403 ns | 5449372 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1224183333 ns | 1226769166 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 933732541 ns | 930331417 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 831622729.5 ns | 829560312.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1635560416 ns | 1631272125 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 35425612 ns | 31620503.5 ns | 1.12 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1152095666 ns | 1143568125 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 993335854.5 ns | 993275583.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1312403208.5 ns | 1332092333.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1733303125 ns | 1732940916.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1117750 ns | 1119875 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1598750.5 ns | 1650333 ns | 0.97 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3754208 ns | 3433334 ns | 1.09 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 783083.5 ns | 782354 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 257385.5 ns | 263984.5 ns | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2990333 ns | 2986166 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4119917 ns | 4134521 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 11230812 ns | 9684479 ns | 1.16 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3197229 ns | 3141166 ns | 1.02 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1046508.5 ns | 1099110 ns | 0.95 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2301708 ns | 2222125 ns | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1307291 ns | 1310979 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1562625 ns | 1561042 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4228312.5 ns | 4207458 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 207810 ns | 208127 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19397854 ns | 19407062.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16093709 ns | 16092937.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17349000 ns | 17317479 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25897042 ns | 25877354.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1578941.5 ns | 1588570 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 33935833 ns | 34283042 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30962375 ns | 31029667 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31278499.5 ns | 31324334 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 37009375 ns | 36972625 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4535625.5 ns | 4535728.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2541958 ns | 2550437.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2676729 ns | 2682521 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8399042 ns | 8376542 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 421981 ns | 420059 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38713291 ns | 38787729 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32067250 ns | 32133646 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32235000 ns | 32252916 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 52021374.5 ns | 51916459 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2612398 ns | 2624143 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89340500 ns | 88908791 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 115182063 ns | 114840750 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 226080500 ns | 227998375 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74787250 ns | 74777479 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268783625 ns | 269000958 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 156455958 ns | 156605625 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 123498687 ns | 123282250 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485358792 ns | 485266417 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 6886357 ns | 7007944 ns | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1473244563 ns | 1477600500.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1170669416 ns | 1177860417 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1068236875 ns | 1059255604.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2007699500 ns | 2001527437.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33116264.5 ns | 34509709 ns | 0.96 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1722112916 ns | 1725457125 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1526979229.5 ns | 1535708771 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1854066584 ns | 1892793750 ns | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2208474333 ns | 2208396292 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2098500 ns | 2072875 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3028958 ns | 3011791 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 9208917 ns | 8320459 ns | 1.11 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2506917 ns | 2450499.5 ns | 1.02 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 250413 ns | 268533.5 ns | 0.93 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9640708.5 ns | 9519292 ns | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12027146 ns | 12095020.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 23902417 ns | 24991500 ns | 0.96 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11729666.5 ns | 11770084 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1076622 ns | 1173232 ns | 0.92 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 381127000 ns | 383052437.5 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 309063896 ns | 311828042 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 262337458 ns | 269993541.5 ns | 0.97 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453768395.5 ns | 452443833.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4853131.5 ns | 4865362.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1158643375 ns | 1155538583 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 938969625 ns | 936810083 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 963762041 ns | 959183583 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1577954500 ns | 1397577000 ns | 1.13 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 18263437 ns | 19191910 ns | 0.95 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1057541 ns | 1053520.5 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1646584 ns | 1668459 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5451187.5 ns | 5692083 ns | 0.96 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1369833 ns | 1396104.5 ns | 0.98 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 249867.5 ns | 270444.5 ns | 0.92 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6519125 ns | 6494584 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13095375 ns | 13134333 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 18952875 ns | 19522667 ns | 0.97 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5956188 ns | 6062833 ns | 0.98 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1117980 ns | 1205114.5 ns | 0.93 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70611125 ns | 70593167 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43793833 ns | 43687500 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39731937.5 ns | 39756500 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132592666.5 ns | 132546521 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1854679 ns | 1861025.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356226125 ns | 356256979 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270572708 ns | 270180000 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 254771104 ns | 253147750 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534785021 ns | 535028854 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12169343.5 ns | 12303646 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 397173792 ns | 400021667 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 374448417 ns | 374059625 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 694189709 ns | 723689958.5 ns | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 713602000 ns | 712462250 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1192048542 ns | 1195955667 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 831171854.5 ns | 833640041.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 640669937.5 ns | 641220229.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1863728709 ns | 1769113729 ns | 1.05 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12532816 ns | 12497145 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3618000479 ns | 3639556520.5 ns | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2826885834 ns | 2825360333 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2713615750 ns | 2702765709 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5011045750 ns | 5019640833 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 50402599.5 ns | 49951471 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3419875 ns | 3421500 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2065250 ns | 2074979 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2533333.5 ns | 2545666 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6025000 ns | 6030125 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 337456.5 ns | 343299 ns | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 26007750 ns | 26132666.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18963500 ns | 19030500 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19385500.5 ns | 19345021 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39322542 ns | 39337834 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2456325 ns | 2467033.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54524645.5 ns | 54504542 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 82936812.5 ns | 81980333 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 182520645.5 ns | 173279167 ns | 1.05 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45591687.5 ns | 45606041 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1779791 ns | 1787396 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1093417 ns | 1095125 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1588750 ns | 1559166 ns | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3034417 ns | 3050791 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208871.5 ns | 213819 ns | 0.98 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12541750 ns | 12546291 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9203791.5 ns | 9225062.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9610729 ns | 9642333.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18987500 ns | 19019500 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1509397.5 ns | 1532922 ns | 0.98 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17660375 ns | 17668667 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14333917 ns | 14332167 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14545395.5 ns | 14597000 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22202979 ns | 22175750.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70525542 ns | 70541417 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43667833 ns | 43674667 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39614000 ns | 39704500 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132626625 ns | 132649271 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1873199.5 ns | 1938611 ns | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360567208 ns | 361084062.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 348361750 ns | 347061583.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 305569375 ns | 305013375 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 726844292 ns | 723885708 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13232795 ns | 13388921 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 421270563 ns | 425519667 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 430893167 ns | 427658750 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 718848354 ns | 736440729.5 ns | 0.98 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 717686250 ns | 715989083 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1569104 ns | 1596542 ns | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1153229.5 ns | 1135916 ns | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1137979 ns | 1138166.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2453021 ns | 2412708 ns | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 547553.5 ns | 587435 ns | 0.93 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8870542 ns | 8847312 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13929562.5 ns | 13684021 ns | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 34331208.5 ns | 32863792 ns | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9863166.5 ns | 9875083 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1211835 ns | 1416297.5 ns | 0.86 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 16622562.5 ns | 16549687.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 22981042 ns | 22946333.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 47608083 ns | 47499854 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 13135583 ns | 13135792 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 823083.5 ns | 827646 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 519667 ns | 514125 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1027437.5 ns | 1076104 ns | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725812.5 ns | 725021 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 45006 ns | 47722 ns | 0.94 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1550396 ns | 1531958 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1010312 ns | 1005542 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1368833 ns | 1422834 ns | 0.96 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2294958 ns | 2290271 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 206718.5 ns | 235161 ns | 0.88 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1539958 ns | 1550625 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1027083 ns | 1063666.5 ns | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1434958.5 ns | 1456541 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2262000 ns | 2260042 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3395958.5 ns | 3417917 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2043562.5 ns | 2065041 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2510645.5 ns | 2482708 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6010208 ns | 6009500 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 278141 ns | 284432 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24077834 ns | 24080042 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17174000 ns | 17195500 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17077208 ns | 17121125 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37552104.5 ns | 37501854 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2386026 ns | 2416353 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52856250.5 ns | 52890167 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 85034770.5 ns | 84990875 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 179760229.5 ns | 173811125 ns | 1.03 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44607791.5 ns | 44527208 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250724750 ns | 250510875 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148549708 ns | 148711500 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116156104 ns | 116106354 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448130375 ns | 447706104 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5468960 ns | 5473947 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1101806791 ns | 1104910333 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 854968041.5 ns | 852696229 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 832535437.5 ns | 828124666.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1749893542 ns | 1753883208 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33175899 ns | 29129663 ns | 1.14 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1028461562.5 ns | 1027987062.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 965583583 ns | 967528166 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1298573500 ns | 1323494083.5 ns | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1729774999.5 ns | 1721562854.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1199187.5 ns | 1199000 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 679771 ns | 722000 ns | 0.94 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 679125 ns | 723333.5 ns | 0.94 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1968791.5 ns | 2059938 ns | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 548897.5 ns | 566089.5 ns | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5868125 ns | 5883354 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 8863000 ns | 9012521 ns | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 26024792 ns | 26898459 ns | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7119500 ns | 7112042 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1215129 ns | 1371381.5 ns | 0.89 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9672145.5 ns | 9684083 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 16106084 ns | 16051250 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 34274604 ns | 33056542 ns | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7600542 ns | 7626499.5 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 517041 ns | 522916.5 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 378770.5 ns | 390125.5 ns | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2622459 ns | 3390917 ns | 0.77 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 89167 ns | 89292 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 25879 ns | 28324 ns | 0.91 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 380083.5 ns | 380812.5 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 442125 ns | 444875 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4454792 ns | 5040083.5 ns | 0.88 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258584 ns | 259041 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 187434 ns | 219450.5 ns | 0.85 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 411084 ns | 411083 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 472750 ns | 475270.5 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4766688 ns | 4889250 ns | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 270916 ns | 271084 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 468292 ns | 465208.5 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 318292 ns | 318584 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 735291 ns | 778771 ns | 0.94 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 52750 ns | 54354.5 ns | 0.97 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 25952 ns | 28220 ns | 0.92 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 338166.5 ns | 340333 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 339291 ns | 341958 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 517208.5 ns | 734125 ns | 0.70 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151625 ns | 151417 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 180334.5 ns | 205814.5 ns | 0.88 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 352187.5 ns | 351792 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 354833 ns | 356604.5 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 918834 ns | 935583 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 150875 ns | 151000 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 603168958 ns | 606312458 ns | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 428255645.5 ns | 430997020.5 ns | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 387565458 ns | 382921125 ns | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 872582458 ns | 871105000 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7022166.5 ns | 7038469 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2000369604.5 ns | 2005974042 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1635627395.5 ns | 1610239562.5 ns | 1.02 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1619963354.5 ns | 1558401520.5 ns | 1.04 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2622232458 ns | 2631627625 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26705505 ns | 26000726 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 521375 ns | 539604 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 396229.5 ns | 396875 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 2919542 ns | 3106167 ns | 0.94 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 865354.5 ns | 866292 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 45399 ns | 47775 ns | 0.95 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1819958 ns | 1813250 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1772584 ns | 1736667 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16344771.5 ns | 16480542 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2757458 ns | 2648000 ns | 1.04 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 216751.5 ns | 246886 ns | 0.88 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1889750 ns | 1867042 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 1865354 ns | 1816500 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16504208.5 ns | 16523458 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2783666.5 ns | 2741770.5 ns | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1349166 ns | 1439604.5 ns | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 935416 ns | 934625 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1011125 ns | 1053375.5 ns | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2227541 ns | 2331625 ns | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 544870 ns | 580680 ns | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5903542 ns | 5896895.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 8786042 ns | 8530979 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 26388917 ns | 26479875.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7336208 ns | 7269958 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1179081.5 ns | 1365923.5 ns | 0.86 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11703291.5 ns | 11687917 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 18250333 ns | 18462792 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 38934750 ns | 39354708.5 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9538708 ns | 9551562.5 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2625 ns | 4541.5 ns | 0.58 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2875 ns | 3000 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3583 ns | 3333 ns | 1.08 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2584 ns | 4750 ns | 0.54 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 22526 ns | 25041 ns | 0.90 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7187.5 ns | 7333.5 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7334 ns | 7208 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7208 ns | 7187.5 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7333 ns | 7208 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 179733.5 ns | 213760.5 ns | 0.84 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8250 ns | 8500 ns | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8208.5 ns | 8333 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8500 ns | 8459 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 5958 ns | 6167 ns | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10542 ns | 10375 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13063 ns | 13833 ns | 0.94 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10625 ns | 11229.5 ns | 0.95 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7208 ns | 9250 ns | 0.78 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 22650 ns | 25667 ns | 0.88 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 19917 ns | 20041 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 19750 ns | 19917 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 19958 ns | 20083 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 20042 ns | 19584 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 194280 ns | 233795.5 ns | 0.83 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23500 ns | 23833 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23458 ns | 23541.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 23750 ns | 23750 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21250 ns | 21333 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28958 ns | 28542 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28334 ns | 28542 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28459 ns | 28750 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46209 ns | 46083 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 24228 ns | 26413 ns | 0.92 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 229125 ns | 227625 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 279750 ns | 277333 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4047542 ns | 3752584 ns | 1.08 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145125 ns | 145792 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 188066 ns | 215287 ns | 0.87 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 246875 ns | 246083 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 297500 ns | 294959 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4136041 ns | 4140167 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145375 ns | 145458 ns | 1.00 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)
2000
ns3875
ns0.52
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)
2000
ns1792
ns1.12
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)
2604.5
ns2291.5
ns1.14
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)
1833
ns1958
ns0.94
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA
21214
ns23326
ns0.91
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)
5083.5
ns5333
ns0.95
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)
5333
ns5125
ns1.04
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)
5167
ns5250
ns0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)
5375
ns5125
ns1.05
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA
230179
ns246332
ns0.93
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)
7375
ns7625
ns0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)
7416
ns7416
ns1
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)
7500
ns7770.5
ns0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)
5125
ns5250
ns0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
80141959
ns80124625
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
47888229
ns47921000
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
43255041.5
ns43331166.5
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
151557417
ns151470167
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA
2714800
ns2687344
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
606608917
ns672319791
ns0.90
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
409612958
ns413871833
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
397283458.5
ns397456333.5
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
684454375
ns687252833
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
17052645
ns14598552.5
ns1.17
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
715895500
ns695248479.5
ns1.03
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
683149083
ns677318208
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
989952709
ns996212291
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
998157291
ns997847458
ns1.00
This comment was automatically generated by workflow using github-action-benchmark.