Commit
chore: bump compat for DataAugmentation to 0.3 for package DDIM, (keep existing compat) (#877)
Co-authored-by: CompatHelper Julia <compathelper_noreply@julialang.org>
59f83fc
Lux Benchmarks
| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 412520.5 | 414937.5 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 323042 | 322917 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 323583 | 323167 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 752166.5 | 739334 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 44168 | 43603 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1384083 | 1281041.5 | 1.08 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2451854 | 2448000 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 14238812.5 | 14112208.5 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2239125 | 2281500 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 210250 | 209418 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1411875 | 1389292 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 897520.5 | 885541 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 1516292 | 1564334 | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2210229 | 2244666 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1725583 | 1768458 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1017708.5 | 1070292 | 0.95 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1538333 | 1534708 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3006583 | 2945750 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210559 | 210107 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12112667 | 12156916 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8809666.5 | 8795791 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9192709 | 9216583 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18570834 | 18566125 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1504910 | 1491618 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17273542 | 17331499.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13992292 | 13987542 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14538625 | 14472812.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21824875 | 21820291 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 249443729 | 249342958.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148456250 | 148241750 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115795563 | 116015000 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 454024458 | 453798708 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5474002 | 5449223 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1144391209 | 1146266250 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 981113333 | 981704875 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 853440021 | 841522708.5 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1805007208 | 1759323917 | 1.03 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31357343 | 31586701 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1034466750 | 1042907459 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1009660729.5 | 1000416291.5 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1324456604 | 1298076750 | 1.02 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1728354792 | 1737205625 | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1093583 | 1119562.5 | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1583083 | 1622875 | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3678000 | 3548958 | 1.04 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 779625 | 785583 | 0.99 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 273068.5 | 274324 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2985458.5 | 3038000 | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4106125 | 4083437.5 | 1.01 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 10555937 | 11037583 | 0.96 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3131667 | 3144521 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1134574.5 | 1135705 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2275083 | 2308333.5 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1429583 | 1430208 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1656125 | 1667021 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4200438 | 4209459 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210634 | 210374 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19375958 | 19417292 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16086292 | 16085209 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17180583 | 17361667 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25782875 | 25874854.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1606705 | 1598161.5 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34182625 | 34253375 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30811875 | 30840208 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31108104 | 31540625 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36403791 | 36820354 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4540667 | 4533500 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2769500.5 | 2754437.5 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2921250 | 2922958.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8391917 | 8379896 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 423308 | 425541 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 39022250 | 38931833.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32067021 | 32059166 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32250916.5 | 32304958 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51820375 | 51832000 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2657162.5 | 2625272.5 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 88606874.5 | 89088083.5 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 113796125 | 114374167 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 223648041 | 224208209 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74335583.5 | 74528979 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 267029417 | 268596959 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 158942229.5 | 159233333.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 126886229 | 126780333 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 487631541 | 484901875 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 6889435 | 7002114 | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1474300812.5 | 1474144916.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1174433750 | 1144467750 | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1063095500 | 1075737187.5 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2007751479 | 2026289333.5 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34685949 | 34635293 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1689349708 | 1704908667 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1535787500 | 1477917583.5 | 1.04 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1814518792 | 1882348542 | 0.96 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2211056708.5 | 2231847042 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2089187.5 | 2004166.5 | 1.04 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2976458 | 2569125 | 1.16 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7304583 | 6929417 | 1.05 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2476917 | 2435021 | 1.02 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 272072.5 | 267867 | 1.02 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9643854 | 9579229.5 | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12014792 | 11450124.5 | 1.05 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 25647896 | 24113603.5 | 1.06 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11736104 | 11704333 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1173736.5 | 1169053.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 380778209 | 380411333 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 282717792 | 282013417 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 238251708.5 | 241718292 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453270208 | 452199062 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4856475 | 4861447 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1156978917 | 1177026875 | 0.98 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 919622250 | 911798208 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 945107000 | 959270583 | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1428489000 | 1420082458 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17978082 | 18016614 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1021959 | 1506958 | 0.68 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 2001250 | 1619042 | 1.24 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 6008000 | 6078187.5 | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1374000 | 1294916 | 1.06 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 268964 | 267932 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6414395.5 | 6812000 | 0.94 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 12403896 | 13135583 | 0.94 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 20716333 | 19155666 | 1.08 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6079792 | 6056354 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1209955 | 1212499 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70501749.5 | 70511583 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43580771 | 43537375 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39491375 | 39409417 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132802458.5 | 133783708 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1859689 | 1933585.5 | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 384818104 | 381557562.5 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 295632667 | 295764895.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 281694167 | 281324083 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534727063 | 535257270.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12284399.5 | 12290544.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 396068167 | 412505875 | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 409321729.5 | 373209375 | 1.10 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 678917958 | 688004291 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 711312959 | 709404875 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1190798042 | 1186087125 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 688321229 | 688362479 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 630150084 | 626514875 | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1776546083 | 1778854333.5 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12315985 | 12319166.5 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3607588771 | 3506982229 | 1.03 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2756374750 | 2794034750 | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2714951667 | 2699392833 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 4951023834 | 4950907833 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49373771 | 49414957 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3429083.5 | 3424125 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2066792 | 2051500 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2527666 | 2533250 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6016750 | 6031895.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 311191 | 313046 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25518541 | 25554687.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18527417 | 18540916 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18707833 | 18962271 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 38890083 | 38399291 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2479107 | 2470998 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54171458 | 54650500 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 78979625 | 78908438 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 171331479 | 169063625 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45540167 | 45558958 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1785458 | 1786417 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1046062.5 | 1086125 | 0.96 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1583208.5 | 1603104.5 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3024416.5 | 3030083 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 213982 | 214935.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12521375 | 12546292 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9184167 | 9205583.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9599958.5 | 9646125 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18940458 | 18948583 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1538264 | 1529511.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17640750 | 17691958 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14307771 | 14322292 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14507583 | 14657000 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22177500 | 22150500 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70512937 | 70485250 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43444479.5 | 43560250 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39626750 | 39651125 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132598874.5 | 132456250 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1950639 | 1934167.5 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 359565417 | 359706666 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 293550333 | 289803812 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 287837104.5 | 287024520.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 622550708.5 | 620943458 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13384881.5 | 13389118 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 419108729 | 418207750.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 424758959 | 426872417 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 717519375 | 708863792 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 716499833 | 714272667 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1521229 | 1467229.5 | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1235833 | 1164958 | 1.06 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1246625 | 1223792 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2300875 | 2308375 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 587061.5 | 583756 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8812333 | 8755666.5 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12926416 | 12812083 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 30195584 | 31879208 | 0.95 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9787000 | 9792458 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1419851.5 | 1392390 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 18056125 | 17932750 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 16803125 | 17135625 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 29287584 | 29811583 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14378083 | 14460729.5 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 805145.5 | 823083.5 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 589041.5 | 620625 | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1034812.5 | 1022854.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 726750 | 740791 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47938.5 | 47357.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1542875 | 1528750 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1000270.5 | 953917 | 1.05 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1504041 | 1387583 | 1.08 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2294104 | 2279146 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 236494.5 | 233369 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1722687.5 | 1748792 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1250438 | 1258250 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1858854.5 | 1680104 | 1.11 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2311917 | 2337833.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3404416 | 3398000.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2046208 | 2032875 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2516916.5 | 2524750 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6013625 | 5998916 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 285181.5 | 282348 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24021312.5 | 24156145.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17217833 | 17254937.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17101666.5 | 17217979.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37551396 | 37524604.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2407620 | 2399084 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52545812.5 | 52823083 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 80522312.5 | 80975187.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 166982250.5 | 170431562.5 | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44529604 | 44543937.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250184208.5 | 251011396 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 147977833 | 148156125 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115557083.5 | 115824354 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447150583.5 | 454908937.5 | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5457630 | 5336248.5 | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1128644583 | 1130772458 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 881731833.5 | 881484167 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 805115667 | 804587958 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1757118042 | 1745692959 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28927493 | 28847342 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1058828646 | 1027064583.5 | 1.03 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 973248125 | 959640250 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1362518583 | 1261786916 | 1.08 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1744326604 | 1731191479.5 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1317667 | 1173624.5 | 1.12 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 936250 | 906000 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 907396 | 939334 | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2059708 | 2039708.5 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 573972.5 | 570174 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5872667 | 5806917 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6537417 | 7014250 | 0.93 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 24586229.5 | 25017291.5 | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7039792 | 7060041 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1375117 | 1340455.5 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 11464417 | 11530292 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 10266333 | 8850020.5 | 1.16 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 17693667 | 17434458 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8866896 | 8551437.5 | 1.04 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 487208 | 506417 | 0.96 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 474584 | 273625 | 1.73 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2175853.5 | 2396979 | 0.91 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 87541 | 90000 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28408 | 27635 | 1.03 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 383437.5 | 385625 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 444333.5 | 348812.5 | 1.27 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4385583 | 4572979.5 | 0.96 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 268292 | 262125 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 225901 | 220978.5 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 706959 | 707916 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 722500 | 579562.5 | 1.25 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 1069791 | 1057604 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 447125 | 449729 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 432125 | 456750 | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 418166 | 212166 | 1.97 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 742500 | 729000 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 53208 | 54895.5 | 0.97 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28501 | 27483.5 | 1.04 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 338770.5 | 339208 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 338750 | 194896 | 1.74 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 737375 | 864542 | 0.85 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 154208 | 153187.5 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 210566 | 206000 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 404125 | 406291 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 405916.5 | 262083.5 | 1.55 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 983208 | 828042 | 1.19 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 174750 | 173792 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 603527917 | 600740375 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 431057458.5 | 425777500 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 375361437.5 | 373716000 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 872552854 | 873713812.5 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7040620 | 7032511.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1986550813 | 2084258688 | 0.95 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1668902250 | 1651169312.5 | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1651138625 | 1580932771 | 1.04 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2764176416 | 2753232709 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25979788.5 | 26093846 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 521833 | 534708 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 437250 | 428292 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 1710708 | 1851583 | 0.92 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 866062.5 | 866334 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47823 | 46927 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1842562.5 | 1888062.5 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2356875 | 2316896 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14345020.5 | 14585209 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2764166 | 2757583.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 252466.5 | 247984.5 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 2751750 | 2751959 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 2316083 | 2279292 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 4360708 | 3318791.5 | 1.31 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 4727708 | 3395625 | 1.39 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1581500 | 1510000 | 1.05 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1216229.5 | 1177708 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1177645.5 | 1195583 | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2314729 | 2315167 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 547137 | 588506 | 0.93 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5877292 | 5715958.5 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 6745916.5 | 6618896 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24550687.5 | 24170542 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7266312 | 7277583.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1351645 | 1377478 | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 12285333.5 | 12783958 | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 12037124.5 | 11833292 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 20466187 | 19658396.5 | 1.04 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10853417 | 9760416.5 | 1.11 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2500 | 2750 | 0.91 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2750 | 2583 | 1.06 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3416 | 3250 | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 3041 | 4771 | 0.64 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24989 | 24855 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 8333 | 8708 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 8625 | 8500 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 8667 | 8416 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 8770.5 | 8479.5 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 213236.5 | 213745.5 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 16750 | 16583 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 16375 | 16583 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 16792 | 16625 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 10917 | 10709 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10792 | 11791 | 0.92 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 18083 | 16125 | 1.12 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 11666 | 11750 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7666.5 | 7583 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24865.5 | 24983 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 22333 | 22292 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 22291 | 22625 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 22500 | 22416.5 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 22375 | 22541 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 233562.5 | 235287.5 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 52042 | 52250 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 52125 | 52375 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 52270.5 | 52417 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 44000 | 43792 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28979.5 | 29333 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29208 | 28791 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28458 | 29167 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46209 | 46167 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26274 | 26056 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 229062.5 | 209667 | 1.09 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 263041 | 257250 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4056646 | 4075916 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 154437.5 | 147625 | 1.05 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 215509 | 220948.5 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 329834 | 308542 | 1.07 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 292583 | 282917 | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 817500 | 767042 | 1.07 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 161708 | 161708 | 1 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 2041 | 2042 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1833 | 1958 | 0.94 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2750 | 2312.5 | 1.19 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1917 | 1958 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23258 | 22938 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 7208 | 7375 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 7042 | 7250 | 0.97 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 7750 | 7625 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 7125 | 7250 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 267733.5 | 263592.5 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11334 | 11292 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11375 | 11458 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11708 | 11500 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 6958 | 7000 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79930209 | 79852292 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49066500 | 49068812.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 45049708 | 45007187.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151430167 | 151374416 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2719840 | 2720111.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 497512959 | 607847917 | 0.82 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 411297375 | 412172583 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 396546125 | 398297875 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 736651313 | 737514583.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14587409 | 14594549 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 709337374.5 | 713373500 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 664763792 | 665302083 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1022853709 | 1010864625 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 996468292 | 998393833 | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
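
The reported ratios are consistent with the first timing divided by the second. The sketch below reproduces that arithmetic for three entries copied from the results, assuming the first value is the current run and the second is the previous run, which is how github-action-benchmark usually lays out its comparison comments. The function names and the 1.5 threshold are hypothetical and chosen only for illustration; they are not taken from the Lux workflow configuration.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous; values above 1 mean the current run was slower."""
    if previous_ns <= 0:
        raise ValueError("previous timing must be positive")
    return current_ns / previous_ns


def is_slowdown(current_ns: float, previous_ns: float, threshold: float = 1.5) -> bool:
    """Flag an entry whose ratio exceeds a hypothetical alert threshold."""
    return ratio(current_ns, previous_ns) > threshold


# (current ns, previous ns) pairs copied from the results above.
samples = {
    "Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)": (412520.5, 414937.5),
    "Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)": (474584, 273625),
    "lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)": (1021959, 1506958),
}

for name, (current, previous) in samples.items():
    marker = "  <- above threshold" if is_slowdown(current, previous) else ""
    print(f"{name}: {ratio(current, previous):.2f}{marker}")
```

With these inputs the formatted ratios match the reported 0.99, 1.73, and 0.68. Single-run differences on small workloads can be noisy, so a flagged entry is a prompt to re-run the benchmark, not proof of a regression.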