Commit
docs: clarify line about "not saving the model" (#965)
* Remove line about "not saving the model"
  Not sure what this is but it seems counterintuitive. Feel free to reject or modify.
* Update examples/SimpleRNN/main.jl
aabeafb
Lux Benchmarks
| Benchmark suite | Current | Previous | Ratio |
|---|---|---|---|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 414750 ns | 415083 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 243291 ns | 243562.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 243812 ns | 243917 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740250 ns | 740187.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 44241.5 ns | 43145 ns | 1.03 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1300541 ns | 1349145.5 ns | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1208729.5 ns | 1217021 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16389813 ns | 16523666 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2193750.5 ns | 2260375 ns | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 204422 ns | 198205.5 ns | 1.03 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1351250 ns | 1319125 ns | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 1304500 ns | 1304979 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 16451042 ns | 16162208.5 ns | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2234083 ns | 2198917 ns | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1778229 ns | 1670458 ns | 1.06 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1089166 ns | 1107375 ns | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1517729.5 ns | 1527771 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2822937.5 ns | 3019125 ns | 0.94 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 213434.5 ns | 211316 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12152042 ns | 12175041 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8825458 ns | 8824145.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9241187.5 ns | 9233625 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18620000 ns | 18591583 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1927990 ns | 1930057 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17288833 ns | 17307313 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13973166 ns | 13969291.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14514209 ns | 14519583 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21820500.5 ns | 21863458 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250505834 ns | 250175667 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148891667 ns | 148788625 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116479042 ns | 116216917 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 446831708 ns | 446783750 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5489940.5 ns | 5483992 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1222461916 ns | 1221582792 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 933470000 ns | 934823708 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 831062083.5 ns | 825393979 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1628130417 ns | 1634434500 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31351227 ns | 31104295 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1143937584 ns | 1147938166 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 996799687.5 ns | 996908396 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1394632583 ns | 1315038312.5 ns | 1.06 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1729939125 ns | 1733258437.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1124916 ns | 1124250 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1654938 ns | 1648541.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3728958.5 ns | 3458500 ns | 1.08 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 783791 ns | 790708 ns | 0.99 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 271605.5 ns | 276890 ns | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2988375 ns | 2989917 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4138125 ns | 4140375 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 9730708 ns | 10581541.5 ns | 0.92 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3139396 ns | 3136958 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1090612 ns | 1129684 ns | 0.97 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2394854 ns | 2390166 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1356083 ns | 1353000 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1583167 ns | 1581708 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4315958 ns | 4332708 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210126 ns | 210207 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 20322937.5 ns | 20303291.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16992041 ns | 16973958 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 18180104.5 ns | 18209958 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 26773500 ns | 26748042 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 2003457 ns | 2004316 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 46196000 ns | 44366000 ns | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 41023041 ns | 40975041.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 41211229 ns | 41237167 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 47740229.5 ns | 47733416.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4671645.5 ns | 4673042 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2597375 ns | 2607958 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2731125 ns | 2740083 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8666750 ns | 8646250 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 472288 ns | 471597 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 40564771 ns | 40513208 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 33968749.5 ns | 33898583 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 34077228.5 ns | 34004896 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 53636000 ns | 53682375 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3237384.5 ns | 3025195 ns | 1.07 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 114136958 ns | 109957125 ns | 1.04 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 137210958 ns | 136423624.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 252781791.5 ns | 249203917 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 96450834 ns | 96417375 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 270801125 ns | 270485625 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 157803395.5 ns | 157422417 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 125472208.5 ns | 125021063 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 489251916 ns | 489717917 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7025130.5 ns | 6887253.5 ns | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1502032791.5 ns | 1500178749.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1212619792 ns | 1209776166 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1094094249.5 ns | 1101673604 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2035942062.5 ns | 2033012896.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34741051 ns | 34855481.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 2092986416.5 ns | 2031056270.5 ns | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1853923708 ns | 1850536958 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 2191693458 ns | 2173376541.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2559907833 ns | 2563569208 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2043042 ns | 2043208 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3065417 ns | 3056708 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7640000 ns | 8256479.5 ns | 0.93 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2471937.5 ns | 2476666 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 271311 ns | 276146.5 ns | 0.98 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9382958 ns | 9654583 ns | 0.97 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 11975979 ns | 12054625 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 23565167 ns | 24288042 ns | 0.97 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11755708 ns | 11746854.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1167532 ns | 1181147.5 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 380391500 ns | 381419291.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 309372541.5 ns | 308744166.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 258576062.5 ns | 262197666.5 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453610708.5 ns | 453805292 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4830626 ns | 4853504 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1155083167 ns | 1144266542 ns | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 977011375 ns | 964566583 ns | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 968797667 ns | 971379334 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1402145750 ns | 1404606542 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16260465 ns | 16465783 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1055895.5 ns | 1058521 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1667979.5 ns | 1665374.5 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 6843083 ns | 6526666 ns | 1.05 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1384584 ns | 1370042 ns | 1.01 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 277409 ns | 274033.5 ns | 1.01 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6359084 ns | 6516541 ns | 0.98 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13140791 ns | 13102708.5 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 19435708 ns | 18363000 ns | 1.06 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6077249.5 ns | 6084354.5 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1208939 ns | 1233343.5 ns | 0.98 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70453792 ns | 70574042 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43702291.5 ns | 43797125 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39707333.5 ns | 39782958.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132609749.5 ns | 132781271 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1874873 ns | 1956000 ns | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 355396771 ns | 355154708.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270399625 ns | 270770334 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 253531375 ns | 254052708 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534771792 ns | 534690875 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13205988 ns | 13245522.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 395708167 ns | 396827750 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 370167417 ns | 372318834 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 693674958.5 ns | 671683959 ns | 1.03 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 713328458 ns | 713207834 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1188384834 ns | 1189840458 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 834836604 ns | 834600270.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 637057646.5 ns | 643996000 ns | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1772765937.5 ns | 1771218270.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12404766 ns | 12386792 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3636048750.5 ns | 3632394041.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2830058583 ns | 2819490917 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2710979542 ns | 2703852750 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5040694000 ns | 5046837084 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49203468 ns | 49275819 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3430792 ns | 3417875 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2072500 ns | 2080042 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2523083 ns | 2540459 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6041625 ns | 6037792 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 576145 ns | 571807.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25980041 ns | 25947667 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18928584 ns | 18971396.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19397250 ns | 19516791.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39276562.5 ns | 39348958.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3194284 ns | 3001343 ns | 1.06 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 55601354.5 ns | 55429166.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 82732437.5 ns | 81557583 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172634208 ns | 172942167 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45521875 ns | 45661541.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1786354 ns | 1786354.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1103937.5 ns | 1106458 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1593375 ns | 1570978.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3039125 ns | 3033375 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 217854.5 ns | 214775.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12554187.5 ns | 12557750 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9216896 ns | 9236583.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9695292 ns | 9630708 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19000833.5 ns | 19044937.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1987678 ns | 1985531 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17658520.5 ns | 17664084 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14335542 ns | 14332709 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14581459 ns | 14595146 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22170084 ns | 22201042 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70566666 ns | 70526583 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43756000 ns | 43708417 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39751249.5 ns | 39735812.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132626292 ns | 132615771 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1957247.5 ns | 1938634 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360065750 ns | 360222063 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 348987521 ns | 348659791.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 305633709 ns | 302374833.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 727326125 ns | 727881666 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14293471 ns | 14325162 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 419491583.5 ns | 419531958.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 436122125 ns | 434088375 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 697189791.5 ns | 691688416.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 717699792 ns | 717541625 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1670125 ns | 1673625 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1362354 ns | 1384958 ns | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1381854 ns | 1378083 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2542833 ns | 2664374.5 ns | 0.95 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 588475 ns | 568730 ns | 1.03 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 9263749.5 ns | 9240188 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 15705708 ns | 14792541.5 ns | 1.06 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 32641083.5 ns | 32052875 ns | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 10211375 ns | 10208834 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1444503 ns | 1422888 ns | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17200333 ns | 22285625 ns | 0.77 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 28628708 ns | 28463000 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 55361125.5 ns | 56517729 ns | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 18888770.5 ns | 18854687.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 694124.5 ns | 699792 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 597229 ns | 644209 ns | 0.93 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1060208 ns | 1065062.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 728834 ns | 728292 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47763 ns | 47086.5 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1471625 ns | 1513416 ns | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1008083.5 ns | 1010604 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1445375 ns | 1606083 ns | 0.90 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2292479 ns | 2291666 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 233511 ns | 226725.5 ns | 1.03 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1548708 ns | 1516750.5 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1059208 ns | 1076208 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1684000 ns | 1449125 ns | 1.16 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2239812.5 ns | 2256125 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3399917 ns | 3415417 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2063416.5 ns | 2053167 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2497542 ns | 2513229.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6012541 ns | 6017583.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 573050 ns | 568598 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24040667 ns | 24077208 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17197333 ns | 17182291.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17115834 ns | 17150417 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37555250 ns | 37549833 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3173627 ns | 2938820 ns | 1.08 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 53697479 ns | 53630958.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 82955166.5 ns | 81466625 ns | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172570500 ns | 169486084 ns | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44541583 ns | 44624500 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250473458 ns | 250522209 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148642041 ns | 148626708 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116173562.5 ns | 116110708.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448035771 ns | 447858917 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5470189 ns | 5427690.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1105702334 ns | 1104123000 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 856343812.5 ns | 859505875 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 831890979.5 ns | 829538646 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1754046583 ns | 1754815708 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28859350.5 ns | 28735758 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1019485791.5 ns | 1018403979.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 978846583 ns | 983568208 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1297118667 ns | 1335719333 ns | 0.97 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1724027750 ns | 1728379395.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1083542 ns | 1082292 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 795958 ns | 764959 ns | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 685709 ns | 682709 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2058417 ns | 2044125 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 568330.5 ns | 554259.5 ns | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5923958 ns | 5934375 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 9029417 ns | 9162896 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 26532333 ns | 26061854.5 ns | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 6412458 ns | 7111479 ns | 0.90 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1397840 ns | 1357512.5 ns | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9312833.5 ns | 9683542 ns | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 16060416.5 ns | 16162959 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 33894583 ns | 33355375 ns | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7611459 ns | 7620375 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 386041 ns | 388541 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 458083.5 ns | 518208.5 ns | 0.88 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 3025042 ns | 3052583 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 89917 ns | 89500 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28382 ns | 27832 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 405854 ns | 404666 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 456708 ns | 454791 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4404625 ns | 4601375 ns | 0.96 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 273541 ns | 280000 ns | 0.98 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 217486 ns | 213087 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 713041 ns | 677583 ns | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 728791 ns | 726708.5 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 5034583 ns | 4653542 ns | 1.08 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 510750 ns | 522959 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 330208.5 ns | 334437.5 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 387375 ns | 451521 ns | 0.86 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 759854 ns | 774437.5 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 54333.5 ns | 52833 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28377 ns | 28056 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 354479.5 ns | 352584 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 337167 ns | 333875 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 889417 ns | 902834 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151333 ns | 151959 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 204706.5 ns | 199603.5 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 368875 ns | 367333 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 351334 ns | 348125 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 517084 ns | 945562.5 ns | 0.55 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151020.5 ns | 151375 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 601511250 ns | 601502916 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 431188729 ns | 430191604 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 393417333.5 ns | 390437000 ns | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 872312292 ns | 871755417 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7628702 ns | 7623148 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1997345666 ns | 1994407979.5 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1639731729 ns | 1636880541.5 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1653667375 ns | 1572982645.5 ns | 1.05 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2659433292 ns | 2658913333 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26464110 ns | 26625956 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 527583 ns | 525833 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 403020.5 ns | 401229.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 2740959 ns | 2770750 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 874042 ns | 872645.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47791.5 ns | 46979 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1937458 ns | 1876563 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1800375 ns | 1830166.5 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16241167 ns | 16303459 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2831458 ns | 2794834 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 247132 ns | 240187.5 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 3023000 ns | 2919520.5 ns | 1.04 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 5005208 ns | 5015167 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16718208 ns | 16524271 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3710916.5 ns | 3743292 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1477209 ns | 1368417 ns | 1.08 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 948500 ns | 979958 ns | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 932209 ns | 930917 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2349125 ns | 2342208.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 589164 ns | 565552 ns | 1.04 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5873625 ns | 5910334 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 8384708 ns | 8430229 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 25818812.5 ns | 25837625 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7145417 ns | 7325812 ns | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1381067 ns | 1327441 ns | 1.04 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11656666 ns | 11696354 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 17362667 ns | 18020208.5 ns | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 36263250 ns | 39373729 ns | 0.92 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9526166.5 ns | 9553833 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2916 ns | 2459 ns | 1.19 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2708 ns | 2416 ns | 1.12 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3500 ns | 2792 ns | 1.25 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2667 ns | 4583 ns | 0.58 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25448 ns | 24428 ns | 1.04 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7250 ns | 7291 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7250 ns | 6958 ns | 1.04 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7334 ns | 7333 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7167 ns | 6750 ns | 1.06 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 210626 ns | 200289 ns | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8375 ns | 8416 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8334 ns | 8333 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8458 ns | 8250 ns | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6167 ns | 5625 ns | 1.10 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10250 ns | 10459 ns | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 12459 ns | 12958 ns | 0.96 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10875 ns | 11333.5 ns | 0.96 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7959 ns | 7791 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25816 ns | 24856 ns | 1.04 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 21542 ns | 21709 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 21708 ns | 21459 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 21709 ns | 21792 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 21417 ns | 21167 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 227232.5 ns | 220349.5 ns | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 57500 ns | 53584 ns | 1.07 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 53667 ns | 53583 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 53500 ns | 53770.5 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 51375 ns | 51125 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28834 ns | 28750 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29020.5 ns | 28916 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28667 ns | 28875 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46333 ns | 45875 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 27068 ns | 26054 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 225125 ns | 228541 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 275292 ns | 275333 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 3755229.5 ns | 4217667 ns | 0.89 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145084 ns | 145250 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 206498 ns | 199681 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 241125 ns | 246459 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 293875 ns | 293145.5 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4193250 ns | 4145854 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145771 ns | 145542 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1750 ns | 1959 ns | 0.89 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1833 ns | 2000 ns | 0.92 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2500 ns | 2000 ns | 1.25 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 2000 ns | 1708 ns | 1.17 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 24021 ns | 22940 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5250 ns | 5334 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5250 ns | 5125 ns | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5375 ns | 5166 ns | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5292 ns | 4792 ns | 1.10 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 243494.5 ns | 232790 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 7416 ns | 7417 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7417 ns | 7375 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7500 ns | 7459 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5167 ns | 5250 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 81062125 ns | 81082749.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 48607333 ns | 48527208 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 43732646 ns | 43737084 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 153570458 ns | 153734041 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2719369 ns | 2717702 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 619835291 ns | 621583083 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 427440708 ns | 427560417 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 410233145.5 ns | 412343333.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 698627167 ns | 697842291 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15605483 ns | 15532428 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 906046958 ns | 851105979 ns | 1.06 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 847422938 ns | 840062312.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1164315146 ns | 1156974917 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 1175767687.5 ns | 1177103062.5 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.