chore(deps): bump crate-ci/typos from 1.25.0 to 1.26.0 (#978)
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.25.0...v1.26.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1e783df
Lux Benchmarks
| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)` | 410479.5 | 412250 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)` | 322979 | 244083 | 1.32 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)` | 243583 | 322041 | 0.76 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s)` | 740125 | 739625 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA` | 43310 | 43576 | 0.99 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)` | 1312625 | 1368688 | 0.96 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)` | 2418334 | 1198625 | 2.02 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)` | 16373020.5 | 13918417 | 1.18 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s)` | 958000 | 929312.5 | 1.03 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA` | 190740 | 190464 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s)` | 1378500 | 1348750 | 1.02 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s)` | 2610979.5 | 1282083 | 2.04 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s)` | 16066041 | 13837312.5 | 1.16 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s)` | 967958 | 987250 | 0.98 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 1773750 | 1655917 | 1.07 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1093875 | 1089000 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1520104 | 1532499.5 | 0.99 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 2458417 | 2439708 | 1.01 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 209499 | 211500 | 0.99 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 12121583 | 12136437.5 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 8834833 | 8847479 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 9223542 | 9240938 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 17972771 | 17956208 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1903079 | 1905747 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 17300562 | 17305250 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 13987625 | 13985416 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 14513146 | 14505584 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 21072834 | 21107833 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 250439208 | 249894083 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 148115625 | 148856208 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 117228750 | 115718875 | 1.01 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 104041542 | 101619125 | 1.02 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 5463821 | 5485492 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1224682250 | 1228009625 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 933837625 | 931338167 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 835803479 | 829169479 | 1.01 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 628560812 | 628483479 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 35032007 | 38151835 | 0.92 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1141719792 | 1134889125 | 1.01 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 983678666.5 | 992066062.5 | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1377974646 | 1309459854 | 1.05 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 746244021 | 745440771 | 1.00 |
| `lenet(28, 28, 1, 32)/forward/CPU/2 thread(s)` | 1114917 | 1092042 | 1.02 |
| `lenet(28, 28, 1, 32)/forward/CPU/4 thread(s)` | 1628542 | 1645709 | 0.99 |
| `lenet(28, 28, 1, 32)/forward/CPU/8 thread(s)` | 4086771 | 3466333 | 1.18 |
| `lenet(28, 28, 1, 32)/forward/CPU/1 thread(s)` | 959792 | 957250 | 1.00 |
| `lenet(28, 28, 1, 32)/forward/GPU/CUDA` | 272035 | 270549.5 | 1.01 |
| `lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s)` | 2981354.5 | 2979042 | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s)` | 4115937.5 | 4110542 | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s)` | 9608958 | 10529229 | 0.91 |
| `lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s)` | 3297500.5 | 3308833 | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/GPU/CUDA` | 1076584 | 1070477 | 1.01 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 2355125 | 2350792 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1453000 | 1364187.5 | 1.07 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1602646 | 1709000 | 0.94 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 3770125 | 3666666.5 | 1.03 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 215196 | 210396 | 1.02 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 20246500 | 20275459 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 16965833.5 | 16981437 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 18330417 | 18162375 | 1.01 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 26150209 | 26198500 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1980657 | 1979369 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 44324250 | 46206895.5 | 0.96 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 41015042 | 41017187.5 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 41295750 | 41176208.5 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 47634416 | 47588917 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 4656667 | 4669000 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2867250 | 2603916 | 1.10 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2754917 | 2999833 | 0.92 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 7179750 | 7252188 | 0.99 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 515735.5 | 517525.5 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 40447166.5 | 40878729.5 | 0.99 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 33885499.5 | 33994250 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 34257187.5 | 33958333 | 1.01 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 51082812.5 | 51263292 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 3174195 | 3013320.5 | 1.05 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 109744583 | 113392541.5 | 0.97 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 135227938 | 136850541 | 0.99 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 270381750 | 250011854.5 | 1.08 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 95391167 | 95314208 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 270563333 | 270234083 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 161054417 | 157676542 | 1.02 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 125340042 | 128100708 | 0.98 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 146582812.5 | 144520145.5 | 1.01 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 7052057 | 7091283 | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1502349770.5 | 1503173291.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 1201703584 | 1201978125 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 1090436625 | 1103595666.5 | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 1030635583 | 1028790125.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 33863530 | 33654931 | 1.01 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 2004525437 | 2089411062.5 | 0.96 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 1793970792 | 1851532083 | 0.97 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 2094682166.5 | 2117297604.5 | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 1594796917 | 1605439208 | 0.99 |
| `lenet(28, 28, 1, 128)/forward/CPU/2 thread(s)` | 1816417 | 2066438 | 0.88 |
| `lenet(28, 28, 1, 128)/forward/CPU/4 thread(s)` | 2535417 | 3005354 | 0.84 |
| `lenet(28, 28, 1, 128)/forward/CPU/8 thread(s)` | 9580729.5 | 7102958.5 | 1.35 |
| `lenet(28, 28, 1, 128)/forward/CPU/1 thread(s)` | 2124083 | 2151875 | 0.99 |
| `lenet(28, 28, 1, 128)/forward/GPU/CUDA` | 265598 | 270072.5 | 0.98 |
| `lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s)` | 9396125 | 9657334 | 0.97 |
| `lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s)` | 11490250 | 11945459 | 0.96 |
| `lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s)` | 25636708 | 23020875 | 1.11 |
| `lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s)` | 10456812.5 | 10467750 | 1.00 |
| `lenet(28, 28, 1, 128)/zygote/GPU/CUDA` | 1095109 | 1095059 | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s)` | 381007729.5 | 381251625 | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s)` | 283558854 | 309062375 | 0.92 |
| `vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s)` | 264714708 | 241236375 | 1.10 |
| `vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s)` | 179954521 | 180294333.5 | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/GPU/CUDA` | 4874412 | 4847355 | 1.01 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s)` | 1154043958 | 1146004375 | 1.01 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s)` | 991918083 | 966522375 | 1.03 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s)` | 1078324541 | 1026283833 | 1.05 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s)` | 668069084 | 662156542 | 1.01 |
| `vgg16(32, 32, 3, 32)/zygote/GPU/CUDA` | 16315510 | 17798543 | 0.92 |
| `lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)` | 1054520.5 | 1050458 | 1.00 |
| `lenet(28, 28, 1, 64)/forward/CPU/4 thread(s)` | 1957562.5 | 1656750 | 1.18 |
| `lenet(28, 28, 1, 64)/forward/CPU/8 thread(s)` | 6624334 | 6491250 | 1.02 |
| `lenet(28, 28, 1, 64)/forward/CPU/1 thread(s)` | 1352146 | 1312792 | 1.03 |
| `lenet(28, 28, 1, 64)/forward/GPU/CUDA` | 267010 | 270319.5 | 0.99 |
| `lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s)` | 6499937.5 | 6504813 | 1.00 |
| `lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s)` | 13781958 | 13132417 | 1.05 |
| `lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s)` | 20923250 | 19754250 | 1.06 |
| `lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s)` | 5707062.5 | 5741521 | 0.99 |
| `lenet(28, 28, 1, 64)/zygote/GPU/CUDA` | 1115597.5 | 1124270 | 0.99 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 70442792 | 70469479 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 43467103.5 | 43706291.5 | 0.99 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 39734999.5 | 39518625 | 1.01 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 35200125 | 35367542 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 1845136 | 1851430 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 356138708 | 356004604 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 270050583 | 270290792 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 254207104 | 254164750 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 271696541.5 | 271950333.5 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 16499812 | 16539357 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 395249958 | 395899500 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 396501625 | 372060292 | 1.07 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 738492916.5 | 713782625 | 1.03 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 447067000 | 447779125 | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s)` | 1189294541 | 1190490459 | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s)` | 689030520.5 | 832670062.5 | 0.83 |
| `vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s)` | 650962625 | 629944291 | 1.03 |
| `vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s)` | 681961562 | 681507396 | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/GPU/CUDA` | 12470086 | 12475051 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s)` | 3681028375 | 3708044854 | 0.99 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s)` | 2822971000 | 2828581542 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s)` | 2698825750 | 2698925958 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s)` | 2121646854.5 | 2137669604.5 | 0.99 |
| `vgg16(32, 32, 3, 128)/zygote/GPU/CUDA` | 49909051 | 49415932 | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 3408458 | 3423125 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2063208 | 2078500 | 0.99 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2518458 | 2518458 | 1 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 4888750 | 4870375 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 580004.5 | 586699.5 | 0.99 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 25958666 | 25989500 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 18964292 | 19069958.5 | 0.99 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 19447166.5 | 19259312 | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 36745416.5 | 36800833 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 3191777 | 2993892 | 1.07 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 55195125 | 54216125 | 1.02 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 81683979.5 | 83642959 | 0.98 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 174851250 | 174413208.5 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 42883916.5 | 42857708.5 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 1788312.5 | 1784458 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1100250 | 1095646 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1558396 | 1575292 | 0.99 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 2464688 | 2364687 | 1.04 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 215197 | 216504.5 | 0.99 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 12518625 | 12531833 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 9205333 | 9200375 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 9628104 | 9626292 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 18331625 | 18391667 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1949026.5 | 1950268 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 17616875 | 17650333.5 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 14310166 | 14301166 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 14557291.5 | 14560250.5 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 21449812.5 | 21506145.5 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 70367541.5 | 70470354 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 43412916.5 | 43665542 | 0.99 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 39742938 | 39582249.5 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 35448542 | 35175625 | 1.01 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 1795063 | 1838843 | 0.98 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 360004208 | 360077895.5 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 346542937 | 349062958.5 | 0.99 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 307664333.5 | 305213917 | 1.01 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 463480458 | 462206583 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 13962488.5 | 13925027 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 418770999.5 | 417720542 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 421592709 | 426193583 | 0.99 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 780166249.5 | 717833375.5 | 1.09 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 393782854 | 394045333.5 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s)` | 1880375 | 1908458 | 0.99 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s)` | 1570562.5 | 1382145.5 | 1.14 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s)` | 1246416.5 | 1574208 | 0.79 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s)` | 2596208.5 | 2658583 | 0.98 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA` | 564741 | 567560 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s)` | 9321042 | 9263291 | 1.01 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s)` | 13025292 | 15741709 | 0.83 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s)` | 33090166 | 30677874.5 | 1.08 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s)` | 6518396.5 | 6782125 | 0.96 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA` | 1351683.5 | 1355856 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s)` | 22256291 | 23068125 | 0.96 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s)` | 27788229 | 28298875 | 0.98 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s)` | 54815104 | 49366125 | 1.11 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)` | 15723000 | 15664541 | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s)` | 660437.5 | 787000 | 0.84 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s)` | 564125.5 | 613416 | 0.92 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s)` | 1067959 | 1014937.5 | 1.05 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s)` | 68833 | 67541.5 | 1.02 |
| `Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA` | 48015 | 47213.5 | 1.02 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s)` | 1518999.5 | 1547187.5 | 0.98 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s)` | 1050917 | 1017917 | 1.03 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s)` | 1571000 | 1412645.5 | 1.11 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s)` | 325084 | 321542 | 1.01 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA` | 216110 | 211309 | 1.02 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s)` | 1555895.5 | 1571042 | 0.99 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s)` | 1060292 | 1020042 | 1.04 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s)` | 1624541 | 1402125.5 | 1.16 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s)` | 374750 | 343812 | 1.09 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 3421708 | 3408000.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2057375 | 2049583.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2472729 | 2491583.5 | 0.99 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 4540646 | 4842271 | 0.94 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 585099 | 580126 | 1.01 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 24053333 | 24112333.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 17186833 | 17188792 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 17114833.5 | 17119042 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 35115834 | 34987687 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 3096781.5 | 2894570.5 | 1.07 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 53599104 | 52602166 | 1.02 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 80093333 | 83256812 | 0.96 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 172009854 | 173355916.5 | 0.99 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 42254666 | 42228833 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 249876333.5 | 250172041.5 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 148299229 | 148659167 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 116785208 | 115831270.5 | 1.01 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 106758125 | 106484375 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 5452339 | 5471067 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1100542291 | 1103002500 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 855735416.5 | 857541375 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 831274375 | 826884708.5 | 1.01 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 738168166.5 | 740474770.5 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 32317772.5 | 35136266 | 0.92 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1001895729 | 1006767188 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 966598875 | 974529458 | 0.99 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1307543687 | 1286053500 | 1.02 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 738405458 | 727101250 | 1.02 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s)` | 1230583 | 1308583 | 0.94 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s)` | 962250 | 664854.5 | 1.45 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s)` | 796604 | 906375 | 0.88 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s)` | 2036541 | 2049458 | 0.99 |
| `mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA` | 567146.5 | 565223.5 | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s)` | 5691500 | 5804687.5 | 0.98 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s)` | 6401396 | 8913625 | 0.72 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s)` | 25408000 | 24320125 | 1.04 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s)` | 3697229 | 3694792 | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA` | 1332396 | 1307349 | 1.02 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s)` | 9370333 | 9459208 | 0.99 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s)` | 13058291 | 15996021 | 0.82 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s)` | 32481708 | 31660167 | 1.03 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s)` | 4424396 | 4429208.5 | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s)` | 390896 | 433416.5 | 0.90 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)` | 458604 | 466208 | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)` | 2946292 | 1932812 | 1.52 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s)` | 54375 | 54000 | 1.01 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA` | 28214 | 27617 | 1.02 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s)` | 360312.5 | 370958.5 | 0.97 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s)` | 439417 | 459083 | 0.96 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s)` | 5063292 | 4366749.5 | 1.16 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s)` | 190708 | 193875 | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA` | 219423.5 | 216603.5 | 1.01 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s)` | 632709 | 684292 | 0.92 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s)` | 711770.5 | 731125 | 0.97 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s)` | 5249812.5 | 4502166 | 1.17 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s)` | 429750 | 435458 | 0.99 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s)` | 335333.5 | 377416 | 0.89 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s)` | 393604 | 405042 | 0.97 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s)` | 765792 | 718500 | 1.07 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s)` | 13458 | 12834 | 1.05 |
| `Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA` | 28223 | 27924.5 | 1.01 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s)` | 286125 | 303979.5 | 0.94 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s)` | 310708 | 340916.5 | 0.91 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)` | 733437.5 | 858875 | 0.85 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s)` | 25916 | 26333 | 0.98 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA` | 209427 | 206665 | 1.01 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s)` | 302000 | 320916.5 | 0.94 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s)` | 328375 | 355500 | 0.92 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)` | 842791.5 | 900792 | 0.94 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s)` | 28333 | 28875 | 0.98 |
| `vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s)` | 602432125 | 603792041 | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s)` | 430731937.5 | 430597750 | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s)` | 392016750 | 375897687.5 | 1.04 |
| `vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s)` | 322757833 | 321301750 | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/GPU/CUDA` | 7676293 | 7676185 | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s)` | 2003927916.5 | 2002056937.5 | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s)` | 1623931938 | 1637403750 | 0.99 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s)` | 1626427584 | 1658326812.5 | 0.98 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s)` | 1179210042 | 1181133416 | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/GPU/CUDA` | 27131071 | 27018077.5 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s)` | 523645.5 | 527292 | 0.99 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s)` | 450709 | 402500 | 1.12 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s)` | 2446250 | 1773874.5 | 1.38 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s)` | 219187.5 | 217896 | 1.01 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA` | 47774.5 | 47539 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s)` | 1875042 | 1972750 | 0.95 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s)` | 2602792 | 1830041 | 1.42 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s)` | 16587416.5 | 14502542 | 1.14 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s)` | 1501583 | 1511084 | 0.99 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA` | 226318.5 | 222835 | 1.02 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s)` | 2982667 | 3104000 | 0.96 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s)` | 5736062.5 | 5000208 | 1.15 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s)` | 17019146 | 15174146 | 1.12 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s)` | 2470812.5 | 2515479.5 | 0.98 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s)` | 1498583 | 1599584 | 0.94 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s)` | 1193771 | 933250 | 1.28 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s)` | 1029042 | 1233959 | 0.83 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s)` | 2235875 | 2349500 | 0.95 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA` | 572216 | 564727.5 | 1.01 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s)` | 5950125 | 5989584 | 0.99 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)` | 4653916 | 8876479.5 | 0.52 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s)` | 27167500 | 25076041 | 1.08 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s)` | 3927896 | 3931104 | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA` | 1342658.5 | 1312718 | 1.02 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s)` | 11627667 | 11659958.5 | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s)` | 14277520.5 | 18499562.5 | 0.77 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s)` | 36899542 | 34871271.5 | 1.06 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s)` | 6331458.5 | 6354542 | 1.00 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)` | 2333 | 4666.5 | 0.50 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s)` | 2166 | 2625 | 0.83 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s)` | 3333 | 4333 | 0.77 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)` | 2646 | 2292 | 1.15 |
| `Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA` | 25097 | 24932 | 1.01 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s)` | 7333 | 7209 | 1.02 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s)` | 7125 | 9792 | 0.73 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s)` | 7375 | 7375 | 1 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s)` | 7250 | 7208 | 1.01 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA` | 189428.5 | 190569.5 | 0.99 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s)` | 8167 | 8167 | 1 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s)` | 8250 | 8416 | 0.98 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s)` | 8542 | 8375 | 1.02 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s)` | 6083 | 5917 | 1.03 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s)` | 10667 | 10437.5 | 1.02 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s)` | 14041.5 | 13583 | 1.03 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s)` | 11125 | 11104.5 | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s)` | 7333 | 7250 | 1.01 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA` | 25251 | 24757 | 1.02 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s)` | 21917 | 21708 | 1.01 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s)` | 21708.5 | 21625 | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s)` | 21750 | 21750 | 1 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s)` | 21916 | 21709 | 1.01 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA` | 198645 | 195121 | 1.02 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s)` | 53625 | 57500 | 0.93 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s)` | 53500 | 53500 | 1 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s)` | 53625 | 53583 | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s)` | 54583 | 55083 | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s)` | 28395.5 | 28583 | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s)` | 28667 | 28667 | 1 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s)` | 28417 | 29000 | 0.98 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s)` | 46084 | 46334 | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA` | 26326 | 25674 | 1.03 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s)` | 224125 | 227125 | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s)` | 272959 | 276125 | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s)` | 4409500 | 4228416.5 | 1.04 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s)` | 65708 | 63084 | 1.04 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA` | 170084 | 166940.5 | 1.02 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s)` | 240562 | 246687 | 0.98 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s)` | 290792 | 293708 | 0.99 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)` | 4409209 | 4174375 | 1.06 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s)` | 71541 | 68833 | 1.04 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)` | 1708.5 | 1979.5 | 0.86 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)` | 1792 | 2042 | 0.88 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)` | 2541.5 | 2583.5 | 0.98 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)` | 1917 | 2000 | 0.96 |
| `Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA` | 23384 | 22856 | 1.02 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)` | 5292 | 5416 | 0.98 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)` | 5291 | 5291 | 1 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)` | 5459 | 5375 | 1.02 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)` | 5208.5 | 5291 | 0.98 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA` | 173533 | 171204 | 1.01 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)` | 7417 | 7500 | 0.99 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)` | 7500 | 7542 | 0.99 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)` | 7708 | 7750 | 0.99 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)` | 5625 | 5708 | 0.99 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 81107833 | 80930834 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 49783792 | 48596833 | 1.02 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 43745208 | 45693208 | 0.96 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 56305270.5 | 56260583.5 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 2634961 | 2631409 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 620785875 | 622112500 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 429264250 | 426582750 | 1.01 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 416731125 | 411799708 | 1.01 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 507694646.5 | 506749771 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 15139001 | 15162045 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 871599625 | 882246666 | 0.99 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 839558208.5 | 844291292 | 0.99 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 1206593209 | 1135779771 | 1.06 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 921408813 | 925012854.5 | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
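
The ratio reported for each benchmark is consistent with dividing the first (current) timing by the second (previous) timing, so values above 1 mean the current commit ran slower. A minimal Python sketch of that arithmetic follows, using three rows from the results above. The `ratio` helper and the `THRESHOLD` value are hypothetical, chosen only for illustration, and are not taken from github-action-benchmark or from this repository's workflow configuration.

```python
# Hypothetical helper for illustration; not part of github-action-benchmark.
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous; values above 1 mean the current run was slower."""
    return current_ns / previous_ns


# (current ns, previous ns) pairs copied from the benchmark results above.
rows = {
    "Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)": (322979, 244083),
    "Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)": (243583, 322041),
    "mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)": (4653916, 8876479.5),
}

THRESHOLD = 1.3  # assumed cutoff for flagging a slowdown, illustration only

for name, (current_ns, previous_ns) in rows.items():
    r = ratio(current_ns, previous_ns)
    status = "slower" if r > THRESHOLD else "ok"
    print(f"{name}: {r:.2f} ({status})")
```

Many of the multi-threaded CPU rows swing well away from 1.00 in both directions for a commit that only bumps a CI dependency, so a single ratio is best read alongside the run-to-run noise of the benchmark machine.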