Commit
test: run type-stability tests first
Showing 3 changed files with 29 additions and 24 deletions.
ea332be
Lux Benchmarks
| Benchmark suite | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 412458 | 407750 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 321709 | 243584 | 1.32 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 323896 | 322333.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 741958 | 740729.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43204 | 43833 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1317396 | 1312437 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2464375 | 1253687.5 | 1.97 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 14642958 | 14025833.5 | 1.04 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2196000 | 2194708 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 208726.5 | 204686 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1450917 | 1421833 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 934625 | 917666 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 1697209 | 1521458.5 | 1.12 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2207583 | 2206041.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1781333 | 1636291 | 1.09 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1098208 | 1093166 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1507125 | 1540041 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2908542 | 2957250 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209607.5 | 208244 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12131458 | 12064562 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8814770.5 | 8829500 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9252270.5 | 9263417 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18589083.5 | 18568125 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1490165 | 1510701 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17289959 | 17268291.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13989667 | 13964333 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14502875 | 14534875 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21849666 | 21840708.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 249663875 | 249883854.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148521541 | 148685042 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115933291.5 | 116235625 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447579458 | 448238208 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5477215 | 5476091.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1139835959 | 1190941209 | 0.96 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 978934750 | 979930625 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 853295792 | 855870500 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1789749000 | 1816350500 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31155061 | 31819565 | 0.98 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1134051542 | 1035607791 | 1.10 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 999963750 | 995404083.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1308847250.5 | 1363876062.5 | 0.96 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1730047208 | 1750692104.5 | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1099083.5 | 1047645.5 | 1.05 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1611187.5 | 1617437.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3499667 | 3577375 | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 783708.5 | 786333 | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 269562.5 | 273031.5 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 3018791.5 | 3004708.5 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4156979 | 4200542 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 10275229.5 | 11777875 | 0.87 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3215083 | 3164334 | 1.02 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1193854 | 1198487 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2332208 | 2248937 | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1382583 | 1322917 | 1.05 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1687750 | 1665229 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4215375.5 | 4201687.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209594 | 209763 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19388000 | 19436166 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16069000 | 16129625 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17326292 | 17429750 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25910416.5 | 25880667 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1588868.5 | 1610222 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34088083 | 34173417 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30937833 | 30876125 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31230458.5 | 31248229 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36701916.5 | 36410104.5 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4541833.5 | 4482791.5 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2768083 | 2543375 | 1.09 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2900875.5 | 2920604 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8397125 | 8391854 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 420375 | 424436.5 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38905562.5 | 38997125 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32031146 | 32110145.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32218187 | 32392750 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 52007937.5 | 51935562 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2626371 | 2641424 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89177917 | 88445625 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 114075750 | 115286208.5 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 229081458 | 222634750.5 | 1.03 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74341583.5 | 74263979 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268097375 | 267592834 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 159334604 | 156611875 | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 126815875 | 127057604.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 486160833 | 485576209 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 6980043.5 | 6966689 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1472796583.5 | 1476015166.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1170472000 | 1172283875 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1064999083 | 1075732708 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2007098958.5 | 2016646687.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34638860 | 34531886 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1689048208 | 1716968542 | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1523433000 | 1543811791.5 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1884758500 | 1807404167 | 1.04 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2205616333 | 2233706500 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2082500 | 2003312.5 | 1.04 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2988979.5 | 3069125 | 0.97 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 8127625 | 7973333 | 1.02 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2508541.5 | 2392166 | 1.05 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 272761 | 279062 | 0.98 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9735375 | 9610687.5 | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12139000 | 12028187.5 | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 25821041 | 25151062.5 | 1.03 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11698458 | 11817896 | 0.99 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1272798 | 1290603 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 381449917 | 380792958 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 286724875 | 308976625 | 0.93 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 242181541 | 241644896 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453170250 | 452303562.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4833185 | 4828914.5 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1173336083 | 1171201291 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 924216958 | 942099041 | 0.98 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 971657125 | 953747750 | 1.02 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1430179375 | 1428813708 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17840776 | 17841586 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1403750.5 | 1410500 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 2081958 | 1664542 | 1.25 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5722875 | 5488792 | 1.04 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1408458 | 1374687.5 | 1.02 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 275280 | 280156 | 0.98 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6770687 | 6573750 | 1.03 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 12458250 | 13287229 | 0.94 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 21274521 | 18608542 | 1.14 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6134979 | 6090708 | 1.01 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1311627 | 1340355 | 0.98 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70478833 | 70020646 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43532916 | 43782708 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39489833 | 39491500 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132771729 | 132617625.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1936601.5 | 1956022 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 382368791 | 383334125 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 295591666.5 | 296279354 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 282483000 | 282808416 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 535030479 | 539325500 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12289555.5 | 12276639 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 407420458 | 409376958 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 408775479 | 366908458 | 1.11 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 705784395.5 | 675051667 | 1.05 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 712922750 | 711583625 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1190190416 | 1188246125 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 691356562.5 | 831292458.5 | 0.83 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 632381292 | 632114354 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1864383042 | 1865387666 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12548744.5 | 12542044 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3527214854.5 | 3538721354 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2750816917 | 2772375167 | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2723456375 | 2713351709 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 4906995375 | 4951157583 | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49787100 | 49614963 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3430374.5 | 3375250 | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2075896 | 2081166.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2513604 | 2536500 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6036208 | 6037687.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 290675.5 | 295351 | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25509791 | 25516333 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18477979.5 | 18518313 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18929687 | 18846583 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 38972812 | 38898354 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2459960 | 2476302.5 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54137604 | 53964667 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 79016146 | 80576917 | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172864042 | 171957042 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45747729 | 45586062.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1785000 | 1747812.5 | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1098833 | 1105729 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1575271 | 1562500 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3041083 | 3031521 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 213255 | 212548 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12530062 | 12517833 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9179500 | 9220687.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9666624.5 | 9561583.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18982583.5 | 18978542 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1539758 | 1527181.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17615791.5 | 17640583 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14315750.5 | 14342500 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14612125 | 14447312 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22193458 | 22205541 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70464750 | 70074708 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43492875 | 43766958 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39563729.5 | 39559750 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132725062.5 | 132760291.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1890878 | 1957872 | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360162958 | 359826333 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 290966979 | 287628083 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 287495583.5 | 287402500 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 623603729 | 620985562.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13401076 | 13387014.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 420131604 | 418344583 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 425616125 | 420938875 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 719362771 | 708918708 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 718603750 | 718150125 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1566208 | 1473791.5 | 1.06 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1239083.5 | 1037208 | 1.19 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1245979.5 | 1169396 | 1.07 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2362041 | 2343958 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 589439 | 576538.5 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8832333 | 8746083.5 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12769958 | 13704583 | 0.93 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 30689750 | 31580708 | 0.97 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9829292 | 9800812.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1434002 | 1453307 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 18037958 | 17895167 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 16982896 | 17277875 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 30462270.5 | 30555625 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14482959 | 14342520.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 789958.5 | 766563 | 1.03 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 633083.5 | 521854.5 | 1.21 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1036791.5 | 1038167 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725125 | 737500 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48429 | 48260 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1542250 | 1513333 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1032458.5 | 1063333 | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1380125 | 1432583 | 0.96 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2295562.5 | 2270542 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 240743.5 | 237514 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1747854 | 1713000 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1235354 | 1297458 | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1736479 | 1983417 | 0.88 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2412208 | 2314667 | 1.04 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3414875 | 3345354.5 | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2061771 | 2072187.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2477833 | 2520104 | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6017000 | 6022334 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 284081.5 | 285768 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24039917 | 24075208 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17178499.5 | 17290417 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17190666.5 | 17127666.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37578542 | 37508125 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2405176.5 | 2401522.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52521875 | 52370708 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 78741917 | 85319667 | 0.92 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 170683583 | 170588104.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44627042 | 44585812.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250060541.5 | 249675792 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148207250 | 148489083 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115967792 | 115482854 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448320521 | 447673229 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5438535.5 | 5439180 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1129762334 | 1127687125 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 881232895.5 | 883344500.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 807642666 | 805915916 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1746898708 | 1756922958 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28881644.5 | 29345011 | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1020749770.5 | 1057020520.5 | 0.97 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 971889209 | 963663000 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1306078959 | 1305738167 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1723825958.5 | 1740054666.5 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1295917 | 1286708 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 904250 | 773958 | 1.17 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 957041.5 | 910250 | 1.05 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2119271 | 2050125 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 573283 | 558247.5 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5873750.5 | 5806667 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6045250 | 8935000 | 0.68 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 24731625 | 24443500 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7076916 | 7052833 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1333770 | 1343916 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 11387000 | 10065667 | 1.13 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 10073875 | 10525833 | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 17896812 | 17854875 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8967708 | 8785625 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 479500 | 456166.5 | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 475500 | 379750 | 1.25 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 2159875 | 1895687.5 | 1.14 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 89083 | 90208 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28042 | 27683 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 383020.5 | 380792 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 428916 | 445520.5 | 0.96 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4731438 | 4682292 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 266541 | 261167 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 220790.5 | 219550.5 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 709084 | 704084 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 701625 | 722292 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 787375.5 | 997916.5 | 0.79 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 445771 | 451917 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 427875 | 405708 | 1.05 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 416708.5 | 324500 | 1.28 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 744000 | 744834 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 52854 | 54917 | 0.96 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 27664 | 28053 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 340833 | 335750 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 317584 | 339729 | 0.93 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 868791.5 | 762333 | 1.14 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 153625 | 172500 | 0.89 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 207528 | 205595.5 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 404416 | 404000 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 385334 | 406875 | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 1054292 | 805584 | 1.31 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 174000 | 174375 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 603618375 | 601761750 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 428696083 | 429688896 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 377266063 | 380315375 | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 876199292 | 872892666.5 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7024377 | 7026376 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1985844104.5 | 1987308375 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1661758208.5 | 1621059583.5 | 1.03 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1608456437.5 | 1611181167 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2755931875 | 2764493083 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25990323.5 | 25927495 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 522084 | 513625 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 433375 | 406334 | 1.07 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 2244333.5 | 1661167 | 1.35 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 870959 | 865667 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47163.5 | 47567 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1868271 | 1872250 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2327875 | 1794875 | 1.30 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14854667 | 14572917 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2780687.5 | 2776666 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 248248 | 247130.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 2717000 | 2742584 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 2282917 | 2329041.5 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 3907250 | 4441625 | 0.88 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3418708 | 3342437.5 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1568167 | 1568250 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1231291.5 | 1047208.5 | 1.18 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1182312.5 | 1211750 | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2381916 | 2310625 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 584934.5 | 587603 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5788854 | 5769875 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 6745833 | 8267854 | 0.82 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24802729 | 24141062 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7285084 | 7272833 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1358738.5 | 1402858 | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 13061333 | 12362292 | 1.06 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 12025375 | 11920979.5 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 21056084 | 21342959 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10835604.5 | 10695063 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2666 | 2375 | 1.12 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2459 | 2834 | 0.87 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3541.5 | 3270.5 | 1.08 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2750 | 2750 | 1 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24643 | 23858 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 8500 | 8750 | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 8709 | 8583 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 8770.5 | 8625 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 8458 | 8792 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 211404.5 | 217435 | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 16791 | 16667 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 16708 | 16500 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 16708 | 16833 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 10750 | 10916 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 11729 | 14291 | 0.82 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 14500 | 15708.5 | 0.92 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 11709 | 12750 | 0.92 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7833 | 7562.5 | 1.04 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24689 | 25549 | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 22667 | 22500 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 22250 | 22208 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 22459 | 22542 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 22500 | 22459 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 232898 | 236055 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 52291.5 | 52312.5 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 52500 | 52125 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 52521 | 52500 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 44000 | 43917 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28750 | 29000 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29334 | 29041 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 29208 | 29250 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46916 | 46458.5 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25952 | 26404 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 211958.5 | 209708 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 261208 | 267167 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4169541.5 | 4263083.5 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 153125 | 148000 | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 217493 | 216477.5 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 317500 | 314042 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 290167 | 301625 | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 796854.5 | 772375 | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 161500 | 160834 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1792 | 1833 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1875 | 2000 | 0.94 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2625 | 2417 | 1.09 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1917 | 2479 | 0.77 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 22908 | 23762 | 0.96 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 7416 | 7417 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 7208 | 7125 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 7625 | 7584 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 7625 | 7187.5 | 1.06 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 268483 | 255200 | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11250 | 11250 | 1 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11625 | 11708 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11542 | 11708 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 6833 | 6958 | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79894667 | 79879875 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49133813 | 47906812.5 | 1.03 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 44971167 | 44940625 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151617667 | 152149250 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2714974.5 | 2721915 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 472351667 | 662482292 | 0.71 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 408027541 | 413355625 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 398391084 | 398984125 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 687897666 | 733919021 | 0.94 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14607484.5 | 14581195 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 686060271 | 709711916.5 | 0.97 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 657056541 | 666381083 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1003771958 | 1014202458 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 999509292 | 999737375 | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
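
The third number in each benchmark entry appears to be the first timing divided by the second (for example, 412458 / 407750 ≈ 1.01), so a value above 1 would mean the newer run was slower. The sketch below recomputes that ratio for a few entries copied from the results and flags the ones above a cut-off. The function names and the `THRESHOLD` value are hypothetical choices for illustration and are not taken from the workflow's configuration.

```python
# Minimal sketch: recompute the ratio column and flag likely slowdowns.
# Assumption: ratio = current_ns / previous_ns, which matches the reported values.

def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time; above 1 means slower."""
    return current_ns / previous_ns

# (benchmark name, current ns, previous ns), copied from the results above
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 412458, 407750),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)", 2464375, 1253687.5),
    ("mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s)", 6045250, 8935000),
]

THRESHOLD = 1.5  # hypothetical cut-off, not the workflow's actual setting

def flag_regressions(rows, threshold=THRESHOLD):
    """Return (name, rounded ratio) for every row slower than the threshold."""
    flagged = []
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r > threshold:
            flagged.append((name, round(r, 2)))
    return flagged

if __name__ == "__main__":
    for name, r in flag_regressions(rows):
        print(f"{name}: {r}x slower than the previous run")
```

Timings at the microsecond scale, such as the `Dense(16 => 16, ...)` entries, tend to be noisy, so a single ratio far from 1 is better treated as a prompt to re-run than as proof of a regression.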