
Commit

chore: apply formatting suggestion
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
avik-pal and github-actions[bot] committed Sep 4, 2024
1 parent 1a1bb12 commit 49f49a5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions test/shared_testsetup.jl
@@ -25,8 +25,8 @@ end
 maybe_rewrite_to_crosscor(layer) = layer
 function maybe_rewrite_to_crosscor(layer::Conv)
     return CrossCor(layer.activation, layer.in_chs, layer.out_chs, layer.kernel_size,
-        layer.stride, layer.pad, layer.dilation, layer.groups, layer.init_weight,
-        layer.init_bias, layer.use_bias)
+        layer.stride, layer.pad, layer.dilation, layer.groups,
+        layer.init_weight, layer.init_bias, layer.use_bias)
 end

 function maybe_rewrite_to_crosscor(mode, model)
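The hunk above pairs a generic fallback method with a method specialized on `Conv`. Below is a minimal, self-contained sketch of that dispatch pattern. The structs are hypothetical stand-ins with only a few fields, not Lux's actual `Conv` and `CrossCor` layer types, so the constructor arguments are illustrative only.

```julia
# Hypothetical stand-in layer types (assumption: simplified, NOT Lux's definitions).
struct Conv
    kernel_size::Tuple{Int, Int}
    in_chs::Int
    out_chs::Int
end

struct CrossCor
    kernel_size::Tuple{Int, Int}
    in_chs::Int
    out_chs::Int
end

struct Dense
    in_dims::Int
    out_dims::Int
end

# Fallback method: any layer that is not a Conv is returned unchanged.
maybe_rewrite_to_crosscor(layer) = layer

# Specialized method: rebuild a Conv as a CrossCor carrying the same field values.
function maybe_rewrite_to_crosscor(layer::Conv)
    return CrossCor(layer.kernel_size, layer.in_chs, layer.out_chs)
end

conv = Conv((3, 3), 2, 2)
dense = Dense(512, 512)

rewritten = maybe_rewrite_to_crosscor(conv)    # dispatches to the Conv method
untouched = maybe_rewrite_to_crosscor(dense)   # dispatches to the fallback
```

Julia picks the most specific applicable method, so only `Conv` values are rewritten and every other layer passes through as-is.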

3 comments on commit 49f49a5

@avik-pal
Member Author


@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/114566

Tip: Release Notes

Did you know you can add release notes too? Just add Markdown-formatted text underneath the comment, after the text
"Release notes:", and it will be added to the registry PR. If TagBot is installed, it will also be added to the
release that TagBot creates. For example:

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.5.68 -m "<description of version>" 49f49a5cc51ac47f59feb9d6f8a57b737b6c7358
git push origin v0.5.68

@github-actions
Contributor


Lux Benchmarks
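
In the table below, the Ratio column matches Current divided by Previous, rounded to two decimals, so values above 1 mean the current commit was slower on that benchmark. Here is a minimal sketch of that arithmetic, using two rows copied from the table. The `SLOWDOWN_THRESHOLD` value and the helper names are assumptions made for illustration and are not taken from the benchmark tooling.

```julia
# Ratio as reported in the table: current time divided by previous time.
ratio(current_ns, previous_ns) = current_ns / previous_ns

# Hypothetical threshold, chosen only for illustration (assumption, not from the tooling).
const SLOWDOWN_THRESHOLD = 1.5

# A row counts as a slowdown here when its ratio exceeds the hypothetical threshold.
is_slowdown(current_ns, previous_ns) = ratio(current_ns, previous_ns) > SLOWDOWN_THRESHOLD

# Two rows copied from the table (times in ns): name, current, previous.
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 414937.5, 412458.0),
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)", 4771.0, 2750.0),
]

for (name, current_ns, previous_ns) in rows
    r = round(ratio(current_ns, previous_ns); digits=2)
    flag = is_slowdown(current_ns, previous_ns) ? "slower" : "ok"
    println(name, "  ratio=", r, "  ", flag)
end
```

Single-run ratios on very short benchmarks (a few microseconds) can swing widely, so a large ratio on such a row may be timing noise rather than a real regression.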

Benchmark suite Current: 49f49a5 Previous: ea332be Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 414937.5 ns 412458 ns 1.01
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 322917 ns 321709 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 323167 ns 323896 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 739334 ns 741958 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43603 ns 43204 ns 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1281041.5 ns 1317396 ns 0.97
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2448000 ns 2464375 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 14112208.5 ns 14642958 ns 0.96
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2281500 ns 2196000 ns 1.04
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 209418 ns 208726.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1389292 ns 1450917 ns 0.96
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 885541 ns 934625 ns 0.95
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1564334 ns 1697209 ns 0.92
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2244666 ns 2207583 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1768458 ns 1781333 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1070292 ns 1098208 ns 0.97
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1534708 ns 1507125 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2945750 ns 2908542 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210107 ns 209607.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12156916 ns 12131458 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8795791 ns 8814770.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9216583 ns 9252270.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18566125 ns 18589083.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1491618 ns 1490165 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17331499.5 ns 17289959 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13987542 ns 13989667 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14472812.5 ns 14502875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21820291 ns 21849666 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 249342958.5 ns 249663875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148241750 ns 148521541 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116015000 ns 115933291.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 453798708 ns 447579458 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5449223 ns 5477215 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1146266250 ns 1139835959 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 981704875 ns 978934750 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 841522708.5 ns 853295792 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1759323917 ns 1789749000 ns 0.98
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31586701 ns 31155061 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1042907459 ns 1134051542 ns 0.92
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1000416291.5 ns 999963750 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1298076750 ns 1308847250.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1737205625 ns 1730047208 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1119562.5 ns 1099083.5 ns 1.02
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1622875 ns 1611187.5 ns 1.01
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3548958 ns 3499667 ns 1.01
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 785583 ns 783708.5 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 274324 ns 269562.5 ns 1.02
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 3038000 ns 3018791.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4083437.5 ns 4156979 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 11037583 ns 10275229.5 ns 1.07
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3144521 ns 3215083 ns 0.98
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1135705 ns 1193854 ns 0.95
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2308333.5 ns 2332208 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1430208 ns 1382583 ns 1.03
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1667021 ns 1687750 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4209459 ns 4215375.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210374 ns 209594 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19417292 ns 19388000 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16085209 ns 16069000 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17361667 ns 17326292 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25874854.5 ns 25910416.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1598161.5 ns 1588868.5 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34253375 ns 34088083 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30840208 ns 30937833 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31540625 ns 31230458.5 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36820354 ns 36701916.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4533500 ns 4541833.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2754437.5 ns 2768083 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2922958.5 ns 2900875.5 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8379896 ns 8397125 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 425541 ns 420375 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 38931833.5 ns 38905562.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32059166 ns 32031146 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32304958 ns 32218187 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51832000 ns 52007937.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2625272.5 ns 2626371 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 89088083.5 ns 89177917 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 114374167 ns 114075750 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 224208209 ns 229081458 ns 0.98
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74528979 ns 74341583.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 268596959 ns 268097375 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 159233333.5 ns 159334604 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 126780333 ns 126815875 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 484901875 ns 486160833 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7002114 ns 6980043.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1474144916.5 ns 1472796583.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1144467750 ns 1170472000 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1075737187.5 ns 1064999083 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2026289333.5 ns 2007098958.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34635293 ns 34638860 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1704908667 ns 1689048208 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1477917583.5 ns 1523433000 ns 0.97
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1882348542 ns 1884758500 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2231847042 ns 2205616333 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2004166.5 ns 2082500 ns 0.96
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2569125 ns 2988979.5 ns 0.86
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 6929417 ns 8127625 ns 0.85
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2435021 ns 2508541.5 ns 0.97
lenet(28, 28, 1, 128)/forward/GPU/CUDA 267867 ns 272761 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9579229.5 ns 9735375 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 11450124.5 ns 12139000 ns 0.94
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 24113603.5 ns 25821041 ns 0.93
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11704333 ns 11698458 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1169053.5 ns 1272798 ns 0.92
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 380411333 ns 381449917 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 282013417 ns 286724875 ns 0.98
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 241718292 ns 242181541 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 452199062 ns 453170250 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4861447 ns 4833185 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1177026875 ns 1173336083 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 911798208 ns 924216958 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 959270583 ns 971657125 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1420082458 ns 1430179375 ns 0.99
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 18016614 ns 17840776 ns 1.01
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1506958 ns 1403750.5 ns 1.07
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1619042 ns 2081958 ns 0.78
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 6078187.5 ns 5722875 ns 1.06
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1294916 ns 1408458 ns 0.92
lenet(28, 28, 1, 64)/forward/GPU/CUDA 267932 ns 275280 ns 0.97
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6812000 ns 6770687 ns 1.01
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13135583 ns 12458250 ns 1.05
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 19155666 ns 21274521 ns 0.90
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6056354 ns 6134979 ns 0.99
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1212499 ns 1311627 ns 0.92
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70511583 ns 70478833 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43537375 ns 43532916 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39409417 ns 39489833 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 133783708 ns 132771729 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1933585.5 ns 1936601.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 381557562.5 ns 382368791 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 295764895.5 ns 295591666.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 281324083 ns 282483000 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 535257270.5 ns 535030479 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12290544.5 ns 12289555.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 412505875 ns 407420458 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 373209375 ns 408775479 ns 0.91
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 688004291 ns 705784395.5 ns 0.97
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 709404875 ns 712922750 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1186087125 ns 1190190416 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 688362479 ns 691356562.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 626514875 ns 632381292 ns 0.99
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1778854333.5 ns 1864383042 ns 0.95
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12319166.5 ns 12548744.5 ns 0.98
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3506982229 ns 3527214854.5 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2794034750 ns 2750816917 ns 1.02
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2699392833 ns 2723456375 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 4950907833 ns 4906995375 ns 1.01
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49414957 ns 49787100 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3424125 ns 3430374.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2051500 ns 2075896 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2533250 ns 2513604 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6031895.5 ns 6036208 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 313046 ns 290675.5 ns 1.08
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25554687.5 ns 25509791 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18540916 ns 18477979.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 18962271 ns 18929687 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 38399291 ns 38972812 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2470998 ns 2459960 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54650500 ns 54137604 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 78908438 ns 79016146 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 169063625 ns 172864042 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45558958 ns 45747729 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1786417 ns 1785000 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1086125 ns 1098833 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1603104.5 ns 1575271 ns 1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3030083 ns 3041083 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 214935.5 ns 213255 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12546292 ns 12530062 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9205583.5 ns 9179500 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9646125 ns 9666624.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18948583 ns 18982583.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1529511.5 ns 1539758 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17691958 ns 17615791.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14322292 ns 14315750.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14657000 ns 14612125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22150500 ns 22193458 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70485250 ns 70464750 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43560250 ns 43492875 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39651125 ns 39563729.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132456250 ns 132725062.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1934167.5 ns 1890878 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 359706666 ns 360162958 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 289803812 ns 290966979 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 287024520.5 ns 287495583.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 620943458 ns 623603729 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13389118 ns 13401076 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 418207750.5 ns 420131604 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 426872417 ns 425616125 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 708863792 ns 719362771 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 714272667 ns 718603750 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1467229.5 ns 1566208 ns 0.94
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1164958 ns 1239083.5 ns 0.94
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1223792 ns 1245979.5 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2308375 ns 2362041 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 583756 ns 589439 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8755666.5 ns 8832333 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12812083 ns 12769958 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 31879208 ns 30689750 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9792458 ns 9829292 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1392390 ns 1434002 ns 0.97
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 17932750 ns 18037958 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 17135625 ns 16982896 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 29811583 ns 30462270.5 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14460729.5 ns 14482959 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 823083.5 ns 789958.5 ns 1.04
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 620625 ns 633083.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1022854.5 ns 1036791.5 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 740791 ns 725125 ns 1.02
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47357.5 ns 48429 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1528750 ns 1542250 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 953917 ns 1032458.5 ns 0.92
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1387583 ns 1380125 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2279146 ns 2295562.5 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 233369 ns 240743.5 ns 0.97
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1748792 ns 1747854 ns 1.00
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1258250 ns 1235354 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1680104 ns 1736479 ns 0.97
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2337833.5 ns 2412208 ns 0.97
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3398000.5 ns 3414875 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2032875 ns 2061771 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2524750 ns 2477833 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 5998916 ns 6017000 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 282348 ns 284081.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24156145.5 ns 24039917 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17254937.5 ns 17178499.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17217979.5 ns 17190666.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37524604.5 ns 37578542 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2399084 ns 2405176.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52823083 ns 52521875 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 80975187.5 ns 78741917 ns 1.03
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 170431562.5 ns 170683583 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44543937.5 ns 44627042 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 251011396 ns 250060541.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148156125 ns 148207250 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115824354 ns 115967792 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 454908937.5 ns 448320521 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5336248.5 ns 5438535.5 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1130772458 ns 1129762334 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 881484167 ns 881232895.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 804587958 ns 807642666 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1745692959 ns 1746898708 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28847342 ns 28881644.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1027064583.5 ns 1020749770.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 959640250 ns 971889209 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1261786916 ns 1306078959 ns 0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1731191479.5 ns 1723825958.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1173624.5 ns 1295917 ns 0.91
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 906000 ns 904250 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 939334 ns 957041.5 ns 0.98
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2039708.5 ns 2119271 ns 0.96
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 570174 ns 573283 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5806917 ns 5873750.5 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 7014250 ns 6045250 ns 1.16
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 25017291.5 ns 24731625 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7060041 ns 7076916 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1340455.5 ns 1333770 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 11530292 ns 11387000 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 8850020.5 ns 10073875 ns 0.88
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 17434458 ns 17896812 ns 0.97
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 8551437.5 ns 8967708 ns 0.95
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 506417 ns 479500 ns 1.06
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 273625 ns 475500 ns 0.58
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2396979 ns 2159875 ns 1.11
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 90000 ns 89083 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 27635 ns 28042 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 385625 ns 383020.5 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 348812.5 ns 428916 ns 0.81
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4572979.5 ns 4731438 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 262125 ns 266541 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 220978.5 ns 220790.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 707916 ns 709084 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 579562.5 ns 701625 ns 0.83
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 1057604 ns 787375.5 ns 1.34
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 449729 ns 445771 ns 1.01
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 456750 ns 427875 ns 1.07
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 212166 ns 416708.5 ns 0.51
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 729000 ns 744000 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 54895.5 ns 52854 ns 1.04
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 27483.5 ns 27664 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 339208 ns 340833 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 194896 ns 317584 ns 0.61
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 864542 ns 868791.5 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 153187.5 ns 153625 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 206000 ns 207528 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 406291 ns 404416 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 262083.5 ns 385334 ns 0.68
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 828042 ns 1054292 ns 0.79
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 173792 ns 174000 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 600740375 ns 603618375 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 425777500 ns 428696083 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 373716000 ns 377266063 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 873713812.5 ns 876199292 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7032511.5 ns 7024377 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2084258688 ns 1985844104.5 ns 1.05
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1651169312.5 ns 1661758208.5 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1580932771 ns 1608456437.5 ns 0.98
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2753232709 ns 2755931875 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 26093846 ns 25990323.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 534708 ns 522084 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 428292 ns 433375 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 1851583 ns 2244333.5 ns 0.83
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 866334 ns 870959 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 46927 ns 47163.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1888062.5 ns 1868271 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2316896 ns 2327875 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14585209 ns 14854667 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2757583.5 ns 2780687.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 247984.5 ns 248248 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 2751959 ns 2717000 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 2279292 ns 2282917 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 3318791.5 ns 3907250 ns 0.85
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 3395625 ns 3418708 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1510000 ns 1568167 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1177708 ns 1231291.5 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1195583 ns 1182312.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2315167 ns 2381916 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 588506 ns 584934.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5715958.5 ns 5788854 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 6618896 ns 6745833 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 24170542 ns 24802729 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7277583.5 ns 7285084 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1377478 ns 1358738.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 12783958 ns 13061333 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 11833292 ns 12025375 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 19658396.5 ns 21056084 ns 0.93
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 9760416.5 ns 10835604.5 ns 0.90
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2750 ns 2666 ns 1.03
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2583 ns 2459 ns 1.05
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3250 ns 3541.5 ns 0.92
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 4771 ns 2750 ns 1.73
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24855 ns 24643 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 8708 ns 8500 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 8500 ns 8709 ns 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 8416 ns 8770.5 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 8479.5 ns 8458 ns 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 213745.5 ns 211404.5 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 16583 ns 16791 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 16583 ns 16708 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 16625 ns 16708 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 10709 ns 10750 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 11791 ns 11729 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 16125 ns 14500 ns 1.11
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 11750 ns 11709 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7583 ns 7833 ns 0.97
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 24983 ns 24689 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 22292 ns 22667 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 22625 ns 22250 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 22416.5 ns 22459 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 22541 ns 22500 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 235287.5 ns 232898 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 52250 ns 52291.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 52375 ns 52500 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 52417 ns 52521 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 43792 ns 44000 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 29333 ns 28750 ns 1.02
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28791 ns 29334 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 29167 ns 29208 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46167 ns 46916 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26056 ns 25952 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 209667 ns 211958.5 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 257250 ns 261208 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4075916 ns 4169541.5 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 147625 ns 153125 ns 0.96
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 220948.5 ns 217493 ns 1.02
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 308542 ns 317500 ns 0.97
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 282917 ns 290167 ns 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 767042 ns 796854.5 ns 0.96
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 161708 ns 161500 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2042 ns 1792 ns 1.14
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1958 ns 1875 ns 1.04
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2312.5 ns 2625 ns 0.88
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1958 ns 1917 ns 1.02
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 22938 ns 22908 ns 1.00
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 7375 ns 7416 ns 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 7250 ns 7208 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 7625 ns 7625 ns 1.00
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 7250 ns 7625 ns 0.95
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 263592.5 ns 268483 ns 0.98
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11292 ns 11250 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11458 ns 11625 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11500 ns 11542 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 7000 ns 6833 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79852292 ns 79894667 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49068812.5 ns 49133813 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 45007187.5 ns 44971167 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151374416 ns 151617667 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2720111.5 ns 2714974.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 607847917 ns 472351667 ns 1.29
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 412172583 ns 408027541 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 398297875 ns 398391084 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 737514583.5 ns 687897666 ns 1.07
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14594549 ns 14607484.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 713373500 ns 686060271 ns 1.04
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 665302083 ns 657056541 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1010864625 ns 1003771958 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 998393833 ns 999509292 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
