chore: update version for release
avik-pal authored Sep 23, 2024
1 parent 283db4e commit 0af6fd2
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Lux"
uuid = "b2108857-7c20-44ae-9111-449ecde12c47"
authors = ["Avik Pal <avikpal@mit.edu> and contributors"]
-version = "1.0.6"
+version = "1.1.0"

[deps]
ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"

3 comments on commit 0af6fd2

@avik-pal
Member Author


@JuliaRegistrator


Registration pull request created: JuliaRegistries/General/115730

Tip: Release Notes

Did you know you can add release notes too? Just add Markdown-formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. For example:

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v1.1.0 -m "<description of version>" 0af6fd200442c36732052c17f40186bd91ca4623
git push origin v1.1.0

@github-actions
Contributor


Lux Benchmarks
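
The Ratio column in the table that follows appears to be the Current time divided by the Previous time, rounded to two decimals, so values below 1.00 mean the current commit ran faster. Here is a minimal Python sketch of that arithmetic, checked against a few rows of the table. The `classify` helper and its 1.10 threshold are illustrative assumptions, not settings taken from the benchmark action.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous, rounded to two decimals like the table."""
    return round(current_ns / previous_ns, 2)


def classify(r: float, threshold: float = 1.10) -> str:
    """Label a ratio; `threshold` is a hypothetical noise cutoff."""
    if r > threshold:
        return "slower"
    if r < 1 / threshold:
        return "faster"
    return "within noise"


# (benchmark name, current ns, previous ns) copied from rows of the table.
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 414541, 415000),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA", 189479, 205231),
    ("Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA", 35425612, 31620503.5),
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)", 2625, 4541.5),
]

for name, current, previous in rows:
    r = ratio(current, previous)
    print(f"{name}: {r:.2f} ({classify(r)})")
```

Very short benchmarks (a few microseconds) can show large ratio swings from timing noise alone, so such rows are best read with caution.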

Benchmark suite Current: 0af6fd2 Previous: 92e8469 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 414541 ns 415000 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 243125 ns 244167 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 243417 ns 243917 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 739334 ns 740083 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 41982 ns 43793 ns 0.96
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1346458 ns 1280333 ns 1.05
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 1259125 ns 1268791 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 16338395.5 ns 16455125 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2194459 ns 2193625.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 189479 ns 205231 ns 0.92
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1355250 ns 1311917 ns 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 1262583 ns 1301792 ns 0.97
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 16497166 ns 16522625 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2228292 ns 2229625 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1762584 ns 1672666 ns 1.05
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1092041 ns 1078166 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1568312.5 ns 1511041.5 ns 1.04
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2959792 ns 2994458 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 205112 ns 207884 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12142250.5 ns 12154146 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8835812 ns 8856791 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9242459 ns 9297792 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18585833 ns 18579708 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1492755 ns 1492665 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17309458 ns 17297396 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13985458 ns 13998833 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14508333.5 ns 14511000 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21837437.5 ns 21839416 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250564062.5 ns 250544729 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148884708 ns 148581208 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116487583.5 ns 116355916.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447218375 ns 447348667 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5458403 ns 5449372 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1224183333 ns 1226769166 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 933732541 ns 930331417 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831622729.5 ns 829560312.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1635560416 ns 1631272125 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 35425612 ns 31620503.5 ns 1.12
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1152095666 ns 1143568125 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 993335854.5 ns 993275583.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1312403208.5 ns 1332092333.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1733303125 ns 1732940916.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1117750 ns 1119875 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1598750.5 ns 1650333 ns 0.97
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3754208 ns 3433334 ns 1.09
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 783083.5 ns 782354 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 257385.5 ns 263984.5 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2990333 ns 2986166 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4119917 ns 4134521 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 11230812 ns 9684479 ns 1.16
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3197229 ns 3141166 ns 1.02
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1046508.5 ns 1099110 ns 0.95
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2301708 ns 2222125 ns 1.04
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1307291 ns 1310979 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1562625 ns 1561042 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4228312.5 ns 4207458 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 207810 ns 208127 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19397854 ns 19407062.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16093709 ns 16092937.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17349000 ns 17317479 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25897042 ns 25877354.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1578941.5 ns 1588570 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 33935833 ns 34283042 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30962375 ns 31029667 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31278499.5 ns 31324334 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 37009375 ns 36972625 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4535625.5 ns 4535728.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2541958 ns 2550437.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2676729 ns 2682521 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8399042 ns 8376542 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 421981 ns 420059 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 38713291 ns 38787729 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32067250 ns 32133646 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32235000 ns 32252916 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 52021374.5 ns 51916459 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2612398 ns 2624143 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 89340500 ns 88908791 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 115182063 ns 114840750 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 226080500 ns 227998375 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74787250 ns 74777479 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 268783625 ns 269000958 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 156455958 ns 156605625 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 123498687 ns 123282250 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485358792 ns 485266417 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 6886357 ns 7007944 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1473244563 ns 1477600500.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1170669416 ns 1177860417 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1068236875 ns 1059255604.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2007699500 ns 2001527437.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 33116264.5 ns 34509709 ns 0.96
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1722112916 ns 1725457125 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1526979229.5 ns 1535708771 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1854066584 ns 1892793750 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2208474333 ns 2208396292 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2098500 ns 2072875 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 3028958 ns 3011791 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 9208917 ns 8320459 ns 1.11
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2506917 ns 2450499.5 ns 1.02
lenet(28, 28, 1, 128)/forward/GPU/CUDA 250413 ns 268533.5 ns 0.93
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9640708.5 ns 9519292 ns 1.01
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 12027146 ns 12095020.5 ns 0.99
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 23902417 ns 24991500 ns 0.96
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11729666.5 ns 11770084 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1076622 ns 1173232 ns 0.92
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 381127000 ns 383052437.5 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 309063896 ns 311828042 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 262337458 ns 269993541.5 ns 0.97
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 453768395.5 ns 452443833.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4853131.5 ns 4865362.5 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1158643375 ns 1155538583 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 938969625 ns 936810083 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 963762041 ns 959183583 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1577954500 ns 1397577000 ns 1.13
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 18263437 ns 19191910 ns 0.95
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1057541 ns 1053520.5 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1646584 ns 1668459 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 5451187.5 ns 5692083 ns 0.96
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1369833 ns 1396104.5 ns 0.98
lenet(28, 28, 1, 64)/forward/GPU/CUDA 249867.5 ns 270444.5 ns 0.92
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6519125 ns 6494584 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13095375 ns 13134333 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 18952875 ns 19522667 ns 0.97
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 5956188 ns 6062833 ns 0.98
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1117980 ns 1205114.5 ns 0.93
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70611125 ns 70593167 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43793833 ns 43687500 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39731937.5 ns 39756500 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132592666.5 ns 132546521 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1854679 ns 1861025.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 356226125 ns 356256979 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270572708 ns 270180000 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254771104 ns 253147750 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534785021 ns 535028854 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12169343.5 ns 12303646 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 397173792 ns 400021667 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 374448417 ns 374059625 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 694189709 ns 723689958.5 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 713602000 ns 712462250 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1192048542 ns 1195955667 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 831171854.5 ns 833640041.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 640669937.5 ns 641220229.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1863728709 ns 1769113729 ns 1.05
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12532816 ns 12497145 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3618000479 ns 3639556520.5 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2826885834 ns 2825360333 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2713615750 ns 2702765709 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5011045750 ns 5019640833 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 50402599.5 ns 49951471 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3419875 ns 3421500 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2065250 ns 2074979 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2533333.5 ns 2545666 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6025000 ns 6030125 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 337456.5 ns 343299 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 26007750 ns 26132666.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18963500 ns 19030500 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19385500.5 ns 19345021 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39322542 ns 39337834 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2456325 ns 2467033.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54524645.5 ns 54504542 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 82936812.5 ns 81980333 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 182520645.5 ns 173279167 ns 1.05
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45591687.5 ns 45606041 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1779791 ns 1787396 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1093417 ns 1095125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1588750 ns 1559166 ns 1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3034417 ns 3050791 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 208871.5 ns 213819 ns 0.98
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12541750 ns 12546291 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9203791.5 ns 9225062.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9610729 ns 9642333.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18987500 ns 19019500 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1509397.5 ns 1532922 ns 0.98
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17660375 ns 17668667 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14333917 ns 14332167 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14545395.5 ns 14597000 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22202979 ns 22175750.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70525542 ns 70541417 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43667833 ns 43674667 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39614000 ns 39704500 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132626625 ns 132649271 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1873199.5 ns 1938611 ns 0.97
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 360567208 ns 361084062.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 348361750 ns 347061583.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 305569375 ns 305013375 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 726844292 ns 723885708 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13232795 ns 13388921 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 421270563 ns 425519667 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 430893167 ns 427658750 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 718848354 ns 736440729.5 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 717686250 ns 715989083 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1569104 ns 1596542 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1153229.5 ns 1135916 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1137979 ns 1138166.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2453021 ns 2412708 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 547553.5 ns 587435 ns 0.93
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8870542 ns 8847312 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 13929562.5 ns 13684021 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 34331208.5 ns 32863792 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9863166.5 ns 9875083 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1211835 ns 1416297.5 ns 0.86
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 16622562.5 ns 16549687.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 22981042 ns 22946333.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 47608083 ns 47499854 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 13135583 ns 13135792 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 823083.5 ns 827646 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 519667 ns 514125 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1027437.5 ns 1076104 ns 0.95
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 725812.5 ns 725021 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 45006 ns 47722 ns 0.94
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1550396 ns 1531958 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1010312 ns 1005542 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1368833 ns 1422834 ns 0.96
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2294958 ns 2290271 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 206718.5 ns 235161 ns 0.88
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1539958 ns 1550625 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1027083 ns 1063666.5 ns 0.97
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1434958.5 ns 1456541 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2262000 ns 2260042 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3395958.5 ns 3417917 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2043562.5 ns 2065041 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2510645.5 ns 2482708 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6010208 ns 6009500 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 278141 ns 284432 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24077834 ns 24080042 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17174000 ns 17195500 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17077208 ns 17121125 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37552104.5 ns 37501854 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2386026 ns 2416353 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52856250.5 ns 52890167 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 85034770.5 ns 84990875 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 179760229.5 ns 173811125 ns 1.03
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44607791.5 ns 44527208 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250724750 ns 250510875 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148549708 ns 148711500 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116156104 ns 116106354 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448130375 ns 447706104 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5468960 ns 5473947 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1101806791 ns 1104910333 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 854968041.5 ns 852696229 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 832535437.5 ns 828124666.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1749893542 ns 1753883208 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 33175899 ns 29129663 ns 1.14
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1028461562.5 ns 1027987062.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 965583583 ns 967528166 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1298573500 ns 1323494083.5 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1729774999.5 ns 1721562854.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1199187.5 ns 1199000 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 679771 ns 722000 ns 0.94
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 679125 ns 723333.5 ns 0.94
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 1968791.5 ns 2059938 ns 0.96
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 548897.5 ns 566089.5 ns 0.97
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5868125 ns 5883354 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 8863000 ns 9012521 ns 0.98
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 26024792 ns 26898459 ns 0.97
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7119500 ns 7112042 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1215129 ns 1371381.5 ns 0.89
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 9672145.5 ns 9684083 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 16106084 ns 16051250 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 34274604 ns 33056542 ns 1.04
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 7600542 ns 7626499.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 517041 ns 522916.5 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 378770.5 ns 390125.5 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2622459 ns 3390917 ns 0.77
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 89167 ns 89292 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 25879 ns 28324 ns 0.91
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 380083.5 ns 380812.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 442125 ns 444875 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4454792 ns 5040083.5 ns 0.88
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 258584 ns 259041 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 187434 ns 219450.5 ns 0.85
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 411084 ns 411083 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 472750 ns 475270.5 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4766688 ns 4889250 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 270916 ns 271084 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 468292 ns 465208.5 ns 1.01
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 318292 ns 318584 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 735291 ns 778771 ns 0.94
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 52750 ns 54354.5 ns 0.97
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 25952 ns 28220 ns 0.92
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 338166.5 ns 340333 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 339291 ns 341958 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 517208.5 ns 734125 ns 0.70
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151625 ns 151417 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 180334.5 ns 205814.5 ns 0.88
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 352187.5 ns 351792 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 354833 ns 356604.5 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 918834 ns 935583 ns 0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 150875 ns 151000 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 603168958 ns 606312458 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 428255645.5 ns 430997020.5 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 387565458 ns 382921125 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 872582458 ns 871105000 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7022166.5 ns 7038469 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2000369604.5 ns 2005974042 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1635627395.5 ns 1610239562.5 ns 1.02
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1619963354.5 ns 1558401520.5 ns 1.04
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2622232458 ns 2631627625 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 26705505 ns 26000726 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 521375 ns 539604 ns 0.97
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 396229.5 ns 396875 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 2919542 ns 3106167 ns 0.94
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 865354.5 ns 866292 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 45399 ns 47775 ns 0.95
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1819958 ns 1813250 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1772584 ns 1736667 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 16344771.5 ns 16480542 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2757458 ns 2648000 ns 1.04
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 216751.5 ns 246886 ns 0.88
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1889750 ns 1867042 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 1865354 ns 1816500 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 16504208.5 ns 16523458 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2783666.5 ns 2741770.5 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1349166 ns 1439604.5 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 935416 ns 934625 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1011125 ns 1053375.5 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2227541 ns 2331625 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 544870 ns 580680 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5903542 ns 5896895.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 8786042 ns 8530979 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 26388917 ns 26479875.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7336208 ns 7269958 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1179081.5 ns 1365923.5 ns 0.86
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 11703291.5 ns 11687917 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 18250333 ns 18462792 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 38934750 ns 39354708.5 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 9538708 ns 9551562.5 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2625 ns 4541.5 ns 0.58
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2875 ns 3000 ns 0.96
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3583 ns 3333 ns 1.08
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2584 ns 4750 ns 0.54
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 22526 ns 25041 ns 0.90
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7187.5 ns 7333.5 ns 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7334 ns 7208 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7208 ns 7187.5 ns 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7333 ns 7208 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 179733.5 ns 213760.5 ns 0.84
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8250 ns 8500 ns 0.97
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8208.5 ns 8333 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8500 ns 8459 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 5958 ns 6167 ns 0.97
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10542 ns 10375 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 13063 ns 13833 ns 0.94
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10625 ns 11229.5 ns 0.95
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7208 ns 9250 ns 0.78
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 22650 ns 25667 ns 0.88
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 19917 ns 20041 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 19750 ns 19917 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 19958 ns 20083 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 20042 ns 19584 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 194280 ns 233795.5 ns 0.83
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23500 ns 23833 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23458 ns 23541.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23750 ns 23750 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21250 ns 21333 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28958 ns 28542 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28334 ns 28542 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28459 ns 28750 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46209 ns 46083 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 24228 ns 26413 ns 0.92
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 229125 ns 227625 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 279750 ns 277333 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4047542 ns 3752584 ns 1.08
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145125 ns 145792 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 188066 ns 215287 ns 0.87
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 246875 ns 246083 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 297500 ns 294959 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 4136041 ns 4140167 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 145375 ns 145458 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2000 ns 3875 ns 0.52
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 2000 ns 1792 ns 1.12
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2604.5 ns 2291.5 ns 1.14
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1833 ns 1958 ns 0.94
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 21214 ns 23326 ns 0.91
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5083.5 ns 5333 ns 0.95
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5333 ns 5125 ns 1.04
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5167 ns 5250 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5375 ns 5125 ns 1.05
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 230179 ns 246332 ns 0.93
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 7375 ns 7625 ns 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 7416 ns 7416 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 7500 ns 7770.5 ns 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5125 ns 5250 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 80141959 ns 80124625 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 47888229 ns 47921000 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 43255041.5 ns 43331166.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151557417 ns 151470167 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2714800 ns 2687344 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 606608917 ns 672319791 ns 0.90
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 409612958 ns 413871833 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 397283458.5 ns 397456333.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 684454375 ns 687252833 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 17052645 ns 14598552.5 ns 1.17
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 715895500 ns 695248479.5 ns 1.03
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 683149083 ns 677318208 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 989952709 ns 996212291 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 998157291 ns 997847458 ns 1.00

This comment was automatically generated by a workflow using github-action-benchmark.
