Commit

chore: bump compat for JLD2 to 0.5 for package ImageNet, (keep existing compat) (#886)

Co-authored-by: CompatHelper Julia <compathelper_noreply@julialang.org>
github-actions[bot] and CompatHelper Julia authored Sep 9, 2024
1 parent 358b4df commit d381969
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions examples/ImageNet/Project.toml
@@ -1,4 +1,5 @@
[deps]
+AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
Augmentor = "02898b10-1f73-11ea-317c-6393d7073e15"
Boltz = "4544d5e4-abc5-4dea-817f-29e4c205d9c8"
Configurations = "5218b696-f38b-4ac9-8b61-a12ec717816d"
@@ -11,7 +12,6 @@ Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
JpegTurbo = "b835a17e-a41a-41e7-81f0-2f016b05efe0"
Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
-AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
LuxCUDA = "d0bbae9a-e099-4d5b-a835-1c6931763bda"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
@@ -36,7 +36,7 @@ FileIO = "1.16"
Format = "1.3"
Functors = "0.4"
Images = "0.26"
-JLD2 = "0.4.46"
+JLD2 = "0.4.46, 0.5"
JpegTurbo = "0.1"
Lux = "1"
LuxCUDA = "0.3"
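As a rough illustration of what the new entry `JLD2 = "0.4.46, 0.5"` admits: a bare version in a Julia `[compat]` entry is treated as a caret specifier, and a comma takes the union of the listed specifiers, so this entry should accept JLD2 versions in [0.4.46, 0.6.0). The Python sketch below re-implements only that bare-specifier case. The function names are hypothetical, and this is not Pkg's actual resolver: operators such as `~`, `>=`, and hyphen ranges are not handled.

```python
def parse_version(text):
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def caret_range(spec):
    """Return (lower, upper) for a bare specifier; upper is exclusive.

    Assumed caret rule: the leftmost nonzero component is the one that
    must not change, e.g. 0.4.46 -> [0.4.46, 0.5.0), 1.2 -> [1.2.0, 2.0.0).
    """
    parts = parse_version(spec)
    lower = parts + (0,) * (3 - len(parts))
    for i, component in enumerate(parts):
        if component != 0:
            upper = parts[:i] + (component + 1,) + (0,) * (2 - i)
            break
    else:
        # Every given component is zero: bump the last one that was written.
        i = len(parts) - 1
        upper = parts[:i] + (1,) + (0,) * (2 - i)
    return lower, upper


def allows(compat_entry, version):
    """True if `version` falls in any comma-separated specifier's range."""
    candidate = parse_version(version)
    candidate = candidate + (0,) * (3 - len(candidate))
    for spec in compat_entry.split(","):
        lower, upper = caret_range(spec)
        if lower <= candidate < upper:
            return True
    return False


if __name__ == "__main__":
    for v in ["0.4.45", "0.4.46", "0.5.0", "0.6.0"]:
        print(v, allows("0.4.46, 0.5", v))
```

Under these assumptions the old entry `"0.4.46"` rejects every 0.5.x release, which is why the bump appends `0.5` rather than replacing the existing bound.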

1 comment on commit d381969

@github-actions
Contributor Author

Lux Benchmarks
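
Each benchmark row lists a name, the current timing, the previous timing, and a ratio; the reported ratios are consistent with current time divided by previous time, so values above 1 mean the current commit was slower. The Python sketch below parses one flattened row and recomputes that ratio. The helper names and the 10% threshold are assumptions made for illustration, not part of the benchmark tooling.

```python
def parse_row(row):
    """Split a flattened row into (name, current_ns, previous_ns, ratio).

    The name contains spaces, so split from the right: the last five
    whitespace-separated tokens are '<cur> ns <prev> ns <ratio>'.
    """
    name, current, _, previous, _, ratio = row.rsplit(maxsplit=5)
    return name, float(current), float(previous), float(ratio)


def ratio_of(current_ns, previous_ns):
    """Current / previous, rounded to two decimals like the table."""
    return round(current_ns / previous_ns, 2)


def classify(ratio, threshold=1.10):
    """Label a ratio; `threshold` is an arbitrary illustrative cutoff."""
    if ratio > threshold:
        return "slower"
    if ratio < 1 / threshold:
        return "faster"
    return "unchanged"


if __name__ == "__main__":
    row = ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) "
           "323000 ns 243583 ns 1.33")
    name, current, previous, reported = parse_row(row)
    print(name, ratio_of(current, previous), classify(reported))
```

Many of the largest swings in either direction occur in the 8-thread CPU rows, so a single ratio from one run is better treated as a hint to re-measure than as proof of a regression.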

Benchmark suite Current: d381969 Previous: cf99f8b Ratio (Current / Previous)
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 411750 ns 409792 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 322541 ns 322250 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 323000 ns 243583 ns 1.33
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 741125 ns 739625 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43519 ns 44053 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1352646 ns 1353834 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2410520.5 ns 2426458 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 14567000 ns 16512459 ns 0.88
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2198750 ns 2191083.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 207751 ns 209370 ns 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1399542 ns 1454375 ns 0.96
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 873625 ns 908458 ns 0.96
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1692500 ns 1834875 ns 0.92
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2212958 ns 2240458.5 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1771458.5 ns 1748562.5 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1103187.5 ns 1089395.5 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1508541 ns 1512729 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2942874.5 ns 3013750 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209185 ns 208817.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12143291.5 ns 12152041.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8815750 ns 8814875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9216667 ns 9198917 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18607583.5 ns 18613479 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1485488 ns 1488013.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17281166 ns 17304750 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13953458 ns 13952770.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14500417 ns 14533958 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21857458 ns 21843833.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250208625 ns 250399541.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148288167 ns 148350083 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116176854 ns 117130083 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447713791 ns 450838083 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5477076 ns 5478039 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1219737541 ns 1223340875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 929484250 ns 931640292 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831918520.5 ns 831594354.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1635891459 ns 1647325416 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31213208 ns 31506744.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1133213417 ns 1144335875 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 980873625 ns 995382583.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1340231687.5 ns 1322398292 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1731769583 ns 1739450208 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1090875 ns 1068417 ns 1.02
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1577729 ns 1603458.5 ns 0.98
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 4079000 ns 3760063 ns 1.08
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 780666 ns 782062 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 260126.5 ns 261189.5 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2947896 ns 3001979 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4119187.5 ns 4127958 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 10901042 ns 10894833 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3160729 ns 3233270.5 ns 0.98
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1100430 ns 1128601 ns 0.98
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2330791.5 ns 2312312.5 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1427875 ns 1427541.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1672479 ns 1552396 ns 1.08
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4208291.5 ns 4205417 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 207691 ns 207575 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19390833 ns 19386792 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16081417 ns 16057458 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17319916.5 ns 17256291 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25940458 ns 25860208 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1590182 ns 1590086 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34308604.5 ns 34375666 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30637541.5 ns 30899458.5 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31121708 ns 31158000 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36558542 ns 36246917 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4536583.5 ns 4546167 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2768292 ns 2772584 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2919709 ns 2682438 ns 1.09
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8393437.5 ns 8378667 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 427507 ns 420456 ns 1.02
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 38925624.5 ns 38885979.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32072729 ns 32074313 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32202792 ns 32239667 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 52013208 ns 51823708 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2621720 ns 2618884 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 81565208 ns 82643500 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 112402208 ns 112560458 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 218212166 ns 185039874.5 ns 1.18
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74389500 ns 73747708 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 267897375 ns 268204791.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 159400292 ns 159374708 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 127046375 ns 123950416.5 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485507209 ns 485039833 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 6981458 ns 7043693 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1471695708.5 ns 1468109979 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1169932458 ns 1174089583 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1072471604 ns 1065212458.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2003667895.5 ns 2013851104.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34727316 ns 34531403 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1695553417 ns 1695591000 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1467223249.5 ns 1493306146 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1864132334 ns 1801755584 ns 1.03
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2207761250 ns 2201440812.5 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 1831895.5 ns 1806792 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2554979 ns 2531562 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7371125 ns 7672666 ns 0.96
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2471937.5 ns 2462833 ns 1.00
lenet(28, 28, 1, 128)/forward/GPU/CUDA 275322 ns 266951 ns 1.03
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9374562.5 ns 9343333 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 11397062.5 ns 11495750 ns 0.99
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 25149625 ns 26058854.5 ns 0.97
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11776917 ns 11770625 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1196044.5 ns 1165407 ns 1.03
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 380496500 ns 379821291 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 283000250 ns 284431333.5 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 248274541.5 ns 276993833.5 ns 0.90
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 453217104.5 ns 453499125 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4853395.5 ns 4933427 ns 0.98
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1146433292 ns 1154735042 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 939675875 ns 934566458 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 910277166 ns 1022641417 ns 0.89
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1399365083 ns 1392634541 ns 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 17839841 ns 18839648 ns 0.95
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1050625 ns 1047667 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1869750 ns 1906208 ns 0.98
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 5510375 ns 6506020.5 ns 0.85
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1400416.5 ns 1385270.5 ns 1.01
lenet(28, 28, 1, 64)/forward/GPU/CUDA 275552.5 ns 268224 ns 1.03
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6502291.5 ns 6461437 ns 1.01
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13774167 ns 13802959 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 21393292 ns 21722625 ns 0.98
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6034209 ns 6091083 ns 0.99
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1255073 ns 1208321 ns 1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70518437 ns 70468396 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43453209 ns 43613625 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39518750 ns 39889875 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132645125 ns 132854895.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1884124 ns 1872456 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 355363583 ns 355307875 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 269480917 ns 270273125 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254468500.5 ns 254197770.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 535126104 ns 534390229.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12296923 ns 12309296.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 395233833 ns 395284167 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 392713625 ns 394804354.5 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 727646187 ns 701196333.5 ns 1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 710088708 ns 711179875 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1187138125 ns 1186639833 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 689654395.5 ns 689274542 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 631849125 ns 640237249.5 ns 0.99
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1771680916.5 ns 1775678646 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12312221 ns 12314528 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3672801604 ns 3680556646 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2818472834 ns 2857162417 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2766537959 ns 2854405625 ns 0.97
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5065721375 ns 5145784083 ns 0.98
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49571605.5 ns 49808957 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3419354 ns 3409479 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2060937.5 ns 2065084 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2506603.5 ns 2479917 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6010687.5 ns 6015479 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 342327.5 ns 341120 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25984833 ns 25925021 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18848354 ns 18915667 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19326937.5 ns 19134125.5 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39348166 ns 39216437.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2475985 ns 2468869 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 55415292 ns 55378250 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 81482791 ns 81111916 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 171451083 ns 174313958.5 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45362833 ns 45500125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1776500 ns 1779417 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1083459 ns 1092250 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1560333 ns 1547583 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3034625 ns 3037625 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 213929 ns 212275 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12513666 ns 12533437.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9181542 ns 9199000 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9611229.5 ns 9578167 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 19017125 ns 18975812.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1536683 ns 1533549 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17622541 ns 17619125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14289271 ns 14239459 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14490562 ns 14500521 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22192084 ns 22180250 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70486875 ns 70496583.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43536166 ns 43594834 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39571167 ns 39807625 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132732875.5 ns 132718979 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1880179 ns 1947710 ns 0.97
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 360038375 ns 360073791 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 344199854 ns 345868042 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 304988917 ns 302741792 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 722793084 ns 725319167 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13374232 ns 13371028 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 417916708 ns 419555417 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 427335417 ns 418148437.5 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 724586812.5 ns 710077458.5 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 714886416 ns 715636334 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1666375 ns 1661042 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1345812.5 ns 1277792 ns 1.05
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1349500 ns 1134813 ns 1.19
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2306688 ns 2433292 ns 0.95
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 549983 ns 584506.5 ns 0.94
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8943250 ns 9020542 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12857458 ns 12869000 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 30380396 ns 32651417 ns 0.93
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9862708 ns 9805792 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1482094 ns 1428291 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 17149708.5 ns 18111583 ns 0.95
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 17069292 ns 17253354 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 30182000 ns 26535354 ns 1.14
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14403750.5 ns 14356792 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 673250.5 ns 710208 ns 0.95
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 534416 ns 599312.5 ns 0.89
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1039625 ns 912395.5 ns 1.14
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 726250 ns 725791 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 48750 ns 47816 ns 1.02
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1562021 ns 1582187.5 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1008583.5 ns 973833 ns 1.04
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1379479 ns 1835187.5 ns 0.75
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2291979 ns 2183125 ns 1.05
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 242615.5 ns 236731.5 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1558791.5 ns 1600083 ns 0.97
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1056625 ns 1053041.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1418667 ns 1388771 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2225042 ns 2256062 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3409916 ns 3409541.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2058979 ns 2060229 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2482708 ns 2482875 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6018000 ns 5998167 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 286378 ns 286197 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24041250 ns 24038625 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17174104 ns 17258666.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17053959 ns 17123396 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37566250.5 ns 37487104 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2404563 ns 2409477.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 53636896 ns 54679729 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 81399459 ns 84538542 ns 0.96
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 168265249.5 ns 157339000 ns 1.07
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44475917 ns 44498708 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 249926041.5 ns 250028813 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148262292 ns 147930708 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116062417 ns 116617291 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448317354.5 ns 454228375 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5340767 ns 5443404 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1102986708 ns 1101896208 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 854720791.5 ns 855324125.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 827927812 ns 839930250.5 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1754387958 ns 1774005666 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28314924 ns 29278014 ns 0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1008984916.5 ns 1013677520.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 945336417 ns 922761000 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1185625042 ns 1320593542 ns 0.90
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1728502750 ns 1744904771 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1263833 ns 1230812.5 ns 1.03
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 905542 ns 967417 ns 0.94
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 960083 ns 669125 ns 1.43
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 1944125 ns 2028541 ns 0.96
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 575792 ns 558507.5 ns 1.03
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5733792 ns 6006292 ns 0.95
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6291604.5 ns 6899417 ns 0.91
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 24260958 ns 25958937 ns 0.93
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7123042 ns 7102312 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1397495 ns 1368625 ns 1.02
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 10348333 ns 10886750 ns 0.95
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 9719437.5 ns 9389042 ns 1.04
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 16565333.5 ns 17293854.5 ns 0.96
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 8814042 ns 7443459 ns 1.18
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 402292 ns 352104 ns 1.14
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 413417 ns 409416.5 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2055375 ns 3455917 ns 0.59
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 88333 ns 88750 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28130 ns 27682 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 349375 ns 392604 ns 0.89
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 401500 ns 399000 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4847312.5 ns 4557125 ns 1.06
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 258625 ns 258875 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 226757.5 ns 221053 ns 1.03
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 380917 ns 422125 ns 0.90
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 431667 ns 429208 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4832875 ns 4755354 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 271250 ns 270916 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 344833 ns 305104 ns 1.13
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 351917 ns 348458 ns 1.01
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 596333.5 ns 635625 ns 0.94
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 53083 ns 54250 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28544 ns 27950 ns 1.02
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 297000 ns 355958 ns 0.83
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 275291.5 ns 274500 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 721584 ns 753208.5 ns 0.96
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151833 ns 151667 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 211261.5 ns 205458.5 ns 1.03
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 308750 ns 372292 ns 0.83
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 291083 ns 288521 ns 1.01
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 599833 ns 798979 ns 0.75
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 151084 ns 150792 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 602960542 ns 602253459 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 427926645.5 ns 430857604 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 374874229 ns 392009125 ns 0.96
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 872331042 ns 877215958 ns 0.99
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7026315 ns 7028016 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2010620020.5 ns 1996302145.5 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1601555416.5 ns 1609994521 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1543795499.5 ns 1565616166.5 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2638535458 ns 2641861333 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 25732530.5 ns 25992958 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 521479.5 ns 536791.5 ns 0.97
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 436917 ns 435250 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 2359583.5 ns 2792250 ns 0.85
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 883459 ns 865125 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 48099 ns 47701 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1859625 ns 1900167 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2566167 ns 2798208 ns 0.92
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14721812.5 ns 16325500 ns 0.90
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2772708 ns 2771604 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 258195.5 ns 248374 ns 1.04
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1930708 ns 1976729 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 5055042 ns 5051583 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 14919271 ns 16501146 ns 0.90
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2790500 ns 2698083.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1571728.5 ns 1614854 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1175187.5 ns 1236833 ns 0.95
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1199667 ns 1069583 ns 1.12
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2372709 ns 2226209 ns 1.07
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 547236.5 ns 577670 ns 0.95
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 6028333 ns 5930562.5 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 4710562.5 ns 6880833 ns 0.68
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 24475354 ns 26135520.5 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7087875 ns 7284792 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1396100 ns 1356112 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 12455375 ns 12782291 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 12268625 ns 11955834 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 21774562.5 ns 21105833.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 10955542 ns 10667312.5 ns 1.03
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2375 ns 2334 ns 1.02
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 4562.5 ns 4792 ns 0.95
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3000 ns 3625 ns 0.83
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2667 ns 2375 ns 1.12
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 25691 ns 24681 ns 1.04
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7083 ns 7333 ns 0.97
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7291 ns 7250 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7416 ns 7167 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7083 ns 7291 ns 0.97
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 217637.5 ns 209372.5 ns 1.04
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8083 ns 8333 ns 0.97
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8291 ns 8292 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8667 ns 8500 ns 1.02
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6042 ns 6000 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10396 ns 10312.5 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 12625 ns 14125 ns 0.89
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10667 ns 10687.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 9562.5 ns 7167 ns 1.33
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25234.5 ns 24485 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 19834 ns 19958 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 20167 ns 20041.5 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 20167 ns 19833 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 20125 ns 20000 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 238032.5 ns 229359 ns 1.04
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23292 ns 23395.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23667 ns 23750 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23750 ns 23542 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21416 ns 21333 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28792 ns 28875 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 29167 ns 28750 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28896 ns 29083 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46917 ns 46041 ns 1.02
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26423 ns 25546 ns 1.03
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 233583 ns 221812.5 ns 1.05
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 286187.5 ns 279708 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4211271 ns 4417417 ns 0.95
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145625 ns 145625 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 219292 ns 211875.5 ns 1.04
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 347958 ns 332875 ns 1.05
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 331959 ns 321125 ns 1.03
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 870041 ns 562312.5 ns 1.55
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 168125 ns 161625 ns 1.04
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1875 ns 2083 ns 0.90
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 2750 ns 2125 ns 1.29
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2417 ns 3875 ns 0.62
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 2208.5 ns 1709 ns 1.29
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23633 ns 22559 ns 1.05
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5125 ns 5334 ns 0.96
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5166 ns 5437.5 ns 0.95
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5500 ns 5458 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5334 ns 5417 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 265711 ns 254509.5 ns 1.04
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11167 ns 11708 ns 0.95
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11291.5 ns 11416 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11458 ns 11416 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 6917 ns 6750 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79845791 ns 79881458 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49133292 ns 49107667 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 45028500 ns 43180145.5 ns 1.04
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151579083 ns 151771375 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2681579.5 ns 2680326.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 605176584 ns 662703292 ns 0.91
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 410334917 ns 414205958 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 400305958 ns 397227958 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 683219833 ns 688889667 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14579057 ns 14602708 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 711709958.5 ns 715248166.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 670438250 ns 686640708 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1044946792 ns 1044047896 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 997944375 ns 994524042 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
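
For readers checking the numbers: the Ratio column in the rows above is consistent with the current timing divided by the previous timing, rounded to two decimals, so values above 1 mean the current commit ran slower. The sketch below recomputes a few rows copied from the table. The function names and the 1.5 cutoff are hypothetical, chosen only for illustration, and are not taken from the workflow's configuration.

```python
# Recompute the Ratio column for a few rows copied from the table above.
# Assumption: ratio = current / previous, rounded to two decimals.

def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals (above 1 means slower)."""
    return round(current_ns / previous_ns, 2)


def flag_slowdowns(rows, threshold):
    """Return (name, ratio) pairs whose ratio exceeds the given threshold."""
    flagged = []
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r > threshold:
            flagged.append((name, r))
    return flagged


# (benchmark name, current ns, previous ns), values taken from the table.
rows = [
    ("Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)", 870041, 562312.5),
    ("Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)", 2417, 3875),
    ("mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)", 4710562.5, 6880833),
]

# Hypothetical cutoff for illustration only; not the workflow's setting.
THRESHOLD = 1.5

if __name__ == "__main__":
    for name, r in flag_slowdowns(rows, THRESHOLD):
        print(f"{name}: {r:.2f}")
```

Many of the larger swings sit on microsecond-scale CPU benchmarks, so a single ratio far from 1 may reflect run-to-run noise rather than a real change.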
