Commit
chore: bump compat for DataAugmentation to 0.3 for package DDIM, (keep existing compat) (#877)

Co-authored-by: CompatHelper Julia <compathelper_noreply@julialang.org>
github-actions[bot] and CompatHelper Julia authored Sep 5, 2024
1 parent 4361e51 commit 59f83fc
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions examples/DDIM/Project.toml
@@ -1,4 +1,5 @@
[deps]
+AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
ArgCheck = "dce04be8-c92d-5529-be00-80e4d2c0e197"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
@@ -11,7 +12,6 @@ ImageCore = "a09fc81d-aa75-5fe9-8630-4744c3626534"
ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
-AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
LuxCUDA = "d0bbae9a-e099-4d5b-a835-1c6931763bda"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
@@ -25,19 +25,19 @@ TensorBoardLogger = "899adc3e-224a-11e9-021f-63837185c80f"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
+AMDGPU = "0.9.6, 1"
ArgCheck = "2.3.0"
CairoMakie = "0.12"
ChainRulesCore = "1.23"
Comonicon = "1"
ConcreteStructs = "0.2.3"
-DataAugmentation = "0.2.12"
+DataAugmentation = "0.2.12, 0.3"
DataDeps = "0.7.13"
FileIO = "1.16"
ImageCore = "0.9, 0.10"
ImageIO = "0.6"
JLD2 = "0.4.48"
Lux = "0.5.52"
-AMDGPU = "0.9.6, 1"
LuxCUDA = "0.3"
MLUtils = "0.4"
Optimisers = " 0.3"
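For readers unfamiliar with Julia's `[compat]` syntax, the sketch below illustrates how an entry such as `DataAugmentation = "0.2.12, 0.3"` is read: each comma-separated item is a caret-style specifier, and the entry accepts the union of their ranges. This is an illustrative Python model, not Pkg's implementation. The helper names `caret_range` and `allows` are hypothetical, and only bare version specifiers are handled (no `~`, inequality, or hyphen-range forms).

```python
def parse_version(text):
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a 3-tuple, padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def caret_range(spec):
    """Return (lower, upper) for a bare caret-style specifier.

    The leftmost non-zero component given in the specifier is the one
    that is bumped to form the exclusive upper bound. If every given
    component is zero, the last given component is bumped instead.
    """
    parts = [int(p) for p in spec.strip().split(".")]
    lower = parse_version(spec)
    bump = len(parts) - 1  # default: last component that was written
    for index, value in enumerate(parts):
        if value != 0:
            bump = index
            break
    upper = list(lower)
    upper[bump] += 1
    for index in range(bump + 1, 3):
        upper[index] = 0
    return lower, tuple(upper)


def allows(compat_entry, version):
    """True if `version` falls in any range of a comma-separated entry."""
    candidate = parse_version(version)
    for spec in compat_entry.split(","):
        lower, upper = caret_range(spec)
        if lower <= candidate < upper:
            return True
    return False


# "0.2.12, 0.3" covers [0.2.12, 0.3.0) and [0.3.0, 0.4.0).
print(caret_range("0.2.12"))
print(caret_range("0.3"))
print(allows("0.2.12, 0.3", "0.3.0"))
```

Under this reading, the change keeps the previously allowed 0.2.x releases (from 0.2.12 onward) and additionally admits the 0.3 series, which matches the "keep existing compat" wording of the commit title.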

1 comment on commit 59f83fc

@github-actions

Lux Benchmarks
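
The Ratio column in the table that follows is consistent with current time divided by previous time, rounded to two decimals, so values above 1.00 mean the current commit was slower on that benchmark. The sketch below recomputes the ratio for three rows copied from the table and flags those above a cutoff. The `THRESHOLD` value and the function name `ratio` are assumptions for illustration only and are not taken from the benchmark tooling.

```python
def ratio(current_ns, previous_ns):
    """Current time over previous time; > 1 means the current run was slower."""
    return current_ns / previous_ns


# (benchmark name, current ns, previous ns), copied from the table.
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 412520.5, 414937.5),
    ("Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)", 474584, 273625),
    ("lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)", 1021959, 1506958),
]

THRESHOLD = 1.3  # hypothetical cutoff, chosen only for this example

flagged = [
    (name, round(ratio(current, previous), 2))
    for name, current, previous in rows
    if ratio(current, previous) > THRESHOLD
]

for name, value in flagged:
    print(f"{name}: {value}")
```

Because the commit only touches a `[compat]` entry in an example project, large ratios on small multi-threaded CPU benchmarks are more plausibly run-to-run noise than a real regression, though the table alone cannot confirm that.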

Benchmark suite Current: 59f83fc Previous: 49f49a5 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 412520.5 ns 414937.5 ns 0.99
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 323042 ns 322917 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 323583 ns 323167 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 752166.5 ns 739334 ns 1.02
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 44168 ns 43603 ns 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1384083 ns 1281041.5 ns 1.08
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2451854 ns 2448000 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 14238812.5 ns 14112208.5 ns 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2239125 ns 2281500 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 210250 ns 209418 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1411875 ns 1389292 ns 1.02
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 897520.5 ns 885541 ns 1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1516292 ns 1564334 ns 0.97
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2210229 ns 2244666 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1725583 ns 1768458 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1017708.5 ns 1070292 ns 0.95
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1538333 ns 1534708 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3006583 ns 2945750 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210559 ns 210107 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12112667 ns 12156916 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8809666.5 ns 8795791 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9192709 ns 9216583 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18570834 ns 18566125 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1504910 ns 1491618 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17273542 ns 17331499.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13992292 ns 13987542 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14538625 ns 14472812.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21824875 ns 21820291 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 249443729 ns 249342958.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148456250 ns 148241750 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115795563 ns 116015000 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 454024458 ns 453798708 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5474002 ns 5449223 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1144391209 ns 1146266250 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 981113333 ns 981704875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 853440021 ns 841522708.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1805007208 ns 1759323917 ns 1.03
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31357343 ns 31586701 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1034466750 ns 1042907459 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1009660729.5 ns 1000416291.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1324456604 ns 1298076750 ns 1.02
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1728354792 ns 1737205625 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1093583 ns 1119562.5 ns 0.98
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1583083 ns 1622875 ns 0.98
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3678000 ns 3548958 ns 1.04
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 779625 ns 785583 ns 0.99
lenet(28, 28, 1, 32)/forward/GPU/CUDA 273068.5 ns 274324 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2985458.5 ns 3038000 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4106125 ns 4083437.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 10555937 ns 11037583 ns 0.96
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3131667 ns 3144521 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1134574.5 ns 1135705 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2275083 ns 2308333.5 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1429583 ns 1430208 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1656125 ns 1667021 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4200438 ns 4209459 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210634 ns 210374 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19375958 ns 19417292 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16086292 ns 16085209 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17180583 ns 17361667 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25782875 ns 25874854.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1606705 ns 1598161.5 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34182625 ns 34253375 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30811875 ns 30840208 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31108104 ns 31540625 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36403791 ns 36820354 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4540667 ns 4533500 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2769500.5 ns 2754437.5 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2921250 ns 2922958.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8391917 ns 8379896 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 423308 ns 425541 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 39022250 ns 38931833.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32067021 ns 32059166 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32250916.5 ns 32304958 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51820375 ns 51832000 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2657162.5 ns 2625272.5 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 88606874.5 ns 89088083.5 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 113796125 ns 114374167 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 223648041 ns 224208209 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74335583.5 ns 74528979 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 267029417 ns 268596959 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 158942229.5 ns 159233333.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 126886229 ns 126780333 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 487631541 ns 484901875 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 6889435 ns 7002114 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1474300812.5 ns 1474144916.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1174433750 ns 1144467750 ns 1.03
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1063095500 ns 1075737187.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2007751479 ns 2026289333.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34685949 ns 34635293 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1689349708 ns 1704908667 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1535787500 ns 1477917583.5 ns 1.04
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1814518792 ns 1882348542 ns 0.96
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2211056708.5 ns 2231847042 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2089187.5 ns 2004166.5 ns 1.04
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2976458 ns 2569125 ns 1.16
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7304583 ns 6929417 ns 1.05
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2476917 ns 2435021 ns 1.02
lenet(28, 28, 1, 128)/forward/GPU/CUDA 272072.5 ns 267867 ns 1.02
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9643854 ns 9579229.5 ns 1.01
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 12014792 ns 11450124.5 ns 1.05
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 25647896 ns 24113603.5 ns 1.06
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11736104 ns 11704333 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1173736.5 ns 1169053.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 380778209 ns 380411333 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 282717792 ns 282013417 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 238251708.5 ns 241718292 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 453270208 ns 452199062 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4856475 ns 4861447 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1156978917 ns 1177026875 ns 0.98
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 919622250 ns 911798208 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 945107000 ns 959270583 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1428489000 ns 1420082458 ns 1.01
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 17978082 ns 18016614 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1021959 ns 1506958 ns 0.68
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 2001250 ns 1619042 ns 1.24
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 6008000 ns 6078187.5 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1374000 ns 1294916 ns 1.06
lenet(28, 28, 1, 64)/forward/GPU/CUDA 268964 ns 267932 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6414395.5 ns 6812000 ns 0.94
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 12403896 ns 13135583 ns 0.94
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 20716333 ns 19155666 ns 1.08
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6079792 ns 6056354 ns 1.00
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1209955 ns 1212499 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70501749.5 ns 70511583 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43580771 ns 43537375 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39491375 ns 39409417 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132802458.5 ns 133783708 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1859689 ns 1933585.5 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 384818104 ns 381557562.5 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 295632667 ns 295764895.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 281694167 ns 281324083 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534727063 ns 535257270.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12284399.5 ns 12290544.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 396068167 ns 412505875 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 409321729.5 ns 373209375 ns 1.10
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 678917958 ns 688004291 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 711312959 ns 709404875 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1190798042 ns 1186087125 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 688321229 ns 688362479 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 630150084 ns 626514875 ns 1.01
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1776546083 ns 1778854333.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12315985 ns 12319166.5 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3607588771 ns 3506982229 ns 1.03
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2756374750 ns 2794034750 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2714951667 ns 2699392833 ns 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 4951023834 ns 4950907833 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49373771 ns 49414957 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3429083.5 ns 3424125 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2066792 ns 2051500 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2527666 ns 2533250 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6016750 ns 6031895.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 311191 ns 313046 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25518541 ns 25554687.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18527417 ns 18540916 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 18707833 ns 18962271 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 38890083 ns 38399291 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2479107 ns 2470998 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54171458 ns 54650500 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 78979625 ns 78908438 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 171331479 ns 169063625 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45540167 ns 45558958 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1785458 ns 1786417 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1046062.5 ns 1086125 ns 0.96
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1583208.5 ns 1603104.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3024416.5 ns 3030083 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 213982 ns 214935.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12521375 ns 12546292 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9184167 ns 9205583.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9599958.5 ns 9646125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18940458 ns 18948583 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1538264 ns 1529511.5 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17640750 ns 17691958 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14307771 ns 14322292 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14507583 ns 14657000 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22177500 ns 22150500 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70512937 ns 70485250 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43444479.5 ns 43560250 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39626750 ns 39651125 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132598874.5 ns 132456250 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1950639 ns 1934167.5 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 359565417 ns 359706666 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 293550333 ns 289803812 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 287837104.5 ns 287024520.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 622550708.5 ns 620943458 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13384881.5 ns 13389118 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 419108729 ns 418207750.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 424758959 ns 426872417 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 717519375 ns 708863792 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 716499833 ns 714272667 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1521229 ns 1467229.5 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1235833 ns 1164958 ns 1.06
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1246625 ns 1223792 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2300875 ns 2308375 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 587061.5 ns 583756 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8812333 ns 8755666.5 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12926416 ns 12812083 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 30195584 ns 31879208 ns 0.95
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9787000 ns 9792458 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1419851.5 ns 1392390 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 18056125 ns 17932750 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 16803125 ns 17135625 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 29287584 ns 29811583 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14378083 ns 14460729.5 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 805145.5 ns 823083.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 589041.5 ns 620625 ns 0.95
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1034812.5 ns 1022854.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 726750 ns 740791 ns 0.98
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47938.5 ns 47357.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1542875 ns 1528750 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1000270.5 ns 953917 ns 1.05
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1504041 ns 1387583 ns 1.08
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2294104 ns 2279146 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 236494.5 ns 233369 ns 1.01
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1722687.5 ns 1748792 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1250438 ns 1258250 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1858854.5 ns 1680104 ns 1.11
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2311917 ns 2337833.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3404416 ns 3398000.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2046208 ns 2032875 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2516916.5 ns 2524750 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6013625 ns 5998916 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 285181.5 ns 282348 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24021312.5 ns 24156145.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17217833 ns 17254937.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17101666.5 ns 17217979.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37551396 ns 37524604.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2407620 ns 2399084 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52545812.5 ns 52823083 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 80522312.5 ns 80975187.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 166982250.5 ns 170431562.5 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44529604 ns 44543937.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250184208.5 ns 251011396 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 147977833 ns 148156125 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115557083.5 ns 115824354 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447150583.5 ns 454908937.5 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5457630 ns 5336248.5 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1128644583 ns 1130772458 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 881731833.5 ns 881484167 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 805115667 ns 804587958 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1757118042 ns 1745692959 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28927493 ns 28847342 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1058828646 ns 1027064583.5 ns 1.03
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 973248125 ns 959640250 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1362518583 ns 1261786916 ns 1.08
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1744326604 ns 1731191479.5 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1317667 ns 1173624.5 ns 1.12
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 936250 ns 906000 ns 1.03
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 907396 ns 939334 ns 0.97
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2059708 ns 2039708.5 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 573972.5 ns 570174 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5872667 ns 5806917 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6537417 ns 7014250 ns 0.93
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 24586229.5 ns 25017291.5 ns 0.98
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7039792 ns 7060041 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1375117 ns 1340455.5 ns 1.03
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 11464417 ns 11530292 ns 0.99
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 10266333 ns 8850020.5 ns 1.16
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 17693667 ns 17434458 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 8866896 ns 8551437.5 ns 1.04
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 487208 ns 506417 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 474584 ns 273625 ns 1.73
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2175853.5 ns 2396979 ns 0.91
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 87541 ns 90000 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28408 ns 27635 ns 1.03
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 383437.5 ns 385625 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 444333.5 ns 348812.5 ns 1.27
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4385583 ns 4572979.5 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 268292 ns 262125 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 225901 ns 220978.5 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 706959 ns 707916 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 722500 ns 579562.5 ns 1.25
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 1069791 ns 1057604 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 447125 ns 449729 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 432125 ns 456750 ns 0.95
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 418166 ns 212166 ns 1.97
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 742500 ns 729000 ns 1.02
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 53208 ns 54895.5 ns 0.97
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28501 ns 27483.5 ns 1.04
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 338770.5 ns 339208 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 338750 ns 194896 ns 1.74
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 737375 ns 864542 ns 0.85
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 154208 ns 153187.5 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 210566 ns 206000 ns 1.02
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 404125 ns 406291 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 405916.5 ns 262083.5 ns 1.55
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 983208 ns 828042 ns 1.19
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 174750 ns 173792 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 603527917 ns 600740375 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 431057458.5 ns 425777500 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 375361437.5 ns 373716000 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 872552854 ns 873713812.5 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7040620 ns 7032511.5 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 1986550813 ns 2084258688 ns 0.95
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1668902250 ns 1651169312.5 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1651138625 ns 1580932771 ns 1.04
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2764176416 ns 2753232709 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 25979788.5 ns 26093846 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 521833 ns 534708 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 437250 ns 428292 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 1710708 ns 1851583 ns 0.92
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 866062.5 ns 866334 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47823 ns 46927 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1842562.5 ns 1888062.5 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2356875 ns 2316896 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14345020.5 ns 14585209 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2764166 ns 2757583.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 252466.5 ns 247984.5 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 2751750 ns 2751959 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 2316083 ns 2279292 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 4360708 ns 3318791.5 ns 1.31
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 4727708 ns 3395625 ns 1.39
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1581500 ns 1510000 ns 1.05
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1216229.5 ns 1177708 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1177645.5 ns 1195583 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2314729 ns 2315167 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 547137 ns 588506 ns 0.93
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5877292 ns 5715958.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 6745916.5 ns 6618896 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 24550687.5 ns 24170542 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7266312 ns 7277583.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1351645 ns 1377478 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 12285333.5 ns 12783958 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 12037124.5 ns 11833292 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 20466187 ns 19658396.5 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 10853417 ns 9760416.5 ns 1.11
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2500 ns 2750 ns 0.91
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2750 ns 2583 ns 1.06
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3416 ns 3250 ns 1.05
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 3041 ns 4771 ns 0.64
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24989 ns 24855 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 8333 ns 8708 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 8625 ns 8500 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 8667 ns 8416 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 8770.5 ns 8479.5 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 213236.5 ns 213745.5 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 16750 ns 16583 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 16375 ns 16583 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 16792 ns 16625 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 10917 ns 10709 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10792 ns 11791 ns 0.92
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 18083 ns 16125 ns 1.12
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 11666 ns 11750 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7666.5 ns 7583 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 24865.5 ns 24983 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 22333 ns 22292 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 22291 ns 22625 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 22500 ns 22416.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 22375 ns 22541 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 233562.5 ns 235287.5 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 52042 ns 52250 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 52125 ns 52375 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 52270.5 ns 52417 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 44000 ns 43792 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28979.5 ns 29333 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 29208 ns 28791 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28458 ns 29167 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46209 ns 46167 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26274 ns 26056 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 229062.5 ns 209667 ns 1.09
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 263041 ns 257250 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4056646 ns 4075916 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 154437.5 ns 147625 ns 1.05
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 215509 ns 220948.5 ns 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 329834 ns 308542 ns 1.07
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 292583 ns 282917 ns 1.03
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 817500 ns 767042 ns 1.07
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 161708 ns 161708 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2041 ns 2042 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1833 ns 1958 ns 0.94
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2750 ns 2312.5 ns 1.19
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1917 ns 1958 ns 0.98
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23258 ns 22938 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 7208 ns 7375 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 7042 ns 7250 ns 0.97
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 7750 ns 7625 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 7125 ns 7250 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 267733.5 ns 263592.5 ns 1.02
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11334 ns 11292 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11375 ns 11458 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11708 ns 11500 ns 1.02
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 6958 ns 7000 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79930209 ns 79852292 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49066500 ns 49068812.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 45049708 ns 45007187.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151430167 ns 151374416 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2719840 ns 2720111.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 497512959 ns 607847917 ns 0.82
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 411297375 ns 412172583 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 396546125 ns 398297875 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 736651313 ns 737514583.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14587409 ns 14594549 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 709337374.5 ns 713373500 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 664763792 ns 665302083 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1022853709 ns 1010864625 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 996468292 ns 998393833 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
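
For readers checking the numbers by hand: in the rows above, the Ratio column matches the current timing divided by the previous timing, rounded to two decimals, so a value above 1.00 means the current commit was slower on that benchmark. The sketch below is only an illustration of that arithmetic. The helper names `parse_row` and `ratio` are hypothetical and are not part of github-action-benchmark or Lux, and the row layout is assumed from the flattened text of this page.

```python
import re

# Assumed row layout (taken from the flattened table on this page):
#   <benchmark name> <current> ns <previous> ns <ratio>
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<cur>[\d.]+) ns\s+(?P<prev>[\d.]+) ns\s+(?P<ratio>[\d.]+)$"
)


def parse_row(line):
    """Split one flattened table row into (name, current_ns, previous_ns, reported_ratio)."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return m["name"], float(m["cur"]), float(m["prev"]), float(m["ratio"])


def ratio(current_ns, previous_ns):
    """Current time over previous time, rounded to two decimals as in the table."""
    return round(current_ns / previous_ns, 2)


row = "Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 4360708 ns 3318791.5 ns 1.31"
name, cur, prev, reported = parse_row(row)
print(name, ratio(cur, prev), reported)
```

A single row with a ratio such as 1.31 or 0.64 may reflect runner noise rather than a real change, so comparing several runs before drawing conclusions is the safer reading.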
