docs: update old docs
avik-pal committed Sep 19, 2024
1 parent 149bc0f commit 4f29928
Showing 3 changed files with 1 addition and 11 deletions.
3 changes: 1 addition & 2 deletions src/layers/basic.jl
@@ -241,8 +241,7 @@ be `Chain((x, ps, st) -> (relu.(x), st))`. An easier thing to do would be
  ## Inputs
- - `x`: s.t `hasmethod(f, (typeof(x),))` is `true` if :direct_call else
-   `hasmethod(f, (typeof(x), NamedTuple, NamedTuple))` is `true`
+ - `x`: will be directly passed to `f`
  ## Returns
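The `x` entry in this hunk says the input is handed straight to `f`. A minimal plain-Julia sketch of that calling convention follows. It assumes a hypothetical helper `call_wrapped` standing in for the layer (this is not Lux's implementation) and a locally defined `relu`, so no packages are needed.

```julia
# Hypothetical stand-in for a wrapped-function layer: `x` goes directly to `f`,
# the parameters `ps` are ignored, and the state `st` is returned unchanged.
call_wrapped(f, x, ps, st) = (f(x), st)

# Local definition so the sketch does not depend on any package.
relu(v) = max(v, zero(v))

f = x -> relu.(x)
x = [-1.0, 0.5, 2.0]
ps, st = NamedTuple(), NamedTuple()

# In this sketch the only requirement on `f` is a one-argument method for `x`.
@assert hasmethod(f, Tuple{typeof(x)})

y, st_out = call_wrapped(f, x, ps, st)
```

Under these assumptions `f` never sees `ps` or `st`, which is why a plain one-argument function is enough.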
3 changes: 0 additions & 3 deletions src/layers/conv.jl
@@ -482,9 +482,6 @@ resolution images while upscaling them.
  See `NNlib.pixel_shuffle` for more details.
- PixelShuffle is not a Layer, rather it returns a [`WrappedFunction`](@ref) with the
- function set to `Base.Fix2(pixel_shuffle, r)`
  ## Arguments
  - `r`: Upscale factor
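The note in this hunk mentions `Base.Fix2(pixel_shuffle, r)`. `Base.Fix2(f, r)` builds a callable that forwards its single argument `x` to `f(x, r)`. The sketch below shows only that partial application. `shuffled_size` is a hypothetical helper (not part of NNlib) that computes the output size one would expect from a 2-D pixel shuffle on a `(W, H, C, N)` array.

```julia
# Hypothetical helper: output size of a 2-D pixel shuffle with upscale factor `r`
# for an input of size (W, H, C, N). Assumes C is divisible by r^2.
function shuffled_size(sz::NTuple{4,Int}, r::Integer)
    w, h, c, n = sz
    c % r^2 == 0 || throw(ArgumentError("channel count must be divisible by r^2"))
    return (w * r, h * r, c ÷ r^2, n)
end

# Fix the second argument (the upscale factor) to 2.
upscale2 = Base.Fix2(shuffled_size, 2)

# Equivalent to shuffled_size((4, 4, 8, 1), 2).
out = upscale2((4, 4, 8, 1))
```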
6 changes: 0 additions & 6 deletions src/layers/normalize.jl
@@ -410,12 +410,6 @@ y = \frac{x - \mathbb{E}[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta
  where ``\gamma`` & ``\beta`` are trainable parameters if `affine=true`.
- !!! warning "Inconsistent Defaults till v0.5.0"
- As of v0.5.0, the doc used to say `affine::Bool=false`, but the code actually had
- `affine::Bool=true` as the default. Now the doc reflects the code, so please check
- whether your assumptions about the default (if made) were invalid.
  ## Arguments
  - `shape`: Broadcastable shape of input array excluding the batch dimension.
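The normalization formula quoted in this hunk can be checked numerically. The following is a minimal sketch in plain Julia. The function name `layernorm_sketch`, the use of an uncorrected (population) variance, and the default `epsilon` are assumptions made for illustration and are not taken from the Lux source.

```julia
using Statistics

# y = (x - E[x]) / sqrt(Var[x] + epsilon) * gamma + beta, reduced over `dims`.
function layernorm_sketch(x, gamma, beta; dims, epsilon=1.0e-5)
    mu = mean(x; dims=dims)
    sigma2 = var(x; dims=dims, corrected=false)  # assumption: population variance
    return (x .- mu) ./ sqrt.(sigma2 .+ epsilon) .* gamma .+ beta
end

x = reshape(collect(1.0:12.0), 3, 4)  # 3 features, batch of 4
gamma = ones(3, 1)                    # scale, broadcast over the batch
beta = zeros(3, 1)                    # shift, broadcast over the batch

y = layernorm_sketch(x, gamma, beta; dims=1)
```

With `gamma` set to ones and `beta` to zeros, each column of `y` should have mean close to 0 and variance close to 1, slightly below 1 because of `epsilon`.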

1 comment on commit 4f29928

@github-actions
Lux Benchmarks
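
The Ratio column in the table that follows matches Current divided by Previous (for example 898375 / 396687 ≈ 2.26), so values above 1.00 mean the current commit measured slower. A small sketch of that arithmetic on three rows copied from the table is given here. `BenchRow`, `flag_regressions`, and the 1.25 cutoff are hypothetical names and values chosen for illustration; they are not part of the benchmark tooling.

```julia
# One benchmark row: timings in nanoseconds for the current and previous commits.
struct BenchRow
    name::String
    current_ns::Float64
    previous_ns::Float64
end

# Ratio as reported in the table: current time over previous time.
ratio(row::BenchRow) = row.current_ns / row.previous_ns

# Hypothetical cutoff: keep rows whose ratio exceeds `threshold`.
flag_regressions(rows; threshold=1.25) = filter(r -> ratio(r) > threshold, rows)

rows = [
    BenchRow("Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)", 898375, 396687),
    BenchRow("Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)", 4562.5, 2583),
    BenchRow("Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s)", 1395208, 1717459),
]

for row in rows
    println(rpad(row.name, 62), round(ratio(row); digits=2))
end

flagged = flag_regressions(rows)
```

A single run on shared CI hardware can be noisy, so a flagged row is a prompt to re-run the benchmark rather than proof of a regression.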

Benchmark suite | Current: 4f29928 | Previous: 1b7c9a9 | Ratio (Current / Previous)
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 414750 ns 414167 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 243729.5 ns 243812.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 243645.5 ns 243375 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 739937.5 ns 739750 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 44131.5 ns 43608.5 ns 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1277770.5 ns 1274750 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 1251833 ns 1257604 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 16532875 ns 16232709 ns 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2259416 ns 2193229 ns 1.03
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 211816 ns 205508.5 ns 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1353417 ns 1311791 ns 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 1287562.5 ns 1296000 ns 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 16470396.5 ns 16564750 ns 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2246000 ns 2236917 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1755729 ns 1656771 ns 1.06
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1021000.5 ns 1101167 ns 0.93
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1537666 ns 1519083 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2999834 ns 2996500 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209878 ns 206771 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12143375 ns 12074917 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8839500 ns 8846125 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9220916.5 ns 9185812.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18588542 ns 18620646 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1491282 ns 1506641 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17297167 ns 17279459 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13998334 ns 14009229.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14528812.5 ns 14468291.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21846291.5 ns 21873146 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250636687.5 ns 252162083.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148810208 ns 148884583 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116894000 ns 116232875 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447336459 ns 447534666 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5498322 ns 5465296 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1223363500 ns 1230946875 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 932727167 ns 931953750 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 835497354 ns 826867750.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1631111709 ns 1631748667 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31309225 ns 31362804 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1143243667 ns 1146184875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 994946937.5 ns 997853916.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1312863792 ns 1329065916.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1733454958 ns 1736617187.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1116500 ns 1111541.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1643667 ns 1663917 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3643542 ns 3634917 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 789041 ns 788500 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 269726 ns 262430.5 ns 1.03
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2991666 ns 2981646 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4148959 ns 4151854.5 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 11609792 ns 10487312.5 ns 1.11
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3148729 ns 3265083 ns 0.96
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1125192 ns 1131749 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2335333.5 ns 2342791 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1299062.5 ns 1260000 ns 1.03
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1557416 ns 1539542 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4213104 ns 4176916 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210272 ns 208157.5 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19407291 ns 19392625 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16096667 ns 16105895.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17317666.5 ns 17329250 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25907750 ns 25905125 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1592153 ns 1607984 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34310312.5 ns 34168604 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30986792 ns 30734292 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31273833 ns 30891041.5 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36596250 ns 36714750 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4515667 ns 4532000 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2556459 ns 2546584 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2688249.5 ns 2675583.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8389334 ns 8386333 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 427314.5 ns 419971 ns 1.02
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 39047458 ns 38621250 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32181166.5 ns 32144146 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32260292 ns 32234313 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 52002833 ns 51925709 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2620622.5 ns 2628667 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 89392542 ns 89245375 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 115571416.5 ns 115663979 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 230633541.5 ns 223717000 ns 1.03
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74339646 ns 74519062.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 268599084 ns 270237667 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 156359333 ns 156197542 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 123836375 ns 123423271 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485309084 ns 485408250 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7055046.5 ns 7027939 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1473572104 ns 1473080062.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1171549667 ns 1168760792 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1064310166.5 ns 1063953145.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2002969750 ns 2006090104 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34640088 ns 34772934.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1719792625 ns 1719270959 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1528780229.5 ns 1530344979 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1913538000 ns 1879104875 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2212588749.5 ns 2217620458 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2096854 ns 2066124.5 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 3045708 ns 3080917 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7809458 ns 7964834 ns 0.98
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2327583 ns 2511771 ns 0.93
lenet(28, 28, 1, 128)/forward/GPU/CUDA 272243 ns 272286 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9676145.5 ns 9629792 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 12104584 ns 12051208 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 25834750.5 ns 23782666.5 ns 1.09
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11757916.5 ns 11321791 ns 1.04
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1202175.5 ns 1192316.5 ns 1.01
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 381271021 ns 379182875 ns 1.01
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 310142458.5 ns 311332270.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 259645854 ns 260260313 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 452979375.5 ns 450681833 ns 1.01
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4824218 ns 4857816 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1158461917 ns 1151703750 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 943496166 ns 938427709 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 962405792 ns 943142791 ns 1.02
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1401166834 ns 1396853084 ns 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 17976769 ns 17794579 ns 1.01
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1054125 ns 1048833 ns 1.01
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1661875 ns 1655208.5 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 5315041.5 ns 4851812 ns 1.10
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1312541 ns 1291167 ns 1.02
lenet(28, 28, 1, 64)/forward/GPU/CUDA 262820 ns 278270.5 ns 0.94
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6268084 ns 6497104 ns 0.96
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13117917 ns 13086396 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 19113646 ns 18753875 ns 1.02
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6086916.5 ns 5891208.5 ns 1.03
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1202851 ns 1253158.5 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70511500 ns 70556458 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43790645.5 ns 44452167 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39687083 ns 39837500 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132685124.5 ns 132581125 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1858084 ns 1865473 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 356003687.5 ns 356767520.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270657292 ns 272336833 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 253711000 ns 255661771 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 535180938 ns 534829208.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12300153.5 ns 12304649 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 396495000 ns 395040042 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 373274250 ns 370401500 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 728479375 ns 693812291 ns 1.05
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 713118958 ns 711246750 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1190615875 ns 1188023709 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 832981520.5 ns 835256562.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 636784729 ns 638885750 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1772366146 ns 1768729250 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12314533 ns 12316863.5 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3626681875 ns 3627838020.5 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2823362334 ns 2824735750 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2696862625 ns 2694929167 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5012508375 ns 5002434750 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49738182 ns 49730192 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3411709 ns 3432375.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2072167 ns 2078583 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2511937 ns 2530500 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6036208.5 ns 6020833 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 339526 ns 339043.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 26069125 ns 25844354 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 19056209 ns 18918770.5 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19089979 ns 19719959 ns 0.97
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39346459 ns 39362209 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2459884 ns 2460010 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54485937.5 ns 54493625 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 83611520.5 ns 84184417 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 172934000 ns 173059688 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45625250 ns 45573959 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1782458 ns 1783437.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1105812.5 ns 1098584 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1567000 ns 1563624.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3042542 ns 3028979 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 211807 ns 212147.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12573896 ns 12574667 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9226667 ns 9223854 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9578916 ns 9681958 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 19028833 ns 18996416 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1529077 ns 1525057 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17667042 ns 17650833 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14347562.5 ns 14332292 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14567083 ns 14552750 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22199146 ns 22194208 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70625625 ns 70637271 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43726917 ns 44500249.5 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39746812.5 ns 40038333 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132787937.5 ns 132595500 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1934732 ns 1878861 ns 1.03
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 359903500 ns 361106062 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 348164021 ns 349644938 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 304529250 ns 304116708.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 723383792 ns 723634000 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13382889 ns 13382866.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 421402083.5 ns 419845083.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 425694708 ns 427670459 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 747909395.5 ns 765524104 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 716447375 ns 715822875 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1592833 ns 1591792 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1158167 ns 1165292 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1146250 ns 1150479.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2412542 ns 2435375 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 572386.5 ns 580934.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8854896 ns 8855583 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 13602562.5 ns 13566583 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 33345229.5 ns 33371313 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9874291.5 ns 9856250 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1430613 ns 1447660.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 16524209 ns 16614333.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 23380666 ns 22957687.5 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 43658750 ns 45530875 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 13137667 ns 13137979 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 824166.5 ns 830833 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 570124.5 ns 515458 ns 1.11
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1063916 ns 1061583 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 725458.5 ns 723895.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47937 ns 48058.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1459250 ns 1549792 ns 0.94
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1049437 ns 1043458 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1395208 ns 1717459 ns 0.81
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2260625 ns 2249729 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 238994 ns 235968.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1530500 ns 1556416 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1089333 ns 1068292 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1620292 ns 1707875 ns 0.95
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2253083 ns 2224354 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3403625 ns 3404875 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2061291.5 ns 2061708 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2484792 ns 2526583 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6026312.5 ns 6005458 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 284269 ns 284654 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24093750 ns 24057375 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17201292 ns 17188917 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17041500 ns 17108854 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37570375 ns 37589750 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2411977 ns 2418683.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52911625 ns 52962291.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 84393791 ns 85344416 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 172819250 ns 171244354 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44615937.5 ns 44652208.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250487334 ns 251293750 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148602334 ns 148493709 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116391208 ns 116314333.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448074791.5 ns 447949229.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5454241 ns 5446386 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1105117875 ns 1103974709 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 858058396 ns 855630395.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 825075479.5 ns 831750854.5 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1753955542 ns 1754110584 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28910957.5 ns 28887646 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1030979062.5 ns 1030795771 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 972989292 ns 973527459 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1286035166 ns 1276835833 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1723177166.5 ns 1741435895.5 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1140750 ns 1102104.5 ns 1.04
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 760750 ns 764333 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 752167 ns 784979 ns 0.96
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2053417 ns 1957854 ns 1.05
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 562591 ns 563252 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5876834 ns 5885125 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 8974250 ns 9085895.5 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 25959750 ns 26897042 ns 0.97
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7106396 ns 7099083 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1411580 ns 1415829 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 9670166.5 ns 9699771 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 16148166 ns 15967729 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 33000792 ns 32771687.5 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 7621875 ns 7633666 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 516896 ns 514458 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 415479.5 ns 384604.5 ns 1.08
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2957791.5 ns 3059459 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 89500 ns 87833 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28198 ns 28219 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 380583 ns 381812.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 444083.5 ns 447750 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4683416 ns 4678459 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 258979.5 ns 258375 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 227826.5 ns 228924.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 413416 ns 410916.5 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 475458 ns 479208 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4631791 ns 4649000 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 271583 ns 270833 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 462958 ns 461250.5 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 355875 ns 322625 ns 1.10
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 767000.5 ns 768834 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 53917 ns 52875 ns 1.02
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28301 ns 28278 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 339959 ns 342333 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 341521 ns 347625 ns 0.98
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 898375 ns 396687 ns 2.26
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151708 ns 151250 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 212644.5 ns 212495 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 355000 ns 356000 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 356709 ns 362937.5 ns 0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 944500 ns 740771 ns 1.28
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 151167 ns 150875 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 603130416 ns 601061209 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 428986854 ns 430671250 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 386662562 ns 383040583 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 871726083.5 ns 870727020.5 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7027236 ns 7032100 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2003136437 ns 2000504228.5 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1606958687.5 ns 1604685125 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1550423687 ns 1652458646 ns 0.94
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2625941250 ns 2626165250 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 25917847 ns 25934443 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 520000 ns 526333 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 394895.5 ns 400458.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 2701958 ns 3022187.5 ns 0.89
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 866188 ns 868667 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47079 ns 47967.5 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1772187.5 ns 1757062.5 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1781709 ns 1694333 ns 1.05
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 16286125 ns 16312334 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2723250 ns 2651375 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 248319.5 ns 257253 ns 0.97
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1850645.5 ns 1894750.5 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 1848146 ns 1834625 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 16689875 ns 16537333 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2754291 ns 2736604.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1469521 ns 1496021 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1034625 ns 931750 ns 1.11
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 988249.5 ns 1059667 ns 0.93
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2212416.5 ns 2319292 ns 0.95
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 574726 ns 585808.5 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5868937.5 ns 5882458 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 9178042 ns 8563167 ns 1.07
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 27617875 ns 26031937 ns 1.06
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7341854.5 ns 7331479 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1351520 ns 1393892 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 11650895.5 ns 11701667 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 18290208 ns 18292896 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 38510270.5 ns 39864875 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 9545666 ns 9527500 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2583 ns 2750 ns 0.94
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2458 ns 2334 ns 1.05
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3250 ns 3292 ns 0.99
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 4562.5 ns 2583 ns 1.77
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24500.5 ns 24864 ns 0.99
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 6833 ns 7041 ns 0.97
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 6875 ns 7166 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7292 ns 7250 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7166.5 ns 7083 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 209627.5 ns 216254.5 ns 0.97
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8084 ns 8250 ns 0.98
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8166 ns 8459 ns 0.97
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8520.5 ns 8542 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6020.5 ns 5834 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10042 ns 10479.5 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 14396 ns 13062.5 ns 1.10
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 9625 ns 10500 ns 0.92
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7333.5 ns 7500 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 24458 ns 25125 ns 0.97
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 19792 ns 19916 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 19708 ns 19917 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 20125 ns 20270.5 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 19875 ns 20000 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 229625 ns 238014.5 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23562.5 ns 23541 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23542 ns 23584 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23791 ns 23917 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21520.5 ns 21333 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 27000 ns 28687.5 ns 0.94
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28416.5 ns 28458 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28188 ns 28750 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46083 ns 46041 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 25611 ns 26166 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 224666 ns 224416 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 278416 ns 277458 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 3900375.5 ns 3940416 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145292 ns 145375 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 211892 ns 215900.5 ns 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 243417 ns 241916.5 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 295959 ns 294834 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 4528416.5 ns 4072750 ns 1.11
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 145875 ns 145500 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2667 ns 1750 ns 1.52
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1791 ns 1709 ns 1.05
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2708 ns 2833 ns 0.96
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1959 ns 1792 ns 1.09
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23071 ns 23320 ns 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5125 ns 5250 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5083 ns 5084 ns 1.00
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5333 ns 5375 ns 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5125 ns 5250 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 266994 ns 273997 ns 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 7500 ns 7500 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 7500 ns 7458 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 7625 ns 7625 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5125 ns 5125 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 80068250 ns 79922000 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 47839854.5 ns 48869292 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 43348791 ns 43653750 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151521792 ns 151454541 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2715083 ns 2718779 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 665235792 ns 663985416 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 410381834 ns 413249125 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 394582542 ns 397260000 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 682653250 ns 684524000 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14595495 ns 14579213 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 712441042 ns 713434583.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 680663916 ns 675522709 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1031283708 ns 997663125 ns 1.03
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 997418875 ns 999548041 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
