docs: clarify line about "not saving the model" (#965)
* Remove line about "not saving the model"

Not sure what this is but it seems counterintuitive.  Feel free to reject or modify.

* Update examples/SimpleRNN/main.jl
asinghvi17 authored Oct 2, 2024
1 parent 4dda683 commit aabeafb
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion examples/SimpleRNN/main.jl
@@ -174,7 +174,7 @@ nothing #hide

 # We can save the model using JLD2 (and any other serialization library of your choice)
 # Note that we transfer the model to CPU before saving. Additionally, we recommend that
-# you don't save the model
+# you don't save the model struct and only save the parameters and states.

 @save "trained_model.jld2" ps_trained st_trained

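As a rough illustration of what the updated comment recommends, the sketch below saves only the parameters and states with JLD2 and reads them back. It assumes plain named tuples of CPU arrays as hypothetical stand-ins for `ps_trained` and `st_trained`, and it writes to a temporary path rather than the `trained_model.jld2` used in the example. In the real example these values would come out of training and be moved to CPU first.

```julia
using JLD2

# Hypothetical stand-ins for the trained parameters and states.
# In the real example these come from training and are already on the CPU.
ps_trained = (weight = Float32[1 2 3; 4 5 6], bias = Float32[0, 0])
st_trained = (carry = Float32[0, 0],)

# Temporary location so the sketch does not touch the working directory.
path = joinpath(mktempdir(), "trained_model.jld2")

# Save only the parameters and states, not the model struct.
@save path ps_trained st_trained

# Read the two entries back under new names so they can be compared.
ps_loaded = load(path, "ps_trained")
st_loaded = load(path, "st_trained")

println(ps_loaded.weight == ps_trained.weight)
```

To use the reloaded values, the model struct would be rebuilt from its definition in code and called with `ps_loaded` and `st_loaded`, so the saved file does not depend on how the model type is defined.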

1 comment on commit aabeafb

@github-actions

Lux Benchmarks

Benchmark suite Current: aabeafb Previous: dcb6c6d Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 414750 ns 415083 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 243291 ns 243562.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 243812 ns 243917 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 740250 ns 740187.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 44241.5 ns 43145 ns 1.03
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1300541 ns 1349145.5 ns 0.96
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 1208729.5 ns 1217021 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 16389813 ns 16523666 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2193750.5 ns 2260375 ns 0.97
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 204422 ns 198205.5 ns 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1351250 ns 1319125 ns 1.02
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 1304500 ns 1304979 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 16451042 ns 16162208.5 ns 1.02
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2234083 ns 2198917 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1778229 ns 1670458 ns 1.06
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1089166 ns 1107375 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1517729.5 ns 1527771 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2822937.5 ns 3019125 ns 0.94
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 213434.5 ns 211316 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12152042 ns 12175041 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8825458 ns 8824145.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9241187.5 ns 9233625 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18620000 ns 18591583 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1927990 ns 1930057 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17288833 ns 17307313 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13973166 ns 13969291.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14514209 ns 14519583 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21820500.5 ns 21863458 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250505834 ns 250175667 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148891667 ns 148788625 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116479042 ns 116216917 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 446831708 ns 446783750 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5489940.5 ns 5483992 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1222461916 ns 1221582792 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 933470000 ns 934823708 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831062083.5 ns 825393979 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1628130417 ns 1634434500 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31351227 ns 31104295 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1143937584 ns 1147938166 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 996799687.5 ns 996908396 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1394632583 ns 1315038312.5 ns 1.06
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1729939125 ns 1733258437.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1124916 ns 1124250 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1654938 ns 1648541.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3728958.5 ns 3458500 ns 1.08
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 783791 ns 790708 ns 0.99
lenet(28, 28, 1, 32)/forward/GPU/CUDA 271605.5 ns 276890 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2988375 ns 2989917 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4138125 ns 4140375 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 9730708 ns 10581541.5 ns 0.92
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3139396 ns 3136958 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1090612 ns 1129684 ns 0.97
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2394854 ns 2390166 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1356083 ns 1353000 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1583167 ns 1581708 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4315958 ns 4332708 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210126 ns 210207 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 20322937.5 ns 20303291.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16992041 ns 16973958 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 18180104.5 ns 18209958 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 26773500 ns 26748042 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 2003457 ns 2004316 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 46196000 ns 44366000 ns 1.04
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 41023041 ns 40975041.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 41211229 ns 41237167 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 47740229.5 ns 47733416.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4671645.5 ns 4673042 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2597375 ns 2607958 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2731125 ns 2740083 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8666750 ns 8646250 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 472288 ns 471597 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 40564771 ns 40513208 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 33968749.5 ns 33898583 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 34077228.5 ns 34004896 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 53636000 ns 53682375 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3237384.5 ns 3025195 ns 1.07
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 114136958 ns 109957125 ns 1.04
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 137210958 ns 136423624.5 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 252781791.5 ns 249203917 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 96450834 ns 96417375 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 270801125 ns 270485625 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 157803395.5 ns 157422417 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 125472208.5 ns 125021063 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 489251916 ns 489717917 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7025130.5 ns 6887253.5 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1502032791.5 ns 1500178749.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1212619792 ns 1209776166 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1094094249.5 ns 1101673604 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2035942062.5 ns 2033012896.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34741051 ns 34855481.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 2092986416.5 ns 2031056270.5 ns 1.03
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1853923708 ns 1850536958 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 2191693458 ns 2173376541.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2559907833 ns 2563569208 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2043042 ns 2043208 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 3065417 ns 3056708 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7640000 ns 8256479.5 ns 0.93
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2471937.5 ns 2476666 ns 1.00
lenet(28, 28, 1, 128)/forward/GPU/CUDA 271311 ns 276146.5 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9382958 ns 9654583 ns 0.97
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 11975979 ns 12054625 ns 0.99
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 23565167 ns 24288042 ns 0.97
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11755708 ns 11746854.5 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1167532 ns 1181147.5 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 380391500 ns 381419291.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 309372541.5 ns 308744166.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 258576062.5 ns 262197666.5 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 453610708.5 ns 453805292 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4830626 ns 4853504 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1155083167 ns 1144266542 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 977011375 ns 964566583 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 968797667 ns 971379334 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1402145750 ns 1404606542 ns 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 16260465 ns 16465783 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1055895.5 ns 1058521 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1667979.5 ns 1665374.5 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 6843083 ns 6526666 ns 1.05
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1384584 ns 1370042 ns 1.01
lenet(28, 28, 1, 64)/forward/GPU/CUDA 277409 ns 274033.5 ns 1.01
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6359084 ns 6516541 ns 0.98
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13140791 ns 13102708.5 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 19435708 ns 18363000 ns 1.06
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6077249.5 ns 6084354.5 ns 1.00
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1208939 ns 1233343.5 ns 0.98
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70453792 ns 70574042 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43702291.5 ns 43797125 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39707333.5 ns 39782958.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132609749.5 ns 132781271 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1874873 ns 1956000 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 355396771 ns 355154708.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270399625 ns 270770334 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 253531375 ns 254052708 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534771792 ns 534690875 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13205988 ns 13245522.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 395708167 ns 396827750 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 370167417 ns 372318834 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 693674958.5 ns 671683959 ns 1.03
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 713328458 ns 713207834 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1188384834 ns 1189840458 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 834836604 ns 834600270.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 637057646.5 ns 643996000 ns 0.99
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1772765937.5 ns 1771218270.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12404766 ns 12386792 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3636048750.5 ns 3632394041.5 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2830058583 ns 2819490917 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2710979542 ns 2703852750 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5040694000 ns 5046837084 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49203468 ns 49275819 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3430792 ns 3417875 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2072500 ns 2080042 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2523083 ns 2540459 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6041625 ns 6037792 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 576145 ns 571807.5 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25980041 ns 25947667 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18928584 ns 18971396.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19397250 ns 19516791.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39276562.5 ns 39348958.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3194284 ns 3001343 ns 1.06
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 55601354.5 ns 55429166.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 82732437.5 ns 81557583 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 172634208 ns 172942167 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45521875 ns 45661541.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1786354 ns 1786354.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1103937.5 ns 1106458 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1593375 ns 1570978.5 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3039125 ns 3033375 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 217854.5 ns 214775.5 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12554187.5 ns 12557750 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9216896 ns 9236583.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9695292 ns 9630708 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 19000833.5 ns 19044937.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1987678 ns 1985531 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17658520.5 ns 17664084 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14335542 ns 14332709 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14581459 ns 14595146 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22170084 ns 22201042 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70566666 ns 70526583 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43756000 ns 43708417 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39751249.5 ns 39735812.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132626292 ns 132615771 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1957247.5 ns 1938634 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 360065750 ns 360222063 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 348987521 ns 348659791.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 305633709 ns 302374833.5 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 727326125 ns 727881666 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14293471 ns 14325162 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 419491583.5 ns 419531958.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 436122125 ns 434088375 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 697189791.5 ns 691688416.5 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 717699792 ns 717541625 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1670125 ns 1673625 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1362354 ns 1384958 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1381854 ns 1378083 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2542833 ns 2664374.5 ns 0.95
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 588475 ns 568730 ns 1.03
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 9263749.5 ns 9240188 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 15705708 ns 14792541.5 ns 1.06
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 32641083.5 ns 32052875 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 10211375 ns 10208834 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1444503 ns 1422888 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 17200333 ns 22285625 ns 0.77
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 28628708 ns 28463000 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 55361125.5 ns 56517729 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 18888770.5 ns 18854687.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 694124.5 ns 699792 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 597229 ns 644209 ns 0.93
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1060208 ns 1065062.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 728834 ns 728292 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47763 ns 47086.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1471625 ns 1513416 ns 0.97
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1008083.5 ns 1010604 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1445375 ns 1606083 ns 0.90
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2292479 ns 2291666 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 233511 ns 226725.5 ns 1.03
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1548708 ns 1516750.5 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1059208 ns 1076208 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1684000 ns 1449125 ns 1.16
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2239812.5 ns 2256125 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3399917 ns 3415417 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2063416.5 ns 2053167 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2497542 ns 2513229.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6012541 ns 6017583.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 573050 ns 568598 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24040667 ns 24077208 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17197333 ns 17182291.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17115834 ns 17150417 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37555250 ns 37549833 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3173627 ns 2938820 ns 1.08
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 53697479 ns 53630958.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 82955166.5 ns 81466625 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 172570500 ns 169486084 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44541583 ns 44624500 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250473458 ns 250522209 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148642041 ns 148626708 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116173562.5 ns 116110708.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448035771 ns 447858917 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5470189 ns 5427690.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1105702334 ns 1104123000 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 856343812.5 ns 859505875 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831890979.5 ns 829538646 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1754046583 ns 1754815708 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28859350.5 ns 28735758 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1019485791.5 ns 1018403979.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 978846583 ns 983568208 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1297118667 ns 1335719333 ns 0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1724027750 ns 1728379395.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1083542 ns 1082292 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 795958 ns 764959 ns 1.04
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 685709 ns 682709 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2058417 ns 2044125 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 568330.5 ns 554259.5 ns 1.03
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5923958 ns 5934375 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 9029417 ns 9162896 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 26532333 ns 26061854.5 ns 1.02
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 6412458 ns 7111479 ns 0.90
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1397840 ns 1357512.5 ns 1.03
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 9312833.5 ns 9683542 ns 0.96
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 16060416.5 ns 16162959 ns 0.99
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 33894583 ns 33355375 ns 1.02
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 7611459 ns 7620375 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 386041 ns 388541 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 458083.5 ns 518208.5 ns 0.88
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 3025042 ns 3052583 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 89917 ns 89500 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28382 ns 27832 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 405854 ns 404666 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 456708 ns 454791 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4404625 ns 4601375 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 273541 ns 280000 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 217486 ns 213087 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 713041 ns 677583 ns 1.05
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 728791 ns 726708.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 5034583 ns 4653542 ns 1.08
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 510750 ns 522959 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 330208.5 ns 334437.5 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 387375 ns 451521 ns 0.86
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 759854 ns 774437.5 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 54333.5 ns 52833 ns 1.03
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28377 ns 28056 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 354479.5 ns 352584 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 337167 ns 333875 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 889417 ns 902834 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151333 ns 151959 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 204706.5 ns 199603.5 ns 1.03
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 368875 ns 367333 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 351334 ns 348125 ns 1.01
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 517084 ns 945562.5 ns 0.55
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 151020.5 ns 151375 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 601511250 ns 601502916 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 431188729 ns 430191604 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 393417333.5 ns 390437000 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 872312292 ns 871755417 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7628702 ns 7623148 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 1997345666 ns 1994407979.5 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1639731729 ns 1636880541.5 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1653667375 ns 1572982645.5 ns 1.05
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2659433292 ns 2658913333 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 26464110 ns 26625956 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 527583 ns 525833 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 403020.5 ns 401229.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 2740959 ns 2770750 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 874042 ns 872645.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47791.5 ns 46979 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1937458 ns 1876563 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1800375 ns 1830166.5 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 16241167 ns 16303459 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2831458 ns 2794834 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 247132 ns 240187.5 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 3023000 ns 2919520.5 ns 1.04
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 5005208 ns 5015167 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 16718208 ns 16524271 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 3710916.5 ns 3743292 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1477209 ns 1368417 ns 1.08
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 948500 ns 979958 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 932209 ns 930917 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2349125 ns 2342208.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 589164 ns 565552 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5873625 ns 5910334 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 8384708 ns 8430229 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 25818812.5 ns 25837625 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7145417 ns 7325812 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1381067 ns 1327441 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 11656666 ns 11696354 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 17362667 ns 18020208.5 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 36263250 ns 39373729 ns 0.92
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 9526166.5 ns 9553833 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2916 ns 2459 ns 1.19
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2708 ns 2416 ns 1.12
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3500 ns 2792 ns 1.25
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2667 ns 4583 ns 0.58
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 25448 ns 24428 ns 1.04
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7250 ns 7291 ns 0.99
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7250 ns 6958 ns 1.04
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7334 ns 7333 ns 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7167 ns 6750 ns 1.06
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 210626 ns 200289 ns 1.05
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8375 ns 8416 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8334 ns 8333 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8458 ns 8250 ns 1.03
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6167 ns 5625 ns 1.10
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10250 ns 10459 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 12459 ns 12958 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10875 ns 11333.5 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7959 ns 7791 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25816 ns 24856 ns 1.04
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 21542 ns 21709 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 21708 ns 21459 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 21709 ns 21792 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 21417 ns 21167 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 227232.5 ns 220349.5 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 57500 ns 53584 ns 1.07
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 53667 ns 53583 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 53500 ns 53770.5 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 51375 ns 51125 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28834 ns 28750 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 29020.5 ns 28916 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28667 ns 28875 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46333 ns 45875 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 27068 ns 26054 ns 1.04
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 225125 ns 228541 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 275292 ns 275333 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 3755229.5 ns 4217667 ns 0.89
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145084 ns 145250 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 206498 ns 199681 ns 1.03
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 241125 ns 246459 ns 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 293875 ns 293145.5 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 4193250 ns 4145854 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 145771 ns 145542 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1750 ns 1959 ns 0.89
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1833 ns 2000 ns 0.92
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2500 ns 2000 ns 1.25
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 2000 ns 1708 ns 1.17
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 24021 ns 22940 ns 1.05
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5250 ns 5334 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5250 ns 5125 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5375 ns 5166 ns 1.04
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5292 ns 4792 ns 1.10
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 243494.5 ns 232790 ns 1.05
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 7416 ns 7417 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 7417 ns 7375 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 7500 ns 7459 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5167 ns 5250 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 81062125 ns 81082749.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 48607333 ns 48527208 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 43732646 ns 43737084 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 153570458 ns 153734041 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2719369 ns 2717702 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 619835291 ns 621583083 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 427440708 ns 427560417 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 410233145.5 ns 412343333.5 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 698627167 ns 697842291 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 15605483 ns 15532428 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 906046958 ns 851105979 ns 1.06
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 847422938 ns 840062312.5 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1164315146 ns 1156974917 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 1175767687.5 ns 1177103062.5 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
