Commit

ci(buildkite): fix cond
avik-pal committed Sep 7, 2024
1 parent 86246b3 commit abc7057
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion .buildkite/pipeline.yml
@@ -1,6 +1,6 @@
 steps:
   - label: "Triggering Pipelines (Pull Request)"
-    if: build.branch != "main" || build.tag == null
+    if: build.branch != "main" && build.tag == null
     agents:
       queue: "juliagpu"
     plugins:
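The change swaps `||` for `&&` in the step's `if` condition. A minimal Python sketch of the boolean logic alone is shown here; it does not model how Buildkite itself evaluates conditionals, and the branch name `feature` and tag `v1.0.0` are hypothetical inputs chosen for illustration.

```python
from typing import Optional


def old_condition(branch: str, tag: Optional[str]) -> bool:
    # Before the fix: build.branch != "main" || build.tag == null
    return branch != "main" or tag is None


def new_condition(branch: str, tag: Optional[str]) -> bool:
    # After the fix: build.branch != "main" && build.tag == null
    return branch != "main" and tag is None


# Hypothetical (branch, tag) combinations; None stands in for a null tag.
cases = [
    ("feature", None),
    ("feature", "v1.0.0"),
    ("main", None),
    ("main", "v1.0.0"),
]

for branch, tag in cases:
    print(f"branch={branch!r:10} tag={tag!r:9} "
          f"old={old_condition(branch, tag)!s:5} new={new_condition(branch, tag)}")
```

Under this reading, the old expression is true for an untagged build on `main`, while the new one is true only when the branch is not `main` and no tag is set.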

3 comments on commit abc7057

@github-actions
Contributor

Lux Benchmarks
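In the table that follows, the Ratio column appears to be the Current timing divided by the Previous timing, rounded to two decimals, so values above 1.00 mean the current commit was slower. A small sketch of that arithmetic is given here, using a few rows copied from the table; the function names and the 1.10 threshold are assumptions made for illustration, not part of the benchmark tooling.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    # Current / Previous, rounded the way the table displays it.
    return round(current_ns / previous_ns, 2)


def is_slower(current_ns: float, previous_ns: float, threshold: float = 1.10) -> bool:
    # Hypothetical check: flag rows whose ratio exceeds an assumed threshold.
    return current_ns / previous_ns > threshold


# (name, current ns, previous ns) taken from rows of the table.
rows = [
    ("Dense(512 => 512, identity)/forward/CPU/4 thread(s)", 324917, 323042),
    ("Dense(512 => 512, identity)/enzyme/CPU/8 thread(s)", 1778125, 1516292),
    ("Conv((3, 3), 4 => 4, gelu)/enzyme/CPU/4 thread(s)", 84997250, 113796125),
]

for name, current, previous in rows:
    flag = "slower" if is_slower(current, previous) else "ok"
    print(f"{name}: ratio={ratio(current, previous):.2f} ({flag})")
```

Single runs on shared CI hardware can be noisy, so a ratio far from 1.00 is a prompt to rerun the benchmark rather than proof of a regression.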

Benchmark suite Current: abc7057 Previous: 59f83fc Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 412833 ns 412520.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 324917 ns 323042 ns 1.01
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 322791 ns 323583 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 741270.5 ns 752166.5 ns 0.99
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 44918 ns 44168 ns 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1358250 ns 1384083 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2444062.5 ns 2451854 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 14162791 ns 14238812.5 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2277500 ns 2239125 ns 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 212604 ns 210250 ns 1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1450562.5 ns 1411875 ns 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 960958.5 ns 897520.5 ns 1.07
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1778125 ns 1516292 ns 1.17
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2274000 ns 2210229 ns 1.03
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1767833.5 ns 1725583 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1083978.5 ns 1017708.5 ns 1.07
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1529021 ns 1538333 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2954750 ns 3006583 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209644 ns 210559 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12148854.5 ns 12112667 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8834958.5 ns 8809666.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9230875 ns 9192709 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18631937.5 ns 18570834 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1509941 ns 1504910 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17314333 ns 17273542 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13961542 ns 13992292 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14514291 ns 14538625 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21865437.5 ns 21824875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 249016958.5 ns 249443729 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148521291 ns 148456250 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116073791 ns 115795563 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447568292 ns 454024458 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5499808 ns 5474002 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1227795916 ns 1144391209 ns 1.07
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 931180042 ns 981113333 ns 0.95
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831332521 ns 853440021 ns 0.97
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1629694167 ns 1805007208 ns 0.90
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31376705.5 ns 31357343 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1167771625 ns 1034466750 ns 1.13
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1003953563 ns 1009660729.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1322017146 ns 1324456604 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1730835103.5 ns 1728354792 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1100791 ns 1093583 ns 1.01
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1624625 ns 1583083 ns 1.03
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3431229 ns 3678000 ns 0.93
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 781521 ns 779625 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 272287.5 ns 273068.5 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 3015146 ns 2985458.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4087333.5 ns 4106125 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 10933000 ns 10555937 ns 1.04
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3238167 ns 3131667 ns 1.03
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1132885 ns 1134574.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2306750 ns 2275083 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1433208.5 ns 1429583 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1678625.5 ns 1656125 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4201375 ns 4200438 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209995 ns 210634 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19417729 ns 19375958 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16114625 ns 16086292 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17220375 ns 17180583 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25992250 ns 25782875 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1600144 ns 1606705 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34149500 ns 34182625 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30894937.5 ns 30811875 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31140666 ns 31108104 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36754250 ns 36403791 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4526959 ns 4540667 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2746459 ns 2769500.5 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2911584 ns 2921250 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8399583 ns 8391917 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 373956 ns 423308 ns 0.88
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 38745459 ns 39022250 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32111709 ns 32067021 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32268625 ns 32250916.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 52066792 ns 51820375 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2635152.5 ns 2657162.5 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 88780729 ns 88606874.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 84997250 ns 113796125 ns 0.75
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 218329542 ns 223648041 ns 0.98
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74358917 ns 74335583.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 267246875 ns 267029417 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 158965875 ns 158942229.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 126688521 ns 126886229 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485596792 ns 487631541 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7022210 ns 6889435 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1468898146 ns 1474300812.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1171204459 ns 1174433750 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1068921333.5 ns 1063095500 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2001229479 ns 2007751479 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34725068.5 ns 34685949 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1692415625 ns 1689349708 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1500720958.5 ns 1535787500 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1766379833 ns 1814518792 ns 0.97
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2224153125 ns 2211056708.5 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 1760875 ns 2089187.5 ns 0.84
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2595167 ns 2976458 ns 0.87
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7433916.5 ns 7304583 ns 1.02
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2426041.5 ns 2476917 ns 0.98
lenet(28, 28, 1, 128)/forward/GPU/CUDA 273792 ns 272072.5 ns 1.01
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9254417 ns 9643854 ns 0.96
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 11474333 ns 12014792 ns 0.96
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 25126166 ns 25647896 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11780750 ns 11736104 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1194908 ns 1173736.5 ns 1.02
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 381207125 ns 380778209 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 285815709 ns 282717792 ns 1.01
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 233745708 ns 238251708.5 ns 0.98
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 453344667 ns 453270208 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4852271 ns 4856475 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1157427583 ns 1156978917 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 931406250 ns 919622250 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 929761209 ns 945107000 ns 0.98
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1403593291 ns 1428489000 ns 0.98
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 19807136 ns 17978082 ns 1.10
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1051042 ns 1021959 ns 1.03
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1930834 ns 2001250 ns 0.96
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 4821271 ns 6008000 ns 0.80
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1297541 ns 1374000 ns 0.94
lenet(28, 28, 1, 64)/forward/GPU/CUDA 269906 ns 268964 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6495729 ns 6414395.5 ns 1.01
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 12306583.5 ns 12403896 ns 0.99
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 18165416.5 ns 20716333 ns 0.88
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6025750 ns 6079792 ns 0.99
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1207681.5 ns 1209955 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70586437.5 ns 70501749.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43556333.5 ns 43580771 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39526083 ns 39491375 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132710667 ns 132802458.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1944845 ns 1859689 ns 1.05
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 356816354 ns 384818104 ns 0.93
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270253083 ns 295632667 ns 0.91
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254146791.5 ns 281694167 ns 0.90
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534914958.5 ns 534727063 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12308008 ns 12284399.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 396010084 ns 396068167 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 407805500 ns 409321729.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 706921292 ns 678917958 ns 1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 711811750 ns 711312959 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1187507791 ns 1190798042 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 764568937.5 ns 688321229 ns 1.11
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 631341166 ns 630150084 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1772828250 ns 1776546083 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12544942.5 ns 12315985 ns 1.02
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3767262229 ns 3607588771 ns 1.04
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2869944333 ns 2756374750 ns 1.04
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2705287250 ns 2714951667 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5058993459 ns 4951023834 ns 1.02
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49891272 ns 49373771 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3429042 ns 3429083.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2081583 ns 2066792 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2543583 ns 2527666 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6024375 ns 6016750 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 338827 ns 311191 ns 1.09
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 26104562.5 ns 25518541 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 19078958.5 ns 18527417 ns 1.03
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19625020.5 ns 18707833 ns 1.05
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39317959 ns 38890083 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2462668 ns 2479107 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54777416 ns 54171458 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 80697167 ns 78979625 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 170440292 ns 171331479 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45420250 ns 45540167 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1787458 ns 1785458 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1101875 ns 1046062.5 ns 1.05
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1569708 ns 1583208.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3035500 ns 3024416.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 215425 ns 213982 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12537208 ns 12521375 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9283500 ns 9184167 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9641937.5 ns 9599958.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18984166.5 ns 18940458 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1531405 ns 1538264 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17668583 ns 17640750 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14332291.5 ns 14307771 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14569250 ns 14507583 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22181083.5 ns 22177500 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70579000.5 ns 70512937 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43509167 ns 43444479.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39545292 ns 39626750 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132823604.5 ns 132598874.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1947535 ns 1950639 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 361581166 ns 359565417 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 345861541.5 ns 293550333 ns 1.18
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 303584333 ns 287837104.5 ns 1.05
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 724116959 ns 622550708.5 ns 1.16
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13351785.5 ns 13384881.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 419705187.5 ns 419108729 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 420514459 ns 424758959 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 697427687 ns 717519375 ns 0.97
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 717027625 ns 716499833 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1700896 ns 1521229 ns 1.12
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1344562.5 ns 1235833 ns 1.09
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1353750 ns 1246625 ns 1.09
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2400417 ns 2300875 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 590707 ns 587061.5 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 8924250 ns 8812333 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12992208 ns 12926416 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 30772062.5 ns 30195584 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9884229.5 ns 9787000 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1479651 ns 1419851.5 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 17441145.5 ns 18056125 ns 0.97
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 16807333 ns 16803125 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 30461791.5 ns 29287584 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14317375 ns 14378083 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 789375 ns 805145.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 595083.5 ns 589041.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1038125 ns 1034812.5 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 725167 ns 726750 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 48555.5 ns 47938.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1507084 ns 1542875 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1043292 ns 1000270.5 ns 1.04
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1413583 ns 1504041 ns 0.94
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2256583 ns 2294104 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 241345.5 ns 236494.5 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1541063 ns 1722687.5 ns 0.89
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1073583.5 ns 1250438 ns 0.86
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1495667 ns 1858854.5 ns 0.80
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2216500 ns 2311917 ns 0.96
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3407458.5 ns 3404416 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2060208 ns 2046208 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2504792 ns 2516916.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6019500 ns 6013625 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 283414 ns 285181.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24068584 ns 24021312.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17256458.5 ns 17217833 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17166250 ns 17101666.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37584937.5 ns 37551396 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2397302 ns 2407620 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52933521 ns 52545812.5 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 83805875 ns 80522312.5 ns 1.04
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 168151312.5 ns 166982250.5 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44568645.5 ns 44529604 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250376958 ns 250184208.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148122999.5 ns 147977833 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115699917 ns 115557083.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448012646 ns 447150583.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5442645 ns 5457630 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1105356584 ns 1128644583 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 854303812.5 ns 881731833.5 ns 0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 826724000 ns 805115667 ns 1.03
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1752988167 ns 1757118042 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 28762466 ns 28927493 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1031896104 ns 1058828646 ns 0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 962579167 ns 973248125 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1179808792 ns 1362518583 ns 0.87
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1752419187.5 ns 1744326604 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1246312 ns 1317667 ns 0.95
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 981667 ns 936250 ns 1.05
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 924938 ns 907396 ns 1.02
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 1952875 ns 2059708 ns 0.95
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 559173.5 ns 573972.5 ns 0.97
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5968250 ns 5872667 ns 1.02
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6725083 ns 6537417 ns 1.03
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 24147709 ns 24586229.5 ns 0.98
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7125208 ns 7039792 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1363102 ns 1375117 ns 0.99
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 10592083.5 ns 11464417 ns 0.92
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 9872770.5 ns 10266333 ns 0.96
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 16891792 ns 17693667 ns 0.95
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 8542250.5 ns 8866896 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 490083 ns 487208 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 414250 ns 474584 ns 0.87
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 1848916.5 ns 2175853.5 ns 0.85
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 89417 ns 87541 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 27713 ns 28408 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 381875 ns 383437.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 447500 ns 444333.5 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4415146 ns 4385583 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 259083.5 ns 268292 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 221456.5 ns 225901 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 412875 ns 706959 ns 0.58
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 474250 ns 722500 ns 0.66
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4220333 ns 1069791 ns 3.95
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 271166 ns 447125 ns 0.61
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 434854 ns 432125 ns 1.01
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 353250 ns 418166 ns 0.84
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 650792 ns 742500 ns 0.88
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 54375 ns 53208 ns 1.02
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 27922 ns 28501 ns 0.98
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 339896.5 ns 338770.5 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 340500 ns 338750 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 611187.5 ns 737375 ns 0.83
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 152292 ns 154208 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 206825 ns 210566 ns 0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 356792 ns 404125 ns 0.88
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 355875 ns 405916.5 ns 0.88
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 420542 ns 983208 ns 0.43
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 151000 ns 174750 ns 0.86
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 603607250 ns 603527917 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 425272979 ns 431057458.5 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 372455458 ns 375361437.5 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 873099458 ns 872552854 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7619709 ns 7040620 ns 1.08
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2006739833.5 ns 1986550813 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1613467771 ns 1668902250 ns 0.97
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1601604000 ns 1651138625 ns 0.97
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2628483083 ns 2764176416 ns 0.95
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 26335134 ns 25979788.5 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 520146 ns 521833 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 434479 ns 437250 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 1898520.5 ns 1710708 ns 1.11
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 866625 ns 866062.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47286 ns 47823 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1848208.5 ns 1842562.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2786229 ns 2356875 ns 1.18
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14679500 ns 14345020.5 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2771958 ns 2764166 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 249296.5 ns 252466.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1937125 ns 2751750 ns 0.70
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 5035312.5 ns 2316083 ns 2.17
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 14724291.5 ns 4360708 ns 3.38
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2768167 ns 4727708 ns 0.59
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1574791.5 ns 1581500 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1257666 ns 1216229.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1200500 ns 1177645.5 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2226083 ns 2314729 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 584985.5 ns 547137 ns 1.07
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5976500 ns 5877292 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 4604667 ns 6745916.5 ns 0.68
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 25216125 ns 24550687.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7317042 ns 7266312 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1363255 ns 1351645 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 12710625 ns 12285333.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 11988958 ns 12037124.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 21409084 ns 20466187 ns 1.05
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 10882083 ns 10853417 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2291 ns 2500 ns 0.92
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2708 ns 2750 ns 0.98
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 2959 ns 3416 ns 0.87
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2375 ns 3041 ns 0.78
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24451.5 ns 24989 ns 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7042 ns 8333 ns 0.85
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7084 ns 8625 ns 0.82
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7209 ns 8667 ns 0.83
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7166 ns 8770.5 ns 0.82
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 210193.5 ns 213236.5 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8125 ns 16750 ns 0.49
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8292 ns 16375 ns 0.51
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8208 ns 16792 ns 0.49
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 5917 ns 10917 ns 0.54
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 11000.5 ns 10792 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 16166 ns 18083 ns 0.89
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 11146 ns 11666 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7125 ns 7666.5 ns 0.93
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 24717 ns 24865.5 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 20000 ns 22333 ns 0.90
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 20000 ns 22291 ns 0.90
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 20125 ns 22500 ns 0.89
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 20250 ns 22375 ns 0.91
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 230632.5 ns 233562.5 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23375 ns 52042 ns 0.45
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23417 ns 52125 ns 0.45
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23645.5 ns 52270.5 ns 0.45
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21375 ns 44000 ns 0.49
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 29458 ns 28979.5 ns 1.02
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28834 ns 29208 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28625 ns 28458 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46333 ns 46209 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 25821.5 ns 26274 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 226542 ns 229062.5 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 274167 ns 263041 ns 1.04
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4023229.5 ns 4056646 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145708 ns 154437.5 ns 0.94
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 205677 ns 215509 ns 0.95
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 339625 ns 329834 ns 1.03
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 311625 ns 292583 ns 1.07
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 520417 ns 817500 ns 0.64
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 161292 ns 161708 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1875 ns 2041 ns 0.92
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1833 ns 1833 ns 1
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2104.5 ns 2750 ns 0.77
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1625 ns 1917 ns 0.85
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 22965 ns 23258 ns 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5250 ns 7208 ns 0.73
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5250 ns 7042 ns 0.75
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5292 ns 7750 ns 0.68
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5208 ns 7125 ns 0.73
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 261526 ns 267733.5 ns 0.98
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11208 ns 11334 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11333 ns 11375 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11459 ns 11708 ns 0.98
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 6708 ns 6958 ns 0.96
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79891416 ns 79930209 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49038584 ns 49066500 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 44836791 ns 45049708 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151572917 ns 151430167 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2695899 ns 2719840 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 665802334 ns 497512959 ns 1.34
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 410890125 ns 411297375 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 399102167 ns 396546125 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 681784916 ns 736651313 ns 0.93
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14619713 ns 14587409 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 710708249.5 ns 709337374.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 671159083 ns 664763792 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 978285458 ns 1022853709 ns 0.96
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 996959708 ns 996468292 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
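
The Ratio column in the table above appears to be the current time divided by the previous time, rounded to two decimals, so values above 1 mean the current commit is slower. A minimal sketch of that arithmetic, using three rows copied from the table; the helper names and the 1.25 cutoff are assumptions for illustration and are not settings taken from this repository's workflow:

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time; above 1.0 means the current commit is slower."""
    return current_ns / previous_ns


# (benchmark name, current ns, previous ns), copied from the table above
ROWS = [
    ("Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s)", 8125, 16750),
    ("Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)", 520417, 817500),
    ("Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)", 665802334, 497512959),
]

# Hypothetical cutoff chosen for this example only.
THRESHOLD = 1.25


def flag_regressions(rows, threshold=THRESHOLD):
    """Return (name, rounded ratio) for every row whose ratio exceeds the threshold."""
    return [
        (name, round(ratio(cur, prev), 2))
        for name, cur, prev in rows
        if ratio(cur, prev) > threshold
    ]


if __name__ == "__main__":
    for name, r in flag_regressions(ROWS):
        print(f"{r:.2f}  {name}")
```

With these three rows, only the Conv zygote 2-thread benchmark exceeds the assumed cutoff; the two enzyme rows have ratios below 1, meaning the current commit ran faster on them.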

@avik-pal

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/114718

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v1.0.0 -m "<description of version>" abc70577d16807280c836c2f87cfb08d892b41f1
git push origin v1.0.0
