Store transposed transition matrix to speed up forward #107
Conversation
(Collapsed CI benchmark report for HiddenMarkovModels.jl: judge result, target result, baseline result, Julia versioninfo, and runtime information.)
@THargreaves this seems to worsen performance for very small state spaces, any idea why?
(Second collapsed CI benchmark report for HiddenMarkovModels.jl: judge result, target result, baseline result, Julia versioninfo, and runtime information.)
Codecov Report — Attention: Patch coverage is 60.00% (6 of the 15 new lines are missing coverage).

```
@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
- Coverage   91.00%   90.07%    -0.93%
==========================================
  Files          17       17
  Lines         489      504      +15
==========================================
+ Hits          445      454       +9
- Misses         44       50       +6
```

☔ View full report in Codecov by Sentry.
Oh well, maybe the CI benchmarks are just very noisy.
Hmm, that's peculiar. I can understand why it might not give a speed boost for small matrices, but a regression surprises me.
I think it was a benchmarking artefact. But I'm actually also surprised about the speed boost: for matrix-vector products, isn't it just as efficient to store the matrix in column-major order?
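As a sanity check on that question, here is a minimal Julia sketch (variable names are ours, not the package's) comparing a lazy transpose against a materialized transposed copy for the forward-step matrix-vector product. Since Julia stores matrices in column-major order, `transpose(A) * x` already dispatches to an efficient BLAS call without copying:

```julia
using LinearAlgebra

# Illustrative forward step for an HMM with n states.
n = 100
A = rand(n, n); A ./= sum(A; dims=2)      # row-stochastic transition matrix
alpha = rand(n); alpha ./= sum(alpha)     # current filtered state distribution

# Option 1: lazy transpose (no copy); dispatches to BLAS gemv with trans = 'T'
pred1 = transpose(A) * alpha

# Option 2: materialized transposed copy, stored once and reused every step
At = collect(transpose(A))
pred2 = At * alpha

pred1 ≈ pred2   # both give the same predictive distribution
```

Both options produce identical results; whether the materialized copy is faster is exactly the hardware-dependent question being benchmarked here.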
I'm starting to wonder if the whole thing was just a benchmarking artefact. I initially decided to test this because I'd had experiences with CUDA.jl where setting the transpose flag for the input matrix led to much slower computations. I made a change similar to yours and the benchmark results looked quite consistent, but they could just have been noise/specific to my hardware. Interestingly though, testing this directly:

```julia
julia> @benchmark $x * $y
BenchmarkTools.Trial: 10000 samples with 932 evaluations.
 Range (min … max):  108.235 ns … 75.204 μs  ┊ GC (min … max): 0.00% … 99.83%
 Time  (median):     115.343 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   122.019 ns ± 750.931 ns ┊ GC (mean ± σ):  6.32% ± 2.23%

   ▁█▃              ▂▁
  ▁▁▂▃███▅▂▁▁▂▂▂▁▁▁▁▃▅███▆▄▃▂▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  108 ns          Histogram: frequency by time          131 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark $(transpose(x)) * $y
BenchmarkTools.Trial: 10000 samples with 985 evaluations.
 Range (min … max):  57.318 ns … 75.435 μs  ┊ GC (min … max): 0.00% … 99.90%
 Time  (median):     63.155 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   70.518 ns ± 753.754 ns ┊ GC (mean ± σ):  11.00% ± 2.73%

   ▅█          ▁
  ▁▁▄▅▅███▃▁▁▁▁▁▂▄▄██▆▇▅▄▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  57.3 ns         Histogram: frequency by time         79.5 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.

julia> @benchmark transpose($x) * $y
BenchmarkTools.Trial: 10000 samples with 984 evaluations.
 Range (min … max):  57.333 ns … 71.357 μs  ┊ GC (min … max): 0.00% … 99.88%
 Time  (median):     63.051 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   69.734 ns ± 712.979 ns ┊ GC (mean ± σ):  10.53% ± 2.72%

   ▃█▆         ▃▂
  ▁▁▃▅▅▄▅████▅▃▂▁▁▁▁▁▁▁▁▂▄▅▆██▇▆▅▄▃▄▄▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  57.3 ns         Histogram: frequency by time         72.2 ns <

 Memory estimate: 144 bytes, allocs estimate: 1.
```
In light of these findings, I suggest we deem the results inconclusive enough to give up on this tweak for now?
Certainly, agreed.
Fix #106 by storing a copy of the transposed transition matrix (and its elementwise logs) inside the concrete `HMM` type. To remain generic, I add methods for `AbstractHMM` that fall back on `transpose`.
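A hedged sketch of what this design could look like (the type and function names below are illustrative, not necessarily the actual API of HiddenMarkovModels.jl): the concrete type pays the transpose cost once at construction, while the generic fallback keeps any other `AbstractHMM` subtype working via a lazy `transpose`.

```julia
using LinearAlgebra

# Illustrative sketch; names are ours, not necessarily the package's API.
abstract type AbstractHMM end

struct HMM{M<:AbstractMatrix} <: AbstractHMM
    trans::M     # transition matrix A (rows sum to 1)
    trans_t::M   # precomputed copy of transpose(A), reused at every forward step
end

HMM(A::AbstractMatrix) = HMM(A, collect(transpose(A)))

transition_matrix(hmm::HMM) = hmm.trans

# Generic fallback: any AbstractHMM exposes a (lazy) transposed matrix...
transposed_transition_matrix(hmm::AbstractHMM) = transpose(transition_matrix(hmm))
# ...while the concrete HMM overrides it with its stored copy.
transposed_transition_matrix(hmm::HMM) = hmm.trans_t
```

The fallback means downstream code can always call `transposed_transition_matrix` without caring whether the transpose was materialized, which is the genericity the PR description aims for.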