diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index aa9d7527..0f0d7cdf 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-11T20:44:19","documenter_version":"1.2.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-12T10:00:14","documenter_version":"1.2.1"}}
\ No newline at end of file
diff --git a/dev/alternatives/index.html b/dev/alternatives/index.html
index 167f2cac..98bb4063 100644
--- a/dev/alternatives/index.html
+++ b/dev/alternatives/index.html
@@ -1,2 +1,2 @@
-
We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (e.g. Turing.jl), but we leave those aside.
Return a type that can accommodate forward-backward computations for hmm on observations similar to obs.
It is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.
Run the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.
Keyword arguments
control_seq: a control sequence with the same length as obs_seq
seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
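For instance, a minimal sketch of this call on a toy Gaussian HMM (the parameter values and the random observation sequence are made up for illustration, not taken from the package docs) might look like this:

```julia
using Distributions, HiddenMarkovModels

init = [0.6, 0.4]                      # initial state probabilities
trans = [0.9 0.1; 0.2 0.8]             # row-stochastic transition matrix
dists = [Normal(-1.0), Normal(+1.0)]   # one observation distribution per state
hmm = HMM(init, trans, dists)

obs_seq = randn(100)                   # toy observation sequence
logL = logdensityof(hmm, obs_seq)      # loglikelihood, integrated over all state sequences
```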
Run the forward algorithm to compute the joint loglikelihood of obs_seq and state_seq for hmm.
Keyword arguments
control_seq: a control sequence with the same length as obs_seq
seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
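On the same kind of toy model, the joint version additionally takes a candidate state sequence. Depending on the package version, the joint density may be exposed as a three-argument logdensityof method or under a separate name such as joint_logdensityof, so treat the call below as an assumption rather than a guaranteed signature:

```julia
using Distributions, HiddenMarkovModels

hmm = HMM([0.6, 0.4], [0.9 0.1; 0.2 0.8], [Normal(-1.0), Normal(+1.0)])
obs_seq = randn(100)           # toy observations
state_seq = rand(1:2, 100)     # toy state sequence of matching length

# joint loglikelihood of this particular state sequence together with the observations
logL_joint = logdensityof(hmm, obs_seq, state_seq)
```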
Apply the forward algorithm to infer the current state after sequence obs_seq for hmm.
Return a tuple (storage.α, sum(storage.logL)) where storage is of type ForwardStorage.
Keyword arguments
control_seq: a control sequence with the same length as obs_seq
seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
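A hedged sketch of a forward pass on the same kind of toy model (all values made up for illustration):

```julia
using Distributions, HiddenMarkovModels

hmm = HMM([0.6, 0.4], [0.9 0.1; 0.2 0.8], [Normal(-1.0), Normal(+1.0)])
obs_seq = randn(100)

# filtered state marginals α and the loglikelihood of the sequence
α, logL = forward(hmm, obs_seq)
```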
Apply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.
Return a tuple (storage.q, sum(storage.logL)) where storage is of type ViterbiStorage.
Keyword arguments
control_seq: a control sequence with the same length as obs_seq
seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
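For example, on a toy Gaussian HMM with made-up values, decoding the most likely state sequence can be sketched as:

```julia
using Distributions, HiddenMarkovModels

hmm = HMM([0.6, 0.4], [0.9 0.1; 0.2 0.8], [Normal(-1.0), Normal(+1.0)])
obs_seq = randn(100)

# most likely state sequence and the loglikelihood of that path
best_state_seq, logL = viterbi(hmm, obs_seq)
```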
Apply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.
Return a tuple (storage.γ, sum(storage.logL)) where storage is of type ForwardBackwardStorage.
control_seq: a control sequence with the same length as obs_seq
seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
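A corresponding sketch for the posterior marginals, again on a made-up toy model:

```julia
using Distributions, HiddenMarkovModels

hmm = HMM([0.6, 0.4], [0.9 0.1; 0.2 0.8], [Normal(-1.0), Normal(+1.0)])
obs_seq = randn(100)

# posterior state marginals γ and the loglikelihood of the sequence
γ, logL = forward_backward(hmm, obs_seq)
```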
Most algorithms below ingest the data with three (keyword) arguments: obs_seq, control_seq and seq_ends.
If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.
If the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:
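A minimal version of that snippet, assuming obs_seqs and control_seqs are vectors containing the individual sequences, could be:

```julia
obs_seq = reduce(vcat, obs_seqs)          # concatenate all observation sequences
control_seq = reduce(vcat, control_seqs)  # concatenate all control sequences
seq_ends = cumsum(length.(obs_seqs))      # index at which each original sequence ends
```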
Apply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.
Return a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.
Keyword arguments
control_seq: a control sequence with the same length as obs_seq
seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
atol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)
max_iterations: maximum number of iterations of the algorithm
loglikelihood_increasing: whether to throw an error if the loglikelihood decreases
Apply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.
Return a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.
Keyword arguments
atol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)
max_iterations: maximum number of iterations of the algorithm
loglikelihood_increasing: whether to throw an error if the loglikelihood decreases
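As a rough sketch of how this might be called, starting from a made-up initial guess and toy data (none of these values come from the original docs):

```julia
using Distributions, HiddenMarkovModels

obs_seq = randn(200)                  # toy observations

hmm_guess = HMM(
    [0.5, 0.5],                       # guess for the initialization
    [0.8 0.2; 0.2 0.8],               # guess for the transition matrix
    [Normal(-0.5), Normal(+0.5)],     # guess for the observation distributions
)

hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq; max_iterations=100)
```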
The most frequent error you will encounter is an OverflowError during inference, telling you that some values are infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:
Increase the duration of the sequence / the number of sequences to get more data
Add a prior to your transition matrix / observation distributions to avoid degenerate behavior like zero variance in a Gaussian
Reduce the number of states to make every one of them useful
Pick a better initialization to start closer to the supposed ground truth
Use numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll your own observation distributions.
Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.
First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.
Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.
Still, first-order optimization can be relevant when we lack explicit maximum likelihood formulas.
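To make this concrete, here is a hedged sketch of such a first-order approach. It uses ForwardDiff.jl (an assumption, not something prescribed by the text) and sidesteps the projection issue entirely by optimizing only the unconstrained Gaussian means, keeping the initialization and transition matrix fixed:

```julia
using Distributions, ForwardDiff, HiddenMarkovModels

obs_seq = randn(200)         # toy data, made up for this sketch
init = [0.5, 0.5]            # kept fixed, to stay away from the constrained parameters
trans = [0.9 0.1; 0.2 0.8]

# loglikelihood as a function of the free parameters (here, the two Gaussian means)
function loglik(μ)
    hmm = HMM(init, trans, [Normal(μ[1], 1.0), Normal(μ[2], 1.0)])
    return logdensityof(hmm, obs_seq)
end

μ = [-0.5, 0.5]
for _ in 1:100
    g = ForwardDiff.gradient(loglik, μ)
    μ .+= 1e-3 .* g          # plain gradient ascent with a small fixed step; no projection needed for the means
end
```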