Here we show how to use the essential ingredients of the package.
using Distributions
using HiddenMarkovModels
using LinearAlgebra
using Random

rng = Random.default_rng()
Random.seed!(rng, 63);
The package provides a versatile HMM type with three attributes:
- a vector of state initialization probabilities
- a matrix of state transition probabilities
- a vector of observation distributions, one for each state
We keep it simple for now by leveraging Distributions.jl.
d = 3
init = [0.8, 0.2]
trans = [0.7 0.3; 0.3 0.7]
dists = [MvNormal(-1.0 * ones(d), I), MvNormal(+1.0 * ones(d), I)]
hmm = HMM(init, trans, dists);
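These attributes can be read back with the accessor functions that appear later in this tutorial, namely initialization, transition_matrix and obs_distributions:
initialization(hmm)     # the initial state probabilities [0.8, 0.2]
transition_matrix(hmm)  # the transition matrix [0.7 0.3; 0.3 0.7]
obs_distributions(hmm); # the two MvNormal observation distributions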
You can simulate a pair of state and observation sequences with rand by specifying how long you want them to be.
state_seq, obs_seq = rand(rng, hmm, 20);
Note that the observation sequence is a vector, whose elements have whatever type an observation distribution returns when sampled.
state_seq[1], obs_seq[1]
(1, [-1.3663439583546293, 0.5170782955799473, -1.272497384182936])
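Here, for instance, each observation is a Vector{Float64} of length d, since that is what sampling an MvNormal produces (a quick check, not part of the original output):
typeof(obs_seq[1])  # Vector{Float64}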
In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.
The Viterbi algorithm (viterbi) returns the most likely state sequence $\hat{X}_{1:T} = \underset{X_{1:T}}{\mathrm{argmax}}~\mathbb{P}(X_{1:T} \vert Y_{1:T})$, along with the joint loglikelihood $\log \mathbb{P}(\hat{X}_{1:T}, Y_{1:T})$.
best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);
As we can see, it is very close to the true state sequence, but not necessarily equal.
vcat(state_seq', best_state_seq')
2×20 Matrix{Int64}:
 1 1 1 2 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2
 1 1 1 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
The forward algorithm (forward) returns a matrix of filtered state marginals $\alpha[i, t] = \mathbb{P}(X_t = i | Y_{1:t})$, along with the loglikelihood $\log \mathbb{P}(Y_{1:T})$ of the observation sequence.
filtered_state_marginals, obs_seq_loglikelihood1 = forward(hmm, obs_seq);
At each time $t$, it takes only the observations up to time $t$ into account. This is particularly useful to infer the marginal distribution of the last state.
filtered_state_marginals[:, end]
2-element Vector{Float64}:
 0.11023296335725714
 0.8897670366427429
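As a sanity check (also not part of the original output), each column of this matrix is a conditional distribution over states and therefore sums to one:
all(sum(filtered_state_marginals; dims=1) .≈ 1.0)  # true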
Conversely, the forward-backward algorithm (forward_backward) returns a matrix of smoothed state marginals $\gamma[i, t] = \mathbb{P}(X_t = i | Y_{1:T})$, along with the loglikelihood $\log \mathbb{P}(Y_{1:T})$ of the observation sequence.
smoothed_state_marginals, obs_seq_loglikelihood2 = forward_backward(hmm, obs_seq);
At each time $t$, it takes all observations up to time $T$ into account. This is particularly useful during learning. Note that forward and forward-backward only coincide at the last time step.
collect(zip(filtered_state_marginals, smoothed_state_marginals))
2×20 Matrix{Tuple{Float64, Float64}}:
 (0.996424, 0.998463) (0.999794, 0.999903) … (0.110233, 0.110233)
 (0.00357638, 0.00153653) (0.000206066, 9.66202e-5) (0.889767, 0.889767)
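As a quick cross-check (not part of the original output), the two algorithms agree on the final time step and return the same loglikelihood for the observation sequence:
filtered_state_marginals[:, end] ≈ smoothed_state_marginals[:, end]  # true: they coincide at t = T
obs_seq_loglikelihood1 ≈ obs_seq_loglikelihood2  # true: both equal the sequence loglikelihood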
Finally, we provide a thin wrapper (logdensityof) around the forward algorithm for observation sequence loglikelihoods $\log \mathbb{P}(Y_{1:T})$.
logdensityof(hmm, obs_seq)
-98.11265371636571
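Since logdensityof wraps the forward algorithm, this value should match the loglikelihood returned by forward earlier; only is used below so the comparison works whether that quantity comes back as a scalar or inside a one-element container (an assumption about the return shape):
logdensityof(hmm, obs_seq) ≈ only(obs_seq_loglikelihood1)  # true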
The same function can also compute joint loglikelihoods $\log \mathbb{P}(X_{1:T}, Y_{1:T})$ that take the states into account.
logdensityof(hmm, obs_seq, state_seq)
-100.01642675640207
For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.
logdensityof(hmm, obs_seq, best_state_seq)
-99.72520698031965
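Comparing the two values confirms it (a quick check, not part of the original output):
logdensityof(hmm, obs_seq, best_state_seq) >= logdensityof(hmm, obs_seq, state_seq)  # true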
The Baum-Welch algorithm (baum_welch) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.
init_guess = [0.7, 0.3]
trans_guess = [0.6 0.4; 0.4 0.6]
dists_guess = [MvNormal(-0.7 * ones(d), I), MvNormal(+0.7 * ones(d), I)]
hmm_guess = HMM(init_guess, trans_guess, dists_guess);
Let's estimate parameters based on a slightly longer sequence.
_, long_obs_seq = rand(rng, hmm, 100)
hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);
An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.
first(loglikelihood_evolution), last(loglikelihood_evolution)
(-494.8997144295873, -481.61350056670784)
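We can verify this guarantee over the whole iteration record (up to floating-point effects, the recorded values are non-decreasing):
issorted(loglikelihood_evolution)  # true: each iteration improves (or maintains) the loglikelihood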
We can check that the transition matrix estimate has improved.
cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)
2×2×2 Array{Float64, 3}:
[:, :, 1] =
 0.666944 0.333056
 0.261979 0.738021

[:, :, 2] =
 0.7 0.3
 0.3 0.7
And so have the estimates for the observation distributions.
map(dist -> dist.μ, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))
2×2 Matrix{Vector{Float64}}:
 [-0.893128, -1.0123, -0.575828] [-1.0, -1.0, -1.0]
 [0.718477, 1.25049, 0.751827] [1.0, 1.0, 1.0]
On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.
hcat(initialization(hmm_est), initialization(hmm))
2×2 Matrix{Float64}:
 1.0 0.8
 3.61167e-8 0.2
Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state $i$ in the true model will correspond to state $i$ in the estimated model. This is important to keep in mind when testing new models.
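For this two-state model, a minimal sketch of such a check (the permutation handling below is illustrative, not part of the package API) compares the estimated and true observation means under both possible state orderings and keeps the closer one:
# Means of the estimated and true observation distributions
μ_est = [dist.μ for dist in obs_distributions(hmm_est)]
μ_true = [dist.μ for dist in obs_distributions(hmm)]
# Try both orderings of the two states and measure the total discrepancy
perms = ([1, 2], [2, 1])
errs = [sum(norm, μ_est[p] .- μ_true) for p in perms]
argmin(errs)  # 1 here: the estimated labels happen to match the true ones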
In many applications, we have access to various observation sequences of different lengths.
_, long_obs_seq2 = rand(rng, hmm, 300)
_, long_obs_seq3 = rand(rng, hmm, 200)
long_obs_seqs = [long_obs_seq, long_obs_seq2, long_obs_seq3];
Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing seq_ends as a keyword argument. Otherwise, the input will be treated as a single observation sequence, which is mathematically incorrect.
long_obs_seq_concat = reduce(vcat, long_obs_seqs)
seq_ends = cumsum(length.(long_obs_seqs))
3-element Vector{Int64}:
 100
 400
 600
The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are summed over all sequences.
best_state_seq_concat, _ = viterbi(hmm, long_obs_seq_concat; seq_ends);
length(best_state_seq_concat)
600
The function seq_limits returns the start and end indices of a given sequence within the concatenated vector. It can be used to untangle the results.
start2, stop2 = seq_limits(seq_ends, 2)
(101, 400)
best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seq2))
true
While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.
hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);
Our estimate should be a little better.
cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)
2×2×2 Array{Float64, 3}:
[:, :, 1] =
 0.675638 0.324362
 0.307826 0.692174

[:, :, 2] =
 0.7 0.3
 0.3 0.7
map(dist -> dist.μ, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))
2×2 Matrix{Vector{Float64}}:
 [-0.924126, -1.03495, -1.03126] [-1.0, -1.0, -1.0]
 [0.989244, 1.00748, 0.933342] [1.0, 1.0, 1.0]
This page was generated using Literate.jl.