diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index aa9d7527..0f0d7cdf 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-11T20:44:19","documenter_version":"1.2.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-12T10:00:14","documenter_version":"1.2.1"}} \ No newline at end of file diff --git a/dev/alternatives/index.html b/dev/alternatives/index.html index 167f2cac..98bb4063 100644 --- a/dev/alternatives/index.html +++ b/dev/alternatives/index.html @@ -1,2 +1,2 @@ -Alternatives · HiddenMarkovModels.jl

Competitors

Julia

We compare features among the following Julia packages:

We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.

HMMs.jlHMMBase.jlHMMGradients.jl
AlgorithmsSim, FB, Vit, BWSim, FB, Vit, BWFB
Observation typesanythingNumber / Vectoranything
Observation distributionsDensityInterface.jlDistributions.jlmanual
Multiple sequencesyesnoyes
Priors / structurespossiblenopossible
Temporal dependencyyesnono
Control dependencyyesnono
Number typesanythingFloat64AbstractFloat
Automatic differentiationyesnoyes
Linear algebrayesyesno
Logarithmic probabilitieshalfwayhalfwayyes

Sim = Simulation, FB = Forward-Backward, Vit = Viterbi, BW = Baum-Welch

Python

+Alternatives · HiddenMarkovModels.jl

Competitors

Julia

We compare features among the following Julia packages:

We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.

HMMs.jlHMMBase.jlHMMGradients.jl
AlgorithmsSim, FB, Vit, BWSim, FB, Vit, BWFB
Observation typesanythingNumber / Vectoranything
Observation distributionsDensityInterface.jlDistributions.jlmanual
Multiple sequencesyesnoyes
Priors / structurespossiblenopossible
Temporal dependencyyesnono
Control dependencyyesnono
Number typesanythingFloat64AbstractFloat
Automatic differentiationyesnoyes
Linear algebrayesyesno
Logarithmic probabilitieshalfwayhalfwayyes

Sim = Simulation, FB = Forward-Backward, Vit = Viterbi, BW = Baum-Welch

Python

diff --git a/dev/api/index.html b/dev/api/index.html index eade6fd4..be7f2a71 100644 --- a/dev/api/index.html +++ b/dev/api/index.html @@ -1,14 +1,16 @@ -API reference · HiddenMarkovModels.jl

API reference

HiddenMarkovModelsModule
HiddenMarkovModels

A Julia package for HMM modeling, simulation, inference and learning.

source

Types

HiddenMarkovModels.AbstractHMMType
AbstractHMM

Abstract supertype for an HMM amenable to simulation, inference and learning.

Interface

To create your own subtype of AbstractHMM, you need to implement the following methods for inference:

For learning, two more methods must be implemented:

Applicable functions

Any AbstractHMM which satisfies the inference interface can be given to the following functions:

If it satisfies the learning interface, the following function also applies:

  • [baum_welch(hmm, hmm_guess, obs_seq; control_seq, seq_ends)]
source
HiddenMarkovModels.HMMType
struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector)} <: AbstractHMM

Basic implementation of an HMM.

Fields

  • init::AbstractVector: initial state probabilities

  • trans::AbstractMatrix: state transition matrix

  • dists::AbstractVector: observation distributions

source

Interface

HiddenMarkovModels.obs_distributionsFunction
obs_distributions(hmm)
-obs_distributions(hmm, control)

Return a vector of observation distributions, one for each state of hmm (when control is applied).

These distribution objects should implement

  • Random.rand(rng, dist) for sampling
  • DensityInterface.logdensityof(dist, obs) for inference
  • StatsAPI.fit!(dist, obs_seq, weight_seq) for learning
source

Utils

Base.randFunction
rand([rng,] hmm, T)
-rand([rng,] hmm, control_seq)

Simulate hmm for T time steps, or when the sequence control_seq is applied.

Return a named tuple (; state_seq, obs_seq).

source
Base.eltypeFunction
eltype(hmm, obs, control)

Return a type that can accommodate forward-backward computations for hmm on observations similar to obs.

It is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.

source
HiddenMarkovModels.seq_limitsFunction
seq_limits(seq_ends, k)
-

Return a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.

source

Inference

DensityInterface.logdensityofFunction
logdensityof(hmm)

Return the prior loglikelihood associated with the parameters of hmm.

source
logdensityof(hmm, obs_seq; control_seq, seq_ends)
-

Run the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.

Keyword arguments

  • control_seq: a control sequence with the same length as obs_seq
  • seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
source
logdensityof(hmm, obs_seq, state_seq; control_seq, seq_ends)
-

Run the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.

Keyword arguments

  • control_seq: a control sequence with the same length as obs_seq
  • seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
source
HiddenMarkovModels.forwardFunction
forward(hmm, obs_seq; control_seq, seq_ends)
-

Apply the forward algorithm to infer the current state after sequence obs_seq for hmm.

Return a tuple (storage.α, sum(storage.logL)) where storage is of type ForwardStorage.

Keyword arguments

  • control_seq: a control sequence with the same length as obs_seq
  • seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
source
HiddenMarkovModels.viterbiFunction
viterbi(hmm, obs_seq; control_seq, seq_ends)
-

Apply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.

Return a tuple (storage.q, sum(storage.logL)) where storage is of type ViterbiStorage.

Keyword arguments

  • control_seq: a control sequence with the same length as obs_seq
  • seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
source
HiddenMarkovModels.forward_backwardFunction
forward_backward(hmm, obs_seq; control_seq, seq_ends)
-

Apply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.

Return a tuple (storage.γ, sum(storage.logL)) where storage is of type ForwardBackwardStorage.

  • control_seq: a control sequence with the same length as obs_seq
  • seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends
source

Learning

HiddenMarkovModels.baum_welchFunction
baum_welch(
+API reference · HiddenMarkovModels.jl

API reference

HiddenMarkovModelsModule
HiddenMarkovModels

A Julia package for HMM modeling, simulation, inference and learning.

source

Sequence formatting

Most algorithms below ingest the data with three (keyword) arguments: obs_seq, control_seq and seq_ends.

  • If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.
  • If the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:
obs_seq = reduce(vcat, obs_seqs)
+control_seq = reduce(vcat, control_seqs)
+seq_ends = cumsum(length.(obs_seqs))

Types

HiddenMarkovModels.HMMType
struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector)} <: AbstractHMM

Basic implementation of an HMM.

Fields

  • init::AbstractVector: initial state probabilities

  • trans::AbstractMatrix: state transition matrix

  • dists::AbstractVector: observation distributions

source

Interface

HiddenMarkovModels.obs_distributionsFunction
obs_distributions(hmm)
+obs_distributions(hmm, control)

Return a vector of observation distributions, one for each state of hmm (possibly when control is applied).

These distribution objects should implement

  • Random.rand(rng, dist) for sampling
  • DensityInterface.logdensityof(dist, obs) for inference
  • StatsAPI.fit!(dist, obs_seq, weight_seq) for learning
source

Utils

Base.randFunction
rand([rng,] hmm, T)
+rand([rng,] hmm, control_seq)

Simulate hmm for T time steps, or when the sequence control_seq is applied.

Return a named tuple (; state_seq, obs_seq).

source
Base.eltypeFunction
eltype(hmm, obs, control)

Return a type that can accommodate forward-backward computations for hmm on observations similar to obs.

It is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.

source
HiddenMarkovModels.seq_limitsFunction
seq_limits(seq_ends, k)
+

Return a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.

source

Inference

DensityInterface.logdensityofFunction
logdensityof(hmm)

Return the prior loglikelihood associated with the parameters of hmm.

source
logdensityof(hmm, obs_seq; control_seq, seq_ends)
+

Run the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.

source
logdensityof(hmm, obs_seq, state_seq; control_seq, seq_ends)
+

Run the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.

source
HiddenMarkovModels.forwardFunction
forward(hmm, obs_seq; control_seq, seq_ends)
+

Apply the forward algorithm to infer the current state after sequence obs_seq for hmm.

Return a tuple (storage.α, sum(storage.logL)) where storage is of type ForwardStorage.

source
HiddenMarkovModels.viterbiFunction
viterbi(hmm, obs_seq; control_seq, seq_ends)
+

Apply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.

Return a tuple (storage.q, sum(storage.logL)) where storage is of type ViterbiStorage.

source
HiddenMarkovModels.forward_backwardFunction
forward_backward(hmm, obs_seq; control_seq, seq_ends)
+

Apply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.

Return a tuple (storage.γ, sum(storage.logL)) where storage is of type ForwardBackwardStorage.

source

Learning

HiddenMarkovModels.baum_welchFunction
baum_welch(
     hmm_guess,
     obs_seq;
     control_seq,
@@ -17,24 +19,24 @@
     max_iterations,
     loglikelihood_increasing
 )
-

Apply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.

Return a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.

Keyword arguments

  • control_seq: a control sequence with the same length as obs_seq

  • seq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends

  • atol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)

  • max_iterations: maximum number of iterations of the algorithm

  • loglikelihood_increasing: whether to throw an error if the loglikelihood decreases

source
StatsAPI.fit!Function
fit!(
+

Apply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.

Return a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.

Keyword arguments

  • atol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)
  • max_iterations: maximum number of iterations of the algorithm
  • loglikelihood_increasing: whether to throw an error if the loglikelihood decreases
source
StatsAPI.fit!Function
fit!(
     hmm::AbstractHMM,
     fb_storage::ForwardBackwardStorage,
     obs_seq::AbstractVector;
     control_seq::AbstractVector,
     seq_ends::AbstractVector{Int},
-)

Update hmm in-place based on information generated during forward-backward.

source

In-place versions

HiddenMarkovModels.ForwardStorageType
struct ForwardStorage{R}

Fields

Only the fields with a description are part of the public API.

  • α::Matrix: posterior last state marginals `α[i] = ℙ(X[T]=i | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • c::Vector

source
HiddenMarkovModels.ViterbiStorageType
struct ViterbiStorage{R}

Fields

Only the fields with a description are part of the public API.

  • q::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])

  • logL::Vector: one joint loglikelihood per pair (observation sequence, most likely state sequence)

  • logB::Matrix

  • ϕ::Matrix

  • ψ::Matrix{Int64}

source
HiddenMarkovModels.ForwardBackwardStorageType
struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}

Fields

Only the fields with a description are part of the public API.

  • γ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])

  • ξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • α::Matrix

  • c::Vector

  • β::Matrix

  • Bβ::Matrix

source

In-place versions

Forward

HiddenMarkovModels.ForwardStorageType
struct ForwardStorage{R}

Fields

Only the fields with a description are part of the public API.

  • α::Matrix: posterior last state marginals α[i] = ℙ(X[T]=i | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • c::Vector

source

Viterbi

HiddenMarkovModels.ViterbiStorageType
struct ViterbiStorage{R}

Fields

Only the fields with a description are part of the public API.

  • q::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])

  • logL::Vector: one joint loglikelihood per pair of observation sequence and most likely state sequence

  • logB::Matrix

  • ϕ::Matrix

  • ψ::Matrix{Int64}

source

Forward-backward

HiddenMarkovModels.ForwardBackwardStorageType
struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}

Fields

Only the fields with a description are part of the public API.

  • γ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])

  • ξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • α::Matrix

  • c::Vector

  • β::Matrix

  • Bβ::Matrix

source

Baum-Welch

Misc

HiddenMarkovModels.LightDiagNormalType
struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}

An HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • μ::AbstractVector: means

  • σ::AbstractVector: standard deviations

  • logσ::AbstractVector: log standard deviations

source
HiddenMarkovModels.LightCategoricalType
struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}

An HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • p::AbstractVector: class probabilities

  • logp::AbstractVector: log class probabilities

source
HiddenMarkovModels.fit_in_sequence!Function
fit_in_sequence!(dists, i, x, w)
-

Modify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.

Default behavior:

fit!(dists[i], x, w)

Override for Distributions.jl (in the package extension)

dists[i] = fit(eltype(dists), x, w)
source

Index

+
source

Misc

HiddenMarkovModels.LightDiagNormalType
struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}

An HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • μ::AbstractVector: means

  • σ::AbstractVector: standard deviations

  • logσ::AbstractVector: log standard deviations

source
HiddenMarkovModels.LightCategoricalType
struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}

An HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • p::AbstractVector: class probabilities

  • logp::AbstractVector: log class probabilities

source
HiddenMarkovModels.fit_in_sequence!Function
fit_in_sequence!(dists, i, x, w)
+

Modify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.

Default behavior:

fit!(dists[i], x, w)

Override for Distributions.jl (in the package extension)

dists[i] = fit(eltype(dists), x, w)
source

Index

diff --git a/dev/debugging/index.html b/dev/debugging/index.html index 3ee6a9c8..554e0627 100644 --- a/dev/debugging/index.html +++ b/dev/debugging/index.html @@ -1,2 +1,2 @@ -Debugging · HiddenMarkovModels.jl

Debugging

Numerical overflow

The most frequent error you will encounter is an OverflowError during inference, telling you that some values are infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:

  • Increase the duration of the sequence / the number of sequences to get more data
  • Add a prior to your transition matrix / observation distributions to avoid degenerate behavior like zero variance in a Gaussian
  • Reduce the number of states to make every one of them useful
  • Pick a better initialization to start closer to the supposed ground truth
  • Use numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own observation distributions.

Performance

If your algorithms are too slow, the general advice always applies:

  • Use BenchmarkTools.jl to establish a baseline
  • Use profiling to see where you spend most of your time
  • Use JET.jl to track down type instabilities
  • Use AllocCheck.jl to reduce allocations
+Debugging · HiddenMarkovModels.jl

Debugging

Numerical overflow

The most frequent error you will encounter is an OverflowError during inference, telling you that some values are infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:

  • Increase the duration of the sequence / the number of sequences to get more data
  • Add a prior to your transition matrix / observation distributions to avoid degenerate behavior like zero variance in a Gaussian
  • Reduce the number of states to make every one of them useful
  • Pick a better initialization to start closer to the supposed ground truth
  • Use numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own observation distributions.

Performance

If your algorithms are too slow, the general advice always applies:

  • Use BenchmarkTools.jl to establish a baseline
  • Use profiling to see where you spend most of your time
  • Use JET.jl to track down type instabilities
  • Use AllocCheck.jl to reduce allocations
diff --git a/dev/examples/autodiff/index.html b/dev/examples/autodiff/index.html index 3fd7201b..4cd91aa1 100644 --- a/dev/examples/autodiff/index.html +++ b/dev/examples/autodiff/index.html @@ -41,4 +41,4 @@ Enzyme.Duplicated(params, params_shadow), ) -grad_e = params_shadow
ComponentVector{Float64}(init = [2.461586483904971, 0.15365406438011528], trans = [13.541230697414639 16.653550286760073; 13.64721549300123 13.472726825544802], means = [-1.5783255775403202, -2.098528923154286])
grad_e ≈ grad_f
true

Gradient methods

Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.

First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.

Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.

Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.


This page was generated using Literate.jl.

+grad_e = params_shadow
ComponentVector{Float64}(init = [2.461586483904971, 0.15365406438011528], trans = [13.541230697414639 16.653550286760073; 13.64721549300123 13.472726825544802], means = [-1.5783255775403202, -2.098528923154286])
grad_e ≈ grad_f
true

Gradient methods

Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.

First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.

Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.

Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.


This page was generated using Literate.jl.

diff --git a/dev/examples/basics/index.html b/dev/examples/basics/index.html index 80e6cf14..2e1a8e57 100644 --- a/dev/examples/basics/index.html +++ b/dev/examples/basics/index.html @@ -44,4 +44,4 @@ 0.7 0.3 0.3 0.7
map(dist -> dist.μ, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))
2×2 Matrix{Vector{Float64}}:
  [-0.924126, -1.03495, -1.03126]  [-1.0, -1.0, -1.0]
- [0.989244, 1.00748, 0.933342]    [1.0, 1.0, 1.0]

This page was generated using Literate.jl.

+ [0.989244, 1.00748, 0.933342] [1.0, 1.0, 1.0]

This page was generated using Literate.jl.

diff --git a/dev/examples/controlled/index.html b/dev/examples/controlled/index.html index 9ef1462d..88f2d32d 100644 --- a/dev/examples/controlled/index.html +++ b/dev/examples/controlled/index.html @@ -74,4 +74,4 @@ -0.994459 -1.0
hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])
3×2 Matrix{Float64}:
  0.994041  1.0
  0.994866  1.0
- 1.02258   1.0

This page was generated using Literate.jl.

+ 1.02258 1.0

This page was generated using Literate.jl.

diff --git a/dev/examples/interfaces/index.html b/dev/examples/interfaces/index.html index eafe0f8b..41c4332a 100644 --- a/dev/examples/interfaces/index.html +++ b/dev/examples/interfaces/index.html @@ -88,4 +88,4 @@ [:, :, 2] = 0.591969 0.408031 - 0.42277 0.57723

This page was generated using Literate.jl.

+ 0.42277 0.57723

This page was generated using Literate.jl.

diff --git a/dev/examples/temporal/index.html b/dev/examples/temporal/index.html index 40e6988c..3bbcfce1 100644 --- a/dev/examples/temporal/index.html +++ b/dev/examples/temporal/index.html @@ -99,4 +99,4 @@ Distributions.Normal{Float64}(μ=-0.953169, σ=0.999476) … Distributions.Normal{Float64}(μ=-1.0, σ=1.0) Distributions.Normal{Float64}(μ=-1.96275, σ=1.00499) Distributions.Normal{Float64}(μ=-2.0, σ=1.0)
hcat(obs_distributions(hmm_est, 2), obs_distributions(hmm, 2))
2×2 Matrix{Distributions.Normal{Float64}}:
  Distributions.Normal{Float64}(μ=0.948362, σ=0.976429)  …  Distributions.Normal{Float64}(μ=1.0, σ=1.0)
- Distributions.Normal{Float64}(μ=1.99179, σ=0.994492)      Distributions.Normal{Float64}(μ=2.0, σ=1.0)

This page was generated using Literate.jl.

+ Distributions.Normal{Float64}(μ=1.99179, σ=0.994492) Distributions.Normal{Float64}(μ=2.0, σ=1.0)

This page was generated using Literate.jl.

diff --git a/dev/examples/types/index.html b/dev/examples/types/index.html index 5f7cd7ff..17c6d7ce 100644 --- a/dev/examples/types/index.html +++ b/dev/examples/types/index.html @@ -27,4 +27,4 @@ 0.215753 ⋅ 0.784247
transition_matrix(hmm)
3×3 SparseArrays.SparseMatrixCSC{Float64, Int64} with 6 stored entries:
  0.8  0.2   ⋅ 
   ⋅   0.8  0.2
- 0.2   ⋅   0.8

This page was generated using Literate.jl.

+ 0.2 ⋅ 0.8

This page was generated using Literate.jl.

diff --git a/dev/formulas/index.html b/dev/formulas/index.html index 59585ad0..9f2230a0 100644 --- a/dev/formulas/index.html +++ b/dev/formulas/index.html @@ -56,4 +56,4 @@ \frac{\partial \log \mathcal{L}}{\partial a_{i,j}} &= \sum_{t=1}^{T-1} \bar{\alpha}_{i,t} \frac{b_{j,t+1}}{m_{t+1}} \bar{\beta}_{j,t+1} \\ \frac{\partial \log \mathcal{L}}{\partial \log b_{j,1}} &= \pi_j \frac{b_{j,1}}{m_1} \bar{\beta}_{j,1} = \frac{\bar{\alpha}_{j,1} \bar{\beta}_{j,1}}{c_1} = \gamma_{j,1} \\ \frac{\partial \log \mathcal{L}}{\partial \log b_{j,t}} &= \sum_{i=1}^N \bar{\alpha}_{i,t-1} a_{i,j} \frac{b_{j,t}}{m_t} \bar{\beta}_{j,t} = \frac{\bar{\alpha}_{j,t} \bar{\beta}_{j,t}}{c_t} = \gamma_{j,t} -\end{align*}\]

Bibliography

+\end{align*}\]

Bibliography

diff --git a/dev/index.html b/dev/index.html index 604c95cd..ab5faee3 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,6 +1,6 @@ -Home · HiddenMarkovModels.jl

HiddenMarkovModels.jl

Stable Dev Build Status Coverage Code Style: Blue

DOI Aqua QA JET

A Julia package for simulation, inference and learning of Hidden Markov Models.

Getting started

This package can be installed using Julia's package manager:

pkg> add HiddenMarkovModels

Then, you can create your first model as follows:

using Distributions, HiddenMarkovModels
+Home · HiddenMarkovModels.jl

HiddenMarkovModels.jl

Stable Dev Build Status Coverage Code Style: Blue

DOI Aqua QA JET

A Julia package for simulation, inference and learning of Hidden Markov Models.

Getting started

This package can be installed using Julia's package manager:

pkg> add HiddenMarkovModels

Then, you can create your first model as follows:

using Distributions, HiddenMarkovModels
 init = [0.4, 0.6]
 trans = [0.9 0.1; 0.2 0.8]
 dists = [Normal(-1.0), Normal(1.0)]
-hmm = HMM(init, trans, dists)

Take a look at the documentation to know what to do next!

Some background

HMMs are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They capture the distribution of an observation sequence $(Y_t)$ by assuming the existence of a latent state sequence $(X_t)$ such that:

  • the state follows a (discrete time, discrete space) Markov chain $\mathbb{P}_\theta(X_t | X_{t-1})$
  • the observation distribution is determined at each time by the state $\mathbb{P}_\theta(Y_t | X_t)$

Each of the problems below has an efficient solution algorithm which our package implements:

ProblemGoalAlgorithm
EvaluationLikelihood of the observation sequence $\mathbb{P}_\theta(Y_{1:T})$Forward
FilteringNon-anticipative state marginals $\mathbb{P}_\theta(X_t \vert Y_{1:t})$Forward
SmoothingState marginals $\mathbb{P}_\theta(X_t \vert Y_{1:T})$Forward-backward
DecodingMost likely state sequence $\underset{X_{1:T}}{\mathrm{argmax}}~\mathbb{P}_\theta(X_{1:T} \vert Y_{1:T})$Viterbi
LearningMaximum likelihood parameter $\underset{\theta}{\mathrm{argmax}}~\mathbb{P}_\theta(Y_{1:T})$Baum-Welch

Main features

This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-heterogeneous or controlled HMMs are supported out of the box.

This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.

This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.

Contributing

If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.

Acknowledgements

A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.

+hmm = HMM(init, trans, dists)

Take a look at the documentation to know what to do next!

Some background

Hidden Markov Models (HMMs) are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They explain an observation sequence $(Y_t)$ by assuming the existence of a latent Markovian state sequence $(X_t)$ whose current value determines the distribution of observations. In our framework, both the state and the observation sequence are also allowed to depend on a known control sequence $(U_t)$. Each of the problems below has an efficient solution algorithm which our package implements:

ProblemGoalAlgorithm
EvaluationLikelihood of the observation sequenceForward
FilteringLast state marginalsForward
SmoothingAll state marginalsForward-backward
DecodingMost likely state sequenceViterbi
LearningMaximum likelihood parameterBaum-Welch

Main features

This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-dependent or controlled HMMs are supported out of the box.

This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.

This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.

Contributing

If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.

Acknowledgements

A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.

diff --git a/dev/search_index.js b/dev/search_index.js index a44581c5..68915c2b 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"EditURL = \"../../../examples/controlled.jl\"","category":"page"},{"location":"examples/controlled/#Control-dependency","page":"Control dependency","title":"Control dependency","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Here, we give a example of controlled HMM (also called input-output HMM), in the special case of Markov switching regression.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random\nusing SimpleUnPack\nusing StatsAPI","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/controlled/#Model","page":"Control dependency","title":"Model","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"A Markov switching regression is like a classical regression, except that the weights depend on the unobserved state of an HMM. We can represent it with the following subtype of AbstractHMM, which has one vector of coefficients beta_i per state.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"struct ControlledGaussianHMM{T} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dist_coeffs::Vector{Vector{T}}\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In state i with a vector of controls u, our observation is given by the linear model y sim mathcalN(beta_i^top u 1).","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function HMMs.initialization(hmm::ControlledGaussianHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::ControlledGaussianHMM)\n return hmm.trans\nend\n\nfunction HMMs.obs_distributions(hmm::ControlledGaussianHMM, control::AbstractVector)\n return [Normal(dot(hmm.dist_coeffs[i], control), 1.0) for i in 1:length(hmm)]\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In this case, the transition matrix does not depend on the control.","category":"page"},{"location":"examples/controlled/#Simulation","page":"Control dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"d = 3\ninit = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\ndist_coeffs = [-ones(d), ones(d)]\nhmm = ControlledGaussianHMM(init, trans, dist_coeffs);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Simulation requires a vector of controls, each being a vector itself with the right dimension.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Let us build several sequences of variable lengths.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"control_seqs = [[randn(rng, d) for t in 1:rand(100:200)] for k in 1:100];\nobs_seqs = [rand(rng, hmm, control_seq).obs_seq for control_seq in control_seqs];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/controlled/#Inference","page":"Control dependency","title":"Inference","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Not much changes from the case with simple time dependency.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq; control_seq, seq_ends)","category":"page"},{"location":"examples/controlled/#Learning","page":"Control dependency","title":"Learning","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Once more, we override the fit! function. The state-related parameters are estimated in the standard way. Meanwhile, the observation coefficients are given by the formula for weighted least squares.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function StatsAPI.fit!(\n hmm::ControlledGaussianHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n) where {T}\n @unpack γ, ξ = fb_storage\n N = length(hmm)\n\n hmm.init .= 0\n hmm.trans .= 0\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n hmm.trans .+= sum(ξ[t1:t2])\n end\n hmm.init ./= sum(hmm.init)\n for row in eachrow(hmm.trans)\n row ./= sum(row)\n end\n\n U = reduce(hcat, control_seq)'\n y = obs_seq\n for i in 1:N\n W = sqrt.(Diagonal(γ[i, :]))\n hmm.dist_coeffs[i] = (W * U) \\ (W * y)\n end\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Now we put it to the test.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"init_guess = [0.7, 0.3]\ntrans_guess = [0.6 0.4; 0.4 0.6]\ndist_coeffs_guess = [-0.7 * ones(d), 0.7 * ones(d)]\nhmm_guess = ControlledGaussianHMM(init_guess, trans_guess, dist_coeffs_guess);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq; control_seq, seq_ends)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"How did we perform?","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[1], hmm.dist_coeffs[1])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"debugging/#Debugging","page":"Debugging","title":"Debugging","text":"","category":"section"},{"location":"debugging/#Numerical-overflow","page":"Debugging","title":"Numerical overflow","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"The most frequent error you will encounter is an OverflowError during inference, telling you that some values are infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Increase the duration of the sequence / the number of sequences to get more data\nAdd a prior to your transition matrix / observation distributions to avoid degenerate behavior like zero variance in a Gaussian\nReduce the number of states to make every one of them useful\nPick a better initialization to start closer to the supposed ground truth\nUse numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own observation distributions.","category":"page"},{"location":"debugging/#Performance","page":"Debugging","title":"Performance","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"If your algorithms are too slow, the general advice always applies:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Use BenchmarkTools.jl to establish a baseline\nUse profiling to see where you spend most of your time\nUse JET.jl to track down type instabilities\nUse AllocCheck.jl to reduce allocations","category":"page"},{"location":"api/#API-reference","page":"API reference","title":"API reference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels","category":"page"},{"location":"api/#HiddenMarkovModels","page":"API reference","title":"HiddenMarkovModels","text":"HiddenMarkovModels\n\nA Julia package for HMM modeling, simulation, inference and learning.\n\n\n\n\n\n","category":"module"},{"location":"api/#Types","page":"API reference","title":"Types","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"AbstractHMM\nHMM","category":"page"},{"location":"api/#HiddenMarkovModels.AbstractHMM","page":"API reference","title":"HiddenMarkovModels.AbstractHMM","text":"AbstractHMM\n\nAbstract supertype for an HMM amenable to simulation, inference and learning.\n\nInterface\n\nTo create your own subtype of AbstractHMM, you need to implement the following methods for inference:\n\ninitialization(hmm)\ntransition_matrix(hmm, control)\nobs_distributions(hmm, control)\n\nFor learning, two more methods must be implemented:\n\nApplicable functions\n\nAny AbstractHMM which satisfies the inference interface can be given to the following functions:\n\nrand(rng, hmm, control_seq)\nlogdensityof(hmm, obs_seq; control_seq, seq_ends)\nforward(hmm, obs_seq; control_seq, seq_ends)\nviterbi(hmm, obs_seq; control_seq, seq_ends)\nforward_backward(hmm, obs_seq; control_seq, seq_ends)\n\nIf it satisfies the learning interface, the following function also applies:\n\n[baum_welch(hmm, hmm_guess, obs_seq; control_seq, seq_ends)]\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.HMM","page":"API reference","title":"HiddenMarkovModels.HMM","text":"struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector)} <: AbstractHMM\n\nBasic implementation of an HMM.\n\nFields\n\ninit::AbstractVector: initial state probabilities\ntrans::AbstractMatrix: state transition matrix\ndists::AbstractVector: observation distributions\n\n\n\n\n\n","category":"type"},{"location":"api/#Interface","page":"API reference","title":"Interface","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"initialization\ntransition_matrix\nobs_distributions","category":"page"},{"location":"api/#HiddenMarkovModels.initialization","page":"API reference","title":"HiddenMarkovModels.initialization","text":"initialization(hmm)\n\nReturn the vector of initial state probabilities for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.transition_matrix","page":"API reference","title":"HiddenMarkovModels.transition_matrix","text":"transition_matrix(hmm)\ntransition_matrix(hmm, control)\n\nReturn the matrix of state transition probabilities for hmm (when control is applied).\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.obs_distributions","page":"API reference","title":"HiddenMarkovModels.obs_distributions","text":"obs_distributions(hmm)\nobs_distributions(hmm, control)\n\nReturn a vector of observation distributions, one for each state of hmm (when control is applied).\n\nThese distribution objects should implement\n\nRandom.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning\n\n\n\n\n\n","category":"function"},{"location":"api/#Utils","page":"API reference","title":"Utils","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"length\nrand\neltype\nseq_limits","category":"page"},{"location":"api/#Base.length","page":"API reference","title":"Base.length","text":"length(hmm)\n\nReturn the number of states of hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.rand","page":"API reference","title":"Base.rand","text":"rand([rng,] hmm, T)\nrand([rng,] hmm, control_seq)\n\nSimulate hmm for T time steps, or when the sequence control_seq is applied.\n\nReturn a named tuple (; state_seq, obs_seq).\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.eltype","page":"API reference","title":"Base.eltype","text":"eltype(hmm, obs, control)\n\nReturn a type that can accommodate forward-backward computations for hmm on observations similar to obs.\n\nIt is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.seq_limits","page":"API reference","title":"HiddenMarkovModels.seq_limits","text":"seq_limits(seq_ends, k)\n\n\nReturn a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.\n\n\n\n\n\n","category":"function"},{"location":"api/#Inference","page":"API reference","title":"Inference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"logdensityof\nforward\nviterbi\nforward_backward","category":"page"},{"location":"api/#DensityInterface.logdensityof","page":"API reference","title":"DensityInterface.logdensityof","text":"logdensityof(hmm)\n\nReturn the prior loglikelihood associated with the parameters of hmm.\n\n\n\n\n\nlogdensityof(hmm, obs_seq; control_seq, seq_ends)\n\n\nRun the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.\n\nKeyword arguments\n\ncontrol_seq: a control sequence with the same length as obs_seq\nseq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends\n\n\n\n\n\nlogdensityof(hmm, obs_seq, state_seq; control_seq, seq_ends)\n\n\nRun the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.\n\nKeyword arguments\n\ncontrol_seq: a control sequence with the same length as obs_seq\nseq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward","page":"API reference","title":"HiddenMarkovModels.forward","text":"forward(hmm, obs_seq; control_seq, seq_ends)\n\n\nApply the forward algorithm to infer the current state after sequence obs_seq for hmm.\n\nReturn a tuple (storage.α, sum(storage.logL)) where storage is of type ForwardStorage.\n\nKeyword arguments\n\ncontrol_seq: a control sequence with the same length as obs_seq\nseq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi","page":"API reference","title":"HiddenMarkovModels.viterbi","text":"viterbi(hmm, obs_seq; control_seq, seq_ends)\n\n\nApply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.\n\nReturn a tuple (storage.q, sum(storage.logL)) where storage is of type ViterbiStorage.\n\nKeyword arguments\n\ncontrol_seq: a control sequence with the same length as obs_seq\nseq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward","page":"API reference","title":"HiddenMarkovModels.forward_backward","text":"forward_backward(hmm, obs_seq; control_seq, seq_ends)\n\n\nApply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.\n\nReturn a tuple (storage.γ, sum(storage.logL)) where storage is of type ForwardBackwardStorage.\n\ncontrol_seq: a control sequence with the same length as obs_seq\nseq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends\n\n\n\n\n\n","category":"function"},{"location":"api/#Learning","page":"API reference","title":"Learning","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"baum_welch\nfit!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch","page":"API reference","title":"HiddenMarkovModels.baum_welch","text":"baum_welch(\n hmm_guess,\n obs_seq;\n control_seq,\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\nApply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.\n\nReturn a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.\n\nKeyword arguments\n\ncontrol_seq: a control sequence with the same length as obs_seq\nseq_ends: in the case where obs_seq and control_seq are concatenations of multiple sequences, seq_ends contains the indices at which each of those sequences ends\natol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)\nmax_iterations: maximum number of iterations of the algorithm\nloglikelihood_increasing: whether to throw an error if the loglikelihood decreases\n\n\n\n\n\n","category":"function"},{"location":"api/#StatsAPI.fit!","page":"API reference","title":"StatsAPI.fit!","text":"fit!(\n hmm::AbstractHMM,\n fb_storage::ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n)\n\nUpdate hmm in-place based on information generated during forward-backward.\n\n\n\n\n\n","category":"function"},{"location":"api/#In-place-versions","page":"API reference","title":"In-place versions","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardStorage\nHiddenMarkovModels.initialize_forward\nHiddenMarkovModels.forward!\nHiddenMarkovModels.ViterbiStorage\nHiddenMarkovModels.initialize_viterbi\nHiddenMarkovModels.viterbi!\nHiddenMarkovModels.ForwardBackwardStorage\nHiddenMarkovModels.initialize_forward_backward\nHiddenMarkovModels.forward_backward!\nHiddenMarkovModels.baum_welch!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardStorage","text":"struct ForwardStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nα::Matrix: posterior last state marginals `α[i] = ℙ(X[T]=i | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nc::Vector\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward","page":"API reference","title":"HiddenMarkovModels.initialize_forward","text":"initialize_forward(hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward!","page":"API reference","title":"HiddenMarkovModels.forward!","text":"forward!(storage, hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.ViterbiStorage","page":"API reference","title":"HiddenMarkovModels.ViterbiStorage","text":"struct ViterbiStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nq::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])\nlogL::Vector: one joint loglikelihood per pair (observation sequence, most likely state sequence)\nlogB::Matrix\nϕ::Matrix\nψ::Matrix{Int64}\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_viterbi","page":"API reference","title":"HiddenMarkovModels.initialize_viterbi","text":"initialize_viterbi(hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi!","page":"API reference","title":"HiddenMarkovModels.viterbi!","text":"viterbi!(storage, hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.ForwardBackwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardBackwardStorage","text":"struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nγ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])\nξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nα::Matrix\nc::Vector\nβ::Matrix\nBβ::Matrix\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward_backward","page":"API reference","title":"HiddenMarkovModels.initialize_forward_backward","text":"initialize_forward_backward(\n hmm,\n obs_seq;\n control_seq,\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward!","page":"API reference","title":"HiddenMarkovModels.forward_backward!","text":"forward_backward!(\n storage,\n hmm,\n obs_seq;\n control_seq,\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.baum_welch!","page":"API reference","title":"HiddenMarkovModels.baum_welch!","text":"baum_welch!(\n fb_storage,\n logL_evolution,\n hmm,\n obs_seq;\n control_seq,\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Misc","page":"API reference","title":"Misc","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.rand_prob_vec\nHiddenMarkovModels.rand_trans_mat\nHiddenMarkovModels.LightDiagNormal\nHiddenMarkovModels.LightCategorical\nHiddenMarkovModels.fit_in_sequence!","category":"page"},{"location":"api/#HiddenMarkovModels.rand_prob_vec","page":"API reference","title":"HiddenMarkovModels.rand_prob_vec","text":"rand_prob_vec([rng, ::Type{R},] N)\n\nGenerate a random probability distribution of size N with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.rand_trans_mat","page":"API reference","title":"HiddenMarkovModels.rand_trans_mat","text":"rand_trans_mat([rng, ::Type{R},] N)\n\nGenerate a random transition matrix of size (N, N) with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.LightDiagNormal","page":"API reference","title":"HiddenMarkovModels.LightDiagNormal","text":"struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}\n\nAn HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\nμ::AbstractVector: means\nσ::AbstractVector: standard deviations\nlogσ::AbstractVector: log standard deviations\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.LightCategorical","page":"API reference","title":"HiddenMarkovModels.LightCategorical","text":"struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}\n\nAn HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\np::AbstractVector: class probabilities\nlogp::AbstractVector: log class probabilities\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.fit_in_sequence!","page":"API reference","title":"HiddenMarkovModels.fit_in_sequence!","text":"fit_in_sequence!(dists, i, x, w)\n\n\nModify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.\n\nDefault behavior:\n\nfit!(dists[i], x, w)\n\nOverride for Distributions.jl (in the package extension)\n\ndists[i] = fit(eltype(dists), x, w)\n\n\n\n\n\n","category":"function"},{"location":"api/#Index","page":"API reference","title":"Index","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"EditURL = \"../../../examples/temporal.jl\"","category":"page"},{"location":"examples/temporal/#Time-dependency","page":"Time dependency","title":"Time dependency","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Here, we demonstrate what to do transition and observation laws depend on the current time. This time-dependent HMM is implemented as a particular case of controlled HMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing Random\nusing SimpleUnPack\nusing StatsAPI","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/temporal/#Model","page":"Time dependency","title":"Model","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We focus on the particular case of a periodic HMM with period L. It has only one initialization vector, but L transition matrices and L vectors of observation distributions. Once again we need to subtype AbstractHMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"struct PeriodicHMM{T<:Number,D,L} <: AbstractHMM\n init::Vector{T}\n trans_per::NTuple{L,Matrix{T}}\n dists_per::NTuple{L,Vector{D}}\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The interface definition is almost the same as in the homogeneous case, but we give the control variable (here the time) as an additional argument to transition_matrix and obs_distributions.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"period(::PeriodicHMM{T,D,L}) where {T,D,L} = L\n\nfunction HMMs.initialization(hmm::PeriodicHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.trans_per[l]\nend\n\nfunction HMMs.obs_distributions(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.dists_per[l]\nend","category":"page"},{"location":"examples/temporal/#Simulation","page":"Time dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init = [0.8, 0.2]\ntrans_per = ([0.7 0.3; 0.3 0.7], [0.3 0.7; 0.7 0.3])\ndists_per = ([Normal(-1.0), Normal(-2.0)], [Normal(+1.0), Normal(+2.0)])\nhmm = PeriodicHMM(init, trans_per, dists_per);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Since the behavior of the model depends on control variables, we need to pass these to the simulation routine (instead of just the number of time steps T).","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seq = 1:10\nstate_seq, obs_seq = rand(rng, hmm, control_seq);\nobs_seq'","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The observations mostly alternate between positive and negative values, which is coherent with negative observation means at odd times and positive observation means at even times.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We now generate several sequences of variable lengths, for inference and learning tasks.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seqs = [1:rand(100:200) for k in 1:1000]\nobs_seqs = [rand(rng, hmm, control_seqs[k]).obs_seq for k in eachindex(control_seqs)];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/temporal/#Inference","page":"Time dependency","title":"Inference","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"All three inference algorithms work in the same way, except that we need to provide the control sequence as a keyword argument.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq; control_seq, seq_ends)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"For Viterbi, unsurprisingly, the most likely state sequence aligns with the sign of the observations.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"vcat(obs_seq', best_state_seq')","category":"page"},{"location":"examples/temporal/#Learning","page":"Time dependency","title":"Learning","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"When estimating parameters for a custom subtype of AbstractHMM, we have to override the fitting procedure after forward-backward (with an additional control_seq keyword argument). The key is to split the observations according to which periodic parameter they belong to.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"function StatsAPI.fit!(\n hmm::PeriodicHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n) where {T}\n @unpack γ, ξ = fb_storage\n L, N = period(hmm), length(hmm)\n\n hmm.init .= zero(T)\n for l in 1:L\n hmm.trans_per[l] .= zero(T)\n end\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n for l in 1:L\n hmm.trans_per[l] .+= sum(ξ[(t1 + l - 1):L:t2])\n end\n end\n hmm.init ./= sum(hmm.init)\n for l in 1:L, row in eachrow(hmm.trans_per[l])\n row ./= sum(row)\n end\n\n for l in 1:L\n times_l = Int[]\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n append!(times_l, (t1 + l - 1):L:t2)\n end\n for i in 1:N\n HMMs.fit_in_sequence!(hmm.dists_per[l], i, obs_seq[times_l], γ[i, times_l])\n end\n end\n\n for l in 1:L\n HMMs.check_hmm(hmm; control=l)\n end\n return nothing\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Now let's test our procedure with a reasonable guess.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init_guess = [0.7, 0.3]\ntrans_per_guess = ([0.6 0.4; 0.4 0.6], [0.4 0.6; 0.6 0.4])\ndists_per_guess = ([Normal(-0.7), Normal(-1.7)], [Normal(+0.7), Normal(+1.7)])\nhmm_guess = PeriodicHMM(init_guess, trans_per_guess, dists_per_guess);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Naturally, Baum-Welch also requires knowing control_seq.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq; control_seq, seq_ends);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Did we do well?","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 1), transition_matrix(hmm, 1); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 2), transition_matrix(hmm, 2); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hcat(obs_distributions(hmm_est, 1), obs_distributions(hmm, 1))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hcat(obs_distributions(hmm_est, 2), obs_distributions(hmm, 2))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"alternatives/#Competitors","page":"Alternatives","title":"Competitors","text":"","category":"section"},{"location":"alternatives/#Julia","page":"Alternatives","title":"Julia","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We compare features among the following Julia packages:","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"HiddenMarkovModels.jl (abbreviated to HMMs.jl)\nHMMBase.jl\nHMMGradients.jl","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":" HMMs.jl HMMBase.jl HMMGradients.jl\nAlgorithms Sim, FB, Vit, BW Sim, FB, Vit, BW FB\nObservation types anything Number / Vector anything\nObservation distributions DensityInterface.jl Distributions.jl manual\nMultiple sequences yes no yes\nPriors / structures possible no possible\nTemporal dependency yes no no\nControl dependency yes no no\nNumber types anything Float64 AbstractFloat\nAutomatic differentiation yes no yes\nLinear algebra yes yes no\nLogarithmic probabilities halfway halfway yes","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"Sim = Simulation, FB = Forward-Backward, Vit = Viterbi, BW = Baum-Welch","category":"page"},{"location":"alternatives/#Python","page":"Alternatives","title":"Python","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"hmmlearn (based on NumPy)\npomegrnate (based on PyTorch)\ndynamax (based on JAX)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"EditURL = \"../../../examples/autodiff.jl\"","category":"page"},{"location":"examples/autodiff/#Autodiff","page":"Autodiff","title":"Autodiff","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Here we show how to compute gradients of the observation sequence loglikelihood with respect to various parameters.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"using ComponentArrays\nusing DensityInterface\nusing Distributions\nusing Enzyme: Enzyme\nusing ForwardDiff: ForwardDiff\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StatsAPI\nusing Zygote: Zygote","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/autodiff/#Data-generation","page":"Autodiff","title":"Data generation","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"init = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\nmeans = [-1.0, 1.0]\ndists = Normal.(means)\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"obs_seqs = [rand(rng, hmm, 10).obs_seq, rand(rng, hmm, 20).obs_seq];\nobs_seq = reduce(vcat, obs_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/autodiff/#Forward-mode","page":"Autodiff","title":"Forward mode","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Since all of our code is type-generic, it is amenable to forward-mode automatic differentiation with ForwardDiff.jl.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Because this backend only accepts a single vector input, we wrap all parameters with ComponentArrays.jl, and define a new function to differentiate.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"params = ComponentVector(; init, trans, means)\n\nfunction f(params::ComponentVector)\n new_hmm = HMM(params.init, params.trans, Normal.(params.means))\n return logdensityof(new_hmm, obs_seq; seq_ends)\nend;\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The gradient computation is now straightforward. We will use this value as a source of truth to compare with reverse mode.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_f = ForwardDiff.gradient(f, params)","category":"page"},{"location":"examples/autodiff/#Reverse-mode","page":"Autodiff","title":"Reverse mode","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"In the presence of many parameters, reverse mode automatic differentiation of the loglikelihood will be much more efficient. The package includes a chain rule for logdensityof, which means backends like Zygote.jl can be used out of the box.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_z = Zygote.gradient(f, params)[1]","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_f ≈ grad_z","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"For increased efficiency, one can use Enzyme.jl and provide temporary storage. This requires going one level deeper, to the mutating HiddenMarkovModels.forward! function.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"control_seq = fill(nothing, length(obs_seq))\n\nfunction f!(storage::HMMs.ForwardStorage, params::ComponentVector)\n new_hmm = HMM(params.init, params.trans, Normal.(params.means))\n HMMs.forward!(storage, new_hmm, obs_seq; control_seq, seq_ends)\n return sum(storage.logL)\nend\n\nstorage = HMMs.initialize_forward(hmm, obs_seq; control_seq, seq_ends)\nstorage_shadow = HMMs.initialize_forward(hmm, obs_seq; control_seq, seq_ends)\nparams_shadow = zero(params)\n\nEnzyme.autodiff(\n Enzyme.Reverse,\n f!,\n Enzyme.Active,\n Enzyme.Duplicated(storage, storage_shadow),\n Enzyme.Duplicated(params, params_shadow),\n)\n\ngrad_e = params_shadow","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_e ≈ grad_f","category":"page"},{"location":"examples/autodiff/#Gradient-methods","page":"Autodiff","title":"Gradient methods","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"EditURL = \"../../../examples/types.jl\"","category":"page"},{"location":"examples/types/#Types","page":"Types","title":"Types","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Here we explain why playing with different number and array types can be useful in an HMM.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing LogarithmicNumbers\nusing Random\nusing SparseArrays","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/types/#Logarithmic-numbers","page":"Types","title":"Logarithmic numbers","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"warning: Warning\nWork in progress","category":"page"},{"location":"examples/types/#Sparse-arrays","page":"Types","title":"Sparse arrays","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Using sparse matrices is very useful for large models, because it means the memory and computational requirements will scale as the number of possible transitions. In general, this number is much smaller than the square of the number of states.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"We can easily construct an HMM with a sparse transition matrix, where some transitions are structurally forbidden.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init = [0.2, 0.6, 0.2]\ntrans = sparse([\n 0.8 0.2 0.0\n 0.0 0.8 0.2\n 0.2 0.0 0.8\n])\ndists = [Normal(-2.0), Normal(0.0), Normal(+2.0)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"When we simulate it, the transitions outside of the nonzero coefficients simply cannot happen.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"state_seq, obs_seq = rand(rng, hmm, 1000)\nstate_transitions = collect(zip(state_seq[1:(end - 1)], state_seq[2:end]));\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 2)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 1)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Now we apply Baum-Welch from a guess with the right sparsity pattern.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init_guess = [0.3, 0.4, 0.3]\ntrans_guess = sparse([\n 0.7 0.3 0.0\n 0.0 0.7 0.3\n 0.3 0.0 0.7\n])\ndists_guess = [Normal(-1.5), Normal(0.0), Normal(+1.5)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"The estimated model has kept the same sparsity pattern.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"transition_matrix(hmm)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"EditURL = \"../../../examples/interfaces.jl\"","category":"page"},{"location":"examples/interfaces/#Interfaces","page":"Interfaces","title":"Interfaces","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Here we discuss how to extend the observation distributions or model fitting to satisfy specific needs.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"using DensityInterface\nusing Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StatsAPI","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/interfaces/#Custom-distributions","page":"Interfaces","title":"Custom distributions","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In an HMM object, the observation distributions do not need to come from Distributions.jl. They only need to implement three methods:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Random.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In addition, the observation can be arbitrary Julia types. So let's construct a distribution that generates stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct Stuff{T}\n quantity::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The distribution will only be a wrapper for a normal distribution on the quantity.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"mutable struct StuffDist{T}\n quantity_mean::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Simulation is fairly easy.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function Random.rand(rng::AbstractRNG, dist::StuffDist)\n quantity = dist.quantity_mean + randn(rng)\n return Stuff(quantity)\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is important to declare to DensityInterface.jl that the custom distribution has a density, thanks to the following trait. The logdensity itself can be computed up to an additive constant without issue.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"DensityInterface.DensityKind(::StuffDist) = HasDensity()\n\nfunction DensityInterface.logdensityof(dist::StuffDist, obs::Stuff)\n return -abs2(obs.quantity - dist.quantity_mean) / 2\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, the fitting procedure must happen in place, and take a sequence of weighted samples.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n dist::StuffDist, obs_seq::AbstractVector{<:Stuff}, weight_seq::AbstractVector{<:Real}\n)\n dist.quantity_mean =\n sum(weight_seq[k] * obs_seq[k].quantity for k in eachindex(obs_seq, weight_seq)) /\n sum(weight_seq)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's put it to the test.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\ndists = [StuffDist(-1.0), StuffDist(+1.0)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"When we sample an observation sequence, we get a vector of Stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"state_seq, obs_seq = rand(rng, hmm, 100)\neltype(obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"And we can pass these observations to all of our inference algorithms.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"viterbi(hmm, obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If we implement fit!, Baum-Welch also works seamlessly.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init_guess = [0.7, 0.3]\ntrans_guess = [0.6 0.4; 0.4 0.6]\ndists_guess = [StuffDist(-0.5), StuffDist(+0.5)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm, obs_seq)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"obs_distributions(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If you want more sophisticated examples, check out HiddenMarkovModels.LightDiagNormal and HiddenMarkovModels.LightCategorical, which are designed to be fast and allocation-free.","category":"page"},{"location":"examples/interfaces/#Custom-HMM-structures","page":"Interfaces","title":"Custom HMM structures","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In some scenarios, the vanilla Baum-Welch algorithm is not exactly what we want. For instance, we might have a prior on the parameters of our model, which we want to apply during the fitting step of the iterative procedure. Then we need to create a new type that satisfies the AbstractHMM interface.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's make a simpler version of the built-in HMM, with a prior saying that each transition has already been observed a certain number of times. Such a prior can be very useful to regularize estimation and avoid numerical instabilities. It amounts to drawing every row of the transition matrix from a Dirichlet distribution, where each Dirichlet parameter is one plus the number of times the corresponding transition has been observed.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct PriorHMM{T,D} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dists::Vector{D}\n trans_prior_count::Int\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The basic requirements for AbstractHMM are the following three functions.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"While initialization will always have the same signature, transition_matrix and obs_distributions can accept an additional control argument, as we will see later on.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"HiddenMarkovModels.initialization(hmm::PriorHMM) = hmm.init\nHiddenMarkovModels.transition_matrix(hmm::PriorHMM) = hmm.trans\nHiddenMarkovModels.obs_distributions(hmm::PriorHMM) = hmm.dists","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is also possible to override logdensityof(hmm) and specify a prior loglikelihood for the model itself. If we forget to implement this, the loglikelihood computed in Baum-Welch will be missing a term, and thus it might decrease.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function DensityInterface.logdensityof(hmm::PriorHMM)\n prior = Dirichlet(fill(hmm.trans_prior_count + 1, length(hmm)))\n return sum(logdensityof(prior, row) for row in eachrow(transition_matrix(hmm)))\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, we must redefine the specific method of fit! that is used during Baum-Welch re-estimation. This function takes as inputs:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"the hmm itself\na fb_storage of type HiddenMarkovModels.ForwardBackwardStorage containing the results of the forward-backward algorithm.\nthe same inputs as baum_welch for multiple sequences (we haven't encountered control_seq yet but its role will become clear in other tutorials)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The goal is to modify hmm in-place, updating parameters with their maximum likelihood estimates given current inference results. We will make use of the fields fb_storage.γ and fb_storage.ξ, which contain the state and transition marginals γ[i, t] and ξ[t][i, j] at each time step.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n hmm::PriorHMM,\n fb_storage::HiddenMarkovModels.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n)\n # initialize to defaults without observations\n hmm.init .= 0\n hmm.trans .= hmm.trans_prior_count # our prior comes into play, otherwise 0\n # iterate over observation sequences\n for k in eachindex(seq_ends)\n # get sequence endpoints\n t1, t2 = seq_limits(seq_ends, k)\n # add estimated number of initializations in each state\n hmm.init .+= fb_storage.γ[:, t1]\n # add estimated number of transitions between each pair of states\n hmm.trans .+= sum(fb_storage.ξ[t1:t2])\n end\n # normalize\n hmm.init ./= sum(hmm.init)\n hmm.trans ./= sum(hmm.trans; dims=2)\n\n for i in 1:length(hmm)\n # weigh each sample by the marginal probability of being in state i\n weight_seq = fb_storage.γ[i, :]\n # fit observation distribution i using those weights\n fit!(hmm.dists[i], obs_seq, weight_seq)\n end\n\n # perform a few checks on the model\n HMMs.check_hmm(hmm)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Note that some distributions, such as those from Distributions.jl:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"do not support in-place fitting\nmight expect different input formats, e.g. higher-order arrays instead of a vector of objects","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The function HiddenMarkovModels.fit_in_sequence! is a replacement for fit!, designed to handle Distributions.jl. You can overload it for your own objects too if needed.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Now let's see that everything works.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"trans_prior_count = 10\nprior_hmm_guess = PriorHMM(init_guess, trans_guess, dists_guess, trans_prior_count);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"prior_hmm_est, prior_logl_evolution = baum_welch(prior_hmm_guess, obs_seq)\nfirst(prior_logl_evolution), last(prior_logl_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"As we can see, the transition matrix for our Bayesian version is slightly more spread out, although this effect would nearly disappear with enough data.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"cat(transition_matrix(hmm_est), transition_matrix(prior_hmm_est); dims=3)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"EditURL = \"../../../examples/basics.jl\"","category":"page"},{"location":"examples/basics/#Basics","page":"Basics","title":"Basics","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Here we show how to use the essential ingredients of the package.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing Random","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/basics/#Model","page":"Basics","title":"Model","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The package provides a versatile HMM type with three attributes:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a vector of state initialization probabilities\na matrix of state transition probabilities\na vector of observation distributions, one for each state","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"We keep it simple for now by leveraging Distributions.jl.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"d = 3\ninit = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\ndists = [MvNormal(-1.0 * ones(d), I), MvNormal(+1.0 * ones(d), I)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/basics/#Simulation","page":"Basics","title":"Simulation","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"You can simulate a pair of state and observation sequences with rand by specifying how long you want them to be.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"state_seq, obs_seq = rand(rng, hmm, 20);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Note that the observation sequence is a vector, whose elements have whatever type an observation distribution returns when sampled.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"state_seq[1], obs_seq[1]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.","category":"page"},{"location":"examples/basics/#Inference","page":"Basics","title":"Inference","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Viterbi algorithm (viterbi) returns the most likely state sequence hatX_1T = undersetX_1TmathrmargmaxmathbbP(X_1T vert Y_1T), along with the joint loglikelihood mathbbP(hatX_1T Y_1T).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"As we can see, it is very close to the true state sequence, but not necessarily equal.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"vcat(state_seq', best_state_seq')","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The forward algorithm (forward) returns a matrix of filtered state marginals alphai t = mathbbP(X_t = i Y_1t), along with the loglikelihood mathbbP(Y_1T) of the observation sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals, obs_seq_loglikelihood1 = forward(hmm, obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, it takes only the observations up to time t into account. This is particularly useful to infer the marginal distribution of the last state.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, end]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Conversely, the forward-backward algorithm (forward_backward) returns a matrix of smoothed state marginals gammai t = mathbbP(X_t = i Y_1T), along with the loglikelihood mathbbP(Y_1T) of the observation sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"smoothed_state_marginals, obs_seq_loglikelihood2 = forward_backward(hmm, obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, it takes all observations up to time T into account. This is particularly useful during learning. Note that forward and forward-backward only coincide at the last time step.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"collect(zip(filtered_state_marginals, smoothed_state_marginals))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Finally, we provide a thin wrapper (logdensityof) around the forward algorithm for observation sequence loglikelihoods mathbbP(Y_1T).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The same function can also compute joint loglikelihoods mathbbP(X_1T Y_1T) that take the states into account.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq, state_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq, best_state_seq)","category":"page"},{"location":"examples/basics/#Learning","page":"Basics","title":"Learning","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Baum-Welch algorithm (baum_welch) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"init_guess = [0.7, 0.3]\ntrans_guess = [0.6 0.4; 0.4 0.6]\ndists_guess = [MvNormal(-0.7 * ones(d), I), MvNormal(+0.7 * ones(d), I)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Let's estimate parameters based on a slightly longer sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"_, long_obs_seq = rand(rng, hmm, 100)\nhmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"first(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"We can check that the transition matrix estimate has improved.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"And so have the estimates for the observation distributions.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(dist -> dist.μ, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hcat(initialization(hmm_est), initialization(hmm))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state i in the true model will correspond to state i in the estimated model. This is important to keep in mind when testing new models.","category":"page"},{"location":"examples/basics/#Multiple-sequences","page":"Basics","title":"Multiple sequences","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In many applications, we have access to various observation sequences of different lengths.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"_, long_obs_seq2 = rand(rng, hmm, 300)\n_, long_obs_seq3 = rand(rng, hmm, 200)\nlong_obs_seqs = [long_obs_seq, long_obs_seq2, long_obs_seq3];\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing seq_ends as a keyword argument. Otherwise, the input will be treated as a unique observation sequence, which is mathematically incorrect.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"long_obs_seq_concat = reduce(vcat, long_obs_seqs)\nseq_ends = cumsum(length.(long_obs_seqs))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are summed over all sequences.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat, _ = viterbi(hmm, long_obs_seq_concat; seq_ends);\nlength(best_state_seq_concat)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The function seq_limits returns the begin and end of a given sequence in the concatenated vector. It can be used to untangle the results.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"start2, stop2 = seq_limits(seq_ends, 2)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seq2))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Our estimate should be a little better.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(dist -> dist.μ, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"This page was generated using Literate.jl.","category":"page"},{"location":"formulas/#Formulas","page":"Formulas","title":"Formulas","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Suppose we are given observations Y_1 Y_T, with hidden states X_1 X_T. Following (Rabiner, 1989), we use the following notations:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"let pi in mathbbR^N be the initial state distribution pi_i = mathbbP(X_1 = i)\nlet A in mathbbR^N times N be the transition matrix a_ij = mathbbP(X_t+1=j X_t = i)\nlet B in mathbbR^N times T be the matrix of statewise observation likelihoods b_it = mathbbP(Y_t X_t = i)","category":"page"},{"location":"formulas/#Vanilla-forward-backward","page":"Formulas","title":"Vanilla forward-backward","text":"","category":"section"},{"location":"formulas/#Recursion","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The forward and backward variables are defined by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it = mathbbP(Y_1t X_t=i) \nbeta_it = mathbbP(Y_t+1T X_t=i)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"They are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_i1 = pi_i b_i1 \nbeta_iT = 1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_jt+1 = left(sum_i=1^N alpha_it a_ijright) b_jt+1 \nbeta_it = sum_j=1^N a_ij b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Likelihood","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The likelihood of the whole sequence of observations is given by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = mathbbP(Y_1T) = sum_i=1^N alpha_iT","category":"page"},{"location":"formulas/#Marginals","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We notice that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it beta_it = mathbbP(Y_1T X_t=i) \nalpha_it a_ij b_jt+1 beta_jt+1 = mathbbP(Y_1T X_t=i X_t+1=j)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Thus we deduce the one-state and two-state marginals","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = mathbbP(X_t=i Y_1T) = frac1mathcalL alpha_it beta_it \nxi_ijt = mathbbP(X_t=i X_t+1=j Y_1T) = frac1mathcalL alpha_it a_ij b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"According to (Qin et al., 2000), derivatives of the likelihood can be obtained as follows:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 \nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 \nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijright) beta_jt \nendalign*","category":"page"},{"location":"formulas/#Scaled-forward-backward","page":"Formulas","title":"Scaled forward-backward","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In this package, we use a slightly different version of the algorithm, including both the traditional scaling of (Rabiner, 1989) and a normalization of B using m_t = max_i b_it.","category":"page"},{"location":"formulas/#Recursion-2","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The variables are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_i1 = pi_i fracb_i1m_1 c_1 = frac1sum_i hatalpha_i1 baralpha_i1 = c_1 hatalpha_i1 \nhatbeta_iT = 1 barbeta_1T = c_T hatbeta_1T\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_jt+1 = left(sum_i=1^N baralpha_it a_ijright) fracb_jt+1m_t+1 c_t+1 = frac1sum_j hatalpha_jt+1 baralpha_jt+1 = c_t+1 hatalpha_jt+1 \nhatbeta_it = sum_j=1^N a_ij fracb_jt+1m_t+1 barbeta_jt+1 barbeta_jt = c_t hatbeta_jt\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In terms of the original variables, we find","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nbaralpha_it = alpha_it left(prod_s=1^t fracc_sm_sright) \nbarbeta_it = beta_it left(c_t prod_s=t+1^T fracc_sm_sright)\nendalign*","category":"page"},{"location":"formulas/#Likelihood-2","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Since we normalized baralpha at each time step,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"1 = sum_i=1^N baralpha_iT = left(sum_i=1^N alpha_iTright) left(prod_s=1^T fracc_sm_sright) ","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"which means","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = sum_i=1^N alpha_iT = prod_s=1^T fracm_sc_s","category":"page"},{"location":"formulas/#Marginals-2","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We can now express the marginals using scaled variables:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = frac1mathcalL alpha_it beta_it = frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) left(barbeta_it frac1c_t prod_s=t+1^T fracm_sc_sright) \n= frac1mathcalL fracbaralpha_it barbeta_itc_t left(prod_s=1^T fracm_sc_sright) = fracbaralpha_it barbeta_itc_t\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nxi_ijt = frac1mathcalL alpha_it a_ij b_jt+1 beta_jt+1 \n= frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) a_ij b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_sright) \n= frac1mathcalL baralpha_it a_ij fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_sright) \n= baralpha_it a_ij fracb_jt+1m_t+1 barbeta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives-2","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And we also need to adapt the derivatives. For the initial distribution,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 = left(barbeta_i1 frac1c_1 prod_s=2^T fracm_sc_s right) b_i1 \n= left(prod_s=1^T fracm_sc_sright) barbeta_i1 fracb_i1m_1 = mathcalL barbeta_i1 fracb_i1m_1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"For the transition matrix,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \n= sum_t=1^T-1 left(baralpha_it prod_s=1^t fracm_sc_s right) b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_s right) \n= sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_s right) \n= mathcalL sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And for the statewise observation likelihoods,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 = pi_j barbeta_j1 frac1c_1 prod_s=2^T fracm_sc_s = mathcalL pi_j barbeta_j1 frac1m_1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijright) beta_jt \n= sum_i=1^N left(baralpha_it-1 prod_s=1^t-1 fracm_sc_sright) a_ij left(barbeta_jt frac1c_t prod_s=t+1^T fracm_sc_s right) \n= sum_i=1^N baralpha_it-1 a_ij barbeta_jt frac1m_t left(prod_s=1^T fracm_sc_sright) \n= mathcalL sum_i=1^N baralpha_it-1 a_ij barbeta_jt frac1m_t \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Finally, we note that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"fracpartial log mathcalLpartial log b_jt = fracpartial log mathcalLpartial b_jt b_jt","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"To sum up,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial log mathcalLpartial pi_i = fracb_i1m_1 barbeta_i1 \nfracpartial log mathcalLpartial a_ij = sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nfracpartial log mathcalLpartial log b_j1 = pi_j fracb_j1m_1 barbeta_j1 = fracbaralpha_j1 barbeta_j1c_1 = gamma_j1 \nfracpartial log mathcalLpartial log b_jt = sum_i=1^N baralpha_it-1 a_ij fracb_jtm_t barbeta_jt = fracbaralpha_jt barbeta_jtc_t = gamma_jt\nendalign*","category":"page"},{"location":"formulas/#Bibliography","page":"Formulas","title":"Bibliography","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Qin, F.; Auerbach, A. and Sachs, F. (2000). A Direct Optimization Approach to Hidden Markov Modeling for Single Channel Kinetics. Biophysical Journal 79, 1915–1927. Accessed on Sep 10, 2023.\n\n\n\nRabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286. Accessed on Sep 10, 2023.\n\n\n\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"EditURL = \"https://github.com/gdalle/HiddenMarkovModels.jl/blob/main/README.md\"","category":"page"},{"location":"#HiddenMarkovModels.jl","page":"Home","title":"HiddenMarkovModels.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: Stable) (Image: Dev) (Image: Build Status) (Image: Coverage) (Image: Code Style: Blue)","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: DOI) (Image: Aqua QA) (Image: JET)","category":"page"},{"location":"","page":"Home","title":"Home","text":"A Julia package for simulation, inference and learning of Hidden Markov Models.","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package can be installed using Julia's package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"pkg> add HiddenMarkovModels","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, you can create your first model as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Distributions, HiddenMarkovModels\ninit = [0.4, 0.6]\ntrans = [0.9 0.1; 0.2 0.8]\ndists = [Normal(-1.0), Normal(1.0)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Take a look at the documentation to know what to do next!","category":"page"},{"location":"#Some-background","page":"Home","title":"Some background","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"HMMs are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They capture the distribution of an observation sequence (Y_t) by assuming the existence of a latent state sequence (X_t) such that:","category":"page"},{"location":"","page":"Home","title":"Home","text":"the state follows a (discrete time, discrete space) Markov chain mathbbP_theta(X_t X_t-1)\nthe observation distribution is determined at each time by the state mathbbP_theta(Y_t X_t)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Each of the problems below has an efficient solution algorithm which our package implements:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Problem Goal Algorithm\nEvaluation Likelihood of the observation sequence mathbbP_theta(Y_1T) Forward\nFiltering Non-anticipative state marginals mathbbP_theta(X_t vert Y_1t) Forward\nSmoothing State marginals mathbbP_theta(X_t vert Y_1T) Forward-backward\nDecoding Most likely state sequence undersetX_1TmathrmargmaxmathbbP_theta(X_1T vert Y_1T) Viterbi\nLearning Maximum likelihood parameter undersetthetamathrmargmaxmathbbP_theta(Y_1T) Baum-Welch","category":"page"},{"location":"#Main-features","page":"Home","title":"Main features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-heterogeneous or controlled HMMs are supported out of the box.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.","category":"page"},{"location":"#Contributing","page":"Home","title":"Contributing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.","category":"page"},{"location":"#Acknowledgements","page":"Home","title":"Acknowledgements","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.","category":"page"}] +[{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"EditURL = \"../../../examples/controlled.jl\"","category":"page"},{"location":"examples/controlled/#Control-dependency","page":"Control dependency","title":"Control dependency","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Here, we give a example of controlled HMM (also called input-output HMM), in the special case of Markov switching regression.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random\nusing SimpleUnPack\nusing StatsAPI","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/controlled/#Model","page":"Control dependency","title":"Model","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"A Markov switching regression is like a classical regression, except that the weights depend on the unobserved state of an HMM. We can represent it with the following subtype of AbstractHMM, which has one vector of coefficients beta_i per state.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"struct ControlledGaussianHMM{T} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dist_coeffs::Vector{Vector{T}}\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In state i with a vector of controls u, our observation is given by the linear model y sim mathcalN(beta_i^top u 1).","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function HMMs.initialization(hmm::ControlledGaussianHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::ControlledGaussianHMM)\n return hmm.trans\nend\n\nfunction HMMs.obs_distributions(hmm::ControlledGaussianHMM, control::AbstractVector)\n return [Normal(dot(hmm.dist_coeffs[i], control), 1.0) for i in 1:length(hmm)]\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In this case, the transition matrix does not depend on the control.","category":"page"},{"location":"examples/controlled/#Simulation","page":"Control dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"d = 3\ninit = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\ndist_coeffs = [-ones(d), ones(d)]\nhmm = ControlledGaussianHMM(init, trans, dist_coeffs);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Simulation requires a vector of controls, each being a vector itself with the right dimension.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Let us build several sequences of variable lengths.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"control_seqs = [[randn(rng, d) for t in 1:rand(100:200)] for k in 1:100];\nobs_seqs = [rand(rng, hmm, control_seq).obs_seq for control_seq in control_seqs];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/controlled/#Inference","page":"Control dependency","title":"Inference","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Not much changes from the case with simple time dependency.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq; control_seq, seq_ends)","category":"page"},{"location":"examples/controlled/#Learning","page":"Control dependency","title":"Learning","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Once more, we override the fit! function. The state-related parameters are estimated in the standard way. Meanwhile, the observation coefficients are given by the formula for weighted least squares.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function StatsAPI.fit!(\n hmm::ControlledGaussianHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n) where {T}\n @unpack γ, ξ = fb_storage\n N = length(hmm)\n\n hmm.init .= 0\n hmm.trans .= 0\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n hmm.trans .+= sum(ξ[t1:t2])\n end\n hmm.init ./= sum(hmm.init)\n for row in eachrow(hmm.trans)\n row ./= sum(row)\n end\n\n U = reduce(hcat, control_seq)'\n y = obs_seq\n for i in 1:N\n W = sqrt.(Diagonal(γ[i, :]))\n hmm.dist_coeffs[i] = (W * U) \\ (W * y)\n end\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Now we put it to the test.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"init_guess = [0.7, 0.3]\ntrans_guess = [0.6 0.4; 0.4 0.6]\ndist_coeffs_guess = [-0.7 * ones(d), 0.7 * ones(d)]\nhmm_guess = ControlledGaussianHMM(init_guess, trans_guess, dist_coeffs_guess);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq; control_seq, seq_ends)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"How did we perform?","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[1], hmm.dist_coeffs[1])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"debugging/#Debugging","page":"Debugging","title":"Debugging","text":"","category":"section"},{"location":"debugging/#Numerical-overflow","page":"Debugging","title":"Numerical overflow","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"The most frequent error you will encounter is an OverflowError during inference, telling you that some values are infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Increase the duration of the sequence / the number of sequences to get more data\nAdd a prior to your transition matrix / observation distributions to avoid degenerate behavior like zero variance in a Gaussian\nReduce the number of states to make every one of them useful\nPick a better initialization to start closer to the supposed ground truth\nUse numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own observation distributions.","category":"page"},{"location":"debugging/#Performance","page":"Debugging","title":"Performance","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"If your algorithms are too slow, the general advice always applies:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Use BenchmarkTools.jl to establish a baseline\nUse profiling to see where you spend most of your time\nUse JET.jl to track down type instabilities\nUse AllocCheck.jl to reduce allocations","category":"page"},{"location":"api/#API-reference","page":"API reference","title":"API reference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels","category":"page"},{"location":"api/#HiddenMarkovModels","page":"API reference","title":"HiddenMarkovModels","text":"HiddenMarkovModels\n\nA Julia package for HMM modeling, simulation, inference and learning.\n\n\n\n\n\n","category":"module"},{"location":"api/#Sequence-formatting","page":"API reference","title":"Sequence formatting","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"Most algorithms below ingest the data with three (keyword) arguments: obs_seq, control_seq and seq_ends.","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.\nIf the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"obs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs))","category":"page"},{"location":"api/#Types","page":"API reference","title":"Types","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"AbstractHMM\nHMM","category":"page"},{"location":"api/#HiddenMarkovModels.AbstractHMM","page":"API reference","title":"HiddenMarkovModels.AbstractHMM","text":"AbstractHMM\n\nAbstract supertype for an HMM amenable to simulation, inference and learning.\n\nInterface\n\nTo create your own subtype of AbstractHMM, you need to implement the following methods:\n\ninitialization\ntransition_matrix\nobs_distributions\nfit! (for learning)\n\nApplicable functions\n\nAny AbstractHMM which satisfies the interface can be given to the following functions:\n\nrand\nlogdensityof\nforward\nviterbi\nforward_backward\nbaum_welch (if fit! is implemented)\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.HMM","page":"API reference","title":"HiddenMarkovModels.HMM","text":"struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector)} <: AbstractHMM\n\nBasic implementation of an HMM.\n\nFields\n\ninit::AbstractVector: initial state probabilities\ntrans::AbstractMatrix: state transition matrix\ndists::AbstractVector: observation distributions\n\n\n\n\n\n","category":"type"},{"location":"api/#Interface","page":"API reference","title":"Interface","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"initialization\ntransition_matrix\nobs_distributions","category":"page"},{"location":"api/#HiddenMarkovModels.initialization","page":"API reference","title":"HiddenMarkovModels.initialization","text":"initialization(hmm)\n\nReturn the vector of initial state probabilities for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.transition_matrix","page":"API reference","title":"HiddenMarkovModels.transition_matrix","text":"transition_matrix(hmm)\ntransition_matrix(hmm, control)\n\nReturn the matrix of state transition probabilities for hmm (possibly when control is applied).\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.obs_distributions","page":"API reference","title":"HiddenMarkovModels.obs_distributions","text":"obs_distributions(hmm)\nobs_distributions(hmm, control)\n\nReturn a vector of observation distributions, one for each state of hmm (possibly when control is applied).\n\nThese distribution objects should implement\n\nRandom.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning\n\n\n\n\n\n","category":"function"},{"location":"api/#Utils","page":"API reference","title":"Utils","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"length\nrand\neltype\nseq_limits","category":"page"},{"location":"api/#Base.length","page":"API reference","title":"Base.length","text":"length(hmm)\n\nReturn the number of states of hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.rand","page":"API reference","title":"Base.rand","text":"rand([rng,] hmm, T)\nrand([rng,] hmm, control_seq)\n\nSimulate hmm for T time steps, or when the sequence control_seq is applied.\n\nReturn a named tuple (; state_seq, obs_seq).\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.eltype","page":"API reference","title":"Base.eltype","text":"eltype(hmm, obs, control)\n\nReturn a type that can accommodate forward-backward computations for hmm on observations similar to obs.\n\nIt is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.seq_limits","page":"API reference","title":"HiddenMarkovModels.seq_limits","text":"seq_limits(seq_ends, k)\n\n\nReturn a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.\n\n\n\n\n\n","category":"function"},{"location":"api/#Inference","page":"API reference","title":"Inference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"logdensityof\nforward\nviterbi\nforward_backward","category":"page"},{"location":"api/#DensityInterface.logdensityof","page":"API reference","title":"DensityInterface.logdensityof","text":"logdensityof(hmm)\n\nReturn the prior loglikelihood associated with the parameters of hmm.\n\n\n\n\n\nlogdensityof(hmm, obs_seq; control_seq, seq_ends)\n\n\nRun the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.\n\n\n\n\n\nlogdensityof(hmm, obs_seq, state_seq; control_seq, seq_ends)\n\n\nRun the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward","page":"API reference","title":"HiddenMarkovModels.forward","text":"forward(hmm, obs_seq; control_seq, seq_ends)\n\n\nApply the forward algorithm to infer the current state after sequence obs_seq for hmm.\n\nReturn a tuple (storage.α, sum(storage.logL)) where storage is of type ForwardStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi","page":"API reference","title":"HiddenMarkovModels.viterbi","text":"viterbi(hmm, obs_seq; control_seq, seq_ends)\n\n\nApply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.\n\nReturn a tuple (storage.q, sum(storage.logL)) where storage is of type ViterbiStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward","page":"API reference","title":"HiddenMarkovModels.forward_backward","text":"forward_backward(hmm, obs_seq; control_seq, seq_ends)\n\n\nApply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.\n\nReturn a tuple (storage.γ, sum(storage.logL)) where storage is of type ForwardBackwardStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#Learning","page":"API reference","title":"Learning","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"baum_welch\nfit!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch","page":"API reference","title":"HiddenMarkovModels.baum_welch","text":"baum_welch(\n hmm_guess,\n obs_seq;\n control_seq,\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\nApply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.\n\nReturn a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.\n\nKeyword arguments\n\natol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)\nmax_iterations: maximum number of iterations of the algorithm\nloglikelihood_increasing: whether to throw an error if the loglikelihood decreases\n\n\n\n\n\n","category":"function"},{"location":"api/#StatsAPI.fit!","page":"API reference","title":"StatsAPI.fit!","text":"fit!(\n hmm::AbstractHMM,\n fb_storage::ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n)\n\nUpdate hmm in-place based on information generated during forward-backward.\n\n\n\n\n\n","category":"function"},{"location":"api/#In-place-versions","page":"API reference","title":"In-place versions","text":"","category":"section"},{"location":"api/#Forward","page":"API reference","title":"Forward","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardStorage\nHiddenMarkovModels.initialize_forward\nHiddenMarkovModels.forward!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardStorage","text":"struct ForwardStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nα::Matrix: posterior last state marginals α[i] = ℙ(X[T]=i | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nc::Vector\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward","page":"API reference","title":"HiddenMarkovModels.initialize_forward","text":"initialize_forward(hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward!","page":"API reference","title":"HiddenMarkovModels.forward!","text":"forward!(storage, hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Viterbi","page":"API reference","title":"Viterbi","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ViterbiStorage\nHiddenMarkovModels.initialize_viterbi\nHiddenMarkovModels.viterbi!","category":"page"},{"location":"api/#HiddenMarkovModels.ViterbiStorage","page":"API reference","title":"HiddenMarkovModels.ViterbiStorage","text":"struct ViterbiStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nq::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])\nlogL::Vector: one joint loglikelihood per pair of observation sequence and most likely state sequence\nlogB::Matrix\nϕ::Matrix\nψ::Matrix{Int64}\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_viterbi","page":"API reference","title":"HiddenMarkovModels.initialize_viterbi","text":"initialize_viterbi(hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi!","page":"API reference","title":"HiddenMarkovModels.viterbi!","text":"viterbi!(storage, hmm, obs_seq; control_seq, seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Forward-backward","page":"API reference","title":"Forward-backward","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardBackwardStorage\nHiddenMarkovModels.initialize_forward_backward\nHiddenMarkovModels.forward_backward!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardBackwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardBackwardStorage","text":"struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nγ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])\nξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nα::Matrix\nc::Vector\nβ::Matrix\nBβ::Matrix\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward_backward","page":"API reference","title":"HiddenMarkovModels.initialize_forward_backward","text":"initialize_forward_backward(\n hmm,\n obs_seq;\n control_seq,\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward!","page":"API reference","title":"HiddenMarkovModels.forward_backward!","text":"forward_backward!(\n storage,\n hmm,\n obs_seq;\n control_seq,\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Baum-Welch","page":"API reference","title":"Baum-Welch","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.baum_welch!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch!","page":"API reference","title":"HiddenMarkovModels.baum_welch!","text":"baum_welch!(\n fb_storage,\n logL_evolution,\n hmm,\n obs_seq;\n control_seq,\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Misc","page":"API reference","title":"Misc","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.rand_prob_vec\nHiddenMarkovModels.rand_trans_mat\nHiddenMarkovModels.LightDiagNormal\nHiddenMarkovModels.LightCategorical\nHiddenMarkovModels.fit_in_sequence!","category":"page"},{"location":"api/#HiddenMarkovModels.rand_prob_vec","page":"API reference","title":"HiddenMarkovModels.rand_prob_vec","text":"rand_prob_vec([rng, ::Type{R},] N)\n\nGenerate a random probability distribution of size N with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.rand_trans_mat","page":"API reference","title":"HiddenMarkovModels.rand_trans_mat","text":"rand_trans_mat([rng, ::Type{R},] N)\n\nGenerate a random transition matrix of size (N, N) with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.LightDiagNormal","page":"API reference","title":"HiddenMarkovModels.LightDiagNormal","text":"struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}\n\nAn HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\nμ::AbstractVector: means\nσ::AbstractVector: standard deviations\nlogσ::AbstractVector: log standard deviations\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.LightCategorical","page":"API reference","title":"HiddenMarkovModels.LightCategorical","text":"struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}\n\nAn HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\np::AbstractVector: class probabilities\nlogp::AbstractVector: log class probabilities\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.fit_in_sequence!","page":"API reference","title":"HiddenMarkovModels.fit_in_sequence!","text":"fit_in_sequence!(dists, i, x, w)\n\n\nModify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.\n\nDefault behavior:\n\nfit!(dists[i], x, w)\n\nOverride for Distributions.jl (in the package extension)\n\ndists[i] = fit(eltype(dists), x, w)\n\n\n\n\n\n","category":"function"},{"location":"api/#Index","page":"API reference","title":"Index","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"EditURL = \"../../../examples/temporal.jl\"","category":"page"},{"location":"examples/temporal/#Time-dependency","page":"Time dependency","title":"Time dependency","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Here, we demonstrate what to do transition and observation laws depend on the current time. This time-dependent HMM is implemented as a particular case of controlled HMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing Random\nusing SimpleUnPack\nusing StatsAPI","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/temporal/#Model","page":"Time dependency","title":"Model","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We focus on the particular case of a periodic HMM with period L. It has only one initialization vector, but L transition matrices and L vectors of observation distributions. Once again we need to subtype AbstractHMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"struct PeriodicHMM{T<:Number,D,L} <: AbstractHMM\n init::Vector{T}\n trans_per::NTuple{L,Matrix{T}}\n dists_per::NTuple{L,Vector{D}}\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The interface definition is almost the same as in the homogeneous case, but we give the control variable (here the time) as an additional argument to transition_matrix and obs_distributions.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"period(::PeriodicHMM{T,D,L}) where {T,D,L} = L\n\nfunction HMMs.initialization(hmm::PeriodicHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.trans_per[l]\nend\n\nfunction HMMs.obs_distributions(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.dists_per[l]\nend","category":"page"},{"location":"examples/temporal/#Simulation","page":"Time dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init = [0.8, 0.2]\ntrans_per = ([0.7 0.3; 0.3 0.7], [0.3 0.7; 0.7 0.3])\ndists_per = ([Normal(-1.0), Normal(-2.0)], [Normal(+1.0), Normal(+2.0)])\nhmm = PeriodicHMM(init, trans_per, dists_per);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Since the behavior of the model depends on control variables, we need to pass these to the simulation routine (instead of just the number of time steps T).","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seq = 1:10\nstate_seq, obs_seq = rand(rng, hmm, control_seq);\nobs_seq'","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The observations mostly alternate between positive and negative values, which is coherent with negative observation means at odd times and positive observation means at even times.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We now generate several sequences of variable lengths, for inference and learning tasks.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seqs = [1:rand(100:200) for k in 1:1000]\nobs_seqs = [rand(rng, hmm, control_seqs[k]).obs_seq for k in eachindex(control_seqs)];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/temporal/#Inference","page":"Time dependency","title":"Inference","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"All three inference algorithms work in the same way, except that we need to provide the control sequence as a keyword argument.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq; control_seq, seq_ends)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"For Viterbi, unsurprisingly, the most likely state sequence aligns with the sign of the observations.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"vcat(obs_seq', best_state_seq')","category":"page"},{"location":"examples/temporal/#Learning","page":"Time dependency","title":"Learning","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"When estimating parameters for a custom subtype of AbstractHMM, we have to override the fitting procedure after forward-backward (with an additional control_seq keyword argument). The key is to split the observations according to which periodic parameter they belong to.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"function StatsAPI.fit!(\n hmm::PeriodicHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n) where {T}\n @unpack γ, ξ = fb_storage\n L, N = period(hmm), length(hmm)\n\n hmm.init .= zero(T)\n for l in 1:L\n hmm.trans_per[l] .= zero(T)\n end\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n for l in 1:L\n hmm.trans_per[l] .+= sum(ξ[(t1 + l - 1):L:t2])\n end\n end\n hmm.init ./= sum(hmm.init)\n for l in 1:L, row in eachrow(hmm.trans_per[l])\n row ./= sum(row)\n end\n\n for l in 1:L\n times_l = Int[]\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n append!(times_l, (t1 + l - 1):L:t2)\n end\n for i in 1:N\n HMMs.fit_in_sequence!(hmm.dists_per[l], i, obs_seq[times_l], γ[i, times_l])\n end\n end\n\n for l in 1:L\n HMMs.check_hmm(hmm; control=l)\n end\n return nothing\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Now let's test our procedure with a reasonable guess.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init_guess = [0.7, 0.3]\ntrans_per_guess = ([0.6 0.4; 0.4 0.6], [0.4 0.6; 0.6 0.4])\ndists_per_guess = ([Normal(-0.7), Normal(-1.7)], [Normal(+0.7), Normal(+1.7)])\nhmm_guess = PeriodicHMM(init_guess, trans_per_guess, dists_per_guess);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Naturally, Baum-Welch also requires knowing control_seq.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq; control_seq, seq_ends);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Did we do well?","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 1), transition_matrix(hmm, 1); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 2), transition_matrix(hmm, 2); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hcat(obs_distributions(hmm_est, 1), obs_distributions(hmm, 1))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hcat(obs_distributions(hmm_est, 2), obs_distributions(hmm, 2))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"alternatives/#Competitors","page":"Alternatives","title":"Competitors","text":"","category":"section"},{"location":"alternatives/#Julia","page":"Alternatives","title":"Julia","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We compare features among the following Julia packages:","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"HiddenMarkovModels.jl (abbreviated to HMMs.jl)\nHMMBase.jl\nHMMGradients.jl","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":" HMMs.jl HMMBase.jl HMMGradients.jl\nAlgorithms Sim, FB, Vit, BW Sim, FB, Vit, BW FB\nObservation types anything Number / Vector anything\nObservation distributions DensityInterface.jl Distributions.jl manual\nMultiple sequences yes no yes\nPriors / structures possible no possible\nTemporal dependency yes no no\nControl dependency yes no no\nNumber types anything Float64 AbstractFloat\nAutomatic differentiation yes no yes\nLinear algebra yes yes no\nLogarithmic probabilities halfway halfway yes","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"Sim = Simulation, FB = Forward-Backward, Vit = Viterbi, BW = Baum-Welch","category":"page"},{"location":"alternatives/#Python","page":"Alternatives","title":"Python","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"hmmlearn (based on NumPy)\npomegrnate (based on PyTorch)\ndynamax (based on JAX)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"EditURL = \"../../../examples/autodiff.jl\"","category":"page"},{"location":"examples/autodiff/#Autodiff","page":"Autodiff","title":"Autodiff","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Here we show how to compute gradients of the observation sequence loglikelihood with respect to various parameters.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"using ComponentArrays\nusing DensityInterface\nusing Distributions\nusing Enzyme: Enzyme\nusing ForwardDiff: ForwardDiff\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StatsAPI\nusing Zygote: Zygote","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/autodiff/#Data-generation","page":"Autodiff","title":"Data generation","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"init = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\nmeans = [-1.0, 1.0]\ndists = Normal.(means)\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"obs_seqs = [rand(rng, hmm, 10).obs_seq, rand(rng, hmm, 20).obs_seq];\nobs_seq = reduce(vcat, obs_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/autodiff/#Forward-mode","page":"Autodiff","title":"Forward mode","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Since all of our code is type-generic, it is amenable to forward-mode automatic differentiation with ForwardDiff.jl.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Because this backend only accepts a single vector input, we wrap all parameters with ComponentArrays.jl, and define a new function to differentiate.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"params = ComponentVector(; init, trans, means)\n\nfunction f(params::ComponentVector)\n new_hmm = HMM(params.init, params.trans, Normal.(params.means))\n return logdensityof(new_hmm, obs_seq; seq_ends)\nend;\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The gradient computation is now straightforward. We will use this value as a source of truth to compare with reverse mode.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_f = ForwardDiff.gradient(f, params)","category":"page"},{"location":"examples/autodiff/#Reverse-mode","page":"Autodiff","title":"Reverse mode","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"In the presence of many parameters, reverse mode automatic differentiation of the loglikelihood will be much more efficient. The package includes a chain rule for logdensityof, which means backends like Zygote.jl can be used out of the box.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_z = Zygote.gradient(f, params)[1]","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_f ≈ grad_z","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"For increased efficiency, one can use Enzyme.jl and provide temporary storage. This requires going one level deeper, to the mutating HiddenMarkovModels.forward! function.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"control_seq = fill(nothing, length(obs_seq))\n\nfunction f!(storage::HMMs.ForwardStorage, params::ComponentVector)\n new_hmm = HMM(params.init, params.trans, Normal.(params.means))\n HMMs.forward!(storage, new_hmm, obs_seq; control_seq, seq_ends)\n return sum(storage.logL)\nend\n\nstorage = HMMs.initialize_forward(hmm, obs_seq; control_seq, seq_ends)\nstorage_shadow = HMMs.initialize_forward(hmm, obs_seq; control_seq, seq_ends)\nparams_shadow = zero(params)\n\nEnzyme.autodiff(\n Enzyme.Reverse,\n f!,\n Enzyme.Active,\n Enzyme.Duplicated(storage, storage_shadow),\n Enzyme.Duplicated(params, params_shadow),\n)\n\ngrad_e = params_shadow","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"grad_e ≈ grad_f","category":"page"},{"location":"examples/autodiff/#Gradient-methods","page":"Autodiff","title":"Gradient methods","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"EditURL = \"../../../examples/types.jl\"","category":"page"},{"location":"examples/types/#Types","page":"Types","title":"Types","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Here we explain why playing with different number and array types can be useful in an HMM.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing LogarithmicNumbers\nusing Random\nusing SparseArrays","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/types/#Logarithmic-numbers","page":"Types","title":"Logarithmic numbers","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"warning: Warning\nWork in progress","category":"page"},{"location":"examples/types/#Sparse-arrays","page":"Types","title":"Sparse arrays","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Using sparse matrices is very useful for large models, because it means the memory and computational requirements will scale as the number of possible transitions. In general, this number is much smaller than the square of the number of states.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"We can easily construct an HMM with a sparse transition matrix, where some transitions are structurally forbidden.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init = [0.2, 0.6, 0.2]\ntrans = sparse([\n 0.8 0.2 0.0\n 0.0 0.8 0.2\n 0.2 0.0 0.8\n])\ndists = [Normal(-2.0), Normal(0.0), Normal(+2.0)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"When we simulate it, the transitions outside of the nonzero coefficients simply cannot happen.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"state_seq, obs_seq = rand(rng, hmm, 1000)\nstate_transitions = collect(zip(state_seq[1:(end - 1)], state_seq[2:end]));\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 2)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 1)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Now we apply Baum-Welch from a guess with the right sparsity pattern.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init_guess = [0.3, 0.4, 0.3]\ntrans_guess = sparse([\n 0.7 0.3 0.0\n 0.0 0.7 0.3\n 0.3 0.0 0.7\n])\ndists_guess = [Normal(-1.5), Normal(0.0), Normal(+1.5)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"The estimated model has kept the same sparsity pattern.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"transition_matrix(hmm)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"EditURL = \"../../../examples/interfaces.jl\"","category":"page"},{"location":"examples/interfaces/#Interfaces","page":"Interfaces","title":"Interfaces","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Here we discuss how to extend the observation distributions or model fitting to satisfy specific needs.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"using DensityInterface\nusing Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StatsAPI","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/interfaces/#Custom-distributions","page":"Interfaces","title":"Custom distributions","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In an HMM object, the observation distributions do not need to come from Distributions.jl. They only need to implement three methods:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Random.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In addition, the observation can be arbitrary Julia types. So let's construct a distribution that generates stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct Stuff{T}\n quantity::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The distribution will only be a wrapper for a normal distribution on the quantity.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"mutable struct StuffDist{T}\n quantity_mean::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Simulation is fairly easy.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function Random.rand(rng::AbstractRNG, dist::StuffDist)\n quantity = dist.quantity_mean + randn(rng)\n return Stuff(quantity)\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is important to declare to DensityInterface.jl that the custom distribution has a density, thanks to the following trait. The logdensity itself can be computed up to an additive constant without issue.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"DensityInterface.DensityKind(::StuffDist) = HasDensity()\n\nfunction DensityInterface.logdensityof(dist::StuffDist, obs::Stuff)\n return -abs2(obs.quantity - dist.quantity_mean) / 2\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, the fitting procedure must happen in place, and take a sequence of weighted samples.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n dist::StuffDist, obs_seq::AbstractVector{<:Stuff}, weight_seq::AbstractVector{<:Real}\n)\n dist.quantity_mean =\n sum(weight_seq[k] * obs_seq[k].quantity for k in eachindex(obs_seq, weight_seq)) /\n sum(weight_seq)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's put it to the test.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\ndists = [StuffDist(-1.0), StuffDist(+1.0)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"When we sample an observation sequence, we get a vector of Stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"state_seq, obs_seq = rand(rng, hmm, 100)\neltype(obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"And we can pass these observations to all of our inference algorithms.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"viterbi(hmm, obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If we implement fit!, Baum-Welch also works seamlessly.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init_guess = [0.7, 0.3]\ntrans_guess = [0.6 0.4; 0.4 0.6]\ndists_guess = [StuffDist(-0.5), StuffDist(+0.5)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm, obs_seq)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"obs_distributions(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If you want more sophisticated examples, check out HiddenMarkovModels.LightDiagNormal and HiddenMarkovModels.LightCategorical, which are designed to be fast and allocation-free.","category":"page"},{"location":"examples/interfaces/#Custom-HMM-structures","page":"Interfaces","title":"Custom HMM structures","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In some scenarios, the vanilla Baum-Welch algorithm is not exactly what we want. For instance, we might have a prior on the parameters of our model, which we want to apply during the fitting step of the iterative procedure. Then we need to create a new type that satisfies the AbstractHMM interface.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's make a simpler version of the built-in HMM, with a prior saying that each transition has already been observed a certain number of times. Such a prior can be very useful to regularize estimation and avoid numerical instabilities. It amounts to drawing every row of the transition matrix from a Dirichlet distribution, where each Dirichlet parameter is one plus the number of times the corresponding transition has been observed.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct PriorHMM{T,D} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dists::Vector{D}\n trans_prior_count::Int\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The basic requirements for AbstractHMM are the following three functions.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"While initialization will always have the same signature, transition_matrix and obs_distributions can accept an additional control argument, as we will see later on.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"HiddenMarkovModels.initialization(hmm::PriorHMM) = hmm.init\nHiddenMarkovModels.transition_matrix(hmm::PriorHMM) = hmm.trans\nHiddenMarkovModels.obs_distributions(hmm::PriorHMM) = hmm.dists","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is also possible to override logdensityof(hmm) and specify a prior loglikelihood for the model itself. If we forget to implement this, the loglikelihood computed in Baum-Welch will be missing a term, and thus it might decrease.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function DensityInterface.logdensityof(hmm::PriorHMM)\n prior = Dirichlet(fill(hmm.trans_prior_count + 1, length(hmm)))\n return sum(logdensityof(prior, row) for row in eachrow(transition_matrix(hmm)))\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, we must redefine the specific method of fit! that is used during Baum-Welch re-estimation. This function takes as inputs:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"the hmm itself\na fb_storage of type HiddenMarkovModels.ForwardBackwardStorage containing the results of the forward-backward algorithm.\nthe same inputs as baum_welch for multiple sequences (we haven't encountered control_seq yet but its role will become clear in other tutorials)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The goal is to modify hmm in-place, updating parameters with their maximum likelihood estimates given current inference results. We will make use of the fields fb_storage.γ and fb_storage.ξ, which contain the state and transition marginals γ[i, t] and ξ[t][i, j] at each time step.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n hmm::PriorHMM,\n fb_storage::HiddenMarkovModels.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n control_seq::AbstractVector,\n seq_ends::AbstractVector{Int},\n)\n # initialize to defaults without observations\n hmm.init .= 0\n hmm.trans .= hmm.trans_prior_count # our prior comes into play, otherwise 0\n # iterate over observation sequences\n for k in eachindex(seq_ends)\n # get sequence endpoints\n t1, t2 = seq_limits(seq_ends, k)\n # add estimated number of initializations in each state\n hmm.init .+= fb_storage.γ[:, t1]\n # add estimated number of transitions between each pair of states\n hmm.trans .+= sum(fb_storage.ξ[t1:t2])\n end\n # normalize\n hmm.init ./= sum(hmm.init)\n hmm.trans ./= sum(hmm.trans; dims=2)\n\n for i in 1:length(hmm)\n # weigh each sample by the marginal probability of being in state i\n weight_seq = fb_storage.γ[i, :]\n # fit observation distribution i using those weights\n fit!(hmm.dists[i], obs_seq, weight_seq)\n end\n\n # perform a few checks on the model\n HMMs.check_hmm(hmm)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Note that some distributions, such as those from Distributions.jl:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"do not support in-place fitting\nmight expect different input formats, e.g. higher-order arrays instead of a vector of objects","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The function HiddenMarkovModels.fit_in_sequence! is a replacement for fit!, designed to handle Distributions.jl. You can overload it for your own objects too if needed.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Now let's see that everything works.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"trans_prior_count = 10\nprior_hmm_guess = PriorHMM(init_guess, trans_guess, dists_guess, trans_prior_count);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"prior_hmm_est, prior_logl_evolution = baum_welch(prior_hmm_guess, obs_seq)\nfirst(prior_logl_evolution), last(prior_logl_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"As we can see, the transition matrix for our Bayesian version is slightly more spread out, although this effect would nearly disappear with enough data.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"cat(transition_matrix(hmm_est), transition_matrix(prior_hmm_est); dims=3)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"EditURL = \"../../../examples/basics.jl\"","category":"page"},{"location":"examples/basics/#Basics","page":"Basics","title":"Basics","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Here we show how to use the essential ingredients of the package.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing Random","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"rng = Random.default_rng()\nRandom.seed!(rng, 63);\nnothing #hide","category":"page"},{"location":"examples/basics/#Model","page":"Basics","title":"Model","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The package provides a versatile HMM type with three attributes:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a vector of state initialization probabilities\na matrix of state transition probabilities\na vector of observation distributions, one for each state","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"We keep it simple for now by leveraging Distributions.jl.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"d = 3\ninit = [0.8, 0.2]\ntrans = [0.7 0.3; 0.3 0.7]\ndists = [MvNormal(-1.0 * ones(d), I), MvNormal(+1.0 * ones(d), I)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/basics/#Simulation","page":"Basics","title":"Simulation","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"You can simulate a pair of state and observation sequences with rand by specifying how long you want them to be.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"state_seq, obs_seq = rand(rng, hmm, 20);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Note that the observation sequence is a vector, whose elements have whatever type an observation distribution returns when sampled.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"state_seq[1], obs_seq[1]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.","category":"page"},{"location":"examples/basics/#Inference","page":"Basics","title":"Inference","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Viterbi algorithm (viterbi) returns the most likely state sequence hatX_1T = undersetX_1TmathrmargmaxmathbbP(X_1T vert Y_1T), along with the joint loglikelihood mathbbP(hatX_1T Y_1T).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"As we can see, it is very close to the true state sequence, but not necessarily equal.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"vcat(state_seq', best_state_seq')","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The forward algorithm (forward) returns a matrix of filtered state marginals alphai t = mathbbP(X_t = i Y_1t), along with the loglikelihood mathbbP(Y_1T) of the observation sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals, obs_seq_loglikelihood1 = forward(hmm, obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, it takes only the observations up to time t into account. This is particularly useful to infer the marginal distribution of the last state.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, end]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Conversely, the forward-backward algorithm (forward_backward) returns a matrix of smoothed state marginals gammai t = mathbbP(X_t = i Y_1T), along with the loglikelihood mathbbP(Y_1T) of the observation sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"smoothed_state_marginals, obs_seq_loglikelihood2 = forward_backward(hmm, obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, it takes all observations up to time T into account. This is particularly useful during learning. Note that forward and forward-backward only coincide at the last time step.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"collect(zip(filtered_state_marginals, smoothed_state_marginals))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Finally, we provide a thin wrapper (logdensityof) around the forward algorithm for observation sequence loglikelihoods mathbbP(Y_1T).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The same function can also compute joint loglikelihoods mathbbP(X_1T Y_1T) that take the states into account.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq, state_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq, best_state_seq)","category":"page"},{"location":"examples/basics/#Learning","page":"Basics","title":"Learning","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Baum-Welch algorithm (baum_welch) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"init_guess = [0.7, 0.3]\ntrans_guess = [0.6 0.4; 0.4 0.6]\ndists_guess = [MvNormal(-0.7 * ones(d), I), MvNormal(+0.7 * ones(d), I)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Let's estimate parameters based on a slightly longer sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"_, long_obs_seq = rand(rng, hmm, 100)\nhmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"first(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"We can check that the transition matrix estimate has improved.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"And so have the estimates for the observation distributions.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(dist -> dist.μ, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hcat(initialization(hmm_est), initialization(hmm))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state i in the true model will correspond to state i in the estimated model. This is important to keep in mind when testing new models.","category":"page"},{"location":"examples/basics/#Multiple-sequences","page":"Basics","title":"Multiple sequences","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In many applications, we have access to various observation sequences of different lengths.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"_, long_obs_seq2 = rand(rng, hmm, 300)\n_, long_obs_seq3 = rand(rng, hmm, 200)\nlong_obs_seqs = [long_obs_seq, long_obs_seq2, long_obs_seq3];\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing seq_ends as a keyword argument. Otherwise, the input will be treated as a unique observation sequence, which is mathematically incorrect.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"long_obs_seq_concat = reduce(vcat, long_obs_seqs)\nseq_ends = cumsum(length.(long_obs_seqs))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are summed over all sequences.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat, _ = viterbi(hmm, long_obs_seq_concat; seq_ends);\nlength(best_state_seq_concat)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The function seq_limits returns the begin and end of a given sequence in the concatenated vector. It can be used to untangle the results.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"start2, stop2 = seq_limits(seq_ends, 2)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seq2))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Our estimate should be a little better.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(dist -> dist.μ, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"This page was generated using Literate.jl.","category":"page"},{"location":"formulas/#Formulas","page":"Formulas","title":"Formulas","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Suppose we are given observations Y_1 Y_T, with hidden states X_1 X_T. Following (Rabiner, 1989), we use the following notations:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"let pi in mathbbR^N be the initial state distribution pi_i = mathbbP(X_1 = i)\nlet A in mathbbR^N times N be the transition matrix a_ij = mathbbP(X_t+1=j X_t = i)\nlet B in mathbbR^N times T be the matrix of statewise observation likelihoods b_it = mathbbP(Y_t X_t = i)","category":"page"},{"location":"formulas/#Vanilla-forward-backward","page":"Formulas","title":"Vanilla forward-backward","text":"","category":"section"},{"location":"formulas/#Recursion","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The forward and backward variables are defined by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it = mathbbP(Y_1t X_t=i) \nbeta_it = mathbbP(Y_t+1T X_t=i)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"They are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_i1 = pi_i b_i1 \nbeta_iT = 1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_jt+1 = left(sum_i=1^N alpha_it a_ijright) b_jt+1 \nbeta_it = sum_j=1^N a_ij b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Likelihood","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The likelihood of the whole sequence of observations is given by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = mathbbP(Y_1T) = sum_i=1^N alpha_iT","category":"page"},{"location":"formulas/#Marginals","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We notice that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it beta_it = mathbbP(Y_1T X_t=i) \nalpha_it a_ij b_jt+1 beta_jt+1 = mathbbP(Y_1T X_t=i X_t+1=j)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Thus we deduce the one-state and two-state marginals","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = mathbbP(X_t=i Y_1T) = frac1mathcalL alpha_it beta_it \nxi_ijt = mathbbP(X_t=i X_t+1=j Y_1T) = frac1mathcalL alpha_it a_ij b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"According to (Qin et al., 2000), derivatives of the likelihood can be obtained as follows:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 \nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 \nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijright) beta_jt \nendalign*","category":"page"},{"location":"formulas/#Scaled-forward-backward","page":"Formulas","title":"Scaled forward-backward","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In this package, we use a slightly different version of the algorithm, including both the traditional scaling of (Rabiner, 1989) and a normalization of B using m_t = max_i b_it.","category":"page"},{"location":"formulas/#Recursion-2","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The variables are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_i1 = pi_i fracb_i1m_1 c_1 = frac1sum_i hatalpha_i1 baralpha_i1 = c_1 hatalpha_i1 \nhatbeta_iT = 1 barbeta_1T = c_T hatbeta_1T\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_jt+1 = left(sum_i=1^N baralpha_it a_ijright) fracb_jt+1m_t+1 c_t+1 = frac1sum_j hatalpha_jt+1 baralpha_jt+1 = c_t+1 hatalpha_jt+1 \nhatbeta_it = sum_j=1^N a_ij fracb_jt+1m_t+1 barbeta_jt+1 barbeta_jt = c_t hatbeta_jt\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In terms of the original variables, we find","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nbaralpha_it = alpha_it left(prod_s=1^t fracc_sm_sright) \nbarbeta_it = beta_it left(c_t prod_s=t+1^T fracc_sm_sright)\nendalign*","category":"page"},{"location":"formulas/#Likelihood-2","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Since we normalized baralpha at each time step,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"1 = sum_i=1^N baralpha_iT = left(sum_i=1^N alpha_iTright) left(prod_s=1^T fracc_sm_sright) ","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"which means","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = sum_i=1^N alpha_iT = prod_s=1^T fracm_sc_s","category":"page"},{"location":"formulas/#Marginals-2","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We can now express the marginals using scaled variables:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = frac1mathcalL alpha_it beta_it = frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) left(barbeta_it frac1c_t prod_s=t+1^T fracm_sc_sright) \n= frac1mathcalL fracbaralpha_it barbeta_itc_t left(prod_s=1^T fracm_sc_sright) = fracbaralpha_it barbeta_itc_t\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nxi_ijt = frac1mathcalL alpha_it a_ij b_jt+1 beta_jt+1 \n= frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) a_ij b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_sright) \n= frac1mathcalL baralpha_it a_ij fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_sright) \n= baralpha_it a_ij fracb_jt+1m_t+1 barbeta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives-2","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And we also need to adapt the derivatives. For the initial distribution,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 = left(barbeta_i1 frac1c_1 prod_s=2^T fracm_sc_s right) b_i1 \n= left(prod_s=1^T fracm_sc_sright) barbeta_i1 fracb_i1m_1 = mathcalL barbeta_i1 fracb_i1m_1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"For the transition matrix,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \n= sum_t=1^T-1 left(baralpha_it prod_s=1^t fracm_sc_s right) b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_s right) \n= sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_s right) \n= mathcalL sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And for the statewise observation likelihoods,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 = pi_j barbeta_j1 frac1c_1 prod_s=2^T fracm_sc_s = mathcalL pi_j barbeta_j1 frac1m_1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijright) beta_jt \n= sum_i=1^N left(baralpha_it-1 prod_s=1^t-1 fracm_sc_sright) a_ij left(barbeta_jt frac1c_t prod_s=t+1^T fracm_sc_s right) \n= sum_i=1^N baralpha_it-1 a_ij barbeta_jt frac1m_t left(prod_s=1^T fracm_sc_sright) \n= mathcalL sum_i=1^N baralpha_it-1 a_ij barbeta_jt frac1m_t \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Finally, we note that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"fracpartial log mathcalLpartial log b_jt = fracpartial log mathcalLpartial b_jt b_jt","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"To sum up,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial log mathcalLpartial pi_i = fracb_i1m_1 barbeta_i1 \nfracpartial log mathcalLpartial a_ij = sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nfracpartial log mathcalLpartial log b_j1 = pi_j fracb_j1m_1 barbeta_j1 = fracbaralpha_j1 barbeta_j1c_1 = gamma_j1 \nfracpartial log mathcalLpartial log b_jt = sum_i=1^N baralpha_it-1 a_ij fracb_jtm_t barbeta_jt = fracbaralpha_jt barbeta_jtc_t = gamma_jt\nendalign*","category":"page"},{"location":"formulas/#Bibliography","page":"Formulas","title":"Bibliography","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Qin, F.; Auerbach, A. and Sachs, F. (2000). A Direct Optimization Approach to Hidden Markov Modeling for Single Channel Kinetics. Biophysical Journal 79, 1915–1927. Accessed on Sep 10, 2023.\n\n\n\nRabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286. Accessed on Sep 10, 2023.\n\n\n\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"EditURL = \"https://github.com/gdalle/HiddenMarkovModels.jl/blob/main/README.md\"","category":"page"},{"location":"#HiddenMarkovModels.jl","page":"Home","title":"HiddenMarkovModels.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: Stable) (Image: Dev) (Image: Build Status) (Image: Coverage) (Image: Code Style: Blue)","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: DOI) (Image: Aqua QA) (Image: JET)","category":"page"},{"location":"","page":"Home","title":"Home","text":"A Julia package for simulation, inference and learning of Hidden Markov Models.","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package can be installed using Julia's package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"pkg> add HiddenMarkovModels","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, you can create your first model as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Distributions, HiddenMarkovModels\ninit = [0.4, 0.6]\ntrans = [0.9 0.1; 0.2 0.8]\ndists = [Normal(-1.0), Normal(1.0)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Take a look at the documentation to know what to do next!","category":"page"},{"location":"#Some-background","page":"Home","title":"Some background","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Hidden Markov Models (HMMs) are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They explain an observation sequence (Y_t) by assuming the existence of a latent Markovian state sequence (X_t) whose current value determines the distribution of observations. In our framework, both the state and the observation sequence are also allowed to depend on a known control sequence (U_t). Each of the problems below has an efficient solution algorithm which our package implements:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Problem Goal Algorithm\nEvaluation Likelihood of the observation sequence Forward\nFiltering Last state marginals Forward\nSmoothing All state marginals Forward-backward\nDecoding Most likely state sequence Viterbi\nLearning Maximum likelihood parameter Baum-Welch","category":"page"},{"location":"#Main-features","page":"Home","title":"Main features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-dependent or controlled HMMs are supported out of the box.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.","category":"page"},{"location":"#Contributing","page":"Home","title":"Contributing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.","category":"page"},{"location":"#Acknowledgements","page":"Home","title":"Acknowledgements","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.","category":"page"}] }