
Commit

grammarly changes
sebastianbot6969 committed Jan 13, 2025
1 parent f70fe91 commit 7ce035f
Showing 2 changed files with 48 additions and 31 deletions.
31 changes: 16 additions & 15 deletions report/src/sections/02-preliminaries.tex
@@ -1,19 +1,19 @@
\section{Preliminaries}\label{sec:preliminaries}
In this section, we introduce the concepts and definitions essential for understanding the subsequent sections.
This section introduces the concepts and definitions essential for understanding the subsequent sections.
We begin by defining the key concepts of a \gls{hmm} and then describe how it can be represented using matrices.

We then introduce the Baum-Welch algorithm, which estimates the parameters of a \gls{hmm} from observed data.
We describe all the steps involved in the Baum-Welch algorithm, namely the forward-backward algorithm and the parameter update step.

Finally, we describe how the Baum-Welch algorithm can be implemented using matrix operations.
Finally, we describe how the Baum-Welch algorithm can be implemented using matrix operations.

\subsection{Hidden Markov Models}\label{subsec:hmm}
%definition of HMM
Baum and Petrie introduced \glspl{hmm} in 1966~\cite{baum1966statistical}.
\glspl{hmm} are a class of probabilistic models widely used to describe sequences of observations dependent on some underlying hidden states.
These models consist of two main components: observations and hidden states.
The observations are the visible data emitted by the model, while the hidden states represent the underlying process that generates these observations.
\glspl{hmm} have applications in fields such as speech recognition~\cite{chavan2013overview}, bioinformatics~\cite{de2007hidden}, and natural language processing~\cite{murveit1990integrating}.
\glspl{hmm} have applications in speech recognition~\cite{chavan2013overview}, bioinformatics~\cite{de2007hidden}, and natural language processing~\cite{murveit1990integrating}.
The \gls{hmm} was chosen for this project due to its versatility and ability to model complex systems.
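
As a purely illustrative example (not part of the formal definition that follows), a two-state \gls{hmm} with two observation symbols could be described by an initial distribution, a transition matrix, and an emission matrix such as
\[
\pmb{\pi} = \begin{pmatrix} 0.6 & 0.4 \end{pmatrix}, \qquad
\pmb{P} = \begin{pmatrix} 0.7 & 0.3 \\ 0.2 & 0.8 \end{pmatrix}, \qquad
\pmb{\omega} = \begin{pmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{pmatrix},
\]
where every row sums to one: row $s$ of $\pmb{P}$ holds the probabilities of moving from state $s$ to each state, and row $s$ of $\pmb{\omega}$ holds the probabilities of emitting each observation symbol from state $s$.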

\begin{definition}[Hidden Markov Model]
@@ -101,23 +101,24 @@ \subsection{Observations and Hidden States}\label{subsec:observations-hidden-sta
\subsection{The Baum-Welch Algorithm}\label{subsec:baum-welch}
The Baum-Welch algorithm is a fundamental method for estimating the parameters of a \gls{hmm} given a sequence of observations.
These parameters include the emission matrix $\pmb{\omega}$, the transition matrix $\pmb{P}$, and the initial state distribution $\pmb{\pi}$.
The algorithm is widely recognized as the standard approach for training \glspl{hmm} and was chosen for this project because it can estimate these parameters without prior knowledge of the hidden states that generated the observations~\cite{levinson1983introduction}.
The algorithm is widely recognized as the standard approach for training \glspl{hmm}.
It was chosen for this project because it can estimate these parameters without prior knowledge of the hidden states that generated the observations~\cite{levinson1983introduction}.

The Baum-Welch algorithm applies the Expectation-Maximization (EM) framework to iteratively improve the likelihood of the observed data under the current model parameters. It does so untill it reaches a set convergence value, which indicates how much the model improves after each iteration.
The Baum-Welch algorithm applies the Expectation-Maximization (EM) framework to iteratively improve the likelihood of the observed data under the current model parameters. It does so until the improvement between iterations falls below a set convergence value.
It consists of the following steps:

\begin{enumerate}
\item \textbf{Initialization:} Begin with given initial estimates for the \gls{hmm} parameters $(\pmb{\pi}, \pmb{P}, \pmb{\omega})$.
\item \textbf{Expectation Step (E-step):} Compute the expected counts of the latent variables, i.e., the hidden states, based on the observation sequence and the current model parameters.
That is we compute the probabilities of observing the sequence up to time $t$, given that the \gls{hmm} is in state $s$ at time $t$ and the probabilities of observing the sequence from time $t+1$ to the end, given that the \gls{hmm} is in state $s$ at time $t$.
That is, we compute the probabilities of observing the sequence up to time $t$, given that the \gls{hmm} is in state $s$ at time $t$ and the probabilities of observing the sequence from time $t+1$ to the end, given that the \gls{hmm} is in state $s$ at time $t$.
\item \textbf{Maximization Step (M-step):} Update the \gls{hmm} parameters $(\pmb{\pi}, \pmb{P}, \pmb{\omega})$ to maximize the likelihood of the observed data based on the expected counts computed in the E-step.
\item \textbf{Iteration:} Repeat the E-step and M-step until convergence, i.e., when the change in likelihood between iterations falls below a predefined threshold.
\end{enumerate}
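
To make the iteration structure of these steps concrete, the following is a minimal, non-authoritative sketch written in Python; the callables \texttt{e\_step} and \texttt{m\_step} are placeholders for the E- and M-steps described above and are not functions from our implementation.

\begin{verbatim}
def em_loop(params, e_step, m_step, epsilon=1e-4, max_iter=500):
    """Generic EM skeleton: alternate E- and M-steps until the gain in
    log-likelihood falls below epsilon (the convergence criterion)."""
    prev_ll = float("-inf")
    for _ in range(max_iter):
        counts, ll = e_step(params)      # expected counts + log-likelihood
        if ll - prev_ll < epsilon:       # improvement is sufficiently small
            break
        params = m_step(counts)          # re-estimate (pi, P, omega)
        prev_ll = ll
    return params
\end{verbatim}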

The Baum-Welch algorithm seeks to estimate the parameters $\pmb{\tau}$, $\pmb{\pi}$, and $\pmb{\omega}$ of a \gls{hmm} model $\mathcal{M}$ so that they maximize the likelihood function $\mathfrak{l}(\mathcal{M} | O)$.
That is, the probability that the \gls{hmm} $\mathcal{M}$ has independently generated each observation sequence $O_1, \cdots, O_N$.

Starting with an initial hypothesis $\textbf{x}_0 = (\pmb{\pi}, \pmb{P}, \pmb{\omega})$, the algorithm produces a sequence of parameter estimates $\textbf{x}_1, \textbf{x}_2, \cdots$, where each new estimate improves upon the previous one.
Starting with an initial hypothesis $\textbf{x}_0 = (\pmb{\pi}, \pmb{P}, \pmb{\omega})$, the algorithm produces a sequence of parameter estimates $\textbf{x}_1, \textbf{x}_2, \cdots$, where each new estimate improves upon the previous one.
The process terminates when the improvement in likelihood is sufficiently small, satisfying the convergence criterion:

\[
@@ -138,7 +139,7 @@ \subsection{Initialization of HMM Parameters}\label{subsec:initialization}
As we work with models of which we have no prior knowledge, meaning we know neither the number of observations nor which states generated them, we cannot initialize the model parameters based on domain knowledge.
Therefore, we need to initialize the parameters based on some strategy.
If we set a probability to zero, we see in parameter estimation (\autoref{eq:xi}) that the probability will remain zero.
Therefore if no domain knowledge is available, it is better to initialize the parameters with non-zero values.
Therefore, if no domain knowledge is available, it is better to initialize the parameters with non-zero values.
A common approach is to initialize these parameters using one of the following strategies:

\begin{enumerate}
@@ -157,9 +158,9 @@ \subsection{Initialization of HMM Parameters}\label{subsec:initialization}
\end{enumerate}

We initialize the parameters using random initialization, as it provides a diverse set of initial values that can help avoid local optima.
The uniform initialization is the worst choice as it provides the least amount of information.
With uniform initialization, any transition is equally likely to be chosen to explain an observation, making the latent states indistinguishable from each other.
Therefore it is not recommended for practical use.
The uniform initialization is the worst choice as it provides the least information.
With uniform initialization, any transition is equally likely to be chosen to explain an observation, making the latent states indistinguishable.
Therefore, it is not recommended for practical use.

These initialization strategies provide a starting point for the Baum-Welch algorithm, which iteratively refines the model parameters based on the observed data.
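
As an illustration of the random strategy, the following is a small sketch (written in Python with NumPy purely for illustration; it is not code from our implementation) that draws strictly positive random values and normalises each row, so that $\pmb{\pi}$, $\pmb{P}$, and $\pmb{\omega}$ are valid distributions with no zero entries.

\begin{verbatim}
import numpy as np

def random_init(n_states, n_symbols, rng=None):
    """Random initialization: draw positive values and normalise each row so
    that pi, P and omega are valid distributions with no zero entries."""
    rng = rng or np.random.default_rng()
    pi = rng.uniform(0.1, 1.0, n_states)
    P = rng.uniform(0.1, 1.0, (n_states, n_states))
    omega = rng.uniform(0.1, 1.0, (n_states, n_symbols))
    return (pi / pi.sum(),
            P / P.sum(axis=1, keepdims=True),
            omega / omega.sum(axis=1, keepdims=True))
\end{verbatim}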

@@ -196,12 +197,12 @@ \subsection{Forward-Backward Algorithm}\label{subsec:forward-backwards_algorithm

The forward-backward algorithm computes the forward and backward variables for each state $s$ and time $t$ in the observation sequence $\mathbf{o}$, providing a comprehensive view of the likelihood of the observed data under the model.

In preparation for later discussions, we would like to draw attention to the fact that the above recurrences can be solved using dynamic programming requiring one to use $\Theta(|S|\times|(|\mathbf{o}|-1)|)$ space.
In preparation for later discussions, we would like to draw attention to the fact that the above recurrences can be solved using dynamic programming, which requires $\Theta(|S| \times |\mathbf{o}|)$ space.
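
For concreteness, a standard formulation of the forward and backward recurrences could be sketched as follows (in plain NumPy, without the scaling a practical implementation needs, and independent of our symbolic \gls{add}-based implementation); the two tables \texttt{alpha} and \texttt{beta} each hold $|S| \times |\mathbf{o}|$ entries, matching the space bound above.

\begin{verbatim}
import numpy as np

def forward_backward(obs, pi, P, omega):
    """Fill the forward table alpha[s, t] and the backward table beta[s, t]
    by dynamic programming; obs is a sequence of observation-symbol indices,
    and both tables use |S| x |obs| space."""
    S, T = P.shape[0], len(obs)
    alpha = np.zeros((S, T))
    beta = np.zeros((S, T))
    alpha[:, 0] = pi * omega[:, obs[0]]
    for t in range(1, T):
        alpha[:, t] = (alpha[:, t - 1] @ P) * omega[:, obs[t]]
    beta[:, T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[:, t] = P @ (omega[:, obs[t + 1]] * beta[:, t + 1])
    return alpha, beta
\end{verbatim}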

\subsection{Update Step}\label{subsec:update-algorithm}
%Update of HMM
The update step refines the parameter values of the \gls{hmm} model based on the observed data and the forward and backward variables computed in the forward-backward algorithm.
Given the forward and backward variables $\alpha_s(t)$ and $\beta_s(t)$, the update step aims to maximize the likelihood of the observed data by adjusting the parameter values.
Given the forward and backward variables $\alpha_s(t)$ and $\beta_s(t)$, the update step adjusts the parameter values to maximize the likelihood of the observed data.

The update step iteratively refines the parameter values until convergence is reached.

@@ -236,8 +237,8 @@ \subsubsection{Intermediate Variables}
For $\gamma_s(t)$, this involves dividing by the total probability across all states at time $t$, while for $\xi_{ss'}(t)$, normalization occurs over all possible transitions at time $t$.
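
For reference, the standard normalised forms of these intermediate variables, stated in the notation used above (the report's own definitions may differ in presentation), are
\[
\gamma_s(t) = \frac{\alpha_s(t)\,\beta_s(t)}{\sum_{s'} \alpha_{s'}(t)\,\beta_{s'}(t)},
\qquad
\xi_{ss'}(t) = \frac{\alpha_s(t)\,P_{ss'}\,\omega_{s'}(o_{t+1})\,\beta_{s'}(t+1)}{\sum_{u}\sum_{u'} \alpha_u(t)\,P_{uu'}\,\omega_{u'}(o_{t+1})\,\beta_{u'}(t+1)},
\]
so that $\gamma_s(t)$ sums to one over all states at time $t$, and $\xi_{ss'}(t)$ sums to one over all possible transitions at time $t$.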

\subsubsection{Parameter Update}
The parameter update step refines the parameter values of the model based on the earlier computed intermediate variables $\gamma_s(t)$ and $\xi_{ss'}(t)$.
The update step aims to maximize the expected likelihood of the observed data given the model $\mathcal{M}$ by adjusting the parameter values.
The parameter update step refines the model's parameter values based on the intermediate variables $\gamma_s(t)$ and $\xi_{ss'}(t)$.
The update step adjusts the parameter values to maximize the likelihood of the observed data given the model $\mathcal{M}$.

Once $\gamma_s(t)$ and $\xi_{ss'}(t)$ are computed for all states $s, s'$ and all time steps $t$ for every observation sequence, the model parameters can be updated to maximize the expected log-likelihood.
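
For a single observation sequence of length $T$, the standard re-estimation formulas take the following form (stated here for reference in the notation above; with multiple observation sequences the sums additionally run over all sequences, and the report's own equations may differ in detail):
\[
\bar{\pi}_s = \gamma_s(1),
\qquad
\bar{P}_{ss'} = \frac{\sum_{t=1}^{T-1} \xi_{ss'}(t)}{\sum_{t=1}^{T-1} \gamma_s(t)},
\qquad
\bar{\omega}_s(v) = \frac{\sum_{t \,:\, o_t = v} \gamma_s(t)}{\sum_{t=1}^{T} \gamma_s(t)}.
\]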

48 changes: 32 additions & 16 deletions report/src/sections/05-discussion.tex
@@ -11,20 +11,36 @@
%Jajapy implementation in storm

\section{Discussion}\label{sec:discussion}

This section will cover the paper's discussion. This paper aimed to study how implementing a symbolic representation of the Baum-Welch algorithm could impact the expected runtime and model accuracy. This work extends the study conducted in~\cite{p7} by developing an entirely symbolic representation using \glspl{add} at every point of the Baum-Welch algorithm where matrices are traditionally used.

Unfortunately, no significant findings were achieved in our study. Several factors contributed to this outcome. One such factor was the issue of extending Storms implementation of CUDD to handle operations, required to compute the Baum-Welch algorithm using \glspl{add}. Initially, we aimed to read files representing our data and create models from these using Storm. However, this process proved more difficult than expected. The Storm library proved particularly challenging to navigate. Although its extensive documentation was initially considered beneficial, it often led to confusion and hindered progress. As a result, initial models were instead written manually, and random values were used as data for creating these models and observed data. If we had used manual initialization of models ealier, further testing and experimentation could have been conducted, leading to a more complete implementation.


Furthermore, the choice of library for manipulating \glspl{add} was not thoroughly considered, as we simply intended to extend the work in~\cite{p7}. Efforts should have been dedicated to evaluating alternative libraries that work with \glspl{add}. Examples of such libraries include Sylvan, also written in C and provides parallel execution capabilities. While CuDD worked effectively in~\cite{p7}, further consideration of alternative options could have yielded better results.

While working on our implementation, we studied the symbolic implementation of the Forward-Backward algorithm in~\cite{p7}. Here, we identified an error in the implementation that suggested significant runtime improvements. This error involved a miscalculation caused by misaligned ADD variables in the matrix multiplication of the forward step.

Although the results and values remained correct, the computation time increased unnecessarily. The paper~\cite{p7} had already demonstrated significant runtime improvements compared to the non-symbolic comparison implementations. Removing this unnecessary calculation could further enhance runtime performance.

Additionally, the implementation in~\cite{p7} was only partially symbolic. While the Forward and Backward calculations used symbolic representations, other components relied on traditional matrices. This resulted in frequent conversions between \glspl{add} and matrices, increasing computational overhead. An entirely symbolic implementation would likely reduce this overhead and improve efficiency.

These observations motivates the continuation of studying symbolic representations of the Baum-Welch algorithm. They also highlight the potential for further performance improvements through error correction and adopting an entirely symbolic approach.

This section will cover the paper's discussion.
This paper aimed to study how implementing a symbolic representation of the Baum-Welch algorithm could impact the expected runtime and model accuracy.
This work extends the study conducted in~\cite{p7} by developing an entirely symbolic representation using \glspl{add} at every point of the Baum-Welch algorithm where matrices are traditionally used.

Unfortunately, no significant findings were achieved in our study. Several factors contributed to this outcome.
One such factor was the difficulty of extending Storm's implementation of CUDD to handle the operations required to compute the Baum-Welch algorithm using \glspl{add}.
Initially, we aimed to read files representing our data and create models using Storm.
However, this process proved more difficult than expected.
The Storm library proved particularly challenging to navigate.
Although its extensive documentation was initially considered beneficial, it often led to confusion and hindered progress.
As a result, initial models were instead written manually, and random values were used both to create these models and as observed data.
If we had used manual initialization of models earlier, further testing and experimentation could have been conducted, leading to a more complete implementation.

Furthermore, the library choice for manipulating \glspl{add} was not thoroughly considered, as we intended to extend the work in~\cite{p7}.
Efforts should have been dedicated to evaluating alternative libraries that work with \glspl{add}.
Examples of such libraries include Sylvan, which is also written in C and provides parallel execution capabilities.
While CUDD worked effectively in~\cite{p7}, further consideration of alternative options could have yielded better results.

While working on our implementation, we studied the symbolic implementation of the Forward-Backward algorithm in~\cite{p7}.
Here, we identified an error in the implementation whose removal could yield significant runtime improvements.
This error involved a miscalculation caused by misaligned \gls{add} variables in the matrix multiplication of the forward step.

Although the results and values remained correct, the computation time increased unnecessarily.
The paper~\cite{p7} had already demonstrated significant runtime improvements compared to the non-symbolic comparison implementations.
Removing this unnecessary calculation could further enhance runtime performance.

Additionally, the implementation in~\cite{p7} was only partially symbolic. While the Forward and Backward calculations used symbolic representations, other components relied on traditional matrices.
This resulted in frequent conversions between \glspl{add} and matrices, increasing computational overhead.
An entirely symbolic implementation would likely reduce this overhead and improve efficiency.

These observations motivate the continuation of studying symbolic representations of the Baum-Welch algorithm.
They also highlight the potential for further performance improvements through error correction and adopting an entirely symbolic approach.
