diff --git a/report/src/deprecated/02-preliminaries.tex b/report/src/deprecated/02-preliminaries.tex
index 8373816..f3b4441 100644
--- a/report/src/deprecated/02-preliminaries.tex
+++ b/report/src/deprecated/02-preliminaries.tex
@@ -116,19 +116,21 @@ \subsubsection{Key Properties}
 These parametric formulations allow a single \gls{pctmc} to represent a broad class of \glspl{ctmc}, where the specific model instance is determined by fixing the parameter values.
 \subsection{Baum-Welch Algorithm}\label{subsec:baum-welch}
-The Baum-Welch algorithm is an iterative method used to estimate the parameters of a \gls{pctmc} from observed data.
-The algorithm aims to find the parameter values that maximize the likelihood of the observed data under the model.
-This process is based on the Expectation-Maximization (EM) framework and involves two main steps:
+The Baum-Welch algorithm is a key method for estimating the probabilities of an \gls{hmm} from observed data.
+The probabilities of an \gls{hmm} are the emission matrix $\omega$, the transition matrix $P$, and the initial distribution $\pi$.
+It was chosen for this project because it can estimate the probabilities of an \gls{hmm} without knowing the hidden states that generated the observations, and it is also the standard method for training \glspl{hmm}.
+The Baum-Welch algorithm can also be used to estimate the parameters of other Markov models, such as \glspl{mc}, from observed data, which makes it a suitable choice for this project.
+It leverages the Expectation-Maximization (EM) framework and consists of two iterative steps:
 \begin{enumerate}
- \item \textbf{Expectation Step (E-step)}: Compute the expected values of the latent variables, which are the unobserved state sequences corresponding to the observations.
- \item \textbf{Maximization Step (M-step)}: Update the parameter values to maximize the likelihood of the observed data, using the expected latent variables computed in the E-step.
+ \item \textbf{Expectation Step (E-step)}: Compute the forward and backward variables for each state $s$ and time $t$. These variables represent the likelihood of being in state $s$ at time $t$ given the observed data up to time $t$, and the likelihood of observing the remaining data from time $t$ onwards given state $s$ at time $t$, respectively.
+ \item \textbf{Maximization Step (M-step)}: Update the model parameters (emission matrix $\omega$, transition matrix $P$, and initial distribution $\pi$) to maximize the likelihood of the observed data, using the expected values computed in the E-step.
 \item Repeat the E-step and M-step until convergence.
 \end{enumerate}
-The Baum-Welch algorithm is particularly useful for estimating the parameters of a \gls{pctmc} when the underlying state sequence is unknown or partially observed.
+The Baum-Welch algorithm is particularly useful for estimating the probabilities of the emission and transition matrices of an \gls{hmm}, given a set of observations, without knowing the hidden states that generated the observations.
-Given a multiset of observations $\mathcal{O}$ and initial parameters $\textbf{x}_0$, the Baum-Welch algorithm estimates the parameters of a \gls{pctmc} $\mathcal{P}$ by iteratively improving the current hypothesis $\textbf{x}_n$ using the previous estimate $\textbf{x}_{n-1}$ until a convergence criterion is met.
+Given a multiset of observations $\mathcal{O}$ and initial parameters $\textbf{x}_0$, the Baum-Welch algorithm estimates the parameters of an \gls{hmm} $\mathcal{P}$ by iteratively improving the current hypothesis $\textbf{x}_n$ using the previous estimate $\textbf{x}_{n-1}$ until a convergence criterion is met.
 A hypothesis refers to a specific set of values for the parameters $\mathbf{x}$.
 Each iteration of the algorithm produces a new hypothesis, denoted as $\textbf{x}_n$, which is the algorithm's current best guess for the parameter values based on the observed data.
@@ -137,10 +139,10 @@ \subsection{Baum-Welch Algorithm}\label{subsec:baum-welch}
 This is typically evaluated using a convergence criterion such as:
 \begin{equation}
- ||\textbf{x}_n - \textbf{x}_{n-1}|| < \epsilon\label{eq:convergence-criterion}
+ ||l(\textbf{x}_n) - l(\textbf{x}_{n-1})|| < \epsilon\label{eq:convergence-criterion}
 \end{equation}
-where $\epsilon > 0$ is a small threshold, and $\textbf{x}_n$ denotes the parameter values at the $n$-th iteration.
+where $\epsilon > 0$ is a small threshold, and $l(\textbf{x}_n)$ denotes the likelihood of the observed data given the parameter values at the $n$-th iteration.
 The algorithm stops when the change in parameters is sufficiently small, indicating that the model has converged to a local maximum of the likelihood function.
 The parameter estimation procedure is outlined in \autoref{alg:parameter-estimation}.
@@ -149,7 +151,7 @@ \subsection{Baum-Welch Algorithm}\label{subsec:baum-welch}
 \begin{codebox}
 \Procname{$\proc{Estimate-Parameters}(\mathcal{P}, \mathbf{x}_0, \mathcal{O})$}
 \li $\mathbf{x} \gets \mathbf{x}0$
- \li \While $\neg\proc{Criterion}(\mathbf{x}{n-1}, \mathbf{x}n)$
+ \li \While $\neg\proc{Criterion}(\mathbf{x}_{n-1}, \mathbf{x}_n)$
 \li \Do $\mathbf{x}_{n - 1} \gets \mathbf{x}_n$
 \li $(\alpha, \beta) = \proc{Forward-Backward}(\mathcal{P}(\mathbf{x}_n), \mathcal{O})$
 \li $\mathbf{x}_n = \proc{Update}(\mathcal{P}(\mathbf{x}_n), \mathcal{O}, \alpha, \beta)$ \End
 \li \Return $\mathbf{x}_n$
 \end{codebox}
 \caption{Parameter estimation procedure~\cite{p7}.}
 \label{alg:parameter-estimation}
 \end{algorithm}
-Starting with initial parameters $\mathbf{x}_0$, the parameter estimation procedure iteratively improves the current hypothesis $\mathbf{x}_n$ using the previous estimate $\mathbf{x}_{n-1}$ until a specified criterion for convergence is met.
-The specifics of the $\proc{Forward-Backward}$ and $\proc{Update}$ procedures are detailed in \autoref{subsec:forward-backwards_algorithm} and \autoref{subsec:update-algorithm} from~\cite{p7}.
+Starting with initial parameters $\mathbf{x}_0$, the parameter estimation procedure iteratively improves the current hypothesis $\mathbf{x}_n$ using the previous estimate $\mathbf{x}_{n-1}$ until a specified criterion for convergence is met, at which point the algorithm returns the final estimate $\mathbf{x}_n$.
+The specifics of the $\proc{Forward-Backward}$ and $\proc{Update}$ procedures are detailed in \autoref{subsec:forward-backwards_algorithm} and \autoref{subsec:update-algorithm} from~\cite{baum1970maximization}.
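+
+As a concrete illustration of this procedure, the following minimal C++ sketch shows the outer estimation loop together with the convergence check; the likelihood is compared in log-space, which is the usual numerically stable variant of the criterion above. The types and the helpers \textit{forwardBackward}, \textit{update}, and \textit{logLikelihood} are hypothetical placeholders for the procedures described above, not part of any particular library.
+\begin{verbatim}
+#include <cmath>
+
+// Hypothetical containers for the model parameters (pi, P, omega),
+// the observation multiset, and the forward/backward variables.
+struct Parameters {};
+struct Observations {};
+struct ForwardBackward {};
+
+// Placeholders for the E-step, M-step and likelihood computation.
+ForwardBackward forwardBackward(const Parameters &x, const Observations &obs);
+Parameters update(const Parameters &x, const Observations &obs,
+                  const ForwardBackward &ab);
+double logLikelihood(const Parameters &x, const Observations &obs);
+
+Parameters estimateParameters(Parameters x, const Observations &obs,
+                              double epsilon) {
+    double previous = logLikelihood(x, obs);
+    while (true) {
+        ForwardBackward ab = forwardBackward(x, obs);  // E-step
+        x = update(x, obs, ab);                        // M-step
+        double current = logLikelihood(x, obs);
+        if (std::fabs(current - previous) < epsilon)   // convergence
+            break;
+        previous = current;
+    }
+    return x;
+}
+\end{verbatim}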
 \subsection{The Forward-Backward Algorithm}\label{subsec:forward-backwards_algorithm}
 For a given \gls{ctmc} $\mathcal{M}$, the forward-backward algorithm computes the forward and backward variables, $\alpha_s(t)$ and $\beta_s(t)$, for each observation sequence $o_0, o_1, \dots, o_{|\mathbf{o}|-1} = \mathbf{o} \in \mathcal{O}$.
diff --git a/report/src/sections/02-preliminaries.tex b/report/src/sections/02-preliminaries.tex
index b13cc4f..163847a 100644
--- a/report/src/sections/02-preliminaries.tex
+++ b/report/src/sections/02-preliminaries.tex
@@ -72,58 +72,77 @@ \subsection{Observations and Hidden States}\label{subsec:observations-hidden-sta
 An \gls{hmm} operates on sequences of observations, denoted as $O = {O_1, O_2, \ldots, O_N}$, where each $O_i$ is a sequence of labels $o_1, o_2, \ldots, o_{|\textbf{O}|-1}$.
 The task is to infer the sequence of hidden states $S = s_1, s_2, \ldots, s_{|\textbf{O}|-1}$ that most likely generated these observations.
-Given an observation sequence $O$, the goal is to maximize the probability of the hidden states conditioned on the observations.
+This involves maximizing the probability of the hidden states $S$ conditioned on the observed sequence $O$.
 In essence, the observations are the labels for which we seek the corresponding hidden state sequence, representing the most probable path that generated these observations.
-This inference is commonly achieved using the Baum-Welch algorithm.
+This inference is commonly based on the Baum-Welch algorithm, a widely used method for estimating the probabilities of an \gls{hmm}, from which the most likely hidden states can then be derived.
 %Baum-Welch for HMM mathamatically
 \subsection{Baum-Welch Algorithm}\label{subsec:baum-welch}
-The Baum-Welch algorithm is a key method for estimating the parameters of an \gls{hmm} from observed data.
-It was chosen as the method of choice for this project due to its ability to estimate the parameters of a \gls{hmm} without knowing the hidden states that generated the observations, and it is also the standard method for training \glspl{hmm}.
-If looking at other Markov models such as \glspl{mc}, the Baum-Welch algorithm can be used to estimate the parameters of the model from observed data, therefore it is a suitable choice for this project, as this can be used to estimate the parameters of other Markov models.
-It leverages the Expectation-Maximization (EM) framework and consists of two iterative steps:
+The Baum-Welch algorithm is a fundamental method for estimating the parameters of an \gls{hmm} given a sequence of observations.
+These parameters include the emission matrix $\omega$, the transition matrix $P$, and the initial state distribution $\pi$.
+The algorithm is widely recognized as the standard approach for training \glspl{hmm} and was chosen for this project due to its ability to estimate these parameters without prior knowledge of the hidden states that generated the observations.
+
+The Baum-Welch algorithm applies the Expectation-Maximization (EM) framework to iteratively improve the likelihood of the observed data under the current model parameters. It consists of the following steps:
 \begin{enumerate}
- \item \textbf{Expectation Step (E-step)}: Compute the expected the forward and backward variables, for each state $s$ and time $t$. of the latent variables, which are the unobserved state sequences corresponding to the observations.
These variables represent the likelihood of being in state $s$ at time $t$ given the observed data up to time $t$ and the likelihood of observing the remaining data from time $t$ onwards given the state $s$ at time $t$, respectivly.
- \item \textbf{Maximization Step (M-step)}: Update the model parameters (emission matrix $\omega$, transition matrix $P$, and initial distribution $\pi$) to maximize the likelihood of the observed data, using the expected values computed in the E-step.
- \item Repeat the E-step and M-step until convergence.
+ \item \textbf{Initialization:} Begin with initial estimates for the \gls{hmm} parameters $(\pi, P, \omega)$.
+ \item \textbf{Expectation Step (E-step):}
+ \begin{enumerate}
+ \item Compute the forward probabilities $\alpha_s(t)$ and backward probabilities $\beta_s(t)$, which represent:
+ \begin{itemize}
+ \item The probability of observing the sequence up to time $t$, given that the \gls{hmm} is in state $s$ at time $t$ ($\alpha_s(t)$).
+ \item The probability of observing the sequence from time $t+1$ to the end, given that the \gls{hmm} is in state $s$ at time $t$ ($\beta_s(t)$).
+ \end{itemize}
+ \end{enumerate}
+ \item \textbf{Maximization Step (M-step):} Update the \gls{hmm} parameters $(\pi, P, \omega)$ to maximize the likelihood of the observed data based on the expected counts computed in the E-step.
+ \item \textbf{Iteration:} Repeat the E-step and M-step until convergence, i.e., when the change in likelihood between iterations falls below a predefined threshold.
+ \end{enumerate}
+The Baum-Welch algorithm refines the \gls{hmm} parameters by iteratively maximizing the likelihood of the observations.
+Starting with an initial hypothesis $\textbf{x}_0 = (\pi, P, \omega)$, the algorithm produces a sequence of parameter estimates $\textbf{x}_1, \textbf{x}_2, \ldots$, where each new estimate improves upon the previous one.
+The process terminates when the improvement in likelihood is sufficiently small, satisfying the convergence criterion:
+\[
+||l(\textbf{x}_n) - l(\textbf{x}_{n-1})|| < \epsilon
+\]
+where $l(\textbf{x}_n)$ is the likelihood of the observations under the parameters $\textbf{x}_n$, and $\epsilon > 0$ is a small threshold.
-The Baum-Welch algorithm is particularly useful for estimating the properbilities of the emission and transition matrices of a HMM, given a set of observations, without knowing the hidden states that generated the observations.
+The steps of the Baum-Welch algorithm are detailed in the following sections, covering the initialization of the \gls{hmm} parameters, the forward-backward algorithm, and the update algorithm; see \autoref{subsec:initialization}, \ref{subsec:forward-backwards_algorithm}, and \ref{subsec:update-algorithm}, respectively.
-Given a multiset of observations $\mathcal{O}$ and initial parameters $\textbf{x}_0$, the Baum-Welch algorithm estimates the parameters of a \gls{hmm} $\mathcal{P}$ by iteratively improving the current hypothesis $\textbf{x}_n$ using the previous estimate $\textbf{x}_{n-1}$ until a convergence criterion is met.
-A hypothesis refers to a specific set of values for the parameters $\mathbf{x}$.
+\subsection{Initialization of HMM Parameters}\label{subsec:initialization}
+Before starting the Baum-Welch algorithm, we need to initialize the model parameters: the emission matrix $\omega$, the transition matrix $P$, and the initial state distribution $\pi$.
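+Whichever of the strategies described below is chosen, each initial estimate must form a valid probability distribution; as a brief reminder (these are standard requirements for any \gls{hmm}, stated here only for reference), the initial values should satisfy
+\[
+\sum_{s \in S} \pi_s = 1, \qquad \sum_{s' \in S} P_{s \rightarrow s'} = 1 \text{ for all } s \in S, \qquad \sum_{l \in \mathcal{L}} \omega_{s, l} = 1 \text{ for all } s \in S.
+\]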
-Each iteration of the algorithm produces a new hypothesis, denoted as $\textbf{x}_n$, which is the algorithm's current best guess for the parameter values based on the observed data.
-The algorithm consists of three main steps: the forward-backward procedure, the update step, and the convergence criterion.
-The Baum-Welch algorithm iteratively refines the parameters until the improvement between successive iterations falls below a predefined threshold.
-This is typically evaluated using a convergence criterion such as:
+Since the algorithm converges iteratively toward a local maximum of the likelihood, the initial estimates do not need to be exact. However, reasonable initialization can accelerate convergence and improve numerical stability. Common strategies for initializing each of these parameters are as follows:
+
+The initial state probabilities $\pi$ represent the likelihood of starting in each hidden state $s \in S$.
+To estimate initial state probabilities, we can use one of the following strategies:
+\begin{itemize}
+ \item Examine the observed data: For example, if certain states are more likely to occur at the beginning of the sequence, assign higher probabilities to these states.
+ $\pi_s = \frac{R_s}{\sum_{s' \in S} R_{s'}}$ where $R_s$ is the number of times state $s$ occurs at the beginning of the sequence.
+ \item Uniform initialization: Set $\pi_s = \frac{1}{|S|}$ for all $s \in S$.
+ \item Random initialization: Assign random probabilities to each state $s \in S$.
+\end{itemize}
-\begin{equation}
- ||l(\textbf{x}_n) - l(\textbf{x}_{n-1})|| < \epsilon\label{eq:convergence-criterion}
-\end{equation}
-where $\epsilon > 0$ is a small threshold, and $l(\textbf{x}_n)$ denotes the likelihood of the observed data given the parameter values at the $n$-th iteration.
+The transition matrix $P$ represents the probability of transitioning from one state to another.
+To estimate transition probabilities, we can use one of the following strategies:
+\begin{itemize}
+ \item Examine the observed data: For example, if certain states are more likely to transition to others, assign higher probabilities to these transitions (see the small worked example below).
+ $P_{s \rightarrow s'} = \frac{C_{s \rightarrow s'}}{\sum_{s'' \in S} C_{s \rightarrow s''}}$ where $C_{s \rightarrow s'}$ is the number of transitions from state $s$ to state $s'$.
+ \item Random initialization: Assign random probabilities to each transition $P_{s \rightarrow s'}$.
+ \item Uniform initialization: Set $P_{s \rightarrow s'} = \frac{1}{|S|}$ for all $s, s' \in S$, corresponding to an unbiased random walk over the states.
+\end{itemize}
-The algorithm stops when the change in parameters is sufficiently small, indicating that the model has converged to a local maximum of the likelihood function.
-The parameter estimation procedure is outlined in \autoref{alg:parameter-estimation}.
-\begin{algorithm}[htb!]
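+As a small worked example of the count-based strategy (the counts here are made up purely for illustration), suppose $S = \{s_0, s_1\}$ and the observed data yields the transition counts $C_{s_0 \rightarrow s_0} = 8$, $C_{s_0 \rightarrow s_1} = 2$, $C_{s_1 \rightarrow s_0} = 3$, and $C_{s_1 \rightarrow s_1} = 7$. Normalizing each row then gives the initial estimate
+\[
+P = \begin{bmatrix} 8/10 & 2/10 \\ 3/10 & 7/10 \end{bmatrix} = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix},
+\]
+whereas uniform initialization would simply set every entry to $\frac{1}{|S|} = 0.5$.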
- \begin{codebox}
- \Procname{$\proc{Estimate-Parameters}(\mathcal{P}, \mathbf{x}_0, \mathcal{O})$}
- \li $\mathbf{x} \gets \mathbf{x}0$
- \li \While $\neg\proc{Criterion}(\mathbf{x}_{n-1}, \mathbf{x}_n)$
- \li \Do $\mathbf{x}_{n - 1} \gets \mathbf{x}_n$
- \li $(\alpha, \beta) = \proc{Forward-Backward}(\mathcal{P}(\mathbf{x}_n), \mathcal{O})$
- \li $\mathbf{x}_n = \proc{Update}(\mathcal{P}(\mathbf{x}_n), \mathcal{O}, \alpha, \beta)$ \End
- \li \Return $\mathbf{x}_n$
- \end{codebox}
- \caption{Parameter estimation procedure~\cite{p7}.}
- \label{alg:parameter-estimation}
-\end{algorithm}
+The emission matrix $\omega$ represents the probability of emitting each observation label from each hidden state.
+To estimate emission probabilities, we can use one of the following strategies:
+\begin{itemize}
+ \item Examine the observed data: For example, if certain states are more likely to emit certain labels, assign higher probabilities to these emissions.
+ $\omega_{s, l} = \frac{C_{s, l}}{\sum_{l' \in \mathcal{L}} C_{s, l'}}$ where $C_{s, l}$ is the number of times state $s$ emits label $l$.
+ \item Random initialization: Assign random probabilities to each emission $\omega_{s, l}$.
+ \item Uniform initialization: Set $\omega_{s, l} = \frac{1}{|\mathcal{L}|}$ for all $s \in S, l \in \mathcal{L}$.
+\end{itemize}
-Starting with initial parameters $\mathbf{x}_0$, the parameter estimation procedure iteratively improves the current hypothesis $\mathbf{x}_n$ using the previous estimate $\mathbf{x}_{n-1}$ until a specified criterion for convergence is met, the algorithm returns the final estimate $\mathbf{x}_n$.
-The specifics of the $\proc{Forward-Backward}$ and $\proc{Update}$ procedures are detailed in \autoref{subsec:forward-backwards_algorithm} and \autoref{subsec:update-algorithm} from~\cite{baum1970maximization}.
+These initialization strategies provide a starting point for the Baum-Welch algorithm to iteratively refine the model parameters based on the observed data.
 \subsection{Forward-Backward Algorithm}\label{subsec:forward-backwards_algorithm}
 %Forward-Backward algorithm
@@ -143,10 +162,8 @@ \subsection{Forward-Backward Algorithm}\label{subsec:forward-backwards_algorithm}
 % \label{eq:beta-recursive}
 % \end{equation}
- The forward variable $\alpha_s(t)$ and backward variable $\beta_s(t)$ can be computed recursively as follows:
-
 \begin{equation}
 \alpha_s(t) =
 \begin{cases}
@@ -196,8 +213,9 @@ \subsubsection{Intermediate Variables}
 The denominator represents the likelihood of the observation sequence.
\begin{equation} - \xi_{ss'}(t) = \frac{\alpha_s(t) P_{ss'} \omega_{s'}(t + 1) \beta_{s'}(t + 1)}{\sum_{s''}\alpha_{s''}(t) \beta_{s''}(t)} - %{\sum_{s'' \in S} \;(\sum_{s''' \in S} \; (\alpha_{s''}(t) \; P_{s''s'''} \; \omega_{s'''}(t + 1) \; \beta_{s'''}(t + 1)))} + \xi_{ss'}(t) = \frac{\alpha_s(t) P_{ss'} \omega_{s'}(t + 1) \beta_{s'}(t + 1)} + %{\sum_{s''}\alpha_{s''}(t) \beta_{s''}(t)} + {\sum_{s'' \in S} \;(\sum_{s''' \in S} \; (\alpha_{s''}(t) \; P_{s''s'''} \; \omega_{s'''}(t + 1) \; \beta_{s'''}(t + 1)))} \label{eq:xi} \end{equation} @@ -283,14 +301,14 @@ \subsection{Matrix Operations}\label{subsec:matrix-operations} \begin{align} {\alpha}_t = \begin{bmatrix} - \alpha_{s_0}(t) \\ - \vdots \\ - \alpha_{s_{|S|-1}}(t) \\ + \alpha_{s_0}(t) \\ + \vdots \\ + \alpha_{s_{|S|-1}}(t) \\ \end{bmatrix}, \; {\beta}_t = \begin{bmatrix} - \beta_{s_0}(t) \\ - \vdots \\ - \beta_{s_{|S|-1}}(t) \\ + \beta_{s_0}(t) \\ + \vdots \\ + \beta_{s_{|S|-1}}(t) \\ \end{bmatrix} \end{align} @@ -333,14 +351,14 @@ \subsection{Matrix Operations}\label{subsec:matrix-operations} \begin{align} {\gamma}_t = \begin{bmatrix} - \gamma_{s_0}(t) \\ - \vdots \\ - \gamma_{s_{|S|-1}}(t) \\ + \gamma_{s_0}(t) \\ + \vdots \\ + \gamma_{s_{|S|-1}}(t) \\ \end{bmatrix}, \; {\xi}_t = \begin{bmatrix} - \xi_{s_0 s_0}(t) & \cdots & \xi_{s_0 s_{|S|-1}}(t) \\ - \vdots & \ddots & \vdots \\ - \xi_{s_{|S|-1}s_0}(t) & \cdots & \xi_{s_{|S|-1}s_{|S|-1}}(t) \\ + \xi_{s_0 s_0}(t) & \cdots & \xi_{s_0 s_{|S|-1}}(t) \\ + \vdots & \ddots & \vdots \\ + \xi_{s_{|S|-1}s_0}(t) & \cdots & \xi_{s_{|S|-1}s_{|S|-1}}(t) \\ \end{bmatrix} \end{align} diff --git a/report/src/sections/03-implementation.tex b/report/src/sections/03-implementation.tex index cc52efd..4132861 100644 --- a/report/src/sections/03-implementation.tex +++ b/report/src/sections/03-implementation.tex @@ -3,24 +3,6 @@ \section{Implementation}\label{sec:implementation} We will start by discussing the tools used in the implementation, followed by the transition from matrices to \glspl{add}. Finally, we will discuss the implementation of the matrix operations using \glspl{add}. -\subsection{CUDD}\label{subsec:cudd} -The Colorado University Decision Diagram (CUDD) library~\cite{somenzi1997cudd} is a powerful tool for implementing and manipulating decision diagrams, including \glspl{bdd} and \glspl{add}. \glspl{add} are compact representations of functions, often used to handle large state spaces symbolically and efficiently. - -In this project, the CUDD library stores \glspl{add} and performs operations on them. -Its optimized algorithms and efficient memory management allow us to handle large and complex matrices symbolically, leading to significant performance improvements over traditional methods. - -The CUDD library is implemented in C, ensuring high-performance execution, but it also ensures it can be used in C++ programs. - -\subsection{Storm}\label{subsec:storm} -Storm is a versatile, open-source probabilistic model checking tool designed to verify the correctness and properties of stochastic models~\cite{hensel2021probabilistic}. It supports a wide range of probabilistic models, including \glspl{hmm},\glspl{mc} and \glspl{mdp}. Storm allows users to analyze models efficiently by computing various quantitative properties, such as probabilities, expected rewards, or long-run averages. - -A key feature of Storm is its ability to represent models symbolically, leveraging data structures like \glspl{bdd} and \glspl{add}. 
These symbolic representations compactly encode the model's state space and transition structure, enabling efficient manipulation and analysis even for large-scale systems. Storm achieves this by interfacing with the CUDD library, mentioned earlier.
-
-In our implementation, Storm serves as a parser for loading the input models. Specifically, we utilize Storm to convert the model into its \gls{add} representation. This \gls{add} representation provides a compact and hierarchical encoding of the underlying matrices, which can then be used to perform symbolic matrix operations using the CUDD library.
-
-The reason for using Storm lies in it is open-source, which makes it easy to integrate into our project. Storm is designed to handle large and complex models efficiently for model checking.
-Therefore the next step in Storm is to calculate the parameters of interest, such as transition probabilities, rewards, or other metrics derived from the model. By performing these computations symbolically within the ADD framework, we achieve a scalable and efficient approach to analyzing stochastic models.
-
 \subsection{Transition to ADDs}\label{subsec:transition-to-adds}
 The first step in the implementation is to transition from vectors and matrices to \glspl{add}.
 This conversion leverages the compact and efficient representation of \glspl{add} to perform operations symbolically.
@@ -96,6 +78,25 @@ \subsection{Transition to ADDs}\label{subsec:transition-to-adds}
 \label{fig:add_reduced}
 \end{figure}
+\subsection{CUDD}\label{subsec:cudd}
+The Colorado University Decision Diagram (CUDD) library~\cite{somenzi1997cudd} is a powerful tool for implementing and manipulating decision diagrams, including \glspl{bdd} and \glspl{add}. \glspl{add} are compact representations of functions, often used to handle large state spaces symbolically and efficiently.
+
+In this project, the CUDD library stores \glspl{add} and performs operations on them.
+Its optimized algorithms and efficient memory management allow us to handle large and complex matrices symbolically, leading to significant performance improvements over traditional methods.
+
+The CUDD library is implemented in C, ensuring high-performance execution, and it can also be used from C++ programs.
+
+\subsection{Storm}\label{subsec:storm}
+Storm is a versatile, open-source probabilistic model checking tool designed to verify the correctness and properties of stochastic models~\cite{hensel2021probabilistic}. It supports a wide range of probabilistic models, including \glspl{hmm}, \glspl{mc}, and \glspl{mdp}. Storm allows users to analyze models efficiently by computing various quantitative properties, such as probabilities, expected rewards, or long-run averages.
+
+A key feature of Storm is its ability to represent models symbolically, leveraging data structures like \glspl{bdd} and \glspl{add}. These symbolic representations compactly encode the model's state space and transition structure, enabling efficient manipulation and analysis even for large-scale systems. Storm achieves this by interfacing with the CUDD library, mentioned earlier.
+
+In our implementation, Storm serves as a parser for loading the input models. Specifically, we utilize Storm to convert the model into its \gls{add} representation. This \gls{add} representation provides a compact and hierarchical encoding of the underlying matrices, which can then be used to perform symbolic matrix operations using the CUDD library.
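+
+As a rough illustration of this workflow, the following C++ sketch parses a PRISM model with Storm and builds a symbolic, \gls{add}-based representation of it. The headers, namespaces, and template arguments shown here are assumptions that vary between Storm versions and should be checked against the version actually used; the sketch is not a description of our implementation.
+\begin{verbatim}
+// Sketch only: assumes Storm's C++ API; exact names differ per version.
+#include <string>
+#include "storm-parsers/parser/PrismParser.h"
+#include "storm/builder/DdPrismModelBuilder.h"
+
+void loadSymbolicModel(std::string const& path) {
+    // Parse the PRISM program describing the model.
+    storm::prism::Program program = storm::parser::PrismParser::parse(path);
+
+    // Build a symbolic model whose matrices are stored as CUDD ADDs.
+    storm::builder::DdPrismModelBuilder<storm::dd::DdType::CUDD> builder;
+    auto model = builder.build(program);
+
+    // The transition matrix is now available as an ADD for the
+    // symbolic matrix operations described below.
+    auto transitions = model->getTransitionMatrix();
+}
+\end{verbatim}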
+
+The reason for using Storm lies in its open-source nature, which makes it easy to integrate into our project. Storm is designed to handle large and complex models efficiently for model checking.
+The next step is then to calculate the parameters of interest, such as transition probabilities, rewards, or other metrics derived from the model. By performing these computations symbolically within the ADD framework, we achieve a scalable and efficient approach to analyzing stochastic models.
+
+
 \subsection{Matrix operations using ADDs}\label{subsec:matrix-operations-using-adds}
 The matrix operations are implemented using \glspl{add}.
 The matrix operations implemented are matrix transpose, matrix addition, matrix multiplication, Hadamard product, Hadamard division, Kronecker product and Khatri-Rao product.
@@ -118,6 +119,9 @@ \subsection{Matrix operations using ADDs}\label{subsec:matrix-operations-using-a
 are used as examples in the following subsections.
+In CUDD the \textit{DdManager} is used to manage the \glspl{add} and the operations on them.
+The \textit{DdNode} is used to represent both the \glspl{add} themselves and the variables in the \glspl{add}, that is, the row and column indices.
+
 \subsubsection{Matrix Transpose}
 The matrix transpose is implemented by swapping the row and column variables in the \gls{add}. Specifically, for each path in the \gls{add} representing an entry $(i, j)$, the roles of the row index $i$ and column index $j$ are exchanged. The terminal nodes (values of the matrix entries) remain unchanged.
@@ -129,6 +133,8 @@ \subsubsection{Matrix Transpose}
 \end{bmatrix}
 \]
+In CUDD the function we use for transposing an \gls{add} is \textit{Cudd\_addSwapVariables(DdManager * dd, DdNode * f, DdNode ** x, DdNode ** y, int n)}, where \textit{f} is the \gls{add} to be transposed, \textit{x} and \textit{y} are the two sets of variables to be swapped (the row and column variables), and \textit{n} is the number of variables in each set.
+
 \subsubsection{Matrix addition}
 Matrix addition is implemented by adding the terminal nodes of two \glspl{add} while keeping the structure of the row and column indices consistent. The process involves:
 \begin{enumerate}
@@ -144,6 +150,8 @@ \subsubsection{Matrix addition}
 \end{bmatrix}
 \]
+In CUDD the function we use for adding two \glspl{add} is \textit{Cudd\_addApply(DdManager * dd, Cudd\_addPlus, DdNode * f, DdNode * g)}, where \textit{f} and \textit{g} are the two \glspl{add} to be added and \textit{Cudd\_addPlus} is the operator that \textit{Cudd\_addApply} applies to the terminal values.
+
 \subsubsection{Matrix multiplication}
 Matrix multiplication is implemented symbolically using the dot product of the row and column indices. In the \gls{add}:
 \begin{enumerate}
@@ -159,6 +167,12 @@ \subsubsection{Matrix multiplication}
 \end{bmatrix}
 \]
+%\textit{Cudd\_addMatrixMultiply(DdManager * dd, DdNode * A, DdNode * B, DdNode ** z, int nz)}
+We use the function \textit{Cudd\_addMatrixMultiply(DdManager * dd, DdNode * A, DdNode * B, DdNode ** z, int nz)} in CUDD to multiply two \glspl{add}.
+The function takes two \glspl{add} \textit{A} and \textit{B} to be multiplied as input and returns a pointer to the resulting \gls{add}.
+\textit{z} is the array of variables on which the columns of \textit{A} and the rows of \textit{B} depend, i.e., the summation variables, and \textit{nz} is the number of variables in \textit{z}.
+\textit{A} is assumed to have the same number of columns as \textit{B} has rows, both encoded by the \textit{nz} variables in \textit{z}.
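+
+The following minimal C++ sketch illustrates this call pattern on a $2 \times 2$ example. It is illustrative only: the helper \textit{buildMatrix}, the choice of variables, and their ordering are assumptions made for this example, not a description of our implementation.
+\begin{verbatim}
+#include <cstdio>
+#include "cudd.h"
+
+// Build a 2x2 matrix [[a, b], [c, d]] as an ADD: "row" selects the row
+// index bit and "col" the column index bit.
+static DdNode *buildMatrix(DdManager *dd, DdNode *row, DdNode *col,
+                           double a, double b, double c, double d) {
+    DdNode *ca = Cudd_addConst(dd, a); Cudd_Ref(ca);
+    DdNode *cb = Cudd_addConst(dd, b); Cudd_Ref(cb);
+    DdNode *cc = Cudd_addConst(dd, c); Cudd_Ref(cc);
+    DdNode *cd = Cudd_addConst(dd, d); Cudd_Ref(cd);
+    DdNode *row0 = Cudd_addIte(dd, col, cb, ca); Cudd_Ref(row0);
+    DdNode *row1 = Cudd_addIte(dd, col, cd, cc); Cudd_Ref(row1);
+    DdNode *m = Cudd_addIte(dd, row, row1, row0); Cudd_Ref(m);
+    Cudd_RecursiveDeref(dd, ca); Cudd_RecursiveDeref(dd, cb);
+    Cudd_RecursiveDeref(dd, cc); Cudd_RecursiveDeref(dd, cd);
+    Cudd_RecursiveDeref(dd, row0); Cudd_RecursiveDeref(dd, row1);
+    return m;
+}
+
+int main() {
+    DdManager *dd = Cudd_Init(0, 0, CUDD_UNIQUE_SLOTS,
+                              CUDD_CACHE_SLOTS, 0);
+    // Projection variables; they are kept alive by the manager.
+    DdNode *x = Cudd_addNewVar(dd);  // rows of A
+    DdNode *z = Cudd_addNewVar(dd);  // columns of A / rows of B
+    DdNode *y = Cudd_addNewVar(dd);  // columns of B
+
+    DdNode *A = buildMatrix(dd, x, z, 1, 2, 3, 4);
+    DdNode *B = buildMatrix(dd, z, y, 5, 6, 7, 8);
+
+    // Multiply A and B, summing over the shared variable z.
+    DdNode *zvars[] = { z };
+    DdNode *C = Cudd_addMatrixMultiply(dd, A, B, zvars, 1);
+    Cudd_Ref(C);
+
+    // Evaluate entry (0, 0) of the product: expected 1*5 + 2*7 = 19.
+    int assignment[] = { 0, 0, 0 };  // x = 0, z unused, y = 0
+    std::printf("C[0][0] = %g\n", Cudd_V(Cudd_Eval(dd, C, assignment)));
+
+    Cudd_RecursiveDeref(dd, A);
+    Cudd_RecursiveDeref(dd, B);
+    Cudd_RecursiveDeref(dd, C);
+    Cudd_Quit(dd);
+    return 0;
+}
+\end{verbatim}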
+
 \subsubsection{Hadamard product}\label{subsubsec:hadamard-product}
 The Hadamard product is implemented by pairwise multiplication of corresponding terminal nodes in the two \glspl{add}. For each matching row-column index pair $(i, j)$:
 \begin{enumerate}
@@ -175,6 +189,8 @@ \subsubsection{Hadamard product}\label{subsubsec:hadamard-product}
 \end{bmatrix}
 \]
+In CUDD the function we use for the Hadamard product is \textit{Cudd\_addApply(DdManager * dd, Cudd\_addTimes, DdNode * f, DdNode * g)}, where \textit{f} and \textit{g} are the two \glspl{add} to be multiplied and \textit{Cudd\_addTimes} is the operator that multiplies the two \glspl{add} elementwise.
+
 \subsubsection{Hadamard division}
 The Hadamard division is implemented as Hadamard product, but with division instead of multiplication. See~\autoref{subsubsec:hadamard-product} for more details.
 The Hadamard division of matrices $A$ and $B$ is
@@ -186,6 +202,8 @@ \subsubsection{Hadamard division}
 \end{bmatrix}
 \]
+The Hadamard division is implemented by \textit{Cudd\_addApply(DdManager * dd, Cudd\_addDivide, DdNode * f, DdNode * g)}, where \textit{f} and \textit{g} are the two \glspl{add} to be divided and \textit{Cudd\_addDivide} is the operator that divides the two \glspl{add} elementwise.
+
 \subsubsection{Kronecker product}
 The Kronecker product is implemented by expanding the indices and terminal nodes of the two matrices symbolically, with the resulting \gls{add} having the dimensions of the sum of the dimensions of the two matrices.
 The Kronecker product is a generalization of the outer product, where each element of the first matrix is multiplied by the second matrix as a whole.
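+
+CUDD has, to our knowledge, no dedicated Kronecker-product operator, so one way to realize it, sketched below, is to encode $A$ and $B$ over disjoint row and column variable sets and multiply the two \glspl{add} elementwise: because no variables are shared, the resulting function is exactly the Kronecker product, with the variables of $A$ acting as the high-order index bits and those of $B$ as the low-order bits. This is a sketch of one possible approach, not a description of our implementation.
+\begin{verbatim}
+// Sketch: A is assumed to be encoded over variables (xA, yA) and B over
+// disjoint variables (xB, yB).
+DdNode *kronecker(DdManager *dd, DdNode *A, DdNode *B) {
+    // With disjoint variable sets, elementwise multiplication yields
+    // (A (x) B)[(i,k),(j,l)] = A[i][j] * B[k][l].
+    DdNode *K = Cudd_addApply(dd, Cudd_addTimes, A, B);
+    Cudd_Ref(K);
+    return K;
+}
+\end{verbatim}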