Enhance stopping criterion options and handle infinite log-likelihoods #103
Conversation
Hi, thank you for these contributions! I have a few follow up questions. Don't be discouraged by my skepticism, I'm really excited for someone to use my package 😍

*Do we need this?*

> :stability: introduces a new criterion where the algorithm stops when the log-likelihood stops changing within a given margin (this is important in my use case).

Can you explain why it is important in your use case? Do you face a scenario where the loglikelihood goes down?

> To address a bug where infinite log-likelihood values would break the code

Do you have a reproducible example of code breaking in that case? I think -Inf values for the loglikelihood are perfectly fine. And if you encounter +Inf it is possibly an issue with your model and not with my library. I'd happily take a look if you can share an MWE.

*How do we do this if we need it?*

> :convergence: maintains the original behavior of stopping when the log-likelihood stops increasing.

That's kind of a weird name for a stopping criterion, either way we're checking convergence. Maybe we could use :max_increase vs :max_variation?

> I added logic to handle these cases by replacing infinite values with the log of the largest representable float number.

This logic cannot be merged in the current state, because it is located in one of the central functions of the package (obs_logdensities!), and this function must remain non-allocating. At the moment the tests will probably fail, because findall(i -> i < -log(-nextfloat(-Inf)), logb) creates a new array of indices, which is very inefficient. If we are to change anything in this function (and I'm really doubtful whether we should), it has to be inside the loop.
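For illustration, a minimal sketch of the contrast being made (the function below is hypothetical and is not the package's actual obs_logdensities! implementation; it only shows how a clamp inside the existing loop avoids the allocation that findall introduces):

```julia
using Distributions: logpdf

# Hypothetical sketch: fill log-densities of one observation across all states
# without allocating, clamping infinities in place instead of collecting their
# indices with findall (which builds a fresh index array on every call).
function fill_logdensities!(logb::AbstractVector{<:AbstractFloat}, dists, obs)
    upper = log(floatmax(eltype(logb)))   # same bound as log(-nextfloat(-Inf))
    for i in eachindex(logb, dists)
        lb = logpdf(dists[i], obs)
        logb[i] = clamp(lb, -upper, upper)
    end
    return logb
end
```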
Dear Guillaume,

Regarding the stopping criterion:
- In my use case, I printed the log-likelihood at each iteration to better understand what was happening. Interestingly, it initially decreased before shooting up significantly after a few iterations, likely due to initialization effects. This shift ultimately led to much better results. I also completely agree with your naming suggestion.

As for the log-likelihood case:
- I encountered a state where all values were identical (and equal to 0), which resulted in a normal distribution with zero variance. When calculating the log-likelihood of a zero observation (for that state), I got an Inf value, and for non-zero observations, it returned -Inf. These extremes caused NaN values in the forward calculations, as there was an attempt to invert a zero value. Thanks to these changes, I was able to successfully run my use case. (A minimal reproduction is sketched after this message.)

Regarding the code efficiency:
- I agree that doing it inside the loop would be better.

André
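A minimal reproduction of the degenerate case described above (assuming a Distributions.jl Normal emission; the full model from the use case is not shown here):

```julia
using Distributions

# A state whose observations are all identical (and equal to 0) is fitted
# with a zero-variance normal distribution.
d = Normal(0.0, 0.0)

logpdf(d, 0.0)   # +Inf: observation equal to the degenerate mean
logpdf(d, 1.0)   # -Inf: any other observation
# Once these infinities enter the forward recursion, the normalization step
# divides by zero and the forward variables turn into NaN.
```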
Okay, this seems to confirm that we actually don't need the modifications you suggest.

*Stopping criterion*

> Interestingly, it initially decreased before shooting up significantly after a few iterations, likely due to initialization effects.

This should never happen, at least in standard cases. The Baum-Welch algorithm is a version of the EM algorithm, which is *guaranteed* to increase the loglikelihood (up to floating point errors). The option to check that this increase happens is actually a really useful debugging tool, and should basically never be turned off. If you provide me with a Minimum Working Example, I might be able to help you debug. Was it a model where you implemented a custom fitting step?

> In my use case, I printed the log-likelihood at each iteration to better understand what was happening.

Also you don't need to do that, Baum-Welch returns the vector of loglikelihood values at the end.

*Infinite loglikelihood*

> I encountered a state where all values were identical (and equal to 0), which resulted in a normal distribution with zero variance.

Indeed, zero-variance Gaussians are one of the cases where you may encounter a loglikelihood of +Inf, I struggled with that one too. While it may sound appealing, truncating the computed loglikelihood will only mask the symptoms of the problem. It won't solve the underlying issue, namely the degenerate emission distribution, which stems from the model and/or the data. Please check out the debugging section <https://gdalle.github.io/HiddenMarkovModels.jl/stable/debugging/#Numerical-underflow> of the docs, where I give a few tips to solve it.
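For reference, a minimal sketch of retrieving that vector (assuming the two-return-value form mentioned above; the guess model and data below are placeholders, not the use case from this thread):

```julia
using Distributions, HiddenMarkovModels

# Placeholder guess and data, only to illustrate the return values.
hmm_guess = HMM([0.5, 0.5], [0.9 0.1; 0.1 0.9], [Normal(-1.0), Normal(1.0)])
obs_seq = randn(200)

hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq)
loglikelihood_evolution  # one value per iteration, no need to print inside the loop
```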
My use case is one in which the transition matrices vary over time (and there is a large number of observations). Maybe that is why the guarantee that the loglikelihood will increase does not hold. The difference between iterations is really substantial (the values go roughly 9000, 4000, 2000, 5000, 17000, 29000, etc.). Unfortunately I cannot share my example right now, but I can create an artificial case to show you.

The challenge arises because we actually want the model to estimate a state with zero values. So, even if I didn't need to run the Baum-Welch algorithm and already knew the transition matrices and distributions, the forward algorithm would still fail under these conditions.
Even with time-dependent matrices the loglikelihood should still increase. I'd love it if you could show me an artificial example to debug it together.

By zero values I'm assuming you mean the variances. If that is the case, I guess you should use a different distribution object (like a Dirac mass) to accurately describe these degenerate emissions.
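A quick illustration of that suggestion (assuming the Dirac distribution from Distributions.jl as the degenerate emission):

```julia
using Distributions

d = Dirac(0.0)   # point mass at 0 instead of Normal(0.0, 0.0)

logpdf(d, 0.0)   # 0.0: finite, unlike the +Inf density of a zero-variance Normal
logpdf(d, 1.0)   # -Inf: observation outside the support, as expected
```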
Can we schedule a meeting so that I can show you the problem? Maybe tomorrow at 10 am (GMT-4)?
10:30 AM GMT-4 (NYC time) is perfect for me. Can you send me an invite to my email, found here (https://gdalle.github.io/)?
Perfect! Just sent the invitation.
Hello Guillaume,

Did you receive the invitation at the correct email? If not, here is the link to the meeting: meet.google.com/nux-eyym-szk
Customizable Stopping Criterion:
Previously, the package only allowed stopping when the log-likelihood stops increasing. I've introduced a new argument stopping_criterion that accepts two symbols (see the sketch below):
- :convergence: maintains the original behavior of stopping when the log-likelihood stops increasing.
- :stability: introduces a new criterion where the algorithm stops when the log-likelihood stops changing within a given margin (this is important in my use case).
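A minimal sketch of the two rules (the function name, argument order, and tolerance handling below are illustrative, not the implementation proposed in this PR):

```julia
# Hypothetical helper contrasting the two stopping rules described above.
function should_stop(logL_prev::Real, logL::Real, atol::Real, criterion::Symbol)
    if criterion === :convergence      # original behavior: stop once the increase is negligible
        return logL - logL_prev < atol
    elseif criterion === :stability    # new behavior: stop once the value stops changing at all
        return abs(logL - logL_prev) < atol
    else
        throw(ArgumentError("unknown stopping criterion: $criterion"))
    end
end

should_stop(100.0, 99.5, 1e-6, :convergence)  # true: the loglikelihood went down, so it "converged"
should_stop(100.0, 99.5, 1e-6, :stability)    # false: the value is still changing, keep iterating
```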
Handling Infinite Log-Likelihoods:
To address a bug where infinite log-likelihood values would break the code, I added logic to handle these cases by replacing infinite values with the log of the largest representable float number. This prevents the process from crashing while still allowing it to continue with large values.