Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mistake: Cost New #8

Open
theovincent opened this issue Mar 20, 2022 · 0 comments
Open

Mistake: Cost New #8

theovincent opened this issue Mar 20, 2022 · 0 comments

Comments

@theovincent
Copy link

theovincent commented Mar 20, 2022

Hi, I am currently working on the paper that you published with this repository.

I am trying to add an example of your work in the galery of examples of ruptures. See the PR here.

I think I spot a mistake that would change all the results. I would like to know if what I think is true @YKatser.
Let's take the case of Window Ensemble. Going through the code I realized that the object CostNew() is only initialized once before going through the entire dataset. For example, in the notebook TEP_experiment_with_NAB_metric there is:

%%time
cost = CostNew()  # !!! Initialized only one !!!
table1 = []

for n in tnrange(1, NUM_CPDE, desc='agg functions loop'):
    for w in tqdm_notebook([10, 20, 30], desc='width loop', leave=False):
        table1.append(windowEnsemble(cost=cost, data=test, num_agg_func=n, width=w))
        clear_output()

This seems not to be a problem since you call the method cost.fit() for each new signal.
The main problem is coming from the following loop in CostNew().fit():

# Mahalanobis metric if self.metric is None
if self.metric is None:
    covar = np.cov(s_.T)
    self.metric = inv(
        covar.reshape(1, 1) if covar.size == 1 else covar)

This means that for the first signal you go through, self.metric will be none. It will be changed and fixed for all the other signal coming afterwards. It is a shame since you have shown that Mahalanobis is the best performing single cost. This is the reason why they have added the variable self.has_custom_metric in the package ruptures. This was a fix made by a PR.

Here is what I got by initializing CostNew() for each signal:
Screenshot from 2022-03-20 11-33-16

As a reminder, this is the former results that you have in the paper:
Screenshot from 2022-03-20 11-35-33

Here, you can see that the best performing aggregation function and scaling function have changed.

As I was curious to see what would be the results for the other experiments, I run the experiments again with this fix in a fork.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

@theovincent theovincent changed the title Mistake on Cost New Mistake: Cost New Mar 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant