
Remove sorted_highlow from form_discrete_distribution call in ppi_distribution_label_shift_ci. #15

Merged
2 commits merged into aangelopoulos:main on Oct 27, 2024

Conversation

justinkay
Contributor

Addresses #14.
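For context, the fix is a one-line change inside ppi_distribution_label_shift_ci, roughly the following (a sketch only; the variable name dist and the labels argument are placeholders, not the exact library code):

# before: the empirical label distribution was re-ordered from most to least frequent class
dist = form_discrete_distribution(labels, sorted_highlow=True)

# after: keep the natural class ordering, so index k lines up with class k
dist = form_discrete_distribution(labels)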

@aangelopoulos
Owner

Just checking - have you tested this for the plankton example, and also on an example where the classes are not pre-sorted?

@justinkay
Contributor Author

I've checked this on the plankton example as well as a 'reversed' version of the plankton example where I swap the frequencies of the 0 and 1 classes:

Y = ~Y
Yhat = ~Yhat
Y_unlabeled = ~Y_unlabeled
Yhat_unlabeled = ~Yhat_unlabeled
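(For boolean arrays, ~ flips every label, which swaps the two class frequencies; if the labels were stored as 0/1 integers instead, the equivalent swap would be:)

Y = 1 - Y
Yhat = 1 - Yhat
Y_unlabeled = 1 - Y_unlabeled
Yhat_unlabeled = 1 - Yhat_unlabeled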

The resulting plots are:

Original plankton problem, no code change (sorted_highlow=True)
[figure: ppi_plankton_sorted=True]

Original plankton problem, with proposed code change (sorted_highlow=False)
[figure: ppi_plankton_sorted=False]

So, the original results are unaffected by the change in this PR.

Reversed distribution, no code change (sorted_highlow=True)
[figure: ppi_plankton-reverse_sorted=True]

This estimate fails given the unchanged code.

Reversed distribution, with proposed code change (sorted_highlow=False)
[figure: ppi_plankton-reverse_sorted=False]

The change in this PR results in a reasonable estimate.

However, I have not tested this for other cases, for example where the number of classes is > 2. I want to make sure I understand the original intention of sorting the discrete distribution by class frequency, so that this change doesn't break something else in the future. Do you remember why this was originally done? Are there any other tests you can think of that I should try? Happy to add to the pytests as appropriate.
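For example, something along these lines, with class frequencies deliberately not sorted high-to-low by class index (a sketch only -- the exact call signature of ppi_distribution_label_shift_ci is assumed here and would need to be matched to the real API, e.g. by following the plankton example):

import numpy as np
from ppi_py import ppi_distribution_label_shift_ci


def test_label_shift_ci_with_unsorted_classes():
    rng = np.random.default_rng(0)
    K = 3
    p = np.array([0.2, 0.5, 0.3])  # class 1 is most common, class 0 least
    n, N = 2000, 20000

    # Labeled and unlabeled data drawn from the same label distribution,
    # with a noisy "classifier" that is correct 80% of the time.
    Y = rng.choice(K, size=n, p=p)
    Yhat = np.where(rng.random(n) < 0.8, Y, rng.integers(0, K, size=n))
    Y_unlabeled = rng.choice(K, size=N, p=p)
    Yhat_unlabeled = np.where(
        rng.random(N) < 0.8, Y_unlabeled, rng.integers(0, K, size=N)
    )

    # nu picks out the frequency of class 1.
    nu = np.zeros(K)
    nu[1] = 1.0

    # NOTE: signature assumed -- adapt to the actual ppi_py API.
    ci = ppi_distribution_label_shift_ci(Y, Yhat, Yhat_unlabeled, K, nu, alpha=0.1)

    # The interval should cover the true frequency of class 1.
    assert ci[0] <= p[1] <= ci[1]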

@aangelopoulos
Owner

I think it's probably a relic from the way I used to run this experiment. It was in a notebook before it was in ppi_py, and in that notebook, there was some code for truncating the matrix to the first [K:,K:] submatrix or something (these are the K most common classes).
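(For anyone reading later, the kind of notebook-era trick being described is roughly the following; purely illustrative, not the original code:)

import numpy as np

rng = np.random.default_rng(0)
num_classes, K = 5, 3
Y = rng.integers(0, num_classes, size=1000)     # true labels
Yhat = rng.integers(0, num_classes, size=1000)  # predicted labels

# Order classes from most to least common, re-index the confusion matrix
# accordingly, and keep only the block for the K most common classes.
order = np.argsort(-np.bincount(Y, minlength=num_classes))
C = np.zeros((num_classes, num_classes))
np.add.at(C, (Y, Yhat), 1)
C_topK = C[np.ix_(order, order)][:K, :K]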

I don't see any way this could go wrong. If you could put a little synthetic data example as a pytest with more than 2 classes, go for it. Then once we confirm that it works, I'll test it too, and we'll merge.

You're a hero @justinkay thank you <3

@aangelopoulos
Owner

aangelopoulos commented Sep 23, 2024

Hey @justinkay --- sorry I lost track of this --- do you want to write these tests, or should I write them and merge?

@justinkay
Contributor Author

Hey @aangelopoulos sorry for the delay -- this keeps slipping down my to-do list. Happy to write a couple of tests for good measure, but probably won't get to it until after ECCV.

aangelopoulos merged commit cb245e0 into aangelopoulos:main on Oct 27, 2024 (1 check failed)
@aangelopoulos
Owner

Merged! :)

@aangelopoulos
Owner

Thanks for all your help @justinkay !
