ppi_distribution_label_shift_ci Exception? #16

Open
kitkhai opened this issue Jul 30, 2024 · 2 comments


kitkhai commented Jul 30, 2024

Hi, I was playing around with the ppi_distribution_label_shift_ci function, supplying dummy values, when I encountered an exception. I'm not sure I defined the nu vector correctly, as I'm not sure what it is for or how to define it; I would appreciate it if you could clarify that as well. Thank you!

import numpy as np
from ppi_py import ppi_distribution_label_shift_ci

# True labels
Y = np.array([0, 1, 0, 1, 0])

# Predicted labels for labeled data
Yhat = np.array([0, 1, 1, 1, 0])

# Predicted labels for unlabeled data
Yhat_unlabeled = np.array([0, 0, 1, 1, 1, 0, 1])

# Number of classes
K = 2

nu = np.array([0, 1])

# Calling the function
result = ppi_distribution_label_shift_ci(Y, Yhat, Yhat_unlabeled, K, nu)

ValueError Traceback (most recent call last)
in <cell line: 19>()
17
18 # Calling the function
---> 19 result = ppi_distribution_label_shift_ci(Y, Yhat, Yhat_unlabeled, K, nu)
20 print("Confidence Interval for class 1 probability:", result)

4 frames
/usr/local/lib/python3.10/dist-packages/ppi_py/ppi.py in ppi_distribution_label_shift_ci(Y, Yhat, Yhat_unlabeled, K, nu, alpha, delta, return_counts)
1206 budget_split = 0.999999
1207 epsilon1 = max(
-> 1208 [
1209 linfty_binom(C.sum(axis=0)[k], K, budget_split * delta, Ahat[:, k])
1210 for k in range(K)

/usr/local/lib/python3.10/dist-packages/ppi_py/ppi.py in <listcomp>(.0)
1207 epsilon1 = max(
1208 [
-> 1209 linfty_binom(C.sum(axis=0)[k], K, budget_split * delta, Ahat[:, k])
1210 for k in range(K)
1211 ]

/usr/local/lib/python3.10/dist-packages/ppi_py/utils/statistics_utils.py in linfty_binom(N, K, alpha, qhat)
111 epsilon = 0
112 for k in np.arange(K):
--> 113 bci = binomial_iid(N, alpha / K, qhat[k])
114 epsilon = np.maximum(epsilon, np.abs(bci - qhat[k]).max())
115 return epsilon

/usr/local/lib/python3.10/dist-packages/ppi_py/utils/statistics_utils.py in binomial_iid(N, alpha, muhat)
99 return binom.cdf(N * muhat, N, mu) - (1 - alpha / 2)
100
--> 101 u = brentq(invert_upper_tail, 0, 1)
102 l = brentq(invert_lower_tail, 0, 1)
103 return np.array([l, u])

/usr/local/lib/python3.10/dist-packages/scipy/optimize/_zeros_py.py in brentq(f, a, b, args, xtol, rtol, maxiter, full_output, disp)
804 raise ValueError(f"rtol too small ({rtol:g} < {_rtol:g})")
805 f = _wrap_nan_raise(f)
--> 806 r = _zeros._brentq(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
807 return results_c(full_output, r, "brentq")
808

ValueError: f(a) and f(b) must have different signs
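
For readers of the traceback: a bracketing root finder such as brentq needs the function to take opposite signs at the two ends of the interval, here 0 and 1. The sketch below uses only the standard library to check that condition at the endpoints. The function `upper_tail` is an assumption: the traceback shows only one tail function's return line (with `1 - alpha / 2`), and the other tail is guessed here to have the same shape with `alpha / 2`. The helper names `binom_cdf` and `has_sign_change` are hypothetical and not part of ppi_py.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed exactly (small n only)."""
    k_max = min(int(x), n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


def upper_tail(mu, n, muhat, alpha):
    # ASSUMED form: mirrors the return line visible in the traceback,
    # with alpha / 2 in place of 1 - alpha / 2.
    return binom_cdf(n * muhat, n, mu) - alpha / 2


def has_sign_change(f, a=0.0, b=1.0):
    """True if f(a) and f(b) have strictly opposite signs."""
    return f(a) * f(b) < 0


n, alpha = 5, 0.05
for muhat in (0.4, 1.0):
    ok = has_sign_change(lambda mu: upper_tail(mu, n, muhat, alpha))
    print(f"muhat={muhat}: sign change on [0, 1] -> {ok}")
```

Under this assumed form, an empirical frequency of exactly 1 makes the CDF equal to 1 at both endpoints, so the function is positive at both ends and the bracket has no sign change. Whether this is the actual cause of the error above would need to be confirmed against the ppi_py source.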

@aangelopoulos
Owner

This is because $n$ is so small that it's causing a numerical exception in the solver. Can you try it with much larger $n$?

kitkhai commented Jul 31, 2024

I tried with a larger n, but it still throws the same error:

import numpy as np
from ppi_py import ppi_distribution_label_shift_ci

# True labels
Y = np.array([1]*100000+[0]*100000)

# Predicted labels for labeled data
Yhat = np.array([1]*120000+[0]*80000)

# Predicted labels for unlabeled data
Yhat_unlabeled = np.array([1]*170000+[0]*30000)

# Number of classes
K = 2

nu = np.array([0, 1])

# Calling the function
result = ppi_distribution_label_shift_ci(Y, Yhat, Yhat_unlabeled, K, nu)
print("Confidence Interval for class 1 probability:", result)
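
One thing worth checking in the larger example is whether any estimated conditional frequency is exactly 0 or 1, since the traceback shows each entry of `Ahat[:, k]` being passed to `binomial_iid` as `muhat`. The sketch below tabulates the (true label, predicted label) pairs for the same data with the standard library only. How ppi_py builds `C` and normalizes it into `Ahat` is not visible in the traceback, so both normalizations are shown; `conditional_frequencies` is a hypothetical helper, not a ppi_py function.

```python
from collections import Counter

# Same data as the snippet above, written as plain Python lists.
Y = [1] * 100000 + [0] * 100000
Yhat = [1] * 120000 + [0] * 80000

# (true label, predicted label) -> number of occurrences
counts = Counter(zip(Y, Yhat))


def conditional_frequencies(counts, K, given="true"):
    """Empirical P(other = o | given = g), keyed by (g, o)."""
    freqs = {}
    for g in range(K):
        # Total count of samples whose conditioning variable equals g.
        total = sum(
            c for (y, yh), c in counts.items()
            if (y if given == "true" else yh) == g
        )
        for o in range(K):
            key = (g, o) if given == "true" else (o, g)
            freqs[(g, o)] = counts.get(key, 0) / total
    return freqs


by_true = conditional_frequencies(counts, K=2, given="true")  # P(Yhat | Y)
by_pred = conditional_frequencies(counts, K=2, given="pred")  # P(Y | Yhat)
print(by_true)
print(by_pred)
```

In this data every sample with Y = 1 is predicted as 1, and every sample predicted as 0 has Y = 0, so each normalization contains a frequency of exactly 1 no matter how large n is. If that boundary value is what trips the solver, data with some errors in every class would be a more informative test than simply increasing n.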
