Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] mypy errors due to missing value/variance via views #527

Closed
alexander-held opened this issue Mar 11, 2021 · 8 comments
Closed

[BUG] mypy errors due to missing value/variance via views #527

alexander-held opened this issue Mar 11, 2021 · 8 comments

Comments

@alexander-held
Copy link
Member

Describe the bug

When using .view() to access bin values / variances (as described in #421 (comment)), mypy flags this code with an error in boost-histogram 1.0.0.

An example is below:

test.py:

import boost_histogram as bh

h = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
h.fill([1,2,3])
h.view().value
h.view().variance

Output from mypy:

test.py:5: error: Item "ndarray" of "Union[ndarray, View]" has no attribute "value"  [union-attr]
    h.view().value
    ^
test.py:5: error: Item "View" of "Union[ndarray, View]" has no attribute "value"  [union-attr]
    h.view().value
    ^
test.py:6: error: Item "ndarray" of "Union[ndarray, View]" has no attribute "variance"  [union-attr]
    h.view().variance
    ^
test.py:6: error: Item "View" of "Union[ndarray, View]" has no attribute "variance"  [union-attr]
    h.view().variance
    ^
Found 4 errors in 1 file (checked 1 source file)

This may not be a bug and instead the result of improper usage. In this case, is there another way to access this information directly?

Steps to reproduce
see above, using mypy==0.812, boost-histogram==1.0.0

@henryiii
Copy link
Member

henryiii commented Mar 11, 2021

Three things:

  1. The simplest way to get what you want is with h.values() and h.variances(). This works on all storages.
  2. There is a typing bug; I should have made the return type Union[np.ndarray, WeightView, etc, since each of the view subclasses adds the interesting methods, they are not present in View.
  3. This will still be a little inconvenient, as MyPy won't know which one might be returned. This could be fixed by making Histogram Generic. I considered this for 1.0, but ended up not attempting it. If it could be done in a nice, backward compatible way (ie, no failure if Histogram instead of Histogram[Double] was used), and especially if Histogram[Weight] could make storage default to Weight() instead of Double(), that would be very useful, I think (though pretty sure that would require Python 3.7+ to be implemented). So maybe an idea for 1.1 (if simple and backward compatible and Python 3.6 compatible) or 2.0 (if it requires dropping 3.6).

@alexander-held
Copy link
Member Author

alexander-held commented Mar 12, 2021

Thank you! h.values() and h.variances() work fine for reading, but not for writing:

import boost_histogram as bh
import numpy as np

hist = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
hist.view().value = np.asarray([5])  # works
hist.values() = np.asarray([3])  # does not work

I rely on overriding values, which is why I use the approach via .view() in a few places.

I found another type related issue (this is probably very related to point 3):

np.sqrt(hist.variances())

results in

test.py:8: error: Argument 1 to "__call__" of "ufunc" has incompatible type "Optional[ndarray]";
expected
"Union[Union[int, float, complex, str, bytes, generic], Sequence[Union[int, float, complex, str, bytes, generic]], Sequence[Sequence[Any]], _SupportsArray]"
 [arg-type]
    np.sqrt(hist.variances())
            ^
Found 1 error in 1 file (checked 1 source file)

I assume the issue is that mypy cannot know that .variances() exists for this weighted histogram. The sqrt works fine with .values() which has type numpy.ndarray. I can go with # type: ignore here, or is there another way to approach this?

@henryiii
Copy link
Member

If we made Histogram Generic, then that would fix your issue - A weighted histogram would have np.ndarray instead of Optional[np.ndarray] here (the Optional is what "normal" histograms like Double / Int64 would have, it is based on how they were filled).

If you don't like the type: ignore, you can use type narrowing:

variances = hist.variances()
assert variances is not None
np.sqrt(variances)  # passes now, mypy knows it can't be None

Or, a simpler, less correct fix that would only work in Python 3.8+:

np.sqrt(v) if (v := hist.variances()) is not None else 0

MyPy is (correctly) asserting that you aren't handling the None case. The best fix would be to add the storage type to a Generic, so that it could learn that based on the storage.

@henryiii
Copy link
Member

henryiii commented Mar 12, 2021

Thank you! h.values() and h.variances() work fine for reading, but not for writing

If #504 was implemented, this would actually work using h.values =. But in leu of that, this should work:

hist = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
hist.values()[...] = np.asarray([3])

The problem is the Python syntax, you can't assign to a function call. #504 fixes this by allowing you not to call it if you don't want to, but instead assigning directly to the FlowArray (and you could even assign to flow bins that way simply by including them in the shape, like you can do now for Histograms). It would also give us more control on whether something is settable; currently, setting the variance on a storage that computes the variance (like the Mean storages) will just silently continue, but not actually set anything. I could partially improve that by making the np.ndarray returned non-modifiable, but then weird things like v = h.variance(); v+=1 would fail. If we implemented #504, I could reverse this, which I think would be the correct usage. The "rendered" view would not be settable if it was pointing at boost-histogram memory; it would only be settable as a FlowArray property. This would be less surprising in the long run, and would give the user full control over making a copy only if they wanted to.

The "best" way to set anything, though, is to set everything at once as a stacked array. Usually you want to set the view's fields together anyway (chaining value and variance together), and you can do that directly:

h[...] = np.array(...)

The array should match the slice you give of the histogram (optionally +flow bins) in the first N dimensions, and should be N+1D, with the final dimension being 2 (value, variance). This works for the Mean storages too, just with 3 or 4 values as required.

@henryiii
Copy link
Member

Current develop should be better, can you try that?

@alexander-held
Copy link
Member Author

I updated to develop, the error message has changed but mypy cannot find .value and .variance for the views:

import boost_histogram as bh

h = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
h.fill([1,2,3])
h.view().value
h.view().variance

output:

test.py:5: error: Item "ndarray" of "Union[ndarray, WeightedSumView, WeightedMeanView, MeanView]" has no attribute "value"  [union-attr]
    h.view().value
    ^
test.py:6: error: Item "ndarray" of "Union[ndarray, WeightedSumView, WeightedMeanView, MeanView]" has no attribute "variance"  [union-attr]
    h.view().variance
    ^
Found 2 errors in 1 file (checked 1 source file)

@henryiii
Copy link
Member

That's right - it doesn't know which item in the Union you have. So you could use type narrowing:

view = h.view()
assert isinstance(view, WeightedSumView)
view.value
view.variance

The error is now the correct one, and could be made more convenient by making Histogram Generic.

@alexander-held
Copy link
Member Author

I see, this makes sense then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants