[BUG] mypy errors due to missing value/variance via views #527

alexander-held · 2021-03-11T17:31:50Z

Describe the bug

When using .view() to access bin values / variances (as described in #421 (comment)), mypy flags this code with an error in boost-histogram 1.0.0.

An example is below:

test.py:

import boost_histogram as bh

h = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
h.fill([1,2,3])
h.view().value
h.view().variance

Output from mypy:

test.py:5: error: Item "ndarray" of "Union[ndarray, View]" has no attribute "value"  [union-attr]
    h.view().value
    ^
test.py:5: error: Item "View" of "Union[ndarray, View]" has no attribute "value"  [union-attr]
    h.view().value
    ^
test.py:6: error: Item "ndarray" of "Union[ndarray, View]" has no attribute "variance"  [union-attr]
    h.view().variance
    ^
test.py:6: error: Item "View" of "Union[ndarray, View]" has no attribute "variance"  [union-attr]
    h.view().variance
    ^
Found 4 errors in 1 file (checked 1 source file)

This may not be a bug and instead the result of improper usage. In this case, is there another way to access this information directly?

Steps to reproduce
see above, using mypy==0.812, boost-histogram==1.0.0

The text was updated successfully, but these errors were encountered:

henryiii · 2021-03-11T18:24:33Z

Three things:

The simplest way to get what you want is with h.values() and h.variances(). This works on all storages.
There is a typing bug; I should have made the return type Union[np.ndarray, WeightView, etc, since each of the view subclasses adds the interesting methods, they are not present in View.
This will still be a little inconvenient, as MyPy won't know which one might be returned. This could be fixed by making Histogram Generic. I considered this for 1.0, but ended up not attempting it. If it could be done in a nice, backward compatible way (ie, no failure if Histogram instead of Histogram[Double] was used), and especially if Histogram[Weight] could make storage default to Weight() instead of Double(), that would be very useful, I think (though pretty sure that would require Python 3.7+ to be implemented). So maybe an idea for 1.1 (if simple and backward compatible and Python 3.6 compatible) or 2.0 (if it requires dropping 3.6).

alexander-held · 2021-03-12T14:10:44Z

Thank you! h.values() and h.variances() work fine for reading, but not for writing:

import boost_histogram as bh
import numpy as np

hist = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
hist.view().value = np.asarray([5])  # works
hist.values() = np.asarray([3])  # does not work

I rely on overriding values, which is why I use the approach via .view() in a few places.

I found another type related issue (this is probably very related to point 3):

np.sqrt(hist.variances())

results in

test.py:8: error: Argument 1 to "__call__" of "ufunc" has incompatible type "Optional[ndarray]";
expected
"Union[Union[int, float, complex, str, bytes, generic], Sequence[Union[int, float, complex, str, bytes, generic]], Sequence[Sequence[Any]], _SupportsArray]"
 [arg-type]
    np.sqrt(hist.variances())
            ^
Found 1 error in 1 file (checked 1 source file)

I assume the issue is that mypy cannot know that .variances() exists for this weighted histogram. The sqrt works fine with .values() which has type numpy.ndarray. I can go with # type: ignore here, or is there another way to approach this?

henryiii · 2021-03-12T15:23:01Z

If we made Histogram Generic, then that would fix your issue - A weighted histogram would have np.ndarray instead of Optional[np.ndarray] here (the Optional is what "normal" histograms like Double / Int64 would have, it is based on how they were filled).

If you don't like the type: ignore, you can use type narrowing:

variances = hist.variances()
assert variances is not None
np.sqrt(variances)  # passes now, mypy knows it can't be None

Or, a simpler, less correct fix that would only work in Python 3.8+:

np.sqrt(v) if (v := hist.variances()) is not None else 0

MyPy is (correctly) asserting that you aren't handling the None case. The best fix would be to add the storage type to a Generic, so that it could learn that based on the storage.

henryiii · 2021-03-12T15:34:21Z

Thank you! h.values() and h.variances() work fine for reading, but not for writing

If #504 was implemented, this would actually work using h.values =. But in leu of that, this should work:

hist = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
hist.values()[...] = np.asarray([3])

The problem is the Python syntax, you can't assign to a function call. #504 fixes this by allowing you not to call it if you don't want to, but instead assigning directly to the FlowArray (and you could even assign to flow bins that way simply by including them in the shape, like you can do now for Histograms). It would also give us more control on whether something is settable; currently, setting the variance on a storage that computes the variance (like the Mean storages) will just silently continue, but not actually set anything. I could partially improve that by making the np.ndarray returned non-modifiable, but then weird things like v = h.variance(); v+=1 would fail. If we implemented #504, I could reverse this, which I think would be the correct usage. The "rendered" view would not be settable if it was pointing at boost-histogram memory; it would only be settable as a FlowArray property. This would be less surprising in the long run, and would give the user full control over making a copy only if they wanted to.

The "best" way to set anything, though, is to set everything at once as a stacked array. Usually you want to set the view's fields together anyway (chaining value and variance together), and you can do that directly:

h[...] = np.array(...)

The array should match the slice you give of the histogram (optionally +flow bins) in the first N dimensions, and should be N+1D, with the final dimension being 2 (value, variance). This works for the Mean storages too, just with 3 or 4 values as required.

henryiii · 2021-03-12T15:54:23Z

Current develop should be better, can you try that?

alexander-held · 2021-03-12T16:33:09Z

I updated to develop, the error message has changed but mypy cannot find .value and .variance for the views:

import boost_histogram as bh

h = bh.Histogram(bh.axis.Variable([0, 10]), storage=bh.storage.Weight())
h.fill([1,2,3])
h.view().value
h.view().variance

output:

test.py:5: error: Item "ndarray" of "Union[ndarray, WeightedSumView, WeightedMeanView, MeanView]" has no attribute "value"  [union-attr]
    h.view().value
    ^
test.py:6: error: Item "ndarray" of "Union[ndarray, WeightedSumView, WeightedMeanView, MeanView]" has no attribute "variance"  [union-attr]
    h.view().variance
    ^
Found 2 errors in 1 file (checked 1 source file)

henryiii · 2021-03-12T17:08:29Z

That's right - it doesn't know which item in the Union you have. So you could use type narrowing:

view = h.view()
assert isinstance(view, WeightedSumView)
view.value
view.variance

The error is now the correct one, and could be made more convenient by making Histogram Generic.

alexander-held · 2021-03-12T17:10:39Z

I see, this makes sense then.

henryiii mentioned this issue Mar 12, 2021

fix(types): better typing for view returns #530

Merged

henryiii closed this as completed Mar 12, 2021

alexander-held mentioned this issue Mar 16, 2021

feat: boost-histogram 1.0 compatibility and typing updates scikit-hep/cabinetry#204

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] mypy errors due to missing value/variance via views #527

[BUG] mypy errors due to missing value/variance via views #527

alexander-held commented Mar 11, 2021

henryiii commented Mar 11, 2021 •

edited

Loading

alexander-held commented Mar 12, 2021 •

edited

Loading

henryiii commented Mar 12, 2021

henryiii commented Mar 12, 2021 •

edited

Loading

henryiii commented Mar 12, 2021

alexander-held commented Mar 12, 2021

henryiii commented Mar 12, 2021

alexander-held commented Mar 12, 2021

[BUG] mypy errors due to missing value/variance via views #527

[BUG] mypy errors due to missing value/variance via views #527

Comments

alexander-held commented Mar 11, 2021

henryiii commented Mar 11, 2021 • edited Loading

alexander-held commented Mar 12, 2021 • edited Loading

henryiii commented Mar 12, 2021

henryiii commented Mar 12, 2021 • edited Loading

henryiii commented Mar 12, 2021

alexander-held commented Mar 12, 2021

henryiii commented Mar 12, 2021

alexander-held commented Mar 12, 2021

henryiii commented Mar 11, 2021 •

edited

Loading

alexander-held commented Mar 12, 2021 •

edited

Loading

henryiii commented Mar 12, 2021 •

edited

Loading