Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify feature bounds using np.nan instead of masking #283

Merged
merged 13 commits into from
May 13, 2024

Conversation

jdtsmith
Copy link
Contributor

@jdtsmith jdtsmith commented May 9, 2024

Partially masked arrays are not really well supported, and required elaborate workarounds for pretty printing. Here we switch to using a structured array type (val, min, max fields), and represent "fixed" bounds with a pair of np.nan values for min and max, instead of masking them. I.e. nan is interpreted as the same as negative or positive infinity, unless both bounds are nan, which indicates a fixed parameter.

"Fully open" (unconstrained) bounds are expressed just as before, with (-np.inf, np.inf). But in general: don't do those ;).

Partially masked arrays are not well supported, and required elaborate
workaround for pretty printing.  Here we use a structured array type,
and represent "fixed" bounds with np.nan instead of masking.  I.e. non
is interpreted as the same as negative or positive infinity, unless
both bound are nan, which indicates a fixed parameter.
@jdtsmith jdtsmith changed the base branch from master to dev May 9, 2024 13:08
jdtsmith added 4 commits May 9, 2024 09:14
A normal masked column using structured numpy arrays and nan for both
bounds when fixed now suffices.
Copy link
Contributor

@drvdputt drvdputt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully support the main concept of this pull request. The ('val' 'min' 'max') datatype feels nice and explicit. Much better than the cryptic 3-tuples from before (where you always had to index on 0, 1, or 2 to addres the needed value)

There are some mistakes, but I fixed them myself to see how the code would continue.

I got it working up to where the Features table is loaded. But the new data type and way of addressing the values, makes it so the code in base.py no longer works. I /could/ try to fix the base.py code. But would very much like to avoid this, as we will remove it soon...

pahfit/features/features.py Outdated Show resolved Hide resolved
pahfit/features/features.py Outdated Show resolved Hide resolved
pahfit/features/features.py Show resolved Hide resolved
pahfit/features/features.py Outdated Show resolved Hide resolved
@jdtsmith
Copy link
Contributor Author

jdtsmith commented May 9, 2024

While working on the representation, I also stamped out the dtype printing on __repr__ (which is oddly on by default), and allow now a meta['pahfit_format'] on the table and/or individual columns, to affect the printed representation. E.g.

f.meta['pahfit_format']='.2f'
f['wavelength'].meta['pahfit_format']='.3f'

yielding:

...
    H2_S(7)     H2_lines           line              --             --  5.511 (fixed) 0.00 (0.00, ∞)              --              --       --
    H2_S(6)     H2_lines           line              --             --  6.109 (fixed) 0.00 (0.00, ∞)              --              --       --
    H2_S(5)     H2_lines           line              --             --  6.909 (fixed) 0.00 (0.00, ∞)              --              --       --
    H2_S(4)     H2_lines           line              --             --  8.026 (fixed) 0.00 (0.00, ∞)              --              --       --
...

To be documented somewhere.

@drvdputt
Copy link
Contributor

drvdputt commented May 9, 2024

It looks like all of this is working as intended. At least until the actual until the Features -> astropy model part.

We cannot really make all the tests succeed for this, without revising all the pieces of code that directly address the Features table (I tried to make things work for a bit, but there are more of these than I expected). Especially, I cannot justify revising all of base.py just to make the tests work, since I aim to delete it soon, when replaced by APFitter.

If we merge this, then we will have to continue in this broken state until I have integrated and adapted APFitter.

I suggest we merge the units pull request first (the extended version I just suggested, #284 ). In that case, the tests are still working, at least on my machine. And the plot rework #281 too.

After that, we can do the big break: merge this, and then include my internal API rework (which will be easier to adapt to the new Features format).

@jdtsmith
Copy link
Contributor Author

jdtsmith commented May 9, 2024

This sounds good. What type of updates to APFitter do you mean? Features is still accessible in the same way, and in fact p[0] and p['val'] both work.

@drvdputt
Copy link
Contributor

drvdputt commented May 9, 2024

Things that do not work anymore

  • access like this: f['fwhm'][:, 0] now has to be f['fwhm']['val']. So I expect there to be more cases where not all the ways of accessing still work.
  • access to the .mask attribute does not seem to work in some cases

The new columns are a structured array, instead of a regular 2D array, so there are some restrictions.

@jdtsmith
Copy link
Contributor Author

jdtsmith commented May 9, 2024

OK. Overall I think the structured array is much cleaner. I intend to merge your extend units PR #284 once finalized, then #281, then this #283.

@jdtsmith
Copy link
Contributor Author

jdtsmith commented May 9, 2024

Also do not worry about soon-to-be-forgotten base.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants