Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the NXdata scaling_factor and offset fields #1333

Closed
wants to merge 4 commits into from

Conversation

phyy-nx
Copy link
Contributor

@phyy-nx phyy-nx commented Nov 21, 2023

Closes #1332

Also adds clarification in NXmx that these fields can be used as pedestal and gain correction fields, as well as elaborates on the possible rank options. These rank options were implied (in my opinion) in the original wording, but in NXmx I made it more explicit.

Tagging @biochem-fan for reference

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Nov 21, 2023

I purport this doesn't need a vote as it doesn't change functionality, and the change to NXmx only clarifies functionality that was already there. Feedback welcome!

@phyy-nx phyy-nx added this to the NXDL 2023.10 milestone Nov 21, 2023
@biochem-fan
Copy link

biochem-fan commented Nov 22, 2023

@phyy-nx

The elements in data are usually float values really. For efficiency reasons these are usually stored as integers after scaling with a scale factor. This value is the scale factor. It is required to get the actual physical value, when necessary.

The original description for the gain sounds like stored_integer_value = physical_value * the_scale_factor, making the definition of the gain the same as in MX (i.e. value per photon). But your new description is the other way round. Did you check the interpretation at the NIAC?

I know you confirmed the order of the gain and the offset, but I don't know if the definition of the gain was confirmed. This post is just to double check it.

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Nov 22, 2023

Ack, I think you might be right and I misinterpreted this field!

The elements in data are usually float values really. For efficiency reasons these are usually stored as integers after scaling with a scale factor. This value is the scale factor. It is required to get the actual physical value, when necessary.

So I think I now agree this is saying the stored value has already had the scaling factor applied. But that reads differently from offset:

An optional offset to apply to the values in data.

That implies the offset hasn't been implied yet!

So, @nexusformat/developers, how do you use scaling_factor and offset in NXdata, if at all?

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Nov 30, 2023

Bump! @nexusformat/developers :)

How do you use scaling_factor and offset in NXdata, if at all?

@PeterC-DLS
Copy link
Contributor

I don't use these fields but it seems that if you want to document the stored values then stored_value = (physical_value - offset) * scaling_factor may be best for the pedestal use but other people want to use offset as a bias: stored_value = physical_value * scaling_factor + offset. So two more interpretations to choose from!

@benajamin
Copy link
Contributor

I have always interpreted these fields as things that need to be done to the stored value in order to get to the physical value. I would do this as:
physical value = stored value * scaling_factor + offset

@woutdenolf
Copy link
Contributor

I second #1333 (comment) by @benajamin with the exception that I always think about stored value vs. plotted value (meaning the coordinate in the plotting coordinate system) since NXdata is meant to represent "data to be plotted"

plotted value = stored value * scaling_factor + offset

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Jan 3, 2024

Ok I like switching the discussion to "stored" vs. "plotted". Based on that, just to recap without math, there are two general ways these can be interpreted:

  1. You, the person plotting the data, must apply scaling_factor and offset before you plot the data
  2. I, the data preparer, have already applied scaling_factor and offset for you, and I'm noting the values I used

I hope the answer is 1. to match @biochem-fan's use case, but it seems to line up with some of the comments above.

I do note that both @benajamin and @woutdenolf switched the order of operations to match the "bias" version that @PeterC-DLS suggested, compared to the pedestal version that I originally suggested:

a. plotted_value = physical_value * scaling_factor + offset
b. plotted_value = (physical_value + offset) * scaling_factor

a. makes more intuitive sense to me as it more closely matches the equation of a line y=mx+b. But the consequence is that for our MX data that we need to store "gain-corrected" pedestals... But that's not relevant for @biochem-fan's use case (no pedestal I believe), so maybe a. is fine?

Change NXdata scaling_factor to refer to "plotted" data
Change NXdata to refer to "corrected" data, in addition to "physical" data, since it describes units of photons
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Jan 3, 2024

I've made the change to use the "plotted" nomenclature for NXdata, but I've kept the pedestal formula for now, until we get a bit more discussion here (plotted_value = (physical_value + offset) * scaling_factor). Mainly because it's I'm coming from an MX background where pedestal is applied first in the use cases I know about. But I can be convinced otherwise! Especially if folks already have code with the other convention (plotted_value = physical_value * scaling_factor + offset)

@PeterC-DLS
Copy link
Contributor

LGTM. For clarity's sake in your comments, I think stored_value is better than physical_value as you may be plotting the "physical" value.

@PeterC-DLS PeterC-DLS self-requested a review January 8, 2024 14:30
@phyy-nx phyy-nx removed this from the NXDL 2023.10 milestone Jan 17, 2024
This resolves ambiguity if there is more than one signal

For NXmx specify data_scaling_factor and data_offset since the field data is named in the NXdata group
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Jan 18, 2024

Feedback from Telco addressed. If there's no more comments we'll send this to a vote.

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Jan 23, 2024

Hello, please vote by providing an emoji on this comment. Thanks.

@prjemian
Copy link
Contributor

Not happy with the term pedestal in this context. Is it the same as offset?

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Jan 23, 2024

@prjemian pedestal, used only for NXmx, does mean offset, and I assumed it was a known term in x-ray diffraction land. I could be convinced that gain and pedestal should be defined more clearly, but that could be done in a separate, not-voted on PR, since it would only be a documentation clarification.

@prjemian
Copy link
Contributor

Since offset is the term that has been used with NXdata, let's not change it in this PR.

@prjemian
Copy link
Contributor

image

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Jan 23, 2024

Wait it's not being changed in the PR. For NXdata it's FIELDNAME_scaling_factor and FIELDNAME_offset, and for NXmx is data_scaling_factor and data_offset.

Pedestal is just a documentation term. Does that help?


This formula will derive the corrected value, when necessary.

Use these fields to specify gain and/or pedestal constants that need to be applied
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use these fields to specify gain and/or pedestal constants that need to be applied
Use these fields to specify gain and/or offset constants that need to be applied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add more clarification in a different PR after this vote.

to the data to correct it to physical values. For example, if the detector gain
is 10 counts per photon and a constant background of 400 needs to be subtracted
off the pixels, specify data_scaling_factor as 0.1 and data_offset as -400 to
specifiy the required conversion from raw counts to pedestal-corrected photons. It
Copy link
Contributor

@prjemian prjemian Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
specifiy the required conversion from raw counts to pedestal-corrected photons. It
specify the required conversion from raw counts to offset-corrected photons. It

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add more clarification and fix the typo in a different PR after this vote.

@biochem-fan
Copy link

Depending on the field, various terms are used: "offset", "pedestal" and "bias". For example, ThermoFisher's electron detectors use "bias".

Perhaps keeping "offset" for NXdata is a good idea but I would prefer to have other term mentioned as well in the NXmx documentation for better searchability.

@PeterC-DLS
Copy link
Contributor

Should reserved suffixes be updated too?

@prjemian
Copy link
Contributor

prjemian commented Jan 31, 2024 via email

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Feb 5, 2024

Happy to modify reserved suffixes in a different PR after this vote.


.. code-block::

plotted_data = (data + offset) * scaling_factor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly disagree with the use of "plotted_data" and "data" in this equation. Firstly, "data" is ambiguous and should be something like "stored values", "dataset values", or "recorded values". Secondly "plotted data" makes no sense because nothing is plotted (past tense) at the time when one is considering this equation and it is also meaningless because one could plot any old values. The NeXus manual says that we strive to record physically meaningful values - this equation is to be used when we do not record physically meaningful values and so its purpose should be to convert to physically meaningful values. Therefore, I would argue that "physical values" should definitely be used instead of "plotted values".

</doc>
</field>

<field name="scaling_factor" type="NX_FLOAT" deprecated="Use FIELDNAME_scaling_factor instead">
<doc>
Due to scaling_factor being ambiguous in the case of multiple signals, use

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of points (both minor)

  1. I suggest having some throw-away comment describing the intended semantics (e.g., "Had similar semantics to FIELDNAME_scaling_factor"). This is to allow someone reading the spec to understand how to interpret existing data, where scaling_factor has already been used.

    This comment also applies to the offset field.

  2. Since the ambiguity comes in the case where multiple signals are present (per the proposed doc), I'm assuming the ambiguity doesn't exist if there is a single signal. For single signal data, is scaling_factor still deprecated? I would imagine so, but perhaps the wording could be made more explicit; for example, by making the statement "use FIELDNAME_scaling_factor instead" a separate sentence, perhaps qualifying it by saying something like "all future data should use ..." .

    This comment also applies to the offset field.

I consider neither comment blocking

@paulmillar
Copy link

LGTM 😄

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Feb 6, 2024

Vote did not pass (got 12 votes, needed 13 for quorum), which is fine given all the discussion. For the sake of clarity I'm closing this PR, removing it and the associated issue from the milestone, and I will make a new PR that addresses the feedback here. We'll try again then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rework NXdata scaling_factor and offset
7 participants