Rewrite the NXdata scaling_factor and offset fields #1333

phyy-nx · 2023-11-21T22:54:17Z

Also adds clarification in NXmx that these fields can be used as pedestal and gain correction fields, as well as elaborates on the possible rank options. These rank options were implied (in my opinion) in the original wording, but in NXmx I made it more explicit.

Tagging @biochem-fan for reference

phyy-nx · 2023-11-21T22:54:56Z

I'd argue this doesn't need a vote, as it doesn't change functionality, and the change to NXmx only clarifies functionality that was already there. Feedback welcome!

biochem-fan · 2023-11-22T00:03:56Z

@phyy-nx

The elements in data are usually float values really. For efficiency reasons these are usually stored as integers after scaling with a scale factor. This value is the scale factor. It is required to get the actual physical value, when necessary.

The original description for the gain sounds like stored_integer_value = physical_value * the_scale_factor, making the definition of the gain the same as in MX (i.e. value per photon). But your new description is the other way round. Did you check the interpretation at the NIAC?

I know you confirmed the order of the gain and the offset, but I don't know if the definition of the gain was confirmed. This post is just to double check it.

phyy-nx · 2023-11-22T00:41:26Z

Ack, I think you might be right and I misinterpreted this field!

The elements in data are usually float values really. For efficiency reasons these are usually stored as integers after scaling with a scale factor. This value is the scale factor. It is required to get the actual physical value, when necessary.

So I think I now agree this is saying the stored value has already had the scaling factor applied. But that reads differently from offset:

An optional offset to apply to the values in data.

That implies the offset hasn't been applied yet!

So, @nexusformat/developers, how do you use scaling_factor and offset in NXdata, if at all?

phyy-nx · 2023-11-30T19:32:00Z

Bump! @nexusformat/developers :)

How do you use scaling_factor and offset in NXdata, if at all?

PeterC-DLS · 2023-12-11T12:53:02Z

I don't use these fields but it seems that if you want to document the stored values then stored_value = (physical_value - offset) * scaling_factor may be best for the pedestal use but other people want to use offset as a bias: stored_value = physical_value * scaling_factor + offset. So two more interpretations to choose from!
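
To make the two writer-side conventions above concrete, here is a minimal Python sketch. The function names are hypothetical and neither convention is confirmed here as the NeXus definition; the point is only that each one needs a different inverse on the reader side, so a file written under one convention and read under the other gives wrong values.

```python
def store_pedestal(physical_value, scaling_factor, offset):
    # Pedestal convention: stored_value = (physical_value - offset) * scaling_factor
    return (physical_value - offset) * scaling_factor


def store_bias(physical_value, scaling_factor, offset):
    # Bias convention: stored_value = physical_value * scaling_factor + offset
    return physical_value * scaling_factor + offset


def recover_pedestal(stored_value, scaling_factor, offset):
    # Inverse of the pedestal convention
    return stored_value / scaling_factor + offset


def recover_bias(stored_value, scaling_factor, offset):
    # Inverse of the bias convention
    return (stored_value - offset) / scaling_factor
```

With the same inputs the two conventions store different numbers, which is why the documentation has to pick one.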

benajamin · 2023-12-20T15:55:57Z

I have always interpreted these fields as things that need to be done to the stored value in order to get to the physical value. I would do this as:
physical value = stored value * scaling_factor + offset

woutdenolf · 2023-12-20T16:07:52Z

I second #1333 (comment) by @benajamin with the exception that I always think about stored value vs. plotted value (meaning the coordinate in the plotting coordinate system) since NXdata is meant to represent "data to be plotted"

plotted value = stored value * scaling_factor + offset

phyy-nx · 2024-01-03T00:52:56Z

Ok I like switching the discussion to "stored" vs. "plotted". Based on that, just to recap without math, there are two general ways these can be interpreted:

1. You, the person plotting the data, must apply scaling_factor and offset before you plot the data
2. I, the data preparer, have already applied scaling_factor and offset for you, and I'm noting the values I used

I hope the answer is 1. to match @biochem-fan's use case, but it seems to line up with some of the comments above.

I do note that both @benajamin and @woutdenolf switched the order of operations to match the "bias" version that @PeterC-DLS suggested, compared to the pedestal version that I originally suggested:

a. plotted_value = physical_value * scaling_factor + offset
b. plotted_value = (physical_value + offset) * scaling_factor

a. makes more intuitive sense to me, as it more closely matches the equation of a line, y = mx + b. But the consequence is that for our MX data we would need to store "gain-corrected" pedestals... But that's not relevant for @biochem-fan's use case (no pedestal, I believe), so maybe a. is fine?
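
The two orderings differ only in how the offset is expressed. As a minimal sketch (function names are hypothetical, and `value` stands for whatever ends up on the right-hand side of the formula), the following shows that a pedestal meant for formula b. gives the same result under formula a. once it is multiplied by the scaling factor. That product is the "gain-corrected" pedestal mentioned above.

```python
def apply_a(value, scaling_factor, offset):
    # a. scale first, then add the offset (like y = m*x + b)
    return value * scaling_factor + offset


def apply_b(value, scaling_factor, offset):
    # b. add the offset (pedestal) first, then scale
    return (value + offset) * scaling_factor


def offset_b_to_a(scaling_factor, offset_b):
    # (value + offset_b) * s == value * s + offset_b * s,
    # so the equivalent offset for formula a. is offset_b * s
    return offset_b * scaling_factor


# Illustrative numbers only
s, pedestal = 0.5, -400
same = apply_a(1400, s, offset_b_to_a(s, pedestal)) == apply_b(1400, s, pedestal)
```

So choosing a. does not lose any capability; it moves the gain multiplication of the pedestal from the reader to the writer.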

Change NXdata scaling_factor to refer to "plotted" data. Change NXdata to refer to "corrected" data, in addition to "physical" data, since it describes units of photons.

phyy-nx · 2024-01-03T01:07:23Z

I've made the change to use the "plotted" nomenclature for NXdata, but I've kept the pedestal formula for now (plotted_value = (physical_value + offset) * scaling_factor) until we get a bit more discussion here, mainly because I'm coming from an MX background where the pedestal is applied first in the use cases I know about. But I can be convinced otherwise, especially if folks already have code with the other convention (plotted_value = physical_value * scaling_factor + offset).

PeterC-DLS · 2024-01-08T14:30:20Z

LGTM. For clarity's sake in your comments, I think stored_value is better than physical_value as you may be plotting the "physical" value.

This resolves ambiguity if there is more than one signal. For NXmx, specify data_scaling_factor and data_offset, since the field data is named in the NXdata group.

phyy-nx · 2024-01-18T22:30:29Z

Feedback from Telco addressed. If there are no more comments we'll send this to a vote.

phyy-nx · 2024-01-23T00:10:19Z

Hello, please vote by providing an emoji on this comment. Thanks.

prjemian · 2024-01-23T03:13:43Z

Not happy with the term pedestal in this context. Is it the same as offset?

phyy-nx · 2024-01-23T18:19:23Z

@prjemian pedestal, used only for NXmx, does mean offset, and I assumed it was a known term in x-ray diffraction land. I could be convinced that gain and pedestal should be defined more clearly, but that could be done in a separate, not-voted on PR, since it would only be a documentation clarification.

prjemian · 2024-01-23T18:57:41Z

Since offset is the term that has been used with NXdata, let's not change it in this PR.

prjemian · 2024-01-23T18:59:36Z

phyy-nx · 2024-01-23T19:01:39Z

Wait, it's not being changed in the PR. For NXdata it's FIELDNAME_scaling_factor and FIELDNAME_offset, and for NXmx it's data_scaling_factor and data_offset.

Pedestal is just a documentation term. Does that help?

prjemian · 2024-01-23T19:03:24Z

applications/NXmx.nxdl.xml

+
+                        This formula will derive the corrected value, when necessary.
+
+                        Use these fields to specify gain and/or pedestal constants that need to be applied


Suggested change

- Use these fields to specify gain and/or pedestal constants that need to be applied
+ Use these fields to specify gain and/or offset constants that need to be applied

phyy-nx · Feb 5, 2024

Happy to add more clarification in a different PR after this vote.

prjemian · 2024-01-23T19:03:52Z

applications/NXmx.nxdl.xml

+                        to the data to correct it to physical values.  For example, if the detector gain
+                        is 10 counts per photon and a constant background of 400 needs to be subtracted
+                        off the pixels, specify data_scaling_factor as 0.1 and data_offset as -400 to
+                        specifiy the required conversion from raw counts to pedestal-corrected photons. It


Suggested change

- specifiy the required conversion from raw counts to pedestal-corrected photons. It
+ specify the required conversion from raw counts to offset-corrected photons. It

phyy-nx · Feb 5, 2024

Happy to add more clarification and fix the typo in a different PR after this vote.
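
The numbers in the documentation excerpt under review can be checked with a short Python sketch. It assumes the pedestal-first formula used in this PR, corrected = (data + data_offset) * data_scaling_factor; the helper name `correct_counts` and the raw pixel values are made up for illustration.

```python
def correct_counts(raw_counts, data_scaling_factor, data_offset):
    """Apply corrected = (raw + data_offset) * data_scaling_factor to each pixel."""
    return [(c + data_offset) * data_scaling_factor for c in raw_counts]


# Values from the quoted example: gain of 10 counts per photon,
# constant background of 400 counts to subtract.
data_scaling_factor = 0.1   # 1 / (10 counts per photon)
data_offset = -400          # negative because the background is subtracted

raw = [400, 410, 1400]      # hypothetical raw pixel values
photons = correct_counts(raw, data_scaling_factor, data_offset)
# photons is approximately [0, 1, 100], up to floating-point rounding
```

A pixel sitting exactly at the background level maps to zero photons, and every 10 counts above it adds one photon, as the excerpt describes.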

biochem-fan · 2024-01-24T01:36:40Z

Depending on the field, various terms are used: "offset", "pedestal" and "bias". For example, ThermoFisher's electron detectors use "bias".

Perhaps keeping "offset" for NXdata is a good idea but I would prefer to have other term mentioned as well in the NXmx documentation for better searchability.

PeterC-DLS · 2024-01-31T09:35:22Z

Should reserved suffixes (https://manual.nexusformat.org/datarules.html#reserved-suffixes) be updated too?

prjemian · 2024-01-31T12:14:23Z

Yes


phyy-nx · 2024-02-05T17:49:12Z

Happy to modify reserved suffixes in a different PR after this vote.

benajamin · 2024-02-05T20:48:13Z

base_classes/NXdata.nxdl.xml

+
+			.. code-block::
+
+				plotted_data = (data + offset) * scaling_factor


I strongly disagree with the use of "plotted_data" and "data" in this equation. Firstly, "data" is ambiguous and should be something like "stored values", "dataset values", or "recorded values". Secondly "plotted data" makes no sense because nothing is plotted (past tense) at the time when one is considering this equation and it is also meaningless because one could plot any old values. The NeXus manual says that we strive to record physically meaningful values - this equation is to be used when we do not record physically meaningful values and so its purpose should be to convert to physically meaningful values. Therefore, I would argue that "physical values" should definitely be used instead of "plotted values".

paulmillar · 2024-02-05T21:05:45Z

base_classes/NXdata.nxdl.xml

 		</doc>
 	</field>

+	<field name="scaling_factor" type="NX_FLOAT" deprecated="Use FIELDNAME_scaling_factor instead">
+		<doc>
+			Due to scaling_factor being ambiguous in the case of multiple signals, use


A couple of points (both minor):

1. I suggest having some throw-away comment describing the intended semantics (e.g., "Had similar semantics to FIELDNAME_scaling_factor"). This is to allow someone reading the spec to understand how to interpret existing data, where scaling_factor has already been used. This comment also applies to the offset field.

2. Since the ambiguity comes in the case where multiple signals are present (per the proposed doc), I'm assuming the ambiguity doesn't exist if there is a single signal. For single signal data, is scaling_factor still deprecated? I would imagine so, but perhaps the wording could be made more explicit; for example, by making the statement "use FIELDNAME_scaling_factor instead" a separate sentence, perhaps qualifying it by saying something like "all future data should use ...". This comment also applies to the offset field.

I consider neither comment blocking.
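
One way a reader could handle both new and existing files is sketched below in Python. It assumes the group's fields are available as a plain dict; the helper name `resolve_scaling`, the fallback order, and the defaults of 1.0 and 0.0 when nothing is present are all hypothetical and not part of the proposal.

```python
def resolve_scaling(fields, signal):
    """Return (scaling_factor, offset) for `signal` from a dict of field values.

    Prefers the per-signal names (e.g. data_scaling_factor, data_offset) and
    falls back to the deprecated unprefixed names. The identity defaults are
    an assumption made for this sketch.
    """
    scaling = fields.get(f"{signal}_scaling_factor", fields.get("scaling_factor", 1.0))
    offset = fields.get(f"{signal}_offset", fields.get("offset", 0.0))
    return scaling, offset


# A new-style file wins over a leftover deprecated field
new_style = resolve_scaling({"data_scaling_factor": 0.1, "scaling_factor": 2.0}, "data")
# An existing file that only has the deprecated names still resolves
old_style = resolve_scaling({"scaling_factor": 2.0, "offset": 5.0}, "data")
```

Whether the deprecated fields really carry the same semantics is exactly the question raised in point 1, so a fallback like this is only safe if the documentation says so.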

paulmillar · 2024-02-05T21:07:38Z

LGTM 😄

phyy-nx · 2024-02-06T17:47:42Z

Vote did not pass (got 12 votes, needed 13 for quorum), which is fine given all the discussion. For the sake of clarity I'm closing this PR, removing it and the associated issue from the milestone, and I will make a new PR that addresses the feedback here. We'll try again then.

Rewrite the NXdata scaling_factor and offset fields

650cfc1

phyy-nx added this to the NXDL 2023.10 milestone Nov 21, 2023

Fix references

1ddde5a

Code review

6c4b856

Change NXdata scaling_factor to refer to "plotted" data. Change NXdata to refer to "corrected" data, in addition to "physical" data, since it describes units of photons.

PeterC-DLS self-requested a review January 8, 2024 14:30

PeterC-DLS approved these changes Jan 8, 2024

phyy-nx removed this from the NXDL 2023.10 milestone Jan 17, 2024

Use FIELDNAME_scaling_factor and FIELDNAME_offset

6ff76d6

This resolves ambiguity if there is more than one signal. For NXmx, specify data_scaling_factor and data_offset, since the field data is named in the NXdata group.

prjemian reviewed Jan 23, 2024

benajamin reviewed Feb 5, 2024

paulmillar reviewed Feb 5, 2024

phyy-nx closed this Feb 6, 2024

phyy-nx mentioned this pull request Feb 6, 2024

Better rewrite of NXdata scaling_factor and offset fields #1343

Merged
