-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change to the axes attribute meaning #1381
Comments
Also, the statement: "The names of all [AXISNAME] fields are listed in the [axes] attribute." Is I believe a big change, and has made all the files where we write read back and target/set/demand axes like (for a line scan in x):
Are now not valid NeXus. |
Also, from the 2014 spec here: axes: String array that defines the independent data fields used in default plot for all of the The third example provides two entries for dim 3. https://www.nexusformat.org/2014_axes_and_uncertainties.html |
An older version of the definition said:
|
So if I understand the issue: the number of elements in the The introduction of axes spanning more than one dimension and a single dimension that can be covered by multiple axes (multi-dimensional axes or not) prompted to lift this limitation. I don't see how this makes your files invalid. Could you provide an example of an NXdata structure that used to be valid and is no longer valid? |
If your application parsed the older NXdata correctly, then it would be broken by
|
Ok so the problem is not that old files are no longer valid. The problem is that existing software my not be able to read new files which take advantage of the new relaxed constraints on |
Tweak examples and clarify that alternative choices for axis values can be indicated by the presence of its corresponding AXISNAME_indices attribute
OK, lots to go through here:
I don't understand this argument - and hence the reason for this format change. I don't understand how _indices alone doesn't unambiguously describe these cases (axes could literally be [.,.,.]). The axes attribute just being an under-ordered list of axes names give no additional information to just parsing the NXdata for all the _indices tags.
Maybe I was interpreting the new specification incorrectly. We use _indices to assign multiple and multidimensional axes to dimensions of the signal, and at the same time have axes length matching the signal rank. I was assuming that the new specification meant that in this case the axes tag should be a list containing of all the _indices datasets. Ours is not, so if I was interpreting the spec correctly they are not valid NeXus files. If this is not the correct interpretation, can you explain when axes should be an ordered list matching the rank of the signal dataset and when it should not? |
Code is less ambiguous than words. In the MR that refactored NXdata #1213 I added this snippet that takes an NXdata group and returns a dictionary of axis names and the dimensions they span: from typing import Dict, List, Sequence
import h5py
def get_axes_dims(nxdata: h5py.Group) -> Dict[str,List[int]]:
axes_data_dims = dict()
axes = nxdata.attrs.get("axes", list())
if isinstance(axes, str):
axes = [axes]
for axisname in set(axes):
if axisname == ".":
continue
data_dims = nxdata.attrs.get(f"{axisname}_indices", None)
if data_dims is None:
data_dims = [i for i,s in enumerate(axes) if s == axisname]
elif not isinstance(data_dims, Sequence):
data_dims = [data_dims]
else:
data_dims = list(data_dims)
axes_data_dims[axisname] = data_dims
return axes_data_dims In other words, the position of |
In my experience long gitlab discussions like this don't help very much but since you insist, here we go:
Sure it is simple, but what does it validate? You want to check that the field mentioned in attributes So I would assume you want to check whether the shape of an How is the check In the past Since multi-dimensional axes were introduced, what does the check
I believe you are talking about this
It is a 2D meshed grid. Although the requested scan would be a regular 2D grid, the actual motor positions you get from the encoder values are not regular. So each point in the 2D map has unique [X,Y] coordinates. At the ESRF all scan data is like that, mapping, tomography, whatever. Suppose you say order does matter. Which order is correct, ["x", "y"] or ["y", "x"]? Unless we add meaning to the order of axes, I don't see why one would be correct and the other not. Hence "the order does not matter".
I believe you are talking about this
and the equivalent
And as you mentioned there is the third option to support software that does not handle
Sure fair enough. All three options are valid and the last option supports both old and new software.
I believe you are talking about this
Since you want This One argument that I always bump into when suggesting improvements to NXdata is:
Not really. The position of So to know which axes an NXdata plot has, you need to take all values from
Why do you have to do this? As I discussed in the beginning,
I hope the examples listed above make this clear
This is still valid NeXus. You have one axis called The fact that we also have
From all the above I think it is clear that
As I mentioned before, that's a very convoluted and unintuitive way of defining which fields are axes. For signals we have
See the example
Still valid NeXus as I mentioned.
In the absence of |
Regardless of all this, since #1213 was not voted on because we thought it didn't change the standard we should explicitly mention that |
@jacobfilik Perhaps a shorter way to point to the issues or at the least the undefined behavior when requiring
In other words, if the length of It seems clear now that But when we put it back we need to clearly define what it means. Since you are clearly convinced about this |
@woutdenolf In your example the axes attribute can literally be [.,.,.] [x,x,.] [x,y,.] it does not matter because you have explicitly set the _indices attributes. If _indices is missing you cant do anything in this case, and I would say that is not a valid file unless axes is [.,.,.] (_indices for all axes was a soft recommendation if i remember correctly, writers should add it, but readers should cope if its missing if possible, which in this case it is not). Assuming in your example x is the fast axes, I would argue that the axes attribute should be [y,x,.] because at least then when you see x and y are 2D you can take the average/first line of x and y and get sensible axes for your signal. Even better than that, since its a grid scan, i would recommend including the set axes as well, like: data:NXdata since signal is a grid, so will be best visualised against the requested linear axes, rather than the readback values which will only be as linear as your control system is configured/forced to make them. This is how we have been aspiring to write xrf grid scans for many years. In the nexusformat example data repo there is an example file from diamond showing this: https://github.com/nexusformat/exampledata/blob/master/DLS/p45/hdf5/p45-1168.nxs
Do you disagree that this is what is rewritten in the 2014 spec? My third comment I thought showed it pretty explicitly. If you can demonstrate why your proposal to change this 10 year old meaning of the standard is an improvement I would be happy to use it, but I have not seen that case. I am quite concerned that no one else from the NIAC (except Pete) has opinions on what I thought was one of the core parts of the standard. I agree that the documentation could use improvement in this area and am happy to contribute. I am not part of the NIAC so have not been involved in the discussions so far, but am happy to join a call to discuss this issue. |
Maybe something like this?
I guess that would be internally consistent. I'll think about it some more and make a proposal. With that definition the example can be expressed as
or as
This would mean none of the ESRF data can be expressed as NXdata. Too bad but as I said, if we want to change the standard it needs a vote so we'll need to live with that for now. |
Ha, we commented at exactly the same time. This is getting a bit messy for an issue, but for your above example I dont see why ["y","x","."] or even [".",".","."], or the two cases you gave would not be completely valid NeXus. Why is this stopping ESRF data being written as NXdata? |
Isn't this code buggy for the new all-axes-choices case where |
The case of multi-dimensional axis fields was expounded in the 2014 document so it is not a new case that can prompt the change of rank-@axes requirement. |
Because we have no sensible way of knowing what axes an NXdata group has. Looping over the attributes and checking whether they end with the
Ok so this is the crux of the matter. If anything goes as long Could you provide a definition for |
Ok this in combination with "I dont see why ["y","x","."] or even [".",".","."], or the two cases you gave would not be completely valid NeXus" completely breaks any logic from my pov. So if you could provide a definition of |
I don't see how but maybe I don't understand what the "all-axes-choices" is |
That's data_dims = [i for i,s in enumerate(axes) if s == axisname] can contain integers > It is my opinion that |
Because at least it defines
So if this is not a correct definition we need to fix it, but then I would like the see the correct definition. And not one based on examples, an actual definition like the one I provided. And honestly, neither myself nor any of the reviewers realized we were changing the definition. And the reason imo is that NXdata was obfuscated to keep appearances up but did not make any sense logically. I've spend years trying to understand NXdata and still got it wrong apparently.
Excellent, so can you provide a definition of
This is a fallback when Let me rename from typing import Dict, List, Sequence
import h5py
def get_axes_dims(nxdata: h5py.Group) -> Dict[str,List[int]]:
"""Return all the NXdata axes and the signal dimensions they span.
"""
axes_data_dims = dict()
axes = nxdata.attrs.get("axes", list())
if isinstance(axes, str):
axes = [axes]
for axisname in set(axes):
if axisname == ".":
continue
axisname_indices = nxdata.attrs.get(f"{axisname}_indices", None)
if axisname_indices is None:
axisname_indices = [i for i,s in enumerate(axes) if s == axisname]
elif not isinstance(axisname_indices, Sequence):
axisname_indices = [axisname_indices]
else:
axisname_indices = list(axisname_indices)
axes_data_dims[axisname] = axisname_indices
return axes_data_dims |
Certainly you do not mean "possible"? That could be an enormous list with
enumerations for each dimension? The intent of this group is to declare the
default visualization is some data. The axes attribute declares the default
field associated with each dimension. A data file user could choose these
defaults or override with their own.
…On Thu, Jun 27, 2024, 11:44 AM Peter Chang ***@***.***> wrote:
Isn't this code buggy for the new all-axes-choices case where ***@***.***
<https://github.com/axes>) > rank?
I don't see how but maybe I don't understand what the "all-axes-choices" is
That's @axes being the list of all possible AXISNAMEs case. Then
data_dims = [i for i,s in enumerate(axes) if s == axisname]
can contain integers > rank and also be of length > rank.
It is my opinion that @axes was meant to a list of the AXISNAMEs *used*
as coordinate values in the corresponding dimensions of the default plot.
This is what the 2014 proposal states.
—
Reply to this email directly, view it on GitHub
<#1381 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARMUMAUUQJT7N6BDONKEI3ZJQ6NJAVCNFSM6AAAAABGWNGV4OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJVGE4TINRTHE>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
@prjemian @PeterC-DLS @jacobfilik I think it will be more constructive that each of us writes down the definition of I'll repeat the (wrong) definition I had in mind when refactoring the NXdata docs in #1213
This definition unambiguously defines what all the axes fields are, what signal dimensions they span and what their shape must be. It does not provide the default axis for a dimension that is spanned by more than one axis. To be clear with the word "shape" I mean this. Maybe there is a better word for that. |
OK. We are getting somewhere.
Absolutely completely agree with this statement. It does work, but it is not nice and can easily lead to invalid files if one is not careful. Can I propose a different solution: |
Thanks for your proposal @jacobfilik. Sure we can talk about something like I would like to see a definition of |
Ok, but I have never had a problem with what is described here: https://www.nexusformat.org/2014_axes_and_uncertainties.html Which has the description and examples of multidimensional data and multiple axes per dimension. If this description is not reproduced (or linked to) in the NXdata object description that is a different issue and if there are cases that are poorly defined by this definition I am happy to discuss them. |
Perfect. Then you are in the ideal position to write down the definition of |
The @axes attribute provides the names of the AXISNAME fields to be used the in default plot for all of the dimensions of the signal field. One entry is provided for every dimension in the signal field. When no default axis is available for a particular dimension of the plottable data, a “.” is used in that position. All AXISNAME fields should have a corresponding @AXISNAME_indices attribute providing the signal dimensions they span. The shape of an AXISNAME field is required to be equal to the shape of the spanned signal dimensions. |
Ok thanks! The definition leaves a couple of things open
|
So sounds like it is mandatory for writers? I assume its only optional in the NXdata definition because AXISNAME is also optional i.e. you cant mandate axes when there are no axes.
|
Searching the repo for "auxiliary", these seem relevant: |
The concept of auxiliary_axes was mentioned on the mailing list back in 2018 during the discussion of auxiliary_signals to compensate for the loss of the signal=n feature that was in the pre-2014 NXdata definition. It was suggested this could be used to provide a separate axis for the auxiliary signal but I think it was agreed that a separate signal with a separate axes should be a separate NXdata. To quote Armando:
So this was never adopted and I think the use of the term auxiliary_axes here better fits with how auxiliary_signals works (i.e. both would be a list of datasets that meet specific criteria). Having said that I would also be fine with a different name. |
Thanks to @jacobfilik and @woutdenolf for spending so much time finding ways of making the 2014 axes definitions work. I have never had time to implement this in the NeXpy, because I never knew what to do with multidimensional axes. I spent some time investigating Voronoi plots, such as the following, but I was never sure whether it was worth writing generic software to produce them.
Of course, we would need to work out how to generalize this to higher dimensions, and decide whether, for example, the length of the coordinates attribute has to be the same as the signal rank. Perhaps there would still be a role for 'x_indices', etc., although even that might not be necessary.
I think the reason why the issues discussed here are so difficult to resolve is that we are trying to make the 'axes' attribute perform too many incompatible functions at the same time. By the way, I was uncomfortable about the claim that, with the '_indices' attributes, the order wouldn't matter. When plotting in NeXpy, I still want to know what is the x-axis and what is the y, even if they are listed as coordinates. About 15 years ago, I proposed that we had a separate NXdata subclass, in which the centers of each pixel were defined by coordinates, rather than axes, but I don't think I explained it very well and I never followed it up, unfortunately. |
Proposal to rectify the wrongful |
@rayosborn The AXISNAME fields are not vectors or something, they contain numbers, so these are coordinates in a coordinate system. So me there is no difference between "axes" and "coordinates" in the context of the NXdata definition. Whether the grid is regular or irregular, each value of the signal (dependent variable) has coordinates (independent variable) in a coordinate system that is not defined. Which reminds me of #1033. Hopefully by September we got NXdata sorted out and I can continue with this proposal. |
After the Telco, I wondered if one way of making the axes descriptions a little simpler in the documentation would be to write that the axes could be defined either as one-dimensional axes (the default) or as multidimensional coordinates. In other words, without defining new attribute names, we just state that both ways of using the 'axes' attribute are allowed. After describing the default method to define 1D axes, you can then describe how to use the '_indices' attributes to define the axes as coordinates. This wouldn't require any changes to the current standard. |
Hello,
It has recently been raised at DLS that the documentation for NXdata now explicitly says that the axes attribute does not have to be the same length as the rank of the signal dataset.
I do not understand the reasoning for this change, the check that the axes attribute length matches the rank of the signal dataset has been a validation check we have used since the 2014 changes to the NXdata.
I have seen there is lots of discussion around this change in various issue and PRs, but I do not believe it is a sensible choice, since the length of the axes attribute matching the rank of the signal is a simple check (and if it wasn't necessary why have the "." marker at all).
In the docs, there are three examples where the axes attribute has been commented with ""the order does NOT matter" - the latter two showing the issue raised.
The first example looks reasonable, an x and y value for all values in the grid - likely to be the readback values from the stages in a grid scan (although if this was the case it would probably have been preferable to include the _set demand axes as well and tag them as axes for each dimension - if it is not a "meshed" grid scan I would wonder why all the datasets were not vectors).
The second example I would argue you should still have [x, . , y] as the axes attribute, so the _indices are consistent with that attribute - then at least software that doesn't understand the _indices would still correctly parse the NXdata.
The third example I don't know why the axes attribute is not ["x", "y", "energy"] or ["x", "y", "wavelength"] or ["x", "y", "energy", "."] - with the writer of the file deciding the priority (or which default axis to show - either wavelength or energy or just index). Again, any software that expects axes attribute length to match the signal rank would parse this, and anything that understands _indices should give the option to let the user change the axis for the final dimension.
With this change the axes attribute just becomes a list of axes names (which I can get by parsing the NXdata for anything ending in _indices). The first thing I have to do is now check axes length == signal rank and if it doesnt I completely ignore the axes attribute because it tells me nothing.
Can you give an explicit use case for this change of behaviour, or explain why the above interpretation is wrong?
The text was updated successfully, but these errors were encountered: