Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NXdata: improve documentation regarding axes #1213

Merged

Conversation

woutdenolf
Copy link
Contributor

@woutdenolf woutdenolf commented Oct 10, 2022

Closes #1212

In this PR we refactor the NXdata definition to resolve inconsistencies in definition of @axes, AXISNAME_indices and AXISNAME. NIAC discussions:

The proposed changes do not change anything to the current NXdata, they just clear up ambiguity:

  • @AXISNAME_indices: data dimension(s) spanned by AXISNAME. When missing it defaults to the index or indices of AXISNAME in @axes.
  • @AXISNAME_indices: a single integer or an array of integers.
  • the rank of AXISNAME must be equal to the number of dimensions it spans
  • dimension scales is a confusing term which comes from HDF5 terminology: use axes, axis coordinates or independent variables depending on the context
  • move all complex explanations from the fields/attribute docs to the global NXdata description
  • divide the global NXdata description in sub-sections: signals, axes, uncertainties and examples
  • add better examples

Proper code examples with be added in a separate merge request: #1253

A python reader would have to do something like this to figure out what all the axes are and to which data dimensions they belong

from typing import Dict, List, Sequence
import h5py

def get_axes_dims(nxdata: h5py.Group) -> Dict[str,List[int]]:
    axes_data_dims = dict()
    axes = nxdata.attrs.get("axes", list())
    if isinstance(axes, str):
        axes = [axes]
    for axisname in set(axes):
        if axisname == ".":
            continue
        data_dims = nxdata.attrs.get(f"{axisname}_indices", None)
        if data_dims is None:
            data_dims = [i for i,s in enumerate(axes) if s == axisname]
        elif not isinstance(data_dims, Sequence):
            data_dims = [data_dims]
        else:
            data_dims = list(data_dims)
        axes_data_dims[axisname] = data_dims
    return axes_data_dims

Rendering of the PR

https://hdf5.gitlab-pages.esrf.fr/nexus/nxdata_axes/classes/base_classes/NXdata.html

@woutdenolf woutdenolf linked an issue Oct 10, 2022 that may be closed by this pull request
@woutdenolf woutdenolf force-pushed the 1212-nxdata-axisname_indices-confusion branch 3 times, most recently from f065035 to 2359c3b Compare October 10, 2022 18:45
@woutdenolf woutdenolf force-pushed the 1212-nxdata-axisname_indices-confusion branch 3 times, most recently from 571420f to 234cd6d Compare October 11, 2022 20:25
@woutdenolf woutdenolf force-pushed the 1212-nxdata-axisname_indices-confusion branch 13 times, most recently from 8177df7 to b99c398 Compare October 11, 2022 22:11
@woutdenolf woutdenolf force-pushed the 1212-nxdata-axisname_indices-confusion branch 5 times, most recently from bfa0832 to 73f59d8 Compare March 14, 2023 18:09
base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved
base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved
base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved
base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved
@woutdenolf woutdenolf force-pushed the 1212-nxdata-axisname_indices-confusion branch 2 times, most recently from 8539f58 to c02e0c2 Compare March 15, 2023 09:20
@woutdenolf
Copy link
Contributor Author

woutdenolf commented Jun 20, 2023

Changes:

@woutdenolf
Copy link
Contributor Author

For reference, I took @rayosborn's description of axes #1213 (comment)

Axes:

The AXISNAME fields contain the coordinates associated with the data values. AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of dimensions it spans.

The names of the AXISNAME fields to be used as axis coordinates are listed in the axes attribute. When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields. It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others."

and merged it with my original wording to obtain this

Axes:

The AXISNAME fields contain the axis coordinates associated with the data values. The names of all AXISNAME fields are listed in the axes attribute.

Rank

AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of data dimensions it spans.

Dimensions

The data dimensions annotated by an AXISNAME field are defined by the AXISNAME_indices attribute. When this attribute is missing, the position(s) of the AXISNAME string in the axes attribute are used.

When all AXISNAME fields are one-dimensional, and none of the data dimensions have more than one axis, the AXISNAME_indices attributes are often omitted. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

When providing AXISNAME_indices attributes it is recommended to do it for all axes.

I mainly had an issue with this

When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated.
...
When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields.

This seems to imply the AXISNAME_indices are to be ignored when you only have rank 1 axes. This is logically very hard to implement mostly for HDF5 readers. In the merged version this becomes

The data dimensions annotated by an AXISNAME field are defined by the AXISNAME_indices attribute. When this attribute is missing, the position(s) of the AXISNAME string in the axes attribute are used.

This is logically much simpler. @rayosborn's description in spirit wants to say that most of our NXdata groups do not has AXISNAME_indices. So I added this paragraph

When all AXISNAME fields are one-dimensional, and none of the data dimensions have more than one axis, the AXISNAME_indices attributes are often omitted. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

@woutdenolf
Copy link
Contributor Author

To accommodate @phyy-nx who prefer positive wording I replaced

It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others.

with

When providing AXISNAME_indices attributes it is recommended to do it for all axes.

@woutdenolf
Copy link
Contributor Author

Split the Axes section into "simple" and "advanced" use cases

I did the splitting in another way with subsections rank and dimensions. The axes section should be easier to read now.

@phyy-nx
Copy link
Contributor

phyy-nx commented Jun 21, 2023

Code review:

  • Example 1, switch x and y
  • Example 2, add note to caption that this is including AXISNAME_indices as an example
  • Merge examples up into the docstring body.

@woutdenolf
Copy link
Contributor Author

woutdenolf commented Jun 21, 2023

Applied the requested modifications from #1213 (comment): examples are merged with the doc sections and adapted to match those sections.

@woutdenolf woutdenolf force-pushed the 1212-nxdata-axisname_indices-confusion branch from 0b63ab1 to b981fa5 Compare June 21, 2023 18:15
@phyy-nx
Copy link
Contributor

phyy-nx commented Oct 19, 2023

From telco, let's get the same subcommittee mentioned above to re-review and approve the PR (@phyy-nx @woutdenolf @rayosborn @sanbrock). Further reviews are welcome. Target is November 2nd. Thanks!

@phyy-nx phyy-nx requested review from PeterC-DLS and removed request for t20100 October 19, 2023 16:24
@PeterC-DLS PeterC-DLS added this to the NXDL 2023.10 milestone Oct 19, 2023
Copy link
Contributor

@benajamin benajamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a big improvement to me!

Copy link
Contributor

@sanbrock sanbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@phyy-nx phyy-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good!

Copy link
Contributor

@rayosborn rayosborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job.

@phyy-nx
Copy link
Contributor

phyy-nx commented Nov 3, 2023

Ok! Based on Telcos and these approvals, @woutdenolf feel free to merge at your convenience. Thanks!

@woutdenolf woutdenolf merged commit a7dda1e into nexusformat:main Nov 3, 2023
4 checks passed
@woutdenolf
Copy link
Contributor Author

Thanks for the review everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for Review
Development

Successfully merging this pull request may close these issues.

NXdata - @AXISNAME_indices confusion
8 participants