Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update -DMERCURY_USE_XDR=ON code to pass unit tests #773

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

chuckcranor
Copy link
Collaborator

Mercury configured with -DMERCURY_USE_XDR=ON fails to pass the unit tests. The main issue is that mercury was not properly accounting for the data padding that xdr uses.

When encoding, xdr pads all data added to its buffer to the next multiple of BYTES_PER_XDR_UNIT (4 bytes, see /usr/include/rpc/xdr.h) and provides a macro RNDUP(s) to round "s" up to the next multiple of BYTES_PER_XDR_UNIT.

The native mercury code (when not using xdr) does not pad data. When mercury is compiled to use xdr it does not account for the padding xdr uses and gets out of sync with xdr. Given an hg_proc_t "p" we need to make sure that hg_proc_get_size_used(p) is always equal to xdr_getpos(hg_proc_get_xdr_ptr(p)) after encoding a chunk.

To fix this we:

  • modify the xdr version of HG_PROC_TYPE() in mercury_proc.h to round up the size using RNDUP().
  • modify the xdr version of HG_PROC_BYTES() to mercury_proc.h to round up the byte chunk size using RNDUP()

Note that the mercury checksum in this case does not cover the padding (since the calls to HG_PROC_CHECKSUM_UPDATE() are on the orig data buffer,
and that buffer is not padded. This is not an issue, since the padding
is thrown away at decode time.

In addition, we change the xdr call in HG_PROC_BYTES() from xdr_bytes() to xdr_opaque() which better matches the intent of HG_PROC_BYTES(). For reference: xdr_opaque(xdr, char *data, size) copies RNDUP(size) bytes (the data plus padding) to the buffer. xdr_bytes(xdr, char **data, u_int *size, UINT_MAX) copies both a uint32_t length and RNDUP(size) bytes (the data plus padding) the buffer. HG_PROC_BYTES() does not need xdr to encode the length in the buffer (matches the non-xdr case).

Remove code in mercury.c that forces payload_size to be the entire buffer. I do not think xdr requires the entire buffer to be payload if mercury accounts for the BYTES_PER_XDR_UNIT padding.

Unit tests attempt to verify the expected payload length by manually computing it and comparing it to what is received. Update the tests to handle BYTES_PER_XDR_UNIT padding when compiled with xdr enabled.

Copy link

github-actions bot commented Dec 17, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@chuckcranor
Copy link
Collaborator Author

Can we add a -DMERCURY_USE_XDR=ON compile to the CI testing?

@chuckcranor
Copy link
Collaborator Author

chuckcranor commented Dec 17, 2024

I have read the CLA Document and I hereby sign the CLA

Mercury configured with -DMERCURY_USE_XDR=ON fails to pass the unit tests.
The main issue is that mercury was not properly accounting for the data
padding that xdr uses.

When encoding, xdr pads all data added to its buffer to the next multiple
of BYTES_PER_XDR_UNIT (4 bytes, see /usr/include/rpc/xdr.h) and provides
a macro RNDUP(s) to round "s" up to the next multiple of BYTES_PER_XDR_UNIT.

The native mercury code (when not using xdr) does not pad data.  When
mercury is compiled to use xdr it does not account for the padding xdr
uses and gets out of sync with xdr.  Given an hg_proc_t "p" we need to
make sure that hg_proc_get_size_used(p) is always equal to
xdr_getpos(hg_proc_get_xdr_ptr(p)) after encoding a chunk.

To fix this we:
 - modify the xdr version of HG_PROC_TYPE() in mercury_proc.h to round
   up the size using RNDUP().
 - modify the xdr version of HG_PROC_BYTES() to mercury_proc.h to round
   up the byte chunk size using RNDUP()

Note that the mercury checksum in this case does not cover the padding
(since the calls to HG_PROC_CHECKSUM_UPDATE() are on the orig data buffer,
and that buffer is not padded.   This is not an issue, since the padding
is thrown away at decode time.

In addition, we change the xdr call in HG_PROC_BYTES() from xdr_bytes()
to xdr_opaque() which better matches the intent of HG_PROC_BYTES().
For reference: xdr_opaque(xdr, char *data, size) copies RNDUP(size)
bytes (the data plus padding) to the buffer.  xdr_bytes(xdr, char **data,
u_int *size, UINT_MAX) copies both a uint32_t length and RNDUP(size)
bytes (the data plus padding) the buffer.  HG_PROC_BYTES() does not
need xdr to encode the length in the buffer (matches the non-xdr case).

Remove code in mercury.c that forces payload_size to be the entire
buffer.  I do not think xdr requires the entire buffer to be payload
if mercury accounts for the BYTES_PER_XDR_UNIT padding.

Unit tests attempt to verify the expected payload length by manually
computing it and comparing it to what is received.  Update the tests
to handle BYTES_PER_XDR_UNIT padding when compiled with xdr enabled.
@soumagne
Copy link
Member

@chuckcranor thanks for the PR! I'm trying to test it now but I see that there might be something wrong in the XDR header inclusion for the tests (unrelated to your PR) so I need to fix that first it looks like...

@soumagne
Copy link
Member

Thanks I think everything looks good, just to let you know I've also just added automated testing of XDR in #774 and your patch passed on both macOS and ubuntu builds.

@soumagne soumagne merged commit 93244fa into mercury-hpc:master Dec 19, 2024
20 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants