Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R Date Conversion in Dataset-JSON #15

Closed
cpiraux opened this issue Aug 28, 2023 · 5 comments
Closed

R Date Conversion in Dataset-JSON #15

cpiraux opened this issue Aug 28, 2023 · 5 comments

Comments

@cpiraux
Copy link

cpiraux commented Aug 28, 2023

I recently worked on the latest version of the xportr R package, which is designed to convert R datasets to xpt format. During this process, we encountered difficulties converting R dates to SAS date format. (Dates are not correctly derived (SAS vs R) · Issue #142). We might encounter the same issues when converting to Dataset-JSON.

The challenges we faced with the conversion process included:

  • Discrepancies in timestamps: Date value = 0 for 01 JAN 1960 in SAS and 01 JAN 1970 in R.
  • Variables with the R data type 'integer' are not recognized as dates in the conversion, the variable type need to be ‘date’.
  • R does not have display format for numeric variable like in SAS.

As a result of the R to SAS conversion, we encountered issues such as date variables appearing 10 years earlier than expected and variables not being recognized as dates.

Currently in the dataset-JSON specification, it is not clear for me that we should follow the SAS timestamp for numeric date variables. I also think if we don’t know that variable ending with --DT, --DTM and --TM are date(time) numeric variables and if “displayFormat” is missing (R does not have date format, in xportr package we add the ones from SAS), it is difficult to know that the variable is a numeric date.

image

I came across an informative webpage on JSON date formats that I found interesting: JSON date format: 3 ways to work with dates in JSON. Considering the issues we faced, I suggest utilizing the ISO 8601 Date string format for handling dates in dataset-JSON with a "type" attribute with values like 'numeric date'. It would be easier to convert dates from any programming language and reduce the risk of errors.

I would appreciate your thoughts on numeric date in Dataset-JSON, and how it would be possible to facilitate the conversion of dates between R and Dataset-JSON?

@DmitryMK
Copy link
Collaborator

Hi Celine, these are really good question.

There is a general goal to make Dataset-JSON a universal format and one of the requirements for it is to make it agnostic to a programming languages, so that in can be used for SAS/R/Python or any other programming language. Although I admit that displayFormat can be seen as SAS legacy.

I see your point and logic in using ISO 8601 Date string format for numeric dates and it still follows the "language agnostic" rule, but I'm not sure we want to change it at this stage. And I see it more a metadata task, rather than data-representation task,

One of the goals was to be able to review the data without Define-XML, which is why we have labels/types and other "metadata" attributes in it. But it was never an intention to have absolutely everything in Dataset-JSON. We intentionally did not add the date (which corresponds to character date) type from Define-XML, because Dataset-JSON is focused on data and not metadata and it is not intended to replace Define-XML.

Per ADaM-IG numeric date variables should end with --DT/--DTM/--TM, so I would expect that for the vast majority of cases it should be followed. And if we want to have a numeric date variable with other suffix, I would assume that it should be metadata (define.xml) where this information will be specified. As a matter of fact, there is no way it can be specified in Define-XML now, so it would be a good comment for the future Define-XML version.

Regarding epoch it is actually an interesting fact, which I never though about before. Looking into CDISC standards, I could not find it specified anywhere what should be used as an epoch for numeric dates. De facto the industry is using 1/1/1960, because mostly industry used SAS for the submissions, I believe that this should be specified somewhere in the standards or we can have a metadata field explaining what is used as an epoch for numeric dates in this submission.

I would suggest, that by default the following assumption can be used when reading Dataset-JSON:

  • If variable ends with --DT, it is a number of days since 1/1/1960
  • If variable ends with --DTM, it is a number of seconds since 1/1/1960
  • If variable ends with --TM, it is a number of seconds

I will raise this topic within the ODM team, to see if there are other thought about it.

@mstackhouse
Copy link

@cpiraux in the current version of the datasetjson R package we handle this conversion by inferring dates and datetimes from the display format, and then handle the conversion. The code we used for this is here: https://github.com/atorus-research/datasetjson/blob/3e7f4bb17ad0215b72fa4c87bc28017977499dbe/R/date_time_conversions.R#L43

It was discussed at PHUSE CSS that it's likely best to update the standard to something unambiguous, such as character ISO8601 dates/datetimes.

@cpiraux
Copy link
Author

cpiraux commented Oct 20, 2023

Thank you, @mstackhouse , for sharing the link to the dataset JSON package. I'm considering trying out the package for the Dataset-JSON Pilot.

I'm pleased to hear that date conversion has been a hot topic of discussion at the PHUSE CSS and that the standard will be updated to something more unambiguous, such as character ISO8601 dates/datetimes.

With these improvements in mind, it seems we can consider closing the issue.

@lexjansen
Copy link
Collaborator

@cpiraux Since we are addressing this issue in Dataset-JSON v1.1, I suggest closing this issue.

@cpiraux
Copy link
Author

cpiraux commented Aug 6, 2024

Yes, sure. Thank you for addressing this issue. I will go ahead and close it now.

@cpiraux cpiraux closed this as completed Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants