
Dataset - Data for the paper "Phylogenomic position of eupelagonemids, abundant, and diverse deep-ocean heterotrophs" #97

Closed
25 of 27 tasks
hakai-it opened this issue Jul 17, 2024 · 33 comments

@hakai-it

hakai-it commented Jul 17, 2024

Data for the paper "Phylogenomic position of eupelagonemids, abundant, and diverse deep-ocean heterotrophs"

https://hakaiinstitute.github.io/hakai-metadata-entry-form/#/en/hakai/NSo1FkJnvIbQDOlYlwNfWGnAxx33/-O1xBAkNi0P8igzOb4rg

Best Practices Checklist

In General

  • No previous versions of this metadata record exist (e.g., for earlier versions of the data; if one does, update that record rather than creating a new one)

Data Identification

Dataset title:

  • No version information in the title
  • Frontloaded (with the most important information first)
  • Include the geographical region the data apply to
  • Short – aim for 60 characters including spaces
  • Does not include acronyms – put these in the keywords
  • Does not include the word “dataset”
  • Time series datasets should include “time series” at the end of the title

Abstract

  • Abbreviations have been expanded upon at first mention
  • Abstract describes the how, when, what, where, and why of data collection and is no more than 500 words

DOI

  • A DOI has been drafted for this record
  • DOI has been updated via the form after review and changes to record
  • DOI has been manually edited on datacite fabrica
  • DOI status has been changed from Draft to Findable

Spatial

  • Ensure that Depth or Height Positive is correctly selected

Contact

  • ROR and ORCID(s) are included and linked properly where applicable
  • For datasets where DFO is a partner, ensure the 'parent' ROR is added (https://ror.org/02qa1x782). DFO 'child' organizations (e.g., CHS) and their RORs are optional.
  • Include Hakai Institute as Publisher and include data@hakai.org as email
  • Make sure email address is provided if the role is 'Metadata Custodian' or 'Point of Contact'
  • Add contact affiliation where known including ROR
  • If resource is (partially) generated by Hakai researchers, include 'Tula Foundation' (with associated ROR) with 'Funder' role. Be sure to uncheck 'include in citation' for Tula Foundation.

Resources

  • Resource links go to specific dataset download (not generic platform like waterproperties.ca)
  • Readme, changelog, data dictionary, protocols included in data-package (for tabular text based data)
  • An archive folder, or other means, for older data versions is included in the data package if the version is not 1.0
  • Links work
  • All files in the data package can be opened and are not corrupt
  • No executable files in the data package. Files should be open formats and standards (.csv, .txt for example)
@JessyBarrette
Contributor

JessyBarrette commented Jul 17, 2024

Here are a couple of thoughts:

Spatial

Is the data really from only that single location?
51.6505, -127.9516

If it really is the KC Buoy, then the location isn't quite right.

Contacts

Missing ORCIDS for

  • Chris Mackenzie
  • Patrick J. Keeling

@noricorino

The sample came from KC10. I thought those were its GPS coordinates. Do you know the correct coordinates? I have to fix them with the journal as well...

About ORCIDs, I confirmed with Chris that he doesn't have one. I assume it's the same with Patrick.

@noricorino

Should I do anything with the title and the abstract as well?

@JessyBarrette
Contributor

Since this is a link to an already existing dataset on Dryad, I would stay consistent with what's already there for the title and abstract.

For the lat/long, this is the location from the KC Buoy research dataset on ERDDAP:

https://catalogue.hakai.org/erddap/tabledap/HakaiKCBuoyResearch.html

51.65° N, 127.966° W (latitude 51.65, longitude -127.966)

It seems very similar to yours.

The map on the record looks odd, though:

[Screenshot: map on the metadata record]

I think this is related to the bounding box inputs, which are different:

[Screenshot: bounding box inputs on the record]

@JessyBarrette
Contributor

I'm OK with the missing ORCID. @CMack89, would you want to create an ORCID for yourself and share it with us? It will be good for you in the future, helping link you to different data and science outputs.

Just register here to generate one: https://orcid.org/

@JessyBarrette
Contributor

Since the data package is already submitted and pretty big, I won't look into it.

@noricorino

Yeah, the coordinates look off on this map... Should I draw a bounding box instead? I think it's closer to the mouth of the channel. Google Maps shows this:

[Screenshot, 2024-07-17 12:16:07 PM: Google Maps view of the site]

@timvdstap
Collaborator

That's odd - what if you try the coordinates as they are in the ERDDAP: 51.65 and -127.966?

@JessyBarrette
Contributor

Yes, I suspect it's the east/west limits of your bounding box that are wrong. Maybe try replacing them with -127.966.
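The east/west mix-up is easy to check mechanically. A minimal sketch, assuming a generic west/south/east/north box in decimal degrees (the `BoundingBox` class and `point_box` helper are hypothetical, not the Hakai metadata form's actual fields or code), that flags inverted limits and tests whether the KC Buoy position from ERDDAP falls inside an entered box:

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    # Hypothetical representation; limits in decimal degrees.
    west: float   # minimum longitude (negative = west of Greenwich)
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def problems(self):
        """Return a list of human-readable issues with the limits."""
        issues = []
        if not (-90 <= self.south <= self.north <= 90):
            issues.append("latitudes out of range or south > north")
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            issues.append("longitudes out of range")
        elif self.west > self.east:
            issues.append("west > east (box would cross the antimeridian)")
        return issues

    def contains(self, lat, lon):
        """True if the point lies inside the box (edges inclusive)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def point_box(lat, lon):
    """Degenerate box for a single sampling site."""
    return BoundingBox(west=lon, south=lat, east=lon, north=lat)


# KC Buoy position as listed in the ERDDAP dataset.
KC_LAT, KC_LON = 51.65, -127.966

site = point_box(KC_LAT, KC_LON)
print(site.problems(), site.contains(KC_LAT, KC_LON))

# Hypothetical sign error: positive longitudes pass the range checks
# but put the box in eastern Asia, so the site is not inside it.
wrong = BoundingBox(west=127.966, south=51.65, east=127.966, north=51.65)
print(wrong.problems(), wrong.contains(KC_LAT, KC_LON))
```

A box with a flipped longitude sign passes the basic range checks, which is why the `contains` test against a known site position is the more useful sanity check here.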

@noricorino

I made it a polygon. What do you think?
[Screenshot, 2024-07-17 2:41:02 PM: polygon drawn on the record map]

@JessyBarrette
Contributor

Much better!

@timvdstap
Collaborator

timvdstap commented Jul 17, 2024

Great! When @CMack89 creates an ORCID, I can review and finalize the information on DataCite and we can publish this record, if there are no further comments, @JessyBarrette? I am surprised that Patrick doesn't have an ORCID, but I've looked through some of his publications and I don't see one there either.

@noricorino

Patrick established his reputation before ORCID existed, hence no need for one, I guess?

@noricorino

He has an instagram account if that counts instead...?

@timvdstap
Collaborator

No I'm afraid not, but that's OK - we can publish this record to the Hakai Catalogue even without his ORCID.

@noricorino

Patrick actually has an ORCID after all! I updated the info.

@JessyBarrette
Contributor

Perfect! Yeah, it's becoming more and more common, as a lot of publishers are requiring them for all authors. Think of it as the science social ID 😉

@timvdstap
Collaborator

Thank you Noriko - when @CMack89 creates an ORCID I will confirm everything is OK in DataCite and then publish the record. :)

@CMack89

CMack89 commented Jul 21, 2024 via email

@timvdstap
Collaborator

timvdstap commented Jul 22, 2024

Thanks Chris! Noriko, I have updated Chris' ORCID in the record.

One final thought before we can publish: @JessyBarrette, a Primary Resource is required, but the data is already published, in its entirety, on Dryad. The description of the 'Primary Resource' in the intake form says: "Resources added here should not already have their own metadata record or digital object identifier, such resources should be added to the "Related Works" section." What do you think would be the best approach here? I'll create a ticket for discussion as well. https://github.com/HakaiInstitute/hakai-data/issues/187

@noricorino do you perhaps also have this data in a folder, package or repository somewhere on e.g. GitHub or Google Drive?

@timvdstap
Collaborator

timvdstap commented Jul 23, 2024

@noricorino

In Dryad, Hakai is listed as a funder, but ownership is attributed to UBC. Did we do more than fund this research (i.e., did we also collect, process, or analyze the data)? In the metadata record, I see you listed Hakai Institute as a 'Data Owner' as well, so would it be reasonable to expect Hakai to have a copy of the underlying data package in an institutional repo (preferably GitHub)? We want to make sure we appropriately represent ownership of, and credit for, the underlying data, and, where possible, link external (meta)data back to records in the Hakai Catalogue so we can keep an accurate list of our data records.

@noricorino

Hmm, Gordon deposited the data to Dryad. During the first half of the research, I was an official Hakai employee, and Hakai funded the sequencing, so I would say Hakai has some ownership of the data. I will talk to Gordon about how we can reflect that on Dryad and in the Hakai data catalogue.

@noricorino

So I talked with Gordon, and we don't mind either Hakai or UBC owning the data. The only technical issue may be that the data submission was done through UBC (they covered the submission cost). If it matters to Hakai, we can also submit the raw data (we have it in NCBI) with Hakai as the owner. What do you think?

@fostermh
Collaborator

To me it seems like we would just create confusion for future generations by submitting the same data twice to NCBI under different organizations. If Hakai could be added to the affiliation list for the dryad record, and we clearly indicate Hakai's role in our own metadata that is probably sufficient.

I don't know much about how this data was collected or who was involved outside of the authors of the paper but I was assuming this was not a 100% UBC coordinated affair so it seemed odd to see Hakai listed as a funder with no other acknowledgment. Credit where credit is due and all that. If Hakai is not due credit in this case that is also fine.

@noricorino, if you and Gordon are happy that everyone's contribution has been accurately reflected in the various metadata records, then that's great. If not, let's adjust them if possible. In the case of NCBI, it sounds like it is not possible to adjust the metadata record even if we wanted to, which is fine.

@noricorino

noricorino commented Jul 26, 2024

The data deposited on Dryad is derivative of the sequence data on NCBI, but highly processed and intensively analyzed by Gordon. It's analogous to eggs and an omelette. Without eggs, you cannot make an omelette, yet eggs alone are not enough to make an omelette.

There's no easy option on the Dryad website to modify the ownership on our end. We're (well, technically he is... as it is not connected to my ORCID either) writing to them directly to make an amendment. It may take a while.

@fostermh
Collaborator

I do like a good omelette. Sorry for making more work and thanks for keeping us in the loop.

@noricorino

So we're trying to articulate the issue in the letter to Dryad. Could you tell me where we can see the ownership information? Is it the "research facility" field, which lists UBC?

@fostermh
Collaborator

If we can add Hakai to the "research facility" list, that would be great, assuming that is appropriate in this case. The affiliation list also seems odd, but perhaps that is because only Gordon has an ORCID set. Likely, if ORCIDs were set for all the authors, the list would populate with their affiliations as well.

[Screenshot, 2024-07-26 2:29:16 PM: Dryad record affiliation and research facility lists]

@noricorino

noricorino commented Jul 26, 2024

Technically, my affiliation changed mid-research. While I listed both affiliations on the paper, there's currently no option to have two affiliations on Dryad. I can ask Gordon to change it to Hakai, and send them a feature request instead.

@fostermh
Collaborator

It sounds like it is more trouble than it is worth, and I believe the standard practice is to credit the authors' affiliations at the time of publishing anyway, which always seemed odd to me, but there we are. Thanks for clarifying the affiliations bit.

@noricorino

Gordon made a change on his end, and there's a reviewing process before they publish the change. I will keep you posted.

@timvdstap
Collaborator

timvdstap commented Jul 29, 2024

Sounds great, thanks @noricorino and Gordon for all the effort being put into this! Since Hakai is also a data owner, that is already accurately reflected in the submitted Hakai metadata record. This leaves the following to-do items:

  • TO DO (Tim): Download the processed data (package) from Dryad.
  • TO DO (Tim): Create a Hakai GitHub repository and add the data files there. @fostermh, it seems we might need to use Git LFS here. Is there an approach you recommend? I haven't used it before.
  • TO DO (Noriko): Brief review of GitHub repository content
  • TO DO (Tim): Publish a GitHub Release of the data package
  • TO DO (Tim): Update the Hakai Catalogue metadata record (Primary Resource = release; Related Works = Dryad, 'Is Identical To')
  • TO DO (Tim): Update, verify and finalize record in DataCite
  • TO DO (Tim): Publish metadata record to the Hakai Catalogue

@timvdstap
Collaborator

timvdstap commented Aug 2, 2024

After some consideration, the best solution (balancing feasibility and usefulness) is likely to put a link to the Dryad URL as the Primary Resource and the DOI of that resource in Related Works. I will do that, @noricorino. The record will then essentially link to the same resource twice, but it'll help identify Dryad as the external data holding programmatically later if needed.
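On the DataCite side, a Related Works entry of this kind corresponds to a related identifier whose relation type is `IsIdenticalTo` in the DataCite Metadata Schema. A minimal sketch, assuming the DataCite REST API's JSON:API payload shape for a DOI update and using a placeholder DOI (the actual Dryad DOI is not quoted in this thread, and the `related_identifier` helper is hypothetical), of what that entry looks like:

```python
import json

# Placeholder only: the real Dryad DOI is not given in this thread.
DRYAD_DOI = "10.xxxx/hypothetical-dryad-doi"


def related_identifier(doi, relation_type="IsIdenticalTo"):
    """Build one DataCite relatedIdentifiers entry for a DOI (hypothetical helper)."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


# JSON:API body of the kind sent when updating a DOI record.
payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "relatedIdentifiers": [related_identifier(DRYAD_DOI)],
        },
    }
}

print(json.dumps(payload, indent=2))
```

In practice the Hakai metadata form and DataCite Fabrica take care of this; the sketch only shows how "the same resource twice" ends up machine-readable, with the relation type marking Dryad as an identical external copy.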

In light of this, I have updated, verified and finalized the record in DataCite as well, and published the metadata record to the Hakai Catalogue. You should see it there shortly :) If there are any changes you'd like us to make on our end, Noriko, let me know! For now, I'll close this issue. Thanks for all your amazing work, Noriko!

6 participants