
Dataset - MusselSeg: Semantic Segmentation for Rocky Intertidal Mussel Habitat #104

Closed
21 of 27 tasks
hakai-it opened this issue Jul 18, 2024 · 18 comments

@hakai-it

hakai-it commented Jul 18, 2024

MusselSeg: Semantic Segmentation Dataset for Rocky Intertidal Mussel Habitat

https://hakaiinstitute.github.io/hakai-metadata-entry-form/#/en/hakai/DCH8GzsQKSM8xwJ8WdQFIZrtCaq2/-O26fyGiVXoF__M22UrP

Best Practices Checklist

In General

  • No previous versions of this metadata record exist (e.g., for earlier versions of the data; if one does, update that record rather than creating a new one)

Data Identification

Dataset title:

  • No version information in the title
  • Frontloaded (with the most important information first)
  • Include the geographical region the data apply to
  • Short – aim for 60 characters including spaces
  • Does not include acronyms – put these in the keywords
  • Does not include the word “dataset”
  • Time series datasets should include “time series” at the end of the title

Abstract

  • Abbreviations have been expanded upon at first mention
  • Abstract describes how, when, what, where, why of data collection and is limited to no more than 500 words

DOI

  • A DOI has been drafted for this record
  • DOI has been updated via the form after review and changes to record
  • DOI has been manually edited on datacite fabrica
  • DOI status has been changed from Draft to Findable

Spatial

  • Ensure that Depth or Height Positive is correctly selected

Contact

  • ROR and ORCID(s) are included and linked properly where applicable
  • For datasets where DFO is a partner, ensure the 'parent' ROR is added (https://ror.org/02qa1x782). DFO 'child' organizations (e.g., CHS) and their RORs are optional.
  • Include Hakai Institute as Publisher and include data@hakai.org as email
  • Make sure email address is provided if the role is 'Metadata Custodian' or 'Point of Contact'
  • Add contact affiliation where known including ROR
  • If resource is (partially) generated by Hakai researchers, include 'Tula Foundation' (with associated ROR) with 'Funder' role. Be sure to uncheck 'include in citation' for Tula Foundation.

Resources

  • Resource links go to specific dataset download (not generic platform like waterproperties.ca)
  • Readme, changelog, data dictionary, protocols included in data-package (for tabular text based data)
  • An archive folder, or other means, for older data versions is included in the data package if the version is not 1.0
  • Links work
  • All files in the data package can be opened and are not corrupt
  • No executable files in the data package. Files should be open formats and standards (.csv, .txt for example)
@JessyBarrette
Contributor

JessyBarrette commented Jul 22, 2024

@willhakai Thanks for submitting a metadata record for the dataset:

MusselSeg: Semantic Segmentation Dataset for Rocky Intertidal Mussel Habitat

Couple thoughts:

  • Would you mind adding your email address to your contact, since you're the metadata custodian?
  • Can we add Hakai Institute as data owner
  • Can we add Tula as funder
  • I would perhaps add a rough region to the title like:
    MusselSeg: Semantic Segmentation Dataset for North East Pacific Rocky Intertidal Mussel Habitat
  • Should we generate a DOI for the dataset?
  • Is it expected that the HuggingFace dataset isn't accessible yet?

@timvdstap
Collaborator

I think @tayden is going to add you to the Hakai HuggingFace account, Jessy :) Also, I think #105 is a duplicate of this record?

@tayden

tayden commented Jul 22, 2024

The dataset is located here: https://huggingface.co/datasets/HakaiInstitute/mussel-seg-1024-1024
It's currently private, so you'll have to make an account and I can then add you to see it. It'll be made public once it gets the rubber stamp of approval.

@willhakai

@tayden is creating a DOI on HuggingFace

Other changes made, thanks!

@tayden

tayden commented Jul 22, 2024

@tayden is creating a DOI on HuggingFace

Other changes made, thanks!

Yes, I will do this. I can't do it until the dataset is public, so I'll do it then. The HF DOIs track dataset versions automatically, so it'll be easier to do there than with an external service.

@fostermh
Collaborator

Hmm, for the DOI, are we talking about one that points to the data resource or to the metadata record? The normal way we do this is to generate a DOI that points to the metadata record, so that we can change the location of data storage if needed. There is a handy button in the form for just this. Or does HF require a DOI that points to their site? If so, we can do both.

@tayden

tayden commented Jul 22, 2024

HF has one that points to the dataset, on HF. The main advantage to the HF one is that you can track a specific dataset revision.

That being said, there's no reason I couldn't reference a DOI that links to the metadata record instead, but we'd lose the automatic revision tracking.

@fostermh
Collaborator

OK, so generating both makes sense in this case: one on HF that points to their page, and another from the form that points to our page. We then indicate that the HF one is identical to the Hakai one by adding the HF DOI under 'related works'.

I realize that from a technical perspective none of this is needed and we could just use the HF DOI. The issue being addressed here is one of appropriately representing ownership and credit. Wherever possible, we want to link external data/metadata back to Hakai records so that we can keep an accurate list of Hakai data holdings.

Anyway, the short version is: Taylor, carry on generating a DOI on HF that points to HF, then add it to the metadata record under related works.

Thank you.

@tayden

tayden commented Jul 22, 2024

Thanks @fostermh!

@tayden

tayden commented Jul 22, 2024

  • Does not include the word “dataset”

I've updated this and removed "dataset"

@tayden tayden changed the title Dataset - MusselSeg: Semantic Segmentation Dataset for Rocky Intertidal Mussel Habitat Dataset - MusselSeg: Semantic Segmentation for Rocky Intertidal Mussel Habitat Jul 22, 2024
@JessyBarrette
Contributor

OK, I cleaned up the contacts for Hakai/Tula so there are no duplicated names.

We can certainly add a DOI to this specific record. You can still generate a DOI on HuggingFace and reference that version-specific DOI in any publications. We can track related DOIs external to the organization ourselves.

@timvdstap
Collaborator

Hey @tayden! Just trying to get a sense of the current status of this record :) Has a DOI been minted by HF for this data? As Matt indicated, the HF DOI can point to the data on their page, and we can mint a DOI through the metadata form that points to the landing page in the Hakai Catalogue. Other than that, I can also review the metadata record if you wish.

@tayden

tayden commented Oct 16, 2024

Hi @timvdstap, yes there's a DOI on HF and the data is available there. If you have any feedback on the metadata, I'd be happy to update it. I had @willhakai take a look and help when creating it.

That data/metadata is here: https://huggingface.co/datasets/HakaiInstitute/mussel-seg-1024-1024

@timvdstap
Collaborator

Hey @tayden this record was mentioned briefly yesterday during the talks and it made me realize that it's still in the submission phase! Some thoughts as I'm looking at this record:

  • I cleaned up the citation a bit - I don't think you necessarily wanted Hakai Institute and Tula Foundation listed as authors in the citation, but they should be represented in the metadata record.
  • On HF, I see Karah Ammann and Nathaniel Fletcher listed in the citation as well, should they be included in the metadata record?
  • Are you OK with adding your email address to the metadata record, given that you're an author?
  • I've changed the title and description of the primary resource slightly because otherwise this would be an exact copy of the related work which might be confusing for people. Let me know what you think.
  • @fostermh just to confirm - given that a record needs to have a primary resource, in this case, should that be "https://huggingface.co/datasets/HakaiInstitute/mussel-seg-1024-1024", with the associated HuggingFace DOI (https://doi.org/10.57967/hf/2762) listed as Related Work (relation type: 'Is Identical To')?
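For reference, in DataCite metadata an 'Is Identical To' link is expressed as an entry in the `relatedIdentifiers` list. The snippet below is a minimal sketch, assuming the Hakai-minted DOI's metadata follows the DataCite REST API JSON layout; the metadata form may generate this automatically, and the helper name `related_identical_doi` is hypothetical.

```python
import json

# DOI minted by HuggingFace for the dataset (from this thread).
HF_DOI = "10.57967/hf/2762"


def related_identical_doi(doi: str) -> dict:
    """Hypothetical helper: one DataCite relatedIdentifiers entry marking `doi` as identical."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsIdenticalTo",
    }


# Assumed fragment of the "attributes" object for the Hakai-minted DOI.
attributes = {"relatedIdentifiers": [related_identical_doi(HF_DOI)]}

print(json.dumps(attributes, indent=2))
```

Keeping the HF DOI as a related identifier, rather than as the record's own DOI, preserves the Hakai landing page as the primary location while still pointing readers to the version-tracked copy on HF.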

@tayden

tayden commented Nov 22, 2024

  • I cleaned up the citation a bit - I don't think you necessarily wanted Hakai Institute and Tula Foundation listed as authors in the citation, but they should be represented in the metadata record.

👍

  • On HF, I see Karah Ammann and Nathaniel Fletcher listed in the citation as well, should they be included in the metadata record?

We received some imagery from these two, it might be a good idea/nice gesture to add them to the metadata record as well. They're California-based and non-Hakai affiliated. I'll leave it up to you to decide whether or not it's appropriate to include them.

  • Are you OK with adding your email address to the metadata record, given that you're an author?

That's fine with me, I've updated the record and added my ORCID as well.

  • I've changed the title and description of the primary resource slightly because otherwise this would be an exact copy of the related work which might be confusing for people. Let me know what you think.

Makes sense. What you've written looks good to me.

@timvdstap
Collaborator

timvdstap commented Nov 22, 2024

  • On HF, I see Karah Ammann and Nathaniel Fletcher listed in the citation as well, should they be included in the metadata record?

We received some imagery from these two, it might be a good idea/nice gesture to add them to the metadata record as well. They're California-based and non-Hakai affiliated. I'll leave it up to you to decide whether or not it's appropriate to include them.

As much as I appreciate that, it should be either you or @willhakai who decides whether to add them to the citation. Personally, given that they're included in the citation information on HF, it would make sense.

@willhakai

willhakai commented Nov 22, 2024 via email

@timvdstap
Collaborator

Excellent, made the updates also as per our convo - will publish now.
