Dataset - High performance liquid chromatography (HPLC) phytoplankton pigment timeseries for the northern Salish Sea and central coast, British Columbia #106

Closed
23 of 27 tasks
hakai-it opened this issue Jul 19, 2024 · 38 comments

@hakai-it

hakai-it commented Jul 19, 2024

High performance liquid chromatography (HPLC) phytoplankton pigment timeseries for the northern Salish Sea and central coast, British Columbia

https://hakaiinstitute.github.io/hakai-metadata-entry-form/#/en/hakai/7U7b8oPpeTN6gjvXlUCTGJr5pga2/-O25qzW3OpioeACGpRHl

Best Practices Checklist

In General

  • No previous versions of this metadata record exist (eg for earlier versions of the data, if so update that record rather than creating a new one)

Data Identification

Dataset title:

  • No version information in the title
  • Frontloaded (with the most important information first)
  • Include the geographical region the data apply to
  • Short – aim for 60 characters including spaces
  • Does not include acronyms – put these in the keywords
  • Does not include the word “dataset”
  • Time series datasets should include “time series” at the end of the title

Abstract

  • Abbreviations have been expanded upon at first mention
  • Abstract describes how, when, what, where, why of data collection and is limited to no more than 500 words

DOI

Spatial

  • Ensure that Depth or Height Positive is correctly selected

Contact

  • ROR and ORCID(s) are included and linked properly where applicable
  • For datasets where DFO is a partner, ensure 'parent' ROR is added (https://ror.org/02qa1x782). DFO 'child' organizations (i.e. CHS) and their ROR are optional.
  • Include Hakai Institute as Publisher and include data@hakai.org as email
  • Make sure email address is provided if the role is 'Metadata Custodian' or 'Point of Contact'
  • Add contact affiliation where known including ROR
  • If resource is (partially) generated by Hakai researchers, include 'Tula Foundation' (with associated ROR) with 'Funder' role. Be sure to uncheck 'include in citation' for Tula Foundation.

Resources

  • Resource links go to specific dataset download (not generic platform like waterproperties.ca)
  • Readme, changelog, data dictionary, protocols included in data-package (for tabular text based data)
  • An archive folder, or other means, for older data versions is included in the data package if the version is not 1.0
  • Links work
  • All files in the data package can be opened and are not corrupt
  • No executable files in the data package. Files should be open formats and standards (.csv, .txt for example)
@timvdstap
Collaborator

Thanks for this submission @jdelbel - I will have a look at it on Monday. As discussed, until the data is available on the Hakai ERDDAP Server, you could indeed link to the Hakai Data Portal and indicate that your data is available upon request (unless @JessyBarrette you have any other suggestions?) Alternatively, you could create a private GitHub repository with the Hakai-dataset-template and store your dataset on there with the relevant documentation, and link to that repo.

@JessyBarrette
Contributor

The ERDDAP dataset already exists and is available on the development ERDDAP here:
https://goose.hakai.org/erddap/tabledap/HakaiHPLCResearch.html

@JessyBarrette
Contributor

Related to HakaiInstitute/hakai-erddap#151

@JessyBarrette
Contributor

The ERDDAP dataset is ready; we just need to sync the metadata and push it to the production ERDDAP.

@JessyBarrette
Contributor

JessyBarrette commented Jul 22, 2024

@jdelbel some comments:

  • I would omit the HPLC acronym in the title
  • From abstract "Data are currently not available on a repository and are available upon request." We will add an erddap for it, so I would omit that.
  • I added hplc as keyword
  • I added the future ERDDAP dataset and moved it to the top, since this should be the main way for external users to access the data. https://catalogue.hakai.org/erddap/tabledap/HakaiHPLCResearch.html
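
For readers who want to pull the data programmatically once the dataset is live, here is a minimal sketch of building a tabledap CSV request URL. The server and dataset ID come from this thread; the variable names `time` and `depth` and the example constraint are assumptions, so check the dataset page for the actual column names.

```python
from urllib.parse import quote

ERDDAP_BASE = "https://catalogue.hakai.org/erddap"  # server named in this thread
DATASET_ID = "HakaiHPLCResearch"                    # dataset ID named in this thread


def tabledap_csv_url(variables, constraints=()):
    """Build a tabledap CSV request URL.

    variables: column names to return (hypothetical here; see the dataset page).
    constraints: strings such as 'time>=2019-01-01T00:00:00Z'.
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters like '>' and '<'; keep '=' and ':' readable.
        query += "&" + quote(constraint, safe="=:-.")
    return f"{ERDDAP_BASE}/tabledap/{DATASET_ID}.csv?{query}"


# Assumed variable names, for illustration only.
url = tabledap_csv_url(["time", "depth"], ["time>=2019-01-01T00:00:00Z"])
print(url)
```

The function only assembles the URL and makes no network request, so it can be tried before the production dataset exists.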

@jdelbel

jdelbel commented Jul 22, 2024

It wouldn't let me click the check boxes.

I made the requested changes to the title and abstract. Your changes look good.

This link https://catalogue.hakai.org/erddap/tabledap/HakaiHPLCResearch.html is coming up as not found.

@JessyBarrette
Contributor

JessyBarrette commented Jul 22, 2024 via email

@jdelbel

jdelbel commented Jul 22, 2024

Nice, yeah I realized that after I sent the message.

Thanks Jessy.

@timvdstap
Collaborator

timvdstap commented Jul 30, 2024

It's looking good @jdelbel - few minor things:

  • Is this record in any way related to this record, or is that simply an older version that can be deleted?
  • In the title I see northern Salish Sea, in the abstract Northern Salish Sea (NSS). Minor thing, but do you also want to capitalize in the title?
  • Is it possible to add the phytoplankton species groupings under Taxonomic Classification?
  • I have added Hakai Institute as Publisher (you had it as Publisher, but not selected to be included in the citation because for the same contact info you had also selected Hakai as Data Owner -- the workaround is listing Hakai Institute twice in the contacts).
  • I've included Drew's email because they're listed as a Point of Contact.
  • Now that an ERDDAP dataset is available (though not yet pushed to production), I would remove the link to the Hakai Data Portal as Primary Resource. Unless there's a reason the ERDDAP dataset won't be pushed to production for a while?

@JessyBarrette
Contributor

related to HakaiInstitute/hakai-erddap#196

@jdelbel

jdelbel commented Jul 31, 2024

Thanks @timvdstap

  • Yes, that was a largely incomplete record I had forgotten about - deleted
  • Good catch. I actually took the capital away from northern in the abstract - I think that might be more correct, but can do the opposite if you think not.
  • Interesting question. HPLC represents pigment concentrations and is not directly taxonomic. I basically put these data into a statistical model to estimate group level biomass. As such, I don't think the HPLC data should have taxonomic information. Thoughts?
  • Thanks!
  • Thanks!
  • I removed the portal. I don't think so, more just which data will be pushed and when as quite a bit of it is being used in research projects.

@jdelbel

jdelbel commented Jul 31, 2024

Why are the check boxes so glitchy - it won't let me check yours and it gives me a huge sense of satisfaction to tick things off, haha.

@timvdstap
Collaborator

timvdstap commented Aug 1, 2024

  • Good catch. I actually took the capital away from northern in the abstract - I think that might be more correct, but can do the opposite if you think not.

As far as I'm concerned it's all good, it was more the consistency that I was thinking about.

  • Interesting question. HPLC represents pigment concentrations and is not directly taxonomic. I basically put these data into a statistical model to estimate group level biomass. As such, I don't think the HPLC data should have taxonomic information. Thoughts?

That makes sense to me!

  • I removed the portal. I don't think so, more just which data will be pushed and when as quite a bit of it is being used in research projects.

I still saw the portal listed as Primary Resource, but I removed it now. Where possible we try to make sure resource links go to specific datasets, rather than 'generic' platforms or portals.

Once https://catalogue.hakai.org/erddap/tabledap/HakaiHPLCResearch.html is pushed to production and 'live', please add the publication date to the record. I can then publish the record to the Hakai Catalogue and make sure that the DOI is 'Findable' as well. :)

@timvdstap
Collaborator

Hey @jdelbel hope you had a great vacation! Just tagging you here to put it on your radar :-)

@jdelbel

jdelbel commented Sep 10, 2024

Thanks @timvdstap. I saw an update from Jessy on August 26th showing that it was pushed to production. Can you confirm whether that is the correct date to add to the record? Oddly, I no longer see the record within my account on the intake form, which looks like it has changed since my last sign-on. Where can I access the form to make the change?

Also, currently no data is being pushed to erddap. I can't remember what filters Jessy had applied for deciding which data should be pushed. Can we review this and then I can make the changes so data become available?

@fostermh
Collaborator

@jdelbel

jdelbel commented Sep 10, 2024

@fostermh Thanks :). Weird that my old link didn't go there.

@timvdstap
Collaborator

Also, currently no data is being pushed to erddap. I can't remember what filters Jessy had applied for deciding which data should be pushed. Can we review this and then I can make the changes so data become available?

Hey @jdelbel I'm not sure what filters Jessy was applying. It seems like the most recent data is from May 28, 2024, is that accurate?

@jdelbel

jdelbel commented Oct 4, 2024

@timvdstap Ok, the publication date has been added.

May 28th, 2024 is the last data we received from the analysis lab. We send out 2x shipments per year - one in June and the other in December. We get results back from the second shipment in Feb of the following year. I need time to QA/QC the data requiring other phyto data-types. Thus, I think we should publish to the end of the prior year. That would mean we could now publish to the end of 2023. Also, I think the all_chl_a_flag should be AV and the quality level "Principal Investigator". I need to update these, but can do so quickly.

How does that sound?
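
The cutoff rule proposed above can be sketched as a small helper. This assumes the rule is exactly "publish through December 31 of the prior calendar year"; the function names are hypothetical, and the sketch does not account for the early months of a year, when results from the December shipment may not be back yet.

```python
from datetime import date


def publication_cutoff(today):
    """Return the last sampling date eligible for publication.

    Assumes the rule proposed in this comment: publish through the end
    of the prior calendar year.
    """
    return date(today.year - 1, 12, 31)


def is_publishable(sample_date, today):
    """True if a sample collected on sample_date falls within the cutoff."""
    return sample_date <= publication_cutoff(today)


# As of this comment (Oct 4, 2024), the cutoff is the end of 2023,
# so the May 28, 2024 results would be held back.
print(publication_cutoff(date(2024, 10, 4)))
print(is_publishable(date(2024, 5, 28), date(2024, 10, 4)))
```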

@timvdstap
Collaborator

I'm looping in @fostermh here, because I'm not sure how to publish data only to 2023-12-31, as opposed to it currently being 2024-05-28. I think I'm confused about the difference between 'published' and 'pushed to production': So if I understand correctly, the HakaiHPLCResearch data on ERDDAP, that includes the data till May 2024 is 'pushed to production' but that doesn't mean that it's published/openly available through our ERDDAP server? It's like an interim version (available only to Hakai?), and from this production we should select the data till 2023-12-31 to publish openly?

@jdelbel

jdelbel commented Oct 4, 2024

In theory, I could control what is openly available through the flagging and quality level if those are included as filters. This was the approach that Jessy was taking, but not sure how this was done on the backend.

@fostermh
Collaborator

fostermh commented Oct 7, 2024

Currently the HPLC output from our database that comprises the relevant ERDDAP dataset is limited to anything with results.

https://github.com/HakaiInstitute/hakai-erddap/blob/0ce897a5f1fcf9c2933c7042cc7f5c6e4c9baf64/views/HakaiHPLCSampleResearch.sql#L4

If you would like, I can easily change that to be anything with a Quality Level of 'Principal Investigator' which would imply that @jdelbel you had reviewed and signed off on it.

@timvdstap
Collaborator

timvdstap commented Oct 10, 2024

Admittedly I'm not fully sure what the next steps are for this record, so please correct me where I'm wrong:

@jdelbel

jdelbel commented Oct 10, 2024

Yes, I agree with the PI criteria threshold for publication. Good on my end to make that change.

@timvdstap "push the public data to ERDDAP" - Does this mean actually adding quality level "PI" so the data are pushed?

Otherwise, looks good to me and will work on the update and final review.

@timvdstap
Collaborator

@timvdstap "push the public data to ERDDAP" - Does this mean actually adding quality level "PI" so the data are pushed?

That's a good question, I'm actually not sure what the process of publishing data to ERDDAP looks like at a technical level, likely @fostermh will know. My guess is indeed that if you add PI as quality level, that would satisfy the established criteria and it would automatically get pushed to the Hakai ERDDAP Server (though not sure of the frequency).

@jdelbel

jdelbel commented Oct 10, 2024

Ok, that will be good to know as I am unclear as well.

@fostermh
Collaborator

the HPLC view has been updated to require a quality level of 'Principal Investigator' or 'Technicianmr'. I also limited the output to rows flagged as 'AV'

    x.organization = 'HAKAI'
    AND x.row_flag = 'Results'
    AND quality_level IN ('Principal Investigator', 'Technicianmr')
    AND x.hplc.all_chl_a_flag IN ('AV')

The erddap dataset will refresh nightly.
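
To see what the conditions above let through, here is a small self-contained sketch that runs an equivalent filter against an in-memory SQLite table. The table name, the flattened column layout, the sample rows, and the non-matching flag values ('Other', 'Collected') are all hypothetical; only the filter values themselves come from the snippet above, and the real view is defined in the hakai-erddap repository.

```python
import sqlite3

# Hypothetical, flattened stand-in for the data behind the real view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hplc (sample_id TEXT, organization TEXT, row_flag TEXT,"
    " quality_level TEXT, all_chl_a_flag TEXT)"
)
conn.executemany(
    "INSERT INTO hplc VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", "HAKAI", "Results", "Principal Investigator", "AV"),
        ("s2", "HAKAI", "Results", "Technicianmr", "AV"),
        ("s3", "HAKAI", "Results", "Technicianm", "AV"),  # quality level not accepted
        ("s4", "HAKAI", "Results", "Principal Investigator", "Other"),  # flag not AV
        ("s5", "HAKAI", "Collected", "Principal Investigator", "AV"),  # no results yet
    ],
)

# Same conditions as the updated view, applied to the stand-in table.
PUBLIC_FILTER = """
    SELECT sample_id FROM hplc
    WHERE organization = 'HAKAI'
      AND row_flag = 'Results'
      AND quality_level IN ('Principal Investigator', 'Technicianmr')
      AND all_chl_a_flag IN ('AV')
    ORDER BY sample_id
"""
public_ids = [row[0] for row in conn.execute(PUBLIC_FILTER)]
print(public_ids)  # ['s1', 's2']
```

Only rows that satisfy every condition are returned, so a single unreviewed quality level or a non-AV flag is enough to keep a row out of the public dataset.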

@timvdstap
Collaborator

Sounds good, thanks for doing that @fostermh! Will I need to update the link in the metadata record or is https://catalogue.hakai.org/erddap/tabledap/HakaiHPLCResearch.html accurate still?

@fostermh
Collaborator

same link as before. no change is needed.

@timvdstap
Collaborator

timvdstap commented Oct 17, 2024

@jdelbel The https://catalogue.hakai.org/erddap/tabledap/HakaiHPLCResearch.html is now live, showing anything that is flagged as 'AV' and has a quality level of PI or Technicianmr. If it looks all good to you I will go ahead and publish the record to the Hakai Catalogue and make the DOI findable in DataCite.

edit: It seems like a lot of quality_level is set to Technicianm, and not Technicianmr, and as a result it currently only shows data from 2015-2017. Could you make that change to Technicianmr @jdelbel ?
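
One way to spot which years are being held back by the quality-level filter is to count the excluded rows per year and level. This is a rough sketch only: the `(year, quality_level)` input shape and the sample rows are hypothetical, and the accepted levels are taken from the filter described earlier in this thread.

```python
from collections import Counter

# Quality levels accepted by the filter described earlier in this thread.
ACCEPTED_LEVELS = {"Principal Investigator", "Technicianmr"}


def excluded_by_quality_level(rows):
    """Count rows per (year, quality_level) that the filter would drop.

    rows: iterable of (year, quality_level) pairs (a hypothetical export shape).
    """
    return Counter(
        (year, level) for year, level in rows if level not in ACCEPTED_LEVELS
    )


# Made-up rows for illustration.
sample = [
    (2016, "Technicianmr"),
    (2018, "Technicianm"),
    (2018, "Technicianm"),
    (2019, "Technicianm"),
]
print(excluded_by_quality_level(sample))
```

Years that show up in the result still need re-flagging before their rows appear on ERDDAP.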

@jdelbel

jdelbel commented Oct 23, 2024

@timvdstap I reviewed the metadata record and made a few small changes, but looks good now.

Yes, I can start re-flagging data this afternoon.

I found something I missed: from 2015 through 2016, DFO-IOS analyzed QU39 surface samples (0m) for us using a different analysis method. Do you think this should go into a different record, or could I just make a small addition to the description of the existing record so that these data can be included? The different analysis method means that some different pigments were measured.

@jdelbel

jdelbel commented Oct 23, 2024

@fostermh Tim and I discussed and we think the DFO analyzed data should be a different record as it was done with a different analysis method (see above). Can you add an additional filter to the current HPLC pipeline so that it only pushes "analyzing_lab" == "USC"? It's nice we have that column to help us differentiate.

@fostermh
Collaborator

sure, done.

@timvdstap
Collaborator

@jdelbel Just confirming that the metadata also reflects the fact that 2015/2016 DFO-IOS data is not included?

@jdelbel

jdelbel commented Oct 24, 2024

@timvdstap is this necessary? The DFO-IOS data is from 0m depth and the metadata description says the data are 5m depth from 2015-2018 and then 0,5,10,20m from 2019 onwards. Easy to add though - could be nice to link it somehow saying there is a complementary 0m data set available here...

@timvdstap
Collaborator

Sorry I phrased that poorly, I meant to say that if the metadata previously made specific mention of the fact that it includes DFO-IOS data from 2015/2016, that we should make sure that's revised now that the data is omitted.

I agree that it would be nice to link to the complementary 0m dataset in the future, but that shouldn't be a barrier to publishing this record.

Let me know when you've finished the reflagging and we can publish the record :)

@jdelbel

jdelbel commented Dec 10, 2024

I've re-flagged pretty much everything from the oceanography core stations until the end of 2022. There are a few points prior to 2022 that need some historical metadata QC, which the technicians are working on, but it is a marginal number. I think there is enough now to publish the record. Let me know if that works @timvdstap.

@timvdstap
Collaborator

Nice work @jdelbel ! I've published the record, should be on the catalogue shortly :) will re-open this issue in case there's anything wrong.
