NTR: abundance of sequence reads [BODCNVS-2107] #251
Comments
@pieterprovoost Thanks Pieter. A few quick comments as I am about to go on leave. The Vocab team will pick this up and if urgent will discuss with you how best to take this forward. Here are some of my thoughts. |
Hi @pieterprovoost have you had a chance to look into this? Many thanks! |
I was just looking for a term for this myself, so I support this request. When I asked what to call it, our experts indeed suggested "number of reads" rather than "abundance". |
Thanks @kmexter. Joanna has also forwarded the ticket to a couple of colleagues. We will wait a week or two to hear from them. As a quick summary our suggestion is currently to have 3 new P01 modelled as (see long comment above for details): with: If we go for this we will need to decide whether we need to remodel and align our legacy "Relative abundance of amplifiable DNA sequences" codes. |
I just realized my description is not very clear. I agree that count is better, and in fact my intention for this term was the number of reads from sequencing, which is of course not the same as the total number of molecules in the PCR product. But we may also need a term for the number of copies in the sample from qPCR, and I will create a new ticket for that if necessary. I'll check with @SSuominen1 to get a clearer description for this one. |
Indeed having a term for the number of reads and the total number of reads will be very useful.
So for example: |
Thank you @pieterprovoost and @LynnDelgat - It looks like we all agree on "Count". The risk with oversimplifying the P01 description is that it may become ambiguous when taken out of context, but I don't think that applies here, so yes, we can remove "by PCR". Regarding creating separate codes for ASV and OTU, the main reason would be the possibility of them coexisting as two separate variables in a dataset. If this is not an option then it is okay to have them as "specified elsewhere". |
One question @LynnDelgat from somebody not used to this type of data: how would you specify ASV/OTU sequences elsewhere? Would that be specified in the Occurrence record of the DwCA format? How would people know that the count in the measurementValue field refers to an ASV or an OTU? |
@gwemon The ASV/OTU sequences would be specified in the DNA extension (in the DNA_sequence field). They are linked 1:1 with occurrences, so occurrenceID should allow people to know which specific OTU/ASV (=> DNA_sequence) the count is for. |
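To illustrate the linkage described above, here is a minimal sketch in Python, assuming the DNA extension rows and the measurement rows have already been read into lists of dictionaries. The identifiers, sequences, counts and the measurementType label are all made up for illustration (the P01 term is still under discussion); only the field names occurrenceID, DNA_sequence and measurementValue come from this thread.

```python
# Hypothetical DNA extension rows: one DNA_sequence per occurrence (1:1).
dna_extension = [
    {"occurrenceID": "occ-1", "DNA_sequence": "ACGTACGT"},
    {"occurrenceID": "occ-2", "DNA_sequence": "TTGACCAA"},
]

# Hypothetical measurement rows holding the read count for each occurrence.
# The measurementType text is a placeholder, not an agreed P01 label.
measurements = [
    {"occurrenceID": "occ-1",
     "measurementType": "sequence read count (placeholder)",
     "measurementValue": "120"},
    {"occurrenceID": "occ-2",
     "measurementType": "sequence read count (placeholder)",
     "measurementValue": "30"},
]


def reads_per_sequence(dna_rows, measurement_rows):
    """Map each DNA_sequence to its read count by joining on occurrenceID."""
    sequence_by_occurrence = {
        row["occurrenceID"]: row["DNA_sequence"] for row in dna_rows
    }
    counts = {}
    for row in measurement_rows:
        sequence = sequence_by_occurrence.get(row["occurrenceID"])
        if sequence is None:
            continue  # no DNA extension row for this occurrence
        counts[sequence] = int(row["measurementValue"])
    return counts


counts = reads_per_sequence(dna_extension, measurements)
```

The join only tells you which sequence a count belongs to. Whether that sequence is an ASV or an OTU would still have to be documented elsewhere in the dataset.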
@LynnDelgat So until there is a field in the DwC format that caters for this information, this is information that can only be extracted by knowledgeable human users, if I understand correctly. If the plan is to create such a field to store this information then that's okay. Thank you. |
@LynnDelgat having read a bit more about ASVs and OTUs I agree with you about not distinguishing in the P01 code. |
Describe the parameter code you need
Metabarcoding datasets usually come with relative abundances in terms of sequence reads per ASV or OTU. This is a request for terms for:
What would be its expected units of measure or its vector dimensions
Dimensionless
What property kind could be used to define the type of measurement this relates to
Abundance
What is the primary object of interest (i.e. a chemical, biological or physical entity)
Sequence reads
Would we need to define a matrix? If so what should the matrix be?
Possibly PCR product.