Map - imputing zeroes doesn't seem to impute enough zeroes #31
Comments
Looked closer at the But with stratification turned off, one would expect the histogram to return counts for zeroes, and it doesn't. (I checked this by setting the bin width to 0.1.)
I think I know why the histogram doesn't impute zeroes when the overlay is turned off, and will think about how to fix that. The missing time points for species I'll have to investigate some more.
I've spent a bit of time looking at this and thinking about it in more depth. One thing that's clear so far is that we need to find all combinations of values for all study-specific vocab variables, regardless of whether the study-specific vocab variable is in the final plot. I think what I'd propose to do next is the following:
I'm finding this curious. Isn't this basically fabricating collections? Are we saying to introduce a collection where every sample for all species has a specimen count of 0? Or am I misunderstanding?
Also, just to keep all my thoughts in one place, here are some questions I have about imputing zeroes. I may have already asked and had these answered last time I looked at imputing zeroes, but I don't remember. Sorry! I think getting answers to them now, while I'm looking at this again, would be good.
It's not fabricating collections. A collection effort was undertaken in a specific location with certain device, and at a certain time. This information is provided to us by the, er, providers. It's just that there are zeroes in all columns (it often comes in wide format).
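A minimal sketch of the wide-format case described above, assuming hypothetical column names (`collection_id`, `date`, one column per species) and made-up counts. It is not the data service's actual loading code. It only illustrates that a collection whose columns are all zero still yields explicit zero-count records once the table is reshaped to long format, whereas a collection that never happened has no row at all.

```python
# Hypothetical wide-format provider table: one row per collection effort,
# one column per species. All names and values are invented for illustration.
wide_rows = [
    {"collection_id": "C1", "date": "2020-06-01", "An. gambiae": 12, "An. funestus": 0},
    {"collection_id": "C2", "date": "2020-06-08", "An. gambiae": 0, "An. funestus": 0},
]

# Columns that identify the collection rather than hold a species count.
ID_COLUMNS = ("collection_id", "date")


def wide_to_long(rows):
    """Turn each species column into its own record, keeping zero counts."""
    long_rows = []
    for row in rows:
        for column, count in row.items():
            if column in ID_COLUMNS:
                continue
            long_rows.append(
                {
                    "collection_id": row["collection_id"],
                    "date": row["date"],
                    "species": column,
                    "specimen_count": count,
                }
            )
    return long_rows


long_rows = wide_to_long(wide_rows)

# C2 is present with explicit zeroes; a collection that never took place
# (say, a hypothetical "C3") simply does not appear.
collections_seen = {r["collection_id"] for r in long_rows}
```

Under these assumptions, the all-zero collection `C2` is recorded data, not an imputed value.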
How can I know the difference between that case and a collection not having happened? Dates don't have vocabularies. Maybe I'm still not understanding.
Thanks for looking into our favourite topic 🤯
I was thinking about this last week too. I came to the conclusion that a) this could only apply to categorical variables, and b) all the other variables were effectively homogenous/constant/single-valued within a study. So the answer is yes, we could consider all variables the same way, but in practice it's really only going to be the species, sex and dev. stage that produce any relevant combinations.
How is this different from 1?
Ah, I see you've been thinking about that too... I genuinely can't think of an example continuous variable that could live alongside unbiased specimen count. I think that's because the count variable is signifying X identical copies of identical specimens. Let's say we were collecting mosquitoes and measuring species, sex and wing length as a continuous variable. The unbiased specimen count would have to be 1 for every record, because no two wing lengths are identical. If the wing length measurement was binned/categorical then it would make sense, just like species, sex and dev. stage do now. So in short, no, I don't think we need to worry about this.
This all sounds good. I don't know if the data service "knows" enough to do 3, but hopefully it does! If there's anything the client can do, let me know.
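The idea discussed in this thread, enumerating combinations of the categorical variables (species, sex, dev. stage) for every collection and filling in a count of zero where no record exists, can be sketched roughly as follows. This is an illustration only, not the data service's implementation. The function name `impute_zeroes`, the field names, and the choice to use the full cross product of observed values (rather than only combinations seen somewhere in the study) are all assumptions.

```python
from itertools import product


def impute_zeroes(records, vocab_vars,
                  count_var="specimen_count", collection_var="collection_id"):
    """Return one row per (collection, combination of vocab values).

    Combinations with no record in a collection get a count of 0.
    Assumption: the vocabulary of each variable is the set of values
    observed anywhere in the study.
    """
    vocab = {v: sorted({r[v] for r in records}) for v in vocab_vars}

    # Sum the observed counts per (collection, combination) key.
    observed = {}
    for r in records:
        key = (r[collection_var],) + tuple(r[v] for v in vocab_vars)
        observed[key] = observed.get(key, 0) + r[count_var]

    collections = sorted({r[collection_var] for r in records})
    completed = []
    for coll in collections:
        for combo in product(*(vocab[v] for v in vocab_vars)):
            row = {collection_var: coll,
                   count_var: observed.get((coll,) + combo, 0)}
            row.update(zip(vocab_vars, combo))
            completed.append(row)
    return completed


# Made-up example data: two collections, two species, two sexes.
records = [
    {"collection_id": "C1", "species": "An. gambiae", "sex": "female", "specimen_count": 5},
    {"collection_id": "C1", "species": "An. funestus", "sex": "male", "specimen_count": 2},
    {"collection_id": "C2", "species": "An. gambiae", "sex": "female", "specimen_count": 1},
]
completed = impute_zeroes(records, ["species", "sex"])
```

Note that the combinations are built from all listed vocab variables, whether or not a given variable ends up in the final plot; the plot can then aggregate over the ones it does not display without losing the zero rows.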
Megastudy
Filters:
Markers: donut by species
Floater: timeline
Should be available in this saved analysis: https://vectorbase.org/vectorbase/app/workspace/maps/A4nuqcm/import
Two issues
I notice that there is a __UNSELECTED__ series returned from the back end with all zero y values. The client didn't ask for this (it only sent three overlayValues in the requests for markers and lineplot). I don't think these are the missing zeroes.