Map - imputing zeroes doesn't seem to impute enough zeroes #31

bobular · 2023-12-01T22:15:10Z

Megastudy

Filters:

PopBio ID = VBP0000844
Collection start date: 2022-05-01 to 2022-11-01
Provider name for collection site: Canyon Rim Church

Markers: donut by species

Floater: timeline

unbiased specimen count vs. collection start date
overlay with species
bin width = 1 day

Should be available in this saved analysis: https://vectorbase.org/vectorbase/app/workspace/maps/A4nuqcm/import

Two issues

There ought to be points for each species at all timepoints - where there are currently no points, the points should be at zero
There exist collections with no samples at all for some time points: 2022-06-01, 2022-06-14, 2022-08-16, 2022-08-30 and 2022-09-27 - these are shown in the Collection dates floating histogram - these should also have zero points for all species
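To make the expected behaviour concrete, here is a minimal Python sketch. It is not the actual back-end code: the function name `impute_zeroes`, the record layout, the placeholder species names, the 2022-05-31 date and the count of 4 are all assumptions made for illustration. The idea is that every collection, including one with no sample rows at all, ends up with one row per species, and any missing count becomes zero.

```python
from itertools import product

def impute_zeroes(collection_dates, species_vocab, samples):
    """Return one (date, species, count) row for every date/species pair.

    Pairs that have no sample record are given a count of zero.
    """
    observed = {(s["date"], s["species"]): s["count"] for s in samples}
    return [
        (date, species, observed.get((date, species), 0))
        for date, species in product(collection_dates, species_vocab)
    ]

# Hypothetical data: 2022-06-01 is a collection with no samples at all.
collection_dates = ["2022-05-31", "2022-06-01"]
species_vocab = ["species A", "species B", "species C"]
samples = [{"date": "2022-05-31", "species": "species A", "count": 4}]

rows = impute_zeroes(collection_dates, species_vocab, samples)
for row in rows:
    print(row)
```

In this sketch the empty collection contributes three zero rows, one per species, which is what the line plot would need in order to draw a point at zero for each series on that date.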

I notice that there is a __UNSELECTED__ series returned from the back end with all zero y values. The client didn't ask for this (it only sent three overlayValues in the requests for markers and lineplot). I don't think these are the missing zeroes.

bobular · 2023-12-01T22:41:34Z

Looked closer at the __UNSELECTED__ stratum of the histogram (of unbiased specimen count) response. It certainly returns a count of zeroes (equal to the number of collections in the subset, it seems) but only in this __UNSELECTED__ stratum.

But with stratification turned off, one would expect the histogram to return counts for zeroes and it doesn't. (I checked this by setting the bin width to 0.1)

d-callan · 2023-12-04T16:19:55Z

i think i know why the histo doesnt impute zeroes when the overlay is turned off, and will think about how to fix that.. the missing time points for species ill have to investigate some more..

d-callan · 2023-12-11T18:11:04Z

ive spent a bit of time looking at this and thinking about it in a bit more depth. one thing thats clear so far is that we need to find all combinations of values for all study specific vocab variables regardless of if the study specific vocab variable is in the final plot.

i think what id propose to do next is the following:

1. make it so the method that imputes zeroes is happy to receive columns that wont be in the resulting plot, so we can use them to find combinations to impute zeroes for (for the histogram issue)
2. add some unit tests for the R that include things like dates (to at least eliminate that as a source of the problem for the line plot)
3. modify the data service so that itll pass the study-specific vocab variables any time we see unbiased specimen count in a plot, regardless of whether the study vocab variables are also in the plot
4. revisit the missing points in line plot, once we know the issue ive identified is corrected
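The idea behind the first and third steps can be sketched as follows. This is a hypothetical Python illustration, not the R method or the data service code: the name `impute_zeroes`, the variable names, the collection ids and the data are all assumptions. The point it tries to show is that the zero-imputing step receives every study-specific vocabulary variable, even ones the plot does not display, and the plot (here a histogram with no overlay) is computed only afterwards.

```python
from collections import Counter
from itertools import product

def impute_zeroes(rows, collection_ids, vocabs, count_key="count"):
    """Add a zero-count record for every unobserved vocab combination.

    `vocabs` maps each study-specific variable to its full vocabulary,
    whether or not that variable appears in the final plot.
    """
    var_names = list(vocabs)
    observed = {
        (r["collection"], *(r[v] for v in var_names)): r[count_key]
        for r in rows
    }
    out = []
    for cid in collection_ids:
        for combo in product(*(vocabs[v] for v in var_names)):
            record = {"collection": cid, **dict(zip(var_names, combo))}
            record[count_key] = observed.get((cid, *combo), 0)
            out.append(record)
    return out

# Hypothetical study vocabularies and a single observed sample.
vocabs = {"species": ["A", "B"], "sex": ["female", "male"]}
rows = [{"collection": "c1", "species": "A", "sex": "female", "count": 3}]

imputed = impute_zeroes(rows, ["c1", "c2"], vocabs)

# Histogram of counts with no overlay: species and sex are not plotted,
# but they were still needed to work out how many zeroes exist.
histogram = Counter(r["count"] for r in imputed)
print(histogram)
```

With two collections and four species/sex combinations there are eight records in total, seven of which are imputed zeroes. If the vocab variables had been dropped before imputation, there would be nothing to enumerate and the zero bin would stay empty.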

d-callan · 2023-12-11T20:15:00Z

> There exist collections with no samples at all for some time points: 2022-06-01, 2022-06-14, 2022-08-16, 2022-08-30 and 2022-09-27 - these are shown in the Collection dates floating histogram - these should also have zero points for all species

im finding this curious.. isnt this basically fabricating collections? are we saying to introduce a collection where every sample for all species has a specimen count of 0? or am i misunderstanding?

d-callan · 2023-12-11T20:19:58Z

also, just to keep all my thoughts in one place, here are some imputing zero related questions i have. i may have already asked and had these answered last time i looked at imputing zeroes, but i dont remember.. sorry! but i think getting answers to them now while im looking at this again would be good.

1. are (or should) all the vars on the sample entity be considered to have study-specific vocabs? and if not, why not?
2. if there are variables which dont have study-specific vocabs, do we still want to include that variable in finding combinations to impute 0s for? i think maybe so, but i dont think we talked about it.
3. is it possible to have a continuous variable outside of unbiased specimen counts on the sample entity? what values would we impute 0s for if we did?

bobular · 2023-12-11T21:53:10Z

> > There exist collections with no samples at all for some time points: 2022-06-01, 2022-06-14, 2022-08-16, 2022-08-30 and 2022-09-27 - these are shown in the Collection dates floating histogram - these should also have zero points for all species
>
> im finding this curious.. isnt this basically fabricating collections? are we saying to introduce a collection where every sample for all species has a specimen count of 0? or am i misunderstanding?

It's not fabricating collections. A collection effort was undertaken in a specific location with a certain device, and at a certain time. This information is provided to us by the, er, providers. It's just that there are zeroes in all columns (it often comes in wide format).
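As an illustration of the wide format described above, here is a small Python sketch. The column names, the placeholder species, the 2022-05-31 row and all counts are made up for the example and are not the providers' actual schema. A wide row with zeroes in every species column still produces one long-format record per species, so the collection event itself is preserved.

```python
def wide_to_long(wide_rows, species_columns):
    """Turn one wide row per collection into one record per species."""
    long_rows = []
    for row in wide_rows:
        for species in species_columns:
            long_rows.append({
                "site": row["site"],
                "date": row["date"],
                "species": species,
                "count": row[species],  # zeroes are kept, not dropped
            })
    return long_rows

species_columns = ["species A", "species B"]
wide_rows = [
    # Hypothetical collection with some specimens.
    {"site": "Canyon Rim Church", "date": "2022-05-31",
     "species A": 5, "species B": 0},
    # A collection that took place but caught nothing.
    {"site": "Canyon Rim Church", "date": "2022-06-01",
     "species A": 0, "species B": 0},
]

long_rows = wide_to_long(wide_rows, species_columns)
for record in long_rows:
    print(record)
```

In this sketch, a date on which no collection took place would simply have no row in `wide_rows`, so nothing would be emitted for it.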

d-callan · 2023-12-11T22:05:50Z

How can I know the difference between that case and a collection not having happened? Dates don't have vocabularies. maybe I'm still not understanding..

bobular · 2023-12-11T22:10:42Z

Thanks for looking into our favourite topic 🤯

> also, just to keep all my thoughts in one place, here are some imputing zero related questions i have. i may have already asked and had these answered last time i looked at imputing zeroes, but i dont remember.. sorry! but i think getting answers to them now while im looking at this again would be good.
>
> 1. are (or should) all the vars on the sample entity be considered to have study-specific vocabs? and if not, why not?

I was thinking about this last week too. I came to the conclusion that a) this could only apply to categorical variables, and b) all the other variables were effectively homogenous/constant/single-valued within a study.

So the answer is yes, we could consider all variables the same way, but in practice it's really only going to be the species, sex and dev. stage that produce any relevant combinations.

> 2. if there are variables which dont have study-specific vocabs, do we still want to include that variable in finding combinations to impute 0s for? i think maybe so, but i dont think we talked about it.

How is this different from 1?

> 3. is it possible to have a continuous variable outside of unbiased specimen counts on the sample entity? what values would we impute 0s for if we did?

Ah, I see you've been thinking about that too...

I genuinely can't think of an example continuous variable that could live alongside unbiased specimen count. I think that's because the count variable is signifying X identical copies of identical specimens. Let's say we were collecting mosquitoes and measuring species, sex and wing length as a continuous variable. The unbiased specimen count would have to be 1 for every record - because no two wing lengths are identical. If the wing length measurement was binned/categorical then it would make sense, just like species, sex and dev. stage do now.

So in short, no I don't think we need to worry about this.

bobular · 2023-12-11T22:12:43Z

> i think what id propose to do next is the following:

This all sounds good. I don't know if the data service "knows" enough to do 3. but hopefully it does! If there's anything the client can do, let me know.

d-callan self-assigned this Dec 4, 2023

d-callan transferred this issue from VEuPathDB/EdaNewIssues Dec 4, 2023

d-callan mentioned this issue Dec 20, 2023

Always request all study vocabs VEuPathDB/EdaDataService#342

Merged