-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for sex chromosomes to per-sample summary stats #626
Conversation
… they should be. Also make y_nonpar values missing for XX samples
…to jg/summary_stats_sex_chr
This is a great rewrite, it is so clear. I totally forgot about the pipeline class and how much that cleans the script up -- as does the the test arg handling. As far as data out looks, the y_non_par stats are different when comparing the all in one run, ( |
**{k: hl.sum([x[s] for s in v]) for k, v in sums.items()}, | ||
**{k: x[d1] - x[d2] for k, (d1, d2) in diffs.items()}, | ||
**{ | ||
k: divide_null(hl.float64(x[n]), x[d]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure why I'm seeing NaNs in the ratio fields instead of missing, e.g. r_ti_tv, for your test tables, considering this is in here. Not a big deal since we can see its the result of divide by zero but unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you take a look again? I only see NaNs now in the agg Tables, not the count Tables
I still need to fix the main issue and figure out the NaN, but in the meantime, can you take a look at the rest of the changes? |
… that this also gets done on the combined HT
OK, I found the problem, it's a very simple fix. Just needed to move the setting of y_nonpar in XX samples to missing so that it's also done on the combined HT. Here are the new tests:No intermediate:Full pipeline with autosomes and sex chromosomes processed in one run:
Output:
Job ID: cf9d6f6773d0466789324c7512fea122 Full pipeline on autosomes only:
Output:
Job ID: 1d3c35cad8704f85be6927734d19e565 Full pipeline on sex chromosomes only:
Output:
Job ID: 41a2325bf5804cd2ac6277b9ae6be5d6 Combine autosomes and sex chromosomes stats and aggregate:
Output:
Job ID: b012882d71894e3ba4fd7aac02bf225d Using intermediate:Full pipeline with autosomes and sex chromosomes processed in one run:
Output:
Job ID: ed4da37139e04ba09b8eb34f163a0693 Full pipeline on autosomes only:
Output:
Job ID: b85117e5f62f434ca30480ccf3b3f94f Full pipeline on sex chromosomes only:
Output:
Job ID: 65d226b70bdd4ca6baa279e70dd6a841 Combine autosomes and sex chromosomes stats and aggregate:
Output:
Job ID: 00fe80d228bf4682a3fa0a0ba5b58065 Updated the notebooks:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with the following:
Create filter group HT for all tests:
Command:
Output:
Job ID: 55c5bbe6a3ff4abdab4581edf44dcad4
No intermediate:
Full pipeline with autosomes and sex chromosomes processed in one run:
Command:
Output:
Job ID: 84e68a8a723045449d348e8dd131578c
Full pipeline on autosomes only:
Command:
Output:
Job ID: e9e5619a71154b4b9ced0ae5b6f04bce
Full pipeline on sex chromosomes only:
Command:
Output:
Job ID: a04667e986224188a5cbfd6fdf8e3740
Combine autosomes and sex chromosomes stats and aggregate:
Command:
Output:
Job ID: b7c2e9eaea7549128cb30af010807f52
Using intermediate:
Full pipeline with autosomes and sex chromosomes processed in one run:
Command:
Output:
Job ID: 3ec7d2b378b24fdaaec691d483293585
Full pipeline on autosomes only:
Command:
Output:
Job ID: 7284b9db36de4909be6bcb6be6b9cb71
Full pipeline on sex chromosomes only:
Command:
Output:
Job ID: dc1b64766e454f88897fe8d9b5dddc03
Combine autosomes and sex chromosomes stats and aggregate:
Command:
Output:
Job ID: d5e4fb6a5a9445bba52e6f16dcd38ff4