Remove outliers across per-contig VCFs #654

Draft: 7 commits to merge into main

epiercehoffman
Collaborator

Updates

New workflow to remove outlier samples.

  • Uses src/sv-pipeline/scripts/downstream_analysis_and_filtering/determine_svcount_outliers.R for plotting and outlier determination; the script only considers SV types with a median of at least 100 SVs per sample
  • Takes per-contig VCFs as input
  • Performs outlier determination based on autosomes only
  • Can be rerun with new inputs and settings to perform SV counting, outlier determination at different thresholds, and filtering separately, without redoing previous steps
  • Includes a bcftools preprocessing step to restrict the SVs considered during outlier determination
  • Filters sample list
  • Accepts a list of additional (e.g. withdrawn) samples to exclude at the same time as outlier removal
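
For reviewers, here is a rough Python sketch of the logic described in the list above. It is not the code in this PR: the actual outlier determination is done by determine_svcount_outliers.R, and the description does not state its cutoff rule. The IQR-based cutoff and its multiplier, the `chr1` to `chr22` contig naming, and all function and variable names are assumptions made for illustration. Only the median threshold of 100, the autosome restriction, and the extra exclusion list come from the description.

```python
from statistics import median, quantiles

MIN_MEDIAN_COUNT = 100  # from the PR description: SV types below this median are skipped
IQR_MULTIPLIER = 1.5    # assumption: hypothetical cutoff, not stated in the PR
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumption: chr-prefixed contig names


def sum_autosomal_counts(per_contig):
    """Combine per-contig counts into per-SV-type totals, autosomes only.

    per_contig: {contig: {svtype: {sample: n_svs}}}
    returns:    {svtype: {sample: n_svs}}
    """
    totals = {}
    for contig, by_type in per_contig.items():
        if contig not in AUTOSOMES:
            continue  # sex chromosomes and other contigs do not affect outlier calls
        for svtype, per_sample in by_type.items():
            bucket = totals.setdefault(svtype, {})
            for sample, n in per_sample.items():
                bucket[sample] = bucket.get(sample, 0) + n
    return totals


def find_outliers(counts, min_median=MIN_MEDIAN_COUNT, k=IQR_MULTIPLIER):
    """Return samples whose count for any eligible SV type falls outside
    [Q1 - k*IQR, Q3 + k*IQR]. The IQR rule is an illustrative assumption."""
    outliers = set()
    for svtype, per_sample in counts.items():
        values = list(per_sample.values())
        if median(values) < min_median:
            continue  # too few SVs of this type to judge outliers reliably
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        outliers.update(s for s, n in per_sample.items() if n < low or n > high)
    return outliers


def filter_samples(samples, outliers, extra_exclusions=()):
    """Drop outliers and any additional (e.g. withdrawn) samples in one pass."""
    drop = set(outliers) | set(extra_exclusions)
    return [s for s in samples if s not in drop]
```

Because counting, outlier determination, and filtering are separate functions, the cutoff can be changed and `find_outliers` rerun on saved counts without recounting, which mirrors the rerun behavior described above.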

Testing

Tested on the 1KGP reference panel with different settings and inputs.

Marking as draft while development for Phase 2 is ongoing. The workflow is designed for Phase 2 usage, so it may need changes to be more generally applicable.
