gnomAD update #183

Open
Madelinehazel opened this issue Jan 22, 2024 · 0 comments
Madelinehazel commented Jan 22, 2024

Update gnomAD to version 4 in the crg2-hg38 branch.
Include gnomAD_faf95_popmax column.

Look into whether or not there is a GRCh37 version that we can use to update the GRCh37 pipeline.

Please see this document for a summary of a previous gnomAD update, and this associated pull request.

NOTE that you will need to be in branch crg2-hg38, not master, to run the hg38 crg2 pipeline! And for cre, switch to branch hg38 for report generation.

gnomAD is a database of exomes and genomes from (mostly) healthy individuals. We use gnomAD as a control cohort; a variant with a population allele frequency (AF) of 1% or higher is almost certainly not the cause of an extremely rare monogenic disease. The gnomAD AFs allow us to filter down the variants in an individual with rare monogenic disease so that we can more easily identify the variant or variants associated with their phenotype. Here we will be updating the gnomAD SNV/indel annotation source (they also provide SV AFs).
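As a rough illustration of the filtering idea above, here is a minimal sketch of an AF cutoff applied to a few toy records. The helper name `is_rare`, the record layout, and the example values are hypothetical and not taken from crg2 or cre; only the 1% threshold comes from the text.

```python
# Hypothetical helper: keep variants whose gnomAD AF is below a cutoff.
RARE_AF_CUTOFF = 0.01  # 1%, the threshold mentioned above


def is_rare(gnomad_af, cutoff=RARE_AF_CUTOFF):
    """Return True if the variant is absent from gnomAD or below the cutoff."""
    # A missing AF (None) is treated here as "not observed in gnomAD".
    return gnomad_af is None or gnomad_af < cutoff


# Toy records for illustration only, not real variants.
variants = [
    {"id": "var1", "gnomad_af": 0.25},
    {"id": "var2", "gnomad_af": 0.0004},
    {"id": "var3", "gnomad_af": None},
]

rare = [v["id"] for v in variants if is_rare(v["gnomad_af"])]
print(rare)
```

Treating a missing AF as rare is a design choice in this sketch: a variant never seen in gnomAD should stay in the candidate list rather than be dropped.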

gnomAD AFs are available in a VCF (or per-chromosome VCFs that can be combined). We use vcfanno to add these AFs to the VCF generated by crg2 in this rule. vcfanno requires a config that specifies which fields to use from a VCF to annotate another VCF, and any operations that might be applied to these. In crg2-hg38, that config is here.
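For orientation, a vcfanno config is a TOML file made of `[[annotation]]` stanzas, each with `file`, `fields`, `ops`, and `names` keys. The sketch below renders one such stanza from Python. The file path, the output name `gnomad_af`, and the source field `fafmax_faf95_max` are assumptions for illustration; check the actual v4 VCF header and the existing crg2-hg38 config for the real names.

```python
def vcfanno_stanza(vcf_path, fields, ops, names):
    """Render one vcfanno [[annotation]] stanza as TOML text."""
    # vcfanno expects one op and one output name per source field.
    if not (len(fields) == len(ops) == len(names)):
        raise ValueError("fields, ops and names must have the same length")

    def toml_list(items):
        return "[" + ", ".join(f'"{item}"' for item in items) + "]"

    return "\n".join([
        "[[annotation]]",
        f'file = "{vcf_path}"',
        f"fields = {toml_list(fields)}",
        f"ops = {toml_list(ops)}",
        f"names = {toml_list(names)}",
    ])


stanza = vcfanno_stanza(
    "gnomad.genomes.v4.vcf.gz",           # hypothetical path
    fields=["AF", "fafmax_faf95_max"],    # assumed INFO field names, verify in the header
    ops=["self", "self"],                 # "self" copies the value unchanged
    names=["gnomad_af", "gnomAD_faf95_popmax"],
)
print(stanza)
```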

You will need to:

  1. Download the gnomAD v4 VCFs for both exomes and genomes (https://gnomad.broadinstitute.org/downloads)
  2. Combine chromosome-wise VCFs for exomes, and combine chromosome-wise VCFs for genomes, resulting in one VCF for gnomAD exomes and one for gnomAD genomes.
  3. You will likely need to process the VCF to exclude unwanted fields, normalize, etc., as in this script, the key step being the bcftools command. However, we want to keep FAIL variants, so remove the first part of the command that filters to include only PASS variants.
  4. Check to see that these VCFs have the fields specified in the vcfanno config.
  5. Replace the filenames in the vcfanno config to reflect the v4 VCFs.
  6. Run the pipeline to generate small variant reports.
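Steps 2 to 4 could be sketched roughly as follows. This is not the script referenced above: the function names, file names, and the required field list are hypothetical, and the bcftools invocations are only built as argument lists, not executed. The normalization step deliberately has no PASS filter, so FAIL variants are retained.

```python
import gzip


def concat_command(per_chrom_vcfs, out_vcf):
    """Build a bcftools command joining per-chromosome VCFs in the order given."""
    return ["bcftools", "concat", "-O", "z", "-o", out_vcf, *per_chrom_vcfs]


def normalize_command(in_vcf, ref_fasta, out_vcf):
    """Build a bcftools command that splits multiallelics and left-aligns indels."""
    # No "view -f PASS" step here, so FAIL variants are kept.
    return ["bcftools", "norm", "-m", "-any", "-f", ref_fasta,
            "-O", "z", "-o", out_vcf, in_vcf]


def read_header(vcf_gz_path):
    """Return the '##' meta lines from a gzip- or bgzip-compressed VCF."""
    lines = []
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break
            lines.append(line.rstrip("\n"))
    return lines


def missing_fields(header_lines, required):
    """List required INFO IDs that the VCF header does not declare."""
    prefix = "##INFO=<ID="
    present = {
        line[len(prefix):].split(",", 1)[0]
        for line in header_lines
        if line.startswith(prefix)
    }
    return [field for field in required if field not in present]


# Toy header lines for illustration; a real run would use read_header(path).
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency">',
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">',
]
print(missing_fields(header, ["AF", "fafmax_faf95_max"]))
print(" ".join(concat_command(["chr1.vcf.gz", "chr2.vcf.gz"], "exomes.vcf.gz")))
```

The command lists can be passed to `subprocess.run(cmd, check=True)` once the paths are real. Running the header check before editing the vcfanno config makes renamed fields in v4 visible early, instead of surfacing as empty columns in the reports.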
@r-varan r-varan self-assigned this Feb 19, 2024
@anjalijain22 anjalijain22 self-assigned this Jun 25, 2024