Combine genotype and functional annotation queries

This workflow extracts variants and samples that satisfy both a set of genotype filters and a set of functional annotation filters, by intersecting the genotype VCFs with the functional annotation VCFs.

Pipeline overview

The pipeline has the following main processes:

FIND_CHUNK: finds the genomic and functional annotation agg chunks of interest.
EXTRACT_VARIANT_VEP: filters the annotation agg VCFs.
INTERSECT_ANNOTATION_GENOTYPE_VCF: intersects the genomic VCF with the filtered annotation VCF.
FIND_SAMPLES: finds samples of interest.
SUMMARISE_OUTPUT: produces summary tables.

Required inputs

input_bed

This is a region file of your genes of interest. It must be a three- or four-column tab-delimited file of chromosome, start, and stop, with an optional fourth column holding an identifier (e.g. a gene name). The file should have the .bed extension.

Example of input_bed file:

chr2	213005363	213151603	IKZF2
chr7	50304716	50405101	IKZF1
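Before launching the workflow it can help to check that the region file has the layout described above. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and the start/stop sanity check is an assumption rather than a documented pipeline rule.

```python
def parse_bed_line(line):
    """Parse one BED line into (chrom, start, stop, name); name is None if absent."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 tab-delimited columns, got {len(fields)}")
    chrom, start, stop = fields[0], int(fields[1]), int(fields[2])
    # Sanity check (an assumption, not a documented pipeline rule).
    if start >= stop:
        raise ValueError(f"start must be smaller than stop: {start} >= {stop}")
    name = fields[3] if len(fields) == 4 else None
    return chrom, start, stop, name


def read_bed(text):
    """Parse every non-empty line of a BED file's contents."""
    return [parse_bed_line(line) for line in text.splitlines() if line.strip()]


# The two example regions shown above.
example = "chr2\t213005363\t213151603\tIKZF2\nchr7\t50304716\t50405101\tIKZF1\n"
regions = read_bed(example)
```

Each entry of regions is a tuple of chromosome, start, stop and the optional identifier.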

agg_chunks_bed

This is the list of chunk names and full file paths to both the genotype and functional annotation VCFs for either aggV2 or aggCOVID. The lists can be found under GEL data resources > aggregate_file_lists > aggV2_chunk_names.bed (aggV2) and GEL data resources > aggregate_file_lists > aggCOVID_4.2_chunk_names.bed (aggCOVID).
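Conceptually, finding the chunks of interest amounts to an interval-overlap test between each region in input_bed and each chunk. The sketch below illustrates that idea only; the chunk tuple layout (chromosome, start, stop, name), the chunk names and the coordinates are hypothetical and do not describe the real column layout of the chunk list files.

```python
def overlaps(region, chunk):
    """True if two (chrom, start, stop, ...) intervals share at least one base.

    Coordinates are treated as half-open, as in BED files.
    """
    return (
        region[0] == chunk[0]
        and region[1] < chunk[2]
        and chunk[1] < region[2]
    )


def find_chunks(region, chunks):
    """Return the names of all chunks overlapping the region."""
    return [chunk[3] for chunk in chunks if overlaps(region, chunk)]


# Hypothetical chunks; real chunk boundaries and names will differ.
chunks = [
    ("chr7", 50000000, 50350000, "chunk_A"),
    ("chr7", 50350000, 50700000, "chunk_B"),
    ("chr2", 213000000, 213200000, "chunk_C"),
]
ikzf1 = ("chr7", 50304716, 50405101, "IKZF1")
hits = find_chunks(ikzf1, chunks)
```

A gene that spans a chunk boundary, as IKZF1 does with these made-up chunks, matches more than one chunk.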

include_exclude

This parameter defines whether to include (set to -i) or to exclude (set to -e) the sites selected using the --expression parameter (see below).

expression

This parameter defines the bcftools filter expression for your query. See the bcftools EXPRESSIONS documentation for accepted filters: https://samtools.github.io/bcftools/bcftools.html#expressions

format

This parameter defines the output format of the query; see https://samtools.github.io/bcftools/bcftools.html#query for details. For the process to run, the format must include the fields '[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT\n]'; you can specify additional fields after this initial list.
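To see how include_exclude, expression and format fit together, the sketch below assembles a bcftools query argument list from them. It is an illustration under assumptions, not the pipeline's actual invocation: the function name and the file name chunk.vcf.gz are hypothetical. The -i, -e and -f options of bcftools query and the genotype expression GT="AA" (homozygous alternate) come from the bcftools documentation.

```python
import shlex

# Fields the workflow requires, kept as a raw string so that \t and \n
# reach bcftools unchanged.
REQUIRED_FORMAT = r"[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT\n]"


def build_query_args(vcf_path, include_exclude, expression, fmt=REQUIRED_FORMAT):
    """Assemble an argument list for `bcftools query` (hypothetical helper)."""
    if include_exclude not in ("-i", "-e"):
        raise ValueError("include_exclude must be '-i' or '-e'")
    return ["bcftools", "query", include_exclude, expression, "-f", fmt, vcf_path]


# Keep only sites where a sample is homozygous for the alternate allele.
args = build_query_args("chunk.vcf.gz", "-i", 'GT="AA"')
print(shlex.join(args))
```

Passing the arguments as a list avoids shell-quoting problems with the double quotes and backslashes that bcftools expressions and format strings often contain.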

cpus

Number of CPUs to be used by each Nextflow process. The default is 1 CPU per process; when using an input_bed file with more than 5 entries, please set it to a higher value.

memory

Total RAM available for each Nextflow process. The default is 2.GB per process; when using an input_bed file with more than 5 entries, please set it to a higher value.

Optional inputs

severity_scale

This file lists the severity of variants. It can be found under GEL data resources > aggregations > gel_mainProgramme > somAgg > v0.2 > additional data > vep severity scale > VEP_severity_scale_2020.txt. Provide this file if you are interested only in variants with a specific consequence.

severity

This parameter sets the severity of the variants you are interested in for your query. For example, if you want to look only at missense variants or worse, the input value would be missense. Only use it if the parameter severity_scale is set.
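The "missense or worse" logic can be pictured as a cut through an ordered list of consequences. The sketch below assumes a scale ordered from most to least severe; the four labels are illustrative placeholders, not the contents or layout of VEP_severity_scale_2020.txt, and the function name is hypothetical.

```python
def at_or_above(scale, threshold):
    """Return the consequences from the most severe down to the threshold.

    `scale` must be ordered from most to least severe.
    """
    if threshold not in scale:
        raise ValueError(f"unknown severity: {threshold}")
    return scale[: scale.index(threshold) + 1]


# Illustrative placeholder scale, most severe first.
scale = ["stop_gained", "frameshift", "missense", "synonymous"]
selected = at_or_above(scale, "missense")
```

With this placeholder scale, selected holds every consequence up to and including missense, and synonymous is left out.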

Outputs

This workflow produces three outputs for each gene in your input bed file.

*_result.tsv file: this is the tab-delimited output of the bcftools query command.
*_platekey_summary.tsv file: this is a two-column tab-delimited file, where the first column lists the platekeys recovered by the query, and the second column gives the number of variants per participant that satisfied the query.
*_variant_summary.tsv file: this is a two-column tab-delimited file, where the first column lists the variants that satisfied the query, and the second column gives the number of participants that carry each variant.
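The relationship between the three files can be sketched as two counts over the rows of the result file. This is a minimal illustration and not the SUMMARISE_OUTPUT implementation: it assumes each row starts with the five required format fields (sample, chromosome, position, reference, alternate) and that each sample/variant pair appears once. The platekeys and variants in the example are made up.

```python
import csv
import io
from collections import Counter


def summarise(result_tsv):
    """Count variants per platekey and participants per variant."""
    per_platekey = Counter()
    per_variant = Counter()
    for row in csv.reader(io.StringIO(result_tsv), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        sample, chrom, pos, ref, alt = row[:5]
        per_platekey[sample] += 1
        per_variant[(chrom, pos, ref, alt)] += 1
    return per_platekey, per_variant


# Made-up rows in the layout of the required format string.
result = (
    "SAMPLE_1\tchr7\t50304800\tA\tG\n"
    "SAMPLE_1\tchr7\t50310000\tC\tT\n"
    "SAMPLE_2\tchr7\t50304800\tA\tG\n"
)
per_platekey, per_variant = summarise(result)
```

Here per_platekey corresponds to the platekey summary and per_variant to the variant summary.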

Examples

Example 1

An example question would be: "I want to extract the samples in aggV2 who are homozygous alt for missense (or worse) rare variants within the gene IKZF1".

The final command would look like this:

Example 2

An example question would be: "I want to extract the samples in aggV2 who are homozygous alt for any type of variant within the gene IKZF1".

The final command would look like this:
