polyoligo-pcr <INPUT> <OUTPUT> <FASTA/BLASTDB> <OPTIONS>
For a list of all available options and their descriptions, type:
polyoligo-pcr -h
Recommendations, when applicable, are given in each option's description. Note that switches (boolean options that take no argument) default to False.
In the following example, primer pairs are designed taking into account both homologs and the mutations present in a selected subset of the population:
polyoligo-pcr sample_data/pcr_targets.txt out sample_data/blastdb --vcf sample_data/vcf.txt.gz --vcf_include sample_data/vcf_include.txt
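For scripted or batch runs, the same invocation can be assembled from Python. The sketch below uses a hypothetical helper, build_pcr_command, which is not part of polyoligo; it only strings together the three positional arguments and the optional flags described in this document (--vcf, --vcf_include, --vcf_exclude, --primer3). It does not check whether the flags may be combined.

```python
def build_pcr_command(targets, out_base, reference, vcf=None,
                      include=None, exclude=None, primer3_yaml=None):
    """Assemble a polyoligo-pcr argument list (hypothetical helper)."""
    # Positional arguments: <INPUT> <OUTPUT> <FASTA/BLASTDB>
    cmd = ["polyoligo-pcr", targets, out_base, reference]
    # Optional files are appended only when given.
    if vcf is not None:
        cmd += ["--vcf", vcf]
    if include is not None:
        cmd += ["--vcf_include", include]
    if exclude is not None:
        cmd += ["--vcf_exclude", exclude]
    if primer3_yaml is not None:
        cmd += ["--primer3", primer3_yaml]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Building one command per sample list (each with its own output base name) is one way to compare designs across population subsets.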
The software requires three mandatory arguments:
<INPUT>
: A text file of regions of interest, each declared as CHR:START-END followed by an optional NAME.
<OUTPUT>
: The base name of the output files.
<FASTA/BLASTDB>
: A FASTA file and/or a BLAST database to use as the reference genome. Both file types can be provided by giving them the same basename. If only one is provided, it is automatically converted to produce the other.
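Malformed target lines are easiest to catch before a run. The sketch below is a hypothetical pre-flight check (parse_region and read_regions are not part of polyoligo) that reads the <INPUT> format CHR:START-END NAME. It assumes that NAME, when present, contains no whitespace and that coordinates are plain integers; it does not assume any particular coordinate convention.

```python
import re

# Hypothetical helper: one <INPUT> line, "CHR:START-END NAME" with NAME optional.
# The greedy \S+ backtracks to the last ':' so chromosome names may contain '-'.
_REGION = re.compile(
    r"^(?P<chrom>\S+):(?P<start>\d+)-(?P<end>\d+)(?:\s+(?P<name>\S+))?$"
)


def parse_region(line):
    """Return (chrom, start, end, name); name is None when omitted."""
    match = _REGION.match(line.strip())
    if match is None:
        raise ValueError(f"expected 'CHR:START-END [NAME]', got {line!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"START is greater than END in {line!r}")
    return match["chrom"], start, end, match["name"]


def read_regions(path):
    """Parse every non-blank line of a targets file."""
    with open(path) as fh:
        return [parse_region(line) for line in fh if line.strip()]
```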
Optional files include:
--vcf
: A VCF file of mutations (both SNPs and indels) to take into account when designing primers. A tabix index file, created with the tabix binary (distributed with Samtools/HTSlib), is required, and tabix only indexes bgzip-compressed files. To create the index, run: tabix -p vcf <VCF>.txt.gz
--vcf_include/--vcf_exclude
: A text file listing the samples to include in (or exclude from) the VCF. See this example file.
--primer3
: YAML configuration file for Primer3. All Primer3 arguments can be set here. See this example file.
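Two VCF preparation steps are easy to get wrong: tabix refuses plain gzip files, and the include/exclude list must use sample names exactly as they appear in the VCF. The sketch below uses hypothetical helpers (not part of polyoligo). is_bgzf checks for the BGZF block header defined in the SAM/BAM specification; has_tabix_index looks for the .tbi file that tabix -p vcf writes by default; vcf_sample_names reads the sample columns that follow FORMAT on the #CHROM header line. The one-name-per-line layout written by write_sample_list is an assumption; compare it with the example file before use.

```python
import gzip
import os


def is_bgzf(path):
    """True if the file starts with a BGZF header (gzip + 'BC' extra subfield)."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    return (
        len(head) == 16
        and head[:2] == b"\x1f\x8b"      # gzip magic number
        and head[2] == 8                 # DEFLATE compression method
        and (head[3] & 0x04) != 0        # FEXTRA flag: extra field present
        and head[12:14] == b"BC"         # BGZF subfield identifier
        and head[14:16] == b"\x02\x00"   # subfield length = 2
    )


def has_tabix_index(vcf_path):
    """tabix -p vcf X.gz writes its index to X.gz.tbi by default."""
    return os.path.exists(vcf_path + ".tbi")


def vcf_sample_names(vcf_path):
    """Sample names from the #CHROM header line (columns after FORMAT)."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                # Columns 0-8 are CHROM..FORMAT; samples start at index 9.
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # header ended without a #CHROM line
    raise ValueError(f"no #CHROM header line in {vcf_path}")


def write_sample_list(samples, path):
    """Write one sample name per line (assumed list format)."""
    with open(path, "w") as fh:
        fh.writelines(name + "\n" for name in samples)
```

If is_bgzf returns False for a .gz file, recompressing it with bgzip before running tabix is the usual fix.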
Three output files are produced:
<OUTPUT>.log
: A log file containing details on the number of valid primers found during each search and for each marker.
<OUTPUT>.bed
: Primer pairs reported in BED format for use with genome browsers. Names are composites of <primer_id>-<goodness> (see below).
<OUTPUT>.txt
: Primer pairs reported as a space-separated list with the following columns:
Column | Description |
---|---|
name | Target name |
chr | Chromosome |
start | Primer start position in the genome |
end | Primer end position in the genome |
direction | Direction of the primer: F for forward, R for reverse |
assay_id | ID of the primer pair |
seq5_3 | Sequence of the primer in the 5'-3' direction |
seq_5_3_ambiguous | Sequence of the primer in the 5'-3' direction, with ambiguous nucleotides at mutation sites (indels excluded) |
primer_id | Unique primer identifier for each marker, intended to ensure that the same primer is not purchased multiple times |
goodness | Heuristic goodness score based on multiple criteria (maximum score: 10) |
qcode | Quality code containing warnings about the assay. Characters mean the following: `.` = no warnings; `t` = bad Tm; `O` = off-targets; `d` = heterodimerization; `m`/`M` = mutations with allele frequencies >0/>0.1, respectively; `i`/`I` = indels larger than 0/50 nucleotides, respectively |
length | Primer length (nt) |
prod_size | Expected PCR product size (bp) |
tm | Predicted primer melting temperature, based on a nearest-neighbor (NN) thermodynamic model with the SantaLucia (1998) parameters |
gc_content | Percent GC in the primer sequence |
n_offtargets | Number of possible genome-wide off-target PCR products. If larger than 5, the count is not exhaustive |
max_aaf | Maximum alternative allele frequency across all mutations located in the primer |
indels | Length of any indels located in the target PCR product |
offtargets | Comma-separated list of expected off-target PCR products. If there are more than 5, this list is not exhaustive |
mutations | Comma-separated list of mutations located in the primer, each reported as [REF/ALT:AAF] |