Releases: niaid/primer-id-progs
A few things
- Added a script, count_haplotypes.pl, that takes translated reads (the output of convert_reads_to_amino_acid.pl), extracts the amino acids at particular positions, and tallies the resulting motifs across multiple files.
- Added license information and third-party software references.
- Fixed a bug in get_majority_block_bam (thanks to @jrkirk61 for reporting!)
- Added logic to handle the case where all residues at a position are ambiguous.
- Added a new option to the compute_cutoff script that lets the user specify an acceptable minimum primerID group size threshold.
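The motif tally performed by count_haplotypes.pl can be illustrated with a short Python sketch. This is not the script's own code: it assumes the translated reads are amino acid sequences in FASTA format, that positions are 1-based, and that reads too short to cover every requested position are skipped. The names `read_fasta` and `tally_motifs` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> list[str]:
    """Return the sequences in FASTA-formatted lines, dropping the headers."""
    seqs: list[str] = []
    current: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def tally_motifs(files: Iterable[Iterable[str]], positions: list[int]) -> Counter:
    """Count the motif formed by the residues at `positions` (1-based) across all files.

    Hypothetical helper: each element of `files` is an iterable of FASTA lines
    (for example an open file handle). Reads shorter than the largest position
    are skipped (assumption).
    """
    counts: Counter = Counter()
    last = max(positions)
    for lines in files:
        for seq in read_fasta(lines):
            if len(seq) < last:
                continue
            motif = "".join(seq[p - 1] for p in positions)
            counts[motif] += 1
    return counts


if __name__ == "__main__":
    file_a = [">r1", "MKTAY", ">r2", "MKSAY"]
    file_b = [">r3", "MKTAY", ">r4", "MK"]  # r4 is too short and is skipped
    print(tally_motifs([file_a, file_b], positions=[3, 5]))
```

Keeping the counter keyed on the joined motif string means a tally from several files is simply the sum of per-file tallies.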
Confidence thresholds
This release sees some helpful additions:
- 95% confidence intervals for each variant and for the cMAF (combined minor allele frequency), as well as a confidence threshold for each amplicon. The confidence threshold is the median of the upper bounds of the 95% confidence intervals of the variants in an amplicon, which approximates the background level. If the lower bound of a variant's 95% confidence interval is above this confidence threshold, we can be fairly certain that the variant is real. There are probably many real variants below the threshold as well, but this is a conservative filter that yields a high-confidence set of variants.
- Added a feature to count the bases before and after merging reads by PrimerID, so we can compute the overall reduction in background noise.
- Now combine_linkage_values.pl actually merges files! The merge_replicates sub was previously just a shell. It computes a combined p-value using Fisher's method.
- Updated the compare_variant_frequency script to work with the new output format of the convert_reads script. Added columns showing whether each variant in a comparison passes the confidence threshold for its amplicon, and the log2 ratio of the cMAF between the two samples being compared. The filter for this script previously used only adjusted p-value < 0.05; I added a filter for absolute log2Ratio_cMAF > 1 and a requirement that at least one of the variants in the comparison is above the confidence threshold.
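The confidence-threshold rule and the compare_variant_frequency filter described above can be sketched in Python. The release notes do not say how the 95% confidence intervals are computed, so this sketch assumes a Wilson score interval on each variant's read count over the depth at its position. The function names, the `(count, depth)` input layout, and the direction of the log2 ratio (sample B over sample A) are hypothetical.

```python
import math
from statistics import median

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_ci(count: int, depth: int, z: float = Z_95) -> tuple:
    """Wilson score interval for count/depth (assumed CI method, not confirmed by the source)."""
    if depth == 0:
        return 0.0, 1.0
    p = count / depth
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def confidence_threshold(variants: list) -> float:
    """Median of the CI upper bounds of an amplicon's variants (roughly the background level)."""
    return median(wilson_ci(c, d)[1] for c, d in variants)


def confident_variants(variants: list) -> list:
    """Indices of variants whose CI lower bound lies above the amplicon's threshold."""
    threshold = confidence_threshold(variants)
    return [i for i, (c, d) in enumerate(variants) if wilson_ci(c, d)[0] > threshold]


def log2_ratio(cmaf_a: float, cmaf_b: float) -> float:
    """log2(cMAF_B / cMAF_A); zero cMAFs map to 0 or +/-infinity (assumption)."""
    if cmaf_a == 0 and cmaf_b == 0:
        return 0.0
    if cmaf_a == 0:
        return math.inf
    if cmaf_b == 0:
        return -math.inf
    return math.log2(cmaf_b / cmaf_a)


def passes_comparison_filter(adj_p: float, cmaf_a: float, cmaf_b: float,
                             a_confident: bool, b_confident: bool) -> bool:
    """Adjusted p < 0.05, |log2Ratio_cMAF| > 1, and at least one variant above its threshold."""
    return (
        adj_p < 0.05
        and abs(log2_ratio(cmaf_a, cmaf_b)) > 1
        and (a_confident or b_confident)
    )


if __name__ == "__main__":
    # Hypothetical amplicon: three background-level variants and one strong variant.
    amplicon = [(1, 1000), (2, 1000), (1, 1000), (200, 1000)]
    print(confident_variants(amplicon))
```

Using the median of the upper bounds keeps a handful of genuinely high-frequency variants from inflating the threshold, which is what makes it a reasonable stand-in for the background level.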
And a few bug fixes:
- Modified the calculate_linkage script to compute the p-values outside of the Parallel loop. This is much faster and eliminates Statistics::R errors for jobs with many (e.g., 500+) comparisons.
- Modified the merge_tally script to not print extremely low-coverage variants (usually beyond the reference).
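Fisher's method, which combine_linkage_values.pl uses to combine p-values, computes X = -2 Σ ln p_i over the k input p-values and compares X with a chi-square distribution on 2k degrees of freedom. Below is a minimal Python sketch of that calculation, not the script's own code; clamping p-values into (0, 1] to avoid log(0) is an assumption. Because the degrees of freedom are always even, the chi-square upper tail has a closed form, so no statistics library is needed.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
    freedom under the null hypothesis. For even df the upper tail is
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!, evaluated here in log space so
    that large k does not overflow.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Clamp into (0, 1] so the log is defined (assumption).
    x = -2.0 * sum(math.log(min(max(p, 1e-300), 1.0)) for p in pvalues)
    half = x / 2.0
    if half == 0.0:
        return 1.0  # every input p-value was 1
    log_half = math.log(half)
    log_terms = [j * log_half - math.lgamma(j + 1) - half for j in range(k)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


if __name__ == "__main__":
    # Two replicates, each with p = 0.05.
    print(fisher_combined_pvalue([0.05, 0.05]))
```

With a single p-value the formula returns that p-value unchanged, which is a quick sanity check on the closed form.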
New convert_reads workflow for large files
This point release adds a new script, merge_tally.pl, which replaces the functionality of merge_tally_overlapping_regions.pl and also makes it possible to run convert_reads on very large files containing millions of reads. The procedure entails splitting the FASTA file into separate files (~5,000-10,000 reads per file works well), running convert_reads_to_amino_acid.pl on each file, and then merging the output with merge_tally.pl. The nuc, codon, aa, and merged tally files are recreated. (See the tutorial for examples of running.)
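The split-then-merge workflow can be illustrated with a Python sketch. It is not part of the repository: `fasta_records`, `chunk_records`, and `merge_tallies` are hypothetical helpers, and representing a tally as a mapping from `(position, residue)` to a count is an assumption about what merge_tally.pl combines, not its actual file format. The 5,000-read default follows the chunk size suggested in the release note.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Mapping, Tuple

Record = Tuple[str, str]  # (header line, sequence)


def fasta_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA lines; multi-line sequences are joined."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records: Iterable[Record], reads_per_chunk: int = 5000) -> Iterator[List[Record]]:
    """Group records into chunks of at most `reads_per_chunk` reads (one chunk per output file)."""
    if reads_per_chunk <= 0:
        raise ValueError("reads_per_chunk must be positive")
    chunk: List[Record] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == reads_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def merge_tallies(tallies: Iterable[Mapping[Tuple[int, str], int]]) -> Counter:
    """Sum per-chunk counts keyed by (position, residue) (hypothetical tally layout)."""
    merged: Counter = Counter()
    for tally in tallies:
        merged.update(tally)  # Counter.update adds counts rather than replacing them
    return merged
```

In practice, each chunk would be written to its own FASTA file and run through convert_reads_to_amino_acid.pl, and merge_tally.pl would then combine the resulting tally files; because counts are additive, summing per-chunk tallies gives the same totals as a single run over the whole file.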
As part of this, the code for the convert_reads script was refactored to put several subroutines into a module, primerid.pm, so that they were accessible to merge_tally.pl.
Snapshot for sharing
Sharing this release with Chris Brooke's team at the University of Illinois.
This release has a few bug fixes:
- Fixed a bug in convert_reads where stop codons were not being printed.
- Fixed a bug in convert_reads where stop codons were not being considered unambiguous.
- Fixed a bug in calculate_linkage so that it checks that the variant is above the threshold in at least one of the replicates.
And a few enhancements:
- Added columns to tally up unambiguous residues separately from ambiguous residues. This makes the script add_consensus_columns_to_frequency_tables.pl unnecessary.
- Added compute_cutoff.pl script to compute minimum primerID size cutoff.
- Added a libexec folder containing MAFFT 7.221 to make the software portable.
- Added a brief tutorial along with test files.
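Tallying unambiguous residues separately from ambiguous ones can be sketched as follows, with stop codons counted as unambiguous and a position whose residues are all ambiguous handled explicitly. The set of ambiguity codes (`X`, `B`, `Z`, `J`), the use of `*` for a stop, the returned dictionary layout, and the name `tally_position` are assumptions for illustration, not the convert_reads implementation.

```python
from collections import Counter
from typing import Iterable

# Assumed amino acid ambiguity codes; '*' (stop) is deliberately NOT in this set,
# so stop codons are tallied as unambiguous residues.
AMBIGUOUS_AA = frozenset("XBZJ")


def tally_position(residues: Iterable[str]) -> dict:
    """Tally one alignment position, keeping ambiguous residues out of the frequencies."""
    counts = Counter(residues)
    unambiguous = {r: n for r, n in counts.items() if r not in AMBIGUOUS_AA}
    ambiguous = {r: n for r, n in counts.items() if r in AMBIGUOUS_AA}
    total = sum(unambiguous.values())
    if total == 0:
        # Every residue at this position is ambiguous: no frequencies and no consensus.
        return {"unambiguous": {}, "ambiguous": ambiguous,
                "frequencies": {}, "consensus": None}
    frequencies = {r: n / total for r, n in unambiguous.items()}
    consensus = max(unambiguous, key=unambiguous.get)
    return {"unambiguous": unambiguous, "ambiguous": ambiguous,
            "frequencies": frequencies, "consensus": consensus}


if __name__ == "__main__":
    print(tally_position("MMM*X"))
    print(tally_position("XXB"))
```

Computing frequencies over unambiguous residues only keeps ambiguity codes from diluting real variants, and the explicit all-ambiguous branch avoids a division by zero.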
First release
This is the first snapshot release of the software repository, last updated June 20, 2016. This includes updates to enable the scripts to work on the NIAID Locus cluster. It also includes updated dependencies in the repository (e.g., samtools 1.3), and additional dependencies were added to the repository, including clustalw2, bcftools, vcfutils, seqtk, and picard.jar.