AMPtk taxonomy #106

Open
msalamon2 opened this issue Nov 27, 2024 · 0 comments

msalamon2 commented Nov 27, 2024

Hello Jon Palmer,

I am trying to run AMPtk's hybrid taxonomy method on a small set of ASVs (about 4,000) from 12S metabarcoding data generated with the MiMammal primers and processed with DADA2, using the MIDORI2 12S reference database MIDORI2_UNIQ_NUC_GB261_srRNA_SINTAX.fasta.

The output of amptk taxonomy does not behave as described in the AMPtk taxonomy documentation on Read the Docs, particularly with respect to the LCA step.

  1. GS method (1,267/3,939 ASVs): every ASV in this category has a final hybrid assignment at species level. In most cases (N = 1,115), both the usearch global alignment (GA) and SINTAX results were above their thresholds (97% identity and 0.8 confidence, respectively). In the remaining 152 cases, GA resolved more taxonomic levels than SINTAX, and the hybrid assignment took the full GA taxonomy instead of applying the LCA (i.e., stopping at the lowest rank that passed the SINTAX threshold). I expected the LCA based on the documentation: "6. If the best Global Alignment result is greater than 97% identical then that hit is retained. A final LCA algorithm is applied to the Global Alignment hit and the Best Bayesian Classifier hit". More importantly, in 41 cases GA and SINTAX disagreed, yet the assignment was neither flagged as a disagreement nor corrected with the LCA.

  2. GSL method (286 ASVs): GA and SINTAX agree down to a given taxonomic rank, and below that rank the hybrid assignment follows whichever method (GA or SINTAX) resolves more taxonomic levels. This does not really correspond to an LCA.

  3. SS method (2,386 ASVs): in at least 50 cases, GA and SINTAX disagreed, yet the assignment was neither flagged as a disagreement nor corrected with the LCA. The problem may be more widespread.
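To make the behavior I expected concrete, here is a minimal bash sketch of an LCA between the GA hit and the SINTAX result (already truncated at --sintax_cutoff). It is only my reading of the documentation, not AMPtk's implementation: the function name lca_taxonomy is hypothetical, and it assumes both taxonomies are comma-separated SINTAX-style strings (k:...,p:...) listing ranks in the same order. The taxonomy strings in the examples are made up and simplified.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AMPtk): print the lowest common ancestor of two
# comma-separated taxonomy strings, keeping ranks from the top down while they match.
lca_taxonomy() {
    local IFS=','
    local -a ga sx shared=()
    local i
    read -r -a ga <<< "$1"   # global alignment (GA) taxonomy
    read -r -a sx <<< "$2"   # SINTAX taxonomy, already truncated at the cutoff
    for (( i = 0; i < ${#ga[@]} && i < ${#sx[@]}; i++ )); do
        [[ "${ga[i]}" == "${sx[i]}" ]] || break   # stop at the first disagreeing rank
        shared+=("${ga[i]}")
    done
    echo "${shared[*]}"   # re-join the shared ranks with commas (first char of IFS)
}

# GA resolves to species, SINTAX stops at genus: the LCA should stop at genus
lca_taxonomy "k:Eukaryota,p:Chordata,c:Mammalia,o:Rodentia,f:Muridae,g:Mus,s:Mus_musculus" \
             "k:Eukaryota,p:Chordata,c:Mammalia,o:Rodentia,f:Muridae,g:Mus"
# prints k:Eukaryota,p:Chordata,c:Mammalia,o:Rodentia,f:Muridae,g:Mus

# GA and SINTAX disagree at genus: the LCA should stop at family
lca_taxonomy "k:Eukaryota,p:Chordata,c:Mammalia,o:Rodentia,f:Muridae,g:Mus,s:Mus_musculus" \
             "k:Eukaryota,p:Chordata,c:Mammalia,o:Rodentia,f:Muridae,g:Apodemus"
# prints k:Eukaryota,p:Chordata,c:Mammalia,o:Rodentia,f:Muridae
```

In the GS and GSL cases described above, I expected results like the first example (truncated at the last rank supported by both methods), and in the unflagged disagreements, results like the second.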

No ASV was assigned by the GDL method. Is this output behavior expected?

Here is the SLURM script I used:
```bash
#!/bin/bash
#SBATCH --account=def-mcristes
#SBATCH --mem-per-cpu=4775M
#SBATCH --cpus-per-task=10
#SBATCH --time=24:00:00
#SBATCH --mail-user=mathilde2salamon@gmail.com
#SBATCH --mail-type=ALL

cd /home/msalamon/

# Load modules and build a Python virtual environment for AMPtk in $SLURM_TMPDIR
module load StdEnv/2020 gcc/9.3.0 python/3.9 vsearch/2.28.1
virtualenv --no-download "$SLURM_TMPDIR/env"
source "$SLURM_TMPDIR/env/bin/activate"
pip install --no-index --upgrade pip
pip install --no-index -r amptk-reqs.txt

cd /home/msalamon/projects/def-mcristes/msalamon/scripts/AMPtk/12SMimammal/

# Hybrid taxonomy assignment of the DADA2 ASVs against the MIDORI2 12S reference
amptk taxonomy \
    -f ASVs_12SMimammal_length_filter_DADA2.fasta \
    -o AMPtk_12SMimammal_Midori2_res.txt \
    --usearch_db MIDORI2_UNIQ_NUC_GB261_srRNA_SINTAX.fasta \
    --method hybrid \
    --sintax_cutoff 0.8 \
    --cpus "$SLURM_CPUS_PER_TASK"
```

Attached files:
AMPtk_12SMimammal_Midori2_res.taxonomy.txt
AMPtk_12SMimammal_Midori2_results.xlsx
