unoise3 clustering #96

Open
nicolereynolds1 opened this issue Mar 1, 2022 · 5 comments

Comments

@nicolereynolds1

I am trying to cluster my data using unoise3 with the following command:
amptk unoise3 -i out_lr22.demux.fq.gz -o out_lr22 -p 98 -e 2.0 --usearch usearch10

I know USEARCH is no longer bundled with AMPtk, so I downloaded it and added it to my PATH following the instructions in the AMPtk documentation. However, I keep getting this error:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/amptk/bin/amptk", line 10, in <module>
    sys.exit(main())
  File "/opt/miniconda3/envs/amptk/lib/python3.9/site-packages/amptk/amptk.py", line 784, in main
    mod.main(arguments)
  File "/opt/miniconda3/envs/amptk/lib/python3.9/site-packages/amptk/unoise3.py", line 111, in main
    total = amptklib.countfasta(derep_out)
  File "/opt/miniconda3/envs/amptk/lib/python3.9/site-packages/amptk/amptklib.py", line 445, in countfasta
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'out_lr22_tmp/out_lr22.EE2.0.derep.fa'

and in the log file the error says:
Fatal error: FASTQ input is only allowed with the fastx_uniques command

I tried changing the command from derep_fulllength to fastx_uniques in the unoise3 script, but that also did not work and returned the same error. I cannot figure out why the dereplication continues to fail.
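For readers hitting the same traceback: the FileNotFoundError looks like a downstream symptom. According to the log, the dereplication command itself failed, so the .derep.fa file that the next step tries to count was never written. The sketch below shows one way to surface such a failure at the step where it happens. It is not AMPtk's code; `run_checked` is a hypothetical helper name used only for illustration.

```python
import os
import subprocess


def run_checked(cmd, expected_output):
    """Run cmd and raise if it fails or does not create expected_output.

    Hypothetical helper for illustration; not part of AMPtk.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Report the tool's own message (e.g. "Fatal error: ...") right away.
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    if not os.path.isfile(expected_output):
        raise RuntimeError(f"{cmd[0]} did not create {expected_output}")
    return result
```

With a check like this, the run would stop with the "Fatal error" text from the dereplication tool instead of a missing-file error one step later.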

@nextgenusfs
Owner

What version of AMPtk are you running? I think that with the latest version you need to pass --method usearch. Does it complete correctly if you use --method vsearch?

@markschl

markschl commented Sep 4, 2022

I can confirm this issue. Judging by the log file, the problem lies in the de-replication step, which is independent of whether VSEARCH or USEARCH is used for denoising.

(...)
[09/04/22 11:03:53]: AMPtk v1.5.4, USEARCH v11.0.667, VSEARCH v2.21.1
(...)
[09/04/22 11:03:57]: 253,078 reads passed
[09/04/22 11:03:57]: De-replication (remove duplicate reads)
[09/04/22 11:03:57]: vsearch --derep_fulllength out_tmp/out.EE1.0.filter.fq --relabel Read_ --sizeout --output out_tmp/out.EE1.0.derep.fa --threads 16
[09/04/22 11:03:57]: WARNING: The derep_fulllength command does not support multithreading.
Only 1 thread used.
vsearch v2.21.1_linux_x86_64, 30.6GB RAM, 16 cores
https://github.com/torognes/vsearch



Fatal error: FASTQ input is only allowed with the fastx_uniques command

@markschl

markschl commented Jan 6, 2023

As a workaround, this command replaces the problematic code (adjust the python3.10 part of the path to match the Python version of your environment):

sed -i "s/--derep_fulllength', filter_out, '--relabel', 'Read_', '--sizeout', '--output/--fastx_uniques', filter_out, '--relabel', 'Read_', '--sizeout', '--fastaout/g" "$CONDA_PREFIX/lib/python3.10/site-packages/amptk/unoise3.py"
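To make the substitution easier to read, here is the same change expressed on an argument list in Python. This is a sketch of what the sed command does, not the actual contents of unoise3.py; `to_fastx_uniques` is a hypothetical name, and the file paths are taken from the log above.

```python
def to_fastx_uniques(cmd):
    """Return a copy of a vsearch argument list that uses --fastx_uniques.

    Mirrors the sed substitution: the command flag and the output flag are
    swapped, and every other argument is left untouched.
    """
    swaps = {"--derep_fulllength": "--fastx_uniques", "--output": "--fastaout"}
    return [swaps.get(arg, arg) for arg in cmd]


old = [
    "vsearch", "--derep_fulllength", "out_tmp/out.EE1.0.filter.fq",
    "--relabel", "Read_", "--sizeout",
    "--output", "out_tmp/out.EE1.0.derep.fa",
]
new = to_fastx_uniques(old)
print(" ".join(new))
```

Note that two flags change, not one: with --fastx_uniques the FASTA output is requested with --fastaout rather than --output.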

@nextgenusfs
Owner

nextgenusfs commented Jan 8, 2023

So it looks like --fastx_uniques has been in vsearch since v2.22.0; I actually have an older version than that. I don't really know what the difference between --fastx_uniques and --derep_fulllength is.

So I can make this change, we'll just have to pin vsearch > 2.22.0.

Otherwise, it seems you can downgrade vsearch and existing AMPtk v1.5.4 should work: dereneaton/ipyrad#469
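A version pin could also be checked at run time. The sketch below parses a vsearch banner line (the one shown in the log above) and compares it with a minimum version. The function names are hypothetical, and the cutoff is deliberately passed in as a parameter: the v2.22.0 figure comes from this comment, while the log earlier in the thread shows v2.21.1 already mentioning fastx_uniques, so the exact threshold should be confirmed against the VSEARCH release notes.

```python
import re


def parse_vsearch_version(banner):
    """Extract (major, minor, patch) from a vsearch banner line, or None."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(banner, minimum):
    """True if the version in banner is at least minimum (a 3-tuple)."""
    version = parse_vsearch_version(banner)
    return version is not None and version >= minimum


banner = "vsearch v2.21.1_linux_x86_64, 30.6GB RAM, 16 cores"
version = parse_vsearch_version(banner)
```

Comparing integer tuples avoids the string-comparison trap where "2.9.0" would sort after "2.21.1".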

@markschl

According to the VSEARCH documentation, the only difference is that --fastx_uniques can handle FASTQ files (as input and as output) and --derep_fulllength cannot. I think this just follows the behaviour of USEARCH, where fastx_uniques was introduced and derep_fulllength was deprecated (and ultimately dropped in v11). It seems that VSEARCH has become stricter and no longer accepts FASTQ files with --derep_fulllength.
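Based on that description, a caller could pick the dereplication command from the input format. The following is a minimal sketch under stated assumptions: the input is an uncompressed text file, the format is guessed from the first non-empty line, and the function names are hypothetical rather than anything in AMPtk.

```python
def sniff_format(path):
    """Guess 'fastq' or 'fasta' from the first non-empty line of a text file."""
    with open(path, "r") as handle:
        for line in handle:
            if line.strip():
                if line.startswith("@"):
                    return "fastq"
                if line.startswith(">"):
                    return "fasta"
                break
    raise ValueError(f"cannot determine sequence format of {path}")


def derep_args(path, out_fasta):
    """Build a vsearch dereplication argument list suited to the input format."""
    if sniff_format(path) == "fastq":
        # FASTQ input is only accepted by --fastx_uniques (see the log above).
        return ["vsearch", "--fastx_uniques", path,
                "--sizeout", "--fastaout", out_fasta]
    return ["vsearch", "--derep_fulllength", path,
            "--sizeout", "--output", out_fasta]
```

Since --fastx_uniques is described as handling both formats, always using it would be simpler whenever the installed vsearch provides it; the branch here only matters if older versions need to keep working.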
