DIAMOND fails to produce pairwise alignments in Step 5 #38

Open
awalling opened this issue Aug 5, 2020 · 0 comments

Hello,

I am attempting to run panX on a dataset of 82 genomes from a single family of Alphaproteobacteria. Using the divide-and-conquer strategy, my command reads as follows:

echo "source activate panX; \
  /nas3/awalling/software/pan-genome-analysis/panX.py \
    -fn /nas3/awalling/software/pan-genome-analysis/data/Erythrobacteraceae \
    -sl Erythrobacteraceae \
    -dmdc -dcs 41 -dmsi 90 -dmsqc 90 -dmssc 90 -cg 1.0 \
    -mi /nas3/awalling/software/pan-genome-analysis/metadata/erythrobacter_panx_metadata.tsv \
    -mtf /nas3/awalling/software/pan-genome-analysis/metadata/erythrobacter_meta_config.tsv \
    -t 32 \
    > /nas3/awalling/software/pan-genome-analysis/Erythrobacteraceae2.log \
    2> Erythrobacteraceae.err" \
  | qsub -V -N panX_erythrobacteraceae -q batch \
    -e nas3/awalling/software/pan-genome-analysis/panx.erythrobacteraceae.pbs.log \
    -o /nas3/awalling/software/pan-genome-analysis/panx.erythrobacteraceae.pbs.log \
    -l ncpus=64 -l mem=200gb -l walltime=96:00:00

However, I receive the following error:

Traceback (most recent call last):
  File "/nas3/awalling/software/pan-genome-analysis/panX.py", line 272, in <module>
    myPangenome.clustering_protein_divide_conquer()
  File "/nas3/awalling/software/pan-genome-analysis/scripts/pangenome_computation.py", line 153, in clustering_protein_divide_conquer
    self.diamond_subject_cover_subproblem, self.mcl_inflation, self.diamond_path, self.diamond_dc_subset_size)
  File "/nas3/awalling/software/pan-genome-analysis/scripts/sf_cluster_protein_divide_conquer.py", line 168, in clustering_divide_conquer
    integrate_clusters(clustering_path,cluster_fpath)
  File "/nas3/awalling/software/pan-genome-analysis/scripts/sf_cluster_protein_divide_conquer.py", line 103, in integrate_clusters
    with open('%s%s'%(clustering_path,'subproblem_finalRound_cluster.output'))\
IOError: [Errno 2] No such file or directory: '/nas3/awalling/software/pan-genome-analysis/data/Erythrobacteraceae/protein_faa/diamond_matches/subproblem_finalRound_cluster.output'

As far as I can tell, the hangup is that during the subproblem blastp stage, no pairwise alignments are generated. From the end of /protein_faa/diamond_matches/diamond_blastp_subproblem_1.log:

Loading query sequences... [0s]
Closing the input file... [0.005s]
Closing the output file... [0s]
Closing the database file... [0.005s]
Deallocating taxonomy... [0s]
Total time = 49.321s
Reported 0 pairwise alignments, 0 HSPs.
0 queries aligned.
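
For anyone checking several subproblem logs at once, here is a minimal sketch of how the summary lines could be pulled out of a DIAMOND log. The function name `summarize_diamond_log` is hypothetical (it is not part of panX or DIAMOND), and the patterns are based only on the log excerpt above, so other DIAMOND versions may word these lines differently.

```python
import re

# Patterns taken from the summary lines in the log excerpt above (an assumption
# about the log wording; adjust if your DIAMOND version prints something else).
ALIGN_RE = re.compile(r"Reported (\d+) pairwise alignments, (\d+) HSPs\.")
QUERY_RE = re.compile(r"(\d+) queries aligned\.")


def summarize_diamond_log(text):
    """Return (alignments, hsps, queries_aligned), or None if no summary is found."""
    align_match = ALIGN_RE.search(text)
    query_match = QUERY_RE.search(text)
    if align_match is None or query_match is None:
        return None
    return (
        int(align_match.group(1)),
        int(align_match.group(2)),
        int(query_match.group(1)),
    )


sample = "Total time = 49.321s\nReported 0 pairwise alignments, 0 HSPs.\n0 queries aligned.\n"
print(summarize_diamond_log(sample))  # prints (0, 0, 0)
```

A result of `(0, 0, 0)` for a subproblem log corresponds to the situation described here: DIAMOND ran to completion but reported no hits.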

The files subproblem_1_cluster.output, subproblem_1.m8, subproblem_2_cluster.output, subproblem_2.m8, and subproblem_finalRound.faa are all empty (zero bytes).
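
A quick way to confirm which intermediate files are empty is to list the zero-byte files in the diamond_matches directory. This is only a sketch: `find_empty_files` is a hypothetical helper, not something panX provides, and it only looks at file sizes, not at file contents.

```python
import os


def find_empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes."""
    empty = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories; only report regular files with no content.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return sorted(empty)
```

Calling `find_empty_files` on the `protein_faa/diamond_matches` directory from the traceback should list the files named above. If the subproblem input FASTA files show up as empty too, the problem lies before the DIAMOND step rather than in the alignment thresholds.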

I have attempted to fix this by relaxing the e-value threshold with the -dme flag, but even with an e-value cutoff of 10 and a relaxed -cg of 0.8, the error persists.

Is there a way to fix this issue without running an all-against-all BLAST and providing that matrix separately?

Best,

Alexandra
