Replies: 11 comments
-
Judging from the assembly length, this does not seem to be a bacterial isolate dataset; at least, I'm not aware of any bacteria with a 40 Mbp genome. Anyway, could you please post the spades.log files from these runs?
-
@asl Thank you for your reply. This is a fungal genome; I will post the spades.log files ASAP.
-
spades.log for
-
So are we or are we not supposed to use --isolate? Here is a quote from https://github.com/ablab/spades: "--isolate This flag is highly recommended for high-coverage isolate and multi-cell Illumina data; improves the assembly quality and running time. We also recommend to trim your reads prior to the assembly. More details can be found here. This option is not compatible with --only-error-correction or --careful options."
-
I can confirm the results of xonq. Removing the
-
I've run BUSCO on several marine fish assemblies with and without the --isolate setting. The assemblies without --isolate score better.
-
Judging from @cbird808's datasets, the reason is low and uneven coverage, plus the additional coverage filtering that was enabled, which removes significant parts of the assembly. The @xonq case is similar: 140 bp reads, a custom maximum k-mer length of 121, and coverage filtering. This could easily cause issues during the assembly. The number of isolated reads that did not enter the assembly is enormous.
-
Thank you @asl for following up. My understanding is that for the type of genomes I'm working with (eukaryotic, Illumina PE 150, no genomic resources, non-model species) I should be using neither the
-
It's not the eukaryotic genome itself that is the problem, but rather the properties of the input data: low and uneven coverage, etc. You may want to look into possible problems during sequencing or library preparation.
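For a rough first look at coverage in an existing assembly, one option is to read the values SPAdes writes into its contig names. This is a minimal sketch, assuming headers in the usual SPAdes style ">NODE_1_length_1234_cov_56.7" (as I understand it, "cov" there is k-mer coverage, not read coverage). The function names and the 0.5 cutoff ratio are hypothetical choices for illustration. It only sees contigs that made it into the assembly, so it says nothing about reads that were filtered out and is not a reliable test of whether coverage was even.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed SPAdes-style header, e.g. ">NODE_1_length_1234_cov_56.7"
HEADER = re.compile(r"_length_(\d+)_cov_([0-9.]+)")


def parse_headers(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Extract (length, coverage) pairs from SPAdes-style FASTA header lines."""
    pairs = []
    for line in lines:
        if line.startswith(">"):
            match = HEADER.search(line)
            if match:
                pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def weighted_median_coverage(pairs: List[Tuple[int, float]]) -> float:
    """Coverage value below which half of the assembled bases lie."""
    total = sum(length for length, _ in pairs)
    running = 0
    for length, cov in sorted(pairs, key=lambda p: p[1]):
        running += length
        if 2 * running >= total:
            return cov
    return 0.0  # empty input


def low_coverage_fraction(pairs: List[Tuple[int, float]], ratio: float = 0.5) -> float:
    """Fraction of assembled bases in contigs below ratio * weighted median coverage.

    The default ratio of 0.5 is an arbitrary, hypothetical threshold.
    """
    total = sum(length for length, _ in pairs)
    if total == 0:
        return 0.0
    cutoff = ratio * weighted_median_coverage(pairs)
    low = sum(length for length, cov in pairs if cov < cutoff)
    return low / total


if __name__ == "__main__":
    # Usage (hypothetical path): python cov_summary.py spades_out/contigs.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            pairs = parse_headers(handle)
        print(path, weighted_median_coverage(pairs), low_coverage_fraction(pairs))
```

Weighting by contig length keeps thousands of short, low-coverage fragments from dominating the summary the way a plain per-contig median would.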
-
If low and uneven coverage is the problem, then shouldn't the thresholds for the error output below be adjusted?
-
Well, the problem is that there is no reliable way to assess post hoc whether the coverage is even. Moreover, the decisions made during the assembly might effectively "hide" the issues (at the expense of assembly quality, of course).
-
I am assembling fungal genomes from 150 bp PE Illumina short reads. I've noted that --isolate is recommended for "high-coverage multi-cell/isolate data"; however, when I specified it and compared the results, assembly quality decreased by standard measurements (N50, contig number, largest contig). Furthermore, with --isolate I was unable to recover a known gene cluster on one contig, but it was recovered on a single contig when I reran without it.

With --isolate (contigs > 1 kb):

Without --isolate (contigs > 1 kb):

I therefore have evidence from a biological standpoint (the gene cluster recovery) and from the assembly statistics (which I understand could be misleadingly better) that --isolate was detrimental to my assembly quality. Why is it recommended, then?
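To compare the two runs on the same footing, the statistics mentioned above (contig count, largest contig, N50 over contigs longer than 1 kb) can be computed directly from each contigs FASTA. This is a minimal sketch; the function names and file paths are hypothetical, and dedicated tools such as QUAST report the same metrics with more options.

```python
import sys
from typing import Iterable, List


def read_contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int], min_length: int = 1000) -> dict:
    """Contig count, largest contig, total bp and N50 over contigs longer than min_length."""
    kept = [n for n in lengths if n > min_length]
    return {
        "contigs": len(kept),
        "largest": max(kept, default=0),
        "total_bp": sum(kept),
        "N50": n50(kept),
    }


if __name__ == "__main__":
    # Usage (hypothetical paths): python contig_stats.py with_isolate/contigs.fasta without_isolate/contigs.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(read_contig_lengths(handle)))
```

Filtering before computing N50 matters: N50 over contigs > 1 kb is not the same number as N50 over all contigs, so both runs should use the same cutoff.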