Confused about some text in the stage2 output files stage2/*postprocess.txt #207

Open
taylorreiter opened this issue Feb 21, 2022 · 0 comments
@taylorreiter
Member

The text files stage2/*postprocess.txt contain sections that report the postprocessing of mashmap alignments, which are parsed to determine the percent identity of each contig against each contaminant genome.
For example:

removing 9kb with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029.
   5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%
   (d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Phocaeicola;s__Phocaeicola sp900554435)
   ** disagreement at rank 'phylum'; genome p__Firmicutes_A, source p__Bacteroidota

I'm confused because stage2 isn't required for the target clean, yet this file says that it is removing contigs (kb) as dirty based on the mashmap alignment results. My understanding is that it would be more accurate to state either

identified 9kb with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029.
   5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%

or

verified 9kb contaminant with 5kb dirty, contig name NODE_1608_length_9168_cov_9.2029.
   5kb aligns to GCA_900554435.1:USHC01000102.1 at 98.4%

Am I interpreting this file wrong?

Line in the code that produces this message: https://github.com/dib-lab/charcoal/blob/latest/charcoal/postprocess_alignments.py#L133
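
To make the suggestion concrete, here is a minimal Python sketch of the wording I have in mind. It is not the code at the linked line: the function name `describe_dirty_contig`, its parameters, and the `removed` flag are all hypothetical, and it assumes the report could know whether a contig is actually dropped at this stage or only flagged.

```python
def describe_dirty_contig(contig_name, contig_kb, dirty_kb, removed=False):
    """Build the first line of a report entry for one flagged contig.

    Hypothetical helper, not part of charcoal. `removed` states whether the
    contig is actually dropped at this step (True) or only flagged (False).
    """
    # Pick the verb to match what the step really does to the contig.
    verb = "removing" if removed else "identified"
    return f"{verb} {contig_kb}kb with {dirty_kb}kb dirty, contig name {contig_name}."


# Example with the values from the excerpt in this issue.
print(describe_dirty_contig("NODE_1608_length_9168_cov_9.2029", 9, 5))
```

With `removed=False` the entry starts with "identified", and with `removed=True` it keeps the current "removing" wording, so the text would only claim removal when removal actually happens.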
