
Format of assembly report isn't clear #212

Open
CholoTook opened this issue Jun 24, 2021 · 2 comments

Comments

@CholoTook
Contributor

Which columns of the assembly report are used by the assembly checker to define synonyms?

  • Which column represents the sequence ID used in the VCF?
  • Which column represents the sequence ID used in the FASTA?
  • How is the FASTA header line parsed for the ID?

Enquiring minds demand to know! ;-)

Many thanks,
Dan.

@tcezard
Member

tcezard commented Jun 24, 2021

It is indeed not the clearest part of the code and pretty much absent from the documentation:
The assembly report is expected to have 10 columns, and the checker records the content of columns 1, 5, 7 and 10.

Assembly reports that match this description can be found on the GenBank FTP site, like this one.
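A minimal sketch of reading the synonyms out of such a report, assuming a tab-separated file whose header/comment lines start with `#` and where `na` marks a missing value (as in NCBI `*_assembly_report.txt` files, whose columns 1, 5, 7 and 10 are Sequence-Name, GenBank-Accn, RefSeq-Accn and UCSC-style-name). The function name `parse_assembly_report` is hypothetical and this is not the checker's actual code:

```python
from typing import Iterable, List, Set

# 1-based columns 1, 5, 7 and 10 -> 0-based indexes 0, 4, 6 and 9.
SYNONYM_COLUMNS = (0, 4, 6, 9)
EXPECTED_COLUMNS = 10


def parse_assembly_report(lines: Iterable[str]) -> List[Set[str]]:
    """Hypothetical helper: return one set of synonyms per sequence row."""
    synonym_sets = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != EXPECTED_COLUMNS:
            raise ValueError(
                f"expected {EXPECTED_COLUMNS} columns, got {len(fields)}: {line!r}"
            )
        # Keep only the recorded columns, dropping empty and 'na' values.
        names = {fields[i] for i in SYNONYM_COLUMNS if fields[i] and fields[i] != "na"}
        synonym_sets.append(names)
    return synonym_sets


if __name__ == "__main__":
    # Made-up rows for illustration only; accessions are not real.
    example = [
        "# Sequence-Name\tSequence-Role\tAssigned-Molecule\tAssigned-Molecule-Location/Type"
        "\tGenBank-Accn\tRelationship\tRefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name",
        "1\tassembled-molecule\t1\tChromosome\tCM000001.1\t=\tNC_000001.1\tPrimary Assembly\t1000\tchr1",
    ]
    print(parse_assembly_report(example))
```

Each resulting set groups every name by which one sequence may be referred to, so a later lookup only has to test set membership.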

If the first column (CHROM) of the VCF and the first word of the FASTA header (anything before the first whitespace) are both found among the synonyms recorded from those columns of the assembly report, they are matched.
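The matching rule can be sketched as below. This assumes the two names must appear in the same row of synonyms and that identical names match even without a report entry; both are assumptions, as are the hypothetical names `fasta_header_id` and `names_match`:

```python
from typing import Iterable, Set


def fasta_header_id(header: str) -> str:
    """Return the first word of a FASTA header (everything before the first whitespace)."""
    # Drop the leading '>' and split on any run of whitespace.
    words = header.lstrip(">").split()
    return words[0] if words else ""


def names_match(vcf_chrom: str, fasta_id: str, synonym_sets: Iterable[Set[str]]) -> bool:
    """True if the names are identical or both belong to one row's synonyms (assumed rule)."""
    if vcf_chrom == fasta_id:
        return True
    return any(vcf_chrom in names and fasta_id in names for names in synonym_sets)


if __name__ == "__main__":
    # Made-up synonym row for illustration only.
    synonyms = [{"1", "CM000001.1", "NC_000001.1", "chr1"}]
    fasta_id = fasta_header_id(">NC_000001.1 example chromosome 1")
    print(fasta_id, names_match("chr1", fasta_id, synonyms))
```

With this rule, a VCF that uses `chr1` can be checked against a FASTA whose headers use RefSeq accessions, as long as both names sit in the same report row.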

I hope this helps.

@CholoTook
Contributor Author

CholoTook commented Jun 24, 2021 via email
