Update README.md
MichaelHiller authored May 25, 2023
1 parent 9a787c4 commit 9882fde
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -569,7 +569,7 @@ provide the chain of interest with --chain parameter:

## Getting assembly quality statistics

-TOGA also provides a powerful way to benchmark assembly completeness and quality. TOGA’s gene classification explicitly distinguishes between genes with missing sequences (indicative of assembly incompleteness) and genes with inactivating mutations (an excess of genes with inactivating mutations indicates a higher base error rate). We used a set of 18430 ancestral placental mammal genes (file is [here](https://github.com/hillerlab/TOGA/blob/master/TOGAInput/human_hg38/Ancestral_placental.txt) to compute assembly quality statistics, as illustrated previously for the [vampire bat](https://www.science.org/doi/10.1126/sciadv.abm6494) or [Rhinolophid bat](https://europepmc.org/article/ppr/ppr616538) genomes.
+TOGA also provides a powerful way to benchmark assembly completeness and quality. TOGA’s gene classification explicitly distinguishes between genes with missing sequences (indicative of assembly incompleteness) and genes with inactivating mutations (an excess of genes with inactivating mutations indicates a higher base error rate). We used a set of 18430 ancestral placental mammal genes (file is [here](https://github.com/hillerlab/TOGA/blob/master/TOGAInput/human_hg38/Ancestral_placental.txt)) to compute assembly quality statistics, as illustrated previously for the [vampire bat](https://www.science.org/doi/10.1126/sciadv.abm6494) or [Rhinolophid bat](https://europepmc.org/article/ppr/ppr616538) genomes.

To use TOGA to benchmark assembly quality, use the "supply/TOGA_assemblyStats.py" script. This script produces a TSV file summarizing the number of genes that are intact (classified as I), have missing sequence (TOGA status PI, M, PM, PG, or absent), or have inactivating mutations (L and UL). The script also generates a PDF image of a stacked plot of the statistics (used in Figure 1 in our previous studies). The two output files are named ${TOGA_DIRS_FILE}\_stats.tsv and ${TOGA_DIRS_FILE}\_statsplot.pdf, respectively.
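
The three-way grouping described above can be illustrated with a minimal Python sketch. This is not the code of TOGA_assemblyStats.py; the function names `categorize` and `summarize`, the category labels, and the `other` fallback bucket are hypothetical and chosen only for illustration. Only the status codes (I, PI, M, PM, PG, absent, L, UL) come from the text.

```python
from collections import Counter

# Status groups taken from the README text above. The string "absent" is an
# assumed placeholder for genes that have no TOGA status at all.
MISSING_SEQUENCE = {"PI", "M", "PM", "PG", "absent"}
INACTIVATING_MUTATIONS = {"L", "UL"}

CATEGORIES = ("intact", "missing_sequence", "inactivating_mutations", "other")


def categorize(status):
    """Map a single TOGA status code to one of the summary categories."""
    if status == "I":
        return "intact"
    if status in MISSING_SEQUENCE:
        return "missing_sequence"
    if status in INACTIVATING_MUTATIONS:
        return "inactivating_mutations"
    # Hypothetical fallback for any status code not named in the text.
    return "other"


def summarize(statuses):
    """Count how many genes fall into each category, including zero counts."""
    counts = Counter(categorize(s) for s in statuses)
    return {name: counts.get(name, 0) for name in CATEGORIES}


if __name__ == "__main__":
    example = ["I", "I", "PI", "absent", "L", "UL", "M"]
    for name, n in summarize(example).items():
        print(f"{name}\t{n}")
```

Keeping the status groups in two sets makes the mapping easy to compare against the status codes listed in the text.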
