- Apollo genome browser: UniTato: a web server for evidence and community based Unification of poTato gene models
- Potato DM v4 to v6 gene ID translations (all information); a high-confidence translation table subset is available at ./output/Phureja_v4-v6.1_translations.xlsx, unitato.nib.si/downloads, and github.com/NIB-SI/DiNAR/TranslationTables
- Unified GFF/GTF files and the corresponding FASTA file are also available at unitato.nib.si/downloads
- ITAG v4:
- PGSC v4:
- DMv6.1 http://spuddb.uga.edu http://spuddb.uga.edu/data/DM_1-3_516_R44_potato.v6.1.working_models.gff3.gz
- v4 FASTA file:
- DMv6.1 http://spuddb.uga.edu http://spuddb.uga.edu/data/DM_1-3_516_R44_potato_genome_assembly.v6.1.fa.gz
- PGSC v4 annotations:
- DMv6.1 http://spuddb.uga.edu http://spuddb.uga.edu/data/DM_1-3_516_R44_potato.v6.1.working_models.func_anno.txt.gz
pan-transcriptome https://doi.org/10.1038/s41597-020-00581-4
- panTranscriptome components:
- ITAG-PGSC-pairs:
- ITAG v4 CDS len and GC content: https://doi.org/10.1038/s41597-020-00581-4
- PGSC v4 CDS len and GC content: https://doi.org/10.1038/s41597-020-00581-4
- Desiree, Rywal, and PW363 CDS and transcripts:
- Chromosome:
./input/pairs chroms.txt
- Unplaced:
- Arabidopsis Araport11
- Tomato ITAG4.1
- Benthi Nb HZ version 1
- Tobacco Nt SR1 version 1
- Liftoff fork with adaptation: flanks as integer (instead of percentage): https://github.com/NIB-SI/Liftoff
- Bedtools
- minimap2
- Salmon
- miniprot
For more information, see the README in scripts
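
The Liftoff fork listed above takes flanks as an integer rather than a percentage. The sketch below illustrates the difference between the two conventions. It assumes the integer is an absolute number of bases and the percentage is a fraction of gene length; neither detail is confirmed here. `flank_window` is a hypothetical helper written for illustration and is not part of Liftoff or the fork.

```python
def flank_window(start, end, flank, seq_len, as_fraction):
    """Return the (start, end) window of a gene extended by flanking sequence.

    Coordinates are 1-based and inclusive.
    as_fraction=True  : flank is a fraction of the gene length (assumed upstream behaviour).
    as_fraction=False : flank is an absolute number of bases (assumed fork behaviour).
    """
    gene_len = end - start + 1
    pad = int(round(flank * gene_len)) if as_fraction else int(flank)
    # Clip the window to the boundaries of the sequence.
    return max(1, start - pad), min(seq_len, end + pad)


# A 1000 bp gene on a 10 kb sequence
print(flank_window(1000, 1999, 0.1, 10_000, as_fraction=True))
print(flank_window(1000, 1999, 500, 10_000, as_fraction=False))
```

With a fractional flank the padding scales with gene length, so short genes get very little context; an absolute flank gives every gene the same amount of surrounding sequence.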
- data.table
- pafr
- stringr
- magrittr
- sqldf
- RColorBrewer
- randomcoloR
- circlize
- VennDiagram
- grDevices
- ggvenn
- ggVennDiagram
- intervals
- igraph
- openxlsx
For more information, see sessionInfo() in the R Markdown files (.html)
- Translation table:
- GFFs:
- flank-based:
- unified v6v4 GFF:
- flank-based:
- miniprot detailed results:
many-to-many matches:
pan-transcriptome ITAG-PGSC pair matching dependent on the F parameter:
DMv6.1 wm scaffolds without gene features:
Chord diagram visualisation
Chr12 inversion visualisation
“High-confidence gene models”, as defined by Pham et al. (2020), are based on the following criteria:
- Transcripts per Million value greater than 0 in at least one RNA-Seq library
- Gene models that have a match to a PFAM domain are considered high-confidence
- Gene models that are partial or have matches to transposable element-related PFAM domains are excluded from the high-confidence model set
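
The criteria above can be sketched as a small filter. This is a minimal illustration, not the pipeline used by Pham et al. (2020). It assumes that expression and a PFAM match are alternative routes to high confidence (combined with "or"), and that the exclusions take precedence. The `GeneModel` class, its field names, the gene IDs, and the PFAM accessions are placeholders chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    gene_id: str
    tpm_by_library: dict                      # RNA-Seq library name -> TPM value
    pfam_domains: set = field(default_factory=set)
    is_partial: bool = False


def is_high_confidence(model, te_pfam_ids):
    """Apply the listed criteria to one gene model."""
    # Exclusions first: partial models and TE-related PFAM matches.
    if model.is_partial:
        return False
    if model.pfam_domains & te_pfam_ids:
        return False
    # Assumption: either evidence type is sufficient on its own.
    expressed = any(tpm > 0 for tpm in model.tpm_by_library.values())
    has_pfam = bool(model.pfam_domains)
    return expressed or has_pfam


# Placeholder set of TE-related PFAM accessions
te_pfam = {"PF00078"}

models = [
    GeneModel("geneA", {"leaf": 3.2, "tuber": 0.0}),
    GeneModel("geneB", {"leaf": 0.0}, pfam_domains={"PF00069"}),
    GeneModel("geneC", {"leaf": 5.0}, pfam_domains={"PF00078"}),
    GeneModel("geneD", {"leaf": 1.0}, is_partial=True),
    GeneModel("geneE", {"leaf": 0.0}),
]
high_conf = [m.gene_id for m in models if is_high_confidence(m, te_pfam)]
print(high_conf)
```

Checking the exclusions before the evidence keeps an expressed transposable element or an expressed partial model out of the set, which matches the order in which the criteria are stated.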