- Apollo genome browser: UniTato: a web server for evidence and community based Unification of poTato gene models
- Potato DM v4 to v6 gene ID translations: ./output/v4-v6.1_translationTable.xlsx (all information); the high-confidence translation table subset is at ./output/Phureja_v4-v6.1_translations.xlsx, unitato.nib.si/downloads, and github.com/NIB-SI/DiNAR/TranslationTables
- Unified GFF/GTF files ./output/Unitato.GFF.zip and ./output/Unitato.GTF.zip, together with the corresponding FASTA file, are also available at unitato.nib.si/downloads
- ITAG v4:
./input/StPGSC4.04n_ITAG-gene-model_2020-01-17.gff3
- PGSC v4:
./input/StPGSC4.04n_PGSC-gene-model_2022-10-26.gff3
- DMv6.1 (http://spuddb.uga.edu): http://spuddb.uga.edu/data/DM_1-3_516_R44_potato.v6.1.working_models.gff3.gz
- v4 FASTA file:
./input/StPGSC4.04n_2018-01-18_oneliner.fasta
- DMv6.1 (http://spuddb.uga.edu): http://spuddb.uga.edu/data/DM_1-3_516_R44_potato_genome_assembly.v6.1.fa.gz
- PGSC v4 annotations:
./input/Solanum_tuberosum_PGSC_DM_v3.4_converterWithDescriptions.txt
- DMv6.1 (http://spuddb.uga.edu): http://spuddb.uga.edu/data/DM_1-3_516_R44_potato.v6.1.working_models.func_anno.txt.gz
pan-transcriptome https://doi.org/10.1038/s41597-020-00581-4
- panTranscriptome components:
./input/5cv_weak-components.txt
- ITAG-PGSC-pairs:
./input/ITAG-PGSC-pairs.xlsx
- ITAG v4 CDS length and GC content:
./input/Solanum_tuberosum-ITAG_DM_v1_cds_GC-len
https://doi.org/10.1038/s41597-020-00581-4
- PGSC v4 CDS length and GC content:
./input/Solanum_tuberosum_PGSC_merged_GC-len
https://doi.org/10.1038/s41597-020-00581-4
- Desiree, Rywal, and PW363 CDS and transcripts:
stCuSTr-D_cds_representatives.fasta
stCuSTr-D_tr_representatives.fasta
stCuSTr-P_cds_representatives.fasta
stCuSTr-P_tr_representatives.fasta
stCuSTr-R_cds_representatives.fasta
stCuSTr-R_tr_representatives.fasta
- Chromosome:
./input/pairs chroms.txt
- Unplaced:
./input/unplaced_DM404.txt
- Arabidopsis Araport11
- Tomato ITAG4.1
- Benthi Nb HZ version 1
- Tobacco Nt SR1 version 1
- Liftoff fork, adapted to take flanks as an integer (instead of a percentage): https://github.com/NIB-SI/Liftoff
- Bedtools
- AGAT
- minimap2
- STAR
- Salmon
- miniprot
For more information, see the README in scripts.
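The difference between a relative flank and an integer flank can be illustrated with a minimal sketch. It does not reproduce code from Liftoff or from the fork: the function name `flank_window`, its parameters, and the example coordinates are hypothetical, and it assumes the integer flank is a number of bases added on each side of the gene.

```python
def flank_window(start, end, flank, as_fraction, seq_len):
    """Widen a 1-based inclusive gene interval by a flank on both sides.

    as_fraction=True  -> flank is relative to gene length (e.g. 0.1 = 10 %)
    as_fraction=False -> flank is a fixed number of bases (ASSUMED meaning
                         of the integer flank; not taken from the fork's code)
    """
    gene_len = end - start + 1
    pad = round(gene_len * flank) if as_fraction else int(flank)
    # Clamp the window to the sequence boundaries.
    return max(1, start - pad), min(seq_len, end + pad)


# Hypothetical 2,000 bp gene on a 1 Mb sequence.
print(flank_window(10_001, 12_000, 0.1, True, 1_000_000))   # (9801, 12200)
print(flank_window(10_001, 12_000, 500, False, 1_000_000))  # (9501, 12500)
```

With a relative flank, short genes get very little surrounding context, whereas a fixed integer flank gives every gene the same amount of context regardless of its length.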
- data.table
- pafr
- stringr
- magrittr
- sqldf
- RColorBrewer
- randomcoloR
- circlize
- VennDiagram
- grDevices
- ggvenn
- ggVennDiagram
- intervals
- igraph
- openxlsx
For more information, see sessionInfo() in the R Markdown files (.html).
- Translation table:
./output/v4-v6.1_translationTable.xlsx
- GFFs:
  - flank-based:
  ./output/matched-unmatched-gff/
  - unified v6v4 GFF:
  UniTato.gff
- miniprot detailed results:
./output/miniprot/
- many-to-many matches:
./reports/overlaps.xlsx
- Venn:
./reports/01_Venn_wm.tiff
- pan-transcriptome ITAG-PGSC pair matching dependent on the F parameter:
./reports/ITAG-PGSC_F-dependent_pairs-matches.txt
- DMv6.1 wm scaffolds without gene features:
./reports/v6scaffolds-without-v6genes.fasta
- Chord diagram visualisation
- Chr12 inversion visualisation
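As an illustration of how a report such as the scaffolds without gene features could be derived, here is a minimal sketch that compares the sequence IDs in a FASTA file with the sequence IDs carrying `gene` features in a GFF3 file. It is not the repository's script: the function names and the toy records are hypothetical, and only the standard GFF3 column layout (sequence ID in column 1, feature type in column 3) is assumed.

```python
import io


def seqids_with_genes(gff_handle):
    """Collect sequence IDs (GFF3 column 1) that carry at least one 'gene' feature."""
    seqids = set()
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == "gene":
            seqids.add(cols[0])
    return seqids


def fasta_ids(fasta_handle):
    """Return sequence IDs (first word of each '>' header) in file order."""
    return [
        line[1:].split()[0]
        for line in fasta_handle
        if line.startswith(">") and line[1:].strip()
    ]


def sequences_without_genes(fasta_handle, gff_handle):
    """List FASTA sequence IDs that have no 'gene' feature in the GFF3."""
    with_genes = seqids_with_genes(gff_handle)
    return [sid for sid in fasta_ids(fasta_handle) if sid not in with_genes]


# Toy, made-up records for demonstration only.
toy_fasta = io.StringIO(">chr01\nACGT\n>scaffold_10\nGGCC\n>scaffold_11\nTTAA\n")
toy_gff = io.StringIO(
    "##gff-version 3\n"
    "chr01\ttoy\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    "scaffold_11\ttoy\tmRNA\t1\t4\t.\t+\t.\tID=t1\n"
)
print(sequences_without_genes(toy_fasta, toy_gff))  # ['scaffold_10', 'scaffold_11']
```

For real inputs, the in-memory handles would be replaced by opened (and, where needed, decompressed) FASTA and GFF3 files.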
“High-confidence gene models”, as defined by Pham et al. (2020), are based on the following criteria:
- Transcripts per Million value greater than 0 in at least one RNA-Seq library
- Gene models that have a match to a PFAM domain are considered high-confidence
- Gene models that are partial or have matches to transposable element-related PFAM domains are excluded from the high-confidence model set
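The criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions, not the procedure used by Pham et al. (2020): the `GeneModel` class, its field names, and the placeholder accession `PF_TE_1` are hypothetical, and the sketch assumes that expression evidence and a PFAM match are alternative (OR) routes to high confidence, with the exclusions applied first.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    gene_id: str
    tpm_by_library: dict = field(default_factory=dict)  # library name -> TPM
    pfam_domains: list = field(default_factory=list)    # PFAM accessions
    is_partial: bool = False


# Placeholder set of TE-related PFAM accessions (hypothetical, not real IDs).
TE_RELATED_PFAM = {"PF_TE_1"}


def is_high_confidence(model, te_pfam=TE_RELATED_PFAM):
    """Apply the listed criteria to one gene model."""
    # Exclusions: partial models and models with TE-related PFAM matches.
    if model.is_partial:
        return False
    if any(domain in te_pfam for domain in model.pfam_domains):
        return False
    expressed = any(tpm > 0 for tpm in model.tpm_by_library.values())
    has_pfam = bool(model.pfam_domains)
    # ASSUMPTION: either kind of evidence is sufficient on its own.
    return expressed or has_pfam


# Made-up gene models for demonstration.
models = [
    GeneModel("gene_a", {"lib1": 0.0, "lib2": 3.2}),
    GeneModel("gene_b", {"lib1": 0.0}, ["PF_X"]),
    GeneModel("gene_c", {"lib1": 5.0}, ["PF_TE_1"]),
    GeneModel("gene_d", {"lib1": 5.0}, [], is_partial=True),
]
print([m.gene_id for m in models if is_high_confidence(m)])  # ['gene_a', 'gene_b']
```

If the two evidence types were instead required together, the final line of `is_high_confidence` would use `and` rather than `or`.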