Reporting #204
Replies: 7 comments 3 replies
-
I recommend restructuring the reports to make them more appealing and user-friendly. I believe a DataUnit-based structure for the pages would help. Each page can have multiple sections, one for each of the relevant rules. For example, the genome page can have a section for BiGSCAPE.
Classifying rules to Data units
-
| Genome ID | Organism name | GTDB species | Fasta | Genbank | Genome Length | CDSs | rRNAs | tRNAs |
|---|---|---|---|---|---|---|---|---|
Use the file located at `data/interim/prokka/{genome_id}/{genome_id}.txt` to collect the above values. Is it easy to add GTDB species here?
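A minimal sketch of collecting these values, assuming the Prokka summary file is a list of `key: value` lines (for example `bases: ...` and `CDS: ...`). The sample text, the function name `parse_prokka_summary`, and the mapping to the column names are illustrative assumptions, not existing BGCFlow code.

```python
def parse_prokka_summary(text):
    """Parse 'key: value' lines from a Prokka summary into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'key: value' shape
        value = value.strip()
        # Store counts as integers, everything else (e.g. organism) as text.
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


# Hypothetical example content, not taken from a real genome.
example = """organism: Streptomyces sp. strain X
contigs: 3
bases: 8500000
CDS: 7400
rRNA: 18
tRNA: 67
"""

row = parse_prokka_summary(example)
# Map the parsed keys onto the table columns proposed above.
genome_row = {
    "Organism name": row["organism"],
    "Genome Length": row["bases"],
    "CDSs": row["CDS"],
    "rRNAs": row["rRNA"],
    "tRNAs": row["tRNA"],
}
```

GTDB species is not part of this file, so it would have to be joined in from the gtdbtk table by genome ID.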
ncbi
Title
Strain metadata from NCBI
Description of the report
Provide information on NCBI metadata
Figures
Pie chart with the number of genomes at each of the 4 NCBI assembly levels
Tables
| Genome ID | Organism name | GTDB species | Assembly level | BioProject | BioSample | Date | Isolation source | Isolation country |
|---|---|---|---|---|---|---|---|---|
Use the file located at `data/processed/{project}/tables/df_ncbi_meta.csv` to collect the above values.
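For the pie chart, the counts per assembly level could be derived roughly as follows. This is a sketch only: the column name `assembly_level` and the sample rows are assumptions, since the actual header of `df_ncbi_meta.csv` is not shown in this thread.

```python
import csv
import io
from collections import Counter


def count_assembly_levels(csv_file, column="assembly_level"):
    """Count genomes per assembly level; `column` is a hypothetical name."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical stand-in for df_ncbi_meta.csv.
example = io.StringIO(
    "genome_id,assembly_level\n"
    "GCF_000000001.1,Complete Genome\n"
    "GCF_000000002.1,Contig\n"
    "GCF_000000003.1,Contig\n"
    "GCF_000000004.1,Scaffold\n"
)
level_counts = count_assembly_levels(example)
```

The resulting counter can be passed to whatever plotting library the report ends up using.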
Can we add links to NCBI data?
Consider adding strain isolation source and country information from NCBI datasets, inspired by https://github.com/NBChub/BGC_analytics/blob/main/notebooks/15_ncbi_datasets_meta.ipynb (the code needs to be adapted into a Snakemake rule).
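On the question of links to NCBI data, one possible approach is to build the URLs from the BioProject and BioSample accessions already in the table. The sketch below assumes the usual `www.ncbi.nlm.nih.gov/bioproject/<accession>` and `www.ncbi.nlm.nih.gov/biosample/<accession>` patterns. The helper name `ncbi_links` and the accessions are placeholders.

```python
def ncbi_links(bioproject, biosample):
    """Build NCBI web links from accessions (URL patterns are assumed)."""
    base = "https://www.ncbi.nlm.nih.gov"
    return {
        "BioProject": f"{base}/bioproject/{bioproject}",
        "BioSample": f"{base}/biosample/{biosample}",
    }


# Placeholder accessions for illustration only.
links = ncbi_links("PRJNA000000", "SAMN00000000")
```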
seqfu
Title
Genome assembly statistics
Description of the report
Provide information on the version of seqfu used to assess the assembly statistics
Figures
Scatter plot with genome assembly statistics
Tables
| Genome ID | Organism name | GTDB species | Genome length | GC content | Contigs | N50 | N90 | AuN |
|---|---|---|---|---|---|---|---|---|
Use the file located at `data/processed/{project}/tables/df_seqfu.csv` to collect the above values.
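For the report description, it may help to state what the N50, N90, and AuN columns mean. The sketch below uses the standard definitions on a toy list of contig lengths. It is not how seqfu computes them internally, and the function names are made up for illustration.

```python
def nx(lengths, percent):
    """Length of the contig at which the running total, taken from the
    longest contig down, first reaches `percent` of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point edge cases.
        if running * 100 >= percent * total:
            return length
    raise ValueError("lengths must be a non-empty list of positive integers")


def aun(lengths):
    """Area under the Nx curve: sum of squared lengths over total length."""
    return sum(length * length for length in lengths) / sum(lengths)


contigs = [50, 30, 10, 10]  # toy contig lengths, total = 100
n50 = nx(contigs, 50)
n90 = nx(contigs, 90)
au_n = aun(contigs)
```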
gtdbtk
Title
Taxonomic classification
Description of the report
Provide information on the version of gtdbtk and database used among other details
Figures
Two pie charts with the number of genomes per genus and species (take the top 10 if too many)
Tables
| Genome ID | Organism name | Family | Genus | Species |
|---|---|---|---|---|
Use the file located at `data/processed/{project}/tables/df_gtdb_meta.csv` to collect the above values.
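The "top 10 if too many" rule for the pie charts could look something like this. Grouping the remaining genera or species into a single "Other" slice is my assumption, as is the helper name `top_n_with_other`.

```python
from collections import Counter


def top_n_with_other(labels, n=10, other_label="Other"):
    """Return (label, count) pairs for the n most common labels,
    plus one combined slice for everything else (an assumed convention)."""
    counts = Counter(labels)
    top = counts.most_common(n)
    remainder = sum(counts.values()) - sum(count for _, count in top)
    if remainder > 0:
        top.append((other_label, remainder))
    return top


# Toy genus column; n=2 keeps the example short.
genera = ["Streptomyces"] * 5 + ["Micromonospora"] * 2 + ["Nocardia", "Kitasatospora"]
slices = top_n_with_other(genera, n=2)
```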
TO ADD
mash
antismash
bigscape
query-bigslice
roary
-
| BGC ID | Type | Contig Edge | Biosynthetic genes | Genome ID | Organism name | GTDB species | Genbank | Halogenase | Oxygenase | Glycosylase |
|---|---|---|---|---|---|---|---|---|---|---|
Use the BGC genbank files to extract column metadata
bigscape
Title
GCF assignment
Description of the report
Provide information on the version of bigscape used to mine genomes with the command and parameters.
GCF IDs are defined differently than in the BiGSCAPE software. Here, we assign a separate GCF ID to each connected component of the network.
Figures
TBD
Tables
Choose the BiGSCAPE cut-off for raw distance (default 0.30).
| BGC ID | Type | BiGSCAPE Class | GCF ID | GCF type | Known compounds | MIBIG ID | Contig Edge | Genome ID | Organism name | GTDB species |
|---|---|---|---|---|---|---|---|---|---|---|
Use the table `df_bgcs.csv` from the cytoscape output.
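To make the GCF definition concrete, here is a sketch of one GCF ID per connected network, using a small union-find over edges that pass the raw distance cut-off. The edge tuples, the function name `assign_gcf_ids`, and treating a distance equal to the cut-off as connected are all assumptions. The actual BGCFlow code is not shown in this thread.

```python
def assign_gcf_ids(bgc_ids, edges, cutoff=0.30):
    """Map each BGC to a GCF ID, one ID per connected component."""
    parent = {bgc: bgc for bgc in bgc_ids}

    def find(node):
        # Walk up to the root, compressing the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for bgc_a, bgc_b, distance in edges:
        if distance <= cutoff:  # keep only edges within the cut-off
            parent[find(bgc_a)] = find(bgc_b)

    gcf_of_root = {}
    assignment = {}
    for bgc in bgc_ids:
        root = find(bgc)
        # Number components in order of first appearance, starting at 1.
        assignment[bgc] = gcf_of_root.setdefault(root, len(gcf_of_root) + 1)
    return assignment


# Toy network: A-B-C are linked within the cut-off, D is not.
bgcs = ["A", "B", "C", "D"]
toy_edges = [("A", "B", 0.10), ("B", "C", 0.25), ("C", "D", 0.45)]
gcf_ids = assign_gcf_ids(bgcs, toy_edges)
```

Note that A and C end up in the same GCF even without a direct edge between them, which is the main difference from a per-pair clustering.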
TO ADD
query-bigslice
-
| GCF ID | BiGSCAPE Class | GCF type | Known compounds | MIBIG ID | BGCs | Incomplete BGCs(%) | Genomes |
|---|---|---|---|---|---|---|---|
Use the table `df_families.csv` from the cytoscape output.
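If the per-family counts are not already in that table, they could be derived from the BGC table along these lines. This sketch assumes that "incomplete" means the BGC sits on a contig edge, and the keys `gcf_id`, `genome_id`, and `contig_edge` are hypothetical column names.

```python
from collections import defaultdict


def summarize_families(bgc_rows):
    """Aggregate BGC rows into per-GCF counts for the family table."""
    grouped = defaultdict(list)
    for row in bgc_rows:
        grouped[row["gcf_id"]].append(row)

    summary = {}
    for gcf_id, rows in grouped.items():
        # Assumption: a BGC on a contig edge counts as incomplete.
        incomplete = sum(1 for row in rows if row["contig_edge"])
        summary[gcf_id] = {
            "BGCs": len(rows),
            "Incomplete BGCs(%)": round(100 * incomplete / len(rows), 1),
            "Genomes": len({row["genome_id"] for row in rows}),
        }
    return summary


# Toy BGC table with hypothetical column names.
toy_bgcs = [
    {"gcf_id": 1, "genome_id": "g1", "contig_edge": False},
    {"gcf_id": 1, "genome_id": "g1", "contig_edge": True},
    {"gcf_id": 1, "genome_id": "g2", "contig_edge": False},
    {"gcf_id": 2, "genome_id": "g2", "contig_edge": False},
]
family_summary = summarize_families(toy_bgcs)
```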
TO ADD
query-bigslice
mash
-
| Phylogroup ID | Genomes | Core genome | GCFs | Specific GCFs | Pangenome | Coregenome | Specific genome |
|---|---|---|---|---|---|---|---|
Use the files from the mash, cytoscape, and roary outputs.
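A rough sketch of how the Genomes, GCFs, and Specific GCFs columns might be combined from those outputs. It assumes a "specific" GCF is one found in a single phylogroup and nowhere else, which is my reading and not something defined in this thread. The input mappings and the function name are hypothetical.

```python
from collections import defaultdict


def summarize_phylogroups(phylogroup_of, gcfs_of):
    """Count genomes, GCFs, and phylogroup-specific GCFs per phylogroup."""
    genomes = defaultdict(set)
    gcfs = defaultdict(set)
    for genome, group in phylogroup_of.items():
        genomes[group].add(genome)
        gcfs[group] |= gcfs_of.get(genome, set())

    summary = {}
    for group in genomes:
        # Collect every GCF seen in any other phylogroup.
        elsewhere = set()
        for other, other_gcfs in gcfs.items():
            if other != group:
                elsewhere |= other_gcfs
        summary[group] = {
            "Genomes": len(genomes[group]),
            "GCFs": len(gcfs[group]),
            "Specific GCFs": len(gcfs[group] - elsewhere),
        }
    return summary


# Toy inputs: genome -> phylogroup (e.g. from mash), genome -> GCF IDs.
toy_groups = {"g1": "P1", "g2": "P1", "g3": "P2"}
toy_gcfs = {"g1": {1, 2}, "g2": {2, 3}, "g3": {3, 4}}
phylogroup_summary = summarize_phylogroups(toy_groups, toy_gcfs)
```

The pangenome and core genome columns would come from roary and are left out here.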
TO ADD
roary
eggnog-roary
-
Some coding elements to determine reports:
BGCFlow wrapper:
-
The more I think about it, the less sure I am that this restructuring is necessary. Let's discuss this in person later.
-
To do list: