
Information from "kegg-column", "ko-column" and "ec-column" is now all combined

iquasere released this on 20 Sep 16:21 (32 commits to master since this release).

Multiple new columns are now output, one per source of information. For example, "KO (kegg-column)" contains the KOs obtained from the IDs in the column specified with -keggc.
All KOs obtained are grouped into the "KO (KEGGCharter)" column, which is now the only column used by the charting functions.
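The merge into a single charting column can be pictured as an order-preserving union of the comma-separated KOs from each per-source column. This is a minimal sketch, not KEGGCharter's code: rows are plain dicts, `combine_kos` is a hypothetical helper, and only "KO (kegg-column)" and "KO (KEGGCharter)" are column names taken from these notes; "KO (ko-column)" and "KO (ec-column)" are assumed by analogy.

```python
def combine_kos(row, source_columns):
    """Merge comma-separated KOs from several columns into one string."""
    combined = []  # keeps first-seen order and drops duplicates
    for column in source_columns:
        value = row.get(column) or ""  # missing or empty cells contribute nothing
        for ko in value.split(","):
            ko = ko.strip()
            if ko and ko not in combined:
                combined.append(ko)
    return ",".join(combined)


# Hypothetical per-source columns; only "KO (kegg-column)" is named in the release notes.
SOURCES = ["KO (kegg-column)", "KO (ko-column)", "KO (ec-column)"]

row = {
    "KO (kegg-column)": "K00001,K00002",
    "KO (ko-column)": "K00002",
    "KO (ec-column)": "K00003",
}
row["KO (KEGGCharter)"] = combine_kos(row, SOURCES)
print(row["KO (KEGGCharter)"])  # K00001,K00002,K00003
```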

Multiple IDs in the same cell now accepted and considered properly

The comma (,) is now the only delimiter accepted for parsing multiple IDs inside the same cell.
Multiple KEGG IDs were previously accepted if separated by semicolons (;). This is now deprecated: they must be comma-separated.
The "Data" dataframe is expanded and re-compressed with each cycle of ID conversion.
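The expand/compress cycle can be sketched with the standard library: each cell is split on commas into one (row, ID) pair per ID, each ID is converted, and the results are joined back into one comma-separated cell per row. The helper names and the toy EC-to-KO table are illustrative assumptions, not KEGGCharter's implementation or the full KEGG cross-reference.

```python
from collections import defaultdict


def explode(rows, column):
    """Expand: one (row_index, id) pair per comma-separated ID in a cell."""
    pairs = []
    for index, row in enumerate(rows):
        # Only "," splits IDs; a ";" stays inside the ID string.
        for raw_id in (row.get(column) or "").split(","):
            raw_id = raw_id.strip()
            if raw_id:
                pairs.append((index, raw_id))
    return pairs


def convert_and_compress(pairs, mapping):
    """Convert each ID, then compress back to one comma-joined cell per row.

    Rows whose IDs have no conversion are absent from the result.
    """
    grouped = defaultdict(list)
    for index, raw_id in pairs:
        for converted in mapping.get(raw_id, []):
            if converted not in grouped[index]:
                grouped[index].append(converted)
    return {index: ",".join(ids) for index, ids in grouped.items()}


# Toy data for illustration only.
rows = [{"EC": "1.1.1.1,2.7.1.1"}, {"EC": "1.1.1.1"}]
ec_to_ko = {"1.1.1.1": ["K00001", "K00121"], "2.7.1.1": ["K00844"]}
result = convert_and_compress(explode(rows, "EC"), ec_to_ko)
print(result)  # {0: 'K00001,K00121,K00844', 1: 'K00001,K00121'}
```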

Simplified input of quantification columns

--genomic-columns and --transcriptomic-columns have been removed; there is now only --quantification-columns (-tcols).
All maps ("potential" and "differential") are produced for those columns.

"gene" features now also mapped

KEGGCharter previously considered only the orthologs attribute of Pathway instances, but some boxes are present in the KGML as gene features. KEGGCharter now considers those as well.
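In KGML, each box is an `entry` element whose `type` attribute distinguishes orthologs, genes, compounds and so on, and whose `name` attribute holds space-separated IDs. The sketch below, using only `xml.etree.ElementTree`, collects both "ortholog" and "gene" entries. It is an illustration under that assumption, not KEGGCharter's code (which works on Pathway instances), and the KGML fragment is a made-up toy.

```python
import xml.etree.ElementTree as ET


def mappable_boxes(kgml_text):
    """Return (entry id, type, IDs) for every ortholog or gene box in a KGML map."""
    root = ET.fromstring(kgml_text)
    boxes = []
    for entry in root.iter("entry"):
        # Considering only "ortholog" would miss boxes encoded as "gene".
        if entry.get("type") in ("ortholog", "gene"):
            ids = entry.get("name", "").split()  # name holds space-separated IDs
            boxes.append((entry.get("id"), entry.get("type"), ids))
    return boxes


# Toy KGML fragment mixing entry types for illustration.
KGML = """<pathway name="path:ko00010">
  <entry id="13" name="ko:K00844 ko:K12407" type="ortholog"/>
  <entry id="14" name="ko:K01810" type="gene"/>
  <entry id="15" name="cpd:C00031" type="compound"/>
</pathway>"""

for box in mappable_boxes(KGML):
    print(box)
```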

Restructured the repo, simplified CI/CD, improved command-line output, performance improvements

Maps are now in the resources folder; all YAML and CI files are in the cicd folder.
A much smaller keggcharter_input.tsv is still enough to build good maps.
The libarchive version (3.6.2=h039dbb9_1) had to be pinned in the Dockerfile.
More comprehensive messages.
Lighter progress bars.
The --map-all workflow was running the write_kgmls function for every taxon. It now runs only for "ko" and associates the information with all taxa, which is much faster and less wasteful.
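The idea of reusing one reference map can be sketched as follows. Assuming the boxes of the "ko" map have already been read once, as box ID to set of KOs, and each taxon's KOs are known from the input, every taxon can be placed on the boxes by set intersection with no per-taxon KGML work. `taxa_per_box`, its arguments, and the toy data are hypothetical and are not KEGGCharter's write_kgmls.

```python
def taxa_per_box(reference_boxes, taxon_kos):
    """Map each box of the reference ('ko') map to the taxa having any of its KOs."""
    return {
        box_id: sorted(taxon for taxon, kos in taxon_kos.items() if box_kos & kos)
        for box_id, box_kos in reference_boxes.items()
    }


# Reference boxes would be parsed once from the "ko" KGML; toy values here.
reference_boxes = {"10": {"K00844", "K12407"}, "11": {"K01810"}}
taxon_kos = {"E. coli": {"K00844"}, "B. subtilis": {"K01810", "K12407"}}
print(taxa_per_box(reference_boxes, taxon_kos))
# {'10': ['B. subtilis', 'E. coli'], '11': ['B. subtilis']}
```

The per-taxon cost becomes a set intersection per box rather than a KGML write, which fits the speed-up described above.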