PPanGGOLiN release 1.0.0
New features:
- Can choose the number of partitions in the 'workflow' subcommand
- Can customize identity and coverage thresholds in the 'cluster' subcommand
- Added 4 new possible outputs:
  - protein FASTA of the representative sequences of the gene families
  - nucleotide FASTA of the representative sequences of the gene families
  - nucleotide FASTA of all the CDS
  - a list of gene family IDs and gene IDs, similar to the .tsv file format of MMseqs2
- Added unit tests for the different classes, thanks to @sletort
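
The gene family list output mentioned above can be loaded with a few lines of Python. This is only a minimal sketch: it assumes the file has two tab-separated columns, with the gene family ID first and the gene ID second (the layout of an MMseqs2 cluster .tsv). The function name `read_family_table` and the example data are hypothetical and are not part of PPanGGOLiN.

```python
import csv
import io
from collections import defaultdict


def read_family_table(handle):
    """Group gene IDs by gene family ID from a two-column, tab-separated table.

    Assumption: column 1 is the gene family ID and column 2 is the gene ID.
    """
    families = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        family_id, gene_id = row[0], row[1]
        families[family_id].append(gene_id)
    return dict(families)


# Made-up example data, not real PPanGGOLiN output.
example = io.StringIO("famA\tgene1\nfamA\tgene2\nfamB\tgene3\n")
families = read_family_table(example)
print(families)
```

With a real file, the same function can be given an open file handle instead of the in-memory example.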
Bug fixes:
- The Markov Random Field is no longer taken into account if its criterion reaches infinity (a high-dimensionality problem in statistics); PPanGGOLiN should crash less on VERY fragmented datasets.
- .gbff/.gbk files are now read properly
- Improved compatibility for the .gexf files