Releases: jtamames/SqueezeMeta
v1.6.4
- Changed the way bin disparity is calculated. It is now simply the ratio of contigs that disagree with the consensus taxonomy. This is faster and leads to comparable results overall. It also fixes an issue in which very large bins (such as eukaryotic bins) could consume a lot of memory during step 16.
- This is a quick release aimed at fixing a single bug. We have not updated SQMtools or the PDF manual, so both still reflect version 1.6.3.
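The disparity definition in the first note above can be illustrated with a minimal sketch. The function name and the choice of the most frequent taxon as the consensus are assumptions made for this example; SqueezeMeta's own consensus algorithm may differ.

```python
from collections import Counter


def bin_disparity(contig_taxa):
    """Fraction of contigs whose taxon differs from the bin consensus.

    Assumption: the consensus is the most frequent taxon in the bin.
    """
    if not contig_taxa:
        return 0.0
    consensus, _ = Counter(contig_taxa).most_common(1)[0]
    disagreeing = sum(1 for taxon in contig_taxa if taxon != consensus)
    return disagreeing / len(contig_taxa)


# Hypothetical bin with four contigs, one of which disagrees.
example_bin = ["Bacteroides", "Bacteroides", "Bacteroides", "Prevotella"]
print(bin_disparity(example_bin))  # 0.25
```

A single pass over the contigs is enough, which is consistent with the lower cost reported for very large bins.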
v1.6.3
- Conda installations will now prioritize conda binaries instead of the vendored ones in some cases. This will hopefully fix certain issues in which SqueezeMeta was failing on certain distributions/versions.
- `test_install.pl` now performs additional tests to check that binaries can be executed in the current environment.
- Increased speed and reduced memory usage in step 10 (read counting).
- Fixed an error in which projects created with the sequential mode would fail to restart. Note that each sample still has to be restarted individually.
- Fixed an error in which step 16 (DAStool bin merging) would be attempted even if the `--nobins` flag was provided.
- SQMtools: fixed an error in `exportPathways` when the requested KEGG map had only arrows.
- SQMtools: fixed an error in which figures would not be generated properly when `count='percent'` was selected if any sample had 0 reads (as could happen when analyzing subsets).
v1.6.2post3
- Update SPAdes to 3.15.5 so it works with python 3.10
v1.6.2post2
- Upgrade to python 3.10 and improve conda packaging, hopefully fix #705 and be more future-proof
v1.6.2post1
- Fix an issue in which pysam was not properly installed when installing SqueezeMeta through conda
v1.6.2
New features
- Added `spades-base` as a possible assembler for SqueezeMeta. This will make SqueezeMeta call SPAdes with no additional flags. Flags for SPAdes can then be customized by the user by passing `--assembly_options "EXTRA OPTIONS"` when calling SqueezeMeta. More information can be found in the ReadMe and the PDF manual.
- Added the utility script `sqm2zip.py`, which packs the essential files from a SqueezeMeta project into a single zip file.
- SQMtools: `loadSQM` can now load a project directly from a zip file created by `sqm2zip.py` (syntax would be `loadSQM("/path/to/my_project.zip")`).
- SQMtools: SQMtools is now available in CRAN and can be installed with `install.packages("SQMtools")` in Windows, Mac and Linux computers.
- These changes are meant to allow users to easily transfer their data from their clusters/workstations to their personal computers and explore their results there.
- SQMtools: `mostAbundant` and `mostVariable` now accept the argument `bycol = TRUE`, which will make these functions operate on columns rather than rows.
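To illustrate what operating on columns rather than rows means for `mostAbundant`, here is a rough Python analogue. It is not the SQMtools implementation: the function name, the nested-dict input and the KEGG-like row names are hypothetical, and this simplified version returns only the names of the selected items.

```python
def most_abundant(matrix, n, bycol=False):
    """Return the names of the n rows (or columns, if bycol) with the largest totals.

    `matrix` maps row name -> {column name -> value}.
    Hypothetical analogue for illustration only.
    """
    if bycol:
        # Transpose so that columns become the items being ranked.
        transposed = {}
        for row, cols in matrix.items():
            for col, value in cols.items():
                transposed.setdefault(col, {})[row] = value
        matrix = transposed
    ranked = sorted(matrix, key=lambda name: sum(matrix[name].values()), reverse=True)
    return ranked[:n]


counts = {
    "K00001": {"sampleA": 10, "sampleB": 1},
    "K00002": {"sampleA": 2, "sampleB": 3},
    "K00003": {"sampleA": 7, "sampleB": 9},
}
print(most_abundant(counts, 2))              # ['K00003', 'K00001']
print(most_abundant(counts, 1, bycol=True))  # ['sampleA']
```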
Minor changes / bugfixes
- We now use coverage variances in addition to average contig coverages when calling metabat2, which should improve the quality of the resulting bins.
- Mapping results are now stored as BAM files instead of SAM files, which should reduce disk usage.
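A small sketch of why a coverage variance can add information on top of the average coverage. The function name is hypothetical and the use of the population variance is an assumption; the depth calculation actually used for metabat2 may use a different estimator.

```python
def coverage_summary(per_base_depth):
    """Return (mean, variance) of the per-base read depth along one contig.

    Assumption: population variance over all positions.
    """
    n = len(per_base_depth)
    mean_depth = sum(per_base_depth) / n
    variance = sum((d - mean_depth) ** 2 for d in per_base_depth) / n
    return mean_depth, variance


# Two hypothetical contigs with the same average depth but different profiles.
even = [10, 10, 10, 10]
uneven = [0, 0, 20, 20]
print(coverage_summary(even))    # (10.0, 0.0)
print(coverage_summary(uneven))  # (10.0, 100.0)
```

Both contigs look identical if only the mean is considered; the variance tells them apart.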
Known issues / Other announcements
- The `make_databases.pl` script may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but still it happens inconsistently (taking a few hours in some systems, and several days in others). Having a lot (1-2 TB) of free disk space may help. `download_databases.pl` should be considered as the preferred way of quickly getting reasonably-up-to-date databases.
- We are discontinuing official support for CentOS7, as its default libraries are too outdated now. We plan on supporting SqueezeMeta in Debian, WSL2-Ubuntu and (hopefully) CentOS Upstream in the not so distant future.
v1.6.1post1
- Fix for yesterday's release, which did not include all the intended features.
v1.6.1
New features
- Added the `seqvec2fasta` function to `SQMtools`. It will print a named vector containing sequences (such as the ones used to store contig and ORF sequences in `SQM$contigs$seqs` and `SQM$orfs$seqs`) as a single fasta-formatted string.
- The `make_databases.pl`, `download_databases.pl` and `configure_nodb.pl` scripts now perform more error checking after each database creation step, and will call `test_install.pl` before finishing. This should help detect the instances in which database creation was unsuccessful e.g. due to a failed download.
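The conversion performed by `seqvec2fasta` (named sequences to one fasta-formatted string) can be pictured with the Python sketch below. It is a hypothetical analogue, not the R function itself; details such as line wrapping or the trailing newline are assumptions.

```python
def seqs_to_fasta(seqs):
    """Format a mapping of sequence name -> sequence as a single FASTA string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


# Hypothetical contig sequences keyed by name.
contigs = {"contig_1": "ACGTACGT", "contig_2": "GGCCTTAA"}
print(seqs_to_fasta(contigs), end="")
```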
Minor changes / bugfixes
- Fixed a bug in `remap.pl`.
- Fixed a bug introduced in v1.6.0 in which trimmomatic was not being called even when the `--cleaning` flag was provided.
- Fixed a bug in which single reads were causing problems during assembly.
- Fixed a bug in which `cover.pl` was using the system's perl interpreter instead of the one in the user environment.
- Improved SQL queries in `make_databases.pl` to hopefully speed up database creation.
- Fixed an issue in which mothur dependencies were not correctly fulfilled by conda.
- Fixed an issue in which restarting a sequential project failed at step 4.
- Fixed several minor issues with the restart mode.
- Fixed `remove_duplicate_markers.pl` so it works in the new binning structure.
- Fixed an issue in which SPAdes was using only 400G of memory even if more was available in the system.
- `engine="data.table"` and `tax_mode="prokfilter"` are now the default options in `loadSQM`.
- Fixed an issue in which `subsetSamples` corrupted the binning information, making it impossible to further subset the resulting object.
- The PDF SQMtools manual is back. Future availability will depend on whether I can keep getting R's clunky LaTeX interface to produce PDFs in which the tables are rendered correctly.
Known issues
- The `make_databases.pl` script may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but still it happens inconsistently (taking a few hours in some systems, and several days in others). Having a lot (1-2 TB) of free disk space may help. `download_databases.pl` should be considered as the preferred way of quickly getting reasonably-up-to-date databases.
v1.6.0 - One egg for many baskets
New features
- The script `restart.pl` has been removed. Project restart is now achieved by calling `SqueezeMeta.pl --restart -p <project_name>`. The flags `-step <STEP> --force-overwrite` can be added to this call in order to restart the pipeline from a specific step.
- Users can now control whether the source of bin taxonomy is the LCA algorithm from SqueezeMeta, or the taxonomic assignment performed by CheckM. This can be controlled with the flag `-taxbinmode`. Options are `s` (SqueezeMeta only, default), `c` (CheckM), `s+c` (SqueezeMeta, missing ranks will be completed with CheckM taxonomy when possible) or `c+s` (CheckM, missing ranks will be completed with SqueezeMeta taxonomy when possible).
- Users can now control the minimum percentage of genes from the same taxon needed in order to taxonomically annotate a contig. This can be done with the flag `-consensus`.
- `sqm_longreads.pl` will now consider partial hits completely contained inside a long read as valid hits. Before, partial hits were only considered valid if they occurred at the beginning or end of the reads. This has a noticeable impact on the annotation percentages. The old behaviour can be reinstated with the flags `-n` or `-nopartialhits`.
- `sqm2pavian.pl` now works with results from `sqm_reads.pl` and `sqm_longreads.pl`.
- Added the option `--filter` to `sqm_mapper.pl`. When this flag is present, the script will filter a set of input sequences, returning only the ones that did not map to the reference.
- SQMtools: SQM objects now track the length, abundance, mapped bases, coverage and coverage per million reads of bins. The corresponding matrices can be found under the `SQM$bins` list. When running `subsetContigs`, these values will be updated taking into consideration only the contigs from each bin that were selected.
- SQMtools: added the `subsetSamples` function to generate subsetted SQM objects containing only the requested samples.
- SQMtools: added the `plotBins` function to generate barcharts with the distribution of bins across samples.
- SQMtools: unmapped reads for functions are no longer tracked, since it led to inconsistent results in some cases (see #442). This also affects the tables generated by `sqm2tables.py`.
- SQMtools: added the `mostVariable` function, which will return the most variable rows (based on their coefficient of variation) from a data.frame or matrix. The interface is otherwise similar to the `mostAbundant` function.
- SQMtools: SQM objects now track the coverage per million reads of orfs, contigs, bins and functions. Each can be accessed inside the corresponding list under the `cpm` name. `"cpm"` is also a valid `count` option for `plotFunctions` and `plotBins`.
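The four `-taxbinmode` options listed above can be summarised with a short sketch. The function, the rank names and the dict-based representation are assumptions made for illustration; the pipeline's "when possible" rule may involve checks that are not modelled here.

```python
# Rank names are an assumption for this example.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def merge_taxonomy(sqm, checkm, mode="s"):
    """Combine two per-rank taxonomies following the s, c, s+c and c+s options.

    `sqm` and `checkm` map rank -> taxon name; unassigned ranks are absent or None.
    """
    if mode not in ("s", "c", "s+c", "c+s"):
        raise ValueError(f"unknown mode: {mode}")
    primary, secondary = (sqm, checkm) if mode.startswith("s") else (checkm, sqm)
    merged = {}
    for rank in RANKS:
        taxon = primary.get(rank)
        if taxon is None and "+" in mode:
            # Fill the gap from the other source only in the combined modes.
            taxon = secondary.get(rank)
        if taxon is not None:
            merged[rank] = taxon
    return merged


# Hypothetical bin: SqueezeMeta stops at phylum, CheckM also provides a class.
sqm_tax = {"superkingdom": "Bacteria", "phylum": "Bacteroidota"}
checkm_tax = {"superkingdom": "Bacteria", "phylum": "Bacteroidota", "class": "Bacteroidia"}
print(merge_taxonomy(sqm_tax, checkm_tax, "s"))
print(merge_taxonomy(sqm_tax, checkm_tax, "s+c"))
```

With `s` the class stays unassigned, while `s+c` fills it from the CheckM taxonomy.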
Minor changes / bugfixes
- SQMtools will from now on follow the same version numbers as the corresponding SqueezeMeta releases.
- Updated DIAMOND version to 2.0.15.
- Fixed a bug when adding taxonomic assignments to bins, in which a lack of consensus in a high level prevented looking for consensus at deeper levels.
- Fixed a bug in which `data.table` could make `DAStool` crash if it was called with a very high number of threads.
- Fixed a bug in which both reads of a pair were counted as mapped even if only one of them actually mapped to the reference. This had little impact on real datasets, but is corrected now.
- Fixed a bug in which custom arguments passed to bowtie2 with `-mapping_options` conflicted in some cases with the `--very-sensitive-local` option that we use by default when calling bowtie2. `--very-sensitive-local` is now skipped when the user provides custom arguments to bowtie2.
- Fixed an uncommon issue in which contigs could end up being assigned to more than one bin after restarting the pipeline.
- Fixed a bug in `sqm_longreads.pl` when using several input files from the same sample.
- `loadSQM` now removes redundant info from the orfs and contigs tables when loading a project into `SQMtools`, resulting in less memory usage.
- Fixed a bug in which loading a project with `loadSQM` could randomly cause an error.
- We no longer provide a PDF manual for SQMtools. The documentation for each function can still be accessed from the R terminal or RStudio.
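The paired-read counting fix above can be illustrated with SAM flags, where bit 0x4 marks a read that is itself unmapped. This is only a sketch of per-read counting under that convention, with a hypothetical function name; it is not the code used by the pipeline.

```python
UNMAPPED = 0x4         # SAM flag bit: this read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def count_mapped_reads(flags):
    """Count primary alignment records whose own read is mapped."""
    skip = UNMAPPED | SECONDARY | SUPPLEMENTARY
    return sum(1 for flag in flags if not flag & skip)


# One pair in which only the first mate mapped:
# 73  = paired (1) + mate unmapped (8) + first in pair (64)
# 133 = paired (1) + read unmapped (4) + second in pair (128)
print(count_mapped_reads([73, 133]))  # 1

# One pair in which both mates mapped (flags 99 and 147).
print(count_mapped_reads([99, 147]))  # 2
```

Checking each record's own flag, rather than counting the whole pair once either mate maps, gives one mapped read for the first pair.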
Compatibility Changes
- Results generated by previous versions of SqueezeMeta will not load into SQMtools 1.6.0 (which corresponds to SqueezeMeta release 1.6.0). Running `19.getcontigs.pl /path/to/project` will make a project generated with SqueezeMeta v1.5 compatible with the new version of SQMtools.
v1.5.2
Minor changes / bugfixes
- Fixed a bug in consensus taxonomy search during binning, in which a bin could get assigned to a low taxonomic rank even if there was no consensus at higher taxonomic ranks.
- Updated DIAMOND version to 2.0.14. This should get rid of several cases in which search against the nr database resulted in out of memory errors.
- Fixed a typo in the PDF manual in which Figure 6 was missing