Releases: mrapp-ke/Boomer
Version 0.9.0
A major update to the BOOMER algorithm that introduces the following changes.
This release comes with several API changes. For an updated overview of the available parameters and command line arguments, please refer to the documentation.
Algorithmic Enhancements
- Sparse matrices can now be used to store gradients and Hessians if supported by the loss function. The desired behavior can be specified via a new parameter --statistic-format.
- Rules with partial heads can now be learned by setting the parameter --head-type to the value partial-fixed, if the number of predicted labels should be predefined, or partial-dynamic, if the subset of predicted labels should be determined dynamically.
- A beam search can now be used for the induction of individual rules by setting the parameter --rule-induction to the value top-down-beam-search.
- Variants of the squared error loss and squared hinge loss, which take all labels of an example into account at the same time, can now be used by setting the parameter --loss to the value squared-error-example-wise or squared-hinge-example-wise.
- Probability estimates can be obtained for each label independently or via marginalization over the label vectors encountered in the training data by setting the new parameter --probability-predictor to the value label-wise or marginalized.
- Predictions that maximize the example-wise F1-measure can now be obtained by setting the parameter --binary-predictor (formerly --classification-predictor) to the value gfm.
- Binary predictions can now be derived from probability estimates by specifying the new option based_on_probabilities.
- Isotonic regression models can now be used to calibrate marginal and joint probabilities predicted by a model via the new parameters --marginal-probability-calibration and --joint-probability-calibration.
- The rules in a previously learned model can now be post-optimized by reconstructing each of them in the context of the other rules via the new parameter --sequential-post-optimization.
- Early stopping or post-pruning can now be used by setting the new parameter --global-pruning to the value pre-pruning or post-pruning.
- Single labels can now be sampled in a round-robin fashion by setting the parameter --label-sampling to the new value round-robin.
- A fixed number of trailing features can now be retained when the parameter --feature-sampling is set to the value without-replacement by specifying the option num_retained.
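Isotonic regression, which the calibration parameters above rely on, fits a non-decreasing mapping from raw predictions to calibrated probabilities. The following minimal Python sketch of the classic pool-adjacent-violators (PAV) algorithm illustrates the general technique only; it is not BOOMER's implementation, and the functions fit_isotonic and calibrate are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Illustrative sketch of pool-adjacent-violators (PAV); not BOOMER's implementation.


def fit_isotonic(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Fit a non-decreasing step function and return (threshold, value) pairs."""
    # Pool examples with identical scores first, so that ties share one value.
    pooled = {}
    for score, label in zip(scores, labels):
        total, count = pooled.get(score, (0.0, 0))
        pooled[score] = (total + label, count + 1)

    blocks: List[List[float]] = []  # each block: [min_score, label_sum, count]
    for score in sorted(pooled):
        total, count = pooled[score]
        blocks.append([score, total, count])
        # Merge adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]:
            last = blocks.pop()
            blocks[-1][1] += last[1]
            blocks[-1][2] += last[2]
    return [(start, total / count) for start, total, count in blocks]


def calibrate(model: List[Tuple[float, float]], score: float) -> float:
    """Map a raw score to the calibrated value of the block it falls into."""
    thresholds = [threshold for threshold, _ in model]
    index = max(bisect_right(thresholds, score) - 1, 0)
    return model[index][1]
```

For scores [0.1, 0.2, 0.3, 0.4] with labels [0, 1, 0, 1], the violating middle pair is pooled into one block, so a raw score of 0.25 is mapped to 0.5.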
Additions to the Command Line API
- Data sets in the MEKA format are now supported.
- Certain characteristics of binary predictions can be printed or written to output files via the new arguments --print-prediction-characteristics and --store-prediction-characteristics.
- Unique label vectors contained in the training data can be printed or written to output files via the new arguments --print-label-vectors and --store-label-vectors.
- Models for the calibration of marginal or joint probabilities can be printed or written to output files via the new arguments --print-marginal-probability-calibration-model, --store-marginal-probability-calibration-model, --print-joint-probability-calibration-model, and --store-joint-probability-calibration-model.
- Models can now be evaluated repeatedly, using subsets of their rules of increasing size, by specifying the argument --incremental-prediction.
- More control over how data is split into training and test sets is now provided by the argument --data-split, which replaces the arguments --folds and --current-fold.
- Binary labels, regression scores, or probabilities can now be predicted, depending on the value of the new argument --prediction-type, which can be set to the values binary, scores, or probabilities.
- Individual evaluation measures can now be enabled or disabled via additional options that have been added to the arguments --print-evaluation and --store-evaluation.
- The presentation of values printed on the console has been vastly improved. In addition, options for controlling the presentation of values to be printed or written to output files have been added to various command line arguments.
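Evaluating a model repeatedly with growing subsets of its rules is cheap for an additive rule ensemble, because the aggregated scores after k + 1 rules simply extend those after k rules. The sketch below illustrates this idea with a hypothetical, simplified rule representation (a predicate over the feature vector plus a vector of per-label scores) and an assumed decision threshold of 0; it does not use BOOMER's API.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical rule representation: (body predicate over a feature vector, per-label scores).
# Illustrative sketch only; not BOOMER's API.
Rule = Tuple[Callable[[Sequence[float]], bool], List[float]]


def incremental_scores(
    rules: Sequence[Rule], example: Sequence[float], step: int = 1
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (number_of_rules, aggregated_scores) after every `step` rules and after the last one."""
    scores = [0.0] * len(rules[0][1])
    for i, (covers, head) in enumerate(rules, start=1):
        # Only rules whose body covers the example contribute their scores.
        if covers(example):
            scores = [s + h for s, h in zip(scores, head)]
        if i % step == 0 or i == len(rules):
            yield i, list(scores)


def to_binary(scores: List[float], threshold: float = 0.0) -> List[int]:
    """Turn aggregated scores into binary label predictions (assumed threshold)."""
    return [1 if s > threshold else 0 for s in scores]
```

Each yielded prefix can be passed to any evaluation measure, so a single pass over the rules yields the whole learning curve instead of re-predicting from scratch for every subset size.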
Bugfixes
- The behavior of the parameter --label-format has been fixed when set to the value auto.
- The behavior of the parameters --holdout and --instance-sampling has been fixed when set to the value stratified-label-wise.
- The behavior of the parameter --binary-predictor has been fixed when set to the value example-wise and using a model that has been loaded from disk.
- Rules are now guaranteed not to cover fewer examples than specified via the option min_coverage. The option is now also taken into account when using feature binning. Alternatively, the minimum coverage of rules can now also be specified as a fraction via the option min_support.
API Changes
- The parameter --early-stopping has been replaced with a new parameter --global-pruning.
- The parameter --pruning has been renamed to --rule-pruning.
- The parameter --classification-predictor has been renamed to --binary-predictor.
- The command line argument --predict-probabilities has been replaced with a new argument --prediction-type.
- The command line argument --predicted-label-format has been renamed to --prediction-format.
Quality-of-Life Improvements
- Continuous integration is now used to test the most common functionalities of the BOOMER algorithm and the corresponding command line API.
- Successful generation of the documentation is now tested via continuous integration.
- Style definitions for Python and C++ code are now enforced by applying the tools clang-format, yapf, and isort via continuous integration.
Version 0.8.2
A bugfix release that solves the following issues:
- Fixed prebuilt packages available at PyPI.
- Fixed output of nominal values when using the option --print-rules true.
Version 0.8.1
A bugfix release that solves the following issues:
- Missing feature values are now dealt with correctly when using feature binning.
- A rare issue that may cause segmentation faults when using instance sampling has been fixed.
Version 0.8.0
This release comes with changes to the command line API. For an updated overview of the available parameters, please refer to the documentation.
A major update to the BOOMER algorithm that introduces the following changes:
- The programmatic C++ API was redesigned for a more convenient configuration of algorithms. This also drastically reduces the amount of wrapper code that is necessary to access the API from other programming languages and therefore facilitates the support of additional languages in the future.
- An issue that may cause segmentation faults when using stratified sampling methods for the creation of holdout sets has been fixed.
- Pre-built packages for Windows systems are now available at PyPI.
- Pre-built packages for Linux aarch64 systems are now provided.
Version 0.7.1
A bugfix release that solves the following issues:
- Fixes an issue preventing the use of dense representations of ground truth label matrices that was introduced in version 0.7.0.
- Pre-built packages for MacOS systems are now available at PyPI.
- Linux and MacOS packages for Python 3.10 are now provided.
Version 0.7.0
A major update to the BOOMER algorithm that introduces the following changes:
- L1 regularization can now be used.
- A more space-efficient data structure is now used for the sparse representation of binary predictions.
- The Python API now allows accessing the rules in a model programmatically.
- It is now possible to output certain characteristics of training datasets and rule models.
- Pre-built packages for the Linux platform are now available at PyPI.
- The documentation has been vastly improved.
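In second-order gradient boosting, L1 and L2 regularization typically enter the closed-form score of a rule head by soft-thresholding the gradient and adding the L2 weight to the Hessian, as popularized by XGBoost. The following sketch shows this standard formula for a single label; it illustrates the general technique and is not necessarily identical to BOOMER's implementation. The function and parameter names are hypothetical.

```python
import math

# Illustrative sketch of L1/L2-regularized second-order scores; not BOOMER's implementation.


def optimal_score(gradient: float, hessian: float, l1: float = 0.0, l2: float = 1.0) -> float:
    """Return the score s minimizing g*s + 0.5*(h + l2)*s**2 + l1*|s|.

    `gradient` and `hessian` are the sums over all covered examples; the
    default weights are chosen for illustration only.
    """
    # Soft-thresholding: gradients smaller than l1 in magnitude yield a score of zero.
    shrunk = max(abs(gradient) - l1, 0.0)
    return -math.copysign(shrunk, gradient) / (hessian + l2)
```

With l1 = 0 this reduces to the familiar L2-regularized score -g / (h + l2), while a positive l1 sets the scores of weakly supported labels exactly to zero.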
Version 0.6.2
A bugfix release that solves the following issues:
- Fixes a segmentation fault, introduced in version 0.6.0, that occurred when a sparse feature matrix was used for prediction.
Version 0.6.1
A bugfix release that solves the following issues:
- Fixes a mathematical problem, introduced in version 0.6.0, when calculating the quality of potential single-label rules.
Version 0.6.0
This release comes with changes to the command line API. For brevity and consistency, some parameters and/or their values have been renamed. Moreover, some parameters have been updated to use more reasonable default values. For an updated overview of the available parameters, please refer to the documentation.
A major update to the BOOMER algorithm that introduces the following changes:
- The parameter --instance-sampling now supports stratified sampling (stratified-label-wise and stratified-example-wise).
- The parameter --holdout now supports stratified sampling (stratified-label-wise and stratified-example-wise).
- The parameter --recalculate-predictions now allows specifying whether the predictions of rules should be recalculated on the entire training data if instance sampling is used.
- A new parameter, --prediction-format, allows specifying whether predictions should be stored using dense or sparse matrices.
- The code for the construction of rule heads has been reworked, resulting in minor performance improvements.
- The unnecessary calculation of Hessians is now avoided when using single-label rules to minimize a non-decomposable loss function, resulting in a significant performance improvement.
- A programmatic C++ API for configuring algorithms, including the validation of parameters, is now provided.
- Documentation is now available online.
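Example-wise stratification, as offered by the value stratified-example-wise above, aims to preserve the distribution of label vectors in a sample. One simple way to approximate this, sketched below under that assumption, is to group examples by their label vector and draw the same fraction from each group. The function name is hypothetical, and BOOMER's actual stratification algorithm may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch of example-wise stratified sampling; not BOOMER's algorithm.


def stratified_example_wise_sample(
    label_vectors: Sequence[Tuple[int, ...]], fraction: float, seed: int = 1
) -> List[int]:
    """Return sorted indices of a sample that keeps each label vector's share."""
    rng = random.Random(seed)
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for index, vector in enumerate(label_vectors):
        groups[tuple(vector)].append(index)

    sample: List[int] = []
    for indices in groups.values():
        # Design choice of this sketch: keep at least one example per group,
        # so that no label vector disappears from the sample.
        k = max(1, round(len(indices) * fraction))
        sample.extend(rng.sample(indices, k))
    return sorted(sample)
```

Label-wise stratification would instead balance each label's marginal frequency separately, which is harder to achieve exactly because one example contributes to several labels at once.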
Version 0.5.0
A major update to the BOOMER algorithm that introduces the following changes:
- Gradient-based label binning (GBLB) can be used to assign labels to a predefined number of bins.
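GBLB reduces the cost of evaluating rule heads for non-decomposable losses by grouping labels whose optimal scores are similar and aggregating their gradients and Hessians per bin. The following simplified sketch only illustrates the binning step, assuming equal-width bins over each label's regularized score -g / (h + l2) and ignoring cross-label Hessian terms; the function names are hypothetical, and the exact criteria and bin boundaries used by BOOMER may differ.

```python
from typing import List, Sequence

# Simplified illustration of label binning; not BOOMER's implementation of GBLB.


def assign_label_bins(
    gradients: Sequence[float], hessians: Sequence[float], num_bins: int, l2: float = 1.0
) -> List[int]:
    """Assign each label to one of `num_bins` equal-width bins by its optimal score."""
    criteria = [-g / (h + l2) for g, h in zip(gradients, hessians)]
    low, high = min(criteria), max(criteria)
    if high == low:
        return [0] * len(criteria)  # all labels share a single bin
    width = (high - low) / num_bins
    # The maximum criterion would land in bin `num_bins`, so clamp it to the last bin.
    return [min(int((c - low) / width), num_bins - 1) for c in criteria]


def aggregate_by_bin(values: Sequence[float], bins: Sequence[int], num_bins: int) -> List[float]:
    """Sum per-label statistics (e.g., gradients) within each bin."""
    totals = [0.0] * num_bins
    for value, b in zip(values, bins):
        totals[b] += value
    return totals
```

Downstream computations then operate on num_bins aggregated statistics instead of one per label, which is where the speed-up for many labels comes from.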