Releases: BlasBenito/collinear
CRAN release v2.0.0
collinear 2.0.0
Warning: This version includes several breaking changes.
Main Changes
Function preference_order()
- Now works with any combination of categorical and numeric responses and predictors. Previously, only numeric responses were considered valid.
- Accepts a character vector with multiple response variables and returns a named list of data frames in such cases.
- All functions used as input for the argument f have been rewritten, with extended coverage of cases. These functions have also been consistently renamed following these rules:
  - A code indicating the metric: r2 for R-squared, auc for area under the curve (for binomial responses), and v for Cramer's V (for categorical responses).
  - A code indicating the model: spearman, pearson, and v for direct association; glm for GLMs; gam for GAMs; rf for Random Forest models; and rpart for Recursive Partition Trees.
  - The model family for GLMs or GAMs: gaussian for numeric responses, binomial for binomial responses, and poisson for integer counts.
  - The term poly2 for GLMs with second-degree polynomials.
- When f = NULL, the function f_auto() determines an appropriate default adapted to the types of the response and predictors.
- Now issues a warning if predictors show a suspiciously high association with the response. The sensitivity of this test is controlled by the new argument warn_limit.
- Parallelization setup is now managed via future::plan(), and a progress bar is available through progressr::handlers().
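As a rough illustration of the parallelization setup described above, the following sketch configures a backend and progress reporting before calling the package. It assumes the future and progressr packages are installed; the worker count is arbitrary, and the data frame and column names in the commented call (my_df, "y", "x1", "x2") are hypothetical placeholders, not objects shipped with collinear.

```r
# Minimal setup sketch; assumes the future and progressr packages are installed.

# Run subsequent futures in two background R sessions (worker count is arbitrary).
future::plan(future::multisession, workers = 2)

# Report progress for any function that signals it through progressr.
progressr::handlers(global = TRUE)

# Hypothetical call: my_df and its column names are placeholders.
# ranking <- collinear::preference_order(
#   df = my_df,
#   response = "y",
#   predictors = c("x1", "x2")
# )

# Return to sequential processing when finished.
future::plan(future::sequential)
```

Resetting the plan to sequential at the end releases the background sessions once they are no longer needed.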
Function collinear()
- Now works with any combination of categorical and numeric responses and predictors. Previously, only numeric responses were valid. Categorical predictors are excluded from VIF analysis but are returned in the output if they pass the pairwise correlation test.
- Accepts a character vector with multiple response variables and returns a named list of data frames in such cases.
- The preference order is now computed internally if preference_order = NULL (default). Therefore, all relevant arguments of the function preference_order() have been added to collinear() with the prefix "preference_".
- Parallelization setup is now managed via future::plan(), with a progress bar provided by progressr::handlers(). This setup is leveraged by preference_order() and cor_select().
- Target encoding can be disabled by setting the encoding_method argument to NULL.
- VIF filtering can be disabled by setting max_vif to NULL.
- Pairwise correlation filtering can be disabled by setting max_cor to NULL.
Function cor_select()
- A new robust forward selection algorithm ensures that the most important predictors are retained after multicollinearity filtering when preference_order is used.
- Target encoding, along with the response and encoding_method arguments, has been removed from this function. This change also applies to cor_df().
- The function now calls validate_data_cor() to ensure that the data is suitable for pairwise correlation multicollinearity filtering.
- Parallelization setup is now managed via future::plan(), with a progress bar provided by progressr::handlers(). This setup is used by cor_numeric_vs_categorical() and cor_categorical_vs_categorical() to speed up pairwise correlation computation.
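The release notes do not spell out the forward selection algorithm, so the sketch below is only one plausible reading of it, not the package's implementation: predictors are visited in preference order, and each one is kept only if its absolute correlation with every predictor already kept stays at or below a threshold. The function name forward_select_sketch() and the 0.75 default are assumptions made for illustration.

```r
# Hypothetical greedy forward selection; NOT the code used by cor_select().
forward_select_sketch <- function(df, preference_order, max_cor = 0.75) {
  # Absolute pairwise correlations among the ranked predictors.
  cor_abs <- abs(stats::cor(df[, preference_order, drop = FALSE]))
  # The most preferred predictor is always kept.
  selected <- preference_order[1]
  for (candidate in preference_order[-1]) {
    # Keep the candidate only if it is not too correlated with any kept predictor.
    if (max(cor_abs[candidate, selected]) <= max_cor) {
      selected <- c(selected, candidate)
    }
  }
  selected
}

# Toy data: b is nearly a copy of a, z is independent of both.
set.seed(1)
n <- 200
a <- rnorm(n)
toy <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), z = rnorm(n))

# b is ranked first, so it is kept and a is dropped as redundant.
kept <- forward_select_sketch(toy, preference_order = c("b", "a", "z"))
print(kept)
```

Because the loop only ever adds predictors, a highly ranked variable can never be displaced by a lower ranked one, which is the property the release notes emphasize.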
Function cor_df()
- Fixed a bug that prevented cor_numeric_vs_categorical() and cor_categorical_vs_categorical() from triggering properly.
Function vif_select()
- A new robust forward selection algorithm better preserves predictors with higher preference when preference_order is used.
- Target encoding, along with the response and encoding_method arguments, has been removed. As a result, this function now only works with numeric predictors. This change also applies to vif_df().
- The new function validate_data_vif() is called to ensure the data is suitable for VIF-based multicollinearity filtering. Attempting a VIF analysis in a data frame with more columns than rows now returns an error.
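For readers unfamiliar with the quantity being filtered, here is a small base R sketch of variance inflation factors computed as the diagonal of the inverse correlation matrix, together with a shape check in the spirit of the one described above. The helper vif_sketch() and its error message are hypothetical and are not the package's vif_df() or validate_data_vif().

```r
# Hypothetical helper; illustrates the VIF computation, not the package internals.
vif_sketch <- function(df) {
  # Mirror the documented behavior: refuse data with more columns than rows.
  if (ncol(df) > nrow(df)) {
    stop("VIF analysis needs at least as many rows as columns.")
  }
  # Each VIF is the matching diagonal entry of the inverse correlation matrix.
  vif <- diag(solve(stats::cor(df)))
  names(vif) <- colnames(df)
  vif
}

# Toy data: ab is almost a linear combination of a and b.
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
toy <- data.frame(a = a, b = b, ab = a + b + rnorm(n, sd = 0.1))

vif_values <- vif_sketch(toy)
print(round(vif_values, 1))
```

With more columns than rows the correlation matrix is singular, so the inverse does not exist; checking the shape first yields a clearer error than a failed matrix inversion.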
Function target_encoding_lab() and Companion Functions
- Completely rewritten for parallelization using future::plan() and a progress bar via progressr::handlers().
- The default encoding method is now "loo" (leave-one-out), as it provides more useful results in most cases.
- The functions target_encoding_mean(), target_encoding_rank(), and target_encoding_loo() have been simplified to the bare minimum, with all redundant logic moved to target_encoding_lab().
- NA cases in the predictor to encode are now grouped under "NA".
- The "rnorm" method has been deprecated, and the function target_encoding_rnorm() has been removed from the package.
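To make the leave-one-out idea concrete, the base R sketch below encodes each case as the mean response of the other cases in the same group, after grouping missing values under "NA" as described above. The function loo_encode_sketch() is hypothetical, and its fallback to the overall mean for single-case groups is an assumption; target_encoding_loo() may handle that situation differently.

```r
# Hypothetical leave-one-out target encoding; not the package's implementation.
loo_encode_sketch <- function(y, x) {
  x <- as.character(x)
  # Group missing values under their own "NA" level.
  x[is.na(x)] <- "NA"
  group_sum <- ave(y, x, FUN = sum)
  group_n <- ave(y, x, FUN = length)
  # Mean of the other cases in the group; for single-case groups there are
  # no other cases, so fall back to the overall mean (an assumption).
  ifelse(group_n > 1, (group_sum - y) / (group_n - 1), mean(y))
}

y <- c(1, 2, 3, 4, 5, 6)
x <- c("a", "a", "a", "b", "b", NA)

encoded <- loo_encode_sketch(y, x)
print(encoded)
```

Excluding each case's own response from its group mean keeps the encoded predictor from leaking that case's response value into the predictor.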
Other Changes
- Added the function cor_clusters() to group predictors using stats::hclust() based on their pairwise correlation matrix.
- Streamlined the package documentation using roxygen methods to inherit sections and parameters.
- Removed dplyr as a dependency.
- Added mgcv, rpart, and ranger to Imports to support all f_xxx() functions from the start.
- All warnings in data validation functions have been converted to messages. These messages now indicate the function that generated them, aiding in debugging and ensuring that messages and warnings are printed in the correct order.
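The kind of grouping that cor_clusters() is described as performing can be approximated in base R as follows. This is a sketch under stated assumptions, not the function's actual code: the distance is taken as one minus the absolute correlation, the default complete linkage of stats::hclust() is used, and the cut height of 0.25 (absolute correlation of about 0.75) is arbitrary.

```r
# Toy data: b tracks a, and w tracks z; the two pairs are independent.
set.seed(1)
n <- 200
a <- rnorm(n)
z <- rnorm(n)
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.1),
  z = z,
  w = z + rnorm(n, sd = 0.1)
)

# Turn absolute correlations into distances: strongly correlated means close.
cor_dist <- stats::as.dist(1 - abs(stats::cor(toy)))

# Hierarchical clustering on the correlation-based distances.
tree <- stats::hclust(cor_dist)

# Cut the tree so that predictors closer than 0.25 share a cluster.
clusters <- stats::cutree(tree, h = 0.25)
print(clusters)
```

Each cluster can then be read as a set of largely interchangeable predictors.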
CRAN release v1.1.1
version 1.1.1
CRAN release v1.0.1
CRAN release