Releases: alteryx/evalml
v0.39.0 Dec. 9, 2021
Enhancements
- Renamed `DelayedFeatureTransformer` to `TimeSeriesFeaturizer` and enhanced it to compute rolling features #3028
- Added ability to impute only specific columns in `PerColumnImputer` #3123
- Added `TimeSeriesParametersDataCheck` to verify the time series parameters are valid given the number of splits in cross validation #3111
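For readers unfamiliar with the terms, the following is a minimal sketch of what "delayed" (lagged) and "rolling" features mean for a time series. It uses only the Python standard library and is not EvalML's implementation; the helper names `lag` and `rolling_mean` are hypothetical and chosen for illustration only.

```python
from statistics import mean


def lag(values, k):
    """Shift a series forward by k steps, padding the start with None.

    Hypothetical helper for illustration; not part of EvalML.
    """
    n = len(values)
    k = min(k, n)  # a lag longer than the series yields all padding
    return [None] * k + list(values[: n - k])


def rolling_mean(values, window):
    """Trailing-window mean; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window : i + 1]))
    return out


target = [10, 12, 11, 15, 14]
lag_1 = lag(target, 1)            # the previous observation at each step
roll_3 = rolling_mean(target, 3)  # mean of the current and two prior values
```

The leading `None` entries show why lagged and rolling features leave the first rows of a series without values.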
Fixes
- Default parameters for `RFRegressorSelectFromModel` and `RFClassifierSelectFromModel` have been fixed to avoid selecting all features #3110
Changes
- Removed reliance on a datetime index for `ARIMARegressor` and `ProphetRegressor` #3104
- Included target leakage check when fitting `ARIMARegressor` to account for the lack of `TimeSeriesFeaturizer` in `ARIMARegressor` based pipelines #3104
- Cleaned up and refactored `InvalidTargetDataCheck` implementation and docstring #3122
- Removed indices information from the output of `HighlyNullDataCheck`'s `validate()` method #3092
- Added `ReplaceNullableTypes` component to prepare for handling pandas nullable types. #3090
- Removed unused `EnsembleMissingPipelinesError` exception definition #3131
Documentation Changes
Testing Changes
- Refactored tests to avoid using `importorskip` #3126
- Added `skip_during_conda` test marker to skip tests that are not supposed to run during conda build #3127
- Added `skip_if_39` test marker to skip tests that are not supposed to run during python 3.9 #3133
Breaking Changes
- Renamed `DelayedFeatureTransformer` to `TimeSeriesFeaturizer` #3028
- `ProphetRegressor` now requires a datetime column in `X` represented by the `date_index` parameter #3104
- Renamed module `evalml.data_checks.invalid_target_data_check` to `evalml.data_checks.invalid_targets_data_check` #3122
- Removed unused `EnsembleMissingPipelinesError` exception definition #3131
v0.38.0 Nov. 29, 2021
Enhancements
- Added `data_check_name` attribute to the data check action class #3034
- Added `NumWords` and `NumCharacters` primitives to `TextFeaturizer` and renamed `TextFeaturizer` to `NaturalLanguageFeaturizer` #3030
- Added support for `scikit-learn > 1.0.0` #3051
- Required the `date_index` parameter to be specified for time series problems in `AutoMLSearch` #3041
- Allowed time series pipelines to predict on test datasets whose length is less than or equal to the `forecast_horizon`. Also allowed the test set index to start at 0. #3071
- Enabled time series pipeline to predict on data with features that are not known-in-advance #3094
Fixes
- Added an error message when fit and predict/predict_proba data types are different #3036
- Fixed bug where ensembling components could not get converted to JSON format #3049
- Fixed bug where components with tuned integer hyperparameters could not get converted to JSON format #3049
- Included confusion matrix at the pipeline threshold for `find_confusion_matrix_per_threshold` #3080
- Fixed bug where One Hot Encoder would error out if a non-categorical feature had a missing value #3083
- Fixed bug where features created from categorical columns by `Delayed Feature Transformer` would be inferred as categorical #3083
Changes
- Deleted `predict_uses_y` estimator attribute #3069
- Changed `DateTimeFeaturizer` to use corresponding Featuretools primitives #3081
- Updated `TargetDistributionDataCheck` to return metadata details as floats rather than strings #3085
- Removed dependency on `psutil` package #3093
Documentation Changes
- Updated docs to use data check action methods rather than manually cleaning data #3050
Testing Changes
- Updated integration tests to use `make_pipeline_from_actions` instead of private method #3047
Breaking Changes
- Added `data_check_name` attribute to the data check action class #3034
- Renamed `TextFeaturizer` to `NaturalLanguageFeaturizer` #3030
- Updated the `Pipeline.graph_json` function to return a dictionary of "from" and "to" edges instead of tuples #3049
- Deleted `predict_uses_y` estimator attribute #3069
- Changed time series problems in `AutoMLSearch` to need a not-None `date_index` #3041
- Changed the `DelayedFeatureTransformer` to throw a `ValueError` during fit if the `date_index` is `None` #3041
- Passing `X=None` to `DelayedFeatureTransformer` is deprecated #3041
v0.37.0 Nov. 10, 2021
Enhancements
- Added `find_confusion_matrix_per_threshold` to Model Understanding #2972
- Limited computationally-intensive models during `AutoMLSearch` for certain multiclass problems; opt in with the `allow_long_running_models` parameter #2982
- Added support for stacked ensemble pipelines to prediction explanations module #2971
- Added integration tests for data checks and data checks actions workflow #2883
- Added a change in pipeline structure to handle categorical columns separately for pipelines in `DefaultAlgorithm` #2986
- Added an algorithm to `DelayedFeatureTransformer` to select better lags #3005
- Added AutoML function to access ensemble pipeline's input pipelines IDs #3011
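As background for the per-threshold confusion matrix entry (#2972), here is a minimal, standard-library sketch of the general idea: sweep a set of decision thresholds over predicted probabilities and count the four outcome types at each one. This is not EvalML's `find_confusion_matrix_per_threshold`, whose signature and output format are not shown in these notes; the function names, the dictionary layout, and the `>=` comparison are assumptions made for illustration.

```python
def confusion_at_threshold(y_true, y_prob, threshold):
    """Count TP/FP/TN/FN when predicting positive for prob >= threshold.

    Hypothetical helper; the >= convention is an assumption.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif not predicted and actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def confusion_per_threshold(y_true, y_prob, thresholds):
    """Map each threshold to its confusion counts."""
    return {t: confusion_at_threshold(y_true, y_prob, t) for t in thresholds}


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.4, 0.9]
table = confusion_per_threshold(y_true, y_prob, [0.25, 0.5, 0.75])
```

Comparing rows of `table` shows the usual trade-off: raising the threshold removes false positives at the cost of more false negatives.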
Fixes
- Fixed bug where `Oversampler` didn't consider boolean columns to be categorical #2980
- Fixed permutation importance failing when target is categorical #3017
- Updated estimator and pipelines' `predict`, `predict_proba`, `transform`, `inverse_transform` methods to preserve input indices #2979
- Updated demo dataset link for daily min temperatures #3023
Changes
- Updated `OutliersDataCheck` and `UniquenessDataCheck` and allowed for the suspension of the Nullable types error #3018
Documentation Changes
v0.36.0 Oct. 27, 2021
Enhancements
- Added LIME as an algorithm option for `explain_predictions` and `explain_predictions_best_worst` #2905
- Standardized data check messages and added default "rows" and "columns" to data check message details dictionary #2869
- Added `rows_of_interest` to pipeline utils #2908
- Added support for woodwork version `0.8.2` #2909
- Enhanced the `DateTimeFeaturizer` to handle `NaNs` in date features #2909
- Added support for woodwork logical types `PostalCode`, `SubRegionCode`, and `CountryCode` in model understanding tools #2946
- Added Vowpal Wabbit regressor and classifiers #2846
Fixes
- Fixed bug where partial dependence was not respecting the ww schema #2929
- Fixed `calculate_permutation_importance` for datetimes on `StandardScaler` #2938
- Fixed `SelectColumns` to only select available features for feature selection in `DefaultAlgorithm` #2944
- Fixed `DropColumns` component not receiving parameters in `DefaultAlgorithm` #2945
- Fixed bug where trained binary thresholds were not being returned by `get_pipeline` or `clone` #2948
- Fixed bug where `Oversampler` selected ww logical categorical instead of ww semantic category #2946
Changes
- Changed `make_pipeline` function to place the `DateTimeFeaturizer` prior to the `Imputer` so that `NaN` dates can be imputed #2909
- Refactored `OutliersDataCheck` and `HighlyNullDataCheck` to add more descriptive metadata #2907
Documentation Changes
- Added back Future Release section to release notes #2927
- Updated CI to run doctest (docstring tests) and apply necessary fixes to docstrings #2933
- Added documentation for `BinaryClassificationPipeline` thresholding #2937
Testing Changes
- Fixed dependency checker to catch full names of packages #2930
- Refactored `build_conda_pkg` to work from a local recipe #2925
Breaking Changes
- Standardized data check messages and added default "rows" and "columns" to data check message details dictionary. This may change the number of messages returned from a data check. #2869
v0.35.0 Oct. 14, 2021
Enhancements
- Added human-readable pipeline explanations to model understanding #2861
- Updated to support Featuretools 1.0.0 and nlp-primitives 2.0.0 #2848
Fixes
- Fixed bug where `long` mode for the top level search method was not respected #2875
- Pinned `cmdstan` to `0.28.0` in `cmdstan-builder` to prevent future breaking of support for Prophet #2880
- Added `Jarque-Bera` to the `TargetDistributionDataCheck` #2891
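For context on the Jarque-Bera entry (#2891): the Jarque-Bera statistic measures departure from normality using sample skewness S and kurtosis K, computed as JB = n/6 * (S^2 + (K - 3)^2 / 4). The sketch below computes it with the standard library only. It is illustrative and does not show how `TargetDistributionDataCheck` applies the test; the function name is hypothetical.

```python
def jarque_bera_statistic(values):
    """Return the Jarque-Bera statistic for a sample.

    Uses biased (population) central moments, the usual convention for
    this statistic. Hypothetical helper; not EvalML code.
    """
    n = len(values)
    if n == 0:
        raise ValueError("sample must not be empty")
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


sample = [-2, -1, 0, 1, 2]
jb = jarque_bera_statistic(sample)
```

A value near zero is consistent with a normal distribution; larger values indicate skew or non-normal tail weight.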
Changes
- Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level #2821
- Deleted scikit-learn ensembler #2819
- Refactored pipeline building logic out of `AutoMLSearch` and into `IterativeAlgorithm` #2854
- Refactored names for methods in `ComponentGraph` and `PipelineBase` #2902
Documentation Changes
- Updated `install.ipynb` to reflect flexibility for `cmdstan` version installation #2880
- Updated the conda section of our contributing guide #2899
Testing Changes
- Updated `test_all_estimators` to account for Prophet being allowed for Python 3.9 #2892
- Updated linux tests to use `cmdstan-builder==0.0.8` #2880
Breaking Changes
- Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level. This means that pipelines will no longer automatically encode non-numerical targets. Please use a label encoder if working with classification problems and non-numeric targets. #2821
- Deleted scikit-learn ensembler #2819
- `IterativeAlgorithm` now requires X, y, problem_type as required arguments as well as sampler_name, allowed_model_families, allowed_component_graphs, max_batches, and verbose as optional arguments #2854
- Changed method names of `fit_features` and `compute_final_component_features` to `fit_and_transform_all_but_final` and `transform_all_but_final` in `ComponentGraph`, and `compute_estimator_features` to `transform_all_but_final` in pipeline classes #2902
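Because pipelines no longer encode non-numeric targets automatically (#2821), it may help to see what label encoding does conceptually. The following is a minimal standard-library sketch, not EvalML's label encoder component; the class name `SimpleLabelEncoder` and its methods are hypothetical and only mirror the common fit/transform/inverse_transform pattern.

```python
class SimpleLabelEncoder:
    """Map arbitrary hashable, sortable labels to integers 0..n-1 and back.

    Hypothetical illustration; not part of EvalML.
    """

    def fit(self, y):
        # Sort the unique labels so the mapping is deterministic.
        self.classes_ = sorted(set(y))
        self._to_int = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        # Raises KeyError for labels that were not seen during fit.
        return [self._to_int[label] for label in y]

    def inverse_transform(self, codes):
        return [self.classes_[code] for code in codes]


y = ["spam", "ham", "spam", "eggs"]
encoder = SimpleLabelEncoder().fit(y)
encoded = encoder.transform(y)
decoded = encoder.inverse_transform(encoded)
```

Keeping the fitted encoder around matters: predictions come back as integers, and `inverse_transform` is what restores the original class names.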
v0.34.1rc1 Oct. 1, 2021
Enhancements
- Updated to support Featuretools 1.0.0 and nlp-primitives 2.0.0 #2848
v0.34.0 Oct. 1, 2021
Enhancements
- Updated to work with Woodwork 0.8.1 #2783
- Added validation that `training_data` and `training_target` are not `None` in prediction explanations #2787
- Added support for training-only components in pipelines and component graphs #2776
- Added default argument for the parameters value for `ComponentGraph.instantiate` #2796
- Added `TIME_SERIES_REGRESSION` to `LightGBMRegressor`'s supported problem types #2793
- Added validation to holdout data passed to `predict` and `predict_proba` for time series #2804
- Added information about which row indices are outliers in `OutliersDataCheck` #2818
- Added verbose flag to top level `search()` method #2813
- Added support for linting jupyter notebooks and clearing the executed cells and empty cells #2829 #2837
- Added "DROP_ROWS" action to output of `OutliersDataCheck.validate()` #2820
- Added the ability of `AutoMLSearch` to accept a `SequentialEngine` instance as engine input #2838
- Added new label encoder component to EvalML #2853
- Added our own partial dependence implementation #2834
Fixes
- Fixed bug where `calculate_permutation_importance` was not calculating the right value for pipelines with target transformers #2782
- Fixed bug where transformed target values were not used in `fit` for time series pipelines #2780
- Fixed bug where `score_pipelines` method of `AutoMLSearch` would not work for time series problems #2786
- Removed `TargetTransformer` class #2833
- Added tests to verify `ComponentGraph` support by pipelines #2830
- Fixed incorrect parameter for baseline regression pipeline in `AutoMLSearch` #2847
Changes
- Changed woodwork initialization to use partial schemas #2774
- Made `Transformer.transform()` an abstract method #2744
- Deleted `EmptyDataChecks` class #2794
- Removed data check for checking log distributions in `make_pipeline` #2806
- Changed the minimum `woodwork` version to 0.8.0 #2783
- Pinned `woodwork` version to 0.8.0 #2832
- Removed `model_family` attribute from `ComponentBase` and transformers #2828
- Limited `scikit-learn` until new features and errors can be addressed #2842
- Show DeprecationWarning when Sklearn Ensemblers are called #2859
Testing Changes
- Updated matched assertion message regarding monotonic indices in polynomial detrender tests #2811
- Added a test to make sure pip versions match conda versions #2851
Breaking Changes
v0.33.0
v0.32.1 Sep. 10, 2021
Enhancements
- Added `verbose` flag to `AutoMLSearch` to run search in silent mode by default #2645
- Added label encoder to `XGBoostClassifier` to remove the warning #2701
- Set `eval_metric` to `logloss` for `XGBoostClassifier` #2741
- Added support for `woodwork` versions `0.7.0` and `0.7.1` #2743
- Changed `explain_predictions` functions to display original feature values #2759
- Added `X_train` and `y_train` to `graph_prediction_vs_actual_over_time` and `get_prediction_vs_actual_over_time_data` #2762
- Added `forecast_horizon` as a required parameter to time series pipelines and `AutoMLSearch` #2697
- Added `predict_in_sample` and `predict_proba_in_sample` methods to time series pipelines to predict on data where the target is known, e.g. cross-validation #2697
Fixes
- Fixed bug where `_catch_warnings` assumed all warnings were `PipelineNotUsed` #2753
- Fixed bug where `Imputer.transform` would erase ww typing information prior to handing data to the `SimpleImputer` #2752
- Fixed bug where `Oversampler` could not be copied #2755
Changes
- Deleted `drop_nan_target_rows` utility method #2737
- Removed default logging setup and debugging log file #2645
- Changed the default n_jobs value for `XGBoostClassifier` and `XGBoostRegressor` to 12 #2757
- Changed `TimeSeriesBaselineEstimator` to only work on a time series pipeline with a `DelayedFeaturesTransformer` #2697
- Added `X_train` and `y_train` as optional parameters to pipeline `predict`, `predict_proba`. Only used for time series pipelines #2697
- Added `training_data` and `training_target` as optional parameters to `explain_predictions` and `explain_predictions_best_worst` to support time series pipelines #2697
- Changed time series pipeline predictions to no longer output series/dataframes padded with NaNs. A prediction will be returned for every row in the `X` input #2697
Documentation Changes
- Specified installation steps for Prophet #2713
- Added documentation for data exploration on data check actions #2696
- Added a user guide entry for time series modelling #2697
Testing Changes
- Fixed flaky `TargetDistributionDataCheck` test for very_lognormal distribution #2748
Breaking Changes
- Removed default logging setup and debugging log file #2645
- Added `X_train` and `y_train` to `graph_prediction_vs_actual_over_time` and `get_prediction_vs_actual_over_time_data` #2762
- Added `forecast_horizon` as a required parameter to time series pipelines and `AutoMLSearch` #2697
- Changed `TimeSeriesBaselineEstimator` to only work on a time series pipeline with a `DelayedFeaturesTransformer` #2697
- Added `X_train` and `y_train` as required parameters for `predict` and `predict_proba` in time series pipelines #2697
- Added `training_data` and `training_target` as required parameters to `explain_predictions` and `explain_predictions_best_worst` for time series pipelines #2697
v0.32.0 Sep. 1, 2021
Enhancements
- Allowed string for `engine` parameter for `AutoMLSearch` #2667
- Added `ProphetRegressor` to AutoML #2619
- Integrated `DefaultAlgorithm` into `AutoMLSearch` #2634
- Removed SVM "linear" and "precomputed" kernel hyperparameter options, and improved default parameters #2651
- Updated `ComponentGraph` initialization to raise `ValueError` when user attempts to use `.y` for a component that does not produce a tuple output #2662
- Updated to support Woodwork 0.6.0 #2690
- Updated pipeline `graph()` to distinguish X and y edges #2654
- Added `DropRowsTransformer` component #2692
- Added `DROP_ROWS` to `_make_component_list_from_actions` and cleaned up metadata #2694
Fixes
- Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input #2695
- Added ability to explicitly close DaskEngine resources to improve runtime and reduce Dask warnings #2667
- Fixed partial dependence bug for ensemble pipelines #2714
- Updated `TargetLeakageDataCheck` to maintain user-selected logical types #2711
Changes
- Replaced `SMOTEOversampler`, `SMOTENOversampler` and `SMOTENCOversampler` with consolidated `Oversampler` component #2695
- Removed `LinearRegressor` from the list of default `AutoMLSearch` estimators due to poor performance #2660
Documentation Changes
- Updated documentation to make parallelization of AutoML clearer #2667
Testing Changes
- Removed the process-level parallelism from the `test_cancel_job` test #2666
- Installed numba 0.53 in windows CI to prevent problems installing version 0.54 #2710
Breaking Changes
- Renamed the current top level `search` method to `search_iterative` and defined a new `search` method for the `DefaultAlgorithm` #2634
- Replaced `SMOTEOversampler`, `SMOTENOversampler` and `SMOTENCOversampler` with consolidated `Oversampler` component #2695
- Removed `LinearRegressor` from the list of default `AutoMLSearch` estimators due to poor performance #2660