diff --git a/dev/api/index.html b/dev/api/index.html
index 923283f7..83e1c7ba 100644
--- a/dev/api/index.html
+++ b/dev/api/index.html
@@ -12,9 +12,9 @@
     print_every_n=9999,
     verbosity=1,
     return_logger=false,
-    device=&quot;cpu&quot;)</code></pre><p>Main training function. Performs model fitting given configuration <code>params</code>, <code>dtrain</code>, <code>target_name</code> and other optional kwargs. </p><p><strong>Arguments</strong></p><ul><li><code>params::EvoTypes</code>: configuration info providing hyper-paramters. <code>EvoTypes</code> can be one of: <ul><li><a href="../models/#EvoTrees.EvoTreeRegressor"><code>EvoTreeRegressor</code></a></li><li><a href="../models/#EvoTrees.EvoTreeClassifier"><code>EvoTreeClassifier</code></a></li><li><a href="../models/#EvoTrees.EvoTreeCount"><code>EvoTreeCount</code></a></li><li><a href="../models/#EvoTrees.EvoTreeMLE"><code>EvoTreeMLE</code></a></li></ul></li><li><code>dtrain</code>: A Tables compatible training data (named tuples, DataFrame...) containing features and target variables. </li></ul><p><strong>Keyword arguments</strong></p><ul><li><code>target_name</code>: name of target variable. </li><li><code>fnames = nothing</code>: the names of the <code>x_train</code> features. If provided, should be a vector of string with <code>length(fnames) = size(x_train, 2)</code>.</li><li><code>w_name = nothing</code>: name of the variable containing weights. If <code>nothing</code>, common weights on one will be used.</li><li><code>offset_name = nothing</code>: name of the offset variable.</li><li><code>deval</code>: A Tables compatible evaluation data containing features and target variables. </li><li><code>metric</code>: The evaluation metric that wil be tracked on <code>deval</code>.    Supported metrics are: <ul><li><code>:mse</code>: mean-squared error. Adapted for general regression models.</li><li><code>:rmse</code>: root-mean-squared error (CPU only). Adapted for general regression models.</li><li><code>:mae</code>: mean absolute error. Adapted for general regression models.</li><li><code>:logloss</code>: Adapted for <code>:logistic</code> regression models.</li><li><code>:mlogloss</code>: Multi-class cross entropy. 
Adapted to <code>EvoTreeClassifier</code> classification models. </li><li><code>:poisson</code>: Poisson deviance. Adapted to <code>EvoTreeCount</code> count models.</li><li><code>:gamma</code>: Gamma deviance. Adapted to regression problem on Gamma like, positively distributed targets.</li><li><code>:tweedie</code>: Tweedie deviance. Adapted to regression problem on Tweedie like, positively distributed targets with probability mass at <code>y == 0</code>.</li><li><code>:gaussian_mle</code>: Gaussian maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :gaussian_mle</code>. </li><li><code>:logistic_mle</code>: Logistic maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :logistic_mle</code>. </li></ul></li><li><code>early_stopping_rounds::Integer</code>: number of consecutive rounds without metric improvement after which fitting in stopped. </li><li><code>print_every_n</code>: sets at which frequency logging info should be printed. </li><li><code>verbosity</code>: set to 1 to print logging info during training.</li><li><code>return_logger::Bool = false</code>: if set to true (default), <code>fit_evotree</code> return a tuple <code>(m, logger)</code> where logger is a dict containing various tracking information.</li><li><code>device=&quot;cpu&quot;</code>: Hardware device to use for computations. Can be either <code>&quot;cpu&quot;</code> or <code>&quot;gpu&quot;</code>. Following losses are not GPU supported at the moment<code>:l1</code>, <code>:quantile</code>, <code>:logistic_mle</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/fit.jl#L298-L350">source</a></section><section><div><pre><code class="nohighlight hljs">fit_evotree(params::EvoTypes{L};
+    device=&quot;cpu&quot;)</code></pre><p>Main training function. Performs model fitting given configuration <code>params</code>, <code>dtrain</code>, <code>target_name</code> and other optional kwargs. </p><p><strong>Arguments</strong></p><ul><li><code>params::EvoTypes</code>: configuration info providing hyper-parameters. <code>EvoTypes</code> can be one of: <ul><li><a href="../models/#EvoTrees.EvoTreeRegressor"><code>EvoTreeRegressor</code></a></li><li><a href="../models/#EvoTrees.EvoTreeClassifier"><code>EvoTreeClassifier</code></a></li><li><a href="../models/#EvoTrees.EvoTreeCount"><code>EvoTreeCount</code></a></li><li><a href="../models/#EvoTrees.EvoTreeMLE"><code>EvoTreeMLE</code></a></li></ul></li><li><code>dtrain</code>: Tables-compatible training data (named tuples, DataFrame...) containing features and target variables. </li></ul><p><strong>Keyword arguments</strong></p><ul><li><code>target_name</code>: name of the target variable. </li><li><code>fnames = nothing</code>: the names of the <code>x_train</code> features. If provided, should be a vector of strings with <code>length(fnames) = size(x_train, 2)</code>.</li><li><code>w_name = nothing</code>: name of the variable containing weights. If <code>nothing</code>, a common weight of one is used for all observations.</li><li><code>offset_name = nothing</code>: name of the offset variable.</li><li><code>deval</code>: Tables-compatible evaluation data containing features and target variables. </li><li><code>metric</code>: The evaluation metric that will be tracked on <code>deval</code>.    Supported metrics are: <ul><li><code>:mse</code>: mean-squared error. Adapted for general regression models.</li><li><code>:rmse</code>: root-mean-squared error (CPU only). Adapted for general regression models.</li><li><code>:mae</code>: mean absolute error. Adapted for general regression models.</li><li><code>:logloss</code>: Adapted for <code>:logistic</code> regression models.</li><li><code>:mlogloss</code>: Multi-class cross entropy. 
Adapted to <code>EvoTreeClassifier</code> classification models. </li><li><code>:poisson</code>: Poisson deviance. Adapted to <code>EvoTreeCount</code> count models.</li><li><code>:gamma</code>: Gamma deviance. Adapted to regression problems on Gamma-like, positively distributed targets.</li><li><code>:tweedie</code>: Tweedie deviance. Adapted to regression problems on Tweedie-like, positively distributed targets with probability mass at <code>y == 0</code>.</li><li><code>:gaussian_mle</code>: Gaussian maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :gaussian_mle</code>. </li><li><code>:logistic_mle</code>: Logistic maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :logistic_mle</code>. </li></ul></li><li><code>early_stopping_rounds::Integer</code>: number of consecutive rounds without metric improvement after which fitting is stopped. </li><li><code>print_every_n</code>: sets the frequency at which logging info is printed. </li><li><code>verbosity</code>: set to 1 to print logging info during training.</li><li><code>return_logger::Bool = false</code>: if set to <code>true</code> (the default is <code>false</code>), <code>fit_evotree</code> returns a tuple <code>(m, logger)</code> where <code>logger</code> is a dict containing various tracking information.</li><li><code>device=&quot;cpu&quot;</code>: Hardware device to use for computations. Can be either <code>&quot;cpu&quot;</code> or <code>&quot;gpu&quot;</code>. The following losses are not currently supported on GPU: <code>:l1</code>, <code>:quantile</code>, <code>:logistic_mle</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/fit.jl#L298-L350">source</a></section><section><div><pre><code class="nohighlight hljs">fit_evotree(params::EvoTypes{L};
     x_train::AbstractMatrix, y_train::AbstractVector, w_train=nothing, offset_train=nothing,
     x_eval=nothing, y_eval=nothing, w_eval=nothing, offset_eval=nothing,
     early_stopping_rounds=9999,
     print_every_n=9999,
-    verbosity=1)</code></pre><p>Main training function. Performs model fitting given configuration <code>params</code>, <code>x_train</code>, <code>y_train</code> and other optional kwargs. </p><p><strong>Arguments</strong></p><ul><li><code>params::EvoTypes</code>: configuration info providing hyper-paramters. <code>EvoTypes</code> can be one of: <ul><li><a href="../models/#EvoTrees.EvoTreeRegressor"><code>EvoTreeRegressor</code></a></li><li><a href="../models/#EvoTrees.EvoTreeClassifier"><code>EvoTreeClassifier</code></a></li><li><a href="../models/#EvoTrees.EvoTreeCount"><code>EvoTreeCount</code></a></li><li><a href="../models/#EvoTrees.EvoTreeMLE"><code>EvoTreeMLE</code></a></li></ul></li></ul><p><strong>Keyword arguments</strong></p><ul><li><code>x_train::Matrix</code>: training data of size <code>[#observations, #features]</code>. </li><li><code>y_train::Vector</code>: vector of train targets of length <code>#observations</code>.</li><li><code>w_train::Vector</code>: vector of train weights of length <code>#observations</code>. If <code>nothing</code>, a vector of ones is assumed.</li><li><code>offset_train::VecOrMat</code>: offset for the training data. Should match the size of the predictions.</li><li><code>x_eval::Matrix</code>: evaluation data of size <code>[#observations, #features]</code>. </li><li><code>y_eval::Vector</code>: vector of evaluation targets of length <code>#observations</code>.</li><li><code>w_eval::Vector</code>: vector of evaluation weights of length <code>#observations</code>. Defaults to <code>nothing</code> (assumes a vector of 1s).</li><li><code>offset_eval::VecOrMat</code>: evaluation data offset. Should match the size of the predictions.</li><li><code>metric</code>: The evaluation metric that wil be tracked on <code>x_eval</code>, <code>y_eval</code> and optionally <code>w_eval</code> / <code>offset_eval</code> data.    Supported metrics are: <ul><li><code>:mse</code>: mean-squared error. 
Adapted for general regression models.</li><li><code>:rmse</code>: root-mean-squared error (CPU only). Adapted for general regression models.</li><li><code>:mae</code>: mean absolute error. Adapted for general regression models.</li><li><code>:logloss</code>: Adapted for <code>:logistic</code> regression models.</li><li><code>:mlogloss</code>: Multi-class cross entropy. Adapted to <code>EvoTreeClassifier</code> classification models. </li><li><code>:poisson</code>: Poisson deviance. Adapted to <code>EvoTreeCount</code> count models.</li><li><code>:gamma</code>: Gamma deviance. Adapted to regression problem on Gamma like, positively distributed targets.</li><li><code>:tweedie</code>: Tweedie deviance. Adapted to regression problem on Tweedie like, positively distributed targets with probability mass at <code>y == 0</code>.</li><li><code>:gaussian_mle</code>: Gaussian maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :gaussian_mle</code>. </li><li><code>:logistic_mle</code>: Logistic maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :logistic_mle</code>. </li></ul></li><li><code>early_stopping_rounds::Integer</code>: number of consecutive rounds without metric improvement after which fitting in stopped. </li><li><code>print_every_n</code>: sets at which frequency logging info should be printed. </li><li><code>verbosity</code>: set to 1 to print logging info during training.</li><li><code>fnames</code>: the names of the <code>x_train</code> features. If provided, should be a vector of string with <code>length(fnames) = size(x_train, 2)</code>.</li><li><code>return_logger::Bool = false</code>: if set to true (default), <code>fit_evotree</code> return a tuple <code>(m, logger)</code> where logger is a dict containing various tracking information.</li><li><code>device=&quot;cpu&quot;</code>: Hardware device to use for computations. 
Can be either <code>&quot;cpu&quot;</code> or <code>&quot;gpu&quot;</code>. Following losses are not GPU supported at the moment<code>:l1</code>, <code>:quantile</code>, <code>:logistic_mle</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/fit.jl#L414-L461">source</a></section></article><h2 id="Predict"><a class="docs-heading-anchor" href="#Predict">Predict</a><a id="Predict-1"></a><a class="docs-heading-anchor-permalink" href="#Predict" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="MLJModelInterface.predict" href="#MLJModelInterface.predict"><code>MLJModelInterface.predict</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">predict(model::EvoTree, X::AbstractMatrix; ntree_limit = length(model.trees))</code></pre><p>Predictions from an EvoTree model - sums the predictions from all trees composing the model. Use <code>ntree_limit=N</code> to only predict with the first <code>N</code> trees.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/predict.jl#L77-L82">source</a></section></article><h2 id="Features-Importance"><a class="docs-heading-anchor" href="#Features-Importance">Features Importance</a><a id="Features-Importance-1"></a><a class="docs-heading-anchor-permalink" href="#Features-Importance" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.importance" href="#EvoTrees.importance"><code>EvoTrees.importance</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">importance(model::EvoTree; fnames=model.info[:fnames])</code></pre><p>Sorted normalized feature importance based on loss function gain. 
Feature names associated to the model are stored in <code>model.info[:fnames]</code> as a string <code>Vector</code> and can be updated at any time. Eg: <code>model.info[:fnames] = new_fnames_vec</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/importance.jl#L9-L14">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../models/">« Models</a><a class="docs-footer-nextpage" href="../tutorials/regression-boston/">Regression - Boston »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+    verbosity=1)</code></pre><p>Main training function. Performs model fitting given configuration <code>params</code>, <code>x_train</code>, <code>y_train</code> and other optional kwargs. </p><p><strong>Arguments</strong></p><ul><li><code>params::EvoTypes</code>: configuration info providing hyper-parameters. <code>EvoTypes</code> can be one of: <ul><li><a href="../models/#EvoTrees.EvoTreeRegressor"><code>EvoTreeRegressor</code></a></li><li><a href="../models/#EvoTrees.EvoTreeClassifier"><code>EvoTreeClassifier</code></a></li><li><a href="../models/#EvoTrees.EvoTreeCount"><code>EvoTreeCount</code></a></li><li><a href="../models/#EvoTrees.EvoTreeMLE"><code>EvoTreeMLE</code></a></li></ul></li></ul><p><strong>Keyword arguments</strong></p><ul><li><code>x_train::Matrix</code>: training data of size <code>[#observations, #features]</code>. </li><li><code>y_train::Vector</code>: vector of train targets of length <code>#observations</code>.</li><li><code>w_train::Vector</code>: vector of train weights of length <code>#observations</code>. If <code>nothing</code>, a vector of ones is assumed.</li><li><code>offset_train::VecOrMat</code>: offset for the training data. Should match the size of the predictions.</li><li><code>x_eval::Matrix</code>: evaluation data of size <code>[#observations, #features]</code>. </li><li><code>y_eval::Vector</code>: vector of evaluation targets of length <code>#observations</code>.</li><li><code>w_eval::Vector</code>: vector of evaluation weights of length <code>#observations</code>. Defaults to <code>nothing</code> (assumes a vector of 1s).</li><li><code>offset_eval::VecOrMat</code>: evaluation data offset. Should match the size of the predictions.</li><li><code>metric</code>: The evaluation metric that will be tracked on <code>x_eval</code>, <code>y_eval</code> and optionally <code>w_eval</code> / <code>offset_eval</code> data.    Supported metrics are: <ul><li><code>:mse</code>: mean-squared error. 
Adapted for general regression models.</li><li><code>:rmse</code>: root-mean-squared error (CPU only). Adapted for general regression models.</li><li><code>:mae</code>: mean absolute error. Adapted for general regression models.</li><li><code>:logloss</code>: Adapted for <code>:logistic</code> regression models.</li><li><code>:mlogloss</code>: Multi-class cross entropy. Adapted to <code>EvoTreeClassifier</code> classification models. </li><li><code>:poisson</code>: Poisson deviance. Adapted to <code>EvoTreeCount</code> count models.</li><li><code>:gamma</code>: Gamma deviance. Adapted to regression problems on Gamma-like, positively distributed targets.</li><li><code>:tweedie</code>: Tweedie deviance. Adapted to regression problems on Tweedie-like, positively distributed targets with probability mass at <code>y == 0</code>.</li><li><code>:gaussian_mle</code>: Gaussian maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :gaussian_mle</code>. </li><li><code>:logistic_mle</code>: Logistic maximum log-likelihood. Adapted to <code>EvoTreeMLE</code> models with <code>loss = :logistic_mle</code>. </li></ul></li><li><code>early_stopping_rounds::Integer</code>: number of consecutive rounds without metric improvement after which fitting is stopped. </li><li><code>print_every_n</code>: sets the frequency at which logging info is printed. </li><li><code>verbosity</code>: set to 1 to print logging info during training.</li><li><code>fnames</code>: the names of the <code>x_train</code> features. If provided, should be a vector of strings with <code>length(fnames) = size(x_train, 2)</code>.</li><li><code>return_logger::Bool = false</code>: if set to <code>true</code> (the default is <code>false</code>), <code>fit_evotree</code> returns a tuple <code>(m, logger)</code> where <code>logger</code> is a dict containing various tracking information.</li><li><code>device=&quot;cpu&quot;</code>: Hardware device to use for computations. 
Can be either <code>&quot;cpu&quot;</code> or <code>&quot;gpu&quot;</code>. The following losses are not currently supported on GPU: <code>:l1</code>, <code>:quantile</code>, <code>:logistic_mle</code>.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/fit.jl#L414-L461">source</a></section></article><h2 id="Predict"><a class="docs-heading-anchor" href="#Predict">Predict</a><a id="Predict-1"></a><a class="docs-heading-anchor-permalink" href="#Predict" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="MLJModelInterface.predict" href="#MLJModelInterface.predict"><code>MLJModelInterface.predict</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">predict(model::EvoTree, X::AbstractMatrix; ntree_limit = length(model.trees))</code></pre><p>Predictions from an EvoTree model - sums the predictions from all trees composing the model. Use <code>ntree_limit=N</code> to only predict with the first <code>N</code> trees.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/predict.jl#L77-L82">source</a></section></article><h2 id="Features-Importance"><a class="docs-heading-anchor" href="#Features-Importance">Features Importance</a><a id="Features-Importance-1"></a><a class="docs-heading-anchor-permalink" href="#Features-Importance" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.importance" href="#EvoTrees.importance"><code>EvoTrees.importance</code></a> — <span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">importance(model::EvoTree; fnames=model.info[:fnames])</code></pre><p>Sorted normalized feature importance based on loss function gain. 
Feature names associated to the model are stored in <code>model.info[:fnames]</code> as a string <code>Vector</code> and can be updated at any time. Eg: <code>model.info[:fnames] = new_fnames_vec</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/importance.jl#L9-L14">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../models/">« Models</a><a class="docs-footer-nextpage" href="../tutorials/regression-boston/">Regression - Boston »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/index.html b/dev/index.html
index 16b2fffa..17b3933d 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -20,4 +20,4 @@
 m2 = fit_evotree(config, df; target_name=&quot;y&quot;);</code></pre><p>However, the following <code>m1</code> and <code>m2</code> models won&#39;t be because the there&#39;s stochasticity involved in the model from <code>rowsample</code> and the random generator in the <code>config</code> isn&#39;t reset between the fits:</p><pre><code class="language-julia hljs">config = EvoTreeRegressor(rowsample=0.5, rng=123)
 m1 = fit_evotree(config, df; target_name=&quot;y&quot;);
 m2 = fit_evotree(config, df; target_name=&quot;y&quot;);</code></pre><p>Note that in presence of multiple identical or very highly correlated features, model may not be reproducible if features are permuted since in situation where 2 features provide identical gains, the first one will be selected. Therefore, if the identity relationship doesn&#39;t hold on new data, different predictions will be returned from models trained on different features order. </p><p>At the moment, there&#39;s no reproducibility guarantee on GPU, although this may change in the future. </p><h2 id="Save/Load"><a class="docs-heading-anchor" href="#Save/Load">Save/Load</a><a id="Save/Load-1"></a><a class="docs-heading-anchor-permalink" href="#Save/Load" title="Permalink"></a></h2><pre><code class="language-julia hljs">EvoTrees.save(m, &quot;data/model.bson&quot;)
-m = EvoTrees.load(&quot;data/model.bson&quot;);</code></pre></article><nav class="docs-footer"><a class="docs-footer-nextpage" href="models/">Models »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+m = EvoTrees.load(&quot;data/model.bson&quot;);</code></pre></article><nav class="docs-footer"><a class="docs-footer-nextpage" href="models/">Models »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/models/index.html b/dev/models/index.html
index 84ceaf1a..5f3ad24b 100644
--- a/dev/models/index.html
+++ b/dev/models/index.html
@@ -11,7 +11,7 @@
 model = EvoTreeRegressor(max_depth=5, nbins=32, nrounds=100)
 X, y = @load_boston
 mach = machine(model, X, y) |&gt; fit!
-preds = predict(mach, X)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/MLJ.jl#L164-L290">source</a></section></article><h2 id="EvoTreeClassifier"><a class="docs-heading-anchor" href="#EvoTreeClassifier">EvoTreeClassifier</a><a id="EvoTreeClassifier-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeClassifier" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeClassifier" href="#EvoTrees.EvoTreeClassifier"><code>EvoTrees.EvoTreeClassifier</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeClassifier(;kwargs...)</p><p>A model type for constructing a EvoTreeClassifier, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeClassifier is used to perform multi-class classification, using cross-entropy loss.</p><p><strong>Hyper-parameters</strong></p><ul><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree raw predictions are scaled by <code>eta</code> prior to be added to the stack of predictions. Must be &gt; 0. A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </li><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. 
A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to <code>2^max_depth</code>. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=1.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>tree_type=&quot;binary&quot;</code>    Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Tree are built depthwise until max depth is reach or if min weight or gain (see <code>gamma</code>) stops further node splits.  </li><li><code>oblivious</code>:    A common splitting condition is imposed to all nodes of a given depth. </li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeClassifier()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeClassifier(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, K]</code> where <code>K</code> is the number of classes:</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeClassifier()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeClassifier(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with</p><pre><code class="nohighlight hljs">mach = machine(model, X, y)</code></pre><p>where</p><ul><li><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></li><li><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Multiclass</code> or <code>&lt;:OrderedFactor</code>; check the scitype with <code>scitype(y)</code></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><p><code>predict(mach, Xnew)</code>: return predictions of the target given features <code>Xnew</code> having the same scitype as <code>X</code> above. Predictions are probabilistic.</p></li><li><p><code>predict_mode(mach, Xnew)</code>: return the mode of each of the predictions above.</p></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
+preds = predict(mach, X)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/MLJ.jl#L164-L290">source</a></section></article><h2 id="EvoTreeClassifier"><a class="docs-heading-anchor" href="#EvoTreeClassifier">EvoTreeClassifier</a><a id="EvoTreeClassifier-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeClassifier" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeClassifier" href="#EvoTrees.EvoTreeClassifier"><code>EvoTrees.EvoTreeClassifier</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeClassifier(;kwargs...)</p><p>A model type for constructing a EvoTreeClassifier, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeClassifier is used to perform multi-class classification, using cross-entropy loss.</p><p><strong>Hyper-parameters</strong></p><ul><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree raw predictions are scaled by <code>eta</code> prior to be added to the stack of predictions. Must be &gt; 0. A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </li><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. 
A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to <code>2^max_depth</code>. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=1.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>tree_type=&quot;binary&quot;</code>:    Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  </li><li><code>oblivious</code>:    A common splitting condition is imposed on all nodes of a given depth. </li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeClassifier()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeClassifier(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, K]</code> where <code>K</code> is the number of classes:</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeClassifier()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeClassifier(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with</p><pre><code class="nohighlight hljs">mach = machine(model, X, y)</code></pre><p>where</p><ul><li><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></li><li><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Multiclass</code> or <code>&lt;:OrderedFactor</code>; check the scitype with <code>scitype(y)</code></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><p><code>predict(mach, Xnew)</code>: return predictions of the target given features <code>Xnew</code> having the same scitype as <code>X</code> above. Predictions are probabilistic.</p></li><li><p><code>predict_mode(mach, Xnew)</code>: return the mode of each of the predictions above.</p></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
 using EvoTrees
 config = EvoTreeClassifier(max_depth=5, nbins=32, nrounds=100)
 nobs, nfeats = 1_000, 5
@@ -24,7 +24,7 @@
 X, y = @load_iris
 mach = machine(model, X, y) |&gt; fit!
 preds = predict(mach, X)
-preds = predict_mode(mach, X)</code></pre><p>See also <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/MLJ.jl#L294-L415">source</a></section></article><h2 id="EvoTreeCount"><a class="docs-heading-anchor" href="#EvoTreeCount">EvoTreeCount</a><a id="EvoTreeCount-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeCount" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeCount" href="#EvoTrees.EvoTreeCount"><code>EvoTrees.EvoTreeCount</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeCount(;kwargs...)</p><p>A model type for constructing a EvoTreeCount, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeCount is used to perform Poisson probabilistic regression on count target.</p><p><strong>Hyper-parameters</strong></p><ul><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree raw predictions are scaled by <code>eta</code> prior to be added to the stack of predictions. Must be &gt; 0. A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </li><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. 
A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=1.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>monotone_constraints=Dict{Int, Int}()</code>: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).</li><li><code>tree_type=&quot;binary&quot;</code>:    Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  </li><li><code>oblivious</code>:    A common splitting condition is imposed on all nodes of a given depth. </li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeCount()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeCount(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, 1]</code>:</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeCount = @load EvoTreeCount pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeCount()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeCount(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with <code>mach = machine(model, X, y)</code>, where</p><ul><li><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></li><li><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Count</code>; check the scitype with <code>scitype(y)</code></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><code>predict(mach, Xnew)</code>: returns a vector of Poisson distributions given features <code>Xnew</code> having the same scitype as <code>X</code> above. Predictions are probabilistic.</li></ul><p>Specific metrics can also be predicted using:</p><ul><li><code>predict_mean(mach, Xnew)</code></li><li><code>predict_mode(mach, Xnew)</code></li><li><code>predict_median(mach, Xnew)</code></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
+preds = predict_mode(mach, X)</code></pre><p>See also <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/MLJ.jl#L294-L415">source</a></section></article><h2 id="EvoTreeCount"><a class="docs-heading-anchor" href="#EvoTreeCount">EvoTreeCount</a><a id="EvoTreeCount-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeCount" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeCount" href="#EvoTrees.EvoTreeCount"><code>EvoTrees.EvoTreeCount</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeCount(;kwargs...)</p><p>A model type for constructing a EvoTreeCount, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeCount is used to perform Poisson probabilistic regression on count target.</p><p><strong>Hyper-parameters</strong></p><ul><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree raw predictions are scaled by <code>eta</code> prior to be added to the stack of predictions. Must be &gt; 0. A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </li><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. 
A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=1.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>monotone_constraints=Dict{Int, Int}()</code>: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).</li><li><code>tree_type=&quot;binary&quot;</code>:    Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  </li><li><code>oblivious</code>:    A common splitting condition is imposed on all nodes of a given depth. </li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeCount()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeCount(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, 1]</code>:</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeCount = @load EvoTreeCount pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeCount()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeCount(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with <code>mach = machine(model, X, y)</code>, where</p><ul><li><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></li><li><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Count</code>; check the scitype with <code>scitype(y)</code></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><code>predict(mach, Xnew)</code>: returns a vector of Poisson distributions given features <code>Xnew</code> having the same scitype as <code>X</code> above. Predictions are probabilistic.</li></ul><p>Specific metrics can also be predicted using:</p><ul><li><code>predict_mean(mach, Xnew)</code></li><li><code>predict_mode(mach, Xnew)</code></li><li><code>predict_median(mach, Xnew)</code></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
 using EvoTrees
 config = EvoTreeCount(max_depth=5, nbins=32, nrounds=100)
 nobs, nfeats = 1_000, 5
@@ -40,7 +40,7 @@
 preds = predict_mean(mach, X)
 preds = predict_mode(mach, X)
 preds = predict_median(mach, X)
-</code></pre><p>See also <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/MLJ.jl#L418-L544">source</a></section></article><h2 id="EvoTreeMLE"><a class="docs-heading-anchor" href="#EvoTreeMLE">EvoTreeMLE</a><a id="EvoTreeMLE-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeMLE" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeMLE" href="#EvoTrees.EvoTreeMLE"><code>EvoTrees.EvoTreeMLE</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeMLE(;kwargs...)</p><p>A model type for constructing a EvoTreeMLE, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeMLE performs maximum likelihood estimation. Assumed distribution is specified through <code>loss</code> kwargs. Both Gaussian and Logistic distributions are supported.</p><p><strong>Hyper-parameters</strong></p><p><code>loss=:gaussian</code>:         Loss to be minimized during training. One of:</p><ul><li><code>:gaussian</code> / <code>:gaussian_mle</code></li><li><code>:logistic</code> / <code>:logistic_mle</code></li><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree raw predictions are scaled by <code>eta</code> prior to be added to the stack of predictions. Must be &gt; 0.</li></ul><p>A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </p><ul><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. 
Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=8.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>monotone_constraints=Dict{Int, Int}()</code>: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for MLE regression, constraints may not be enforced systematically.</li><li><code>tree_type=&quot;binary&quot;</code>:          Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  
</li><li><code>oblivious</code>:    A common splitting condition is imposed to all nodes of a given depth. </li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeMLE()</code> to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeMLE(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, nparams]</code> where the second dimensions refer to <code>μ</code> &amp; <code>σ</code> for Normal/Gaussian and <code>μ</code> &amp; <code>s</code> for Logistic.</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeMLE()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeMLE(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with</p><pre><code class="nohighlight hljs">mach = machine(model, X, y)</code></pre><p>where</p><ul><li><p><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></p></li><li><p><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Continuous</code>; check the scitype with <code>scitype(y)</code></p></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><code>predict(mach, Xnew)</code>: returns a vector of Gaussian or Logistic distributions (according to provided <code>loss</code>) given features <code>Xnew</code> having the same scitype as <code>X</code> above.</li></ul><p>Predictions are probabilistic.</p><p>Specific metrics can also be predicted using:</p><ul><li><code>predict_mean(mach, Xnew)</code></li><li><code>predict_mode(mach, Xnew)</code></li><li><code>predict_median(mach, Xnew)</code></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
+</code></pre><p>See also <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/MLJ.jl#L418-L544">source</a></section></article><h2 id="EvoTreeMLE"><a class="docs-heading-anchor" href="#EvoTreeMLE">EvoTreeMLE</a><a id="EvoTreeMLE-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeMLE" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeMLE" href="#EvoTrees.EvoTreeMLE"><code>EvoTrees.EvoTreeMLE</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeMLE(;kwargs...)</p><p>A model type for constructing a EvoTreeMLE, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeMLE performs maximum likelihood estimation. Assumed distribution is specified through <code>loss</code> kwargs. Both Gaussian and Logistic distributions are supported.</p><p><strong>Hyper-parameters</strong></p><p><code>loss=:gaussian</code>:         Loss to be minimized during training. One of:</p><ul><li><code>:gaussian</code> / <code>:gaussian_mle</code></li><li><code>:logistic</code> / <code>:logistic_mle</code></li><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree raw predictions are scaled by <code>eta</code> prior to be added to the stack of predictions. Must be &gt; 0.</li></ul><p>A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </p><ul><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. 
Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=8.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>monotone_constraints=Dict{Int, Int}()</code>: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for MLE regression, constraints may not be enforced systematically.</li><li><code>tree_type=&quot;binary&quot;</code>:          Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  
</li><li><code>oblivious</code>:    A common splitting condition is imposed to all nodes of a given depth. </li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeMLE()</code> to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeMLE(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, nparams]</code> where the second dimensions refer to <code>μ</code> &amp; <code>σ</code> for Normal/Gaussian and <code>μ</code> &amp; <code>s</code> for Logistic.</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeMLE()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeMLE(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with</p><pre><code class="nohighlight hljs">mach = machine(model, X, y)</code></pre><p>where</p><ul><li><p><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></p></li><li><p><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Continuous</code>; check the scitype with <code>scitype(y)</code></p></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><code>predict(mach, Xnew)</code>: returns a vector of Gaussian or Logistic distributions (according to provided <code>loss</code>) given features <code>Xnew</code> having the same scitype as <code>X</code> above.</li></ul><p>Predictions are probabilistic.</p><p>Specific metrics can also be predicted using:</p><ul><li><code>predict_mean(mach, Xnew)</code></li><li><code>predict_mode(mach, Xnew)</code></li><li><code>predict_median(mach, Xnew)</code></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
 using EvoTrees
 config = EvoTreeMLE(max_depth=5, nbins=32, nrounds=100)
 nobs, nfeats = 1_000, 5
@@ -55,7 +55,7 @@
 preds = predict(mach, X)
 preds = predict_mean(mach, X)
 preds = predict_mode(mach, X)
-preds = predict_median(mach, X)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/MLJ.jl#L681-L813">source</a></section></article><h2 id="EvoTreeGaussian"><a class="docs-heading-anchor" href="#EvoTreeGaussian">EvoTreeGaussian</a><a id="EvoTreeGaussian-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeGaussian" title="Permalink"></a></h2><p><code>EvoTreeGaussian</code> is to be deprecated. Please use EvoTreeMLE with <code>loss = :gaussian_mle</code>. </p><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeGaussian" href="#EvoTrees.EvoTreeGaussian"><code>EvoTrees.EvoTreeGaussian</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeGaussian(;kwargs...)</p><p>A model type for constructing an EvoTreeGaussian, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ and σ parameters to maximize likelihood.</p><p><strong>Hyper-parameters</strong></p><ul><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree's raw predictions are scaled by <code>eta</code> prior to being added to the stack of predictions. Must be &gt; 0. A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </li><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. 
Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=8.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>monotone_constraints=Dict{Int, Int}()</code>: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for Gaussian regression, constraints may not be enforced systematically.</li><li><code>tree_type=&quot;binary&quot;</code>    Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  </li><li><code>oblivious</code>:    A common splitting condition is imposed to all nodes of a given depth. 
</li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeGaussian()</code> to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeGaussian(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, 2]</code> where the second dimensions refer to <code>μ</code> and <code>σ</code> respectively:</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeGaussian()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeGaussian(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with</p><pre><code class="nohighlight hljs">mach = machine(model, X, y)</code></pre><p>where</p><ul><li><p><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></p></li><li><p><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Continuous</code>; check the scitype with <code>scitype(y)</code></p></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><code>predict(mach, Xnew)</code>: returns a vector of Gaussian distributions given features <code>Xnew</code> having the same scitype as <code>X</code> above.</li></ul><p>Predictions are probabilistic.</p><p>Specific metrics can also be predicted using:</p><ul><li><code>predict_mean(mach, Xnew)</code></li><li><code>predict_mode(mach, Xnew)</code></li><li><code>predict_median(mach, Xnew)</code></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
+preds = predict_median(mach, X)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/MLJ.jl#L681-L813">source</a></section></article><h2 id="EvoTreeGaussian"><a class="docs-heading-anchor" href="#EvoTreeGaussian">EvoTreeGaussian</a><a id="EvoTreeGaussian-1"></a><a class="docs-heading-anchor-permalink" href="#EvoTreeGaussian" title="Permalink"></a></h2><p><code>EvoTreeGaussian</code> is to be deprecated. Please use EvoTreeMLE with <code>loss = :gaussian_mle</code>. </p><article class="docstring"><header><a class="docstring-binding" id="EvoTrees.EvoTreeGaussian" href="#EvoTrees.EvoTreeGaussian"><code>EvoTrees.EvoTreeGaussian</code></a> — <span class="docstring-category">Type</span></header><section><div><p>EvoTreeGaussian(;kwargs...)</p><p>A model type for constructing an EvoTreeGaussian, based on <a href="https://github.com/Evovest/EvoTrees.jl">EvoTrees.jl</a>, and implementing both an internal API and the MLJ model interface. EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ and σ parameters to maximize likelihood.</p><p><strong>Hyper-parameters</strong></p><ul><li><code>nrounds=10</code>:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be &gt;= 1.</li><li><code>eta=0.1</code>:              Learning rate. Each tree's raw predictions are scaled by <code>eta</code> prior to being added to the stack of predictions. Must be &gt; 0. A lower <code>eta</code> results in slower learning, requiring a higher <code>nrounds</code> but typically improves model performance.  </li><li><code>lambda::T=0.0</code>:              L2 regularization term on weights. Must be &gt;= 0. Higher lambda can result in a more robust model.</li><li><code>gamma::T=0.0</code>:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. 
Must be &gt;= 0.</li><li><code>max_depth=5</code>:                Maximum depth of a tree. Must be &gt;= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains <code>2^(N - 1)</code> terminal leaves and <code>2^(N - 1) - 1</code> split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.</li><li><code>min_weight=8.0</code>:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the <code>weights</code> vector. Must be &gt; 0.</li><li><code>rowsample=1.0</code>:              Proportion of rows that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>colsample=1.0</code>:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in <code>]0, 1]</code>.</li><li><code>nbins=32</code>:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.</li><li><code>monotone_constraints=Dict{Int, Int}()</code>: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for Gaussian regression, constraints may not be enforced systematically.</li><li><code>tree_type=&quot;binary&quot;</code>    Tree structure to be used. One of:<ul><li><code>binary</code>:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see <code>gamma</code>) stops further node splits.  </li><li><code>oblivious</code>:    A common splitting condition is imposed to all nodes of a given depth. 
</li></ul></li><li><code>rng=123</code>:                    Either an integer used as a seed to the random number generator or an actual random number generator (<code>::Random.AbstractRNG</code>).</li></ul><p><strong>Internal API</strong></p><p>Do <code>config = EvoTreeGaussian()</code> to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeGaussian(max_depth=...).</p><p><strong>Training model</strong></p><p>A model is built using <a href="../api/#EvoTrees.fit_evotree"><code>fit_evotree</code></a>:</p><pre><code class="language-julia hljs">model = fit_evotree(config; x_train, y_train, kwargs...)</code></pre><p><strong>Inference</strong></p><p>Predictions are obtained using <a href="../api/#MLJModelInterface.predict"><code>predict</code></a> which returns a <code>Matrix</code> of size <code>[nobs, 2]</code> where the second dimensions refer to <code>μ</code> and <code>σ</code> respectively:</p><pre><code class="language-julia hljs">EvoTrees.predict(model, X)</code></pre><p>Alternatively, models act as a functor, returning predictions when called as a function with features as argument:</p><pre><code class="language-julia hljs">model(X)</code></pre><p><strong>MLJ</strong></p><p>From MLJ, the type can be imported using:</p><pre><code class="language-julia hljs">EvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees</code></pre><p>Do <code>model = EvoTreeGaussian()</code> to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in <code>EvoTreeGaussian(loss=...)</code>.</p><p><strong>Training data</strong></p><p>In MLJ or MLJBase, bind an instance <code>model</code> to data with</p><pre><code class="nohighlight hljs">mach = machine(model, X, y)</code></pre><p>where</p><ul><li><p><code>X</code>: any table of input features (eg, a <code>DataFrame</code>) whose columns each have one of the following element scitypes: <code>Continuous</code>, <code>Count</code>, or <code>&lt;:OrderedFactor</code>; check column scitypes with <code>schema(X)</code></p></li><li><p><code>y</code>: is the target, which can be any <code>AbstractVector</code> whose element scitype is <code>&lt;:Continuous</code>; check the scitype with <code>scitype(y)</code></p></li></ul><p>Train the machine using <code>fit!(mach, rows=...)</code>.</p><p><strong>Operations</strong></p><ul><li><code>predict(mach, Xnew)</code>: returns a vector of Gaussian distributions given features <code>Xnew</code> having the same scitype as <code>X</code> above.</li></ul><p>Predictions are probabilistic.</p><p>Specific metrics can also be predicted using:</p><ul><li><code>predict_mean(mach, Xnew)</code></li><li><code>predict_mode(mach, Xnew)</code></li><li><code>predict_median(mach, Xnew)</code></li></ul><p><strong>Fitted parameters</strong></p><p>The fields of <code>fitted_params(mach)</code> are:</p><ul><li><code>:fitresult</code>: The <code>GBTree</code> object returned by EvoTrees.jl fitting algorithm.</li></ul><p><strong>Report</strong></p><p>The fields of <code>report(mach)</code> are:</p><ul><li><code>:features</code>: The names of the features encountered in training.</li></ul><p><strong>Examples</strong></p><pre><code class="nohighlight hljs"># Internal API
 using EvoTrees
 params = EvoTreeGaussian(max_depth=5, nbins=32, nrounds=100)
 nobs, nfeats = 1_000, 5
@@ -70,4 +70,4 @@
 preds = predict(mach, X)
 preds = predict_mean(mach, X)
 preds = predict_mode(mach, X)
-preds = predict_median(mach, X)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/dd0a4430da63c293fbb04fce58da71ddfaf406ff/src/MLJ.jl#L547-L676">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../">« Introduction</a><a class="docs-footer-nextpage" href="../api/">API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+preds = predict_median(mach, X)</code></pre></div><a class="docs-sourcelink" target="_blank" href="https://github.com/Evovest/EvoTrees.jl/blob/7f70bafae65cdc06117c37f80b74b19ed4bc54e5/src/MLJ.jl#L547-L676">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../">« Introduction</a><a class="docs-footer-nextpage" href="../api/">API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/search/index.html b/dev/search/index.html
index 6a778fe7..801f58de 100644
--- a/dev/search/index.html
+++ b/dev/search/index.html
@@ -1,2 +1,2 @@
 <!DOCTYPE html>
-<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>Search · EvoTrees.jl</title><script data-outdated-warner src="../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.045/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.13.24/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script src="../assets/themeswap.js"></script><link href="../assets/style.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../"><img src="../assets/logo.png" alt="EvoTrees.jl logo"/></a><form class="docs-search" action><input class="docs-search-query" id="documenter-search-query" name="q" type="text" placeholder="Search docs"/></form><ul class="docs-menu"><li><a class="tocitem" 
href="../">Introduction</a></li><li><a class="tocitem" href="../models/">Models</a></li><li><a class="tocitem" href="../api/">API</a></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../tutorials/regression-boston/">Regression - Boston</a></li><li><a class="tocitem" href="../tutorials/logistic-regression-titanic/">Logistic Regression - Titanic</a></li><li><a class="tocitem" href="../tutorials/classification-iris/">Classification - IRIS</a></li><li><a class="tocitem" href="../tutorials/ranking-LTRC/">Ranking - Yahoo! LTRC</a></li><li><a class="tocitem" href="../tutorials/examples-API/">Internal API</a></li><li><a class="tocitem" href="../tutorials/examples-MLJ/">MLJ API</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Search</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Search</a></li></ul></nav><div class="docs-right"><a class="docs-settings-button fas fa-cog" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-sidebar-button fa fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a></div></header><article><p id="documenter-search-info">Loading search...</p><ul id="documenter-search-results"></ul></article><nav class="docs-footer"><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p 
class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><script src="../search_index.js"></script><script src="../assets/search.js"></script></html>
+<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>Search · EvoTrees.jl</title><script data-outdated-warner src="../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.045/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.13.24/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL=".."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../assets/documenter.js"></script><script src="../siteinfo.js"></script><script src="../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script src="../assets/themeswap.js"></script><link href="../assets/style.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../"><img src="../assets/logo.png" alt="EvoTrees.jl logo"/></a><form class="docs-search" action><input class="docs-search-query" id="documenter-search-query" name="q" type="text" placeholder="Search docs"/></form><ul class="docs-menu"><li><a class="tocitem" 
href="../">Introduction</a></li><li><a class="tocitem" href="../models/">Models</a></li><li><a class="tocitem" href="../api/">API</a></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../tutorials/regression-boston/">Regression - Boston</a></li><li><a class="tocitem" href="../tutorials/logistic-regression-titanic/">Logistic Regression - Titanic</a></li><li><a class="tocitem" href="../tutorials/classification-iris/">Classification - IRIS</a></li><li><a class="tocitem" href="../tutorials/ranking-LTRC/">Ranking - Yahoo! LTRC</a></li><li><a class="tocitem" href="../tutorials/examples-API/">Internal API</a></li><li><a class="tocitem" href="../tutorials/examples-MLJ/">MLJ API</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><nav class="breadcrumb"><ul class="is-hidden-mobile"><li class="is-active"><a href>Search</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Search</a></li></ul></nav><div class="docs-right"><a class="docs-settings-button fas fa-cog" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-sidebar-button fa fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a></div></header><article><p id="documenter-search-info">Loading search...</p><ul id="documenter-search-results"></ul></article><nav class="docs-footer"><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p 
class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body><script src="../search_index.js"></script><script src="../assets/search.js"></script></html>
diff --git a/dev/search_index.js b/dev/search_index.js
index 4d3d0a9e..c8a56cb0 100644
--- a/dev/search_index.js
+++ b/dev/search_index.js
@@ -1,3 +1,3 @@
 var documenterSearchIndex = {"docs":
-[{"location":"tutorials/ranking-LTRC/#Ranking-with-Yahoo!-Learning-to-Rank-Challenge.","page":"Ranking - Yahoo! LTRC","title":"Ranking with Yahoo! Learning to Rank Challenge.","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"In this tutorial, we we walk through how a ranking task can be tackled using regular regression techniques without compromise on performance compared to specialized ranking learners.  The data used is from the C14 - Yahoo! Learning to Rank Challenge, which can be obtained following a request to https://webscope.sandbox.yahoo.com.","category":"page"},{"location":"tutorials/ranking-LTRC/#Getting-started","page":"Ranking - Yahoo! LTRC","title":"Getting started","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"To begin, we load the required packages:","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"using EvoTrees\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random","category":"page"},{"location":"tutorials/ranking-LTRC/#Load-LIBSVM-format-data","page":"Ranking - Yahoo! LTRC","title":"Load LIBSVM format data","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Some datasets come in the so called LIBSVM format, which stores data using a sparse representation: ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"<label> <query> <feature_id_1>:<feature_value_1> <feature_id_2>:<feature_value_2>","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"We use the ReadLIBSVM.jl package to perform parsing: ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"using ReadLIBSVM\ndtrain = read_libsvm(\"set1.train.txt\"; has_query=true)\ndeval = read_libsvm(\"set1.valid.txt\"; has_query=true)\ndtest = read_libsvm(\"set1.test.txt\"; has_query=true)","category":"page"},{"location":"tutorials/ranking-LTRC/#Preprocessing","page":"Ranking - Yahoo! LTRC","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Preprocessing is minimal since all features are parsed as floats and specific files are provided for each of the train, eval and test splits. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Several features are fully missing (contain only 0s) in the training dataset. They are removed from all datasets since they cannot bring value to the model.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Then, the features, targets and query ids are extracted from the parsed LIBSVM format. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"colsums_train = map(sum, eachcol(dtrain[:x]))\ndrop_cols = colsums_train .== 0\n\nx_train = dtrain[:x][:, .!drop_cols]\nx_eval = deval[:x][:, .!drop_cols]\nx_test = dtest[:x][:, .!drop_cols]\n\n# assign queries\nq_train = dtrain[:q]\nq_eval = deval[:q]\nq_test = dtest[:q]\n\n# assign targets\ny_train = dtrain[:y]\ny_eval = deval[:y]\ny_test = dtest[:y]","category":"page"},{"location":"tutorials/ranking-LTRC/#Training","page":"Ranking - Yahoo! LTRC","title":"Training","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! 
LTRC","title":"Ranking - Yahoo! LTRC","text":"Now we are ready to train our model. We first define a model configuration using the EvoTreeRegressor model constructor.  Then, we use fit_evotree to train a boosted tree model. The optional x_eval and y_eval arguments are provided to enable the usage of early stopping. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"config = EvoTreeRegressor(\n    nrounds=6000,\n    loss=:mse,\n    eta=0.02,\n    nbins=64,\n    max_depth=11,\n    rowsample=0.9,\n    colsample=0.9,\n)\n\nm_mse, logger_mse = fit_evotree(\n    config;\n    x_train=x_train,\n    y_train=y_train,\n    x_eval=x_eval,\n    y_eval=y_eval,\n    early_stopping_rounds=200,\n    print_every_n=50,\n    metric=:mse,\n    return_logger=true\n);\n\np_test = m_mse(x_test);","category":"page"},{"location":"tutorials/ranking-LTRC/#Model-evaluation","page":"Ranking - Yahoo! LTRC","title":"Model evaluation","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"For ranking problems, a commonly used metric is the Normalized Discounted Cumulative Gain. It essentially considers whether the model is good at identifying the top K outcomes within a group. There are various flavors to its implementation, though the most commonly used one is the following:","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"function ndcg(p, y, k=10)\n    k = min(k, length(p))\n    p_order = partialsortperm(p, 1:k, rev=true)\n    y_order = partialsortperm(y, 1:k, rev=true)\n    _y = y[p_order]\n    gains = 2 .^ _y .- 1\n    discounts = log2.((1:k) .+ 1)\n    ndcg = sum(gains ./ discounts)\n\n    y_order = partialsortperm(y, 1:k, rev=true)\n    _y = y[y_order]\n    gains = 2 .^ _y .- 1\n    discounts = log2.((1:k) .+ 1)\n    idcg = sum(gains ./ discounts)\n    return idcg == 0 ? 1.0 : ndcg / idcg\nend","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"To compute the NDCG over a collection of groups, it is handy to leverage DataFrames' combine and groupby functionalities: ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"test_df = DataFrame(p=p_test, y=y_test, q=q_test)\ntest_df_agg = combine(groupby(test_df, \"q\"), [\"p\", \"y\"] => ndcg => \"ndcg\")\nndcg_test = round(mean(test_df_agg.ndcg), sigdigits=5)\n@info \"ndcg_test MSE\" ndcg_test\n\n┌ Info: ndcg_test MSE\n└   ndcg_test = 0.8008","category":"page"},{"location":"tutorials/ranking-LTRC/#Logistic-regression-alternative","page":"Ranking - Yahoo! LTRC","title":"Logistic regression alternative","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"The above regression experiment shows a performance competitive with the results outlined in CatBoost's ranking benchmarks. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Another approach is to use a scaling of the the target ranking scores to perform a logistic regression.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"max_rank = 4\ny_train = dtrain[:y] ./ max_rank\ny_eval = deval[:y] ./ max_rank\ny_test = dtest[:y] ./ max_rank\n\nconfig = EvoTreeRegressor(\n    nrounds=6000,\n    loss=:logloss,\n    eta=0.01,\n    nbins=64,\n    max_depth=11,\n    rowsample=0.9,\n    colsample=0.9,\n)\n\nm_logloss, logger_logloss = fit_evotree(\n    config;\n    x_train=x_train,\n    y_train=y_train,\n    x_eval=x_eval,\n    y_eval=y_eval,\n    early_stopping_rounds=200,\n    print_every_n=50,\n    metric=:logloss,\n    return_logger=true\n);","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"To measure the NDCG, the original targets must be used since NDCG is a scale sensitive measure.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"y_train = dtrain[:y]\ny_eval = deval[:y]\ny_test = dtest[:y]\n\np_test = m_logloss(x_test);\ntest_df = DataFrame(p=p_test, y=y_test, q=q_test)\ntest_df_agg = combine(groupby(test_df, \"q\"), [\"p\", \"y\"] => ndcg => \"ndcg\")\nndcg_test = round(mean(test_df_agg.ndcg), sigdigits=5)\n@info \"ndcg_test LogLoss\" ndcg_test\n\n┌ Info: ndcg_test LogLoss\n└   ndcg_test = 0.80267","category":"page"},{"location":"tutorials/ranking-LTRC/#Conclusion","page":"Ranking - Yahoo! LTRC","title":"Conclusion","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"We've seen that a ranking problem can be efficiently handled with generic regression tasks, yet achieve comparable performance to specialized ranking loss functions. Below, we present the NDCG obtained from the above experiments along those presented by CatBoost's benchmarks.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"Model NDCG\nEvoTrees - mse 0.80080\nEvoTrees - logistic 0.80267\ncat-rmse 0.802115\ncat-query-rmse 0.802229\ncat-pair-logit 0.797318\ncat-pair-logit-pairwise 0.790396\ncat-yeti-rank 0.802972\nxgb-rmse 0.798892\nxgb-pairwise 0.800048\nxgb-lambdamart-ndcg 0.800048\nlgb-rmse 0.8013675\nlgb-pairwise 0.801347","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"It should be noted that the later results were not reproduced in the scope of current tutorial, so one should be careful about any claim of model superiority. The results from CatBoost's benchmarks were however already indicative of strong performance of non-specialized ranking loss functions, to which this tutorial brings further support. ","category":"page"},{"location":"api/#fit_evotree","page":"API","title":"fit_evotree","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"fit_evotree","category":"page"},{"location":"api/#EvoTrees.fit_evotree","page":"API","title":"EvoTrees.fit_evotree","text":"fit_evotree(\n    params::EvoTypes{L}, \n    dtrain;\n    target_name,\n    fnames=nothing,\n    w_name=nothing,\n    offset_name=nothing,\n    deval=nothing,\n    metric=nothing,\n    early_stopping_rounds=9999,\n    print_every_n=9999,\n    verbosity=1,\n    return_logger=false,\n    device=\"cpu\")\n\nMain training function. Performs model fitting given configuration params, dtrain, target_name and other optional kwargs. \n\nArguments\n\nparams::EvoTypes: configuration info providing hyper-paramters. EvoTypes can be one of: \nEvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE\ndtrain: A Tables compatible training data (named tuples, DataFrame...) containing features and target variables. \n\nKeyword arguments\n\ntarget_name: name of target variable. \nfnames = nothing: the names of the x_train features. 
If provided, should be a vector of string with length(fnames) = size(x_train, 2).\nw_name = nothing: name of the variable containing weights. If nothing, common weights on one will be used.\noffset_name = nothing: name of the offset variable.\ndeval: A Tables compatible evaluation data containing features and target variables. \nmetric: The evaluation metric that wil be tracked on deval.    Supported metrics are: \n:mse: mean-squared error. Adapted for general regression models.\n:rmse: root-mean-squared error (CPU only). Adapted for general regression models.\n:mae: mean absolute error. Adapted for general regression models.\n:logloss: Adapted for :logistic regression models.\n:mlogloss: Multi-class cross entropy. Adapted to EvoTreeClassifier classification models. \n:poisson: Poisson deviance. Adapted to EvoTreeCount count models.\n:gamma: Gamma deviance. Adapted to regression problem on Gamma like, positively distributed targets.\n:tweedie: Tweedie deviance. Adapted to regression problem on Tweedie like, positively distributed targets with probability mass at y == 0.\n:gaussian_mle: Gaussian maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :gaussian_mle. \n:logistic_mle: Logistic maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :logistic_mle. \nearly_stopping_rounds::Integer: number of consecutive rounds without metric improvement after which fitting in stopped. \nprint_every_n: sets at which frequency logging info should be printed. \nverbosity: set to 1 to print logging info during training.\nreturn_logger::Bool = false: if set to true (default), fit_evotree return a tuple (m, logger) where logger is a dict containing various tracking information.\ndevice=\"cpu\": Hardware device to use for computations. Can be either \"cpu\" or \"gpu\". 
Following losses are not GPU supported at the moment:l1, :quantile, :logistic_mle.\n\n\n\n\n\nfit_evotree(params::EvoTypes{L};\n    x_train::AbstractMatrix, y_train::AbstractVector, w_train=nothing, offset_train=nothing,\n    x_eval=nothing, y_eval=nothing, w_eval=nothing, offset_eval=nothing,\n    early_stopping_rounds=9999,\n    print_every_n=9999,\n    verbosity=1)\n\nMain training function. Performs model fitting given configuration params, x_train, y_train and other optional kwargs. \n\nArguments\n\nparams::EvoTypes: configuration info providing hyper-paramters. EvoTypes can be one of: \nEvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE\n\nKeyword arguments\n\nx_train::Matrix: training data of size [#observations, #features]. \ny_train::Vector: vector of train targets of length #observations.\nw_train::Vector: vector of train weights of length #observations. If nothing, a vector of ones is assumed.\noffset_train::VecOrMat: offset for the training data. Should match the size of the predictions.\nx_eval::Matrix: evaluation data of size [#observations, #features]. \ny_eval::Vector: vector of evaluation targets of length #observations.\nw_eval::Vector: vector of evaluation weights of length #observations. Defaults to nothing (assumes a vector of 1s).\noffset_eval::VecOrMat: evaluation data offset. Should match the size of the predictions.\nmetric: The evaluation metric that wil be tracked on x_eval, y_eval and optionally w_eval / offset_eval data.    Supported metrics are: \n:mse: mean-squared error. Adapted for general regression models.\n:rmse: root-mean-squared error (CPU only). Adapted for general regression models.\n:mae: mean absolute error. Adapted for general regression models.\n:logloss: Adapted for :logistic regression models.\n:mlogloss: Multi-class cross entropy. Adapted to EvoTreeClassifier classification models. \n:poisson: Poisson deviance. Adapted to EvoTreeCount count models.\n:gamma: Gamma deviance. 
Adapted to regression problem on Gamma like, positively distributed targets.\n:tweedie: Tweedie deviance. Adapted to regression problem on Tweedie like, positively distributed targets with probability mass at y == 0.\n:gaussian_mle: Gaussian maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :gaussian_mle. \n:logistic_mle: Logistic maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :logistic_mle. \nearly_stopping_rounds::Integer: number of consecutive rounds without metric improvement after which fitting in stopped. \nprint_every_n: sets at which frequency logging info should be printed. \nverbosity: set to 1 to print logging info during training.\nfnames: the names of the x_train features. If provided, should be a vector of string with length(fnames) = size(x_train, 2).\nreturn_logger::Bool = false: if set to true (default), fit_evotree return a tuple (m, logger) where logger is a dict containing various tracking information.\ndevice=\"cpu\": Hardware device to use for computations. Can be either \"cpu\" or \"gpu\". Following losses are not GPU supported at the moment:l1, :quantile, :logistic_mle.\n\n\n\n\n\n","category":"function"},{"location":"api/#Predict","page":"API","title":"Predict","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"EvoTrees.predict","category":"page"},{"location":"api/#MLJModelInterface.predict","page":"API","title":"MLJModelInterface.predict","text":"predict(model::EvoTree, X::AbstractMatrix; ntree_limit = length(model.trees))\n\nPredictions from an EvoTree model - sums the predictions from all trees composing the model. 
Use ntree_limit=N to only predict with the first N trees.\n\n\n\n\n\n","category":"function"},{"location":"api/#Features-Importance","page":"API","title":"Features Importance","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"EvoTrees.importance","category":"page"},{"location":"api/#EvoTrees.importance","page":"API","title":"EvoTrees.importance","text":"importance(model::EvoTree; fnames=model.info[:fnames])\n\nSorted normalized feature importance based on loss function gain. Feature names associated to the model are stored in model.info[:fnames] as a string Vector and can be updated at any time. Eg: model.info[:fnames] = new_fnames_vec.\n\n\n\n\n\n","category":"function"},{"location":"tutorials/logistic-regression-titanic/#Logistic-Regression-on-Titanic-Dataset","page":"Logistic Regression - Titanic","title":"Logistic Regression on Titanic Dataset","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"We will use the Titanic dataset, which is included in the MLDatasets package. It describes the survival status of individual passengers on the Titanic. The model will be approached as a logistic regression problem, although a Classifier model could also have been used (see the Classification - Iris tutorial). 
","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Getting-started","page":"Logistic Regression - Titanic","title":"Getting started","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"To begin, we will load the required packages and the dataset:","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"using EvoTrees\nusing MLDatasets\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random\n\ndf = MLDatasets.Titanic().dataframe","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Preprocessing","page":"Logistic Regression - Titanic","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"A first step in data processing is to prepare the input features in a model compatible format. ","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"EvoTrees' Tables API supports input that are either Real, Bool or Categorical. A recommended approach for String features such as Sex is to convert them into an unordered Categorical. ","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"For dealing with features withh missing values such as Age, a common approach is to first create an Bool indicator variable capturing the info on whether a value is missing. 
Then, the missing values can be inputed (replaced by some default values such as mean or median, or more sophisticated approach such as predictions from another model).","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"# convert string feature to Categorical\ntransform!(df, :Sex => categorical => :Sex)\n\n# treat string feature and missing values\ntransform!(df, :Age => ByRow(ismissing) => :Age_ismissing)\ntransform!(df, :Age => (x -> coalesce.(x, median(skipmissing(x)))) => :Age);\n\n# remove unneeded variables\ndf = df[:, Not([:PassengerId, :Name, :Embarked, :Cabin, :Ticket])]\n","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"The full data can now be split according to train and eval indices.  Target and feature names are also set.","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"Random.seed!(123)\n\ntrain_ratio = 0.8\ntrain_indices = randperm(nrow(df))[1:Int(round(train_ratio * nrow(df)))]\n\ndtrain = df[train_indices, :]\ndeval = df[setdiff(1:nrow(df), train_indices), :]\n\ntarget_name = \"Survived\"\nfnames = setdiff(names(df), [target_name])","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Training","page":"Logistic Regression - Titanic","title":"Training","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"Now we are ready to train our model. We will first define a model configuration using the EvoTreeRegressor model constructor.  Then, we'll use fit_evotree to train a boosted tree model. 
We'll pass optional deval arguments, which enables the tracking of an evaluation metric and early stopping. ","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"config = EvoTreeRegressor(\n  loss=:logistic, \n  nrounds=200, \n  eta=0.05, \n  nbins=128, \n  max_depth=5, \n  rowsample=0.5, \n  colsample=0.9)\n\nmodel = fit_evotree(\n    config, dtrain; \n    deval,\n    target_name,\n    fnames,\n    metric = :logloss,\n    early_stopping_rounds=10,\n    print_every_n=10)","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Diagnosis","page":"Logistic Regression - Titanic","title":"Diagnosis","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"We can get predictions by passing training and testing data to our model. We can then evaluate the accuracy of our model, which should be around 85%. 
","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"pred_train = model(dtrain)\npred_eval = model(deval)","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"julia> mean((pred_train .> 0.5) .== dtrain[!, target_name])\n0.8821879382889201\n\njulia> mean((pred_eval .> 0.5) .== deval[!, target_name])\n0.8426966292134831","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"Finally, features importance can be inspected using EvoTrees.importance.","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"julia> EvoTrees.importance(model)\n7-element Vector{Pair{String, Float64}}:\n           \"Sex\" => 0.29612654189959403\n           \"Age\" => 0.25487324307720827\n          \"Fare\" => 0.2530947969323613\n        \"Pclass\" => 0.11354283043193575\n         \"SibSp\" => 0.05129209383816148\n         \"Parch\" => 0.017385183317069588\n \"Age_ismissing\" => 0.013685310503669728","category":"page"},{"location":"tutorials/regression-boston/#Regression-on-Boston-Housing-Dataset","page":"Regression - Boston","title":"Regression on Boston Housing Dataset","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"We will use the Boston Housing dataset, which is included in the MLDatasets package. It's derived from information collected by the U.S. Census Service concerning housing in the area of Boston. 
Target variable represents the median housing value.","category":"page"},{"location":"tutorials/regression-boston/#Getting-started","page":"Regression - Boston","title":"Getting started","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"To begin, we will load the required packages and the dataset:","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"using EvoTrees\nusing MLDatasets\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random\n\ndf = MLDatasets.BostonHousing().dataframe","category":"page"},{"location":"tutorials/regression-boston/#Preprocessing","page":"Regression - Boston","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Before we can train our model, we need to preprocess the dataset. We will split our data according to train and eval indices, and separate features from the target variable.","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Random.seed!(123)\n\ntrain_ratio = 0.8\ntrain_indices = randperm(nrow(df))[1:Int(round(train_ratio * nrow(df)))]\n\ntrain_data = df[train_indices, :]\neval_data = df[setdiff(1:nrow(df), train_indices), :]\n\nx_train, y_train = Matrix(train_data[:, Not(:MEDV)]), train_data[:, :MEDV]\nx_eval, y_eval = Matrix(eval_data[:, Not(:MEDV)]), eval_data[:, :MEDV]","category":"page"},{"location":"tutorials/regression-boston/#Training","page":"Regression - Boston","title":"Training","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Now we are ready to train our model. We will first define a model configuration using the EvoTreeRegressor model constructor.  
Then, we'll use fit_evotree to train a boosted tree model. We'll pass optional x_eval and y_eval arguments, which enable the usage of early stopping. ","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"config = EvoTreeRegressor(\n    nrounds=200, \n    eta=0.1, \n    max_depth=4, \n    lambda=0.1, \n    rowsample=0.9, \n    colsample=0.9)\n\nmodel = fit_evotree(config;\n    x_train, y_train,\n    x_eval, y_eval,\n    metric = :mse,\n    early_stopping_rounds=10,\n    print_every_n=10)","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Finally, we can get predictions by passing training and testing data to our model. We can then apply various evaluation metric, such as the MAE (mean absolute error):  ","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"pred_train = model(x_train)\npred_eval = model(x_eval)","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"julia> mean(abs.(pred_train .- y_train))\n1.056997874224627\n\njulia> mean(abs.(pred_eval .- y_eval))\n2.3298767665825264","category":"page"},{"location":"models/#Models","page":"Models","title":"Models","text":"","category":"section"},{"location":"models/#EvoTreeRegressor","page":"Models","title":"EvoTreeRegressor","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeRegressor","category":"page"},{"location":"models/#EvoTrees.EvoTreeRegressor","page":"Models","title":"EvoTrees.EvoTreeRegressor","text":"EvoTreeRegressor(;kwargs...)\n\nA model type for constructing a EvoTreeRegressor, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface.\n\nHyper-parameters\n\nloss=:mse:         Loss to be be minimized during training. 
One of:\n:mse\n:logloss\n:gamma\n:tweedie\n:quantile\n:l1\nnrounds=10:           Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree raw predictions are scaled by eta prior to be added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds but typically improves model performance.   \nlambda::T=0.0:        L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:         Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nalpha::T=0.5:         Loss specific parameter in the [0, 1] range:                           - :quantile: target quantile for the regression.                           - :l1: weighting parameters to positive vs negative residuals.                                 - Positive residual weights = alpha                                 - Negative residual weights = (1 - alpha)\nmax_depth=5:          Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=1.0:       Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:        Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:        Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:             Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. 
Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  Only :linear, :logistic, :gamma and tweedie losses are supported at the moment.\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Tree are built depthwise until max depth is reach or if min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed to all nodes of a given depth. \nrng=123:              Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeRegressor(loss=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, 1]:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ Interface\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees\n\nDo model = EvoTreeRegressor() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeRegressor(loss=...).\n\nTraining model\n\nIn MLJ or MLJBase, bind an instance model to data with     mach = machine(model, X, y) where\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Continuous; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): return predictions of the target given features Xnew having the same scitype as X above. Predictions are deterministic.\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeRegressor(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees\nmodel = EvoTreeRegressor(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_boston\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeClassifier","page":"Models","title":"EvoTreeClassifier","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeClassifier","category":"page"},{"location":"models/#EvoTrees.EvoTreeClassifier","page":"Models","title":"EvoTrees.EvoTreeClassifier","text":"EvoTreeClassifier(;kwargs...)\n\nA model type for constructing a EvoTreeClassifier, based on EvoTrees.jl, and implementing both an internal API and 
the MLJ model interface. EvoTreeClassifier is used to perform multi-class classification, using cross-entropy loss.\n\nHyper-parameters\n\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds, but typically improves model performance.  \nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=1.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  
\noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeClassifier(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, K] where K is the number of classes:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees\n\nDo model = EvoTreeClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeClassifier(loss=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with\n\nmach = machine(model, X, y)\n\nwhere\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Multiclass or <:OrderedFactor; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): return predictions of the target given features Xnew having the same scitype as X above. 
Predictions are probabilistic.\npredict_mode(mach, Xnew): returns the mode of each of the predictions above.\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeClassifier(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(1:3, nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees\nmodel = EvoTreeClassifier(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_iris\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mode(mach, X)\n\nSee also EvoTrees.jl.\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeCount","page":"Models","title":"EvoTreeCount","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeCount","category":"page"},{"location":"models/#EvoTrees.EvoTreeCount","page":"Models","title":"EvoTrees.EvoTreeCount","text":"EvoTreeCount(;kwargs...)\n\nA model type for constructing an EvoTreeCount, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface. EvoTreeCount is used to perform Poisson probabilistic regression on a count target.\n\nHyper-parameters\n\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds, but typically improves model performance.  \nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. 
Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=1.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeCount() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeCount(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, 1]:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeCount = @load EvoTreeCount pkg=EvoTrees\n\nDo model = EvoTreeCount() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeCount(loss=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with     mach = machine(model, X, y) where\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Count; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): returns a vector of Poisson distributions given features Xnew having the same scitype as X above. 
Predictions are probabilistic.\n\nSpecific metrics can also be predicted using:\n\npredict_mean(mach, Xnew)\npredict_mode(mach, Xnew)\npredict_median(mach, Xnew)\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeCount(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(0:2, nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeCount = @load EvoTreeCount pkg=EvoTrees\nmodel = EvoTreeCount(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nX, y = randn(nobs, nfeats), rand(0:2, nobs)\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mean(mach, X)\npreds = predict_mode(mach, X)\npreds = predict_median(mach, X)\n\n\nSee also EvoTrees.jl.\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeMLE","page":"Models","title":"EvoTreeMLE","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeMLE","category":"page"},{"location":"models/#EvoTrees.EvoTreeMLE","page":"Models","title":"EvoTrees.EvoTreeMLE","text":"EvoTreeMLE(;kwargs...)\n\nA model type for constructing an EvoTreeMLE, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface. EvoTreeMLE performs maximum likelihood estimation. The assumed distribution is specified through the loss kwarg. Both Gaussian and Logistic distributions are supported.\n\nHyper-parameters\n\nloss=:gaussian:         Loss to be minimized during training. One of:\n\n:gaussian / :gaussian_mle\n:logistic / :logistic_mle\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. 
Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0.\n\nA lower eta results in slower learning, requiring a higher nrounds, but typically improves model performance.  \n\nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=8.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for MLE regression, constraints may not be enforced systematically.\ntree_type=\"binary\"          Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. 
Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeMLE() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeMLE(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, nparams] where the second dimension refers to μ & σ for Normal/Gaussian and μ & s for Logistic.\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees\n\nDo model = EvoTreeMLE() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeMLE(loss=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with\n\nmach = machine(model, X, y)\n\nwhere\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Continuous; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): returns a vector of Gaussian or Logistic distributions (according to provided loss) given features Xnew having the same scitype as X above.\n\nPredictions are probabilistic.\n\nSpecific metrics can also be predicted using:\n\npredict_mean(mach, Xnew)\npredict_mode(mach, Xnew)\npredict_median(mach, Xnew)\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeMLE(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees\nmodel = EvoTreeMLE(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_boston\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mean(mach, X)\npreds = predict_mode(mach, X)\npreds = predict_median(mach, X)\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeGaussian","page":"Models","title":"EvoTreeGaussian","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeGaussian is to be deprecated. 
Please use EvoTreeMLE with loss = :gaussian_mle. ","category":"page"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeGaussian","category":"page"},{"location":"models/#EvoTrees.EvoTreeGaussian","page":"Models","title":"EvoTrees.EvoTreeGaussian","text":"EvoTreeGaussian(;kwargs...)\n\nA model type for constructing an EvoTreeGaussian, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface. EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ and σ parameters to maximize likelihood.\n\nHyper-parameters\n\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds, but typically improves model performance.  \nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=8.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. 
Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for Gaussian regression, constraints may not be enforced systematically.\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeGaussian() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeGaussian(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, 2] where the second dimension refers to μ and σ respectively:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees\n\nDo model = EvoTreeGaussian() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeGaussian(loss=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with\n\nmach = machine(model, X, y)\n\nwhere\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Continuous; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): returns a vector of Gaussian distributions given features Xnew having the same scitype as X above.\n\nPredictions are probabilistic.\n\nSpecific metrics can also be predicted using:\n\npredict_mean(mach, Xnew)\npredict_mode(mach, Xnew)\npredict_median(mach, Xnew)\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nparams = EvoTreeGaussian(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(nobs)\nmodel = fit_evotree(params; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees\nmodel = EvoTreeGaussian(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_boston\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mean(mach, X)\npreds = predict_mode(mach, X)\npreds = predict_median(mach, X)\n\n\n\n\n\n","category":"type"},{"location":"tutorials/classification-iris/#Classification-on-Iris-dataset","page":"Classification - IRIS","title":"Classification on Iris 
dataset","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"We will use the iris dataset, which is included in the MLDatasets package. This dataset consists of measurements of the sepal length, sepal width, petal length, and petal width for three different types of iris flowers: Setosa, Versicolor, and Virginica.","category":"page"},{"location":"tutorials/classification-iris/#Getting-started","page":"Classification - IRIS","title":"Getting started","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"To begin, we will load the required packages and the dataset:","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"using EvoTrees\nusing MLDatasets\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random\n\ndf = MLDatasets.Iris().dataframe","category":"page"},{"location":"tutorials/classification-iris/#Preprocessing","page":"Classification - IRIS","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Before we can train our model, we need to preprocess the dataset. 
We will convert the class variable, which specifies the type of iris flower, into a categorical variable.","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Random.seed!(123)\n\ndf[!, :class] = categorical(df[!, :class])\ntarget_name = \"class\"\nfnames = setdiff(names(df), [target_name])\n\ntrain_ratio = 0.8\ntrain_indices = randperm(nrow(df))[1:Int(train_ratio * nrow(df))]\n\ndtrain = df[train_indices, :]\ndeval = df[setdiff(1:nrow(df), train_indices), :]","category":"page"},{"location":"tutorials/classification-iris/#Training","page":"Classification - IRIS","title":"Training","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Now we are ready to train our model. We will first define a model configuration using the EvoTreeClassifier model constructor.  Then, we'll use fit_evotree to train a boosted tree model. We'll pass the optional deval argument, which enables the use of early stopping. ","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"config = EvoTreeClassifier(\n    nrounds=200, \n    eta=0.05, \n    max_depth=5, \n    lambda=0.1, \n    rowsample=0.8, \n    colsample=0.8)\n\nmodel = fit_evotree(config, dtrain;\n    target_name,\n    fnames,\n    deval,\n    metric = :mlogloss,\n    early_stopping_rounds=10,\n    print_every_n=10)","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Finally, we can get predictions by passing training and testing data to our model. We can then evaluate the accuracy of our model, which should be near 100% for this simple classification problem. 
","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"pred_train = model(dtrain)\nidx_train = [findmax(row)[2] for row in eachrow(pred_train)]\n\npred_eval = model(deval)\nidx_eval = [findmax(row)[2] for row in eachrow(pred_eval)]","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"julia> mean(idx_train .== levelcode.(dtrain[!, target_name]))\n1.0\n\njulia> mean(idx_eval .== levelcode.(deval[!, target_name]))\n0.9333333333333333","category":"page"},{"location":"tutorials/examples-MLJ/#MLJ-Integration","page":"MLJ API","title":"MLJ Integration","text":"","category":"section"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"EvoTrees.jl provides a first-class integration with the MLJ ecosystem. ","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"See official project page for more info.","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"To use with MLJ, an EvoTrees model configuration must first be initialized using either: ","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"EvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"The model is then passed to MLJ's machine, opening access to the rest of the MLJ modeling ecosystem. 
","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"using StatsBase: sample\nusing EvoTrees\nusing EvoTrees: sigmoid, logit # only needed to create the synthetic data below\nusing MLJBase\n\nfeatures = rand(10_000) .* 5 .- 2\nX = reshape(features, (size(features)[1], 1))\nY = sin.(features) .* 0.5 .+ 0.5\nY = logit(Y) + randn(size(Y))\nY = sigmoid(Y)\ny = Y\nX = MLJBase.table(X)\n\n# linear regression\ntree_model = EvoTreeRegressor(loss=:linear, max_depth=5, eta=0.05, nrounds=10)\n\n# set machine\nmach = machine(tree_model, X, y)\n\n# partition data\ntrain, test = partition(eachindex(y), 0.7, shuffle=true); # 70:30 split\n\n# fit data\nfit!(mach, rows=train, verbosity=1)\n\n# continue training\nmach.model.nrounds += 10\nfit!(mach, rows=train, verbosity=1)\n\n# predict on train data\npred_train = predict(mach, selectrows(X, train))\nmean(abs.(pred_train - selectrows(Y, train)))\n\n# predict on test data\npred_test = predict(mach, selectrows(X, test))\nmean(abs.(pred_test - selectrows(Y, test)))","category":"page"},{"location":"#[EvoTrees.jl](https://github.com/Evovest/EvoTrees.jl)","page":"Introduction","title":"EvoTrees.jl","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood. 
","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"See the examples-API section to get started using the internal API, or examples-MLJ to use within the MLJ framework.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Complete details about hyper-parameters are found in the Models section.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"R binding available.","category":"page"},{"location":"#Installation","page":"Introduction","title":"Installation","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"Latest:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"julia> Pkg.add(url=\"https://github.com/Evovest/EvoTrees.jl\")","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"From General Registry:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"julia> Pkg.add(\"EvoTrees\")","category":"page"},{"location":"#Quick-start","page":"Introduction","title":"Quick start","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"A model configuration must first be defined, using one of the model constructors: ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"EvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Then fitting can be performed using fit_evotree. Two broad methods are supported: Matrix and Tables based inputs. Optional kwargs can be used to specify eval data on which to track eval metric and perform early stopping. 
Look at the docs for more details on available hyper-parameters for each of the above constructors and other options for training.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Predictions are obtained by passing features data to the model. Model acts as a functor, ie. it's a struct containing the fitted model as well as a function generating the prediction of that model for the features argument. ","category":"page"},{"location":"#Matrix-features-input","page":"Introduction","title":"Matrix features input","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"using EvoTrees\n\nconfig = EvoTreeRegressor(\n    loss=:mse, \n    nrounds=100, \n    max_depth=6,\n    nbins=32,\n    eta=0.1)\n\nx_train, y_train = rand(1_000, 10), rand(1_000)\nm = fit_evotree(config; x_train, y_train)\npreds = m(x_train)","category":"page"},{"location":"#DataFrames-and-Tables-input","page":"Introduction","title":"DataFrames and Tables input","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"When using a Tables compatible input such as DataFrames, features with elements types Real (incl. Bool) and Categorical are automatically recognized as input features. Alternatively, fnames kwarg can be used. ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Categorical features are treated accordingly by the algorithm. Ordered variables will be treated as numerical features, using ≤ split rule, while unordered variables are using ==. Support is currently limited to a maximum of 255 levels. 
Bool variables are treated as unordered, 2-levels cat variables.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"dtrain = DataFrame(x_train, :auto)\ndtrain.y .= y_train\nm = fit_evotree(config, dtrain; target_name=\"y\");\nm = fit_evotree(config, dtrain; target_name=\"y\", fnames=[\"x1\", \"x3\"]);","category":"page"},{"location":"#GPU-Acceleration","page":"Introduction","title":"GPU Acceleration","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"If running on a CUDA enabled machine, training and inference on GPU can be triggered through the device kwarg: ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"m = fit_evotree(config, dtrain; target_name=\"y\", device=\"gpu\");\np = m(dtrain; device=\"gpu\")","category":"page"},{"location":"#Reproducibility","page":"Introduction","title":"Reproducibility","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"EvoTrees models trained on cpu can be fully reproducible.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Models of the gradient boosting family typically involve some stochasticity.  In EvoTrees, this primarily concerns the 2 subsampling parameters rowsample and colsample. The other stochastic operation happens at model initialisation when the features are binarized to allow for fast histogram construction: a random subsample of 1_000 * nbins is used to compute the breaking points. ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"These random parts of the algorithm can be deterministically reproduced on cpu by specifying an rng to the model constructor. rng can be an integer (ex: 123) or a random generator (ex: Random.Xoshiro(123)).  If no rng is specified, 123 is used by default. 
When an integer rng is used, a Random.MersenneTwister generator will be created by the EvoTrees constructor. Otherwise, the provided random generator will be used.  ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Consequently, the following m1 and m2 models will be identical:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"config = EvoTreeRegressor(rowsample=0.5, rng=123)\nm1 = fit_evotree(config, df; target_name=\"y\");\nconfig = EvoTreeRegressor(rowsample=0.5, rng=123)\nm2 = fit_evotree(config, df; target_name=\"y\");","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"However, the following m1 and m2 models won't be, because there's stochasticity involved in the model from rowsample and the random generator in the config isn't reset between the fits:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"config = EvoTreeRegressor(rowsample=0.5, rng=123)\nm1 = fit_evotree(config, df; target_name=\"y\");\nm2 = fit_evotree(config, df; target_name=\"y\");","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Note that in the presence of multiple identical or very highly correlated features, the model may not be reproducible if features are permuted, since in situations where 2 features provide identical gains, the first one will be selected. Therefore, if the identity relationship doesn't hold on new data, different predictions will be returned from models trained on different feature orders. ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"At the moment, there's no reproducibility guarantee on GPU, although this may change in the future. 
","category":"page"},{"location":"#Save/Load","page":"Introduction","title":"Save/Load","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"EvoTrees.save(m, \"data/model.bson\")\nm = EvoTrees.load(\"data/model.bson\");","category":"page"},{"location":"tutorials/examples-API/#Internal-API-examples","page":"Internal API","title":"Internal API examples","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"The following provides minimal examples of usage of the various loss functions available in EvoTrees using the internal API.","category":"page"},{"location":"tutorials/examples-API/#Regression","page":"Internal API","title":"Regression","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"Minimal example to fit a noisy sinus wave.","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"(Image: )","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"using EvoTrees\nusing EvoTrees: sigmoid, logit\nusing StatsBase: sample\n\n# prepare a dataset\nfeatures = rand(10000) .* 20 .- 10\nX = reshape(features, (size(features)[1], 1))\nY = sin.(features) .* 0.5 .+ 0.5\nY = logit(Y) + randn(size(Y))\nY = sigmoid(Y)\n𝑖 = collect(1:size(X, 1))\n\n# train-eval split\n𝑖_sample = sample(𝑖, size(𝑖, 1), replace = false)\ntrain_size = 0.8\n𝑖_train = 𝑖_sample[1:floor(Int, train_size * size(𝑖, 1))]\n𝑖_eval = 𝑖_sample[floor(Int, train_size * size(𝑖, 1))+1:end]\n\nx_train, x_eval = X[𝑖_train, :], X[𝑖_eval, :]\ny_train, y_eval = Y[𝑖_train], Y[𝑖_eval]\n\nconfig = EvoTreeRegressor(\n    loss=:mse,\n    nrounds=100, nbins = 100,\n    lambda = 0.5, gamma=0.1, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, 
metric=:mse, print_every_n=25)\npred_eval_linear = model(x_eval)\n\n# logistic / cross-entropy\nconfig = EvoTreeRegressor(\n    loss=:logistic,\n    nrounds=100, nbins = 100,\n    lambda = 0.5, gamma=0.1, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric=:logloss, print_every_n=25)\npred_eval_logistic = model(x_eval)\n\n# L1\nconfig = EvoTreeRegressor(\n    loss=:l1, alpha=0.5,\n    nrounds=100, nbins=100,\n    lambda = 0.5, gamma=0.0, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric=:mae, print_every_n=25)\npred_eval_L1 = model(x_eval)","category":"page"},{"location":"tutorials/examples-API/#Poisson-Count","page":"Internal API","title":"Poisson Count","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"# Poisson\nconfig = EvoTreeCount(\n    loss=:poisson,\n    nrounds=100, nbins=100,\n    lambda=0.5, gamma=0.1, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :poisson, print_every_n = 25)\npred_eval_poisson = model(x_eval)","category":"page"},{"location":"tutorials/examples-API/#Quantile-Regression","page":"Internal API","title":"Quantile Regression","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"(Image: )","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"# q50\nconfig = EvoTreeRegressor(\n    loss=:quantile, alpha=0.5,\n    nrounds=200, nbins=100,\n    lambda=0.1, gamma=0.0, eta=0.05,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :quantile, print_every_n = 
25)\npred_train_q50 = model(x_train)\n\n# q20\nconfig = EvoTreeRegressor(\n    loss=:quantile, alpha=0.2,\n    nrounds=200, nbins=100,\n    lambda=0.1, gamma=0.0, eta=0.05,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :quantile, print_every_n = 25)\npred_train_q20 = model(x_train)\n\n# q80\nconfig = EvoTreeRegressor(\n    loss=:quantile, alpha=0.8,\n    nrounds=200, nbins=100,\n    lambda=0.1, gamma=0.0, eta=0.05,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :quantile, print_every_n = 25)\npred_train_q80 = model(x_train)","category":"page"},{"location":"tutorials/examples-API/#Gaussian-Max-Likelihood","page":"Internal API","title":"Gaussian Max Likelihood","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"(Image: )","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"config = EvoTreeMLE(\n    loss=:gaussian_mle,\n    nrounds=100, nbins=100,\n    lambda=0.0, gamma=0.0, eta=0.1,\n    max_depth=6, rowsample=0.5)","category":"page"}]
+[{"location":"tutorials/ranking-LTRC/#Ranking-with-Yahoo!-Learning-to-Rank-Challenge.","page":"Ranking - Yahoo! LTRC","title":"Ranking with Yahoo! Learning to Rank Challenge.","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"In this tutorial, we present how a ranking task can be tackled using regular regression techniques without compromising performance compared to specialized ranking learners. The data used is from the C14 - Yahoo! Learning to Rank Challenge, which can be obtained following a request to https://webscope.sandbox.yahoo.com.","category":"page"},{"location":"tutorials/ranking-LTRC/#Getting-started","page":"Ranking - Yahoo! LTRC","title":"Getting started","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"To begin, we load the required packages:","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"using EvoTrees\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random","category":"page"},{"location":"tutorials/ranking-LTRC/#Load-LIBSVM-format-data","page":"Ranking - Yahoo! LTRC","title":"Load LIBSVM format data","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Some datasets come in the so called LIBSVM format, which stores data using a sparse representation: ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"<label> <query> <feature_id_1>:<feature_value_1> <feature_id_2>:<feature_value_2>","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"We use the ReadLIBSVM.jl package to perform parsing: ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"using ReadLIBSVM\ndtrain = read_libsvm(\"set1.train.txt\"; has_query=true)\ndeval = read_libsvm(\"set1.valid.txt\"; has_query=true)\ndtest = read_libsvm(\"set1.test.txt\"; has_query=true)","category":"page"},{"location":"tutorials/ranking-LTRC/#Preprocessing","page":"Ranking - Yahoo! LTRC","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Preprocessing is minimal since all features are parsed as floats and specific files are provided for each of the train, eval and test splits. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Several features are fully missing (contain only 0s) in the training dataset. They are removed from all datasets since they cannot bring value to the model.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Then, the features, targets and query ids are extracted from the parsed LIBSVM format. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"colsums_train = map(sum, eachcol(dtrain[:x]))\ndrop_cols = colsums_train .== 0\n\nx_train = dtrain[:x][:, .!drop_cols]\nx_eval = deval[:x][:, .!drop_cols]\nx_test = dtest[:x][:, .!drop_cols]\n\n# assign queries\nq_train = dtrain[:q]\nq_eval = deval[:q]\nq_test = dtest[:q]\n\n# assign targets\ny_train = dtrain[:y]\ny_eval = deval[:y]\ny_test = dtest[:y]","category":"page"},{"location":"tutorials/ranking-LTRC/#Training","page":"Ranking - Yahoo! LTRC","title":"Training","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! 
LTRC","title":"Ranking - Yahoo! LTRC","text":"Now we are ready to train our model. We first define a model configuration using the EvoTreeRegressor model constructor.  Then, we use fit_evotree to train a boosted tree model. The optional x_eval and y_eval arguments are provided to enable the usage of early stopping. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"config = EvoTreeRegressor(\n    nrounds=6000,\n    loss=:mse,\n    eta=0.02,\n    nbins=64,\n    max_depth=11,\n    rowsample=0.9,\n    colsample=0.9,\n)\n\nm_mse, logger_mse = fit_evotree(\n    config;\n    x_train=x_train,\n    y_train=y_train,\n    x_eval=x_eval,\n    y_eval=y_eval,\n    early_stopping_rounds=200,\n    print_every_n=50,\n    metric=:mse,\n    return_logger=true\n);\n\np_test = m_mse(x_test);","category":"page"},{"location":"tutorials/ranking-LTRC/#Model-evaluation","page":"Ranking - Yahoo! LTRC","title":"Model evaluation","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"For ranking problems, a commonly used metric is the Normalized Discounted Cumulative Gain. It essentially considers whether the model is good at identifying the top K outcomes within a group. There are various flavors to its implementation, though the most commonly used one is the following:","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"function ndcg(p, y, k=10)\n    k = min(k, length(p))\n    # DCG: gains of the top-k items as ranked by the predictions\n    p_order = partialsortperm(p, 1:k, rev=true)\n    _y = y[p_order]\n    gains = 2 .^ _y .- 1\n    discounts = log2.((1:k) .+ 1)\n    dcg = sum(gains ./ discounts)\n\n    # ideal DCG: gains of the top-k items as ranked by the true targets\n    y_order = partialsortperm(y, 1:k, rev=true)\n    _y = y[y_order]\n    gains = 2 .^ _y .- 1\n    discounts = log2.((1:k) .+ 1)\n    idcg = sum(gains ./ discounts)\n    return idcg == 0 ? 1.0 : dcg / idcg\nend","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"To compute the NDCG over a collection of groups, it is handy to leverage DataFrames' combine and groupby functionalities: ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"test_df = DataFrame(p=p_test, y=y_test, q=q_test)\ntest_df_agg = combine(groupby(test_df, \"q\"), [\"p\", \"y\"] => ndcg => \"ndcg\")\nndcg_test = round(mean(test_df_agg.ndcg), sigdigits=5)\n@info \"ndcg_test MSE\" ndcg_test\n\n┌ Info: ndcg_test MSE\n└   ndcg_test = 0.8008","category":"page"},{"location":"tutorials/ranking-LTRC/#Logistic-regression-alternative","page":"Ranking - Yahoo! LTRC","title":"Logistic regression alternative","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"The above regression experiment shows a performance competitive with the results outlined in CatBoost's ranking benchmarks. ","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"Another approach is to rescale the target ranking scores to the [0, 1] range and perform a logistic regression.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"max_rank = 4\ny_train = dtrain[:y] ./ max_rank\ny_eval = deval[:y] ./ max_rank\ny_test = dtest[:y] ./ max_rank\n\nconfig = EvoTreeRegressor(\n    nrounds=6000,\n    loss=:logloss,\n    eta=0.01,\n    nbins=64,\n    max_depth=11,\n    rowsample=0.9,\n    colsample=0.9,\n)\n\nm_logloss, logger_logloss = fit_evotree(\n    config;\n    x_train=x_train,\n    y_train=y_train,\n    x_eval=x_eval,\n    y_eval=y_eval,\n    early_stopping_rounds=200,\n    print_every_n=50,\n    metric=:logloss,\n    return_logger=true\n);","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"To measure the NDCG, the original targets must be used since NDCG is a scale sensitive measure.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"y_train = dtrain[:y]\ny_eval = deval[:y]\ny_test = dtest[:y]\n\np_test = m_logloss(x_test);\ntest_df = DataFrame(p=p_test, y=y_test, q=q_test)\ntest_df_agg = combine(groupby(test_df, \"q\"), [\"p\", \"y\"] => ndcg => \"ndcg\")\nndcg_test = round(mean(test_df_agg.ndcg), sigdigits=5)\n@info \"ndcg_test LogLoss\" ndcg_test\n\n┌ Info: ndcg_test LogLoss\n└   ndcg_test = 0.80267","category":"page"},{"location":"tutorials/ranking-LTRC/#Conclusion","page":"Ranking - Yahoo! LTRC","title":"Conclusion","text":"","category":"section"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"We've seen that a ranking problem can be efficiently handled as a generic regression task while achieving performance comparable to specialized ranking loss functions. Below, we present the NDCG obtained from the above experiments alongside those published in CatBoost's benchmarks.","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! 
LTRC","text":"Model NDCG\nEvoTrees - mse 0.80080\nEvoTrees - logistic 0.80267\ncat-rmse 0.802115\ncat-query-rmse 0.802229\ncat-pair-logit 0.797318\ncat-pair-logit-pairwise 0.790396\ncat-yeti-rank 0.802972\nxgb-rmse 0.798892\nxgb-pairwise 0.800048\nxgb-lambdamart-ndcg 0.800048\nlgb-rmse 0.8013675\nlgb-pairwise 0.801347","category":"page"},{"location":"tutorials/ranking-LTRC/","page":"Ranking - Yahoo! LTRC","title":"Ranking - Yahoo! LTRC","text":"It should be noted that the latter results were not reproduced in the scope of the current tutorial, so one should be careful about any claim of model superiority. The results from CatBoost's benchmarks were however already indicative of strong performance of non-specialized ranking loss functions, to which this tutorial brings further support. ","category":"page"},{"location":"api/#fit_evotree","page":"API","title":"fit_evotree","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"fit_evotree","category":"page"},{"location":"api/#EvoTrees.fit_evotree","page":"API","title":"EvoTrees.fit_evotree","text":"fit_evotree(\n    params::EvoTypes{L}, \n    dtrain;\n    target_name,\n    fnames=nothing,\n    w_name=nothing,\n    offset_name=nothing,\n    deval=nothing,\n    metric=nothing,\n    early_stopping_rounds=9999,\n    print_every_n=9999,\n    verbosity=1,\n    return_logger=false,\n    device=\"cpu\")\n\nMain training function. Performs model fitting given configuration params, dtrain, target_name and other optional kwargs. \n\nArguments\n\nparams::EvoTypes: configuration info providing hyper-parameters. EvoTypes can be one of: \nEvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE\ndtrain: A Tables compatible training data (named tuples, DataFrame...) containing features and target variables. \n\nKeyword arguments\n\ntarget_name: name of target variable. \nfnames = nothing: the names of the x_train features. 
If provided, should be a vector of strings with length(fnames) = size(x_train, 2).\nw_name = nothing: name of the variable containing weights. If nothing, uniform weights of one will be used.\noffset_name = nothing: name of the offset variable.\ndeval: A Tables compatible evaluation data containing features and target variables. \nmetric: The evaluation metric that will be tracked on deval.    Supported metrics are: \n:mse: mean-squared error. Adapted for general regression models.\n:rmse: root-mean-squared error (CPU only). Adapted for general regression models.\n:mae: mean absolute error. Adapted for general regression models.\n:logloss: Adapted for :logistic regression models.\n:mlogloss: Multi-class cross entropy. Adapted to EvoTreeClassifier classification models. \n:poisson: Poisson deviance. Adapted to EvoTreeCount count models.\n:gamma: Gamma deviance. Adapted to regression problem on Gamma like, positively distributed targets.\n:tweedie: Tweedie deviance. Adapted to regression problem on Tweedie like, positively distributed targets with probability mass at y == 0.\n:gaussian_mle: Gaussian maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :gaussian_mle. \n:logistic_mle: Logistic maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :logistic_mle. \nearly_stopping_rounds::Integer: number of consecutive rounds without metric improvement after which fitting is stopped. \nprint_every_n: sets at which frequency logging info should be printed. \nverbosity: set to 1 to print logging info during training.\nreturn_logger::Bool = false: if set to true, fit_evotree returns a tuple (m, logger) where logger is a dict containing various tracking information. Defaults to false.\ndevice=\"cpu\": Hardware device to use for computations. Can be either \"cpu\" or \"gpu\". 
The following losses are not supported on GPU at the moment: :l1, :quantile, :logistic_mle.\n\n\n\n\n\nfit_evotree(params::EvoTypes{L};\n    x_train::AbstractMatrix, y_train::AbstractVector, w_train=nothing, offset_train=nothing,\n    x_eval=nothing, y_eval=nothing, w_eval=nothing, offset_eval=nothing,\n    early_stopping_rounds=9999,\n    print_every_n=9999,\n    verbosity=1)\n\nMain training function. Performs model fitting given configuration params, x_train, y_train and other optional kwargs. \n\nArguments\n\nparams::EvoTypes: configuration info providing hyper-parameters. EvoTypes can be one of: \nEvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE\n\nKeyword arguments\n\nx_train::Matrix: training data of size [#observations, #features]. \ny_train::Vector: vector of train targets of length #observations.\nw_train::Vector: vector of train weights of length #observations. If nothing, a vector of ones is assumed.\noffset_train::VecOrMat: offset for the training data. Should match the size of the predictions.\nx_eval::Matrix: evaluation data of size [#observations, #features]. \ny_eval::Vector: vector of evaluation targets of length #observations.\nw_eval::Vector: vector of evaluation weights of length #observations. Defaults to nothing (assumes a vector of 1s).\noffset_eval::VecOrMat: evaluation data offset. Should match the size of the predictions.\nmetric: The evaluation metric that will be tracked on x_eval, y_eval and optionally w_eval / offset_eval data.    Supported metrics are: \n:mse: mean-squared error. Adapted for general regression models.\n:rmse: root-mean-squared error (CPU only). Adapted for general regression models.\n:mae: mean absolute error. Adapted for general regression models.\n:logloss: Adapted for :logistic regression models.\n:mlogloss: Multi-class cross entropy. Adapted to EvoTreeClassifier classification models. \n:poisson: Poisson deviance. Adapted to EvoTreeCount count models.\n:gamma: Gamma deviance. 
Adapted to regression problem on Gamma like, positively distributed targets.\n:tweedie: Tweedie deviance. Adapted to regression problem on Tweedie like, positively distributed targets with probability mass at y == 0.\n:gaussian_mle: Gaussian maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :gaussian_mle. \n:logistic_mle: Logistic maximum log-likelihood. Adapted to EvoTreeMLE models with loss = :logistic_mle. \nearly_stopping_rounds::Integer: number of consecutive rounds without metric improvement after which fitting is stopped. \nprint_every_n: sets at which frequency logging info should be printed. \nverbosity: set to 1 to print logging info during training.\nfnames: the names of the x_train features. If provided, should be a vector of strings with length(fnames) = size(x_train, 2).\nreturn_logger::Bool = false: if set to true, fit_evotree returns a tuple (m, logger) where logger is a dict containing various tracking information. Defaults to false.\ndevice=\"cpu\": Hardware device to use for computations. Can be either \"cpu\" or \"gpu\". The following losses are not supported on GPU at the moment: :l1, :quantile, :logistic_mle.\n\n\n\n\n\n","category":"function"},{"location":"api/#Predict","page":"API","title":"Predict","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"EvoTrees.predict","category":"page"},{"location":"api/#MLJModelInterface.predict","page":"API","title":"MLJModelInterface.predict","text":"predict(model::EvoTree, X::AbstractMatrix; ntree_limit = length(model.trees))\n\nPredictions from an EvoTree model - sums the predictions from all trees composing the model. 
Use ntree_limit=N to only predict with the first N trees.\n\n\n\n\n\n","category":"function"},{"location":"api/#Features-Importance","page":"API","title":"Features Importance","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"EvoTrees.importance","category":"page"},{"location":"api/#EvoTrees.importance","page":"API","title":"EvoTrees.importance","text":"importance(model::EvoTree; fnames=model.info[:fnames])\n\nSorted normalized feature importance based on loss function gain. Feature names associated to the model are stored in model.info[:fnames] as a string Vector and can be updated at any time. Eg: model.info[:fnames] = new_fnames_vec.\n\n\n\n\n\n","category":"function"},{"location":"tutorials/logistic-regression-titanic/#Logistic-Regression-on-Titanic-Dataset","page":"Logistic Regression - Titanic","title":"Logistic Regression on Titanic Dataset","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"We will use the Titanic dataset, which is included in the MLDatasets package. It describes the survival status of individual passengers on the Titanic. The model will be approached as a logistic regression problem, although a Classifier model could also have been used (see the Classification - Iris tutorial). 
","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Getting-started","page":"Logistic Regression - Titanic","title":"Getting started","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"To begin, we will load the required packages and the dataset:","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"using EvoTrees\nusing MLDatasets\nusing DataFrames\nusing Statistics: mean, median\nusing CategoricalArrays\nusing Random\n\ndf = MLDatasets.Titanic().dataframe","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Preprocessing","page":"Logistic Regression - Titanic","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"A first step in data processing is to prepare the input features in a model compatible format. ","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"EvoTrees' Tables API supports inputs that are either Real, Bool or Categorical. A recommended approach for String features such as Sex is to convert them into an unordered Categorical. ","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"For dealing with features with missing values such as Age, a common approach is to first create a Bool indicator variable capturing the info on whether a value is missing. 
Then, the missing values can be imputed (replaced by some default values such as the mean or median, or by a more sophisticated approach such as predictions from another model).","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"# convert string feature to Categorical\ntransform!(df, :Sex => categorical => :Sex)\n\n# flag missing values, then impute them with the median\ntransform!(df, :Age => ByRow(ismissing) => :Age_ismissing)\ntransform!(df, :Age => (x -> coalesce.(x, median(skipmissing(x)))) => :Age);\n\n# remove unneeded variables\ndf = df[:, Not([:PassengerId, :Name, :Embarked, :Cabin, :Ticket])]\n","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"The full data can now be split according to train and eval indices.  Target and feature names are also set.","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"Random.seed!(123)\n\ntrain_ratio = 0.8\ntrain_indices = randperm(nrow(df))[1:Int(round(train_ratio * nrow(df)))]\n\ndtrain = df[train_indices, :]\ndeval = df[setdiff(1:nrow(df), train_indices), :]\n\ntarget_name = \"Survived\"\nfnames = setdiff(names(df), [target_name])","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Training","page":"Logistic Regression - Titanic","title":"Training","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"Now we are ready to train our model. We will first define a model configuration using the EvoTreeRegressor model constructor.  Then, we'll use fit_evotree to train a boosted tree model. 
We'll pass optional deval arguments, which enables the tracking of an evaluation metric and early stopping. ","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"config = EvoTreeRegressor(\n  loss=:logistic, \n  nrounds=200, \n  eta=0.05, \n  nbins=128, \n  max_depth=5, \n  rowsample=0.5, \n  colsample=0.9)\n\nmodel = fit_evotree(\n    config, dtrain; \n    deval,\n    target_name,\n    fnames,\n    metric = :logloss,\n    early_stopping_rounds=10,\n    print_every_n=10)","category":"page"},{"location":"tutorials/logistic-regression-titanic/#Diagnosis","page":"Logistic Regression - Titanic","title":"Diagnosis","text":"","category":"section"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"We can get predictions by passing training and testing data to our model. We can then evaluate the accuracy of our model, which should be around 85%. 
","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"pred_train = model(dtrain)\npred_eval = model(deval)","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"julia> mean((pred_train .> 0.5) .== dtrain[!, target_name])\n0.8821879382889201\n\njulia> mean((pred_eval .> 0.5) .== deval[!, target_name])\n0.8426966292134831","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"Finally, features importance can be inspected using EvoTrees.importance.","category":"page"},{"location":"tutorials/logistic-regression-titanic/","page":"Logistic Regression - Titanic","title":"Logistic Regression - Titanic","text":"julia> EvoTrees.importance(model)\n7-element Vector{Pair{String, Float64}}:\n           \"Sex\" => 0.29612654189959403\n           \"Age\" => 0.25487324307720827\n          \"Fare\" => 0.2530947969323613\n        \"Pclass\" => 0.11354283043193575\n         \"SibSp\" => 0.05129209383816148\n         \"Parch\" => 0.017385183317069588\n \"Age_ismissing\" => 0.013685310503669728","category":"page"},{"location":"tutorials/regression-boston/#Regression-on-Boston-Housing-Dataset","page":"Regression - Boston","title":"Regression on Boston Housing Dataset","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"We will use the Boston Housing dataset, which is included in the MLDatasets package. It's derived from information collected by the U.S. Census Service concerning housing in the area of Boston. 
Target variable represents the median housing value.","category":"page"},{"location":"tutorials/regression-boston/#Getting-started","page":"Regression - Boston","title":"Getting started","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"To begin, we will load the required packages and the dataset:","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"using EvoTrees\nusing MLDatasets\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random\n\ndf = MLDatasets.BostonHousing().dataframe","category":"page"},{"location":"tutorials/regression-boston/#Preprocessing","page":"Regression - Boston","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Before we can train our model, we need to preprocess the dataset. We will split our data according to train and eval indices, and separate features from the target variable.","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Random.seed!(123)\n\ntrain_ratio = 0.8\ntrain_indices = randperm(nrow(df))[1:Int(round(train_ratio * nrow(df)))]\n\ntrain_data = df[train_indices, :]\neval_data = df[setdiff(1:nrow(df), train_indices), :]\n\nx_train, y_train = Matrix(train_data[:, Not(:MEDV)]), train_data[:, :MEDV]\nx_eval, y_eval = Matrix(eval_data[:, Not(:MEDV)]), eval_data[:, :MEDV]","category":"page"},{"location":"tutorials/regression-boston/#Training","page":"Regression - Boston","title":"Training","text":"","category":"section"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Now we are ready to train our model. We will first define a model configuration using the EvoTreeRegressor model constructor.  
Then, we'll use fit_evotree to train a boosted tree model. We'll pass optional x_eval and y_eval arguments, which enable early stopping. ","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"config = EvoTreeRegressor(\n    nrounds=200, \n    eta=0.1, \n    max_depth=4, \n    lambda=0.1, \n    rowsample=0.9, \n    colsample=0.9)\n\nmodel = fit_evotree(config;\n    x_train, y_train,\n    x_eval, y_eval,\n    metric = :mse,\n    early_stopping_rounds=10,\n    print_every_n=10)","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"Finally, we can get predictions by passing training and evaluation data to our model. We can then apply various evaluation metrics, such as the MAE (mean absolute error):  ","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"pred_train = model(x_train)\npred_eval = model(x_eval)","category":"page"},{"location":"tutorials/regression-boston/","page":"Regression - Boston","title":"Regression - Boston","text":"julia> mean(abs.(pred_train .- y_train))\n1.056997874224627\n\njulia> mean(abs.(pred_eval .- y_eval))\n2.3298767665825264","category":"page"},{"location":"models/#Models","page":"Models","title":"Models","text":"","category":"section"},{"location":"models/#EvoTreeRegressor","page":"Models","title":"EvoTreeRegressor","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeRegressor","category":"page"},{"location":"models/#EvoTrees.EvoTreeRegressor","page":"Models","title":"EvoTrees.EvoTreeRegressor","text":"EvoTreeRegressor(;kwargs...)\n\nA model type for constructing a EvoTreeRegressor, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface.\n\nHyper-parameters\n\nloss=:mse:         Loss to be minimized during training. 
One of:\n:mse\n:logloss\n:gamma\n:tweedie\n:quantile\n:l1\nnrounds=10:           Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds but typically improves model performance.   \nlambda::T=0.0:        L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:         Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nalpha::T=0.5:         Loss specific parameter in the [0, 1] range:                           - :quantile: target quantile for the regression.                           - :l1: weighting parameters to positive vs negative residuals.                                 - Positive residual weights = alpha                                 - Negative residual weights = (1 - alpha)\nmax_depth=5:          Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=1.0:       Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:        Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:        Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:             Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. 
Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  Only :linear, :logistic, :gamma and :tweedie losses are supported at the moment.\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:              Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeRegressor(loss=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, 1]:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ Interface\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees\n\nDo model = EvoTreeRegressor() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeRegressor(loss=...).\n\nTraining model\n\nIn MLJ or MLJBase, bind an instance model to data with     mach = machine(model, X, y) where\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Continuous; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): return predictions of the target given features Xnew having the same scitype as X above. Predictions are deterministic.\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeRegressor(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees\nmodel = EvoTreeRegressor(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_boston\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeClassifier","page":"Models","title":"EvoTreeClassifier","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeClassifier","category":"page"},{"location":"models/#EvoTrees.EvoTreeClassifier","page":"Models","title":"EvoTrees.EvoTreeClassifier","text":"EvoTreeClassifier(;kwargs...)\n\nA model type for constructing a EvoTreeClassifier, based on EvoTrees.jl, and implementing both an internal API and 
the MLJ model interface. EvoTreeClassifier is used to perform multi-class classification, using cross-entropy loss.\n\nHyper-parameters\n\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds but typically improves model performance.  \nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=1.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  
\noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeClassifier(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, K] where K is the number of classes:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees\n\nDo model = EvoTreeClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeClassifier(max_depth=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with\n\nmach = machine(model, X, y)\n\nwhere\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Multiclass or <:OrderedFactor; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): return predictions of the target given features Xnew having the same scitype as X above. 
Predictions are probabilistic.\npredict_mode(mach, Xnew): returns the mode of each of the predictions above.\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeClassifier(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(1:3, nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees\nmodel = EvoTreeClassifier(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_iris\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mode(mach, X)\n\nSee also EvoTrees.jl.\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeCount","page":"Models","title":"EvoTreeCount","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeCount","category":"page"},{"location":"models/#EvoTrees.EvoTreeCount","page":"Models","title":"EvoTrees.EvoTreeCount","text":"EvoTreeCount(;kwargs...)\n\nA model type for constructing a EvoTreeCount, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface. EvoTreeCount is used to perform Poisson probabilistic regression on a count target.\n\nHyper-parameters\n\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds but typically improves model performance.  \nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. 
Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=1.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeCount() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeCount(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, 1]:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeCount = @load EvoTreeCount pkg=EvoTrees\n\nDo model = EvoTreeCount() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeCount(max_depth=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with mach = machine(model, X, y) where\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Count; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): returns a vector of Poisson distributions given features Xnew having the same scitype as X above. 
Predictions are probabilistic.\n\nSpecific metrics can also be predicted using:\n\npredict_mean(mach, Xnew)\npredict_mode(mach, Xnew)\npredict_median(mach, Xnew)\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeCount(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(0:2, nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeCount = @load EvoTreeCount pkg=EvoTrees\nmodel = EvoTreeCount(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nX, y = randn(nobs, nfeats), rand(0:2, nobs)\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mean(mach, X)\npreds = predict_mode(mach, X)\npreds = predict_median(mach, X)\n\n\nSee also EvoTrees.jl.\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeMLE","page":"Models","title":"EvoTreeMLE","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeMLE","category":"page"},{"location":"models/#EvoTrees.EvoTreeMLE","page":"Models","title":"EvoTrees.EvoTreeMLE","text":"EvoTreeMLE(;kwargs...)\n\nA model type for constructing a EvoTreeMLE, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface. EvoTreeMLE performs maximum likelihood estimation. The assumed distribution is specified through the loss kwarg. Both Gaussian and Logistic distributions are supported.\n\nHyper-parameters\n\nloss=:gaussian:         Loss to be minimized during training. One of:\n\n:gaussian / :gaussian_mle\n:logistic / :logistic_mle\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. 
Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0.\n\nA lower eta results in slower learning, requiring a higher nrounds but typically improves model performance.  \n\nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=8.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for MLE regression, constraints may not be enforced systematically.\ntree_type=\"binary\"          Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. 
Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeMLE() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeMLE(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, nparams] where the second dimension refers to μ & σ for Normal/Gaussian and μ & s for Logistic.\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees\n\nDo model = EvoTreeMLE() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeMLE(loss=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with\n\nmach = machine(model, X, y)\n\nwhere\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Continuous; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): returns a vector of Gaussian or Logistic distributions (according to provided loss) given features Xnew having the same scitype as X above.\n\nPredictions are probabilistic.\n\nSpecific metrics can also be predicted using:\n\npredict_mean(mach, Xnew)\npredict_mode(mach, Xnew)\npredict_median(mach, Xnew)\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nconfig = EvoTreeMLE(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(nobs)\nmodel = fit_evotree(config; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees\nmodel = EvoTreeMLE(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_boston\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mean(mach, X)\npreds = predict_mode(mach, X)\npreds = predict_median(mach, X)\n\n\n\n\n\n","category":"type"},{"location":"models/#EvoTreeGaussian","page":"Models","title":"EvoTreeGaussian","text":"","category":"section"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeGaussian is to be deprecated. 
Please use EvoTreeMLE with loss = :gaussian_mle. ","category":"page"},{"location":"models/","page":"Models","title":"Models","text":"EvoTreeGaussian","category":"page"},{"location":"models/#EvoTrees.EvoTreeGaussian","page":"Models","title":"EvoTrees.EvoTreeGaussian","text":"EvoTreeGaussian(;kwargs...)\n\nA model type for constructing a EvoTreeGaussian, based on EvoTrees.jl, and implementing both an internal API and the MLJ model interface. EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ and σ parameters to maximize likelihood.\n\nHyper-parameters\n\nnrounds=10:                 Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.\neta=0.1:              Learning rate. Each tree's raw predictions are scaled by eta prior to being added to the stack of predictions. Must be > 0. A lower eta results in slower learning, requiring a higher nrounds but typically improves model performance.  \nlambda::T=0.0:              L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.\ngamma::T=0.0:               Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.\nmax_depth=5:                Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf. A complete tree of depth N contains 2^(N - 1) terminal leaves and 2^(N - 1) - 1 split nodes. Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.\nmin_weight=8.0:             Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the weights vector. Must be > 0.\nrowsample=1.0:              Proportion of rows that are sampled at each iteration to build the tree. Should be in ]0, 1].\ncolsample=1.0:              Proportion of columns / features that are sampled at each iteration to build the tree. 
Should be in ]0, 1].\nnbins=32:                   Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.\nmonotone_constraints=Dict{Int, Int}(): Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).  !Experimental feature: note that for Gaussian regression, constraints may not be enforced systematically.\ntree_type=\"binary\"    Tree structure to be used. One of:\nbinary:       Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see gamma) stops further node splits.  \noblivious:    A common splitting condition is imposed on all nodes of a given depth. \nrng=123:                    Either an integer used as a seed to the random number generator or an actual random number generator (::Random.AbstractRNG).\n\nInternal API\n\nDo config = EvoTreeGaussian() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeGaussian(max_depth=...).\n\nTraining model\n\nA model is built using fit_evotree:\n\nmodel = fit_evotree(config; x_train, y_train, kwargs...)\n\nInference\n\nPredictions are obtained using predict which returns a Matrix of size [nobs, 2] where the second dimension refers to μ and σ respectively:\n\nEvoTrees.predict(model, X)\n\nAlternatively, models act as a functor, returning predictions when called as a function with features as argument:\n\nmodel(X)\n\nMLJ\n\nFrom MLJ, the type can be imported using:\n\nEvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees\n\nDo model = EvoTreeGaussian() to construct an instance with default hyper-parameters. 
Provide keyword arguments to override hyper-parameter defaults, as in EvoTreeGaussian(max_depth=...).\n\nTraining data\n\nIn MLJ or MLJBase, bind an instance model to data with\n\nmach = machine(model, X, y)\n\nwhere\n\nX: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)\ny: is the target, which can be any AbstractVector whose element scitype is <:Continuous; check the scitype with scitype(y)\n\nTrain the machine using fit!(mach, rows=...).\n\nOperations\n\npredict(mach, Xnew): returns a vector of Gaussian distributions given features Xnew having the same scitype as X above.\n\nPredictions are probabilistic.\n\nSpecific metrics can also be predicted using:\n\npredict_mean(mach, Xnew)\npredict_mode(mach, Xnew)\npredict_median(mach, Xnew)\n\nFitted parameters\n\nThe fields of fitted_params(mach) are:\n\n:fitresult: The GBTree object returned by EvoTrees.jl fitting algorithm.\n\nReport\n\nThe fields of report(mach) are:\n\n:features: The names of the features encountered in training.\n\nExamples\n\n# Internal API\nusing EvoTrees\nparams = EvoTreeGaussian(max_depth=5, nbins=32, nrounds=100)\nnobs, nfeats = 1_000, 5\nx_train, y_train = randn(nobs, nfeats), rand(nobs)\nmodel = fit_evotree(params; x_train, y_train)\npreds = EvoTrees.predict(model, x_train)\n\n# MLJ Interface\nusing MLJ\nEvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees\nmodel = EvoTreeGaussian(max_depth=5, nbins=32, nrounds=100)\nX, y = @load_boston\nmach = machine(model, X, y) |> fit!\npreds = predict(mach, X)\npreds = predict_mean(mach, X)\npreds = predict_mode(mach, X)\npreds = predict_median(mach, X)\n\n\n\n\n\n","category":"type"},{"location":"tutorials/classification-iris/#Classification-on-Iris-dataset","page":"Classification - IRIS","title":"Classification on Iris 
dataset","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"We will use the iris dataset, which is included in the MLDatasets package. This dataset consists of measurements of the sepal length, sepal width, petal length, and petal width for three different types of iris flowers: Setosa, Versicolor, and Virginica.","category":"page"},{"location":"tutorials/classification-iris/#Getting-started","page":"Classification - IRIS","title":"Getting started","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"To begin, we will load the required packages and the dataset:","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"using EvoTrees\nusing MLDatasets\nusing DataFrames\nusing Statistics: mean\nusing CategoricalArrays\nusing Random\n\ndf = MLDatasets.Iris().dataframe","category":"page"},{"location":"tutorials/classification-iris/#Preprocessing","page":"Classification - IRIS","title":"Preprocessing","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Before we can train our model, we need to preprocess the dataset. 
We will convert the class variable, which specifies the type of iris flower, into a categorical variable. We will then split the data into training and evaluation sets.","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Random.seed!(123)\n\ndf[!, :class] = categorical(df[!, :class])\ntarget_name = \"class\"\nfnames = setdiff(names(df), [target_name])\n\ntrain_ratio = 0.8\ntrain_indices = randperm(nrow(df))[1:Int(train_ratio * nrow(df))]\n\ndtrain = df[train_indices, :]\ndeval = df[setdiff(1:nrow(df), train_indices), :]","category":"page"},{"location":"tutorials/classification-iris/#Training","page":"Classification - IRIS","title":"Training","text":"","category":"section"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Now we are ready to train our model. We will first define a model configuration using the EvoTreeClassifier model constructor.  Then, we'll use fit_evotree to train a boosted tree model. We'll pass the optional deval argument, which enables early stopping. ","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"config = EvoTreeClassifier(\n    nrounds=200, \n    eta=0.05, \n    max_depth=5, \n    lambda=0.1, \n    rowsample=0.8, \n    colsample=0.8)\n\nmodel = fit_evotree(config, dtrain;\n    target_name,\n    fnames,\n    deval,\n    metric = :mlogloss,\n    early_stopping_rounds=10,\n    print_every_n=10)","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"Finally, we can get predictions by passing training and evaluation data to our model. We can then evaluate the accuracy of our model, which should be near 100% for this simple classification problem. 
","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"pred_train = model(x_train)\nidx_train = [findmax(row)[2] for row in eachrow(pred_train)]\n\npred_eval = model(x_eval)\nidx_eval = [findmax(row)[2] for row in eachrow(pred_eval)]","category":"page"},{"location":"tutorials/classification-iris/","page":"Classification - IRIS","title":"Classification - IRIS","text":"julia> mean(idx_train .== levelcode.(y_train))\n1.0\n\njulia> mean(idx_eval .== levelcode.(y_eval))\n0.9333333333333333","category":"page"},{"location":"tutorials/examples-MLJ/#MLJ-Integration","page":"MLJ API","title":"MLJ Integration","text":"","category":"section"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"EvoTrees.jl provides a first-class integration with the MLJ ecosystem. ","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"See official project page for more info.","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"To use with MLJ, an EvoTrees model configuration must first be initialized using either: ","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"EvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"The model is then passed to MLJ's machine, opening access to the rest of the MLJ modeling ecosystem. 
","category":"page"},{"location":"tutorials/examples-MLJ/","page":"MLJ API","title":"MLJ API","text":"using StatsBase: sample\nusing EvoTrees\nusing EvoTrees: sigmoid, logit # only needed to create the synthetic data below\nusing MLJBase\n\nfeatures = rand(10_000) .* 5 .- 2\nX = reshape(features, (size(features)[1], 1))\nY = sin.(features) .* 0.5 .+ 0.5\nY = logit(Y) + randn(size(Y))\nY = sigmoid(Y)\ny = Y\nX = MLJBase.table(X)\n\n# linear regression\ntree_model = EvoTreeRegressor(loss=:linear, max_depth=5, eta=0.05, nrounds=10)\n\n# set machine\nmach = machine(tree_model, X, y)\n\n# partition data\ntrain, test = partition(eachindex(y), 0.7, shuffle=true); # 70:30 split\n\n# fit data\nfit!(mach, rows=train, verbosity=1)\n\n# continue training\nmach.model.nrounds += 10\nfit!(mach, rows=train, verbosity=1)\n\n# predict on train data\npred_train = predict(mach, selectrows(X, train))\nmean(abs.(pred_train - selectrows(Y, train)))\n\n# predict on test data\npred_test = predict(mach, selectrows(X, test))\nmean(abs.(pred_test - selectrows(Y, test)))","category":"page"},{"location":"#[EvoTrees.jl](https://github.com/Evovest/EvoTrees.jl)","page":"Introduction","title":"EvoTrees.jl","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood. 
","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"See the examples-API section to get started using the internal API, or examples-MLJ to use within the MLJ framework.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Complete details about hyper-parameters are found in the Models section.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"R binding available.","category":"page"},{"location":"#Installation","page":"Introduction","title":"Installation","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"Latest:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"julia> Pkg.add(url=\"https://github.com/Evovest/EvoTrees.jl\")","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"From General Registry:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"julia> Pkg.add(\"EvoTrees\")","category":"page"},{"location":"#Quick-start","page":"Introduction","title":"Quick start","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"A model configuration must first be defined, using one of the model constructor: ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"EvoTreeRegressor\nEvoTreeClassifier\nEvoTreeCount\nEvoTreeMLE","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Then fitting can be performed using fit_evotree. 2 broad methods are supported: Matrix and Tables based inputs. Optional kwargs can be used to specify eval data on which to track eval metric and perform early stopping. 
Look at the docs for more details on available hyper-parameters for each of the above constructors and other options for training.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Predictions are obtained by passing feature data to the model. The model acts as a functor, i.e. it's a struct containing the fitted model as well as a function generating the prediction of that model for the features argument. ","category":"page"},{"location":"#Matrix-features-input","page":"Introduction","title":"Matrix features input","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"using EvoTrees\n\nconfig = EvoTreeRegressor(\n    loss=:mse, \n    nrounds=100, \n    max_depth=6,\n    nbins=32,\n    eta=0.1)\n\nx_train, y_train = rand(1_000, 10), rand(1_000)\nm = fit_evotree(config; x_train, y_train)\npreds = m(x_train)","category":"page"},{"location":"#DataFrames-and-Tables-input","page":"Introduction","title":"DataFrames and Tables input","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"When using a Tables compatible input such as DataFrames, features with element types Real (incl. Bool) and Categorical are automatically recognized as input features. Alternatively, the fnames kwarg can be used to specify the input features explicitly. ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Categorical features are treated accordingly by the algorithm. Ordered variables will be treated as numerical features, using the ≤ split rule, while unordered variables use ==. Support is currently limited to a maximum of 255 levels. 
Bool variables are treated as unordered, 2-level categorical variables.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"dtrain = DataFrame(x_train, :auto)\ndtrain.y .= y_train\nm = fit_evotree(config, dtrain; target_name=\"y\");\nm = fit_evotree(config, dtrain; target_name=\"y\", fnames=[\"x1\", \"x3\"]);","category":"page"},{"location":"#GPU-Acceleration","page":"Introduction","title":"GPU Acceleration","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"If running on a CUDA enabled machine, training and inference on GPU can be triggered through the device kwarg: ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"m = fit_evotree(config, dtrain; target_name=\"y\", device=\"gpu\");\np = m(dtrain; device=\"gpu\")","category":"page"},{"location":"#Reproducibility","page":"Introduction","title":"Reproducibility","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"EvoTrees models trained on cpu can be fully reproducible.","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Models of the gradient boosting family typically involve some stochasticity.  In EvoTrees, this primarily concerns the 2 subsampling parameters rowsample and colsample. The other stochastic operation happens at model initialisation when the features are binarized to allow for fast histogram construction: a random subsample of 1_000 * nbins is used to compute the breaking points. ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"These random parts of the algorithm can be deterministically reproduced on cpu by specifying an rng to the model constructor. rng can be an integer (e.g. 123) or a random generator (e.g. Random.Xoshiro(123)).  If no rng is specified, 123 is used by default. 
When an integer rng is used, a Random.MersenneTwister generator will be created by the EvoTrees constructor. Otherwise, the provided random generator will be used.  ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Consequently, the following m1 and m2 models will be identical:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"config = EvoTreeRegressor(rowsample=0.5, rng=123)\nm1 = fit_evotree(config, df; target_name=\"y\");\nconfig = EvoTreeRegressor(rowsample=0.5, rng=123)\nm2 = fit_evotree(config, df; target_name=\"y\");","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"However, the following m1 and m2 models won't be identical, because rowsample introduces stochasticity and the random generator in the config isn't reset between the fits:","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"config = EvoTreeRegressor(rowsample=0.5, rng=123)\nm1 = fit_evotree(config, df; target_name=\"y\");\nm2 = fit_evotree(config, df; target_name=\"y\");","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"Note that in the presence of multiple identical or very highly correlated features, a model may not be reproducible if features are permuted, since in situations where 2 features provide identical gains, the first one will be selected. Therefore, if the identity relationship doesn't hold on new data, different predictions will be returned from models trained on different feature orders. ","category":"page"},{"location":"","page":"Introduction","title":"Introduction","text":"At the moment, there's no reproducibility guarantee on GPU, although this may change in the future. 
","category":"page"},{"location":"#Save/Load","page":"Introduction","title":"Save/Load","text":"","category":"section"},{"location":"","page":"Introduction","title":"Introduction","text":"EvoTrees.save(m, \"data/model.bson\")\nm = EvoTrees.load(\"data/model.bson\");","category":"page"},{"location":"tutorials/examples-API/#Internal-API-examples","page":"Internal API","title":"Internal API examples","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"The following provides minimal examples of usage of the various loss functions available in EvoTrees using the internal API.","category":"page"},{"location":"tutorials/examples-API/#Regression","page":"Internal API","title":"Regression","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"Minimal example to fit a noisy sinus wave.","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"(Image: )","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"using EvoTrees\nusing EvoTrees: sigmoid, logit\nusing StatsBase: sample\n\n# prepare a dataset\nfeatures = rand(10000) .* 20 .- 10\nX = reshape(features, (size(features)[1], 1))\nY = sin.(features) .* 0.5 .+ 0.5\nY = logit(Y) + randn(size(Y))\nY = sigmoid(Y)\n𝑖 = collect(1:size(X, 1))\n\n# train-eval split\n𝑖_sample = sample(𝑖, size(𝑖, 1), replace = false)\ntrain_size = 0.8\n𝑖_train = 𝑖_sample[1:floor(Int, train_size * size(𝑖, 1))]\n𝑖_eval = 𝑖_sample[floor(Int, train_size * size(𝑖, 1))+1:end]\n\nx_train, x_eval = X[𝑖_train, :], X[𝑖_eval, :]\ny_train, y_eval = Y[𝑖_train], Y[𝑖_eval]\n\nconfig = EvoTreeRegressor(\n    loss=:mse,\n    nrounds=100, nbins = 100,\n    lambda = 0.5, gamma=0.1, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, 
metric=:mse, print_every_n=25)\npred_eval_linear = model(x_eval)\n\n# logistic / cross-entropy\nconfig = EvoTreeRegressor(\n    loss=:logistic,\n    nrounds=100, nbins = 100,\n    lambda = 0.5, gamma=0.1, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric=:logloss, print_every_n=25)\npred_eval_logistic = model(x_eval)\n\n# L1\nconfig = EvoTreeRegressor(\n    loss=:l1, alpha=0.5,\n    nrounds=100, nbins=100,\n    lambda = 0.5, gamma=0.0, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric=:mae, print_every_n=25)\npred_eval_L1 = model(x_eval)","category":"page"},{"location":"tutorials/examples-API/#Poisson-Count","page":"Internal API","title":"Poisson Count","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"# Poisson\nconfig = EvoTreeCount(\n    loss=:poisson,\n    nrounds=100, nbins=100,\n    lambda=0.5, gamma=0.1, eta=0.1,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :poisson, print_every_n = 25)\npred_eval_poisson = model(x_eval)","category":"page"},{"location":"tutorials/examples-API/#Quantile-Regression","page":"Internal API","title":"Quantile Regression","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"(Image: )","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"# q50\nconfig = EvoTreeRegressor(\n    loss=:quantile, alpha=0.5,\n    nrounds=200, nbins=100,\n    lambda=0.1, gamma=0.0, eta=0.05,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :quantile, print_every_n = 
25)\npred_train_q50 = model(x_train)\n\n# q20\nconfig = EvoTreeRegressor(\n    loss=:quantile, alpha=0.2,\n    nrounds=200, nbins=100,\n    lambda=0.1, gamma=0.0, eta=0.05,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :quantile, print_every_n = 25)\npred_train_q20 = model(x_train)\n\n# q80\nconfig = EvoTreeRegressor(\n    loss=:quantile, alpha=0.8,\n    nrounds=200, nbins=100,\n    lambda=0.1, gamma=0.0, eta=0.05,\n    max_depth=6, min_weight=1.0,\n    rowsample=0.5, colsample=1.0)\n\nmodel = fit_evotree(config; x_train, y_train, x_eval, y_eval, metric = :quantile, print_every_n = 25)\npred_train_q80 = model(x_train)","category":"page"},{"location":"tutorials/examples-API/#Gaussian-Max-Likelihood","page":"Internal API","title":"Gaussian Max Likelihood","text":"","category":"section"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"(Image: )","category":"page"},{"location":"tutorials/examples-API/","page":"Internal API","title":"Internal API","text":"config = EvoTreeMLE(\n    loss=:gaussian_mle,\n    nrounds=100, nbins=100,\n    lambda=0.0, gamma=0.0, eta=0.1,\n    max_depth=6, rowsample=0.5)","category":"page"}]
 }
diff --git a/dev/tutorials/classification-iris/index.html b/dev/tutorials/classification-iris/index.html
index ba297c38..21b6e8d1 100644
--- a/dev/tutorials/classification-iris/index.html
+++ b/dev/tutorials/classification-iris/index.html
@@ -38,4 +38,4 @@
 1.0
 
 julia&gt; mean(idx_eval .== levelcode.(y_eval))
-0.9333333333333333</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../logistic-regression-titanic/">« Logistic Regression - Titanic</a><a class="docs-footer-nextpage" href="../ranking-LTRC/">Ranking - Yahoo! LTRC »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+0.9333333333333333</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../logistic-regression-titanic/">« Logistic Regression - Titanic</a><a class="docs-footer-nextpage" href="../ranking-LTRC/">Ranking - Yahoo! LTRC »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/examples-API/index.html b/dev/tutorials/examples-API/index.html
index 54f0b959..5cd9289c 100644
--- a/dev/tutorials/examples-API/index.html
+++ b/dev/tutorials/examples-API/index.html
@@ -94,4 +94,4 @@
     loss=:gaussian_mle,
     nrounds=100, nbins=100,
     lambda=0.0, gamma=0.0, eta=0.1,
-    max_depth=6, rowsample=0.5)</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../ranking-LTRC/">« Ranking - Yahoo! LTRC</a><a class="docs-footer-nextpage" href="../examples-MLJ/">MLJ API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+    max_depth=6, rowsample=0.5)</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../ranking-LTRC/">« Ranking - Yahoo! LTRC</a><a class="docs-footer-nextpage" href="../examples-MLJ/">MLJ API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/examples-MLJ/index.html b/dev/tutorials/examples-MLJ/index.html
index 23758d05..3f5ba00a 100644
--- a/dev/tutorials/examples-MLJ/index.html
+++ b/dev/tutorials/examples-MLJ/index.html
@@ -34,4 +34,4 @@
 
 # predict on test data
 pred_test = predict(mach, selectrows(X, test))
-mean(abs.(pred_test - selectrows(Y, test)))</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../examples-API/">« Internal API</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+mean(abs.(pred_test - selectrows(Y, test)))</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../examples-API/">« Internal API</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/logistic-regression-titanic/index.html b/dev/tutorials/logistic-regression-titanic/index.html
index c8a3446a..ee2cf1fb 100644
--- a/dev/tutorials/logistic-regression-titanic/index.html
+++ b/dev/tutorials/logistic-regression-titanic/index.html
@@ -53,4 +53,4 @@
         &quot;Pclass&quot; =&gt; 0.11354283043193575
          &quot;SibSp&quot; =&gt; 0.05129209383816148
          &quot;Parch&quot; =&gt; 0.017385183317069588
- &quot;Age_ismissing&quot; =&gt; 0.013685310503669728</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../regression-boston/">« Regression - Boston</a><a class="docs-footer-nextpage" href="../classification-iris/">Classification - IRIS »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+ &quot;Age_ismissing&quot; =&gt; 0.013685310503669728</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../regression-boston/">« Regression - Boston</a><a class="docs-footer-nextpage" href="../classification-iris/">Classification - IRIS »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/ranking-LTRC/index.html b/dev/tutorials/ranking-LTRC/index.html
index 4165b27d..e27a1a61 100644
--- a/dev/tutorials/ranking-LTRC/index.html
+++ b/dev/tutorials/ranking-LTRC/index.html
@@ -1,5 +1,5 @@
 <!DOCTYPE html>
-<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>Ranking - Yahoo! LTRC · EvoTrees.jl</title><script data-outdated-warner src="../../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.045/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.13.24/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script src="../../assets/themeswap.js"></script><link href="../../assets/style.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../../"><img src="../../assets/logo.png" alt="EvoTrees.jl logo"/></a><form class="docs-search" action="../../search/"><input class="docs-search-query" id="documenter-search-query" name="q" type="text" placeholder="Search 
docs"/></form><ul class="docs-menu"><li><a class="tocitem" href="../../">Introduction</a></li><li><a class="tocitem" href="../../models/">Models</a></li><li><a class="tocitem" href="../../api/">API</a></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../regression-boston/">Regression - Boston</a></li><li><a class="tocitem" href="../logistic-regression-titanic/">Logistic Regression - Titanic</a></li><li><a class="tocitem" href="../classification-iris/">Classification - IRIS</a></li><li class="is-active"><a class="tocitem" href>Ranking - Yahoo! LTRC</a><ul class="internal"><li><a class="tocitem" href="#Getting-started"><span>Getting started</span></a></li><li><a class="tocitem" href="#Load-LIBSVM-format-data"><span>Load LIBSVM format data</span></a></li><li><a class="tocitem" href="#Preprocessing"><span>Preprocessing</span></a></li><li><a class="tocitem" href="#Training"><span>Training</span></a></li><li><a class="tocitem" href="#Model-evaluation"><span>Model evaluation</span></a></li><li><a class="tocitem" href="#Logistic-regression-alternative"><span>Logistic regression alternative</span></a></li><li><a class="tocitem" href="#Conclusion"><span>Conclusion</span></a></li></ul></li><li><a class="tocitem" href="../examples-API/">Internal API</a></li><li><a class="tocitem" href="../examples-MLJ/">MLJ API</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><nav class="breadcrumb"><ul class="is-hidden-mobile"><li><a class="is-disabled">Tutorials</a></li><li class="is-active"><a href>Ranking - Yahoo! LTRC</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Ranking - Yahoo! 
LTRC</a></li></ul></nav><div class="docs-right"><a class="docs-edit-link" href="https://github.com/Evovest/EvoTrees.jl/blob/main/docs/src/tutorials/ranking-LTRC.md" title="Edit on GitHub"><span class="docs-icon fab"></span><span class="docs-label is-hidden-touch">Edit on GitHub</span></a><a class="docs-settings-button fas fa-cog" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-sidebar-button fa fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a></div></header><article class="content" id="documenter-page"><h1 id="Ranking-with-Yahoo!-Learning-to-Rank-Challenge."><a class="docs-heading-anchor" href="#Ranking-with-Yahoo!-Learning-to-Rank-Challenge.">Ranking with Yahoo! Learning to Rank Challenge.</a><a id="Ranking-with-Yahoo!-Learning-to-Rank-Challenge.-1"></a><a class="docs-heading-anchor-permalink" href="#Ranking-with-Yahoo!-Learning-to-Rank-Challenge." title="Permalink"></a></h1><p>In this tutorial, we we walk through how a ranking task can be tackled using regular regression techniques without compromise on performance compared to specialized ranking learners.  The data used is from the <code>C14 - Yahoo! Learning to Rank Challenge</code>, which can be obtained following a request to <a href="https://webscope.sandbox.yahoo.com">https://webscope.sandbox.yahoo.com</a>.</p><h2 id="Getting-started"><a class="docs-heading-anchor" href="#Getting-started">Getting started</a><a id="Getting-started-1"></a><a class="docs-heading-anchor-permalink" href="#Getting-started" title="Permalink"></a></h2><p>To begin, we load the required packages:</p><pre><code class="language-julia hljs">using EvoTrees
+<html lang="en"><head><meta charset="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><title>Ranking - Yahoo! LTRC · EvoTrees.jl</title><script data-outdated-warner src="../../assets/warner.js"></script><link href="https://cdnjs.cloudflare.com/ajax/libs/lato-font/3.0.0/css/lato-font.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/juliamono/0.045/juliamono.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/fontawesome.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/solid.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/brands.min.css" rel="stylesheet" type="text/css"/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.13.24/katex.min.css" rel="stylesheet" type="text/css"/><script>documenterBaseURL="../.."</script><script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js" data-main="../../assets/documenter.js"></script><script src="../../siteinfo.js"></script><script src="../../../versions.js"></script><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-dark.css" data-theme-name="documenter-dark" data-theme-primary-dark/><link class="docs-theme-link" rel="stylesheet" type="text/css" href="../../assets/themes/documenter-light.css" data-theme-name="documenter-light" data-theme-primary/><script src="../../assets/themeswap.js"></script><link href="../../assets/style.css" rel="stylesheet" type="text/css"/></head><body><div id="documenter"><nav class="docs-sidebar"><a class="docs-logo" href="../../"><img src="../../assets/logo.png" alt="EvoTrees.jl logo"/></a><form class="docs-search" action="../../search/"><input class="docs-search-query" id="documenter-search-query" name="q" type="text" placeholder="Search 
docs"/></form><ul class="docs-menu"><li><a class="tocitem" href="../../">Introduction</a></li><li><a class="tocitem" href="../../models/">Models</a></li><li><a class="tocitem" href="../../api/">API</a></li><li><span class="tocitem">Tutorials</span><ul><li><a class="tocitem" href="../regression-boston/">Regression - Boston</a></li><li><a class="tocitem" href="../logistic-regression-titanic/">Logistic Regression - Titanic</a></li><li><a class="tocitem" href="../classification-iris/">Classification - IRIS</a></li><li class="is-active"><a class="tocitem" href>Ranking - Yahoo! LTRC</a><ul class="internal"><li><a class="tocitem" href="#Getting-started"><span>Getting started</span></a></li><li><a class="tocitem" href="#Load-LIBSVM-format-data"><span>Load LIBSVM format data</span></a></li><li><a class="tocitem" href="#Preprocessing"><span>Preprocessing</span></a></li><li><a class="tocitem" href="#Training"><span>Training</span></a></li><li><a class="tocitem" href="#Model-evaluation"><span>Model evaluation</span></a></li><li><a class="tocitem" href="#Logistic-regression-alternative"><span>Logistic regression alternative</span></a></li><li><a class="tocitem" href="#Conclusion"><span>Conclusion</span></a></li></ul></li><li><a class="tocitem" href="../examples-API/">Internal API</a></li><li><a class="tocitem" href="../examples-MLJ/">MLJ API</a></li></ul></li></ul><div class="docs-version-selector field has-addons"><div class="control"><span class="docs-label button is-static is-size-7">Version</span></div><div class="docs-selector control is-expanded"><div class="select is-fullwidth is-size-7"><select id="documenter-version-selector"></select></div></div></div></nav><div class="docs-main"><header class="docs-navbar"><nav class="breadcrumb"><ul class="is-hidden-mobile"><li><a class="is-disabled">Tutorials</a></li><li class="is-active"><a href>Ranking - Yahoo! LTRC</a></li></ul><ul class="is-hidden-tablet"><li class="is-active"><a href>Ranking - Yahoo! 
LTRC</a></li></ul></nav><div class="docs-right"><a class="docs-edit-link" href="https://github.com/Evovest/EvoTrees.jl/blob/main/docs/src/tutorials/ranking-LTRC.md" title="Edit on GitHub"><span class="docs-icon fab"></span><span class="docs-label is-hidden-touch">Edit on GitHub</span></a><a class="docs-settings-button fas fa-cog" id="documenter-settings-button" href="#" title="Settings"></a><a class="docs-sidebar-button fa fa-bars is-hidden-desktop" id="documenter-sidebar-button" href="#"></a></div></header><article class="content" id="documenter-page"><h1 id="Ranking-with-Yahoo!-Learning-to-Rank-Challenge."><a class="docs-heading-anchor" href="#Ranking-with-Yahoo!-Learning-to-Rank-Challenge.">Ranking with Yahoo! Learning to Rank Challenge.</a><a id="Ranking-with-Yahoo!-Learning-to-Rank-Challenge.-1"></a><a class="docs-heading-anchor-permalink" href="#Ranking-with-Yahoo!-Learning-to-Rank-Challenge." title="Permalink"></a></h1><p>In this tutorial, we present how a ranking task can be tackled using regular regression techniques without compromising performance compared to specialized ranking learners. The data used is from the <code>C14 - Yahoo! Learning to Rank Challenge</code>, which can be obtained following a request to <a href="https://webscope.sandbox.yahoo.com">https://webscope.sandbox.yahoo.com</a>.</p><h2 id="Getting-started"><a class="docs-heading-anchor" href="#Getting-started">Getting started</a><a id="Getting-started-1"></a><a class="docs-heading-anchor-permalink" href="#Getting-started" title="Permalink"></a></h2><p>To begin, we load the required packages:</p><pre><code class="language-julia hljs">using EvoTrees
 using DataFrames
 using Statistics: mean
 using CategoricalArrays
@@ -100,4 +100,4 @@
 @info &quot;ndcg_test LogLoss&quot; ndcg_test
 
 ┌ Info: ndcg_test LogLoss
-└   ndcg_test = 0.80267</code></pre><h2 id="Conclusion"><a class="docs-heading-anchor" href="#Conclusion">Conclusion</a><a id="Conclusion-1"></a><a class="docs-heading-anchor-permalink" href="#Conclusion" title="Permalink"></a></h2><p>We&#39;ve seen that a ranking problem can be efficiently handled with generic regression tasks, yet achieve comparable performance to specialized ranking loss functions. Below, we present the NDCG obtained from the above experiments along those presented by CatBoost&#39;s <a href="https://github.com/catboost/benchmarks/blob/master/ranking/Readme.md#4-results">benchmarks</a>.</p><table><tr><th style="text-align: right"><strong>Model</strong></th><th style="text-align: right"><strong>NDCG</strong></th></tr><tr><td style="text-align: right"><strong>EvoTrees - mse</strong></td><td style="text-align: right"><strong>0.80080</strong></td></tr><tr><td style="text-align: right"><strong>EvoTrees - logistic</strong></td><td style="text-align: right"><strong>0.80267</strong></td></tr><tr><td style="text-align: right">cat-rmse</td><td style="text-align: right">0.802115</td></tr><tr><td style="text-align: right">cat-query-rmse</td><td style="text-align: right">0.802229</td></tr><tr><td style="text-align: right">cat-pair-logit</td><td style="text-align: right">0.797318</td></tr><tr><td style="text-align: right">cat-pair-logit-pairwise</td><td style="text-align: right">0.790396</td></tr><tr><td style="text-align: right">cat-yeti-rank</td><td style="text-align: right">0.802972</td></tr><tr><td style="text-align: right">xgb-rmse</td><td style="text-align: right">0.798892</td></tr><tr><td style="text-align: right">xgb-pairwise</td><td style="text-align: right">0.800048</td></tr><tr><td style="text-align: right">xgb-lambdamart-ndcg</td><td style="text-align: right">0.800048</td></tr><tr><td style="text-align: right">lgb-rmse</td><td style="text-align: right">0.8013675</td></tr><tr><td style="text-align: right">lgb-pairwise</td><td style="text-align: 
right">0.801347</td></tr></table><p>It should be noted that the later results were not reproduced in the scope of current tutorial, so one should be careful about any claim of model superiority. The results from CatBoost&#39;s benchmarks were however already indicative of strong performance of non-specialized ranking loss functions, to which this tutorial brings further support. </p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../classification-iris/">« Classification - IRIS</a><a class="docs-footer-nextpage" href="../examples-API/">Internal API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+└   ndcg_test = 0.80267</code></pre><h2 id="Conclusion"><a class="docs-heading-anchor" href="#Conclusion">Conclusion</a><a id="Conclusion-1"></a><a class="docs-heading-anchor-permalink" href="#Conclusion" title="Permalink"></a></h2><p>We&#39;ve seen that a ranking problem can be efficiently handled with generic regression tasks, yet achieve comparable performance to specialized ranking loss functions. Below, we present the NDCG obtained from the above experiments along those published on CatBoost&#39;s <a href="https://github.com/catboost/benchmarks/blob/master/ranking/Readme.md#4-results">benchmarks</a>.</p><table><tr><th style="text-align: right"><strong>Model</strong></th><th style="text-align: right"><strong>NDCG</strong></th></tr><tr><td style="text-align: right"><strong>EvoTrees - mse</strong></td><td style="text-align: right"><strong>0.80080</strong></td></tr><tr><td style="text-align: right"><strong>EvoTrees - logistic</strong></td><td style="text-align: right"><strong>0.80267</strong></td></tr><tr><td style="text-align: right">cat-rmse</td><td style="text-align: right">0.802115</td></tr><tr><td style="text-align: right">cat-query-rmse</td><td style="text-align: right">0.802229</td></tr><tr><td style="text-align: right">cat-pair-logit</td><td style="text-align: right">0.797318</td></tr><tr><td style="text-align: right">cat-pair-logit-pairwise</td><td style="text-align: right">0.790396</td></tr><tr><td style="text-align: right">cat-yeti-rank</td><td style="text-align: right">0.802972</td></tr><tr><td style="text-align: right">xgb-rmse</td><td style="text-align: right">0.798892</td></tr><tr><td style="text-align: right">xgb-pairwise</td><td style="text-align: right">0.800048</td></tr><tr><td style="text-align: right">xgb-lambdamart-ndcg</td><td style="text-align: right">0.800048</td></tr><tr><td style="text-align: right">lgb-rmse</td><td style="text-align: right">0.8013675</td></tr><tr><td style="text-align: right">lgb-pairwise</td><td style="text-align: 
right">0.801347</td></tr></table><p>It should be noted that the later results were not reproduced in the scope of current tutorial, so one should be careful about any claim of model superiority. The results from CatBoost&#39;s benchmarks were however already indicative of strong performance of non-specialized ranking loss functions, to which this tutorial brings further support. </p></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../classification-iris/">« Classification - IRIS</a><a class="docs-footer-nextpage" href="../examples-API/">Internal API »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
diff --git a/dev/tutorials/regression-boston/index.html b/dev/tutorials/regression-boston/index.html
index 791c576d..a668f11f 100644
--- a/dev/tutorials/regression-boston/index.html
+++ b/dev/tutorials/regression-boston/index.html
@@ -33,4 +33,4 @@
 1.056997874224627
 
 julia&gt; mean(abs.(pred_eval .- y_eval))
-2.3298767665825264</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../api/">« API</a><a class="docs-footer-nextpage" href="../logistic-regression-titanic/">Logistic Regression - Titanic »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Monday 11 September 2023 00:43">Monday 11 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
+2.3298767665825264</code></pre></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../../api/">« API</a><a class="docs-footer-nextpage" href="../logistic-regression-titanic/">Logistic Regression - Titanic »</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.25 on <span class="colophon-date" title="Tuesday 12 September 2023 01:45">Tuesday 12 September 2023</span>. Using Julia version 1.6.7.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
