[R] Add training metrics monitoring for xgboost() #10747

Merged (3 commits) on Nov 7, 2024
10 changes: 9 additions & 1 deletion R-package/R/xgboost.R
@@ -806,6 +806,8 @@ process.x.and.col.args <- function(
 #' If not `NULL`, should be passed as a numeric vector with length matching to the number of rows in `x`.
 #' @param verbosity Verbosity of printing messages. Valid values of 0 (silent), 1 (warning),
 #' 2 (info), and 3 (debug).
+#' @param monitor_training Whether to monitor objective optimization progress on the input data.
+#' Note that the same 'x' and 'y' data are used for both model fitting and evaluation.
 #' @param nthreads Number of parallel threads to use. If passing zero, will use all CPU threads.
 #' @param seed Seed to use for random number generation. If passing `NULL`, will draw a random
 #' number using R's PRNG system to use as seed.
@@ -892,6 +894,7 @@ xgboost <- function(
   nrounds = 100L,
   weights = NULL,
   verbosity = 0L,
+  monitor_training = verbosity > 0,
   nthreads = parallel::detectCores(),
   seed = 0L,
   monotone_constraints = NULL,
@@ -929,11 +932,16 @@

   fn_dm <- if (use_qdm) xgb.QuantileDMatrix else xgb.DMatrix
   dm <- do.call(fn_dm, lst_args$dmatrix_args)
+  evals <- list()

Member:
Hmm, I missed this the last time. How does one use validation with this X/y interface?

Contributor Author:
There's no validation. This is just monitoring on the same training data.

I guess evaluation on a different dataset could be added later, but it would be more involved - for example, it would need to re-encode categorical columns in the evaluation data using the same categories as in 'x', check whether the classes in 'y' match, and so on.
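
For reference, a user who wants a proper validation set today can drop down to xgb.train() and prepare both DMatrix objects themselves. A rough sketch (X_train / X_valid, y_train / y_valid, and the binary objective here are placeholders, not part of this PR):

```r
library(xgboost)

# The caller is responsible for encoding X_train and X_valid consistently
dtrain <- xgb.DMatrix(data = X_train, label = y_train)
dvalid <- xgb.DMatrix(data = X_valid, label = y_valid)

# Metrics are reported each round for every entry in 'evals'
model <- xgb.train(
  params = list(objective = "binary:logistic"),
  data = dtrain,
  nrounds = 100L,
  evals = list(train = dtrain, valid = dvalid)
)
```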

Member:
Thank you for the explanation. It might be necessary though, considering that validation is pretty much the only effective way of keeping ML training in check. I doubt there's anyone training ML models outside of DL who doesn't use validation during training.

Contributor Author:
There's xgb.cv though. There could be an xgboost.cv (or cv.xgboost) to do it better, or something like an 'eval_data_fraction' parameter (better yet if implemented in the core library), but I think those could be added later, after the CRAN release blockers are addressed.
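
As a rough illustration of the xgb.cv route that already exists (the agaricus data bundled with the package is used purely as an example):

```r
library(xgboost)
data(agaricus.train, package = "xgboost")

dtrain <- xgb.DMatrix(
  data = agaricus.train$data,
  label = agaricus.train$label
)

# 5-fold cross-validation, reporting per-round metrics on the held-out folds
cv <- xgb.cv(
  params = list(objective = "binary:logistic"),
  data = dtrain,
  nrounds = 10L,
  nfold = 5L,
  verbose = TRUE
)
```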

+  if (monitor_training) {
+    evals <- list(train = dm)
+  }
   model <- xgb.train(
     params = params,
     data = dm,
     nrounds = nrounds,
-    verbose = verbosity
+    verbose = verbosity,
+    evals = evals
   )
   attributes(model)$metadata <- lst_args$metadata
   attributes(model)$call <- match.call()
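
Taken together, the change lets the high-level interface print per-round training metrics. A minimal sketch of the resulting usage (mtcars is a stand-in dataset, not from the PR):

```r
library(xgboost)

# monitor_training defaults to (verbosity > 0); passing it explicitly also works
model <- xgboost(
  x = mtcars[, -1],
  y = mtcars$mpg,
  nrounds = 5L,
  monitor_training = TRUE
)
# Expected output: one evaluation line per boosting round, e.g. "[1] train-rmse:..."
```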
4 changes: 4 additions & 0 deletions R-package/man/xgboost.Rd
