-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor convert_input
to Perform tasks via helper function
#3338
base: develop
Are you sure you want to change the base?
Changes from all commits
614b8f9
838af61
d5e8d24
f22b962
d884203
68d9516
5208b02
f570646
e479c46
d9e911b
ed581f7
63ac964
0f9ac13
b98617f
4b771d3
63f270f
dbb7a6d
74003d9
95fb810
2bcb7c4
fcae9bd
c8e8a02
766174f
d9074df
94c92f0
a578be2
293a68b
f7f6926
8f820b0
4548744
ffc0971
9384343
6dc63f7
04d7835
f5edb7f
3379def
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
#' Return new arrangement of database while adding code to deal with ensembles | ||
#' | ||
#' @param result list of results from the download function | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. PEcAn has a ton of different download functions, so you can't just say "the". Could you clarify which function or set of functions and, more importantly, how this list needs to be structured |
||
#' @param con database connection | ||
#' @param start_date start date of the data | ||
#' @param end_date end date of the data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What formats are acceptable for start and end date? Character string? Date object? POSIX object? |
||
#' @param write whether to write to the database | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clarify that I think you mean this needs to be a boolean/logical variable. Also this variable is in the Roxygen here but not in the function definition below |
||
#' @param overwrite Logical: If a file already exists, create a fresh copy? | ||
#' @param insert.new.file whether to insert a new file | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. clarify that this should be boolean/logical |
||
#' @param input.args input arguments obtained from the convert_input function | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is input.args a documented output of the convert_input function or an undocumented internal variable? Just thinking you might want to provide a bit more info about what info this argument actually contains and how it's organized or even basic things like what variable type it is (I'm guessing list?) |
||
#' @param machine machine information | ||
#' @param mimetype data product specific file format | ||
#' @param formatname format name of the data | ||
#' @param allow.conflicting.dates whether to allow conflicting dates | ||
#' @param ensemble ensemble id | ||
#' @param ensemble_name ensemble name | ||
#' @param existing.input existing input records | ||
#' @param existing.dbfile existing dbfile records | ||
#' @param input input records | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rather than continuing to provide line-by-line comments, I'll just leave the general comment that there's not enough info about most variables. Would be good to know what things are list vs data frames vs vectors vs scalars. Which things are logical vs stings vs numeric. etc. |
||
#' @return list of input and dbfile ids | ||
#' | ||
#' @author Betsy Cowdery, Michael Dietze, Ankur Desai, Tony Gardella, Luke Dramko | ||
|
||
add_database_entries <- function( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name of this function is very generic. We have LOTS of functions in PEcAn.DB that add db entries. Could you update the name to clarify what type of entries this function is adding? |
||
result, con, start_date, | ||
end_date, overwrite, | ||
insert.new.file, input.args, | ||
machine, mimetype, formatname, | ||
allow.conflicting.dates, ensemble, | ||
ensemble_name, existing.input, | ||
existing.dbfile, input) { | ||
# Setup newinput. This list will contain two variables: a vector of input IDs and a vector of DB IDs for each entry in result. | ||
# This list will be returned. | ||
newinput <- list(input.id = NULL, dbfile.id = NULL) # Blank vectors are null. | ||
|
||
for (i in 1:length(result)) { # Master for loop | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pretty sure seq_along would be better here |
||
id_not_added <- TRUE | ||
|
||
if (!is.null(existing.input) && nrow(existing.input[[i]]) > 0 && | ||
(existing.input[[i]]$start_date != start_date || existing.input[[i]]$end_date != end_date)) { | ||
# Updating record with new dates | ||
db.query( | ||
paste0( | ||
"UPDATE inputs SET start_date='", start_date, "', end_date='", end_date, | ||
"' WHERE id=", existing.input[[i]]$id | ||
), | ||
con | ||
) | ||
id_not_added <- FALSE | ||
|
||
# The overall structure of this loop has been set up so that exactly one input.id and one dbfile.id will be written to newinput every iteration. | ||
newinput$input.id <- c(newinput$input.id, existing.input[[i]]$id) | ||
newinput$dbfile.id <- c(newinput$dbfile.id, existing.dbfile[[i]]$id) | ||
} | ||
|
||
if (overwrite) { | ||
# A bit hacky, but need to make sure that all fields are updated to expected values (i.e., what they'd be if convert_input was creating a new record) | ||
if (!is.null(existing.input) && nrow(existing.input[[i]]) > 0) { | ||
db.query( | ||
paste0( | ||
"UPDATE dbfiles SET file_path='", dirname(result[[i]]$file[1]), | ||
"', file_name='", result[[i]]$dbfile.name[1], | ||
"' WHERE id=", existing.dbfile[[i]]$id | ||
), | ||
con | ||
) | ||
} | ||
|
||
if (!is.null(existing.dbfile) && nrow(existing.dbfile[[i]]) > 0) { | ||
db.query(paste0( | ||
"UPDATE dbfiles SET file_path='", dirname(result[[i]]$file[1]), | ||
"', file_name='", result[[i]]$dbfile.name[1], | ||
"' WHERE id=", existing.dbfile[[i]]$id | ||
), con) | ||
} | ||
} | ||
|
||
# If there is no ensemble then for each record there should be one parent | ||
# But when you have ensembles, all of the members have one parent !! | ||
parent.id <- if (is.numeric(ensemble)) { | ||
ifelse(is.null(input[[i]]), NA, input[[1]]$id) | ||
} else { | ||
ifelse(is.null(input[[i]]), NA, input[[i]]$id) | ||
} | ||
|
||
|
||
if ("newsite" %in% names(input.args) && !is.null(input.args[["newsite"]])) { | ||
site.id <- input.args$newsite | ||
} | ||
|
||
if (insert.new.file && id_not_added) { | ||
dbfile.id <- dbfile.insert( | ||
in.path = dirname(result[[i]]$file[1]), | ||
in.prefix = result[[i]]$dbfile.name[1], | ||
"Input", | ||
existing.input[[i]]$id, | ||
con, | ||
reuse = TRUE, | ||
hostname = machine$hostname | ||
) | ||
|
||
newinput$input.id <- c(newinput$input.id, existing.input[[i]]$id) | ||
newinput$dbfile.id <- c(newinput$dbfile.id, dbfile.id) | ||
} else if (id_not_added) { | ||
# This is to tell input.insert if we are writing ensembles | ||
# Why does it need it? Because it checks for inputs with the same time period, site, and machine | ||
# and if it returns something it does not insert anymore, but for ensembles, it needs to bypass this condition | ||
ens.flag <- if (!is.null(ensemble) || is.null(ensemble_name)) TRUE else FALSE | ||
|
||
new_entry <- dbfile.input.insert( | ||
in.path = dirname(result[[i]]$file[1]), | ||
in.prefix = result[[i]]$dbfile.name[1], | ||
siteid = site.id, | ||
startdate = start_date, | ||
enddate = end_date, | ||
mimetype = mimetype, | ||
formatname = formatname, | ||
parentid = parent.id, | ||
con = con, | ||
hostname = machine$hostname, | ||
allow.conflicting.dates = allow.conflicting.dates, | ||
ens = ens.flag | ||
) | ||
|
||
newinput$input.id <- c(newinput$input.id, new_entry$input.id) | ||
newinput$dbfile.id <- c(newinput$dbfile.id, new_entry$dbfile.id) | ||
} | ||
} # End for loop | ||
return(newinput) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
#' Function to check if result has empty or missing files | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. result of what? Need more context |
||
#' | ||
#' @param result A list of dataframes with file paths | ||
#' @param existing.input Existing input records | ||
#' @param existing.dbfile Existing dbfile records | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. expand param definitions |
||
#' @return A list of dataframes with file paths, a list of strings with the output file name, a list of existing input records, and a list of existing dbfile records | ||
#' | ||
#' @author Betsy Cowdery, Michael Dietze, Ankur Desai, Tony Gardella, Luke Dramko | ||
|
||
check_missing_files <- function(result, existing.input = NULL, existing.dbfile = NULL) { | ||
result_sizes <- purrr::map_dfr( | ||
result, | ||
~ dplyr::mutate( | ||
., | ||
file_size = purrr::map_dbl(file, file.size), | ||
missing = is.na(file_size), | ||
empty = file_size == 0 | ||
) | ||
) | ||
|
||
if (any(result_sizes$missing) || any(result_sizes$empty)) { | ||
PEcAn.logger::logger.severe( | ||
"Requested Processing produced empty files or Nonexistent files:\n", | ||
log_format_df(result_sizes[, c(1, 8, 9, 10)]), | ||
"\n Table of results printed above.", | ||
wrap = FALSE | ||
) | ||
} | ||
|
||
|
||
# Wrap in a list for consistent processing later | ||
if (is.data.frame(existing.input)) { | ||
existing.input <- list(existing.input) | ||
} | ||
|
||
if (is.data.frame(existing.dbfile)) { | ||
existing.dbfile <- list(existing.dbfile) | ||
} | ||
return(list(existing.input, existing.dbfile)) | ||
} | ||
|
||
log_format_df <- function(df) { | ||
formatted_df <- rbind(colnames(df), format(df)) | ||
formatted_text <- purrr::reduce(formatted_df, paste, sep = " ") | ||
paste(formatted_text, collapse = "\n") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't tell me anything about what this function does.