ENH: make aggregate allow for arbitrary functions from the .json #39
Comments
One thing to keep in mind here is performance. If an R function is applied to every group (and there can be a great many groups in these datasets, with 1e4-1e5 patients and hourly data), the milliseconds spent on each R function call add up considerably. One trick data.table uses here is to avoid calling back into R for these functions and to use the C API instead, iirc. At some point I did play around with this, and from what I remember, arbitrary R functions were barely usable (loading a couple of concepts would suddenly take ~30 mins or more).
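To make the performance point concrete, here is a minimal sketch, assuming the data.table package is installed. The table, its column names and the helper `agg` are made up for illustration and are not taken from ricu. It compares a plain call to `median()`, which data.table can optimise internally, with the same computation hidden behind a user-defined function that has to be evaluated in R once per group.

```r
library(data.table)

set.seed(1)
# Toy stand-in for hourly data: 1e4 "patients" with 24 rows each
dt <- data.table(id = rep(seq_len(1e4), each = 24L), val = rnorm(24e4))

# A plain call to median() can be optimised internally by data.table
t_fast <- system.time(res_fast <- dt[, .(val = median(val)), by = id])

# Wrapping the same computation in a user-defined function hides it from
# that optimisation, so the function is evaluated in R once per group
agg <- function(x) median(x)
t_slow <- system.time(res_slow <- dt[, .(val = agg(val)), by = id])

# Both variants give the same result; only the run time differs
stopifnot(isTRUE(all.equal(res_fast, res_slow)))
print(c(fast = t_fast[["elapsed"]], slow = t_slow[["elapsed"]]))
```

The actual gap depends on the machine, the number of groups and the function, so no timings are quoted here.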
Note that
as suggested by @prockenschaub here works and finishes in very reasonable time.
Ultimately, I guess it depends on the size of the table, the number of groups, and the complexity of the function. However, I don't think the fact that it won't work for some cases is a good argument against allowing arbitrary functions. It is a good point, though, and will need to be considered by the person creating the concept. If it finishes in reasonable time, fine. If it takes too long, another option needs to be found. By forwarding
@mlondschien This might work well for
I'm not against this, I just wanted to point out why I did not do something that is a pretty obvious thing one could do: I was worried about user experience. Someone will put in their own function for a time-varying concept and then things stop working without it being clear why. We could add a warning if a non
@prockenschaub As a side remark: above, when saying non
My current version of
"caregiver": {
  "category": "misc",
  "aggregate": "unique",
  "sources": {
    "mimic": [
      {
        "table": "chartevents",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "datetimeevents",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "inputevents_cv",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "inputevents_mv",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "noteevents",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "outputevents",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "procedureevents_mv",
        "val_var": "cgid",
        "class": "col_itm",
        "target": "ts_tbl"
      }
    ],
    "miiv": [
      {
        "table": "chartevents",
        "val_var": "caregiver_id",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "datetimeevents",
        "val_var": "caregiver_id",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "ingredientevents",
        "val_var": "caregiver_id",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "inputevents",
        "val_var": "caregiver_id",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "outputevents",
        "val_var": "caregiver_id",
        "class": "col_itm",
        "target": "ts_tbl"
      },
      {
        "table": "procedureevents",
        "val_var": "caregiver_id",
        "class": "col_itm",
        "target": "ts_tbl"
      }
    ]
  }
},
I load this as follows
caregiver = ricu::load_concepts("caregiver", src, aggregate = unique)
It's not super fast, but it works.
Problem
concept-dict.json allows specifying a standard aggregation for concepts. However, this currently only works for functions that are known to dt_gforce, as aggregation functions specified as strings are passed directly on to dt_gforce (see also #36).
Solution
We could simply check for any function that is known to dt_gforce and pass those on. If we get another function, we try to parse it.
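The proposed check could look roughly like the base R sketch below. This is not ricu's implementation: `resolve_aggregate()` and `gforce_funs` are hypothetical names, and the vector of function names is only an illustrative guess at what dt_gforce accepts. Known names are passed through unchanged as strings, anything else is parsed and evaluated, and the result has to be a function.

```r
# Illustrative guess at the names dt_gforce knows (assumption, not from ricu)
gforce_funs <- c("mean", "median", "min", "max", "sum", "prod",
                 "var", "sd", "first", "last")

# Hypothetical helper: turn the "aggregate" entry of a concept into either
# a string for dt_gforce or an R function to apply per group
resolve_aggregate <- function(agg) {
  if (is.function(agg)) return(agg)
  stopifnot(is.character(agg), length(agg) == 1L)

  # Known functions are forwarded unchanged as strings
  if (agg %in% gforce_funs) return(agg)

  # Anything else is parsed; note that this evaluates code from the .json
  fun <- eval(parse(text = agg), envir = globalenv())
  if (!is.function(fun)) {
    stop("`", agg, "` does not evaluate to a function")
  }
  fun
}

resolve_aggregate("median")  # stays a string
resolve_aggregate("unique")  # becomes base::unique
n_distinct <- resolve_aggregate("function(x) length(unique(x))")
n_distinct(c(1, 1, 2))
```

Because the fallback evaluates arbitrary code from the dictionary file, it should only be used with concept dictionaries from a trusted source. This would also be a natural place for the warning about slow, non-optimised functions discussed in the comments.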