PERFORMANCE: Search large X:s for globals is slow [SOLVED] #13

Open
HenrikBengtsson opened this issue Jun 21, 2018 · 3 comments

HenrikBengtsson commented Jun 21, 2018

In future.apply 1.0.0, future_lapply(X, ...) also searches X for possible globals (Issue #12). For long X:s this introduces significant overhead, especially when X contains no globals and there was no need to search it in the first place. For example,

```r
library(future.apply)

# A long list whose elements (all NULL) contain no globals
X <- vector("list", length = 100e3)
y <- future_lapply(X, FUN = identity)
```

All of the slowness comes from an internal call:

```r
gp <- future::getGlobalsAndPackages(X, globals = TRUE)
```

Tracing through the code, this call is slow because

```r
names <- globals::findGlobals(X)
```

is slow, which in turn is because it effectively does:

```r
names <- lapply(X, FUN = globals::findGlobals)
```
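The slow path above amounts to calling the search function once per element of X, even for elements that cannot possibly reference globals. Below is a minimal base-R sketch of one way to avoid that, assuming that `NULL` and plain atomic elements never carry code. `may_have_globals()` and `find_globals_in_X()` are hypothetical helpers made up for this illustration, not functions in globals or future.apply, and this is not necessarily how futureverse/globals@566e3e9 solves it.

```r
# Hypothetical helper (not part of globals or future.apply): a cheap test of
# whether an element of X could hold code that references globals.
may_have_globals <- function(x) {
  # Assumption: NULL and plain atomic vectors carry no code, hence no globals
  if (is.null(x) || is.atomic(x)) return(FALSE)
  # Functions, expressions, lists, data frames, ... must still be searched
  TRUE
}

# Hypothetical wrapper: run the (expensive) `search` function, e.g.
# globals::findGlobals, only on elements that pass the cheap pre-filter.
find_globals_in_X <- function(X, search) {
  keep <- vapply(X, FUN = may_have_globals, FUN.VALUE = logical(1))
  names <- lapply(X[keep], FUN = search)
  as.character(unique(unlist(names, use.names = FALSE)))
}
```

For the 100e3 `NULL` elements in the example above, `find_globals_in_X(X, search = globals::findGlobals)` would never call the search function. Lists are still searched in full, so a shortcut like this does not help for long lists of lists.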

We might be able to speed up globals::findGlobals() a bit here, but I don't know by how much. [UPDATE 2018-06-20]: There was low-hanging fruit in the globals package that made it possible to speed this up substantially, cf. futureverse/globals@566e3e9. I'll run revdep checks on globals (first- and second-generation dependencies) to make sure this doesn't break anything. If all is OK, the need for working around this in future.apply is much smaller.

Regardless, there may be a need for an argument controlling whether X is searched for globals, especially since in most use cases X likely contains no globals.
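The proposed argument could look roughly like the following base-R sketch. Everything here is hypothetical: `collect_globals()`, `search_X`, and `find_globals` are names invented for this illustration and are not arguments of future_lapply() or functions in globals. The point is only that FUN is always searched, while the per-element search of X becomes something the caller can turn off.

```r
# Hypothetical sketch: `collect_globals`, `search_X` and `find_globals` are
# illustrative names, not part of future.apply or globals.
collect_globals <- function(X, FUN, search_X = TRUE,
                            find_globals = function(expr) character(0)) {
  # Globals referenced by FUN itself must always be collected
  names <- find_globals(FUN)
  if (search_X) {
    # Default behavior: also search every element of X (cost grows with length(X))
    names_X <- lapply(X, FUN = find_globals)
    names <- c(names, unlist(names_X, use.names = FALSE))
  }
  unique(names)
}
```

With `search_X = FALSE`, the cost of the search no longer grows with `length(X)`. The trade-off is that globals hidden inside elements of X (Issue #12) would be missed, which is why the sketch keeps `TRUE` as the default.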

HenrikBengtsson added a commit to futureverse/globals that referenced this issue Jun 21, 2018
HenrikBengtsson changed the title from "PERFORMANCE: Search large X:s for globals is slow" to "PERFORMANCE: Search large X:s for globals is slow [SOLVED]" Jun 21, 2018
HenrikBengtsson (collaborator, author) commented:

I've updated globals::findGlobals() to be much faster in these, most common, cases (*). I've verified that it does not break any of the 29 reverse package dependencies (first and second generation). Install the new version with:

```r
remotes::install_github("HenrikBengtsson/globals@develop")
```

(*) There will still be cases where this improvement won't help, e.g. long lists whose elements are in turn lists.

DavisVaughan commented Jun 21, 2018

This is definitely helpful. I was just testing iteration over rsample cross-validation objects (lists containing data frames and other structures nested deep inside). Before the fix, searching X added 7-8 seconds to the total time (which was only 13 seconds to begin with). After the fix, the added time is negligible.

Is there a timeline for an updated globals release on CRAN?

HenrikBengtsson (collaborator, author) commented:

globals 0.12.1 hit CRAN ~10 hours ago.
