Revamp IO #72
    return ak_to_series(ds, backend, extract=extract)


def _merge(ind1, ind2, builder):
@jpivarski, is it expected that passing a builder as an argument makes this function not cacheable by njit?
I was thinking about building an offsets array instead, but we don't know how big the values array will be, since some IDs in ind2 may not exist in ind1 - we only have an upper bound. Perhaps a starts/stops pair would be fine for this.
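A minimal sketch of the starts/stops idea, under assumptions that are not confirmed anywhere in this thread: both ID arrays are sorted ascending, ind1 has no duplicate IDs, and the goal is to find, for each ID in ind1, the rows of ind2 carrying that ID. `merge_ranges` is a hypothetical name, and plain lists stand in for the NumPy arrays a real njit kernel would use. With sorted input the matches are contiguous, so no values array has to be sized at all:

```python
def merge_ranges(ind1, ind2):
    """For each ID in ind1, return the half-open range of positions in ind2
    that hold the same ID.

    Assumes (hypothetically) that both inputs are sorted ascending and that
    ind1 contains no duplicates.
    """
    n = len(ind1)
    starts = [0] * n
    stops = [0] * n
    j = 0
    for i in range(n):
        # Skip IDs in ind2 that are smaller than the current ind1 ID;
        # these are the IDs that do not exist in ind1.
        while j < len(ind2) and ind2[j] < ind1[i]:
            j += 1
        starts[i] = j
        # Consume the contiguous run of matching IDs.
        while j < len(ind2) and ind2[j] == ind1[i]:
            j += 1
        stops[i] = j
    return starts, stops


ind1 = [1, 2, 4]
ind2 = [0, 1, 1, 3, 4, 4, 4, 5]
starts, stops = merge_ranges(ind1, ind2)
print(starts, stops)  # [1, 3, 4] [3, 3, 7]
```

IDs 0, 3 and 5 in ind2 fall into the gaps between ranges, and ID 2 gets an empty range (start equals stop). Because the kernel only reads and writes flat integer arrays, it would take no builder argument; whether that is enough to make caching work here is untested.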
I've never tested the cacheability of Numba functions that take any kind of Awkward argument. (You're not talking about a closure, right? Numba closures have to include the objects as constants.) In principle, there's no reason why a function with an ArrayBuilder argument couldn't be cached, unless maybe it has some associated behavior that is not serializable. Behavior dicts are part of an Awkward argument's type.
In the only usage I have, the builder is pristine at the start of the function. The ak arrays I've passed haven't had any attributes or behaviours. The specific message is:
NumbaWarning: Cannot cache compiled function "_merge" as it uses dynamic globals (such as ctypes pointers and large global arrays)
The dask readers and future cuDF readers will not fit into the refactoring here and need separate loading functions.