Revamp IO #72

Merged
merged 9 commits into from
Jul 26, 2024

Conversation

martindurant
Member

@martindurant martindurant commented Jul 25, 2024

  • Make IO functions work with multiple backends
  • add AVRO reader
  • add AVRO, JSON and Parquet schema-getters
  • add DB table merge

The dask readers and future cuDF readers will not fit into the refactoring here and need separate loading functions.

return ak_to_series(ds, backend, extract=extract)


def _merge(ind1, ind2, builder):
Member Author

@jpivarski, is it expected that including a builder should make this not cacheable by njit?

I was thinking about making an offsets array, but we don't know how big the values array will be, as some IDs in ind2 may not exist in ind1 - we only have an upper bound. Perhaps a starts/stops pair would be fine for this.
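To make the starts/stops idea concrete, here is a minimal sketch in plain Python. It is not the `_merge` from this PR: `merge_ranges` is a hypothetical helper, and it assumes `ind2` is sorted ascending, which the discussion above does not state. Each ID in `ind1` gets a `[start, stop)` range of matching rows in `ind2`. Rows of `ind2` whose ID is absent from `ind1` simply fall outside every range, so no compacted values array of unknown size is needed, which is what a single offsets array would require.

```python
from bisect import bisect_left, bisect_right


def merge_ranges(ind1, ind2):
    """Return (starts, stops): for each ID in ind1, the [start, stop) rows of ind2 that match.

    Assumption for this sketch only: ind2 is sorted ascending.
    """
    # First row of ind2 that is >= the ID
    starts = [bisect_left(ind2, i) for i in ind1]
    # First row of ind2 that is > the ID
    stops = [bisect_right(ind2, i) for i in ind1]
    return starts, stops


ind1 = [1, 2, 4]           # IDs in the left-hand table
ind2 = [1, 1, 3, 4, 4, 4]  # IDs in the right-hand table; 3 has no partner in ind1
starts, stops = merge_ranges(ind1, ind2)
# ID 2 gets an empty range (start == stop); row 2 of ind2 (ID 3) is covered by no range.
```

The ranges leave a gap where the unmatched row sits, which is the case a starts/stops pair can represent directly and a single monotonic offsets array cannot without first dropping that row.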

Collaborator

I've never tested the cacheability of Numba functions that take any kind of Awkward arguments. (You're not talking about a closure, right? Numba closures have to include the objects as constants.) In principle, there's no reason why a function with an ArrayBuilder argument couldn't be cached, unless maybe it has some associated behavior that is not serializable. Behavior dicts are part of an Awkward argument's type.

Member Author

In the only usage I have, the builder is pristine at the start of the function. The ak arrays I've passed haven't had any attributes or behaviours. The specific message is:

NumbaWarning: Cannot cache compiled function "_merge" as it uses dynamic globals (such as ctypes pointers and large global arrays)

@martindurant martindurant marked this pull request as ready for review July 26, 2024 17:59
@martindurant martindurant merged commit d1d7e26 into intake:main Jul 26, 2024
12 checks passed
@martindurant martindurant deleted the io branch July 26, 2024 18:00

2 participants