ak.from_parquet slower than pa.parquet.read_table + ak.from_arrow #3151

Open
lgray opened this issue Jun 13, 2024 · 5 comments
Labels: performance (Works, but not fast enough or uses too much memory)

Comments

@lgray
Contributor

lgray commented Jun 13, 2024

Version of Awkward Array

2.6.5

Description and code to reproduce

While benchmarking GPU resources, I ran into a curious performance difference when comparing CPU-based reads with Arrow to GPU-DMA reads via cudf.

image

Is this expected? A factor of two, coming only from reading (all other bits of code are the same) seems like performance left on the floor.

@lgray added the performance label Jun 13, 2024
@jpivarski
Member

pa.parquet.read_table is a high-level convenience method that reads everything from the file. We use pa.parquet.ParquetFile to possibly select columns and row groups. But, after that, we just feed the result to ak.from_arrow. That's the only difference between the two procedures.

But is the time difference really in pa.parquet.read_table + ak.from_arrow versus ak.from_parquet itself? These Jupyter cells also include copies to and from the GPU, JIT-compilation of the **2 function, and other steps that might not be the same. In fact, if these two cells are from the same process and were executed in the order shown above, then **2 gets compiled in the first one and not the second, which could easily account for a few seconds (especially if it's the first thing to be compiled, as it has to warm up the compilation machinery).
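One way to check this is to time only the read step of each procedure, with a discarded warm-up call so that one-time costs do not land on whichever variant runs first. The following is a minimal sketch, not the benchmark from the screenshots: `best_of` and `compare_reads` are hypothetical helper names, and `path` is assumed to point at a local Parquet file.

```python
import time


def best_of(fn, repeats=5, warmup=1):
    """Return the minimum wall-clock time (seconds) of fn() over `repeats` runs.

    The warm-up calls are discarded so that one-time costs (JIT compilation,
    cold file-system cache, lazy imports) are not charged to the measurement.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_reads(path):
    """Time only the read step of both procedures: no GPU copy, no **2.

    `path` is a hypothetical local Parquet file; imports are kept inside the
    function so the timing helper above has no third-party dependencies.
    """
    import awkward as ak
    import pyarrow.parquet as pq

    return {
        "read_table + from_arrow": best_of(lambda: ak.from_arrow(pq.read_table(path))),
        "ak.from_parquet": best_of(lambda: ak.from_parquet(path)),
    }
```

If the gap survives in these isolated numbers, it comes from the read itself rather than from compilation or device transfers.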

@lgray
Contributor Author

lgray commented Jun 13, 2024

The device is not occupied, but to sate your skepticism:
image

@lgray
Contributor Author

lgray commented Jun 13, 2024

The **2 was compiled much earlier, FWIW.

@jpivarski
Member

Okay. If the speed difference persists after replacing pa.parquet.read_table with pa.parquet.ParquetFile.read_row_groups, then there is something in the Awkward code that's impeding performance, because the Awkward code is supposed to be just pa.parquet.ParquetFile.read_row_groups followed by ak.from_arrow.
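The intermediate procedure described above could look roughly like this. It is a sketch under the assumption that every row group is read and that `columns=None` selects all columns; `read_via_row_groups` is a hypothetical name, and this is not a copy of what ak.from_parquet does internally.

```python
def read_via_row_groups(path, columns=None):
    """Read a Parquet file with ParquetFile.read_row_groups, then ak.from_arrow.

    Assumption: all row groups are requested. Pass a list of column names in
    `columns` to restrict the read; None reads every column.
    """
    import awkward as ak
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    # Explicitly list every row group in the file.
    row_groups = list(range(parquet_file.num_row_groups))
    table = parquet_file.read_row_groups(row_groups, columns=columns)
    return ak.from_arrow(table)
```

Timing this function next to pa.parquet.read_table + ak.from_arrow and ak.from_parquet on the same file gives the three-way comparison: a gap between the first two points at pyarrow's two read paths, while a gap between the last two points at the Awkward code.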

@lgray
Contributor Author

lgray commented Jun 13, 2024

A sizable fraction of the time, but not all of it.
image


2 participants