[FEA] Make cudf::ast::operation own its subexpressions #10744

vyasr · 2022-04-26T20:58:28Z

Is your feature request related to a problem? Please describe.
Currently users are responsible for keeping all subexpressions of an expression alive. In particular, operations do not own their operands, which are stored as a std::vector<std::reference_wrapper<expression const>> const. This approach is cumbersome because it forces client code to keep track of all previously created expressions to avoid their premature destruction.

Describe the solution you'd like
Instead of storing a vector of references to its operands, an operation should instead store a vector of unique pointers to copies of the operands. While this change would create duplicate expressions, they are cheap to carry around, and this change would simplify the management of expression lifetimes in all client code.

Describe alternatives you've considered
We could use std::shared_ptr instead of using std::unique_ptr, which would allow subexpression reuse. Reuse isn't that important here though because expressions are relatively small and only use host memory. The primary goal is to simplify the management of expression objects, which is simpler with unique_ptr than shared_ptr.

We could also retain the status quo, which is not unworkable, just more complex and less intuitive than necessary. The fact that higher-level clients built on top of libcudf likely to build different ownerships models into their wrappers of expression APIs indicates that the libcudf approach is not the friendliest API for programmatic expression construction.

Additional context
This change would not change that aliteral only stores a reference to the underlying scalar. Creating a scalar involves a device memory allocation, an expensive operation that we should not hide from a user. Additionally, it is reasonably straightforward for client code to wrap a literal in some owning container that handles scalar allocation if needed, whereas maintaining all subexpressions of potentially arbitrarily complex expressions is a significantly more onerous task.

The text was updated successfully, but these errors were encountered:

vyasr · 2022-04-26T20:59:40Z

@jlowe I believe that you were interested in something like this at one point as well. We had higher priorities and didn't consider addressing it at the time, but I think this would help simplify some of the JNI wrappers right?

jlowe · 2022-04-26T22:10:24Z

I believe that you were interested in something like this at one point as well.

Yes, it would simplify things a bit since the JNI code explicitly tracks all the subexpressions and deletes them when the parent expression is finally deleted. The Java code does a full translation from a Java tree to a native tree in an explicit compile() step, so nothing in the Java layer should need to change. It should only be a change in the JNI native code layer that explicitly tracks the AST sub-expressions. Now that it's coded and working, not a high priority from my perspective. 😄

github-actions · 2022-05-26T23:03:08Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-09-27T00:19:20Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

This merge request follows up on #10744. It attempts to simplify managing expressions by adding a class called an ast tree. The ast tree manages and holds related expressions together. When the tree is destroyed, all the expressions are also destroyed. Ideally we would use a bump allocator for allocating the expressions instead of `std::vector<std::unique_ptr<expression>>`. We'd also ideally use a `cuda::std::inplace_vector` for storing the operands of the `operation` class, but that's in a newer version of CCCL. Authors: - Basit Ayantunde (https://github.com/lamarrr) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #17156

vyasr · 2024-11-13T04:52:24Z

We decided in #17156 to go a different direction. Rather than having operations owning subexpressions, we introduced a new tree class that manages all expressions that it owns. That approach was chosen instead of owning subexpressions to avoid the recursive copying that would be otherwise introduced by having operations own their subexpressions if you wanted to support assignment or similar operations where an expression could get added to multiple operations, in which case each operation would have to copy the subexpression.

@lamarrr please let me know if I've missed anything in my summary or if there's any reason to reopen this issue, but I'm going to close this as resolved by #17156 for now.

vyasr added feature request New feature or request Needs Triage Need team to review and classify labels Apr 26, 2022

vyasr added this to the Conditional Joins milestone Apr 26, 2022

vyasr mentioned this issue Apr 26, 2022

Implement DataFrame.eval using libcudf ASTs #8022

Merged

github-actions bot added the inactive-30d label May 26, 2022

GregoryKimball added libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code code quality and removed Needs Triage Need team to review and classify feature request New feature or request labels Jun 28, 2022

github-actions bot added the inactive-90d label Sep 27, 2022

GregoryKimball modified the milestones: Conditional Joins, Expression evaluation Oct 24, 2022

GregoryKimball removed the inactive-90d label Apr 3, 2023

GregoryKimball added 0 - Backlog In queue waiting for assignment and removed inactive-30d labels Feb 12, 2024

vyasr removed the code quality label Feb 23, 2024

vyasr mentioned this issue May 7, 2024

[QST] Recusively Generating AST Expressions (C++ libcudf) #15653

Closed

lamarrr mentioned this issue Oct 23, 2024

Added ast tree to simplify expression lifetime management #17156

Merged

3 tasks

vyasr closed this as completed Nov 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Make cudf::ast::operation own its subexpressions #10744

[FEA] Make cudf::ast::operation own its subexpressions #10744

vyasr commented Apr 26, 2022

vyasr commented Apr 26, 2022

jlowe commented Apr 26, 2022 •

edited

Loading

github-actions bot commented May 26, 2022

github-actions bot commented Sep 27, 2022

vyasr commented Nov 13, 2024

[FEA] Make cudf::ast::operation own its subexpressions #10744

[FEA] Make cudf::ast::operation own its subexpressions #10744

Comments

vyasr commented Apr 26, 2022

vyasr commented Apr 26, 2022

jlowe commented Apr 26, 2022 • edited Loading

github-actions bot commented May 26, 2022

github-actions bot commented Sep 27, 2022

vyasr commented Nov 13, 2024

jlowe commented Apr 26, 2022 •

edited

Loading