-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Make cudf::ast::operation own its subexpressions #10744
Comments
@jlowe I believe that you were interested in something like this at one point as well. We had higher priorities and didn't consider addressing it at the time, but I think this would help simplify some of the JNI wrappers right? |
Yes, it would simplify things a bit since the JNI code explicitly tracks all the subexpressions and deletes them when the parent expression is finally deleted. The Java code does a full translation from a Java tree to a native tree in an explicit |
This issue has been labeled |
This issue has been labeled |
This merge request follows up on #10744. It attempts to simplify managing expressions by adding a class called an ast tree. The ast tree manages and holds related expressions together. When the tree is destroyed, all the expressions are also destroyed. Ideally we would use a bump allocator for allocating the expressions instead of `std::vector<std::unique_ptr<expression>>`. We'd also ideally use a `cuda::std::inplace_vector` for storing the operands of the `operation` class, but that's in a newer version of CCCL. Authors: - Basit Ayantunde (https://github.com/lamarrr) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #17156
We decided in #17156 to go a different direction. Rather than having operations owning subexpressions, we introduced a new tree class that manages all expressions that it owns. That approach was chosen instead of owning subexpressions to avoid the recursive copying that would be otherwise introduced by having operations own their subexpressions if you wanted to support assignment or similar operations where an expression could get added to multiple operations, in which case each operation would have to copy the subexpression. @lamarrr please let me know if I've missed anything in my summary or if there's any reason to reopen this issue, but I'm going to close this as resolved by #17156 for now. |
Is your feature request related to a problem? Please describe.
Currently users are responsible for keeping all subexpressions of an
expression
alive. In particular,operation
s do not own their operands, which are stored as astd::vector<std::reference_wrapper<expression const>> const
. This approach is cumbersome because it forces client code to keep track of all previously created expressions to avoid their premature destruction.Describe the solution you'd like
Instead of storing a vector of references to its operands, an
operation
should instead store a vector of unique pointers to copies of the operands. While this change would create duplicateexpression
s, they are cheap to carry around, and this change would simplify the management ofexpression
lifetimes in all client code.Describe alternatives you've considered
We could use
std::shared_ptr
instead of usingstd::unique_ptr
, which would allow subexpression reuse. Reuse isn't that important here though becauseexpression
s are relatively small and only use host memory. The primary goal is to simplify the management ofexpression
objects, which is simpler withunique_ptr
thanshared_ptr
.We could also retain the status quo, which is not unworkable, just more complex and less intuitive than necessary. The fact that higher-level clients built on top of libcudf likely to build different ownerships models into their wrappers of expression APIs indicates that the libcudf approach is not the friendliest API for programmatic expression construction.
Additional context
This change would not change that a
literal
only stores a reference to the underlyingscalar
. Creating ascalar
involves a device memory allocation, an expensive operation that we should not hide from a user. Additionally, it is reasonably straightforward for client code to wrap a literal in some owning container that handles scalar allocation if needed, whereas maintaining all subexpressions of potentially arbitrarily complex expressions is a significantly more onerous task.The text was updated successfully, but these errors were encountered: