Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Make cudf::ast::operation own its subexpressions #10744

Closed
vyasr opened this issue Apr 26, 2022 · 5 comments
Closed

[FEA] Make cudf::ast::operation own its subexpressions #10744

vyasr opened this issue Apr 26, 2022 · 5 comments
Labels
0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code

Comments

@vyasr
Copy link
Contributor

vyasr commented Apr 26, 2022

Is your feature request related to a problem? Please describe.
Currently users are responsible for keeping all subexpressions of an expression alive. In particular, operations do not own their operands, which are stored as a std::vector<std::reference_wrapper<expression const>> const. This approach is cumbersome because it forces client code to keep track of all previously created expressions to avoid their premature destruction.

Describe the solution you'd like
Instead of storing a vector of references to its operands, an operation should instead store a vector of unique pointers to copies of the operands. While this change would create duplicate expressions, they are cheap to carry around, and this change would simplify the management of expression lifetimes in all client code.

Describe alternatives you've considered
We could use std::shared_ptr instead of using std::unique_ptr, which would allow subexpression reuse. Reuse isn't that important here though because expressions are relatively small and only use host memory. The primary goal is to simplify the management of expression objects, which is simpler with unique_ptr than shared_ptr.

We could also retain the status quo, which is not unworkable, just more complex and less intuitive than necessary. The fact that higher-level clients built on top of libcudf likely to build different ownerships models into their wrappers of expression APIs indicates that the libcudf approach is not the friendliest API for programmatic expression construction.

Additional context
This change would not change that aliteral only stores a reference to the underlying scalar. Creating a scalar involves a device memory allocation, an expensive operation that we should not hide from a user. Additionally, it is reasonably straightforward for client code to wrap a literal in some owning container that handles scalar allocation if needed, whereas maintaining all subexpressions of potentially arbitrarily complex expressions is a significantly more onerous task.

@vyasr vyasr added feature request New feature or request Needs Triage Need team to review and classify labels Apr 26, 2022
@vyasr
Copy link
Contributor Author

vyasr commented Apr 26, 2022

@jlowe I believe that you were interested in something like this at one point as well. We had higher priorities and didn't consider addressing it at the time, but I think this would help simplify some of the JNI wrappers right?

@jlowe
Copy link
Member

jlowe commented Apr 26, 2022

I believe that you were interested in something like this at one point as well.

Yes, it would simplify things a bit since the JNI code explicitly tracks all the subexpressions and deletes them when the parent expression is finally deleted. The Java code does a full translation from a Java tree to a native tree in an explicit compile() step, so nothing in the Java layer should need to change. It should only be a change in the JNI native code layer that explicitly tracks the AST sub-expressions. Now that it's coded and working, not a high priority from my perspective. 😄

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@GregoryKimball GregoryKimball added libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code code quality and removed Needs Triage Need team to review and classify feature request New feature or request labels Jun 28, 2022
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@GregoryKimball GregoryKimball added 0 - Backlog In queue waiting for assignment and removed inactive-30d labels Feb 12, 2024
rapids-bot bot pushed a commit that referenced this issue Nov 7, 2024
This merge request follows up on #10744.
It attempts to simplify managing expressions by adding a class called an ast tree. The ast tree manages and holds related expressions together. When the tree is destroyed, all the expressions are also destroyed. Ideally we would use a bump allocator for allocating the expressions instead of `std::vector<std::unique_ptr<expression>>`.

We'd also ideally use a `cuda::std::inplace_vector` for storing the operands of the `operation` class, but that's in a newer version of CCCL.

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #17156
@vyasr
Copy link
Contributor Author

vyasr commented Nov 13, 2024

We decided in #17156 to go a different direction. Rather than having operations owning subexpressions, we introduced a new tree class that manages all expressions that it owns. That approach was chosen instead of owning subexpressions to avoid the recursive copying that would be otherwise introduced by having operations own their subexpressions if you wanted to support assignment or similar operations where an expression could get added to multiple operations, in which case each operation would have to copy the subexpression.

@lamarrr please let me know if I've missed anything in my summary or if there's any reason to reopen this issue, but I'm going to close this as resolved by #17156 for now.

@vyasr vyasr closed this as completed Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code
Projects
None yet
Development

No branches or pull requests

3 participants