UCXX: C++ implementation #913
Comments
Would some or all of this make sense to add to UCX itself? I noticed that is not called out in the list above (unless I've missed it), so I'm also wondering if there is a reason not to.
Yes, this is part of the plan and has been for at least 2-3 years; it is partly my fault that it has not happened yet. Eventually we hope to move all of UCX-Py/UCXX into the upstream UCX repo.
Not at all. This is impressive work! Not surprised it took some time to get here. Ok, so this is envisioned as a staging location before upstreaming everything? That makes sense. I'd lightly suggest keeping things in one repo as you have done (just from an ergonomic perspective), but would defer to the judgement of yourself and others here.
Took a very quick initial glance and this looks very exciting. Awesome work @pentschev! :partykirby: One thing that stuck out to me thus far is that Python enablement has leaked into the C++ library, albeit with a compile-time option to avoid it. That being said, this is going to become annoying when it comes to packaging and deployment. For example, if I'm a pure C++ library wanting to use this, I almost certainly don't want to take on a Python dependency, so I'd want a build of UCXX that isn't dependent on the Python pieces. Then say someone wants to use something else in the same environment that is dependent on the Python bits. In theory, we could use typical conda shenanigans of having different package variants with build strings and something like ... It also looks like there are pieces of the code that change the ABI depending on the Python compile-time flag that would probably need to be reworked. For example:
I wonder if it would be possible to restructure the code to have the Python enablement in C++ be a separate library that depends on the "core" C++ library. |
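To illustrate the ABI concern raised above, here is a minimal sketch of how a member guarded by a compile-time flag changes a class layout. The two structs are hypothetical stand-ins for "the same class built with and without the Python flag"; they are not UCXX's real types, and the member names are assumptions made for illustration only.

```cpp
#include <cstddef>

// Hypothetical: what a class might look like in a build WITHOUT Python support.
struct WorkerWithoutPython {
  int progressCount{0};
};

// Hypothetical: the same class in a build WITH Python support, where an
// #if-guarded member has been compiled in.
struct WorkerWithPython {
  int progressCount{0};
  void* pythonFuturePool{nullptr};  // assumed Python-only member
};

// The extra member changes the object size, so code compiled against one
// variant cannot safely link against a library built as the other variant.
static_assert(sizeof(WorkerWithPython) > sizeof(WorkerWithoutPython),
              "the Python-enabled layout is larger");

// Reports whether the two builds disagree about the object layout.
constexpr bool layoutsDiffer() {
  return sizeof(WorkerWithPython) != sizeof(WorkerWithoutPython);
}
```

Because both variants would normally share one class name, a consumer compiled with one setting of the flag and linked against a library built with the other would silently read and write members at the wrong offsets.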
@jakirkham the reason I'm not really fond of this is the duplication that I had to resort to and the mess with the commit history this has caused. As a personal preference, I would make this a new repo, but I also understand there are reasons why people would prefer to do as you say. |
@kkraus14 thank you for taking the time to look and comment here. All your comments are valid and sensible to me. The potential for ABI breakage is something I'm not particularly happy with in the current implementation; there are many things to improve on that front, as you noted above. The way I was thinking of resolving both the Python ABI breakage and the C++ dependency on Python was to follow more or less the concept that UCX is applying,
To make the structs ABI-compatible with and without Python, one option would be to store all the Python-related data in an opaque struct and define the (say) worker struct as:
class Worker : public Component {
 private:
  ...
  void *_data{nullptr};
};
And then when Python is enabled you have (syntax probably wrong, etc...):
That might also generically be useful if third-party callers of the C++ implementation need somewhere to hang their application data.
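The opaque-pointer idea above can be sketched as follows. This is only an illustration under stated assumptions: `Worker` here is a simplified stand-in (the `Component` base class is omitted), and `setUserData`, `getUserData`, `PythonWorkerData`, and `pythonData` are hypothetical names, not part of UCXX. Ownership of the attached data stays with the caller in this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical core class: its layout never depends on a Python build flag.
class Worker {
 public:
  // Attach caller-owned data; the core library never inspects it.
  void setUserData(void* data) { _data = data; }
  void* getUserData() const { return _data; }

 private:
  void* _data{nullptr};  // opaque slot, same size in every build
};

// Hypothetical payload defined only by the Python-enabled library.
struct PythonWorkerData {
  std::string eventLoopName;
  int pendingFutures{0};
};

// Helper that would live in the Python-enabled library: recovers the typed
// payload from the opaque slot.
inline PythonWorkerData& pythonData(Worker& worker) {
  void* raw = worker.getUserData();
  assert(raw != nullptr && "Python data was never attached");
  return *static_cast<PythonWorkerData*>(raw);
}

// Usage: attach a payload, update it through the worker, read it back.
inline int attachAndCount() {
  Worker worker;
  PythonWorkerData data;
  data.eventLoopName = "asyncio";
  worker.setUserData(&data);
  pythonData(worker).pendingFutures += 2;
  return data.pendingFutures;
}
```

The trade-off is that the `static_cast` is unchecked, so the core and the extension library must agree on what is stored in the slot. The same slot could carry application data for third-party C++ callers, as suggested above.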
Overall I like what I'm seeing, though I'm a bit lost on the request lifecycle. Would it be possible to move the delayed submission parts to a Python helper? I remember the UCF presentation and how deferring operations for a bit before dropping the GIL was a big win, but it seems to add a lot of mental overhead to the core C++ library.
Delayed submission is not a Python-exclusive feature; you can use it the same way from C++. Effectively it moves any transfer calls, e.g.
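As a rough illustration of the general idea, assuming "delayed submission" means that a transfer call is recorded when requested and only issued later from the thread that progresses the worker, a queue could look like the sketch below. The class `DelayedSubmissionQueue` and its methods are hypothetical and do not reflect UCXX's actual API; the callbacks stand in for whatever transfer call would be deferred.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch, not UCXX's API: defers operations until the thread
// that drives progress decides to run them.
class DelayedSubmissionQueue {
 public:
  using Callback = std::function<void()>;

  // Called from any thread: record the operation, do not run it yet.
  void schedule(Callback callback) {
    assert(callback && "cannot schedule an empty callback");
    std::lock_guard<std::mutex> lock(_mutex);
    _pending.push_back(std::move(callback));
  }

  // Called from the progress thread: run everything queued so far and
  // return how many operations were submitted.
  std::size_t process() {
    std::vector<Callback> toRun;
    {
      // Swap under the lock so callbacks run without holding the mutex.
      std::lock_guard<std::mutex> lock(_mutex);
      toRun.swap(_pending);
    }
    for (auto& callback : toRun) callback();
    return toRun.size();
  }

 private:
  std::mutex _mutex;
  std::vector<Callback> _pending;
};
```

In this sketch the requesting thread only pays for a lock and a vector push, while the deferred work runs on whichever thread calls `process()`. Nothing in it is specific to Python, which matches the point made above.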
Okay, my main concern is that I want to make a model where there is a worker thread that manages outstanding requests (where |
This is no problem, if you set
I guess this is mostly a matter of getting people to "sign off" on this. Once we are somewhat confident that major issues have been addressed, we could release. The original intent was to have this ready for RAPIDS 23.02 (release expected for February 9th), but it's a lot of code and I don't know if everyone will have enough time to do that. In any case, I guess we could still release it as an experimental piece of code; I'm not sure if any packaging would be available (such as ...). cc @quasiben for the release discussion above.
Just a note on packaging, this probably makes more sense as a |
A conda package would be ideal, but we could always take the code and produce our own internal package for now until it's ready to be published in the rapids channels. |
@kkraus14 @MattBBaker @jakirkham I did some changes to split the shared libraries and |
@pentschev I took a quick look and other than the Only thought that crossed my mind is that if you could use the |
@kkraus14 this is now done in https://github.com/rapidsai/ucx-py/compare/99e911b..2b563ea . We now have a |
Took a quick pass and this looks much cleaner from my perspective. There are still a few dangling things that I think could be cleaned up:
In general I agree with your comments @kkraus14 , the plan is that we make UCXX public for 23.04 (this time for real) in experimental status, so I want to prioritize addressing those that are potentially a blocker for distribution and for your usage as well. In that sense, I think most of those comments would be in the nice-to-have category for 23.04, and then potentially addressed in 23.06. I'm responding to the individual items below, but please let me know if any of them is a must-have for 23.04.
Right now yes, but the plan is to add C++ future support as well (thus I changed it from
I am not sure this can be done with the way things are currently implemented. Passing the
Indeed, this is not currently necessary. I decided to leave it for now, though, because it may still be useful to check from C++ whether the Python library is available. In any case, that should not have any Python dependencies and is harmless in that sense.
This is definitely an oversight on my part. Thanks for bringing it up, I'll address it shortly.
Agreed, this would be a nice separation too and we should work on it as well.
We have made the decision to have the UCXX codebase in a new repo, which is publicly available now: https://github.com/rapidsai/ucxx . @kkraus14 I think we have resolved all the issues you raised regarding the C++/Python separation, including moving the Python dependency as a CMake component with its own ... Also feel free to file issues/submit PRs in the new repo!
Should we close this issue in favor of new focused issues/PRs on the new repo? |
For a while I have been working on, and discussing internally, a rewrite of UCX-Py, splitting the hardcore logic into a C++ backend, with a (much) thinner Cython layer and a Python async layer, and also including some new Python optimizations. This new library is currently named UCXX and has now been merged into the UCX-Py repository under a separate branch with a ucxx/ subdirectory.
We have not yet decided on a permanent place for this library to exist. Since its intent is to replace the current UCX-Py implementation, some of the options we have are:
ucxx/ subdirectory permanently;
ucxx/ into the root directory of this repository;
I hope the few instructions available at the link above suffice to get people started building/testing the new implementation.
At this time we would like to invite people to try it out and review the code. The new implementation is definitely not complete nor bug-free, so any and all help is welcome.
A non-exhaustive, and in no particular order, list of features/issues that make this implementation incomplete is below:
I appreciate everyone's help, and pinging some of the usual suspects who may be interested: @quasiben @wence- @madsbk @MattBBaker