Rolling out uwot's C++ code as a header-only library #80

Open
LTLA opened this issue Aug 5, 2021 · 10 comments

Comments

@LTLA
Contributor

LTLA commented Aug 5, 2021

I wonder whether this would be of interest: splitting the C++ code in here out into a separate header-only library, in much the same way that https://github.com/LTLA/CppIrlba contains the relevant contents of irlba. Mostly so that I can use it for other applications without the challenge of dragging in R (or Python) runtimes. And then you could chuck the library into inst/include and we would be able to share a single implementation with relative ease.

I was planning to give it a go on the weekend. Will need to strip out all the Rcpp stuff, I don't know how pervasive that is. Will also need to add a "no-parallel" option that avoids any calls to <thread> as my target system's support for that is kinda wonky.
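
For illustration, a "no-parallel" option of the kind described above could be a compile-time switch that keeps `<thread>` out of the translation unit entirely. This is only a minimal sketch under assumptions: the macro name `DEMO_NO_PARALLEL` and the helper `run_chunks` are hypothetical and are not taken from uwot's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// DEMO_NO_PARALLEL is a hypothetical macro name chosen for this sketch.
// When it is defined, <thread> is never included.
#ifndef DEMO_NO_PARALLEL
#include <thread>
#endif

// Calls fn(start, end) so that the half-open ranges cover [0, n).
// With threads enabled, the range is split into at most n_threads
// contiguous chunks, one worker per chunk. Otherwise fn runs once, serially.
template <typename Function>
void run_chunks(std::size_t n, std::size_t n_threads, Function fn) {
    assert(n_threads > 0);
#ifdef DEMO_NO_PARALLEL
    (void)n_threads;  // unused in the serial build
    fn(std::size_t(0), n);
#else
    if (n_threads == 1 || n == 0) {
        fn(std::size_t(0), n);  // no need to spawn a worker
        return;
    }
    const std::size_t chunk = (n + n_threads - 1) / n_threads;  // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t start = 0; start < n; start += chunk) {
        const std::size_t end = std::min(n, start + chunk);
        workers.emplace_back(fn, start, end);  // fn is copied into the worker
    }
    for (auto& w : workers) {
        w.join();
    }
#endif
}
```

A caller would pass a lambda that processes indices `start` to `end - 1`; building with `-DDEMO_NO_PARALLEL` then yields the same results without touching `<thread>`, provided the chunks write to disjoint outputs.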

@jlmelville
Owner

To the extent it's possible, that's a good idea (I subsequently did a better job of keeping R-specifics separate in rnndescent). This is something I thought about doing at some point in the future, but the main reason I never took it more seriously is because the pure C++ parts aren't very useful on their own (also I don't know any CMake). The nearest neighbor calculations and initialization all have to be provided separately, so you're really just getting the optimization bit. If that's useful to you, I'm happy to provide what assistance I can.

@LTLA
Contributor Author

LTLA commented Aug 6, 2021

Yep, I was going to supply the NN's myself (https://github.com/LTLA/knncolle). On a tangentially related note, it would be nice to make a pure C++ port of nndescent available from that interface. Would be happy to help out there if you're interested.

The initialization... is within the realm of feasibility. Could link to Spectra, or could modify CppIrlba to handle smallest = TRUE. Not quite sure which one is less work - will have to try it out. Was there a reason for the use of Spectra as the default?

Anyway, testing out the initialization is probably a solid weekend project on my side. If you have the bandwidth, maybe you could reorganize the stuff across src/ and inst/include to create a pure C++ interface to your optimization code. Then we might eventually be able to plug and play with all the three components (NN, init, optim).

@jlmelville
Owner

> Yep, I was going to supply the NN's myself (https://github.com/LTLA/knncolle). On a tangentially related note, it would be nice to make a pure C++ port of nndescent available from that interface. Would be happy to help out there if you're interested.

That can happen too... eventually.

> The initialization... is within the realm of feasibility. Could link to Spectra, or could modify CppIrlba to handle smallest = TRUE. Not quite sure which one is less work - will have to try it out. Was there a reason for the use of Spectra as the default?

At the time, the irlba partial_eigen was described as "somewhat experimental" and in practice was a lot slower than using RSpectra. Maybe that's changed now.

> Anyway, testing out the initialization is probably a solid weekend project on my side. If you have the bandwidth, maybe you could reorganize the stuff across src/ and inst/include to create a pure C++ interface to your optimization code. Then we might eventually be able to plug and play with all the three components (NN, init, optim).

Not sure about timelines, but I will start taking a look and see whether this seems achievable in a reasonable amount of time or whether it will reveal that some larger structural changes are required.

@jlmelville
Owner

I failed to make any progress this weekend, but it is closer to the top of my to-do pile.

@LTLA
Contributor Author

LTLA commented Aug 9, 2021

No worries. I also failed to make any progress, got distracted by https://github.com/LTLA/qdtsne.

@LTLA
Contributor Author

LTLA commented Aug 16, 2021

Made a start on the initialization: https://github.com/LTLA/umappp.

The most that can be said right now is that it compiles and runs.

@jlmelville
Owner

Sorry I have made zero contributions to this so far. I was traveling for the last two weeks and had little to no internet access.

@LTLA
Contributor Author

LTLA commented Sep 12, 2021

No problems whatsoever - it is, in fact, already done! The code in uwot's inst/include was easier to read than I thought, so it was fairly straightforward to get what I needed. Check it out:

[Image: demo]

Close enough, I'd say. I know we're identical up to the optimization, so I'm guessing that the differences are due to our different PRNGs - I'm using std::mt19937_64 to avoid the need to manage another dependency.
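
To illustrate why the PRNG choice alone can change the result: the optimization draws random indices for negative sampling, so a different generator (or seed) yields a different index stream even when everything upstream is identical. The following is only a sketch under assumptions; `sample_indices` is a hypothetical helper and not the sampling code from uwot or umappp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper for this sketch: draws `count` indices in [0, n)
// from a std::mt19937_64 seeded with `seed`.
// Modulo reduction is used instead of std::uniform_int_distribution because
// the latter's output is implementation-defined; the slight modulo bias is
// ignored here for the sake of illustration.
inline std::vector<std::size_t> sample_indices(std::uint64_t seed,
                                               std::size_t n,
                                               std::size_t count) {
    assert(n > 0);
    std::mt19937_64 rng(seed);
    std::vector<std::size_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(static_cast<std::size_t>(rng() % n));
    }
    return out;
}
```

Calling `sample_indices` twice with the same seed gives the same indices, which is what makes a run reproducible. Swapping in another generator changes the stream, and with it the exact coordinates of the final embedding.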

I didn't add any of the other bells and whistles, e.g., no support for supervised training, no support for tumap or largevis. I don't need them personally, but I could work on that if we wanted to turn uwot into an R wrapper around a fully-featured C++ library. Interested to hear your thoughts here - I don't mind either way.

In the meantime, I'll post a few more issues on things I discovered along the way.

@jlmelville
Owner

> Close enough, I'd say. I know we're identical up to the optimization, so I'm guessing that the differences are due to our different PRNGs - I'm using std::mt19937_64 to avoid the need to manage another dependency.

Does changing the PRNG away from the Tausworthe88 have an effect on the speed?

> I didn't add any of the other bells and whistles, e.g., no support for supervised training, no support for tumap or largevis. I don't need them personally, but I could work on that if we wanted to turn uwot into an R wrapper around a fully-featured C++ library. Interested to hear your thoughts here - I don't mind either way.

It would be a shame to not use UMAPPP if possible. I wouldn't want to weigh it down with features that not many people care about (I have zero idea if anyone makes use of tumap or largevis), but I also don't want to keep track of two separate but similar C++ code bases (although they haven't actually changed very much).

@PedroMilanezAlmeida

tumap is great 😅
