Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganization proposal #419

Open
tjstum opened this issue Jan 26, 2024 · 6 comments
Open

Reorganization proposal #419

tjstum opened this issue Jan 26, 2024 · 6 comments

Comments

@tjstum
Copy link

tjstum commented Jan 26, 2024

Hi there,

We are consider using datasketches in our C++ environment. The current organization, where there a bunch of folders, each with an include/ directory, is a bit awkward for integrating into a foreign build system. Projects like pybind11 put all of the headers in a single include/ directory (perhaps with subdirectories underneath). Then there is just a single path to add as an include path.

Would you be open to a reorganization that moved around files to look a bit like this:

include/datasketches
  count/
    count_min.hpp
    count_min_impl.hpp
  common/
    MurmurHash3.h
    ...

Thanks for your consideration.

@jmalkin
Copy link
Contributor

jmalkin commented Jan 26, 2024

Interesting. I believe we install everything into a single datasketches directory, so we'd assumed the library would typically be installed. You're probable not doing that?

Certainly not trying to tell you how to develop your project, just trying to understand what people do in practice. We don't get much feedback about that.

I don't think I have an inherent objection to this. Will try to look into it in the near term.

@tjstum
Copy link
Author

tjstum commented Jan 26, 2024

Happy to send a PR for this as well.

Yeah, given that the project is (currently) header-only, there's not a huge benefit for us to install it instead of vendoring the files (which is an approach we've taken with other libraries that we plan to integrate with tightly). Currently, I've just flattened all the files into a single directory (basically what the build/install does).

@jmalkin
Copy link
Contributor

jmalkin commented Jan 26, 2024

If I try to add any new matrix sketches I think I'd try using Eigen (we have Frequent Directions in java for approximate SVD, for instance, but there may be other approaches that make that less interesting to port this many years later) which is also header-only. That's a combination of header-only considerations as well as playing nicely with our python wrapper (nanobind, which we adopted recently instead of pybind11).

@AlexanderSaydakov noted on Slack that this would be a major version change, though, and we did just do one of those. We do try not to do major version bumps toooooo frequently. So that'd probably be the main consideration.

@jmalkin
Copy link
Contributor

jmalkin commented May 3, 2024

@AlexanderSaydakov Several months later, I don't know if we want to revisit this yet?

@AlexanderSaydakov
Copy link
Contributor

This seems reasonable. I can see how fragmented "include" directory structure can complicate someone's life. However, if this is not urgent, I would just keep it in mind for the next major version.

@jmalkin
Copy link
Contributor

jmalkin commented Oct 23, 2024

@tjstum I know it's been a while but we haven't forgotten this. We discovered a change we merged recently does break the API slightly in some edge cases so we'll likely pick this up for the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants