Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing Rust code into Biotite #688

Open
padix-key opened this issue Oct 27, 2024 · 3 comments
Open

Introducing Rust code into Biotite #688

padix-key opened this issue Oct 27, 2024 · 3 comments

Comments

@padix-key
Copy link
Member

padix-key commented Oct 27, 2024

To achieve high performance where vectorization with NumPy is not possible, Biotite currently uses Cython code. However, there are some limitations in Cython:

  • Apart from fused types Cython does not support generics
  • Separation into multiple modules can be a bit cumbersome
  • Fast string operations are basically impossible unless one wants to use rather unsafe ASCII-only C-strings
  • To get a very high performance, safe guards need to be disabled leading to potential memory issues (out-of-bounds, leaks, etc.)
  • In my opinion Rust is much cleaner than Cython, when low-level, typed operations are involved.

Hence, this issue should initiate the discussion if Rust code using PyO3 should be allowed in Biotite, as it has become quite mature in recent years. This would address all the issues mentioned above. More specifically, these are the places in the code base where the limitations become quite clear:

  • PDBFile: There already is fastpdb written in Rust, but it needs to be maintained separately in the moment.
  • connect_via_residue_names(): The Python string operations in this Cython function makes it quite slow and it is actually the bottleneck in pdbx.get_structure(), when the input is a BinaryCIFFile.
  • Line parsing in CIFFile: It has been addressed multiple times now to make it faster (Handle embedded quote in mmcif #619, Update _split_one_line and remove whitespace parameter #686), but this short function is still the bottleneck when reading CIF files.

Probably even more places in Biotite would benefit from routines written in Rust.
However there would also be a few disadvantages:

  • Cython is easier to learn than Rust, as it is close to Python.
  • Development would become more complex, as there would be 3 programming languages (Python, Cython, Rust), as long as Cython code still exists in the code base.
  • The build process would become more complex as the Rust compiler needs to be involved, but this should not change the user experience.
  • People cannot install Biotite from a source distribution, if they do not have a Rust compiler installed (will hopefully be solved by Implement new sdist feature to download rust toolchain PyO3/maturin#2177).

I lean towards accepting Rust in Biotite (otherwise I would not have opened this issue 😉), but I really like to hear your opinion about this @t0mdavid-m @JHKru @MaxGreil and other contributors/users with an opinion on this topic.

@t0mdavid-m
Copy link
Member

I am not worried about the increased complexity of another programming language. Speed is a major selling point of Biotite - therefore moving from Cython to C++/Rust bindings seems to be a logical next step.

When deciding between C++ and Rust, given we do not have preexisting legacy code in either langugage, Rust, due to its memory safety, seems to be the logical choice.

I think keeping the build complexity for all user groups low is also quite important. However, PyO3/maturin#2177 seems to be quite active. So given that it is merged, I would vote for adding Rust code to Biotite.

@JHKru
Copy link
Member

JHKru commented Oct 29, 2024

Hi, I haven't contributed anything on the Rust side so far, but the arguments
for accepting Rust sound convincing to me. :)

Do you plan to still accept Cython in new contributions in the long-term?
If feasible maintenance-wise, I'd be in favour of accepting both Cython/Rust, to have more options
for new contributors and cases for which Cython is not too limiting.

@padix-key
Copy link
Member Author

Do you plan to still accept Cython in new contributions in the long-term?

Yes, at least for the foreseeable future. However, I think @JacobAnter and me were the only ones contributing Cython code to the project so far. So realistically, if at some point most Cython code would be migrated to Rust code, we could discuss again, whether we discourage new Cython modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants