Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add substrait.proto convenience module and document it #50

Merged
merged 3 commits into from
Mar 15, 2024

Conversation

amol-
Copy link
Contributor

@amol- amol- commented Mar 13, 2024

Add a substrait.proto module that gives access to the Substrait protocol classes
removing the need to navigate the hierarchy automatically generated by protobuf.
It also provides access to the modules without the _pb2 suffix
which is an implementation detail of the protobuf version used.

Provides examples on how to generate and read back Substrait plans
using the substrait-python module itself.

Copy link

ACTION NEEDED

Substrait follows the Conventional Commits
specification
for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@amol- amol- changed the title feat: Add substrait.proto convenience module and document it feat: add substrait.proto convenience module and document it Mar 13, 2024
Copy link
Contributor

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @amol- this is already much cleaner for the user :)

@@ -12,3 +12,21 @@ def test_imports():
from substrait.gen.proto.type_expressions_pb2 import DerivationExpression
from substrait.gen.proto.type_pb2 import Type
from substrait.gen.proto.extensions.extensions_pb2 import SimpleExtensionURI

Copy link
Contributor

@danepitkin danepitkin Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can go ahead and delete the test_imports() function above since the new test indirectly tests these imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a value in keeping this test to ensure backward compatibility.
substrait.proto might continue to work but import the classes from modules with different names than the current ones.

In such case we would be breaking backward compatibility for anyone directly accessing the classes from the _pb2 modules and that's something we want to catch and be aware of.

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall -- two small suggestions for the tests.

tests/test_proto.py Outdated Show resolved Hide resolved
tests/test_proto.py Outdated Show resolved Hide resolved
@EpsilonPrime
Copy link
Member

So the purpose of this PR is to get the protos exposed with the simpler package name without changing the fact that they are registered with the existing one?

One reason the current package name is used is to prevent conflicts when the C++ version of the protos is also loaded under ::substrait::proto. We should probably document this desire along with the fact that we aren't simply changing what package we generate the protos in.

Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
@amol-
Copy link
Contributor Author

amol- commented Mar 14, 2024

So the purpose of this PR is to get the protos exposed with the simpler package name without changing the fact that they are registered with the existing one?

Correct, I wanted to retain backward compatibility for anyone importing the classes from the autogenerated modules, but also make the automatic generation an implementation detail.

One reason the current package name is used is to prevent conflicts when the C++ version of the protos is also loaded under ::substrait::proto. We should probably document this desire along with the fact that we aren't simply changing what package we generate the protos in.

Can you elaborate? How does the C++ library namespace influence the Python namespace?

@EpsilonPrime
Copy link
Member

Depending on your options you can actually use C++ protos for the Python under the hood. But ignoring that you can still end up with two definitions for protobufs which will collide and cause an exception when the module loads. If they are in different namespaces that collision won't happen (you can pretend to have two different sets of protos).

On a side note, I have been having issues with the current protos (lint and IDE) because the attributes aren't available for the package until runtime. The solution I used for the spark connect protos was to use mypy-protobuf (add --mypy_out to the protoc call in gen_proto.sh) which created pyi files which also would need to be checked in. Would this approach work here?

@gforsyth
Copy link
Member

Depending on your options you can actually use C++ protos for the Python under the hood. But ignoring that you can still end up with two definitions for protobufs which will collide and cause an exception when the module loads. If they are in different namespaces that collision won't happen (you can pretend to have two different sets of protos).

I don't think that renaming / aliasing the generated python functions will mess with the C++ descriptor pool, I believe it's only what is set as the proto namespace at code-generation time, but also I haven't messed with that particular intersection in a while.

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @amol!

@gforsyth gforsyth merged commit 18802c8 into substrait-io:main Mar 15, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants