feat: add substrait.proto convenience module and document it #50
ACTION NEEDED: Substrait follows the Conventional Commits specification. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
LGTM! Thanks @amol- this is already much cleaner for the user :)
@@ -12,3 +12,21 @@ def test_imports():
    from substrait.gen.proto.type_expressions_pb2 import DerivationExpression
    from substrait.gen.proto.type_pb2 import Type
    from substrait.gen.proto.extensions.extensions_pb2 import SimpleExtensionURI
We can go ahead and delete the test_imports() function above since the new test indirectly tests these imports.
I think there is value in keeping this test to ensure backward compatibility. substrait.proto might continue to work but import the classes from modules with different names than the current ones. In that case we would be breaking backward compatibility for anyone directly accessing the classes from the _pb2 modules, and that's something we want to catch and be aware of.
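The concern above can be sketched as a small check that the old import path and the new one still hand back the same class object. This is only an illustration: `fake_type_pb2` and `fake_proto` are hypothetical stand-in modules built in memory so the snippet runs on its own, and `same_class` is a made-up helper, not part of substrait-python.

```python
import importlib
import sys
import types

# Hypothetical stand-in for an autogenerated module (not the real substrait package).
generated = types.ModuleType("fake_type_pb2")
generated.Type = type("Type", (), {})
sys.modules["fake_type_pb2"] = generated

# Hypothetical stand-in for a convenience module that re-exports the class.
convenience = types.ModuleType("fake_proto")
convenience.Type = generated.Type
sys.modules["fake_proto"] = convenience


def same_class(old_path, new_path, name):
    """Return True when both import paths expose the very same class object."""
    old_cls = getattr(importlib.import_module(old_path), name)
    new_cls = getattr(importlib.import_module(new_path), name)
    return old_cls is new_cls


def test_old_and_new_paths_agree():
    # Fails if the class is moved or renamed on either side.
    assert same_class("fake_type_pb2", "fake_proto", "Type")


test_old_and_new_paths_agree()
```

A test along these lines would break as soon as the generated module is renamed, even if the convenience module keeps working, which is the situation the comment wants to catch.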
looks good overall -- two small suggestions for the tests.
So the purpose of this PR is to get the protos exposed with the simpler package name without changing the fact that they are registered with the existing one? One reason the current package name is used is to prevent conflicts when the C++ version of the protos is also loaded under ::substrait::proto. We should probably document this intent, along with the fact that we aren't simply changing which package we generate the protos in.
Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
Correct, I wanted to retain backward compatibility for anyone importing the classes from the autogenerated modules, but also make the automatic generation an implementation detail.
Can you elaborate? How does the C++ library namespace influence the Python namespace?
Depending on your options, you can actually use the C++ protos for Python under the hood. But even ignoring that, you can still end up with two definitions of the same protobufs, which will collide and cause an exception when the module loads. If they are in different namespaces, that collision won't happen (you can pretend to have two different sets of protos).

On a side note, I have been having issues with the current protos (lint and IDE) because the attributes aren't available for the package until runtime. The solution I used for the Spark Connect protos was to use mypy-protobuf (add --mypy_out to the protoc call in gen_proto.sh), which created .pyi files that would also need to be checked in. Would this approach work here?
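To make the collision argument concrete, here is a toy model of a registry keyed by fully qualified message name. It is not protobuf's actual descriptor pool; `ToyDescriptorPool`, the package names `pkg_a` and `pkg_b`, and the choice of `TypeError` are all assumptions made for illustration.

```python
class ToyDescriptorPool:
    """Toy registry keyed by fully qualified name (not protobuf's real pool)."""

    def __init__(self):
        self._messages = {}

    def register(self, package, message, definition):
        full_name = f"{package}.{message}"
        existing = self._messages.get(full_name)
        # Re-registering an identical definition is harmless; a different one clashes.
        if existing is not None and existing != definition:
            raise TypeError(f"conflicting definition for {full_name}")
        self._messages[full_name] = definition
        return full_name


pool = ToyDescriptorPool()
pool.register("pkg_a", "Plan", "definition from the first library")

# Same message name under a different package: both can coexist.
pool.register("pkg_b", "Plan", "definition from the second library")

# Same package and message name with a different definition: collision.
try:
    pool.register("pkg_a", "Plan", "a different definition")
    collided = False
except TypeError:
    collided = True
```

In this model the second `Plan` only survives because it lives under another package name, which is the point made about keeping the two sets of protos in different namespaces.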
I don't think that renaming / aliasing the generated Python classes will mess with the C++ descriptor pool; I believe the only thing that matters is what is set as the proto namespace at code-generation time. That said, I haven't worked with that particular intersection in a while.
thanks @amol!
Add a substrait.proto module that gives access to the Substrait protocol classes, removing the need to navigate the hierarchy automatically generated by protobuf. It also provides access to the modules without the _pb2 suffix, which is an implementation detail of the protobuf version used.
Provides examples on how to generate and read back Substrait plans using the substrait-python module itself.
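One way such a convenience module could be assembled is sketched below. How the PR actually implements it is not shown in this thread, so treat this purely as an assumption: `build_convenience_module`, `fake_gen` and `fake_proto` are hypothetical names, and the generated modules are in-memory stand-ins so the snippet runs without protobuf installed.

```python
import types

PB2_SUFFIX = "_pb2"


def build_convenience_module(name, generated_modules):
    """Expose each generated module under a suffix-free alias and
    re-export its public classes at the top level."""
    facade = types.ModuleType(name)
    for gen in generated_modules:
        short = gen.__name__.rsplit(".", 1)[-1]
        if short.endswith(PB2_SUFFIX):
            short = short[: -len(PB2_SUFFIX)]
        # e.g. "type_pb2" becomes reachable as facade.type
        setattr(facade, short, gen)
        for attr, value in vars(gen).items():
            if isinstance(value, type) and not attr.startswith("_"):
                setattr(facade, attr, value)
    return facade


# Hypothetical stand-ins for autogenerated modules.
type_pb2 = types.ModuleType("fake_gen.type_pb2")
type_pb2.Type = type("Type", (), {})
plan_pb2 = types.ModuleType("fake_gen.plan_pb2")
plan_pb2.Plan = type("Plan", (), {})

proto = build_convenience_module("fake_proto", [type_pb2, plan_pb2])
```

With this in place, `proto.Plan` and `proto.plan.Plan` refer to the same class as `plan_pb2.Plan`, so code using the old generated modules and code using the facade stay interchangeable.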