-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve oneof
support
#40
base: master
Are you sure you want to change the base?
Conversation
Given a message: ``` message Foo { oneof timestamp { int64 created; int64 updated; } } ``` Previously the generated classes by crystal-protoc-gen would allow to set multiple values of the oneof: ``` foo.created = 123 foo.updated = 345 foo.created # => 123 foo.updated # => 345 ``` With this patch the sibling oneof-values will be cleared when one is set: ``` foo.created = 123 foo.updated = 345 foo.created # => nil foo.updated # => 345 ``` Also when deserializing a message that contains oneofs there was previously no way of knowing which oneof value is populated. With this patch we synthesize a new instance variable by the name of the oneof to contain the string-name of the member that is populated: ``` foo.created = 123 foo.updated = 345 foo.created # => nil foo.updated # => 345 foo.timestamp # => "updated" foo[foo.timestamp] # => 345 ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's awesome that you're trying to fix this up! I've somehow never had to use oneof
before, so I'm a bit less familiar with it. I had a look at the docs and it looks like unsetting sibling values matches the behavior of the Google-owned libraries, so this is great.
Just some notes about details, which might require some changes, but the premise is great. Also, if you could add the .proto
files, that'd be awesome.
#require "protobuf" | ||
|
||
module TestMessagesProto2 | ||
|
||
struct TestOneof1 | ||
include Protobuf::Message | ||
|
||
contract do | ||
optional :a, :int32, 1 | ||
|
||
optional :oo0_a, :int32, 2, oneof_index: 0 | ||
optional :oo0_b, :int32, 3, oneof_index: 0 | ||
optional :oo0_c, :int32, 4, oneof_index: 0 | ||
|
||
optional :b, :int32, 5 | ||
|
||
optional :oo1_a, :int32, 6, oneof_index: 1 | ||
optional :oo1_b, :int32, 7, oneof_index: 1 | ||
optional :oo1_c, :int32, 8, oneof_index: 1 | ||
|
||
oneof 0, "oo0" | ||
oneof 1, "oo1" | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you may have forgotten to commit the TestOneofMessage.proto
here. That'd be helpful to understand what all of the oo0_*
properties come from. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I wrote that above file by hand, didn't use an input proto.
(the layout of the fixtures wasn't entirely clear to me so I just did what worked 😅)
The corresponding proto file would look like this:
syntax = "proto3";
message TestOneof1 {
int32 a = 1;
oneof oo0 {
int32 oo0_a = 2;
int32 oo0_b = 3;
int32 oo0_c = 4;
}
int32 b = 5;
oneof oo1 {
int32 oo1_a = 6;
int32 oo1_b = 7;
int32 oo1_c = 8;
}
}
The goal was to have more than one oneof
and have it
interleaved with other fields for a meaningful test.
Lemme know if I should add that proto to the repo somewhere
(perhaps as a comment on top of TestOneofMessage.pb.cr
? 🤔).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we'd have it there to verify that the generator generates working Crystal code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spec/protobuf/generator_spec.cr
is empty. 😬
I wasn't sure how these existing fixture files were generated, so I just put mine next to them.
I don't think there's currently anything in the specs that actually invokes the generator,
so not sure where I should put that above proto source. Just drop it into spec/fixtures/
?
spec/fixtures/test.pb.cr
Outdated
repeated :enum, SomeEnum, 15 | ||
# "enum" is a reserved keyword in | ||
# crystal so we use "my_enum" for this one | ||
repeated :my_enum, SomeEnum, 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. It's a reserved word, definitely, but it seems to work okay as a method as long as you have an explicit receiver for the call.
struct Test
property enum : String = ""
end
msg = Test.new
msg.enum # => ""
msg.enum = "hi"
msg.enum # => "hi"
This seems to be how the Range
type can have begin
and end
methods despite them being keywords, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yes, I believe it may have to do with me calling the setters
from the constructor now (was necessary to get the new oneof-exclusive
behavior when doing e.g. MyMessage.new(a: 1, b: 2, c: 3)
).
This produces code like self.enum = @enum
which the compiler may not like.
(ah, PS: i think actually it produces code like self.enum = enum
- perhaps adding the pretzel there would help)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps adding the pretzel there would help
Agreed. If this code breaks with that change, that's a regression with no workaround. Definitely don't want that! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, adding the @
indeed fixes the enum issue.
But it also changes the behavior - the constructor then keeps the first value instead of the last.
Before:
MyMsg.new(a: 1, b: 2, c: 3) # => { a: nil, b: nil, c: 3 }
After:
MyMsg.new(a: 1, b: 2, c: 3) # => { a: 1, b: nil, c: nil }
No idea what that is about, gonna take a closer look tomorrow.
(maybe it doesn't even matter anyway as this is not behavior that should be relied on 😬)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(pushed an update with the enum fix and the above changed behavior for now)
src/protobuf/message.cr
Outdated
{% for tag, field in FIELDS %} | ||
self.{{field[:name].id}} = {{field[:name].id}} | ||
{% end %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the ONEOFS
hash instead to avoid reassigning every property every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Good question! 🤔
I guess the setter-behavior is only needed for the oneofs
so it could iterate over ONEOFS
instead of FIELDS
.
Don't really have a strong opinion there. I guess it would be slightly faster
but could also set a trap for people who subclass to add their own setters
and expect consistent behavior on initialization. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support message inheritance. We implement messages as structs (with the sole exception being recursive messages) to reduce GC pressure and non-abstract structs can't be subclassed in Crystal:
The main reason I'm wondering about this is not so much performance but more that setting values in two places gives us two sources of truth. Ideally we'd only set them once. Maybe iterating over ONEOFS
isn't the right approach in that case, either. I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point about the inheritance, yes, then that can indeed be ignored.
Hmm yes, I'm not too fond of that part of the PR either but couldn't think of
a better way to do it. But I can't see where a conflict could occur - when you
do Msg.new
the setters are called, msg.foo = bar
also uses the setter.
And when deserializing a message it doesn't matter unless your message is
corrupt to begin with (contains multiple values for the same oneof).
spec/protobuf/message_spec.cr
Outdated
# When multiple oneof values are populated | ||
# by the ctor only the last survives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about this verbiage. If I have the following protobuf message:
message MyMessage {
oneof pick_one {
int32 foo = 1;
int32 bar = 2;
}
}
And I create these 2 messages:
MyMessage.new(foo: 1, bar: 2)
MyMessage.new(bar: 2, foo: 1)
Is the expectation that these produce different results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yet another good question!
Well, we call the setters at initialization time in FIELDS-order;
{% for tag, field in FIELDS %}
So I believe both invocations should produce the same result.
Might be worth adding a test for this although it's not behavior
that should be relied on (strictly speaking we should probably raise
an Exception when someone tries to set multiple values of a oneof
during init).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I believe both invocations should produce the same result.
They do, it's why I asked 🙂 The hard part is that this would require someone working with message containing a oneof
to know the order of the declarations in the proto source (assuming we output them in the same order in the Crystal code, which I believe we do). This is unfortunately not very ergonomic.
we should probably raise an Exception when someone tries to set multiple values of a oneof during init
This might be a good idea. Ideally, we'd use the type system to make this a compile-time error by defining two separate initialize
methods. Unfortunately, that would only work if all of the members of the oneof
are different types because Crystal sees def a(b : Int32)
and def a(c : Int32)
as the same method.
Click for code
def a(b : Int)
"B"
end
def a(c : Int)
"C"
end
a(b: 42)
a(c: 42)
Maybe we could still do that, though. A while back, they added the ability to force some arguments to be passed as named arguments. So in the snippet pasted above, forcing them as named arguments allows the methods to be seen differently:
Click for code
def a(*, b : Int)
"B"
end
def a(*, c : Int)
"C"
end
a(b: 42)
a(c: 42)
This would require generating all permutations of initialize
for all oneof
declarations. So for example:
message MyMessage {
oneof one {
string foo = 1;
string bar = 2;
}
oneof two {
string baz = 3;
string quux = 4;
string omg = 5;
}
}
The resulting Crystal code (after macro expansion) would need to have 6 initialize
methods:
struct MyMessage
include ::Protobuf::Message
# ... property declarations ...
def initialize(*, @foo : String, @baz : String)
end
def initialize(*, @foo : String, @quux : String)
end
def initialize(*, @foo : String, @omg : String)
end
def initialize(*, @bar : String, @baz : String)
end
def initialize(*, @bar : String, @quux : String)
end
def initialize(*, @bar : String, @omg : String)
end
end
I think this would be an excellent way to enforce oneof
at compile time, but requires significantly more work.
If we wanted to get fancy, we could even have a pseudo-initialize
whose only purpose is to show a better error at compile time if you pass both. But that's probably not unnecessary. The compiler error is probably enough for someone using the message to look at the definition and realize "oh, right, I can only specify one".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy ravioli, that looks complicated! 🤯 😅
What exactly is this trying to solve, though?
(update: n/m, took me a second pass, now I get it)
Hmm, yes, having "too many enum values provided" a compile time error
would indeed seem luxurious. Not sure if it's worth the effort, though.
I think erroneously having an explicit Foo.new(a: 1, b:2)
in the code
(where only one value should be given) is a fairly exotic case and would
normally be caught in testing.
The risk for exclusivity violations is probably higher when protobuf objects
are populated dynamically (e.g. from an ORM or from_json
) - and from that
the compiler can't save us. An Exception could - but that's runtime.
Not sure how other protobuf impls behave in that regard but my concern
would be that it might add a bunch of complexity to the generator for not much gain. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy ravioli, that looks complicated! 🤯 😅
It does when the macros are expanded, but it would all be abstracted from anyone reading the generated code.
The implementation could be relatively simple:
- Are there any
oneof
s in this message?- If yes: iterate over them in a nested loop generating a new
initialize
method for each one - If no: generate the one
initialize
- If yes: iterate over them in a nested loop generating a new
The downside is that it's a breaking change for anyone using positional arguments to initialize
, which I believe we do in the test suite.
What exactly is this trying to solve, though? 🤔
Setting two values in a oneof
group. You have to know you can only supply one key from each oneof
group no matter what the implementation is, but right now we're just not checking your work on that. Your implementation would check your work, which is awesome, but if you get it wrong we just overwrite all but one of your values. You may have no idea that's happened.
Your idea of raising an exception will let them know, which is arguably a huge improvement over silently dropping values because it lets them know that their code has a bug, but it happens at run time. People will likely discover the bug in production. Generating multiple initialize
methods will move that exception to compile time. They won't even be able to deploy the bug because it won't be a valid Crystal program.
Just to be clear: I think incoming messages are not supposed to have multiple values for a oneof.
(the current version of protobuf.cr will indeed create such messages but I don't think they are legal)
Agreed.
From what I gathered from the docs I thought it should be enough to enforce their
exclusivity at object creation time (i.e. in the constructor or when a setter is called).But could be I missed something 🤔
We don't have a guarantee when we're deserializing anything coming in from an untrusted host, but the risk there is probably relatively low. It should be pretty rare that nobody using of a particular message is using a library that checks that unless they're all using the same Protobuf library to serialize/deserialize — at which point, Protobuf is overkill and they should probably be using something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
breaking change for anyone using positional arguments
Hmm yes, as mentioned in my prev. comment (sorry I had edited it later on - it got late that night)
I'm skeptical about the benefits we'd gain from it.
Having an explicit Foo.new(a: 1, b:2)
in the code (where only one value should
be provided) seems like a rare case and should usually be easily caught in testing.
I think the other protobuf impls (at least the ones I've looked at)
don't go to that length either.
But in any case, your scrutiny has led me to an actual bug in the current impl! 🐞
Now fixed here.
Tldr: We do need to call the setters after deserialisation or the instance var
that identifies which oneof has a value doesn't get set.
I've also added a test for it and deduplicated them here.
With this fix I believe the current impl now works as intended and now it
also applies the "keep only first value"-behavior to incoming messages.
I'm not opposed to refactoring into multiple ctor's as you propose
but am still not quite convinced it's worth it. I feel like it may also bring
some undesirable side-effects such as potentially confusing compiler errors
as messages with many fields may end up with dozens of ctors. 🤔
bd0407d
to
a4b6860
Compare
Note: * This also changes the constructor semantics when multiple oneof-values are given (now the *first* value is kept instead of the last) * Using reserved keywords as protobuf field-names remains a bad idea regardless. For example a field called `delete` will break ts-proto / protoc-gen-js and probably others
a4b6860
to
8035bd5
Compare
(pardon the force-push, got mixed up with my branches here) |
Previously deserializing a message containing `oneofs` would not set the oneof instance var. E.g. you would get: `{ oneof_member1: 1, oneof_member_2: nil, my_oneof: nil }` Now it works as intended and you get: `{ oneof_member1: 1, oneof_member_2: nil, my_oneof: "oneof_member1" }`
|
||
{% for tag, field in FIELDS %} | ||
self.{{field[:name].id}} = @{{field[:name].id}} | ||
{% end %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why a second {% for .. %}
-block is needed here.
It tried to append the self.x = @x
to the preceding block
but that gave me:
The instance variable '@is_extension' is initialized in other 'initialize' methods,
and by not initializing it here it's not clear if the variable is supposed
to be nilable or if this is a mistake.
But well, this version works, and I guess it's actually a bit clearer to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: Ah I suppose adding it to the existing {% for %}
didn't work because
in the case of a oneof
the setter will iterate over FIELDS again - and may thereby attempt to access fields before they have been initialized. So yep, that makes sense.
`myobj.myoneof_value` is more convenient than `myobj[myobj.myoneof]` esp. when dealing with nested objects.
The return-type of `myobj.myoneof_value` is now restricted to the types of the possible oneof-values. Previously it would include the types of *all* message-fields (even those that are not part of the queried oneof).
Now it should actually work. ;) Note: Resist the temptation to store ONEOF as Symbol's. It appears to work at first and saves some String-conversions but wreaks havoc when JSON::Serializable comes into play.
While hacking on this I realized that this So with the last commit I've added experimental support for that. And the resulting fields behave as requested by the impl guide - identical to proto2 optionals. All proto3 scalar-fields have always behaved like proto2 optionals. I.e. they would become nilable's I feel like this last bit might actually be easy'ish to fix, perhaps just by assigning |
Given a message:
Previously the generated classes by crystal-protoc-gen
would allow to set multiple values of the oneof:
With this patch the sibling oneof-values
will be cleared when one is set:
Also when deserializing a message that contains
oneofs there was previously no way of knowing
which oneof value is populated.
With this patch we synthesize a new instance
variable by the name of the oneof to contain
the string-name of the member that is populated: