Improve `oneof` support #40

m-o-e · 2021-07-16T20:26:42Z

Given a message:

message Foo {
  oneof timestamp {
    int64 created;
    int64 updated;
  }
}

Previously the generated classes by crystal-protoc-gen
would allow to set multiple values of the oneof:

foo.created = 123
foo.updated = 345

foo.created # => 123
foo.updated # => 345

With this patch the sibling oneof-values
will be cleared when one is set:

foo.created = 123
foo.updated = 345

foo.created # => nil
foo.updated # => 345

Also when deserializing a message that contains
oneofs there was previously no way of knowing
which oneof value is populated.

With this patch we synthesize a new instance
variable by the name of the oneof to contain
the string-name of the member that is populated:

foo.created = 123
foo.updated = 345

foo.created # => nil
foo.updated # => 345
foo.timestamp # => "updated"
foo[foo.timestamp] # => 345

Given a message: ``` message Foo { oneof timestamp { int64 created; int64 updated; } } ``` Previously the generated classes by crystal-protoc-gen would allow to set multiple values of the oneof: ``` foo.created = 123 foo.updated = 345 foo.created # => 123 foo.updated # => 345 ``` With this patch the sibling oneof-values will be cleared when one is set: ``` foo.created = 123 foo.updated = 345 foo.created # => nil foo.updated # => 345 ``` Also when deserializing a message that contains oneofs there was previously no way of knowing which oneof value is populated. With this patch we synthesize a new instance variable by the name of the oneof to contain the string-name of the member that is populated: ``` foo.created = 123 foo.updated = 345 foo.created # => nil foo.updated # => 345 foo.timestamp # => "updated" foo[foo.timestamp] # => 345 ```

jgaskins

It's awesome that you're trying to fix this up! I've somehow never had to use oneof before, so I'm a bit less familiar with it. I had a look at the docs and it looks like unsetting sibling values matches the behavior of the Google-owned libraries, so this is great.

Just some notes about details, which might require some changes, but the premise is great. Also, if you could add the .proto files, that'd be awesome.

jgaskins · 2021-07-16T23:34:11Z

spec/fixtures/generated_proto2/TestOneofMessage.pb.cr

+#require "protobuf"
+
+module TestMessagesProto2
+
+  struct TestOneof1
+    include Protobuf::Message
+
+    contract do
+      optional :a, :int32, 1
+
+      optional :oo0_a, :int32, 2, oneof_index: 0
+      optional :oo0_b, :int32, 3, oneof_index: 0
+      optional :oo0_c, :int32, 4, oneof_index: 0
+
+      optional :b, :int32, 5
+
+      optional :oo1_a, :int32, 6, oneof_index: 1
+      optional :oo1_b, :int32, 7, oneof_index: 1
+      optional :oo1_c, :int32, 8, oneof_index: 1
+
+      oneof 0, "oo0"
+      oneof 1, "oo1"
+    end
+  end
+end


Looks like you may have forgotten to commit the TestOneofMessage.proto here. That'd be helpful to understand what all of the oo0_* properties come from. 🙂

Oh, sorry, I wrote that above file by hand, didn't use an input proto.
(the layout of the fixtures wasn't entirely clear to me so I just did what worked 😅)

The corresponding proto file would look like this:

syntax = "proto3"; message TestOneof1 { int32 a = 1; oneof oo0 { int32 oo0_a = 2; int32 oo0_b = 3; int32 oo0_c = 4; } int32 b = 5; oneof oo1 { int32 oo1_a = 6; int32 oo1_b = 7; int32 oo1_c = 8; } }

The goal was to have more than one oneof and have it
interleaved with other fields for a meaningful test.

Lemme know if I should add that proto to the repo somewhere
(perhaps as a comment on top of TestOneofMessage.pb.cr? 🤔).

Ideally, we'd have it there to verify that the generator generates working Crystal code.

Well spec/protobuf/generator_spec.cr is empty. 😬
I wasn't sure how these existing fixture files were generated, so I just put mine next to them.

I don't think there's currently anything in the specs that actually invokes the generator,
so not sure where I should put that above proto source. Just drop it into spec/fixtures/?

jgaskins · 2021-07-16T23:42:35Z

spec/fixtures/test.pb.cr

-    repeated :enum, SomeEnum, 15
+    # "enum" is a reserved keyword in
+    # crystal so we use "my_enum" for this one
+    repeated :my_enum, SomeEnum, 15


Interesting. It's a reserved word, definitely, but it seems to work okay as a method as long as you have an explicit receiver for the call.

struct Test property enum : String = "" end msg = Test.new msg.enum # => "" msg.enum = "hi" msg.enum # => "hi"

This seems to be how the Range type can have begin and end methods despite them being keywords, as well.

Hm yes, I believe it may have to do with me calling the setters
from the constructor now (was necessary to get the new oneof-exclusive
behavior when doing e.g. MyMessage.new(a: 1, b: 2, c: 3)).

This produces code like self.enum = @enum which the compiler may not like.
(ah, PS: i think actually it produces code like self.enum = enum - perhaps adding the pretzel there would help)

perhaps adding the pretzel there would help

Agreed. If this code breaks with that change, that's a regression with no workaround. Definitely don't want that! 🙂

Hmm, adding the @ indeed fixes the enum issue.
But it also changes the behavior - the constructor then keeps the first value instead of the last.

Before:

MyMsg.new(a: 1, b: 2, c: 3) # => { a: nil, b: nil, c: 3 }

After:

MyMsg.new(a: 1, b: 2, c: 3) # => { a: 1, b: nil, c: nil }

No idea what that is about, gonna take a closer look tomorrow.
(maybe it doesn't even matter anyway as this is not behavior that should be relied on 😬)

(pushed an update with the enum fix and the above changed behavior for now)

src/protobuf/generator.cr

jgaskins · 2021-07-16T23:48:37Z

src/protobuf/message.cr

+        {% for tag, field in FIELDS %}
+          self.{{field[:name].id}} = {{field[:name].id}}
+        {% end %}


Should this use the ONEOFS hash instead to avoid reassigning every property every time?

Hmm. Good question! 🤔

I guess the setter-behavior is only needed for the oneofs
so it could iterate over ONEOFS instead of FIELDS.

Don't really have a strong opinion there. I guess it would be slightly faster
but could also set a trap for people who subclass to add their own setters
and expect consistent behavior on initialization. 🤔

We don't support message inheritance. We implement messages as structs (with the sole exception being recursive messages) to reduce GC pressure and non-abstract structs can't be subclassed in Crystal:

The main reason I'm wondering about this is not so much performance but more that setting values in two places gives us two sources of truth. Ideally we'd only set them once. Maybe iterating over ONEOFS isn't the right approach in that case, either. I'm not sure.

Ah, good point about the inheritance, yes, then that can indeed be ignored.

Hmm yes, I'm not too fond of that part of the PR either but couldn't think of
a better way to do it. But I can't see where a conflict could occur - when you
do Msg.new the setters are called, msg.foo = bar also uses the setter.

And when deserializing a message it doesn't matter unless your message is
corrupt to begin with (contains multiple values for the same oneof).

jgaskins · 2021-07-17T00:15:29Z

spec/protobuf/message_spec.cr

+    # When multiple oneof values are populated
+    # by the ctor only the last survives


Question about this verbiage. If I have the following protobuf message:

message MyMessage { oneof pick_one { int32 foo = 1; int32 bar = 2; } }

And I create these 2 messages:

MyMessage.new(foo: 1, bar: 2) MyMessage.new(bar: 2, foo: 1)

Is the expectation that these produce different results?

Hmm, yet another good question!

Well, we call the setters at initialization time in FIELDS-order;
{% for tag, field in FIELDS %}

So I believe both invocations should produce the same result.
Might be worth adding a test for this although it's not behavior
that should be relied on (strictly speaking we should probably raise
an Exception when someone tries to set multiple values of a oneof
during init).

So I believe both invocations should produce the same result.

They do, it's why I asked 🙂 The hard part is that this would require someone working with message containing a oneof to know the order of the declarations in the proto source (assuming we output them in the same order in the Crystal code, which I believe we do). This is unfortunately not very ergonomic.

we should probably raise an Exception when someone tries to set multiple values of a oneof during init

This might be a good idea. Ideally, we'd use the type system to make this a compile-time error by defining two separate initialize methods. Unfortunately, that would only work if all of the members of the oneof are different types because Crystal sees def a(b : Int32) and def a(c : Int32) as the same method.

Click for code

def a(b : Int) "B" end def a(c : Int) "C" end a(b: 42) a(c: 42)

Maybe we could still do that, though. A while back, they added the ability to force some arguments to be passed as named arguments. So in the snippet pasted above, forcing them as named arguments allows the methods to be seen differently:

Click for code

def a(*, b : Int) "B" end def a(*, c : Int) "C" end a(b: 42) a(c: 42)

This would require generating all permutations of initialize for all oneof declarations. So for example:

message MyMessage { oneof one { string foo = 1; string bar = 2; } oneof two { string baz = 3; string quux = 4; string omg = 5; } }

The resulting Crystal code (after macro expansion) would need to have 6 initialize methods:

struct MyMessage include ::Protobuf::Message # ... property declarations ... def initialize(*, @foo : String, @baz : String) end def initialize(*, @foo : String, @quux : String) end def initialize(*, @foo : String, @omg : String) end def initialize(*, @bar : String, @baz : String) end def initialize(*, @bar : String, @quux : String) end def initialize(*, @bar : String, @omg : String) end end

I think this would be an excellent way to enforce oneof at compile time, but requires significantly more work.

If we wanted to get fancy, we could even have a pseudo-initialize whose only purpose is to show a better error at compile time if you pass both. But that's probably not unnecessary. The compiler error is probably enough for someone using the message to look at the definition and realize "oh, right, I can only specify one".

Holy ravioli, that looks complicated! 🤯 😅

~~What exactly is this trying to solve, though?~~
(update: n/m, took me a second pass, now I get it)

Hmm, yes, having "too many enum values provided" a compile time error
would indeed seem luxurious. Not sure if it's worth the effort, though.

I think erroneously having an explicit Foo.new(a: 1, b:2) in the code
(where only one value should be given) is a fairly exotic case and would
normally be caught in testing.

The risk for exclusivity violations is probably higher when protobuf objects
are populated dynamically (e.g. from an ORM or from_json) - and from that
the compiler can't save us. An Exception could - but that's runtime.

Not sure how other protobuf impls behave in that regard but my concern
would be that it might add a bunch of complexity to the generator for not much gain. 🤔

Holy ravioli, that looks complicated! 🤯 😅

It does when the macros are expanded, but it would all be abstracted from anyone reading the generated code.

The implementation could be relatively simple:

Are there any oneofs in this message?

If yes: iterate over them in a nested loop generating a new initialize method for each one

If no: generate the one initialize

The downside is that it's a breaking change for anyone using positional arguments to initialize, which I believe we do in the test suite.

What exactly is this trying to solve, though? 🤔

Setting two values in a oneof group. You have to know you can only supply one key from each oneof group no matter what the implementation is, but right now we're just not checking your work on that. Your implementation would check your work, which is awesome, but if you get it wrong we just overwrite all but one of your values. You may have no idea that's happened.

Your idea of raising an exception will let them know, which is arguably a huge improvement over silently dropping values because it lets them know that their code has a bug, but it happens at run time. People will likely discover the bug in production. Generating multiple initialize methods will move that exception to compile time. They won't even be able to deploy the bug because it won't be a valid Crystal program.

Just to be clear: I think incoming messages are not supposed to have multiple values for a oneof.
(the current version of protobuf.cr will indeed create such messages but I don't think they are legal)

Agreed.

From what I gathered from the docs I thought it should be enough to enforce their
exclusivity at object creation time (i.e. in the constructor or when a setter is called).

But could be I missed something 🤔

We don't have a guarantee when we're deserializing anything coming in from an untrusted host, but the risk there is probably relatively low. It should be pretty rare that nobody using of a particular message is using a library that checks that unless they're all using the same Protobuf library to serialize/deserialize — at which point, Protobuf is overkill and they should probably be using something else.

breaking change for anyone using positional arguments

Hmm yes, as mentioned in my prev. comment (sorry I had edited it later on - it got late that night)
I'm skeptical about the benefits we'd gain from it.

Having an explicit Foo.new(a: 1, b:2) in the code (where only one value should
be provided) seems like a rare case and should usually be easily caught in testing.

I think the other protobuf impls (at least the ones I've looked at)
don't go to that length either.

But in any case, your scrutiny has led me to an actual bug in the current impl! 🐞

Now fixed here.
Tldr: We do need to call the setters after deserialisation or the instance var
that identifies which oneof has a value doesn't get set.

I've also added a test for it and deduplicated them here.

With this fix I believe the current impl now works as intended and now it
also applies the "keep only first value"-behavior to incoming messages.

I'm not opposed to refactoring into multiple ctor's as you propose
but am still not quite convinced it's worth it. I feel like it may also bring
some undesirable side-effects such as potentially confusing compiler errors
as messages with many fields may end up with dozens of ctors. 🤔

Note: * This also changes the constructor semantics when multiple oneof-values are given (now the *first* value is kept instead of the last) * Using reserved keywords as protobuf field-names remains a bad idea regardless. For example a field called `delete` will break ts-proto / protoc-gen-js and probably others

m-o-e · 2021-07-17T02:59:59Z

(pardon the force-push, got mixed up with my branches here)

Previously deserializing a message containing `oneofs` would not set the oneof instance var. E.g. you would get: `{ oneof_member1: 1, oneof_member_2: nil, my_oneof: nil }` Now it works as intended and you get: `{ oneof_member1: 1, oneof_member_2: nil, my_oneof: "oneof_member1" }`

m-o-e · 2021-07-17T10:28:28Z

src/protobuf/message.cr

+
+        {% for tag, field in FIELDS %}
+          self.{{field[:name].id}} = @{{field[:name].id}}
+        {% end %}


I'm not sure why a second {% for .. %}-block is needed here.

It tried to append the self.x = @x to the preceding block
but that gave me:

The instance variable '@is_extension' is initialized in other 'initialize' methods,
and by not initializing it here it's not clear if the variable is supposed
to be nilable or if this is a mistake.

But well, this version works, and I guess it's actually a bit clearer to read.

PS: Ah I suppose adding it to the existing {% for %} didn't work because
in the case of a oneof the setter will iterate over FIELDS again - and may thereby attempt to access fields before they have been initialized. So yep, that makes sense.

`myobj.myoneof_value` is more convenient than `myobj[myobj.myoneof]` esp. when dealing with nested objects.

The return-type of `myobj.myoneof_value` is now restricted to the types of the possible oneof-values. Previously it would include the types of *all* message-fields (even those that are not part of the queried oneof).

Now it should actually work. ;) Note: Resist the temptation to store ONEOF as Symbol's. It appears to work at first and saves some String-conversions but wreaks havoc when JSON::Serializable comes into play.

cf. https://github.com/protocolbuffers/protobuf/blob/master/docs/implementing_proto3_presence.md#updating-a-code-generator

m-o-e · 2021-07-26T22:07:00Z

While hacking on this I realized that this oneof-stuff is actually closely
related to the new proto3 field presence.

So with the last commit I've added experimental support for that.
The generator now signals support for the new feature (proto3-files with optional fields can now be compiled).

And the resulting fields behave as requested by the impl guide - identical to proto2 optionals.
However, I'm afraid this is mostly due to the fact that proto3 support was never completed ;)

All proto3 scalar-fields have always behaved like proto2 optionals. I.e. they would become nilable's
rather than following the proto3 "return default-value when absent" semantics.

I feel like this last bit might actually be easy'ish to fix, perhaps just by assigning
the defaults here (but haven't tried it out yet).

m-o-e mentioned this pull request Jul 16, 2021

Add support for Proto3 optional fields? #38

Open

Fix comment

7587e68

jgaskins reviewed Jul 17, 2021

View reviewed changes

m-o-e force-pushed the improve-oneof-support branch from bd0407d to a4b6860 Compare July 17, 2021 02:52

m-o-e force-pushed the improve-oneof-support branch from a4b6860 to 8035bd5 Compare July 17, 2021 02:59

m-o-e added 3 commits July 17, 2021 05:03

Avoid double-negative

53df591

DRY the specs (unify the duplicated tests)

0c22b97

m-o-e commented Jul 17, 2021

View reviewed changes

m-o-e added 5 commits July 17, 2021 15:45

Re-add oneof! invocation (had gotten lost in merge mixup)

ca66c71

Add direct getter for oneof value

83296aa

`myobj.myoneof_value` is more convenient than `myobj[myobj.myoneof]` esp. when dealing with nested objects.

Restrict return type of oneof-by-name getter

93f3afc

The return-type of `myobj.myoneof_value` is now restricted to the types of the possible oneof-values. Previously it would include the types of *all* message-fields (even those that are not part of the queried oneof).

Fix return type restriction (String/Symbol mixup)

8ae57a8

Now it should actually work. ;) Note: Resist the temptation to store ONEOF as Symbol's. It appears to work at first and saves some String-conversions but wreaks havoc when JSON::Serializable comes into play.

Add *experimental* support for proto3 optional (Field Presence)

2f015e4

cf. https://github.com/protocolbuffers/protobuf/blob/master/docs/implementing_proto3_presence.md#updating-a-code-generator

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve `oneof` support #40

Improve `oneof` support #40

m-o-e commented Jul 16, 2021

jgaskins left a comment

jgaskins Jul 16, 2021

m-o-e Jul 17, 2021

jgaskins Jul 17, 2021

m-o-e Jul 17, 2021 •

edited

Loading

jgaskins Jul 16, 2021

m-o-e Jul 17, 2021 •

edited

Loading

jgaskins Jul 17, 2021

m-o-e Jul 17, 2021

m-o-e Jul 17, 2021

jgaskins Jul 16, 2021

m-o-e Jul 17, 2021

jgaskins Jul 17, 2021

m-o-e Jul 17, 2021 •

edited

Loading

jgaskins Jul 17, 2021

m-o-e Jul 17, 2021

jgaskins Jul 17, 2021

m-o-e Jul 17, 2021 •

edited

Loading

jgaskins Jul 17, 2021

m-o-e Jul 17, 2021

m-o-e commented Jul 17, 2021

m-o-e Jul 17, 2021 •

edited

Loading

m-o-e Jul 17, 2021

m-o-e commented Jul 26, 2021

		# When multiple oneof values are populated
		# by the ctor only the last survives

Improve oneof support #40

Are you sure you want to change the base?

Improve oneof support #40

Conversation

m-o-e commented Jul 16, 2021

jgaskins left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e Jul 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e Jul 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e Jul 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e Jul 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e commented Jul 17, 2021

m-o-e Jul 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-o-e commented Jul 26, 2021

Improve `oneof` support #40

Improve `oneof` support #40

m-o-e Jul 17, 2021 •

edited

Loading

m-o-e Jul 17, 2021 •

edited

Loading

m-o-e Jul 17, 2021 •

edited

Loading

m-o-e Jul 17, 2021 •

edited

Loading

m-o-e Jul 17, 2021 •

edited

Loading