Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate byte buffer based apis #816

Closed
thomasvl opened this issue Dec 10, 2018 · 27 comments
Closed

Investigate byte buffer based apis #816

thomasvl opened this issue Dec 10, 2018 · 27 comments

Comments

@thomasvl
Copy link
Collaborator

Swift GRPC (via swiftnio?) is having to turn their buffers into Data object to then parse the protos. It might make sense to see if we can extend the parsing apis (binary atleast, but maybe also json and TextFormat), to make use of SE-0237 so other projects can bridge their own storage types into something that swift protobuf can parse without having to do extra copies.

In their, the MutableCollection part of SE-0237 might also work for writing out message, but I'm not completely sure how worth it would be since they likely can extra from the Data pretty efficiently and likely have to for other things.

@allevato
Copy link
Collaborator

allevato commented Jun 6, 2019

From a Foundation session at WWDC: Data is always only going to be contiguous. They've introduced the new DataProtocol and MutableDataProtocol and provide many of the same Data operations but don't promise contiguity.

It looks like NIOFoundationCompat has a ByteBufferView that conforms to DataProtocol & ContiguousBytes.

So, it may be worth figuring out how to make our serialization methods generic over DataProtocol and also provide fast-paths over DataProtocol & ContiguousBytes.

@ddunbar
Copy link
Member

ddunbar commented Aug 12, 2019

There is also init(bytesNoCopy:count:deallocator:) if it is synchronous.

@FranzBusch
Copy link
Member

Just going to hijack this issue, we are going to investigate how we can get swift-protobuf off of Foundation, more specifically off of Data. We want to enable easy interop with various bag of bytes types. Right now we are only toying around with some ideas how to achieve this but I wanted to give you all a heads up.

cc @tbkka @thomasvl

@thomasvl
Copy link
Collaborator Author

thomasvl commented Sep 9, 2022

On thing I've debating in the past was to do something like the C++ Coded(Input|Output)Streams to wrap part of the serialization, move the existing plumbing on top of those. That might be a way to provide a non Data interface for that part of the api. There's still the issue/question of what to do for the bytes fields, but the two spaces might be worth looking at independently.

@FranzBusch
Copy link
Member

So our current thinking is to provide a protocol based abstraction that users of SwiftProtobuf can conform their bag of bytes types to. They can then pass their types to our generic methods and we can either read the bytes or write the bytes from their types. This is currently just an idea we have been thinking about. Once we dive into this more we will share more information here.

We are also aware of the bytes field type. Maybe it is best to just use UInt8 here. The protocol will not work in that place because then every type would become generic.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Sep 9, 2022

I know I've mentioned it in some contexts; one thing that would help performance was if we could create binary format in reverse order. This would allow creation of binary to be single pass as we wouldn't have to pre-compute the length for message fields. It clearly doesn't work if one is trying to support an api that could directly stream to a file/socket. And growing on the front isn't something most buffer apis do either (with resizes/memcopys). The few impls I've seen that do this usually "grow" by allocating discrete chunks so the file output is assembled via those chunks (sorta like dispatch_data has)

Edit: Found it, #713 has some of that.

@gjcairo
Copy link
Contributor

gjcairo commented Oct 26, 2022

We have started working on this now. The first PR is out now (#1326) with more to follow. Please feel free to look at it and leave any comments.

@gjcairo
Copy link
Contributor

gjcairo commented Oct 28, 2022

I'd like to give some more context on what the proposed changes are, and why we're doing this.

Motivation

As stated above by @FranzBusch, the end goal is to remove Data (and eventually Foundation) from swift-protobuf. The main reasons for this are to avoid unnecessary copies between Data and other bag of bytes types (particularly ByteBuffer), and to reduce binary sizes for applications that don't need Foundation.
This is particularly important for swift-protobuf since it is a very low level library, and if it already brings Foundation in, higher level libraries have no way to avoid it.

Proposed changes

Overall, the idea is to make this transition opt-in instead of opt-out. There are two big places where Data is used in the current implementation:

  1. The (de)serialisation of binary and JSON protos
  2. The byte-type fields in protos

For the first item, the way we plan to achieve the removal of Data is by changing the current API defined in the Message protocol from this...

func serializedData() throws -> Data
init(serializedData: Data) throws

func jsonUTF8Data() throws -> Data
init(jsonUTF8Data: Data) throws

...to this:

func serializedBytes<Bytes: SwiftProtobufContiguousBytes>(as: Bytes.Type = [UInt8]) throws -> Bytes
init<Bytes: SwiftProtobufContiguousBytes>(serializedBytes: Bytes) throws

func jsonUTF8Bytes<Bytes: SwiftProtobufContiguousBytes>(as: Bytes.Type = [UInt8]) throws -> Bytes
init<Bytes: SwiftProtobufContiguousBytes>(jsonUTF8Bytes: Bytes) throws

Users will be able to conform their bag of bytes types to a new SwiftProtobufContiguousBytes protocol and use it for (de)serialisation. The default choice for protocol-conforming type will be [UInt8] (conformance will be included in the core protobuf module). On top of this, we'll provide Data conformance to SwiftProtobufContiguousBytes in a separate module, which users will be able to import if they want to keep using Data.

Names are TBC, but the overall end-goal structure of modules will be as follows:

  • SwiftProtobufCore - basically a rename of what currently is SwiftProtobuf, but without the usage of Foundation. This will also include the new SwiftProtobufContiguousBytes protocol and conformance of [UInt8] to it.
  • SwiftProtobufFoundationCompat - containing conformance of Data to the new protocol, and anything else (if needed) to keep backwards compatibility (such as an extension to Message to preserve the old serializedData()/jsonUTF8Data() methods).
  • SwiftProtobuf - will reexport the two packages above, so users wanting to keep the old behaviour (i.e. keep using Foundation) won't have to add/change any imports or make any other code changes.
  • SwiftProtobufNIOCompat - where this module lives is TBD, but the idea is to add conformance of ByteBuffer to SwiftProtobufContiguousBytes here.

For the second item, the short-term plan is to move away from Data by using [UInt8]. This means changing the generated code / protoc plugin to stop using Foundation. We can default to using Data to avoid breaking backwards compatibility, but add a new --foundationless (name TBC) flag to the plugin that uses [UInt8] instead for the generated code, and changes SwiftProtobuf imports to import SwiftProtobufFoundation.
protoc allows users to (re)generate code for specific protos only, so users could choose only a subset of protos to migrate at a time, without having to migrate all of them simultaneously.
For any other APIs that may need to change in the code to support [UInt8] instead of Data, where the SwiftProtobufContiguousBytes protocol can't be used, we can provide overloads in the SwiftProtobufFoundationCompat module.

Other changes to consider

  • The SwiftProtobuf package currently defines a few protos that are not only used internally but also are exposed to end users. To maintain backwards compatibility, we could have a version of the old generated code (using Data) in SwiftProtobufFoundationCompat, and a new one (using [UInt8]) in SwiftProtobufCore. Users could then choose the appropriate one to use. This is potentially a breaking change, and I'm open to suggestions on alternatives for how to solve this.
  • As for Bazel Swift Rules, changes would require to add the new codegen option to this file and update this other file with the new target structure.

Hope this clarifies some of the questions: please let me know your thoughts and if you have any follow-up queries.

@thomasvl
Copy link
Collaborator Author

I guess SwiftProtobufContiguousBytes is needed because ContiguousBytes comes out of Foundation? Is it at all worth opening an issue for the Swift Standard Library to potentially support some of these things to generally aid in not forcing the Foundation dependency?

@FranzBusch
Copy link
Member

I guess SwiftProtobufContiguousBytes is needed because ContiguousBytes comes out of Foundation? Is it at all worth opening an issue for the Swift Standard Library to potentially support some of these things to generally aid in not forcing the Foundation dependency?

Right, ContiguousBytes lives in Foundation so we need our own protocol here. We have brought this up to the standard library people already not only for SwiftProtobuf but also for our NIO use-cases. At this moment, we don't know if they will do something in this area so the best course of action right now is probably to just use [UInt8] and our own protocol.
I think a good thing about this work is that we can provide the stdlib people insights into what we would need.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Oct 28, 2022

Other changes to consider.

I'm not following that first bullet, what are the protos you are referring too here?

For the second, if we go the route of a generation option, that could lead into complexity since generation is via an aspect, there isn't a good way to pass things down the build graph, and if two swift_proto_library targets wanted different values, the overlap becomes a problem. They might have to go down the route of making swift_proto_library expose two modules: [Target_Name] and [Target_Name]_Core, with the idea being the Core one is foundation less, and the existing name would have the foundation support generated into it instead. But since we're talking about the generated messages here, it also means twice as many types being generated. ☹️

To the overall proposal, both points make sense to me; but I do wonder about landing (1) The (de)serialisation of binary and JSON protos completely first. Then doing a second wave of PRs for (2) The byte-type fields in protos. That seems like it may make the reviews along the way a little easier by focusing on the two spaces independently.

edit: and the second change seems to be the most source disrupting, so it would allow the first to hopefully be completed while sorting out the details of the second. (and I (personally) feel the second is worth while no matter what happens on the second).

@thomasvl
Copy link
Collaborator Author

For completeness we should likely also document how we think we'll handle things in the SwiftPM plugin integration.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Oct 28, 2022

Other changes to consider.

I'm not following that first bullet, what are the protos you are referring too here?

Is this specifically about bytes fields within the WKTs (any.proto, wrapper.proto, and descriptor.proto)? i.e. - do they expose Data or not and how do we migrate folks?

@gjcairo
Copy link
Contributor

gjcairo commented Oct 28, 2022

To the overall proposal, both points make sense to me; but I do wonder about landing (1) The (de)serialisation of binary and JSON protos completely first. Then doing a second wave of PRs for (2) The byte-type fields in protos. That seems like it may make the reviews along the way a little easier by focusing on the two spaces independently.

I agree, and I will split the PR that's currently in draft in two, to follow this recommendation.

For completeness we should likely also document how we think we'll handle things in the SwiftPM plugin integration.

Noted - I will add a mention to this, but I believe this should be pretty straightforward, simply exposing the new --foundationless option.

Other changes to consider.

I'm not following that first bullet, what are the protos you are referring to here?

Is this specifically about bytes fields within the WKTs (any.proto, wrapper.proto, and descriptor.proto)? i.e. - do they expose Data or not and how do we migrate folks?

Yes, that's exactly what I was referring to, sorry for the confusion.

@thomasvl
Copy link
Collaborator Author

This isn't going to be thought out, but I'll record it incase it spurs ideas for anyone else…

Re: --foundationless – It would be nice if we could do something like bytes_files=(data,array,hybrid). For data and array, it does the obvious thing, but for hybrid if it could make the property [UInt8], but say generate an extension that used Data with custom getter/setter to paper over things. But you can't overload a property like that. Then the WKTs could put that extension in a compatibility module.

Anyway, just wondering if there are other things that could be done with overloads/custom types/etc. to help in this space.

@FranzBusch
Copy link
Member

Re: --foundationless – It would be nice if we could do something like bytes_files=(data,array,hybrid). For data and array, it does the obvious thing, but for hybrid if it could make the property [UInt8], but say generate an extension that used Data with custom getter/setter to paper over things. But you can't overload a property like that. Then the WKTs could put that extension in a compatibility module.

I am not sure if exposing that kind of flexibility is what we want in the end. Let's assume the stdlib at some point provides types that can be used for the byte fields. If that happens we probably want to migrate to that type for the bytes field and at some point might want to drop the Data option completely (probably with a SemVer in-between). So I am thinking of this option more as a migration option than something we should support in the long run.

W.r.t. to the WKTs: I was thinking along the same lines as you with the overloads in the compat module; however, the only thing we really can do there is provide overloads that both copy and are suffix with something, i.e.:

var someFieldData: Data { Data(self.someField }

Which is both not nice and a performance footgun.

@thomasvl
Copy link
Collaborator Author

Re: --foundationless – It would be nice if we could do something like bytes_files=(data,array,hybrid). For data and array, it does the obvious thing, but for hybrid if it could make the property [UInt8], but say generate an extension that used Data with custom getter/setter to paper over things. But you can't overload a property like that. Then the WKTs could put that extension in a compatibility module.

I am not sure if exposing that kind of flexibility is what we want in the end. Let's assume the stdlib at some point provides types that can be used for the byte fields. If that happens we probably want to migrate to that type for the bytes field and at some point might want to drop the Data option completely (probably with a SemVer in-between). So I am thinking of this option more as a migration option than something we should support in the long run.

The generation option idea (like --foundationless (or so I thought)) was just something for the migration path. Long term we'd want to drop the option like we'd probably want to drop SwiftProtobufFoundationCompat and rename SwiftProtobufCore back to SwiftProtobuf, no?

If we think SwiftProtobufFoundationCompat would be something long-lived as useful, doesn't that sorta imply we'd want some long term option for still allowing folks to use Foundation (i.e. - Data) for the bytes fields?

W.r.t. to the WKTs: I was thinking along the same lines as you with the overloads in the compat module; however, the only thing we really can do there is provide overloads that both copy and are suffix with something, i.e.: ...

There are already some custom hooks for any, so we might be able to minimize some of the costs for that one. But the other two could be non idea, I don't have much visibility on their usage to have an idea on the impact. But if we made the *Data ones the new/expensive ones, folks would have to update to use them and we could also tag them as deprecated to help get folks to realize they need to update code paths.

@gjcairo
Copy link
Contributor

gjcairo commented Oct 31, 2022

I've created the PR above changing the target structure of the project to have the three modules: SwiftProtobufCore, SwiftProtobufFoundationCompat, and SwiftProtobuf.

With regards to the WKTs, we think the best way forward is to do what was discussed above:

  • Change the autogenerated code in SwiftProtobufCore to use [UInt8] (instead of Data) for bytes-typed properties.
  • We add extensions in SwiftProtobufFoundationCompat that provide Data-typed accessors for these properties (in the form of computed properties). We can annotate them as deprecated as Thomas suggested.

I'll work on updating the other existing PRs with these changes and your comments next.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Nov 2, 2022

The multiple module aspect of this is likely to be an issue for CocoaPods. Opened #1334 for tracking since we accepted the first patch on the steps towards this.

@FranzBusch
Copy link
Member

Ah yeah pod support is an interesting one. I might throw in a bit of a controversial take here. Maybe we should consider dropping pod support in the next major swift-protobuf release? SPM is mature now and the ecosystem is moving towards packages.

Furthermore, I think that a major version is perfect for this since CocoaPod users can still use the 1.x release which remains completely functional. Additionally, they can never get into version conflict problems where a downstream pod is using 2.x if we don't offer it. This would be a way better transition than what we have down in NIO etc. where we dropped support in a minor.

Interested to hear more opinions on this @thomasvl @tbkka

@thomasvl
Copy link
Collaborator Author

thomasvl commented Nov 2, 2022

Furthermore, I think that a major version is perfect for this since CocoaPod users can still use the 1.x release which remains completely functional.

It's functionally complete today (I don't think we've pulled over one or two things we recently did related to generation option case handling and the SwiftPM plugin). But nothing says it will stay that way in the future. I don't think we can say protobuf won't get something new at some point, and if they put a conformance test on #1159 or #1160, it won't be able to pass without some likely non trivial work. I wish there was a way to know how many people use the CocoaPods support.

Edit: I'll say keeping/dropping CocoaPods support is ultimately up to Apple folks as owners of this project.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Nov 2, 2022

Following up to @FranzBusch's idea –

Going to https://cocoapods.org/ and typing in protobuf:

Going to https://cocoapods.org/ and typing in grpc:

  • https://cocoapods.org/pods/gRPC-Swift would also get killed off
  • There are a lot of other things that come up, not sure how many of those might be alternatives that folks would move too.

@tbkka
Copy link
Contributor

tbkka commented Nov 2, 2022

Maybe we should consider dropping pod support in the next major swift-protobuf release? SPM is mature now and the ecosystem is moving towards packages.

Personally, I'd be happy to continue supporting CocoaPods if there's someone interested in doing the work.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Nov 2, 2022

Looking in ~/.cocoapods/repos/master, doing grep -rl SwiftProtobuf * and processing that a little to only count the spec instead of counting each version, seems to say there are 50+ things that depend on the SwiftProtobuf podspec. Still doesn't say how popular any of those specs are, but says what else is built on it.

Maybe we should consider dropping pod support in the next major swift-protobuf release? SPM is mature now and the ecosystem is moving towards packages.

Personally, I'd be happy to continue supporting CocoaPods if there's someone interested in doing the work.

If CocoaPods really does have a single module per podspec limit, it is likely gonna take changes to the sources to support it. I'm not sure if the system could be gamed to use multiple podspec files from the single repo to pull off the multiple modules.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Nov 2, 2022

We also decided 2.0 would be where we dropped the checked in SwiftProtobuf.xcodeproj, so we've dropped on other integration system "method" already.

@FranzBusch
Copy link
Member

So our experience with NIO and gRPC was that supporting Pods/Carthage was getting harder and harder while being able to make use of the latest SPM features.
It also came with a significant overhead for us every time we released. Furthermore, since we dropped pod support we haven't gotten many (maybe no) request to add it back in.

I would be in favor of only supporting SPM with 2.0 and keeping the 1.x branch open for Cocoapods. We did the same for gRPC with a call to the community to support this going forward.

Let's see where we are at when we wanna release 2.0 and we can make decision then?

@thomasvl
Copy link
Collaborator Author

thomasvl commented Jul 9, 2024

I'm going to close this out as 1.27 has some new apis around serialization that don't require Data. If there is interest in reconsidering Data for the bytes fields, that should probably be a new topic as that would be a much more difficult migration path for existing clients. Editions might make it something that can be rolled out on a field/message/file basis, but still deserves it's own issue for the discussions in the future.

@thomasvl thomasvl closed this as completed Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants