
Consider supporting an API for serializing a protobuf to the end of a binary buffer #1020

Open
Lukasa opened this issue Jul 1, 2020 · 6 comments


@Lukasa
Contributor

Lukasa commented Jul 1, 2020

A number of use-cases will want to be able to send a binary protobuf appended to a pre-existing buffer, or use some non-Data object to store the protobuf. I’d like to begin a discussion about what such an API would look like, both in our ideal case (potentially requiring Swift enhancements) and what such a thing would look like with the current Swift functionality.
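For concreteness, here is one very rough sketch of the kind of shape I mean. Nothing below exists in SwiftProtobuf today, every name is a placeholder, and this naïve version still pays the intermediate allocation and copy that we'd ultimately like to eliminate:

```swift
import Foundation
import SwiftProtobuf

// Hypothetical: callers supply their own storage (Data, [UInt8], NIO's
// ByteBuffer, ...) and the message is appended to whatever is already in it.
protocol AppendableByteTarget {
    mutating func append(rawBytes: UnsafeRawBufferPointer)
}

extension Data: AppendableByteTarget {
    mutating func append(rawBytes: UnsafeRawBufferPointer) {
        append(contentsOf: rawBytes)   // UnsafeRawBufferPointer is a Collection of UInt8
    }
}

extension Message {
    /// Hypothetical: serialize `self` and append the bytes to `target`.
    func appendBinarySerialization<Target: AppendableByteTarget>(to target: inout Target) throws {
        // Today this is only expressible as "serialize to Data, then copy";
        // the interesting question is what it takes to skip that step.
        let bytes = try serializedData()   // existing SwiftProtobuf API
        bytes.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in target.append(rawBytes: raw) }
    }
}
```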

@thomasvl
Collaborator

thomasvl commented Jul 1, 2020

I've talked about being able to write backward (which has issues with our current traverse method and the required ordering of fields), but if we had that, plus something like dispatch_data_create_concat & dispatch_data_create_subrange, we wouldn't have to walk twice and could simply make chunks and then return them as one conceptual buffer.

If there were some other data/buffer type that allowed us to add/merge things like that, we might not need an API to directly allow appending; we could instead return something that could then be merged with another buffer to build the larger result (i.e., dispatch_data_create_concat).

With the current serialization support, we'd need a way to tell the "buffer" to grow, and then we could write directly into it (i.e., it would still be two-pass).

A smaller change to our current approach might be to revise things so we determine the byte count needed on a per-field basis, ask for a byte range of that size to fill in, and then write directly to it. We'd still be two-pass (compute sizes, then write bytes), but we wouldn't require the whole buffer at once. On the downside, it means we'd grow the buffer in chunks, which can have other performance issues.
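To make that concrete, a standalone sketch of the per-field version (GrowableOutput and everything else here is hypothetical, not a proposal):

```swift
// Compute each field's encoded size, ask the output to grow by exactly that
// much, then encode directly into the returned bytes. Still two-pass, but no
// whole-message buffer is required up front.
protocol GrowableOutput {
    /// Extend the output by `count` bytes and let `body` fill them in.
    mutating func appendBytes(count: Int, _ body: (UnsafeMutableRawBufferPointer) -> Void)
}

extension Array: GrowableOutput where Element == UInt8 {
    mutating func appendBytes(count: Int, _ body: (UnsafeMutableRawBufferPointer) -> Void) {
        let oldCount = self.count
        append(contentsOf: repeatElement(0, count: count))   // grow, zero-filled
        withUnsafeMutableBytes { raw in
            body(UnsafeMutableRawBufferPointer(rebasing: raw[oldCount...]))
        }
    }
}

func varintSize(_ v: UInt64) -> Int { v == 0 ? 1 : (70 - v.leadingZeroBitCount) / 7 }

/// Writes `number: value` in protobuf wire format (varint, wire type 0).
func writeVarintField<Output: GrowableOutput>(number: Int, value: UInt64, into out: inout Output) {
    let tag = UInt64(number) << 3   // wire type 0
    out.appendBytes(count: varintSize(tag) + varintSize(value)) { raw in
        var i = 0
        for var v in [tag, value] {
            while v >= 0x80 { raw[i] = UInt8(v & 0x7F) | 0x80; v >>= 7; i += 1 }
            raw[i] = UInt8(v); i += 1
        }
    }
}
```

With that, `writeVarintField(number: 1, value: 150, into: &bytes)` appends the classic `08 96 01` wire bytes without needing the whole message's size up front (a length-delimited submessage would still need its own size first, which is the nesting problem).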

Do any other projects (NIO?) have similar use cases and how do they deal with it?

@Lukasa
Contributor Author

Lukasa commented Jul 1, 2020

Do any other projects (NIO?) have similar use cases and how do they deal with it?

Yeah, NIO has this use-case pretty directly. Right now if you use protobuf with NIO (e.g. for grpc-swift) protobuf will allocate a Data and serialize into it, only for us to allocate our own ByteBuffer, copy those bytes, and deinit the Data. That’s not really ideal.

The idea of writing chunks is an interesting one, but I don’t know if it gives us a performance win. Many users will still need to flatten that buffer, though in principle they can do vector writes. The bigger downside for serialization like protobuf is that the allocator pressure of that is quite large: one allocation per chunk makes it fairly expensive to serialize large messages.

The “how big should this be” question is hard, but NIO addresses it simply by, as you say, growing the buffer in chunks. NIO has an API that may be illustrative here: ByteBuffer.writeWithUnsafeMutableBytes(minimumWritableBytes:_:). This API can be called to say: “please give me a pointer to uninitialized interior memory at the end of the buffer guaranteeing that there is space to write at least minimumWritableBytes”. NIO takes care of validating that there is sufficient space, resizing if needed, CoWing if needed, and then vending the interior pointer.
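For illustration, usage looks roughly like this (based on NIO 2's current signature; treat the details as indicative rather than authoritative):

```swift
import NIOCore   // `import NIO` on older NIO 2 releases

var buffer = ByteBufferAllocator().buffer(capacity: 1024)

// Ask for at least 16 writable bytes at the end of the buffer. NIO resizes and
// CoWs as needed, then vends a pointer to the uninitialized tail. The closure
// returns how many bytes it actually wrote and the writerIndex advances by that.
buffer.writeWithUnsafeMutableBytes(minimumWritableBytes: 16) { raw in
    let encoded: [UInt8] = [0xDE, 0xAD, 0xBE, 0xEF]   // stand-in for real serializer output
    raw.copyBytes(from: encoded)
    return encoded.count
}
```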

I’m not too worried about the resizing-and-copying cost. In general users who are performance-conscious will begin by ensuring there is a decent capacity available to them already (1kB seems reasonable in many cases), and so the resizes will be extremely infrequent. Once we get past 4kB/16kB (2 or 4 resizes later) some implementations (such as ByteBuffer) will resize with realloc and, as these are now page-sized operations, realloc will often be essentially free, no copy required. On top of that, many NIO programs will attempt to re-use these buffers, meaning that the allocation cost will be further reduced.

I also wonder whether it's possible for us to provide a worst-case size fairly cheaply. If so, it could be used to avoid the resizes altogether, at the cost of sometimes allocating memory we don't use.

@thomasvl
Collaborator

thomasvl commented Jul 1, 2020

From past talks with some of the other implementors, no one has found a good way to quickly estimate sizes. While a message might have a small number of fields, the moment one is a message field it opens the door to arbitrary depth (you can build a graph out of messages), so there's no good way to bound the cost of those. Likewise, a string/bytes field in one message may tend to be small, but in another it could hold paragraphs or images and thus be large.

The Java runtime for Android does reverse serialization, which is also why it uses chunks: if you realloc, you have to keep copying the already-written data back to the end of the new buffer, and that had a noticeable impact compared with just writing chunks and exposing the buffer as a final union of them (and likely using vectored write support in other places).
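For anyone following along, the reverse-writing trick looks roughly like this toy sketch (not how any real runtime structures it):

```swift
// Fill a scratch buffer from the back. By the time a nested message is
// finished you already know its length, so the length prefix can be written
// immediately in front of it, with no separate sizing pass.
struct ReverseWriter {
    private(set) var bytes: [UInt8]
    private(set) var start: Int                  // index of the first written byte

    init(capacity: Int) {
        bytes = [UInt8](repeating: 0, count: capacity)
        start = capacity
    }
    mutating func prepend(_ byte: UInt8) {
        start -= 1                               // a real impl would grow at the front when start == 0
        bytes[start] = byte
    }
    mutating func prependVarint(_ value: UInt64) {
        var v = value
        var encoded: [UInt8] = []
        repeat { encoded.append(UInt8(v & 0x7F) | 0x80); v >>= 7 } while v != 0
        encoded[encoded.count - 1] &= 0x7F       // clear the continuation bit on the last byte
        for byte in encoded.reversed() { prepend(byte) }
    }
    var written: ArraySlice<UInt8> { bytes[start...] }
}
```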

So NIO needs the append case because you already need to assemble things from multiple places? And right now it happens that everything else is NIO based so it can use ByteBuffer directly? I'm guessing wire protocol overhead/framing followed by the payload of data?

Normally with framing you need to know the size of the data; how does NIO handle something like that? If the size part of the framing is fixed width, you can reserve the space and then back-fill it; but if it's variable width, how do you avoid using a temp buffer to collect the data so you know the size before writing the framing and data? This is what forces protobuf to be two-pass (really more than two with nesting). Writing in reverse avoids these problems because you aren't trying to reserve variable-width space for the length framing; instead you track what you've written and then write the framing/size in front of it.
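To spell out the fixed-width case with NIO types (a sketch; the variable-width varint prefix is exactly the case this doesn't handle):

```swift
import NIOCore

var buffer = ByteBufferAllocator().buffer(capacity: 256)

// Reserve a 4-byte length slot, write the payload, then back-fill the slot.
let lengthSlot = buffer.writerIndex
buffer.writeInteger(UInt32(0))                           // placeholder length
let payloadStart = buffer.writerIndex
buffer.writeBytes([0x0A, 0x03, 0x66, 0x6F, 0x6F])        // the payload (arbitrary example bytes)
let payloadLength = buffer.writerIndex - payloadStart
buffer.setInteger(UInt32(payloadLength), at: lengthSlot) // back-fill the reserved slot
```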

@Lukasa
Contributor Author

Lukasa commented Jul 2, 2020

So NIO needs the append case because you already need to assemble things from multiple places? And right now it happens that everything else is NIO based so it can use ByteBuffer directly? I'm guessing wire protocol overhead/framing followed by the payload of data?

I wouldn't say NIO needs the append case per se: it wants "write the data into this buffer"; it doesn't much care whether that write appends or not. It's useful to build this in a way that allows appending, but not mandatory. This is because NIO can perform vector writes, so a header plus a protobuf body can be represented either as a single flat buffer or as two separate buffers that NIO will vector-write later.

Note that NIO also does quadratic-resizing (just as Array does), so if you're about to write data into the buffer and need more space, NIO will double the size of the buffer. At small allocations this is cheapish (linear copies are fast) and at large allocation sizes this is cheapish (realloc can usually just allocate pages at the back of the buffer), so this approach works quite well for most NIO serialisers. It would probably work well for protobuf too.

Normally with framing, you need to know the size of the data, how does NIO handle something like that? If the size part of framing is fixed width, you can reserve the space and then back fill it; but if it is something variable width, how do you avoid using a temp buffer to collect the data so you know the size to then write the framing and data?

NIO's support for vector writes helps here: if you don't know how long the length will be, you can put it into a separate write and let NIO's vector-write support handle it. This is what NIO does for HTTP chunked encoding, for example. That means the cost of having a "chunked" buffer is not a problem for NIO applications per se.
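Concretely, something like this sketch inside an outbound handler (hand-written here; the real HTTP encoder is more involved):

```swift
import NIOCore

// The length/framing and the body are written as separate buffers; pending
// writes get gathered by the transport on flush (e.g. via writev), so no flat
// copy of header + body is ever made.
func writeChunk(body: ByteBuffer, context: ChannelHandlerContext) {
    var header = context.channel.allocator.buffer(capacity: 16)
    header.writeString(String(body.readableBytes, radix: 16))   // HTTP chunk-size line
    header.writeString("\r\n")

    context.write(NIOAny(header), promise: nil)
    context.write(NIOAny(body), promise: nil)
    context.flush()
}
```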

What is a problem for NIO applications (and indeed any other application) is that they might have opinions about what the buffer type should be. The idea of having a "chunked" buffer is fine, but if that chunked buffer vended Data objects then NIO would have to replace all those Data objects with ByteBuffers. That will force another allocation and compaction, throwing away the gain from the chunked write.

I think chunked writes are definitely worth investigating, but I don't think they get us out of solving the issue of "what type do you want this protobuf serialised into".

@thomasvl
Collaborator

thomasvl commented Jul 6, 2020

It would probably work well for protobuf too.

Do you have an upper bound where you stop doubling? Some things have protos that get into the tens of megabytes, so doubling can start to get overly expensive if a server is getting a high rate of traffic.

NIO's support for vector writes helps here: if you don't know how long the length will be you can put it into a different write and let NIO's vector write support handle it. This is what NIO does for HTTP chunked encoding, for example. That means the cost of having a "chunked" buffer is not a problem for NIO applications per-se.

It doesn't seem like ByteBuffer supports a vector model; is this support purely on the writing APIs, where one could pass multiple buffers?

Has NIO given any thought to trying to factor out a protocol for buffers that would be a smaller dependency that other projects could then also try to support? Short of getting something in Swift itself, a common, minimal buffer abstraction might be a good starting point.

@Lukasa
Contributor Author

Lukasa commented Jul 6, 2020

Do you have an upper bound where you stop doubling?

Nope, we’ll write the whole thing into contiguous memory if we have to.

Again I’ll note that at these sizes “contiguous memory” is kind of a lie. Above page size, the biggest risk is that your heap is so fragmented you literally cannot allocate this many pages next to each other in virtual memory. This can definitely happen, but it’s not terribly likely in most applications, which come nowhere near exhausting the 48-bit address space they’re working with.

It doesn't seem like ByteBuffer supports a vector model; is this support purely on the writing APIs, where one could pass multiple buffers?

Yeah, this is purely on the writing side.

Has NIO given any thought to trying to factor out a protocol for buffers that would be a smaller dependency that other projects could then also try to support? Short of getting something in Swift itself, a common, minimal buffer abstraction might be a good starting point.

We could, but I really want this to be something the standard library actually tries to solve. It’s just such a common impedance mismatch in Swift that I’d like to see a high-level, consistent attempt to solve the problem.

For this use-case, it may be better to invert the idea: rather than have the owners of buffer types design this interface, have the consumers of those types define it. If you had your ideal interface for working with buffers in protobuf, what would it be? We can then try to fit the big-three buffer types into it (Data, [UInt8], ByteBuffer, in that order of importance), see what is easy and what is hard, make enhancements where needed, and then loop around and repeat. I suspect this iteration will give us a much easier basis for identifying what the interface should look like.
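To seed that, here is a deliberately tiny strawman of what the consumer side might ask for, with a Data conformance to show the fit. Every name is hypothetical:

```swift
import Foundation

// Roughly the two things a serializer needs from a byte container.
protocol ProtobufByteTarget {
    /// Hint that roughly `minimumCapacity` more bytes are coming.
    mutating func reserveAdditionalCapacity(_ minimumCapacity: Int)
    /// Append up to `count` bytes, initialized by `body`, which returns how many it wrote.
    mutating func appendUninitializedBytes(count: Int, _ body: (UnsafeMutableRawBufferPointer) -> Int)
}

extension Data: ProtobufByteTarget {
    mutating func reserveAdditionalCapacity(_ minimumCapacity: Int) {
        reserveCapacity(count + minimumCapacity)
    }
    mutating func appendUninitializedBytes(count n: Int, _ body: (UnsafeMutableRawBufferPointer) -> Int) {
        let oldCount = count
        count += n                                  // grows (zero-filled), resizing as needed
        let written = withUnsafeMutableBytes { (raw: UnsafeMutableRawBufferPointer) in
            body(UnsafeMutableRawBufferPointer(rebasing: raw[oldCount...]))
        }
        count = oldCount + written                  // trim if `body` wrote less than requested
    }
}

// [UInt8] falls out of reserveCapacity + append in much the same way, and
// ByteBuffer's writeWithUnsafeMutableBytes(minimumWritableBytes:_:) maps onto
// appendUninitializedBytes almost directly.
```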
