Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actually support non-standard offset sizes (64-bit, 16-bit) #206

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

TYoungSL
Copy link

@TYoungSL TYoungSL commented Aug 11, 2021

Makes FLATCC_OFFSET_SIZE configurable and makes generated code export the compiler's configuration.

It also puts forward some work to make FLATCC_VOFFSET_SIZE configurable, but does not provide a facility to configure it directly through cmake.

@mikkelfj
Copy link
Contributor

I have to look more into this, but please consider that the original design intended for some of these files to be entirely replaced for different size configurations.

src/runtime/builder.c Outdated Show resolved Hide resolved
@mikkelfj
Copy link
Contributor

mikkelfj commented Aug 12, 2021

Just a heads up:
This change, if approved, is too big to merged onto master before a new release. A release is already over due.

@TYoungSL
Copy link
Author

TYoungSL commented Aug 12, 2021

It got bigger with the requirement of not using strncpy. :)

Probably for the best, this eliminates any chance of buffer overflows and improves portability.

Minor possible errata in the original parser code;
When strictly parsing the file identifier, the interpreted length may not be correct when escape sequences are used (e.g. to add nulls).

Added an option FLATBUFFERS_STRICT_IDENTIFIER_SIZE to use the original code (without fixing the errata) and FLATBUFFERS_RELAXED_IDENTIFIER_SIZE which implies truncation and padding to fit.

@TYoungSL
Copy link
Author

TYoungSL commented Aug 12, 2021

FYI
nsc is intended to be the namespace that changes when using different derived flatbuffer formats such that original formats can be still be accessed in the same user code.

Yeah, I think I'm using it correctly - you may have to replace the comment after this push (it was a force push, sorry).

Wait for the tests to pass though.

Are you saying I should append the offset size (_16/_64) or something to the namespace?

@mikkelfj
Copy link
Contributor

Are you saying I should append the offset size (_16/_64) or something to the namespace?
something like that, or small or wide or large, haven't given this a lot of though.

Note that this is probably a much larger PR than you anticipate.
There may be hardcoded uses of "flatbuffer_" prefixes that may need to change.
It has to be analyzed how compilation libraries can or cannot coexist with multiple formats.
Changing identifier width also spills over to the concept of type hashes - should they be 64-bit in some formats?
Should the identifier width change just because the uoffset type does? And if not, where are the hardcoded assumptions to the contrary.
etc. etc.

I think the best option is to merge this to a separate branch and mature up from there over time, if you are up for it.

@TYoungSL
Copy link
Author

TYoungSL commented Aug 12, 2021

We're currently using it to compare against BigBuffers (a fork of FlatBuffers that is specifically 64-bit) and have had good success with it. Much cleaner than flatc. :)

Separate branch is fine, StirlingLabs will maintain a separate fork and hope to contribute it upstream to master.

The intention is that offset sizes are de facto influences of other sizes save for voffsets, but end users should be allowed to specify any size.

Binary compatibility is dependent on the configuration. Non-standard values do not have any expectation of compatibility outside of that configuration - but it's generally justified by some requirement. (e.g. larger offset address space)

something like that, or small or wide or large, haven't given this a lot of though.

Would probably need to append it to the filename too. Maybe allow the user to specify a suffix (and/or prefix) to the namespace or require one when it's non-default?

@mikkelfj
Copy link
Contributor

On a related note: eventually I'd like to see support for StreamBuffers that build buffers front to back so data can be flushed to disk or set over network without reassembly.
There are the same namespace concerns here, although the size will not necessarily change.

@mikkelfj
Copy link
Contributor

Would probably need to append it to the filename too. Maybe allow the user to specify a suffix (and/or prefix) to the namespace or require one when it's non-default?

I'm not really sure - it could also be handled using separate output directories - but since include guards would have to be unique, it kind of suggests that the filenames should be too.

@mikkelfj
Copy link
Contributor

Added an option FLATBUFFERS_STRICT_IDENTIFIER_SIZE to use the original code (without fixing the errata) and FLATBUFFERS_RELAXED_IDENTIFIER_SIZE which implies truncation and padding to fit.

It seems like an easy way to get conflicts - couldn't this be handled with one setting?

Are you sure this is needed. Flatcc already has several ways to specify file identifiers, the typehash version in effect deals with any kind of binary identifier even if not actually a hash.

Minor possible errata in the original parser code;
When strictly parsing the file identifier, the interpreted length may not be correct when escape sequences are used (e.g. to add nulls).

As to errata: the file identifier code is surprisingly difficult to get right. But, I think the intention was for consuming human strings up to a maximum length, not to handle binary identifiers which AFAIR can be specified differently.

@TYoungSL
Copy link
Author

It seems like an easy way to get conflicts - couldn't this be handled with one setting?

Are you sure this is needed. Flatcc already has several ways to specify file identifiers, the typehash version in effect deals with any kind of binary identifier even if not actually a hash.

The settings would be using cmake_dependent_option (but I suppose the project uses an older cmake?) to provide a mutually exclusive pair, it's a pretty common convention to have an explicit restricting option when one has an implied scenario. Do you just mean it's confusing?

As to errata: the file identifier code is surprisingly difficult to get right. But, I think the intention was for consuming human strings up to a maximum length, not to handle binary identifiers which AFAIR can be specified differently.

That was the original idea behind using strncpy, but it's still possible to specify e.g. "\0\0" or "\r\n" and get a 2 byte string out of 4 characters, or "\x00" to get a 1 byte string. Anyone who does this is met with further problems so it's not plausible that someone would rely on this behavior, so it is errata.

@mikkelfj
Copy link
Contributor

I'm not a CMake expert, it was just my intuitive impression. The CMake version needs to be old to be portable, but there is seperate work on upgrade CMake build and I think they bumped the required version. See unmerged PR pending next release.

As to file identifiers - I don't really recall the details, so I can only provide a general overview and I might not be right in all I say.

@TYoungSL
Copy link
Author

TYoungSL commented Aug 12, 2021

By convention C++ comments are never used. Maybe some compilation flag?

Yep, I'll just remove those lines. (edit: Ended up keeping them, modified slightly.)

There's also line 66 in grisu3_parse.h, looks like cdump.h, elapsed.h and cmetrohash.h also have some...

I think all the compilers allow it.

@mikkelfj
Copy link
Contributor

BTW: way back in version 0.4.0 a branch was made with big endian support. Here the 4-byte file identifier was reversed, so "MONS" became "SNOM". It lead to all sort of problems, as you can imagine.

…ptions

use struct size assertions when FLATCC_RELAXED_IDENTIFIER_SIZE is not defined
@Rjvs
Copy link

Rjvs commented Aug 21, 2021

Supporting multiple offsets within the same codebase is a feature that this PR opens the gateway toward... but the ability to specify 64-bit offsets (which is mentioned in the docs as possible) seems useful in it's own right. For anyone that just wants to be able to specify 16 or 64bit offsets, I think this PR does the job, without changing anything for normal users.

There are a number of reasons why a user might want to modify namespaces and this PR does add another one but to me it is a completely separate issue.

@mikkelfj
Copy link
Contributor

@Rjvs

Supporting multiple offsets within the same codebase is a feature that this PR opens the gateway toward... but the ability to specify 64-bit offsets (which is mentioned in the docs as possible) seems useful in it's own right

Good point. It should be possible to just redefine the offset size and have it work. Different namespaces is the advanced version and perhaps it would be good to focus on the basics first. In either case, it won't go in before the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants