Actually support non-standard offset sizes (64-bit, 16-bit) #206

TYoungSL · 2021-08-11T18:36:57Z

Makes FLATCC_OFFSET_SIZE configurable and makes generated code export the compiler's configuration.

It also puts forward some work to make FLATCC_VOFFSET_SIZE configurable, but does not provide a facility to configure it directly through cmake.

mikkelfj · 2021-08-12T07:57:06Z

I have to look more into this, but please consider that the original design intended for some of these files to be entirely replaced for different size configurations.

src/runtime/builder.c

mikkelfj · 2021-08-12T17:26:56Z

Just a heads up:
This change, if approved, is too big to merged onto master before a new release. A release is already over due.

TYoungSL · 2021-08-12T17:42:59Z

It got bigger with the requirement of not using strncpy. :)

Probably for the best, this eliminates any chance of buffer overflows and improves portability.

Minor possible errata in the original parser code;
When strictly parsing the file identifier, the interpreted length may not be correct when escape sequences are used (e.g. to add nulls).

Added an option FLATBUFFERS_STRICT_IDENTIFIER_SIZE to use the original code (without fixing the errata) and FLATBUFFERS_RELAXED_IDENTIFIER_SIZE which implies truncation and padding to fit.

TYoungSL · 2021-08-12T17:50:30Z

FYI
nsc is intended to be the namespace that changes when using different derived flatbuffer formats such that original formats can be still be accessed in the same user code.

Yeah, I think I'm using it correctly - you may have to replace the comment after this push (it was a force push, sorry).

Wait for the tests to pass though.

Are you saying I should append the offset size (_16/_64) or something to the namespace?

mikkelfj · 2021-08-12T17:58:15Z

Are you saying I should append the offset size (_16/_64) or something to the namespace?
something like that, or small or wide or large, haven't given this a lot of though.

Note that this is probably a much larger PR than you anticipate.
There may be hardcoded uses of "flatbuffer_" prefixes that may need to change.
It has to be analyzed how compilation libraries can or cannot coexist with multiple formats.
Changing identifier width also spills over to the concept of type hashes - should they be 64-bit in some formats?
Should the identifier width change just because the uoffset type does? And if not, where are the hardcoded assumptions to the contrary.
etc. etc.

I think the best option is to merge this to a separate branch and mature up from there over time, if you are up for it.

TYoungSL · 2021-08-12T18:03:08Z

We're currently using it to compare against BigBuffers (a fork of FlatBuffers that is specifically 64-bit) and have had good success with it. Much cleaner than flatc. :)

Separate branch is fine, StirlingLabs will maintain a separate fork and hope to contribute it upstream to master.

The intention is that offset sizes are de facto influences of other sizes save for voffsets, but end users should be allowed to specify any size.

Binary compatibility is dependent on the configuration. Non-standard values do not have any expectation of compatibility outside of that configuration - but it's generally justified by some requirement. (e.g. larger offset address space)

something like that, or small or wide or large, haven't given this a lot of though.

Would probably need to append it to the filename too. Maybe allow the user to specify a suffix (and/or prefix) to the namespace or require one when it's non-default?

mikkelfj · 2021-08-12T18:13:52Z

On a related note: eventually I'd like to see support for StreamBuffers that build buffers front to back so data can be flushed to disk or set over network without reassembly.
There are the same namespace concerns here, although the size will not necessarily change.

mikkelfj · 2021-08-12T18:22:16Z

Would probably need to append it to the filename too. Maybe allow the user to specify a suffix (and/or prefix) to the namespace or require one when it's non-default?

I'm not really sure - it could also be handled using separate output directories - but since include guards would have to be unique, it kind of suggests that the filenames should be too.

… header

mikkelfj · 2021-08-12T18:30:13Z

Added an option FLATBUFFERS_STRICT_IDENTIFIER_SIZE to use the original code (without fixing the errata) and FLATBUFFERS_RELAXED_IDENTIFIER_SIZE which implies truncation and padding to fit.

It seems like an easy way to get conflicts - couldn't this be handled with one setting?

Are you sure this is needed. Flatcc already has several ways to specify file identifiers, the typehash version in effect deals with any kind of binary identifier even if not actually a hash.

Minor possible errata in the original parser code;
When strictly parsing the file identifier, the interpreted length may not be correct when escape sequences are used (e.g. to add nulls).

As to errata: the file identifier code is surprisingly difficult to get right. But, I think the intention was for consuming human strings up to a maximum length, not to handle binary identifiers which AFAIR can be specified differently.

TYoungSL · 2021-08-12T18:35:30Z

It seems like an easy way to get conflicts - couldn't this be handled with one setting?

Are you sure this is needed. Flatcc already has several ways to specify file identifiers, the typehash version in effect deals with any kind of binary identifier even if not actually a hash.

The settings would be using cmake_dependent_option (but I suppose the project uses an older cmake?) to provide a mutually exclusive pair, it's a pretty common convention to have an explicit restricting option when one has an implied scenario. Do you just mean it's confusing?

As to errata: the file identifier code is surprisingly difficult to get right. But, I think the intention was for consuming human strings up to a maximum length, not to handle binary identifiers which AFAIR can be specified differently.

That was the original idea behind using strncpy, but it's still possible to specify e.g. "\0\0" or "\r\n" and get a 2 byte string out of 4 characters, or "\x00" to get a 1 byte string. Anyone who does this is met with further problems so it's not plausible that someone would rely on this behavior, so it is errata.

mikkelfj · 2021-08-12T18:39:15Z

I'm not a CMake expert, it was just my intuitive impression. The CMake version needs to be old to be portable, but there is seperate work on upgrade CMake build and I think they bumped the required version. See unmerged PR pending next release.

As to file identifiers - I don't really recall the details, so I can only provide a general overview and I might not be right in all I say.

TYoungSL · 2021-08-12T18:41:21Z

By convention C++ comments are never used. Maybe some compilation flag?

Yep, I'll just remove those lines. (edit: Ended up keeping them, modified slightly.)

There's also line 66 in grisu3_parse.h, looks like cdump.h, elapsed.h and cmetrohash.h also have some...

I think all the compilers allow it.

mikkelfj · 2021-08-12T18:42:56Z

BTW: way back in version 0.4.0 a branch was made with big endian support. Here the 4-byte file identifier was reversed, so "MONS" became "SNOM". It lead to all sort of problems, as you can imagine.

…ptions use struct size assertions when FLATCC_RELAXED_IDENTIFIER_SIZE is not defined

Rjvs · 2021-08-21T09:51:26Z

Supporting multiple offsets within the same codebase is a feature that this PR opens the gateway toward... but the ability to specify 64-bit offsets (which is mentioned in the docs as possible) seems useful in it's own right. For anyone that just wants to be able to specify 16 or 64bit offsets, I think this PR does the job, without changing anything for normal users.

There are a number of reasons why a user might want to modify namespaces and this PR does add another one but to me it is a completely separate issue.

mikkelfj · 2021-08-21T11:55:25Z

@Rjvs

Supporting multiple offsets within the same codebase is a feature that this PR opens the gateway toward... but the ability to specify 64-bit offsets (which is mentioned in the docs as possible) seems useful in it's own right

Good point. It should be possible to just redefine the offset size and have it work. Different namespaces is the advanced version and perhaps it would be good to focus on the basics first. In either case, it won't go in before the next release.

add FLATCC_OFFSET_SIZE to CMakeLists.txt

b561f67

mikkelfj reviewed Aug 12, 2021

View reviewed changes

src/runtime/builder.c Outdated Show resolved Hide resolved

TYoungSL force-pushed the feature/bitness branch from 926e814 to b0d78ef Compare August 12, 2021 14:41

TYoungSL force-pushed the feature/bitness branch from b0d78ef to 217261a Compare August 12, 2021 17:42

support variable offset sizes for real

6b72542

TYoungSL force-pushed the feature/bitness branch from 217261a to 6b72542 Compare August 12, 2021 17:49

modified gen_prologue to not plug in custom definitions in the common…

9144c94

… header

replace FLATBUFFERS_ option prefix with FLATCC_ for IDENTIFIER_SIZE o…

8cc37ac

…ptions use struct size assertions when FLATCC_RELAXED_IDENTIFIER_SIZE is not defined

TYoungSL mentioned this pull request Aug 26, 2021

Offset Model (8-bit, 16-bit, 64-bit offset sizes, file identifier sizes) jamescourtney/FlatSharp#207

Closed

mikkelfj force-pushed the master branch 4 times, most recently from a8e7a8a to d7f50b2 Compare November 1, 2022 10:48

mikkelfj force-pushed the master branch 2 times, most recently from bd8f4fa to 3e4bd05 Compare March 12, 2024 14:09

mikkelfj force-pushed the master branch from aeccd46 to 56d2559 Compare March 12, 2024 14:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Actually support non-standard offset sizes (64-bit, 16-bit) #206

Actually support non-standard offset sizes (64-bit, 16-bit) #206

TYoungSL commented Aug 11, 2021 •

edited

Loading

mikkelfj commented Aug 12, 2021

mikkelfj commented Aug 12, 2021 •

edited

Loading

TYoungSL commented Aug 12, 2021 •

edited

Loading

TYoungSL commented Aug 12, 2021 •

edited

Loading

mikkelfj commented Aug 12, 2021

TYoungSL commented Aug 12, 2021 •

edited

Loading

mikkelfj commented Aug 12, 2021

mikkelfj commented Aug 12, 2021

mikkelfj commented Aug 12, 2021

TYoungSL commented Aug 12, 2021

mikkelfj commented Aug 12, 2021

TYoungSL commented Aug 12, 2021 •

edited

Loading

mikkelfj commented Aug 12, 2021

Rjvs commented Aug 21, 2021

mikkelfj commented Aug 21, 2021

Actually support non-standard offset sizes (64-bit, 16-bit) #206

Are you sure you want to change the base?

Actually support non-standard offset sizes (64-bit, 16-bit) #206

Conversation

TYoungSL commented Aug 11, 2021 • edited Loading

mikkelfj commented Aug 12, 2021

mikkelfj commented Aug 12, 2021 • edited Loading

TYoungSL commented Aug 12, 2021 • edited Loading

TYoungSL commented Aug 12, 2021 • edited Loading

mikkelfj commented Aug 12, 2021

TYoungSL commented Aug 12, 2021 • edited Loading

mikkelfj commented Aug 12, 2021

mikkelfj commented Aug 12, 2021

mikkelfj commented Aug 12, 2021

TYoungSL commented Aug 12, 2021

mikkelfj commented Aug 12, 2021

TYoungSL commented Aug 12, 2021 • edited Loading

mikkelfj commented Aug 12, 2021

Rjvs commented Aug 21, 2021

mikkelfj commented Aug 21, 2021

TYoungSL commented Aug 11, 2021 •

edited

Loading

mikkelfj commented Aug 12, 2021 •

edited

Loading

TYoungSL commented Aug 12, 2021 •

edited

Loading

TYoungSL commented Aug 12, 2021 •

edited

Loading

TYoungSL commented Aug 12, 2021 •

edited

Loading

TYoungSL commented Aug 12, 2021 •

edited

Loading