Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for serializing data containing offsets (FilePtr, etc.) #4

Open
kitlith opened this issue Nov 1, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@kitlith
Copy link
Collaborator

kitlith commented Nov 1, 2020

Here's what roblabla had to say (on discord):

actually with seek+write, fileptr seems doable with some global context. Here's the idea: when you get a FilePtr, record the current location, write some zeroes, and schedule a sort of "late serialization" pass. When the structure is fully serialized, all scheduled late serialization pass will run sequentially. That serialization pass will just serialize the underlying structure, seek to the previously saved location, and write the proper offset

that does mean you'd need some sort of heap allocation support to keep track of the serialization passes

but I can't think of a better way.

I had a similar idea awhile back (though I wasn't looking to implement at the library level) but i don't think this'll quite work for some types of files -- specifically those which break it into different sections, and different things go into different sections. i.e.:

struct File {
    info: Vec<Info>,
    data: DataSection
}

struct DataSection {
    // header...
    // data blocks go here
}

struct InfoSection {
    // header...
    list: Vec<Info>,
    // string blocks go here
}

struct Info {
    data_name: FilePtr32<String>, // points into InfoSection
    data_ptr: FilePtr32<Vec<u8>> // points into DataSection
}

under the scheme of the current idea i don't think we have a good way to represent this: we'd write out info section, info, info, info, data section, string, data, string, data, string, data instead of info section, info, info, info, string, string, string, data section, data, data, data. I think this is fixable by adding some sort of marker type Pool with some way of specifying a identifier such that you can do the same thing as the original idea, but for each pool in turn as you come across them while writing the file.

Are there any other methods we might want to consider?

@jam1garner
Copy link
Owner

I'd say we might want to go the route of file offset calculations, leading to a BinWrite trait that looks something like...

trait BinWrite {
    type Args;

    fn write_options<W: binrw::io::Write>(&self, writer: &mut W, options: &WriterOption, args: Self::Args, file_heap_pos: &mut u64) -> binrw::Result<()>;
    fn get_write_size(&self, options: &WriterOption, args: Self::Args) -> binrw::Result<u64>;
    fn write_file_heap_contents(&self, writer: &mut W, options: &WriterOption, args: Self::Args, file_heap_pos: &mut u64) -> binrw::Result<()>;
}

The general concept being:

  1. The file_heap_pos starts at a value of relative_to + first_value.get_write_size() (that is typically just... immediately after the top-level BinWrite struct). So if I have a header of pointers of size 0x10, the file_heap_pos is initialized to 0x10.
  2. The file_heap_pos is updated any time a pointer is written. So in a FilePtr<T>'s write_options implementation, you would write the current value of file_heap_pos and then increment it by inner.get_write_size().
  3. After the write_options pass on the top-level struct, write_file_heap_contents is called on it (and then it recursively calls it on everything else), thus writing the file contents in the same order as before.

Thoughts?

Some known drawbacks:

  1. this doesn't allow much room for deciding the layout of how things are allocated without a manually binwrite implementation. Not sure if there's really a great way to handle that though?
  2. this should allow for alignment of file pointers, but we need to figure out the interface for that

jam1garner added a commit that referenced this issue Mar 12, 2021
@kitlith
Copy link
Collaborator Author

kitlith commented Apr 3, 2021

fwiw my current take on this is that we should probably not provide an implementation of BinWrite for FilePtr for now and experiment out-of-tree for a bit. File formats are going to have different requirements for how things are positioned, and I don't think we have enough information at the moment to properly cover every file format.

what i think we should do instead is focus on having tools available such that people can implement a serialization scheme for their file pointers on top using a newtype, (or, heck, i guess serialize_with could be a thing) in a somewhat ergonomic fashion. To me, this means exposing some sort of user-expandable ReadOptions type thing that the user can stick some sort of mutable state (through rc?) in and perhaps implement the scheme that jam suggested in previous message, or some other scheme. Or maybe we do something else, idk.

once we experiment enough out of tree we can revisit actually including something in binrw proper. does that make sense?

@ScanMountGoat
Copy link
Contributor

I've successfully implemented a two pass writing approach for offsets that I've used for a number of different binary formats across multiple projects. In most cases, the writing can be fully derived. Some formats require manually defining the second "layout" pass to match the original layout. I'll link the solution here for anyone else that stumbles onto this issue. It's also fairly simple to implement.
https://docs.rs/xc3_write/0.13.0/xc3_write/

Original post on discord:

How big of a change would it be to add an associated type for the return type for binwrite? I'm trying to see it's doable to add support for calculating offsets for binwrite. My separate write implementation uses the return type for the first pass to return the positions of offset fields to update later.

// Fill in 0's for any offsets.
let offsets = value.write(...)?;
// Go back and write the pointed to data.
// This often needs to be implemented manually
// for adjusting item order, string sections, etc.
offsets.offset3.update_and_write(...)?;
offsets.offset1.update_and_write(...)?;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: No status
Status: Todo
Development

No branches or pull requests

4 participants