FileFormatDrafts
Note: At the moment all of these pages are drafts, so feel free to add anything you think is missing or to correct errors. There is a Gitter chat room where things can be discussed. Topics where further discussion might be useful are marked with (std) ("subject to discussion").
Current draft by Márcio Pais: https://files.gitter.im/encode-ru-Community-Archiver/Lobby/cnFl/Fairytale-File-Format.pdf
Alternative description by Christian Schneider:
The length column in the tables below can contain "VLI", which stands for "Variable Length Integer". This data structure is of variable length (1-9 bytes) and encodes a 64-bit integer as follows: the first (most significant) bit of each byte is a flag. If it is set, more bytes follow; if not, this byte is the last one. The remaining 7 bits of each byte carry 7 bits of the integer value. For example, the VLI 11010101 00110110 encodes the binary value 0110110 1010101, which is 6997 in decimal. Note that the order of the 7-bit groups is reversed: the least significant group comes first. The code for encoding and decoding VLIs can be found in fairytale.cpp, in the methods vliEncode and vliDecode.
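The scheme above can be illustrated with a minimal C++ sketch. The names encodeVli and decodeVli are hypothetical and this is not the vliEncode/vliDecode code from fairytale.cpp. One known difference: this sketch uses plain 7-bit groups throughout, so the largest 64-bit values take 10 bytes, whereas the 1-9 byte range stated above suggests the real implementation treats the final byte differently.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helpers illustrating the VLI scheme; not fairytale.cpp's code.

// Append v to out as 7-bit groups, least significant group first.
// The most significant bit of each byte is set when more bytes follow.
void encodeVli(uint64_t v, std::vector<uint8_t>& out) {
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));  // last byte: flag bit clear
}

// Decode a VLI starting at data[pos]; on success, pos is advanced past it.
uint64_t decodeVli(const std::vector<uint8_t>& data, size_t& pos) {
    uint64_t value = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= data.size()) throw std::runtime_error("truncated VLI");
        uint8_t byte = data[pos++];
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;  // flag clear: this was the last byte
    }
    throw std::runtime_error("VLI too long");
}
```

Encoding 6997 with this sketch produces the bytes 0xD5 0x36, i.e. 11010101 00110110, matching the example above.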
When files are compressed by Fairytale, an archive file with the extension .ftl (std) is created; it contains everything needed to restore the original files. The current draft of the format is:
Description | Length |
---|---|
Magic bytes | 6 bytes (std) |
Offset to the first structure | 8 bytes |
Compressed block data | variable |
Directory tree structure | variable |
File structure | variable |
Codec structure | variable |
Block segmentation structure | variable |
The beginning of the file identifies it as a Fairytale archive. Storing the version number is important because different versions of Fairytale are likely to create incompatible files that other versions can't process.
Description | Length |
---|---|
"FTL" (std) | 3 bytes |
version number | 3 bytes (std) |
Because the size of the compressed data is not fixed, this offset is stored so that the data can be skipped. This makes it possible to read the "meta" structures following the compressed data without parsing the data itself.
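A minimal C++ sketch of reading the fixed header, assuming the magic "FTL", 3 version bytes treated as opaque (their encoding is still (std)), and an 8-byte offset. The draft does not specify byte order or what the offset is relative to; this sketch assumes little-endian and an offset from the start of the file. The names FtlHeader and parseHeader are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical view of the 14-byte header: "FTL", 3 version bytes, 8-byte offset.
struct FtlHeader {
    std::array<uint8_t, 3> version;  // kept opaque; encoding is (std)
    uint64_t firstStructureOffset;   // ASSUMPTION: little-endian, from file start
};

constexpr size_t kHeaderSize = 3 + 3 + 8;

std::optional<FtlHeader> parseHeader(const std::vector<uint8_t>& file) {
    if (file.size() < kHeaderSize) return std::nullopt;
    if (std::memcmp(file.data(), "FTL", 3) != 0) return std::nullopt;  // not a Fairytale archive
    FtlHeader h{};
    std::memcpy(h.version.data(), file.data() + 3, 3);
    h.firstStructureOffset = 0;
    for (int i = 7; i >= 0; --i)  // assemble bytes 6..13, least significant first
        h.firstStructureOffset = (h.firstStructureOffset << 8) | file[6 + i];
    // The offset must point past the header and not beyond the end of the file.
    if (h.firstStructureOffset < kHeaderSize || h.firstStructureOffset > file.size())
        return std::nullopt;
    return h;
}
```

With a valid header, a reader can jump directly to file.data() + firstStructureOffset and parse the meta structures without touching the compressed block data.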
Everything following the compressed data is a "structure" with the following format:
Description | Length |
---|---|
Structure size in bytes | VLI |
Data | variable |
CRC32 checksum | 4 bytes |
The checksum takes both the first field (structure size) and the data into account.
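The framing can be sketched in C++ as follows. It assumes that "structure size" counts only the data bytes, that the checksum is the standard CRC-32 (reflected polynomial 0xEDB88320, as used by zlib) stored little-endian, and that it is computed over the encoded size field followed by the data; the draft does not pin down these details. writeStructure and readStructure are hypothetical names.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise for brevity.
uint32_t crc32(const uint8_t* p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// ASSUMPTIONS: size counts data bytes only; CRC stored little-endian.
std::vector<uint8_t> writeStructure(const std::vector<uint8_t>& data) {
    std::vector<uint8_t> out;
    uint64_t v = data.size();
    while (v >= 0x80) { out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80)); v >>= 7; }
    out.push_back(static_cast<uint8_t>(v));  // size as VLI
    out.insert(out.end(), data.begin(), data.end());
    uint32_t crc = crc32(out.data(), out.size());  // covers size field + data
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(crc >> (8 * i)));
    return out;
}

// Returns the data if the structure is intact; pos is advanced only on success.
std::optional<std::vector<uint8_t>> readStructure(const std::vector<uint8_t>& in, size_t& pos) {
    size_t p = pos;
    uint64_t size = 0;
    for (unsigned shift = 0;; shift += 7) {
        if (p >= in.size() || shift >= 64) return std::nullopt;  // truncated or overlong VLI
        uint8_t b = in[p++];
        size |= static_cast<uint64_t>(b & 0x7F) << shift;
        if (!(b & 0x80)) break;
    }
    if (in.size() - p < size || in.size() - p - size < 4) return std::nullopt;  // truncated
    size_t end = p + size;
    uint32_t stored = 0;
    for (int i = 0; i < 4; ++i) stored |= static_cast<uint32_t>(in[end + i]) << (8 * i);
    if (crc32(in.data() + pos, end - pos) != stored) return std::nullopt;  // damaged
    std::vector<uint8_t> data(in.begin() + p, in.begin() + end);
    pos = end + 4;
    return data;
}
```

Because the size field is included in the checksum, a corrupted size that still happens to point inside the file is detected as a CRC mismatch rather than silently yielding the wrong data.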
(to be done)
The file structure contains data for one or more files stored in the archive. For each of these files, this format is used:
Description | Length |
---|---|
Directory ID | VLI |
Length of filename | VLI |
Filename | variable |
Length of metadata (std) | VLI |
File metadata | variable (std) |
Number of blocks | VLI |
Block 0 ID | VLI |
... | |
Block N ID | VLI |
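A minimal C++ sketch of serializing one file entry in the field order of the table above. The FileEntry type and writeFileEntry function are hypothetical; the sketch assumes the filename is stored as raw bytes (for example UTF-8) and treats the metadata as an opaque byte string, since its contents are still (std). How several entries are counted or delimited inside the file structure is not specified in the draft and is left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one file entry.
struct FileEntry {
    uint64_t directoryId;
    std::string filename;            // ASSUMPTION: raw bytes, e.g. UTF-8
    std::vector<uint8_t> metadata;   // contents are (std)
    std::vector<uint64_t> blockIds;  // block 0 .. block N
};

static void putVli(uint64_t v, std::vector<uint8_t>& out) {
    while (v >= 0x80) { out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80)); v >>= 7; }
    out.push_back(static_cast<uint8_t>(v));
}

// Appends the entry's fields in table order; each length prefix is a VLI.
void writeFileEntry(const FileEntry& e, std::vector<uint8_t>& out) {
    putVli(e.directoryId, out);
    putVli(e.filename.size(), out);
    out.insert(out.end(), e.filename.begin(), e.filename.end());
    putVli(e.metadata.size(), out);
    out.insert(out.end(), e.metadata.begin(), e.metadata.end());
    putVli(e.blockIds.size(), out);  // number of blocks
    for (uint64_t id : e.blockIds) putVli(id, out);
}
```

For example, an entry in directory 3 named "a.txt" with no metadata and blocks 0 and 200 serializes to 3, 5, "a.txt", 0, 2, 0, 0xC8 0x01; the block ID 200 needs two VLI bytes.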
(to be done)
(to be done)
(to be done)
The recovery file format is intended to protect against different types of corruption. It is intended as a wrapper around the Fairytale file format, similar to the way gzip wraps a tar archive in .tar.gz. That way, the recovery file format can also be used independently of Fairytale. During decompression, Fairytale will check for the recovery header; if it is present, Fairytale will use I/O classes to transparently access the Fairytale file protected inside the recovery format.
- protection against flipped bits, whether single bits or whole HDD sectors
- recovery from failed storage media
- multi-part archives
- encryption?
Data will be split into blocks, which should ideally correspond to file system blocks / HDD sectors; 4 KiB may be a reasonable size. Each block has the following structure:
- Marker: 2 bytes
- UUID: 8 bytes
- Frame ID: VLI (starts at 1; the last block uses 0)
- Recovery parameters. Possibly only present in blocks 1, 2, 4, 8, 16, ... and 0.
- Payload data
- Checksum: 4 bytes CRC32C
This layout is still subject to change. What matters for the Fairytale format right now is that it only needs to check the first two bytes of a file to decide how to read its data. This way, the recovery format can be implemented later.
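The two-byte check can be sketched in C++ as follows. The recovery marker value is hypothetical (the real one is not defined yet), and Container and detectContainer are illustrative names. A plain archive is recognized by "FT", the first two bytes of the "FTL" magic.

```cpp
#include <cstdint>
#include <vector>

enum class Container { Recovery, PlainFairytale, Unknown };

// HYPOTHETICAL recovery-block marker; the real value is not yet defined.
constexpr uint8_t kRecoveryMarker[2] = {0xF7, 0x52};

// Decides how to open a file using only its first two bytes.
Container detectContainer(const std::vector<uint8_t>& head) {
    if (head.size() < 2) return Container::Unknown;
    if (head[0] == kRecoveryMarker[0] && head[1] == kRecoveryMarker[1])
        return Container::Recovery;        // read through the recovery I/O layer
    if (head[0] == 'F' && head[1] == 'T')  // start of the "FTL" magic
        return Container::PlainFairytale;  // read the archive directly
    return Container::Unknown;
}
```

For this check to be unambiguous, whatever marker is eventually chosen must not start with the bytes "FT".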
- How to protect against lost frames as efficiently as possible?