FileFormatDrafts

File Format drafts

Note: At the moment all of these pages are draft pages, so feel free to add anything you think is missing or to correct errors. There is a Gitter chat room where things can be discussed. Some topics where further discussion might be useful are marked with (std) ("subject to discussion").

Current draft by Márcio Pais: https://files.gitter.im/encode-ru-Community-Archiver/Lobby/cnFl/Fairytale-File-Format.pdf

Alternative description by Christian Schneider:

The length column in the tables below can contain "VLI", which stands for "Variable Length Integer". This data structure is of variable length (1-9 bytes) and encodes a 64-bit integer as follows: the most significant bit of each byte is a flag. If it is set, more bytes follow; if not, this byte is the last one. The remaining 7 bits of each byte carry 7 bits of the integer value, starting with the least significant group. For example, the VLI 11010101 00110110 encodes the binary value 0110110 1010101, which is 6997 in decimal. Note that the order of the 7-bit groups is reversed.

With 7 payload bits per byte, 9 bytes hold only 63 bits, so the full 64-bit range needs either a 10th byte or a final byte that uses all 8 bits (std). The code for encoding and decoding VLIs can be found in fairytale.cpp, methods vliEncode and vliDecode.
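A minimal C++ sketch of the VLI scheme described above, which is the same layout as unsigned LEB128. The names encodeVli and decodeVli are hypothetical and this is not the code from fairytale.cpp; the sketch assumes the 10-byte option for the full 64-bit range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Append the VLI encoding of `value` to `out`: 7 payload bits per byte,
// least significant group first, MSB set on every byte except the last.
// Assumption: a full 64-bit value may take up to 10 bytes.
void encodeVli(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode a VLI starting at data[pos]. On success, advances `pos` past it.
// Returns std::nullopt if the input is truncated or does not fit in 64 bits.
std::optional<std::uint64_t> decodeVli(const std::vector<std::uint8_t>& data,
                                       std::size_t& pos) {
    std::uint64_t value = 0;
    std::size_t p = pos;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (p >= data.size()) return std::nullopt;        // truncated input
        std::uint8_t byte = data[p++];
        if (shift == 63 && (byte & 0x7E) != 0) return std::nullopt;  // overflow
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {                          // last byte reached
            pos = p;
            return value;
        }
    }
    return std::nullopt;  // continuation flag still set after 10 bytes
}
```

For example, encodeVli(6997, buf) appends the two bytes 0xD5 0x36, i.e. 11010101 00110110 from the example above.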

Archive file format

When Fairytale compresses files, it creates an archive file with the extension .ftl (std) that contains everything needed to restore the original files. The current draft of the format is:

Description	Length
Magic bytes	6 bytes (std)
Offset to the first structure	8 bytes
Compressed block data	variable
Directory tree structure	variable
File structure	variable
Codec structure	variable
Block segmentation structure	variable

Magic bytes

The beginning of the file identifies it as a Fairytale archive. Storing the version number is important because different versions of Fairytale will very likely create incompatible files that other versions can't process.

Description	Length
"FTL" (std)	3 bytes
version number	3 bytes (std)
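A sketch of checking the magic bytes, assuming the reader already has the first 6 bytes in memory. The split of the 3 version bytes into major/minor/patch and the names FtlVersion and readMagic are hypothetical, since the draft does not define the version encoding yet.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical interpretation of the 3 version bytes; not defined by the draft (std).
struct FtlVersion {
    std::uint8_t major, minor, patch;
};

// Check the 6-byte magic ("FTL" + 3 version bytes) at the start of `header`.
// Returns the version on success, std::nullopt if this is not an FTL file.
std::optional<FtlVersion> readMagic(const std::vector<std::uint8_t>& header) {
    if (header.size() < 6) return std::nullopt;
    if (header[0] != 'F' || header[1] != 'T' || header[2] != 'L') return std::nullopt;
    return FtlVersion{header[3], header[4], header[5]};
}
```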

Offset to the first structure

As the size of the compressed data is not fixed, this offset is stored to allow skipping it. This allows reading the "meta" structures that follow the compressed data without parsing the data itself.
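A sketch of using the offset to jump straight to the meta structures. The draft does not fix the byte order or what the offset is relative to; the code assumes little-endian and an offset counted from the start of the file. The function name seekToFirstStructure is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Read the 8-byte offset that follows the 6 magic bytes and seek there.
// ASSUMPTIONS: little-endian offset, relative to the start of the file.
bool seekToFirstStructure(std::istream& in) {
    in.seekg(6, std::ios::beg);                      // skip "FTL" + version
    unsigned char raw[8];
    if (!in.read(reinterpret_cast<char*>(raw), 8)) return false;
    std::uint64_t offset = 0;
    for (int i = 7; i >= 0; --i) offset = (offset << 8) | raw[i];
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    return static_cast<bool>(in);                    // false if seek failed
}
```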

Structures

Everything following the compressed data is a "structure" with the following format:

Description	Length
Structure size in bytes	VLI
Data	variable
CRC32 checksum	4 bytes

The checksum takes both the first field (structure size) and the data into account.
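A sketch of writing and verifying one structure. It assumes that the size field counts only the data bytes (not the size field or the checksum), that the checksum is the common CRC-32 (IEEE polynomial, as used by zlib), and that it is stored little-endian; none of this is fixed by the draft. writeStructure and readStructure are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise.
std::uint32_t crc32(const std::uint8_t* p, std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Frame `data`: VLI size, data, CRC-32 over size field + data (little-endian).
std::vector<std::uint8_t> writeStructure(const std::vector<std::uint8_t>& data) {
    std::vector<std::uint8_t> out;
    for (std::uint64_t v = data.size(); ; v >>= 7) {   // VLI, as described above
        if (v < 0x80) { out.push_back(static_cast<std::uint8_t>(v)); break; }
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
    }
    out.insert(out.end(), data.begin(), data.end());
    std::uint32_t crc = crc32(out.data(), out.size());
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(crc >> (8 * i)));
    return out;
}

// Parse one structure at buf[pos]; returns its data only if the CRC matches.
std::optional<std::vector<std::uint8_t>> readStructure(
        const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::size_t p = pos;
    std::uint64_t size = 0;
    for (unsigned shift = 0; ; shift += 7) {
        if (p >= buf.size() || shift > 63) return std::nullopt;
        std::uint8_t b = buf[p++];
        size |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if (!(b & 0x80)) break;
    }
    if (size > buf.size() - p || buf.size() - p - size < 4) return std::nullopt;
    std::size_t end = p + static_cast<std::size_t>(size);
    std::uint32_t stored = 0;
    for (int i = 0; i < 4; ++i) stored |= static_cast<std::uint32_t>(buf[end + i]) << (8 * i);
    if (stored != crc32(buf.data() + pos, end - pos)) return std::nullopt;
    std::vector<std::uint8_t> data(buf.begin() + p, buf.begin() + end);
    pos = end + 4;
    return data;
}
```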

Directory tree structure

(to be done)

File structure

The file structure contains data for one or more files stored in the archive. For each of these files, this format is used:

Description	Length
Directory ID	VLI
Length of filename	VLI
Filename	variable
Length of metadata (std)	VLI
File metadata	variable (std)
Number of blocks	VLI
Block 0 ID	VLI
...
Block N ID	VLI
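A sketch of serializing one file entry in the field order of the table. The FileEntry type and writeFileEntry are hypothetical; the filename is written as raw bytes (the draft does not specify an encoding) and the metadata is treated as an opaque byte string until its format is defined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry in the file structure.
struct FileEntry {
    std::uint64_t directoryId = 0;
    std::string filename;                 // raw bytes; encoding not fixed yet
    std::vector<std::uint8_t> metadata;   // opaque until "File metadata" is specified
    std::vector<std::uint64_t> blockIds;
};

// VLI encoder: 7 bits per byte, low group first, MSB = "more bytes follow".
static void putVli(std::uint64_t v, std::vector<std::uint8_t>& out) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Append one entry: directory ID, filename, metadata, block count, block IDs.
void writeFileEntry(const FileEntry& e, std::vector<std::uint8_t>& out) {
    putVli(e.directoryId, out);
    putVli(e.filename.size(), out);
    out.insert(out.end(), e.filename.begin(), e.filename.end());
    putVli(e.metadata.size(), out);
    out.insert(out.end(), e.metadata.begin(), e.metadata.end());
    putVli(e.blockIds.size(), out);
    for (std::uint64_t id : e.blockIds) putVli(id, out);
}
```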

File metadata

(to be done)

Codec structure

(to be done)

Block segmentation structure

(to be done)

Recovery file format

The recovery file format is intended to protect against different types of corruption. It is designed as a wrapper around the Fairytale file format, similar to how gzip wraps a tar archive in .tar.gz, so the recovery file format can also be used independently of Fairytale. During decompression, Fairytale will check for the recovery header; if it is present, Fairytale will use I/O classes to transparently access the Fairytale file protected inside the recovery format.

Features

protection against flipped bits, whether single bits or whole HDD sectors
recovery from failed storage media
multi part archives
encryption?

File format

Data will be split into blocks, which should ideally correspond to file system blocks / HDD sectors; 4 KiB may be a reasonable block size. Each block has the following structure:

Marker: 2 bytes
UUID: 8 bytes
Frame ID: VLI (starts at 1; the last block is marked with ID 0)
Recovery parameters: possibly present only in blocks 1, 2, 4, 8, 16, ... and 0
Payload data
Checksum: 4 bytes CRC32C
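A sketch of building one recovery block. The marker value 0xF7 0x1E, the helper names, and the choice to let the CRC32C cover all preceding bytes (stored little-endian) are assumptions; recovery parameters are left out because their format is not defined yet. hasRecoveryParameters expresses the tentative "1, 2, 4, 8, 16, ... and 0" rule as "frame ID is 0 or a power of two".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CRC-32C (Castagnoli, reflected polynomial 0x82F63B78), bitwise.
std::uint32_t crc32c(const std::uint8_t* p, std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// True for the last frame (ID 0) and for every frame whose ID is a power of two.
bool hasRecoveryParameters(std::uint64_t frameId) {
    return frameId == 0 || (frameId & (frameId - 1)) == 0;
}

// Hypothetical marker value; the draft does not define it yet.
constexpr std::array<std::uint8_t, 2> kRecoveryMarker{0xF7, 0x1E};

// Build one block: marker, UUID, frame ID (VLI), payload, CRC32C.
// ASSUMPTIONS: recovery parameters omitted; CRC covers all preceding bytes.
std::vector<std::uint8_t> buildBlock(const std::array<std::uint8_t, 8>& uuid,
                                     std::uint64_t frameId,
                                     const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> b(kRecoveryMarker.begin(), kRecoveryMarker.end());
    b.insert(b.end(), uuid.begin(), uuid.end());
    for (std::uint64_t v = frameId; ; v >>= 7) {       // frame ID as VLI
        if (v < 0x80) { b.push_back(static_cast<std::uint8_t>(v)); break; }
        b.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
    }
    b.insert(b.end(), payload.begin(), payload.end());
    std::uint32_t crc = crc32c(b.data(), b.size());
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<std::uint8_t>(crc >> (8 * i)));
    return b;
}
```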

This layout is still subject to change. What matters for the Fairytale format right now is that a reader only needs to check the first two bytes to decide how to read the data, so the recovery format can be implemented later.
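A sketch of that two-byte check, using a hypothetical marker value 0xF7 0x1E (the real marker is undefined). Because a plain archive starts with "FTL", whatever marker is chosen must differ from "FT". The Container enum and detectContainer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Container { Recovery, Fairytale, Unknown };

// Hypothetical 2-byte recovery marker; must not equal "FT".
constexpr std::uint8_t kRecoveryMarker0 = 0xF7;
constexpr std::uint8_t kRecoveryMarker1 = 0x1E;

// Decide how to read a file from its first bytes.
Container detectContainer(const std::uint8_t* head, std::size_t n) {
    if (n >= 2 && head[0] == kRecoveryMarker0 && head[1] == kRecoveryMarker1)
        return Container::Recovery;      // unwrap via the recovery I/O layer
    if (n >= 3 && head[0] == 'F' && head[1] == 'T' && head[2] == 'L')
        return Container::Fairytale;     // plain archive, read directly
    return Container::Unknown;
}
```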

open questions:

How to protect against lost frames as efficiently as possible?
