WIP: S2++
If the first bytes of a block are `0x40, 0x00` (repeat, length 4), all [Copy with 4-byte offset (11)](https://github.com/google/snappy/blob/main/format_description.txt#L106) elements in the remainder of the block use 3-byte offsets instead.

No literals may precede this tag, and, as specified above, no repeat may occur before the first match.
Only this exact tag enables the mode.

> These are like the copies with 2-byte offsets (see previous subsection),
> except that the offset is stored as a 24-bit integer instead of a
> 16-bit integer (and thus will occupy three bytes).

In this mode, the maximum backreference offset is 16777215 (2^24 - 1).

This *cannot* be combined with dictionaries.
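To make the 24-bit limit concrete, here is a minimal Go sketch of how an encoder might emit one copy element in this mode. `appendCopy3`, `maxOffset3` and `tagCopy4` are hypothetical names, not part of the S2 package. The sketch assumes the standard Snappy copy-4 tag layout (length-1 in the upper 6 bits of the tag) and that the 3-byte offset is stored little-endian like Snappy's other offsets.

```go
package main

import (
	"errors"
	"fmt"
)

// maxOffset3 is the largest offset representable in 24 bits (2^24 - 1 = 16777215).
const maxOffset3 = 1<<24 - 1

// tagCopy4 is the Snappy element type "Copy with 4-byte offset" (binary 11).
const tagCopy4 = 0x03

// appendCopy3 is a hypothetical helper that appends one copy element using the
// copy-4 tag but with a 3-byte little-endian offset (assumed byte order).
// length must be 1..64 (the tag stores length-1 in 6 bits);
// offset must be 1..maxOffset3.
func appendCopy3(dst []byte, offset, length int) ([]byte, error) {
	if length < 1 || length > 64 {
		return dst, errors.New("length out of range")
	}
	if offset < 1 || offset > maxOffset3 {
		return dst, errors.New("offset does not fit in 24 bits")
	}
	tag := byte(length-1)<<2 | tagCopy4
	return append(dst, tag, byte(offset), byte(offset>>8), byte(offset>>16)), nil
}

func main() {
	out, err := appendCopy3(nil, 70000, 10)
	fmt.Printf("% x %v\n", out, err) // 27 70 11 01 <nil>
}
```

An offset above 16777215 is rejected, so an encoder using this mode would have to fall back to a shorter match or a literal for such distances.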
klauspost committed Aug 7, 2023
1 parent c1dcc38 commit 8b85d78
Showing 1 changed file with 14 additions and 0 deletions: `s2/README.md`.
@@ -1022,6 +1022,7 @@ See [using indexes](https://github.com/klauspost/compress/tree/master/s2#using-i
* Frame [Stream identifier](https://github.com/google/snappy/blob/master/framing_format.txt#L68) changed from `sNaPpY` to `S2sTwO`.
* [Framed compressed blocks](https://github.com/google/snappy/blob/master/format_description.txt) can be up to 4MB (up from 64KB).
* Compressed blocks can have an offset of `0`, which indicates to repeat the last seen offset.
* If the first bytes of a block are `0x80, 0x00, 0x00` (copy, 2-byte offset = 0), all [Copy with 4-byte offset (11)](https://github.com/google/snappy/blob/main/format_description.txt#L106) elements in the remainder of the block use 3-byte offsets instead.
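The stream identifier change can be sketched in Go. This assumes the Snappy framing layout for the identifier chunk (chunk type `0xff`, a 3-byte little-endian length of 6, then the 6-byte payload). `detectStream` is a hypothetical helper, not an S2 API; it only reports which identifier is present.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkTypeStreamIdentifier is the framing-format chunk type for the stream identifier.
const chunkTypeStreamIdentifier = 0xff

// Stream identifier payloads: Snappy uses "sNaPpY", S2 uses "S2sTwO".
var (
	snappyMagic = []byte("sNaPpY")
	s2Magic     = []byte("S2sTwO")
)

// detectStream is a hypothetical helper that inspects the first chunk of a
// framed stream and reports which identifier it carries.
func detectStream(b []byte) string {
	// Chunk header: 1 type byte + 3-byte little-endian length, then a 6-byte payload.
	if len(b) < 10 || b[0] != chunkTypeStreamIdentifier {
		return "unknown"
	}
	if n := int(b[1]) | int(b[2])<<8 | int(b[3])<<16; n != 6 {
		return "unknown"
	}
	switch {
	case bytes.Equal(b[4:10], s2Magic):
		return "s2"
	case bytes.Equal(b[4:10], snappyMagic):
		return "snappy"
	}
	return "unknown"
}

func main() {
	hdr := append([]byte{0xff, 0x06, 0x00, 0x00}, s2Magic...)
	fmt.Println(detectStream(hdr)) // s2
}
```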

Repeat offsets must be encoded as a [2.2.1. Copy with 1-byte offset (01)](https://github.com/google/snappy/blob/master/format_description.txt#L89), where the offset is 0.
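Resolving the offset of such an element can be sketched in Go as follows. `copy1Offset` is a hypothetical helper. It follows the Snappy copy-1 layout (the upper 3 tag bits plus the following byte form an 11-bit offset) and resolves only the offset; the length field is left to the rules of the full format.

```go
package main

import (
	"errors"
	"fmt"
)

// tagCopy1 is the Snappy element type "Copy with 1-byte offset" (binary 01).
const tagCopy1 = 0x01

// copy1Offset is a hypothetical helper that extracts the 11-bit offset of a
// copy-1 element and applies the repeat rule: offset 0 reuses the last offset.
// tag is the element's tag byte, next is the byte after it, and lastOffset is
// the previous copy offset in this block (0 if there is none yet).
func copy1Offset(tag, next byte, lastOffset int) (offset int, repeat bool, err error) {
	if tag&0x03 != tagCopy1 {
		return 0, false, errors.New("not a copy with 1-byte offset")
	}
	// Upper 3 bits of the tag hold offset bits 8..10; next holds bits 0..7.
	offset = int(tag>>5)<<8 | int(next)
	if offset != 0 {
		return offset, false, nil
	}
	// Offset 0 means repeat; the first copy of a block cannot be a repeat.
	if lastOffset == 0 {
		return 0, true, errors.New("repeat offset before any match")
	}
	return lastOffset, true, nil
}

func main() {
	fmt.Println(copy1Offset(0x21, 0x10, 0))   // 272 false <nil>
	fmt.Println(copy1Offset(0x01, 0x00, 272)) // 272 true <nil>
}
```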

@@ -1047,6 +1048,19 @@ The first copy of a block cannot be a repeat offset and the offset is reset on e

Default streaming block size is 1MB.

## 3 Byte Offsets

If the first bytes of a block are `0x80, 0x00, 0x00` (copy, 2-byte offset = 0), all [Copy with 4-byte offset (11)](https://github.com/google/snappy/blob/main/format_description.txt#L106) elements in the remainder of the block use 3-byte offsets instead.

No literals may precede this tag, and, as specified above, no repeat may occur before the first match.
Only this exact tag enables the mode.
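A decoder might check for the marker like this. The Go sketch is hypothetical: `parseOffset3Mode` is not an S2 function, and it assumes that `elems` is the block's element data after the uncompressed-length varint and that the marker is consumed rather than decoded as a regular element. Neither detail is stated above.

```go
package main

import (
	"bytes"
	"fmt"
)

// offset3Marker is the byte sequence that enables 3-byte offsets when it
// appears at the very start of a block.
var offset3Marker = []byte{0x80, 0x00, 0x00}

// parseOffset3Mode is a hypothetical helper. It reports whether the block's
// element data starts with the exact marker and returns the remaining elements.
// Anything other than an exact match leaves the block in normal mode.
func parseOffset3Mode(elems []byte) (offset3 bool, rest []byte) {
	if bytes.HasPrefix(elems, offset3Marker) {
		return true, elems[len(offset3Marker):]
	}
	return false, elems
}

func main() {
	on, rest := parseOffset3Mode([]byte{0x80, 0x00, 0x00, 0x27, 0x70, 0x11, 0x01})
	fmt.Println(on, len(rest)) // true 4
	on, rest = parseOffset3Mode([]byte{0x80, 0x00, 0x01})
	fmt.Println(on, len(rest)) // false 3
}
```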

> These are like the copies with 2-byte offsets (see previous subsection),
> except that the offset is stored as a 24-bit integer instead of a
> 16-bit integer (and thus will occupy three bytes).

In this mode, the maximum backreference offset is 16777215 (2^24 - 1).
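Decoding a copy element under this rule could look like the following minimal Go sketch. `decodeCopy4` is a hypothetical helper. It assumes the Snappy tag layout (length-1 in the upper 6 bits) and that the 3-byte offset is little-endian like Snappy's other offsets. It decodes a single element and does not perform the copy.

```go
package main

import (
	"errors"
	"fmt"
)

// tagCopy4 is the Snappy element type "Copy with 4-byte offset" (binary 11).
const tagCopy4 = 0x03

// decodeCopy4 is a hypothetical helper that decodes one copy-4 element at the
// start of src. In 3-byte offset mode the offset is read as a 24-bit
// little-endian integer (assumed); otherwise the standard 32-bit offset is read.
// It returns the copy length, the offset and the number of bytes consumed.
func decodeCopy4(src []byte, offset3 bool) (length int, offset uint32, n int, err error) {
	if len(src) == 0 || src[0]&0x03 != tagCopy4 {
		return 0, 0, 0, errors.New("not a copy with 4-byte offset")
	}
	length = int(src[0]>>2) + 1 // upper 6 bits of the tag store length-1
	if offset3 {
		if len(src) < 4 {
			return 0, 0, 0, errors.New("short input")
		}
		// 24-bit offset: at most 16777215.
		offset = uint32(src[1]) | uint32(src[2])<<8 | uint32(src[3])<<16
		return length, offset, 4, nil
	}
	if len(src) < 5 {
		return 0, 0, 0, errors.New("short input")
	}
	offset = uint32(src[1]) | uint32(src[2])<<8 | uint32(src[3])<<16 | uint32(src[4])<<24
	return length, offset, 5, nil
}

func main() {
	fmt.Println(decodeCopy4([]byte{0x27, 0x70, 0x11, 0x01}, true))        // 10 70000 4 <nil>
	fmt.Println(decodeCopy4([]byte{0x27, 0x70, 0x11, 0x01, 0x00}, false)) // 10 70000 5 <nil>
}
```

Each copy element is one byte shorter in this mode, at the cost of capping offsets at 16777215.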

# Dictionary Encoding

Dictionary support allows providing a custom dictionary that serves as a lookup at the beginning of blocks.