[read-fonts] gvar: improve delta decoding perf #1235

dfrg · 2024-11-13T18:04:00Z

Determining the delta count and offset to the y-deltas requires processing the full stream, but we really only need to read the control byte for each run to do so.

This gives a roughly 30%(!) boost in loading outlines from variable TrueType fonts.
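The idea can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the read-fonts code: `skip_deltas` and `Resume` are hypothetical names, and only the zero, byte and word run types from the packed-deltas format in the OpenType spec are handled. Skipping `n` deltas reads one control byte per run and then advances by count × width, so the cost grows with the number of runs rather than the number of deltas. The sketch also reports where to resume if the skip ends partway through a run.

```rust
// Control byte flags for packed deltas (zero, byte and word runs only).
const DELTAS_ARE_ZERO: u8 = 0x80;
const DELTAS_ARE_WORDS: u8 = 0x40;
const DELTA_RUN_COUNT_MASK: u8 = 0x3F;

/// Hypothetical resume point after skipping some deltas.
#[derive(Debug, PartialEq)]
struct Resume {
    /// Byte offset of the next delta value, or of the next control byte
    /// when `left_in_run == 0`.
    offset: usize,
    /// Deltas still left in the current run.
    left_in_run: usize,
    /// Byte width (0, 1 or 2) of each remaining delta in the current run.
    width: usize,
}

/// Skips the first `n` deltas by reading only control bytes.
/// Returns `None` if the data is truncated.
fn skip_deltas(data: &[u8], mut n: usize) -> Option<Resume> {
    let mut offset = 0;
    while n > 0 {
        let control = *data.get(offset)?;
        offset += 1;
        // Stored count is biased by one: 0x00 means a run of 1.
        let count = (control & DELTA_RUN_COUNT_MASK) as usize + 1;
        let width = if (control & DELTAS_ARE_ZERO) != 0 {
            0
        } else if (control & DELTAS_ARE_WORDS) != 0 {
            2
        } else {
            1
        };
        // Jump over the run's values without decoding them.
        let skipped = count.min(n);
        offset += skipped * width;
        if offset > data.len() {
            return None;
        }
        n -= skipped;
        if skipped < count {
            // The skip ended inside this run.
            return Some(Resume { offset, left_in_run: count - skipped, width });
        }
    }
    Some(Resume { offset, left_in_run: 0, width: 0 })
}
```

For example, with a 2-delta byte run, a 2-delta zero run and a 1-delta word run, skipping 3 deltas lands inside the zero run at byte offset 4 with one delta left in it.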

dfrg · 2024-11-13T18:08:35Z

read-fonts/src/tables/variations.rs

-        // if no items remain in this run, start the next one.
-        // NOTE: we use `while` so we can sanely handle the case where some
-        // run in the middle of the data has an explicit zero length
-        //TODO: create a font with data of this shape and go crash some font parsers
-        while self.remaining_in_run == 0 {
+        if self.remaining_in_run == 0 {
            let control: u8 = self.cursor.read().ok()?;
            self.value_type = DeltaRunType::new(control);
            self.remaining_in_run = (control & DELTA_RUN_COUNT_MASK) + 1;
        }


Given the + 1, it looks like an explicit zero-length run can't be encoded, so I removed this loop and its comment.
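The reasoning can be checked exhaustively: the count in the low 6 bits of the control byte is biased by one, so every possible control byte describes a run of 1 to 64 deltas. A small sketch, with `run_len` and `run_len_range` as illustrative names:

```rust
const DELTA_RUN_COUNT_MASK: u8 = 0x3F;

/// Number of deltas in a run, decoded from its control byte.
fn run_len(control: u8) -> usize {
    (control & DELTA_RUN_COUNT_MASK) as usize + 1
}

/// Smallest and largest run length any control byte can express.
fn run_len_range() -> (usize, usize) {
    let lens = (0..=u8::MAX).map(run_len);
    (lens.clone().min().unwrap(), lens.max().unwrap())
}
```

Since the minimum is 1, a loop that keeps reading control bytes while the remaining count is zero can never execute more than once.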

cmyr

looks good!

dfrg · 2024-11-15T02:05:11Z

It turns out my new code behaved slightly differently: it assumed that the boundary between the x and y coordinate deltas always aligns with a packed run boundary. This holds in all the fonts I tested. write-fonts encodes the x and y coordinates separately, so it is always true there, and the same probably applies to fontmake. But the spec notably makes no such guarantee, and I didn't want this to bite us later.

Fixed the code to handle this case and added a test for it.
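A test for this case benefits from an oracle that does not care where runs begin. One option, sketched here with hypothetical names (`decode_deltas`, `split_xy`) and not taken from the PR's actual test, is a plain decoder that reads all 2 × point-count deltas and splits the result in half. The first example input packs both points' x and y deltas into a single run, which is exactly the shape the fixed code has to handle.

```rust
const DELTAS_ARE_ZERO: u8 = 0x80;
const DELTAS_ARE_WORDS: u8 = 0x40;
const DELTA_RUN_COUNT_MASK: u8 = 0x3F;

/// Decodes `count` packed deltas from the start of `data`, reading every value.
fn decode_deltas(data: &[u8], count: usize) -> Option<Vec<i16>> {
    let mut out = Vec::with_capacity(count);
    let mut pos = 0;
    while out.len() < count {
        let control = *data.get(pos)?;
        pos += 1;
        let run = (control & DELTA_RUN_COUNT_MASK) as usize + 1;
        // Clamp so a surplus final run cannot push us past `count`.
        for _ in 0..run.min(count - out.len()) {
            let value = if (control & DELTAS_ARE_ZERO) != 0 {
                0
            } else if (control & DELTAS_ARE_WORDS) != 0 {
                let bytes = data.get(pos..pos + 2)?;
                pos += 2;
                i16::from_be_bytes([bytes[0], bytes[1]])
            } else {
                let byte = *data.get(pos)?;
                pos += 1;
                byte as i8 as i16
            };
            out.push(value);
        }
    }
    Some(out)
}

/// Splits the deltas for `point_count` points into x and y halves,
/// regardless of whether a run straddles the boundary.
fn split_xy(data: &[u8], point_count: usize) -> Option<(Vec<i16>, Vec<i16>)> {
    let mut xs = decode_deltas(data, point_count * 2)?;
    let ys = xs.split_off(point_count);
    Some((xs, ys))
}
```

Comparing a fast skip-based path against this slow decoder on inputs like `[0x03, 1, 2, 3, 4]` (one byte run holding x = [1, 2] and y = [3, 4]) catches the boundary assumption directly.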

rsheeter · 2024-11-15T14:17:28Z

> This gives a roughly 30%(!) boost in loading outlines from variable TrueType fonts.

Can this be specified in terms of a reproducible test? Run x on main, get result y, run x on my branch, get result less-than-y?

rsheeter · 2024-11-15T14:19:46Z

read-fonts/src/tables/variations.rs

@@ -473,6 +482,41 @@ impl<'a> DeltaRunIter<'a> {
        while self.next().is_some() {}
        self.cursor
    }
+
+    /// Skips `n` deltas without reading the actual delta values.


Seems like it would be faster still to have a precomputed start offset for the y deltas. Is that a spec opportunity we could support with a measurable perf impact?

I imagine it would improve performance, but the size increase might be substantial. I was actually thinking the opposite: that teaching write-fonts to pack the x/y streams together could shave a nice chunk of bytes off a large CJK variable font.

While the spec doesn't spell it out, I don't think anyone in their right mind would get the idea that the two can be encoded together. HB doesn't support decoding them that way, and I don't think FreeType does either.

It would be an interesting data point to know what the perf impact would be if we could jump rather than read-to-skip.
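Where the byte savings from packing the x and y streams together would come from can be estimated with a naive greedy size model: each run costs one control byte plus its values, so joining the streams saves a control byte whenever the last x run and the first y run share a type. This is only an illustrative model with hypothetical names (`packed_len`, `Class`); write-fonts' real encoder makes its own run choices, and as noted in the thread, other decoders may not accept such streams.

```rust
/// Storage class of one delta in a packed run (naive model).
#[derive(Clone, Copy, PartialEq)]
enum Class {
    Zero,
    Byte,
    Word,
}

fn class_of(v: i16) -> Class {
    if v == 0 {
        Class::Zero
    } else if (-128..=127).contains(&v) {
        Class::Byte
    } else {
        Class::Word
    }
}

fn width(c: Class) -> usize {
    match c {
        Class::Zero => 0,
        Class::Byte => 1,
        Class::Word => 2,
    }
}

/// Encoded size in bytes: greedily group same-class deltas into runs of at most 64.
fn packed_len(deltas: &[i16]) -> usize {
    let mut len = 0;
    let mut i = 0;
    while i < deltas.len() {
        let class = class_of(deltas[i]);
        let mut run = 1;
        while i + run < deltas.len() && run < 64 && class_of(deltas[i + run]) == class {
            run += 1;
        }
        // One control byte plus the run's values.
        len += 1 + run * width(class);
        i += run;
    }
    len
}
```

Under this model, three zero x deltas and three zero y deltas cost 2 bytes encoded separately but 1 byte as a single six-delta zero run; the saving is per tuple, which is why it could add up across a large font.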

rsheeter · 2024-11-15T14:21:36Z

read-fonts/src/tables/variations.rs

@@ -806,7 +856,7 @@ pub struct TupleDeltaIter<'a, T> {
    points: Option<PackedPointNumbersIter<'a>>,
    next_point: usize,
    x_iter: DeltaRunIter<'a>,
-    y_iter: Option<Skip<DeltaRunIter<'a>>>,
+    y_iter: Option<DeltaRunIter<'a>>,


Optional: it feels a little weird that only y is an Option. Could it just be an empty iter instead of None?

The iterator would need to produce an infinite stream of zeroes to fit into the current logic and I think that complication might lead to bugs.

I’ll replace this with an enum that holds two iters for points and one for scalars which should make this more clear.

That rationale would make a fine comment :)
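The enum approach might look roughly like the sketch below. `DeltaIters` and `Delta` are hypothetical names, not necessarily the types in the merged commit; the point is that scalar deltas (as in cvar) no longer need an `Option` or an infinite stream of zeroes standing in for y.

```rust
/// A decoded delta: a point delta (x, y) or a single scalar delta.
#[derive(Debug, PartialEq)]
enum Delta {
    Point(i16, i16),
    Scalar(i16),
}

/// Two iterators for point deltas, one for scalar deltas.
enum DeltaIters<I> {
    Points { x: I, y: I },
    Scalars(I),
}

impl<I: Iterator<Item = i16>> Iterator for DeltaIters<I> {
    type Item = Delta;

    fn next(&mut self) -> Option<Delta> {
        match self {
            // Advance x and y in lockstep; iteration ends when either runs out.
            DeltaIters::Points { x, y } => Some(Delta::Point(x.next()?, y.next()?)),
            DeltaIters::Scalars(s) => s.next().map(Delta::Scalar),
        }
    }
}
```

Because each variant carries exactly the iterators it needs, there is no missing y stream to special-case in the consuming loop.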

dfrg · 2024-11-16T00:18:57Z

> This gives a roughly 30%(!) boost in loading outlines from variable TrueType fonts.

> Can this be specified in terms of a reproducible test? Run x on main, get result y, run x on my branch, get result less-than-y?

Agreed something like this would be nice. Filed #1247 to investigate doing some automated performance regression testing.

dfrg commented Nov 13, 2024

clippy

9fd47aa

cmyr approved these changes Nov 13, 2024

dfrg added 2 commits November 14, 2024 20:57

handle case where packed run crosses coord boundary

4f05c4c

correct comment

0d66868

rsheeter reviewed Nov 15, 2024

rsheeter approved these changes Nov 15, 2024

enum instead of option for delta run iters

44f8518

dfrg merged commit 4bc59b1 into main Nov 17, 2024
10 checks passed

dfrg deleted the faster-deltas branch November 17, 2024 17:25

dfrg commented Nov 13, 2024

dfrg Nov 13, 2024

cmyr left a comment

dfrg commented Nov 15, 2024

rsheeter commented Nov 15, 2024

rsheeter Nov 15, 2024

dfrg Nov 16, 2024

behdad Nov 17, 2024

rsheeter Nov 18, 2024

rsheeter Nov 15, 2024

dfrg Nov 16, 2024

rsheeter Nov 18, 2024

dfrg commented Nov 16, 2024
