Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fuzzer bug with cmap4 #1101

Merged
merged 1 commit into from
Oct 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 36 additions & 0 deletions font-test-data/src/cmap.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
//! cmap test data for scenarios not readily produced with ttx
use read_fonts::{be_buffer, be_buffer_add, test_helpers::BeBuffer};

/// Contains two codepoint ranges, both [6, 64]. Surely you don't duplicate them?
pub fn repetitive_cmap4() -> BeBuffer {
// <https://learn.microsoft.com/en-us/typography/opentype/spec/cmap#format-4-segment-mapping-to-delta-values>
be_buffer! {
4_u16, // uint16 format
0_u16, // uint16 length, unused
0_u16, // uint16 language, unused
4_u16, // uint16 segCountX2, 2 * 2 segments
0_u16, // uint16 searchRange, unused
0_u16, // uint16 entrySelector, unused
0_u16, // uint16 rangeShift, unused
// segCount endCode entries
64_u16, // uint16 endCode[0]
64_u16, // uint16 endCode[1]

0_u16, // uint16 reservedPad, unused

// segCount startCode entries
6_u16, // uint16 startCode[0]
6_u16, // uint16 startCode[1]

// segCount idDelta entries
0_u16, // uint16 idDelta[0]
0_u16, // uint16 idDelta[1]

// segCount idRangeOffset entries
0_u16, // uint16 idRangeOffset[0]
0_u16 // uint16 idRangeOffset[1]

// no glyphIdArray entries
}
}
1 change: 1 addition & 0 deletions font-test-data/src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
//! test data shared between various fontations crates.
pub mod cmap;
pub mod gdef;
pub mod gpos;
pub mod gsub;
Expand Down
27 changes: 26 additions & 1 deletion read-fonts/src/tables/cmap.rs
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,16 @@ impl<'a> Iterator for Cmap4Iter<'a> {
return Some((codepoint, glyph_id));
} else {
self.cur_range_ix += 1;
self.cur_range = self.subtable.code_range(self.cur_range_ix)?;
let next_range = self.subtable.code_range(self.cur_range_ix)?;
// Groups should be in order and non-overlapping so make sure
// that the start code of next group is at least current_end + 1.
// Also avoid start sliding backwards if we see data where end < start by taking the max
// of next.end and curr.end as the new end.
// This prevents timeout and bizarre results in the face of numerous overlapping ranges
// https://github.com/googlefonts/fontations/issues/1100
// cmap4 ranges are u16 so no need to stress about values past char::MAX
self.cur_range = next_range.start.max(self.cur_range.end)
..next_range.end.max(self.cur_range.end);
self.cur_start_code = self.cur_range.start as u16;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it matters in broken fonts, but I think self.cur_start_code should still be set to next_range.start even if we chop up the range because the remainder of the encoded data is likely dependent on the original value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point but also don't want to produce duplicate values. I'm a touch low on time so lets get the immediate fix landed and defer that part to a follow-on. Filed as #1175.

}
}
Expand Down Expand Up @@ -749,4 +758,20 @@ mod tests {
_ => None,
})
}

/// <https://github.com/googlefonts/fontations/issues/1100>
///
/// Note that this doesn't demonstrate the timeout, merely that we've eliminated the underlying
/// enthusiasm for non-ascending ranges that enabled it
#[test]
fn cmap4_bad_data() {
let buf = font_test_data::cmap::repetitive_cmap4();
let cmap4 = Cmap4::read(FontData::new(buf.as_slice())).unwrap();

// we should have unique, ascending codepoints, not duplicates and overlaps
assert_eq!(
(6..=64).collect::<Vec<_>>(),
cmap4.iter().map(|(cp, _)| cp).collect::<Vec<_>>()
);
}
}