-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full font embedding #1322
Full font embedding #1322
Conversation
b1a6232
to
fcf1e74
Compare
fcf1e74
to
4abd211
Compare
4abd211
to
59c1221
Compare
This add an option to disable font subsetting. Original fonts can be embedded in full original form. This feature can make documents substantially bigger. In addition to embedded fonts being bigger PDF requires additional information in order to properly render text. Specifically, it requires glyph widths. Some fonts contain thousands of glyps. A thousand of glyph widths on average would result in about 4 Kb additional size of the document. Additionally, PDF requires another mapping to make the text intelligible when copying. This additional size is much harder to estimate as it greatly depend on the font coverage but usually on the order of ~1-10 Kb per font. Intended use case is a workaround for when TTFunk breaks fonts in subsetting. But also this might be useful for documents that are going to be edited. For example, documents that are templates and more text would be added later, or AcroForm feature that allows end users to fill forms.
59c1221
to
528a37d
Compare
This change caused numerous test errors in the visual tests for Asciidoctor PDF even without the new option turned on to embed the full font. It seems there's more to this change than just the option to embed the full font. Missing glyphs now seem to have variable width, whereas before they had the width of the missing/not glyph (typically a fixed-width box). (See https://github.com/asciidoctor/asciidoctor-pdf/blob/main/spec/reference/font-i18n-default.pdf). It may be this is just going to be how it is now (and I'll have to figure out how to accommodate this change in the tests). But I'm just letting you know. Perhaps notice may be warranted to inform users of the change. Ideally, however, it would continue to operate as it does today without the new option. |
@mojavelinux Text is hard, man… Do you have by chance a document that I could look at before and after that demonstrates the issue? |
Trust me, I get it. I've been messing around with text in PDF for over a decade now. It's not only hard, it's tedious ;)
The following test demonstrates the problem: https://github.com/asciidoctor/asciidoctor-pdf/blob/main/spec/font_spec.rb#L7-L11 There's nothing really special that test is doing other than converting the file with a bunch of non-Latin characters in it to PDF and comparing it against the expected result. If you are interested in digging into it, I suppose I could distill that down to code that only uses Prawn. |
@mojavelinux I'd love a minimal reproduction script that demonstrates the issue. But even just the same document before and after the change might be useful. At the moment I don't even know where to start. Admittedly, I'm not familiar with asciidoc source code. |
This has nothing to do with AsciiDoc. The issue is when a glyph needed by the text is missing from a non-AFM (e.g., TTF) font. Prior to this change, the width would be fixed to the size of the missing glyph (so each missing glyph in a fragment was adjacent to the previous one). After this change, the width is 0. Here's a reproducible test case using Prawn's own test suite: # frozen_string_literal: true
require 'spec_helper'
class PDF::Inspector::TextWithStringWidths < PDF::Inspector::Text
attr_reader :string_widths
def initialize(*)
super
@string_widths = []
end
def show_text text, kerned = false
super
@string_widths << ((@state.current_font.unpack text).reduce(0) do |width, code|
width + (@state.current_font.glyph_width code) * @font_settings[-1][:size] / 1000.0
end)
end
end
describe Prawn::Font do
it '.only does not change width of unknown glyph' do
pdf = Prawn::Document.new do
font_families.update(
'DejaVu Sans' => {
normal: "#{Prawn::DATADIR}/fonts/DejaVuSans.ttf",
bold: "#{Prawn::DATADIR}/fonts/DejaVuSans-Bold.ttf"
},
)
# changing option to subset: false fixes issue (albeit using different behavior)
font 'DejaVu Sans', subset: true do
text '日本語<b>end</b>', inline_format: true
end
end
rendered_pdf = pdf.render
File.write '/tmp/debug.pdf', rendered_pdf
analyzed_pdf = PDF::Inspector::TextWithStringWidths.analyze(rendered_pdf)
expect(analyzed_pdf.string_widths.length).to eql(2)
expect(analyzed_pdf.string_widths[0]).to be > 0
end
end On the master branch, this test fails. If you roll back to the commit before this change (to 772a41e), it passes. You can also look at debug.pdf to see that the missing glyphs within the fragment are all sitting on top of one another. Note that the position of the next fragment seems to be correct in either case. Changing the option to |
@mojavelinux Could you please try #1327 and let me know if it fixes the issue? |
Yep, that did it! Thanks a bunch. I appreciate you taking the time to circle back and apply this update. (My apologies again for my misdirected comment on 1103. Since these were both font-related issues, I wrongly assumed it was the same thread). |
See #1347 having issues since 2.5 but only with Adobe Acrobat |
# A hash font definition can specify a number of options: | ||
# | ||
# - :file -- path to the font file (required) | ||
# - :subset -- whether to subset the font (default false). Only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default is true
Had issues with characters not rendered after 2.5 upgrade using Example: |
This add an option to disable font subsetting. Original fonts can be embedded in full original form.
This feature can make documents substantially bigger. In addition to embedded fonts being bigger PDF requires additional information in order to properly render text. Specifically, it requires glyph widths. Some fonts contain thousands of glyps. A thousand of glyph widths on average would result in about 4 Kb additional size of the document. Additionally, PDF requires another mapping to make the text intelligible when copying. This additional size is much harder to estimate as it greatly depend on the font coverage but usually on the order of ~1-10 Kb per font.
Intended use case is a workaround for when TTFunk breaks fonts in subsetting. But also this might be useful for documents that are going to be edited. For example, documents that are templates and more text would be added later, or AcroForm feature that allows end users to fill forms.