Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full font embedding #1322

Merged
merged 1 commit into from
Jan 15, 2024
Merged

Full font embedding #1322

merged 1 commit into from
Jan 15, 2024

Conversation

pointlessone
Copy link
Member

This add an option to disable font subsetting. Original fonts can be embedded in full original form.

This feature can make documents substantially bigger. In addition to embedded fonts being bigger PDF requires additional information in order to properly render text. Specifically, it requires glyph widths. Some fonts contain thousands of glyps. A thousand of glyph widths on average would result in about 4 Kb additional size of the document. Additionally, PDF requires another mapping to make the text intelligible when copying. This additional size is much harder to estimate as it greatly depend on the font coverage but usually on the order of ~1-10 Kb per font.

Intended use case is a workaround for when TTFunk breaks fonts in subsetting. But also this might be useful for documents that are going to be edited. For example, documents that are templates and more text would be added later, or AcroForm feature that allows end users to fill forms.

This add an option to disable font subsetting. Original fonts can be
embedded in full original form.

This feature can make documents substantially bigger. In addition to
embedded fonts being bigger PDF requires additional information in order
to properly render text. Specifically, it requires glyph widths. Some
fonts contain thousands of glyps. A thousand of glyph widths on average
would result in about 4 Kb additional size of the document.
Additionally, PDF requires another mapping to make the text intelligible
when copying. This additional size is much harder to estimate as it
greatly depend on the font coverage but usually on the order of ~1-10
Kb per font.

Intended use case is a workaround for when TTFunk breaks fonts in
subsetting. But also this might be useful for documents that are going
to be edited. For example, documents that are templates and more text
would be added later, or AcroForm feature that allows end users to fill
forms.
@pointlessone pointlessone merged commit 385116d into master Jan 15, 2024
28 of 30 checks passed
@pointlessone pointlessone deleted the full-font-embedding branch January 15, 2024 14:18
@mojavelinux
Copy link
Contributor

This change caused numerous test errors in the visual tests for Asciidoctor PDF even without the new option turned on to embed the full font. It seems there's more to this change than just the option to embed the full font. Missing glyphs now seem to have variable width, whereas before they had the width of the missing/not glyph (typically a fixed-width box). (See https://github.com/asciidoctor/asciidoctor-pdf/blob/main/spec/reference/font-i18n-default.pdf). It may be this is just going to be how it is now (and I'll have to figure out how to accommodate this change in the tests). But I'm just letting you know. Perhaps notice may be warranted to inform users of the change. Ideally, however, it would continue to operate as it does today without the new option.

@pointlessone
Copy link
Member Author

@mojavelinux Text is hard, man…

Do you have by chance a document that I could look at before and after that demonstrates the issue?

@mojavelinux
Copy link
Contributor

Text is hard, man…

Trust me, I get it. I've been messing around with text in PDF for over a decade now. It's not only hard, it's tedious ;)

Do you have by chance a document that I could look at before and after that demonstrates the issue?

The following test demonstrates the problem:

https://github.com/asciidoctor/asciidoctor-pdf/blob/main/spec/font_spec.rb#L7-L11

There's nothing really special that test is doing other than converting the file with a bunch of non-Latin characters in it to PDF and comparing it against the expected result.

If you are interested in digging into it, I suppose I could distill that down to code that only uses Prawn.

@pointlessone
Copy link
Member Author

@mojavelinux I'd love a minimal reproduction script that demonstrates the issue. But even just the same document before and after the change might be useful. At the moment I don't even know where to start. Admittedly, I'm not familiar with asciidoc source code.

@mojavelinux
Copy link
Contributor

mojavelinux commented Jan 22, 2024

This has nothing to do with AsciiDoc. The issue is when a glyph needed by the text is missing from a non-AFM (e.g., TTF) font. Prior to this change, the width would be fixed to the size of the missing glyph (so each missing glyph in a fragment was adjacent to the previous one). After this change, the width is 0.

Here's a reproducible test case using Prawn's own test suite:

# frozen_string_literal: true

require 'spec_helper'

class PDF::Inspector::TextWithStringWidths < PDF::Inspector::Text
  attr_reader :string_widths

  def initialize(*)
    super
    @string_widths = []
  end

  def show_text text, kerned = false
    super
    @string_widths << ((@state.current_font.unpack text).reduce(0) do |width, code|
      width + (@state.current_font.glyph_width code) * @font_settings[-1][:size] / 1000.0
    end)
  end
end

describe Prawn::Font do
  it '.only does not change width of unknown glyph' do
    pdf = Prawn::Document.new do
      font_families.update(
        'DejaVu Sans' => {
          normal: "#{Prawn::DATADIR}/fonts/DejaVuSans.ttf",
          bold: "#{Prawn::DATADIR}/fonts/DejaVuSans-Bold.ttf"
        },
      )

      # changing option to subset: false fixes issue (albeit using different behavior)
      font 'DejaVu Sans', subset: true do
        text '日本語<b>end</b>', inline_format: true
      end
    end

    rendered_pdf = pdf.render
    File.write '/tmp/debug.pdf', rendered_pdf
    analyzed_pdf = PDF::Inspector::TextWithStringWidths.analyze(rendered_pdf)
    expect(analyzed_pdf.string_widths.length).to eql(2)
    expect(analyzed_pdf.string_widths[0]).to be > 0
  end
end

On the master branch, this test fails. If you roll back to the commit before this change (to 772a41e), it passes. You can also look at debug.pdf to see that the missing glyphs within the fragment are all sitting on top of one another. Note that the position of the next fragment seems to be correct in either case.

Changing the option to subset: false also fixes the issue, which means that when the full font is embedded, it works as before. While that's a workaround, it does not preserve the previous behavior when font subsetting is enabled.

@pointlessone
Copy link
Member Author

@mojavelinux Could you please try #1327 and let me know if it fixes the issue?

@mojavelinux
Copy link
Contributor

Yep, that did it! Thanks a bunch. I appreciate you taking the time to circle back and apply this update.

(My apologies again for my misdirected comment on 1103. Since these were both font-related issues, I wrongly assumed it was the same thread).

@Skulli
Copy link

Skulli commented Apr 8, 2024

See #1347 having issues since 2.5 but only with Adobe Acrobat

# A hash font definition can specify a number of options:
#
# - :file -- path to the font file (required)
# - :subset -- whether to subset the font (default false). Only

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default is true

@laurafeier
Copy link

Had issues with characters not rendered after 2.5 upgrade using .ttf fonts. Fixed this by setting font to subset: false.

Example:
font_families.update("family_name" => { bold: { file: "path/to/file", subset: false } })

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants