PageLabel Num Trees not starting with 0 #118

Open
asciim0 opened this issue Jun 4, 2024 · 5 comments

Comments

@asciim0

asciim0 commented Jun 4, 2024

Quick question about what is in scope for Arlington and what isn't. If I understand correctly, for PageLabels Arlington checks against Table 161 in ISO 32000-2:2017. However, the spec has this additional sentence about the number tree:
"The tree shall include a value for page index 0." (see p. 455 and within table 161).
I've encountered a file in the wild that was created with a number tree not starting at 0, and have built a small synthetic file (attached) based on it, where the tree is:
/Nums [1 <</S /D>> 2 <</S /r>>]

I know that different PDF readers handle page label display differently when the number tree does not start at 0: some fall back to decimal Arabic numerals by default, others interpret the tree as is and ignore the error.
My questions are:

  • Do you consider a number tree like the one above an error?
  • Is it in scope for the Arlington model to detect errors of this kind, which are stated in the spec but outside the tables describing the dictionaries?

I checked the file with the Arlington model (via veraPDF's implementation) and it came back as having no deviations.
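For illustration, here is a minimal sketch of the check in question, assuming the number tree is a single flat /Nums array as in the synthetic file (no /Kids). The array is modelled as a plain Python list, and the helper names `nums_keys` and `has_page_index_zero` are hypothetical, not part of Arlington or veraPDF.

```python
def nums_keys(nums):
    """Return the integer keys of a flat /Nums array [key1, value1, key2, value2, ...]."""
    if len(nums) % 2 != 0:
        raise ValueError("/Nums must hold key/value pairs")
    # Keys sit at the even positions, values at the odd positions.
    return [nums[i] for i in range(0, len(nums), 2)]


def has_page_index_zero(nums):
    """True if the flat /Nums array contains an entry for page index 0."""
    return 0 in nums_keys(nums)


# The array from the synthetic file: /Nums [1 <</S /D>> 2 <</S /r>>]
bad_nums = [1, {"S": "D"}, 2, {"S": "r"}]
# A hypothetical corrected variant whose first key is 0.
good_nums = [0, {"S": "D"}, 2, {"S": "r"}]

print(has_page_index_zero(bad_nums))
print(has_page_index_zero(good_nums))
```

The first array fails the "shall include a value for page index 0" requirement, the second satisfies it.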

hello_label_wrong.pdf

@petervwyatt
Member

petervwyatt commented Jun 5, 2024

Any explicit "shall" statement related to file format in ISO 32000-2 is in scope for Arlington... but locating and encoding them all via predicates is a huge and ongoing task! At the moment, most of the captured requirements are those stated within the Tables. So, yes, the requirement "The tree shall include a value for page index 0." is definitely in scope.

Because PageLabels is a number tree, it's not as simple as testing for a specific key value. Please refer to "Proposals for future predicates" in INTERNAL_GRAMMAR.md. Such a rule might look like:

fn:Eval(fn:IsPresent(fn:IsNumberTreeValue(PageLabels,0))) in the "SpecialCase" field of Catalog.tsv for PageLabels

I also try to make the predicates "read aloud" as closely as possible to the ISO 32000-2 requirement - I think for this case that is close enough.

@petervwyatt petervwyatt self-assigned this Jun 5, 2024
@petervwyatt petervwyatt added this to the PDF Data Model milestone Jun 5, 2024
@petervwyatt petervwyatt added the enhancement New feature or request label Jun 5, 2024
@MaximPlusov
Contributor

MaximPlusov commented Jun 5, 2024

The rule should probably look like: fn:Eval(fn:IsNumberTreeIndex(PageLabels,0))

@petervwyatt
Member

Right you are, but I'd still want to wrap it with an IsPresent: fn:IsPresent(fn:IsNumberTreeIndex(PageLabels,0)) so it reads aloud closer to the spec wording.

@MaximPlusov
Contributor

I think using IsPresent is wrong. From INTERNAL_GRAMMAR.md:
For a single parameter: asserts that the current row must be present in a PDF ... when the expression expr is true.
So fn:IsPresent(fn:IsNumberTreeIndex(PageLabels,0)) only checks that PageLabels must be present when the PageLabels tree includes index 0.
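The difference can be sketched as a small truth table. This is only a model of the sentence quoted above, not of any actual Arlington implementation: `is_present_violated` and `eval_violated` are hypothetical names, and the reading of fn:Eval as "the expression must be true" is an assumption.

```python
def is_present_violated(condition, key_present):
    """Quoted single-parameter reading: the key must be present when condition is true."""
    return condition and not key_present


def eval_violated(expression):
    """Assumed reading of fn:Eval: the expression itself must hold."""
    return not expression


# Case from the issue: PageLabels exists, but its number tree has no index 0.
tree_has_index_zero = False
page_labels_present = True

# Wrapped in IsPresent, the condition is false, so nothing is required and
# no violation is reported.
print(is_present_violated(tree_has_index_zero, page_labels_present))

# With a bare Eval, the missing index 0 is reported.
print(eval_violated(tree_has_index_zero))
```

Under these assumptions the IsPresent form stays silent for exactly the file that should be flagged.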

@petervwyatt
Member

I was thinking about the negative case where index 0 is mapped to the null object, and how to guard against that. I do NOT want to encode or assume the null processing rule, as that is a processing behaviour and not part of the model itself. This comes down to how the name-tree and number-tree predicates are defined, since *-tree are primitive types in Arlington - we may well need another fn:IsNotNull()-style predicate.
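To make the three cases concrete (index 0 absent, index 0 mapped to null, index 0 mapped to a real value), here is a rough sketch over a number tree modelled as nested Python dicts with /Nums, /Kids and /Limits entries, where `None` stands in for the PDF null object. The names `MISSING`, `lookup` and `classify_index_zero` are hypothetical, and the sketch has no protection against cyclic /Kids references.

```python
MISSING = object()  # sentinel: the key does not occur in the tree at all


def lookup(node, key):
    """Return the value stored for key, or MISSING if the tree has no such key."""
    nums = node.get("Nums", [])
    # /Nums is a flat array of key/value pairs.
    for i in range(0, len(nums) - 1, 2):
        if nums[i] == key:
            return nums[i + 1]
    for kid in node.get("Kids", []):
        limits = kid.get("Limits")
        # Skip subtrees whose [least, greatest] range cannot contain the key.
        if limits is not None and not (limits[0] <= key <= limits[1]):
            continue
        found = lookup(kid, key)
        if found is not MISSING:
            return found
    return MISSING


def classify_index_zero(root):
    """Report whether page index 0 is 'absent', 'null' or 'present'."""
    value = lookup(root, 0)
    if value is MISSING:
        return "absent"
    if value is None:
        return "null"
    return "present"


tree_absent = {"Nums": [1, {"S": "D"}, 2, {"S": "r"}]}
tree_null = {"Kids": [{"Limits": [0, 1], "Nums": [0, None, 1, {"S": "D"}]}]}
tree_ok = {"Kids": [{"Limits": [0, 0], "Nums": [0, {"S": "D"}]},
                    {"Limits": [2, 2], "Nums": [2, {"S": "r"}]}]}

for tree in (tree_absent, tree_null, tree_ok):
    print(classify_index_zero(tree))
```

Keeping "absent" and "null" as separate outcomes leaves it to the predicate definition, rather than to a processing rule, to decide whether a null value counts as "a value for page index 0".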
