arlington-pdf-model-checker fails if there are unknown dictionary keys #1349
Comments
The PDF 1.7 specification reads as follows at the start of section I.3 (it used to be available at https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#G22.1086391):
I’m not sure that a new entry from a newer format version is also backward-compatible with an older format version. I would say such an entry simply has to be ignored (for compatibility, it should be recognized). I’m aware that the quote describes earlier software versions and not earlier format versions. But since almost everything in PDF is an entry in With a case, adding multimedia (section 13.2 of the PDF-1.7 spec) in a PDF-1.4 document. Media objects were added in version 1.5. If the reader can deal with them (from PDF-1.5, since they haven’t been deprecated), should the reader handle media objects in PDF-1.4? If the answer to that question is yes, format versions might become a mess, among other things because earlier format versions might incorporate entries from newer versions without following any deprecation of entries defined in those newer versions. At least for consistency between format versions and format features, each format version should ignore entries from newer format versions. I might be missing the whole point and I’m more than happy to be corrected. Many thanks for your help.
That would be quite fine with me. But this is not happening here: they are not ignored but give errors. (My use case is structure destinations. They are a PDF 2.0 feature, but if I add them to a PDF 1.7 document too, it makes it easier to reimport its annotations with the newpax package, and it also makes it easier for ngpdf to create links.)
No, personally I think it shouldn't. If the PDF says it is 1.4, the reader should only handle keys that have a meaning in 1.4.
I have tested it before and I know it complains about entries undefined in a given format version. As far as I know, Annex I.3 prevents such error messages.
The quote from my previous comment might be poorly worded, and the text may have been intended to prescribe exactly that behavior. In any case, the original error message should be a warning about entries being ignored (because they are undefined) in the given format version.
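The behavior requested above (warn about, rather than reject, entries that are undefined in the declared format version) could be sketched roughly as follows. This is a minimal illustration, not how any existing checker works: the `INTRODUCED_IN` table, its version values, and the `check_keys` helper are hypothetical names and data invented for the example.

```python
from dataclasses import dataclass

# Hypothetical table mapping a key to the PDF version that introduced it.
# The values are illustrative only and are not taken from the Arlington model.
INTRODUCED_IN = {"S": (1, 1), "D": (1, 1), "SD": (2, 0)}


@dataclass
class Finding:
    key: str
    severity: str
    message: str


def parse_version(text):
    """Turn a version string such as '1.7' into a comparable tuple (1, 7)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def check_keys(keys, declared_version):
    """Return warnings for keys that are unknown or newer than the declared version."""
    declared = parse_version(declared_version)
    findings = []
    for key in keys:
        introduced = INTRODUCED_IN.get(key)
        if introduced is None:
            findings.append(Finding(key, "warning", "unknown key, ignored"))
        elif introduced > declared:
            findings.append(
                Finding(
                    key,
                    "warning",
                    f"defined from PDF {introduced[0]}.{introduced[1]}, "
                    f"ignored in PDF {declared_version}",
                )
            )
    return findings
```

With this sketch, an `SD` entry in a document declared as PDF 1.7 produces a single warning, and the same entry in a PDF 2.0 document produces none.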
Please only refer to ISO 32000-2:2020 as well as https://pdf-issues.pdfa.org/, as these documents include thousands of corrections and clarifications agreed upon by many experts in the vendor-neutral ISO forums. Previous core PDF specs do not have this benefit. ISO 32000-2:2020, including the soon-to-be-published Amd1, is available to everyone at no cost via https://www.pdfa-inc.org/product/iso-32000-2-pdf-2-0-bundle-sponsored-access/ I don't know how the veraPDF engineers have codified the Arlington rules (they will need to reply), but PDF 2.0 is now very clear that non-standardized keys in most dictionaries generally need to be 2nd class names with registered developer prefixes to avoid conflict with future (1st class) changes. There are a few exceptions to this rule that are mostly noted - see also pdf-association/pdf-issues#229. 2nd and 3rd class names are easily detectable in software, so reporting anything that is not standardized and not a 2nd or 3rd class name key is useful. So if you are using 1st class names (or incorrectly constructed 2nd class names) in many places, then that is officially wrong according to the latest spec. I also cannot speak to how the veraPDF engineers have codified the detection and reporting of officially deprecated features, but relying on deprecated features (by themselves, since some features have "modern" better alternatives) in a PDF is also generally not a good idea in the long run, in the same way that some PDF 1.0 and PDF 1.1 no longer work today in 99.99% of implementations and that very old low-bit encryption cannot be expected to withstand today's attackers. So if PDF, as a "document of record", is deprecating something, then there is a very good reason for it.
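As a rough illustration of why 2nd and 3rd class names are easy to detect in software, a classifier might look like the sketch below. The patterns are a simplified assumption about the naming convention (3rd class names starting with `XX`, 2nd class names starting with a four-character developer prefix followed by `_` or `:`); the exact rules should be checked against ISO 32000-2:2020. The `name_class` function and the example keys are hypothetical.

```python
import re

# Assumed, simplified shapes of the name classes; verify against the spec.
THIRD_CLASS = re.compile(r"^XX")
SECOND_CLASS = re.compile(r"^[A-Za-z0-9]{4}[_:]")


def name_class(key, standardized):
    """Return 1, 2 or 3 for the name class of a key, or None if it fits no class."""
    if key in standardized:
        return 1  # key defined by the standard for this dictionary
    if THIRD_CLASS.match(key):
        return 3  # private extension
    if SECOND_CLASS.match(key):
        return 2  # developer-prefixed extension
    return None  # looks like a 1st class name but is not standardized


# Hypothetical usage: only the keys that fit no class would be reported.
catalog_keys = {"Type", "Pages"}
for key in ["Type", "XXMyKey", "ACME_Foo", "MyKey"]:
    print(key, name_class(key, catalog_keys))
```

The sketch does not verify that a prefix is actually registered; it only tests the shape of the name.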
@petervwyatt well, the vera-checker also errors if I add a third class name to the catalog:
I would say that is clearly wrong. In the case of my second error with Side remark: I would love to simply force PDF 2.0 in all documents. But as long as support for tagging in readers and accessibility checkers is sketchy and PDF/UA-2 is not released, we have to support older PDF versions. Structure destinations improve their accessibility too, even if they are not mentioned in the 1.7 spec, and it would be a pity to have to drop the SD key because of errors from an Arlington checker (warnings are fine ...).
The veraPDF implementation of the Arlington model does not (yet) support 2nd and 3rd class names. It just follows the Arlington rules defining the permitted 1st class names and reports all other keys as deviations from the model. @ousia Whether the presence of undefined 1st class keys is a warning or an error is a question of policy. I can imagine some workflows where having such keys is fine, and others where a more secure policy is preferred. @u-fischer your case of
Support for 2nd and 3rd class names has been added to the latest dev build 1.25.14: https://software.verapdf.org/develop/arlington/1.25/verapdf-arlington-1.25.14-installer.zip All other unknown keys are still reported as deviations from the standard.
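The point made earlier in the thread, that treating unknown keys as a warning or an error is a question of policy, could be modeled with a configurable severity. The sketch below is purely illustrative and does not reflect veraPDF's configuration; the policy names and the `report_unknown_keys` helper are assumptions made up for the example.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


# Hypothetical policy names; a real tool may expose this differently or not at all.
POLICIES = {"lenient": Severity.WARNING, "strict": Severity.ERROR}


def report_unknown_keys(keys, known, policy="strict"):
    """Pair every key missing from `known` with the severity the policy assigns."""
    severity = POLICIES[policy]
    return [(key, severity) for key in keys if key not in known]
```

A lenient workflow would then see `("SD", Severity.WARNING)` for an unknown `SD` key, while a strict one would see `("SD", Severity.ERROR)` for the same input.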
I'm not sure if this is an issue with arlington-pdf-model-checker or if it should be reported to the Arlington model. If I add unknown keys to dictionaries, or keys that are not yet defined in the PDF version being used, I get a failure, e.g.:
Now, I.3 "Feature compatibility" of the PDF 2.0 specification says that
So why is such an unknown key a "shall not"?