veraPDF fails when an embedded file Subtype contains media type parameters #1460

Open
petervwyatt opened this issue Jul 11, 2024 · 0 comments

In PDF Errata #155 it was decided that PDF does not prohibit Media Type parameters in the Subtype entry of an embedded file stream dictionary. Listen to the recording if you want the full discussion. Since no PDF/A standard adds any further restriction (such as a prohibition), PDF/A files may therefore contain Media Type parameters as per RFC 2046.

e.g. for an email with "Content-Type: text/xml; charset=UTF-8", the matching Media Type would be:
/Subtype /text#2fxml;#20charset=UTF-8
but this fails validation by veraPDF.

It appears that veraPDF is using regex /^[-\w+\.]+\/[-\w+\.]+$/ (such as here). This doesn't account for =, or # AFAICT (Java \w = [a-zA-Z_0-9]).
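
To illustrate the mismatch, here is a minimal Java sketch, not veraPDF's actual code. It assumes the check is applied to the decoded name (with `#2f` and `#20` already expanded). The class `MediaTypeCheck`, the helper `decodeName` and the `RELAXED` pattern are all hypothetical names introduced for this example, and `RELAXED` is a simplification of the RFC 2045 parameter grammar (it does not handle quoted-string values).

```java
import java.util.regex.Pattern;

class MediaTypeCheck {
    // The pattern quoted above, without the surrounding / delimiters.
    static final Pattern STRICT = Pattern.compile("^[-\\w+\\.]+/[-\\w+\\.]+$");

    // Hypothetical relaxed pattern (an assumption, not veraPDF's):
    // type/subtype followed by zero or more "; name=value" parameters.
    static final Pattern RELAXED = Pattern.compile(
            "^[-\\w+.]+/[-\\w+.]+(\\s*;\\s*[-\\w+.]+=[-\\w+.]+)*$");

    // Expands #xx hex escapes in a PDF name (leading slash already removed).
    // Throws NumberFormatException if the two characters after '#' are not hex.
    static String decodeName(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '#' && i + 2 < raw.length()) {
                out.append((char) Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decodeName("text#2fxml;#20charset=UTF-8");
        System.out.println(decoded);                             // text/xml; charset=UTF-8
        System.out.println(STRICT.matcher(decoded).matches());   // false
        System.out.println(RELAXED.matcher(decoded).matches());  // true
    }
}
```

The strict pattern rejects the decoded value because `;`, the space and `=` are outside its character class; whether a relaxed pattern like the one above is appropriate is a question for the maintainers.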

PS. Found as part of EA-PDF

@bdoubrov self-assigned this Jul 12, 2024
@bdoubrov added the question (Currently discussed by the Validation TWG) label Oct 9, 2024