Make logging system more useful for end users #1445

Open
MaximPlusov opened this issue May 15, 2024 · 5 comments

@MaximPlusov
Contributor

Originally posted by @ozross in https://github.com/duallab/ngPDF/issues/2#issuecomment-2067394063 and https://github.com/duallab/ngPDF/issues/2#issuecomment-2071152016

In a picture sent there are warnings about duplicated dictionary keys.
The object ID given is to the /StructTreeRoot dictionary, which seems rather strange.
I've traced these to be resulting from the same name being used as a key in both the /RoleMap and /ClassMap dictionaries, which surely is valid though maybe not best practice.

Screenshot 2024-04-18 at 11 30 13 am

By changing the /RoleMap entry to an indirect reference to a separate dictionary object, and similarly for the /ClassMap entry, the warnings no longer occur. Previously these dictionaries were given as direct entries of /StructTreeRoot.

Is there a lesson to be learned here, that could/should be shared in some documentation?

FallMT2022-Jul28.pdf

Here are the warnings:

Apr 23, 2024 7:04:58 AM org.verapdf.parser.COSParser getDictionary
WARNING: Dictionary/Stream contains duplicated key /CRDclause(object key = 470 0 obj, offset = 281158)
Apr 23, 2024 7:04:58 AM org.verapdf.parser.COSParser getDictionary
WARNING: Dictionary/Stream contains duplicated key /onPages(object key = 470 0 obj, offset = 281217)
Apr 23, 2024 7:04:58 AM org.verapdf.parser.COSParser getDictionary
WARNING: Dictionary/Stream contains duplicated key /NOAAtype(object key = 470 0 obj, offset = 281676)
Apr 23, 2024 7:04:58 AM org.verapdf.parser.COSParser getDictionary
WARNING: Dictionary/Stream contains duplicated key /CRDcitation(object key = 470 0 obj, offset = 281776)
Apr 23, 2024 7:04:58 AM org.verapdf.parser.COSParser getDictionary
WARNING: Dictionary/Stream contains duplicated key /CRDfishimages(object key = 470 0 obj, offset = 281807)
Apr 23, 2024 7:04:58 AM org.verapdf.parser.COSParser getDictionary
WARNING: Dictionary/Stream contains duplicated key /PRPcomment(object key = 470 0 obj, offset = 281844)
Apr 23, 2024 7:05:01 AM org.verapdf.gf.model.factory.operators.OperatorParser parseOperator
WARNING: Content stream contains duplicate MCID - 1
Apr 23, 2024 7:05:01 AM org.verapdf.gf.model.factory.operators.OperatorParser parseOperator
WARNING: Content stream contains duplicate MCID - 2
Apr 23, 2024 7:05:01 AM org.verapdf.gf.model.factory.operators.OperatorParser parseOperator
WARNING: Content stream contains duplicate MCID - 3
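
A short Python sketch of how warnings like those above could be tallied per object. The regular expression is an assumption inferred only from the lines quoted here, not from the veraPDF source, and `duplicated_keys` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern inferred from the warning text quoted in this thread (an assumption,
# not taken from veraPDF itself).
DUP_KEY = re.compile(
    r"duplicated key (?P<key>/\S+?)\(object key = (?P<obj>\d+) (?P<gen>\d+) obj, "
    r"offset = (?P<offset>\d+)\)"
)

def duplicated_keys(log_text):
    """Group duplicated-key warnings by (object number, generation)."""
    grouped = defaultdict(list)
    for match in DUP_KEY.finditer(log_text):
        obj_id = (int(match["obj"]), int(match["gen"]))
        grouped[obj_id].append((match["key"], int(match["offset"])))
    return dict(grouped)

sample = """\
WARNING: Dictionary/Stream contains duplicated key /CRDclause(object key = 470 0 obj, offset = 281158)
WARNING: Dictionary/Stream contains duplicated key /onPages(object key = 470 0 obj, offset = 281217)
WARNING: Content stream contains duplicate MCID - 1
"""

for obj_id, entries in duplicated_keys(sample).items():
    print(obj_id, entries)
```

Grouping by object makes it easier to see that every duplicated-key warning here points at the same object (470 0), with the byte offsets showing where in the file each repeated key was found.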

Here is a picture displaying some of what I think is happening — but it doesn't indicate or explain all of it.

FallMT2022-duplicated-keys

There are 3 keys that are used in both the RoleMap and ClassMap:
/CRDclause, /CRDfishimages, /PRPcomment
one key that differs in the case of a single letter:
/CRDcitation role, as opposed to /CRDCitation class
with no duplication for 2 others: /onPages and /NOAAtype.
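
The comparison described above can be sketched in Python. The key lists below are assumptions assembled from this comment (the full ClassMap is not shown in the thread), and `compare_keys` is a hypothetical helper. It only reports the overlap between the two maps and does not by itself establish what triggers the warnings.

```python
def compare_keys(role_keys, class_keys):
    """Return (exact matches, case-only near-matches) between two key collections."""
    role_set, class_set = set(role_keys), set(class_keys)
    exact = sorted(role_set & class_set)

    # Index the class keys by their case-folded form to find near-matches.
    class_by_fold = {}
    for key in class_set:
        class_by_fold.setdefault(key.casefold(), []).append(key)

    near = sorted(
        (r, c)
        for r in role_set
        for c in class_by_fold.get(r.casefold(), [])
        if r != c
    )
    return exact, near

# Illustrative data taken from the keys named in this comment.
role_keys = ["/CRDclause", "/onPages", "/NOAAtype",
             "/CRDcitation", "/CRDfishimages", "/PRPcomment"]
class_keys = ["/CRDclause", "/CRDfishimages", "/PRPcomment", "/CRDCitation"]

exact, near = compare_keys(role_keys, class_keys)
print("same name in both maps:", exact)
print("differ only by case:", near)
```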

Indeed the latter /NOAAtype is not used at all within the structure tree, except as a title NOAAtype of 2 different objects: top.01 and top.05 objects 551 and ??? respectively.

Hope this helps.

@MaximPlusov
Contributor Author

In this document the RoleMap contains duplicated entries:

 /CRDclause /Span
 /CRDclause /P
 /onPages /Div
 /onPages /Reference
 /NOAAtype /P
 /NOAAtype /P
 /CRDcitation /P
 /CRDcitation /P
 /CRDfishimages /Div 
 /CRDfishimages /Div
 /PRPcomment /Para
 /PRPcomment /Div

veraPDF and Acrobat both use the value that appears later.
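
To illustrate the "later value wins" behaviour, here is a minimal Python sketch over the RoleMap entries listed above. It models the dictionary as an ordered list of key/value pairs, and `resolve_last_wins` is a hypothetical helper, not part of veraPDF or Acrobat.

```python
def resolve_last_wins(pairs):
    """Return (effective mapping, earlier values that were overridden per key)."""
    effective = {}
    overridden = {}
    for key, value in pairs:
        if key in effective:
            # A repeated key: remember the value that is being replaced.
            overridden.setdefault(key, []).append(effective[key])
        effective[key] = value
    return effective, overridden

# The RoleMap entries exactly as listed in this comment.
role_map_pairs = [
    ("/CRDclause", "/Span"), ("/CRDclause", "/P"),
    ("/onPages", "/Div"), ("/onPages", "/Reference"),
    ("/NOAAtype", "/P"), ("/NOAAtype", "/P"),
    ("/CRDcitation", "/P"), ("/CRDcitation", "/P"),
    ("/CRDfishimages", "/Div"), ("/CRDfishimages", "/Div"),
    ("/PRPcomment", "/Para"), ("/PRPcomment", "/Div"),
]

effective, overridden = resolve_last_wins(role_map_pairs)
for key in effective:
    print(key, "->", effective[key], "(overrides", overridden[key], ")")
```

Under this rule /CRDclause maps to /P rather than /Span, and /onPages maps to /Reference rather than /Div. For the keys whose two values are identical, the duplication changes nothing but is still reported.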

@MaximPlusov
Contributor Author

Originally posted by @ozross in https://github.com/duallab/ngPDF/issues/2#issuecomment-2073556947

OK. That is a simple explanation ...

... and I now know the way to prevent it from happening within my LaTeX processing.

But it begs the question of how I could have found this for myself.
When listing the RoleMap in Acrobat, or its Preflight utility, it shows only 1 entry in a list sorted alphabetically — which ordering is not how it appears within the PDF itself.

What software do you use to see the RoleMap, ClassMap and other internal code in a compressed PDF? Is it free, for Unix/Linux/MacOS? Or relatively inexpensive?

@MaximPlusov
Contributor Author

I don't know of any such programs. I used veraPDF debugging to explore this case.

@MaximPlusov
Contributor Author

MaximPlusov commented May 15, 2024

Originally posted by @ozross in https://github.com/duallab/ngPDF/issues/2#issuecomment-2095087287 and https://github.com/duallab/ngPDF/issues/2#issuecomment-2112138406

Great; that's an option of which I was not aware.
How does one turn it on? I'm using the GreenfieldGuiWrapper.
I can see how to adjust Settings, such as the Logging Level, and the Features Config check-box marks (not sure what these give). But I don't see anything more detailed for debugging.

With Logging set to ALL I'm getting messages such as:
FINE: Can't get PSObject for COSType COS_UNDEFINED
from getPSObject
and
FINE: Unknown ColorSpace name
from getColorSpaceFromName
and
FINE: java.lang.NumberFormatException: For input string ""
from readNumber

Yet the Compliance is Passed (for PDF/UA-1).
If Logging is set as anything else, these messages do not occur; so I'm guessing that they aren't really important. Nevertheless it would be nice to see just what they refer to, and where it occurs.
Maybe I'm setting a null string () instead of 0 ?
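
The observation that the FINE messages appear only with Logging set to ALL is consistent with ordinary level-threshold filtering. The Python sketch below assumes the level names follow java.util.logging (whose numeric values are used here) and that each message starts with its level name. `filter_by_level` is a hypothetical helper, and how the GUI maps its Logging Level setting onto these levels is not confirmed in this thread.

```python
# Numeric values of the standard java.util.logging levels.
JUL_LEVELS = {
    "SEVERE": 1000, "WARNING": 900, "INFO": 800, "CONFIG": 700,
    "FINE": 500, "FINER": 400, "FINEST": 300,
}

def filter_by_level(lines, threshold):
    """Keep only 'LEVEL: message' lines at or above the given threshold level."""
    minimum = JUL_LEVELS[threshold]
    kept = []
    for line in lines:
        level, sep, _message = line.partition(": ")
        # Lines without a recognised level prefix are dropped.
        if sep and JUL_LEVELS.get(level, 0) >= minimum:
            kept.append(line)
    return kept

messages = [
    "FINE: Can't get PSObject for COSType COS_UNDEFINED",
    "FINE: Unknown ColorSpace name",
    'FINE: java.lang.NumberFormatException: For input string ""',
    "WARNING: Content stream contains duplicate MCID - 1",
]

print(filter_by_level(messages, "WARNING"))  # only the WARNING line survives
print(len(filter_by_level(messages, "FINEST")))  # all four lines are kept
```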

I'd like to learn how to diagnose these completely, even if it isn't crucial.

FallMT2023 2.pdf

The image below shows that the PDF validates for both PDF/UA-1 and PDF/A-3a, but there are a significant number of messages written to the shell window, from which the GUI interface was launched.

Screen Shot 2024-05-15 at 8 00 41 pm

After unchecking everything in that "Features Config" there are still many messages; so I cannot tell whether any of those settings were relevant. Probably not.
Also, I realise now that I can run veraPDF from a command-line shell, and that there's a --debug option. But so far I've not been able to use it to explore the cause of these messages.

Cheers.

  Ross

@bdoubrov changed the title from "Find place in the document for given logs" to "Make logging system more useful for end users" on May 17, 2024

@MaximPlusov
Contributor Author

@ozross
All of these messages are incorrect, and we will try to remove them:
FINE: Can't get PSObject for COSType COS_UNDEFINED from getPSObject (already disabled)
FINE: Unknown ColorSpace name from getColorSpaceFromName (connected with the use of an Indexed color space)
FINE: java.lang.NumberFormatException: For input string "" from readNumber (connected with '-|' inside the Type1 font Private part)
The --debug option is used only for showing the names of all processed files.
