Blank pages in pdf lead to the wrong number of pages #72
Comments
Interesting, can you email me a sample file to test this out on, @niuzaisheng?
+1 I have the same issue.
I was dealing with a document that triggered this error in papermage/rasterizers/rasterizer.py:
raise ValueError(f"Failed to attach. {len(images)} images != {len(pages)} pages in doc.")
After some debugging I found that the cause is a blank page in my PDF. The code in papermage/parsers/pdfplumber_parser.py determines the number of pages by traversing the objects that exist in the document, so a blank page, which contains no objects, is skipped. As a result, the page_annos list ends up with fewer page objects than the PDF actually has pages.
Relevant code: papermage/papermage/parsers/pdfplumber_parser.py, line 338 in 6a0a4a2
Some further modifications may be needed here to deal with this rare case. Thank you.
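To illustrate the failure mode described above, here is a minimal sketch in plain Python. It is not the papermage code: the function names, the (page_id, text) token shape, and the sample data are all hypothetical. It only shows why deriving pages from the objects that exist drops a blank page, and how iterating over a known page count (with pdfplumber this could presumably come from len(pdf.pages)) keeps one entry per physical page.

```python
from collections import defaultdict


def group_by_page(tokens):
    """Group token texts by the page id they appear on."""
    by_page = defaultdict(list)
    for page_id, text in tokens:
        by_page[page_id].append(text)
    return by_page


def pages_from_tokens(tokens):
    """Problematic pattern: one entry per page that has at least one token.

    A blank page has no tokens, so it never shows up in the result.
    """
    by_page = group_by_page(tokens)
    return [by_page[p] for p in sorted(by_page)]


def pages_from_count(tokens, num_pages):
    """Possible fix: one entry per physical page, empty for blank pages.

    num_pages is assumed to come from the PDF itself, not from the tokens.
    """
    by_page = group_by_page(tokens)
    return [by_page.get(p, []) for p in range(num_pages)]


# Hypothetical 3-page document where page 1 (0-indexed) is blank.
tokens = [(0, "Title"), (0, "Abstract"), (2, "References")]
num_pages = 3

buggy = pages_from_tokens(tokens)
fixed = pages_from_count(tokens, num_pages)
```

With this sample, buggy has two entries while a rasterizer would produce three images, which is the same kind of mismatch that raises the ValueError; fixed has three entries, with an empty list for the blank page.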