How to verify corrupted PDF file #2205
Replies: 4 comments
-
Generally, detecting whether a PDF file fully complies with the spec is probably rather complex, and from my experience, lots of files would fail this. Some PDFs might be partially invalid and could be fixed, while others cannot. The easiest solution I see with pypdf is to create a
-
@pubpub-zz @MartinThoma any other suggestions? :)
-
The idea @stefan6419846 proposed is also what I would propose: create a PdfReader and, for each page, extract the text and images while collecting any exceptions. If any exception is raised, the PDF is quite damaged. From what I have observed:
-
PS: This is not an issue, so I converted it to a discussion.
-
In my Python program, I need to know whether a PDF is corrupted or not.
How can I achieve this with this library? Is it possible at the moment?
One case to study: take a valid PDF file.
Open its content in a text editor, copy and paste it into pdf_file.txt, then convert this .txt file to .pdf.
When you double-click it, the file cannot be opened in a PDF viewer (the viewer reports that it is corrupted).
What I would like is a function that tells me whether a PDF is valid or corrupted.
Thanks,
Have a good day :)