-
I'm not aware of any out-of-the-box solution that would use |
-
Hello,
I am using 'extract_words(keep_blank_chars=True)' to extract blocks of text from my input PDFs, and so far the text extraction is working exactly as it should (thanks for the great work on pdfplumber).
I then apply a transformation to the array of extracted text, which modifies the text. Next, I want to place each updated string back at the position it was originally extracted from, so that all the non-text background items are preserved as they are. Lastly, I want to save the updated PDF (the one with the new text).
I understand that the documentation specifically mentions that pdfplumber cannot be used for PDF modification or PDF writing, but that it can save a PDF page as an image. I wanted to know if there is a workaround I can apply to modify the PDF, as saving each page in image format is not a problem for me.
I am able to do the above-mentioned transformation using PyPDF2, but the problem there is that PyPDF2 cannot extract text from many of the PDFs I give it. That problem is also unlikely to get solved, since the library is no longer maintained.
Is there a solution I can apply using pdfplumber? If not, are there any other options, including in other programming languages, that could help solve this problem? (I am asking here since you are probably more familiar with PDF-related tooling.)
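To make the image-based workaround concrete, here is a minimal sketch of the coordinate step it would need: mapping each extracted word's bounding box from PDF points to pixel coordinates on a rendered page image, paired with its replacement text. The names `Patch` and `plan_patches` are hypothetical, not part of pdfplumber. The sketch assumes the word dicts carry the `x0`, `top`, `x1` and `bottom` keys that `extract_words` returns, that the page is uncropped (origin at 0, 0), and that the image is rendered at a known resolution in pixels per inch.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # PDF coordinates are expressed in points (1/72 inch)


@dataclass
class Patch:
    """Hypothetical helper type: where to redraw, and what text to draw."""
    box: tuple  # (left, top, right, bottom) in image pixels
    text: str


def plan_patches(words, new_texts, resolution=72):
    """Pair each extracted word's box (scaled to pixels) with its new text.

    `words` is assumed to be a list of dicts shaped like the output of
    pdfplumber's extract_words(); `new_texts` is the transformed text,
    in the same order. `resolution` is the render resolution in pixels
    per inch. Assumes an uncropped page whose origin is (0, 0).
    """
    if len(words) != len(new_texts):
        raise ValueError("words and new_texts must have the same length")

    scale = resolution / POINTS_PER_INCH
    patches = []
    for word, text in zip(words, new_texts):
        box = (
            round(word["x0"] * scale),
            round(word["top"] * scale),
            round(word["x1"] * scale),
            round(word["bottom"] * scale),
        )
        patches.append(Patch(box=box, text=text))
    return patches
```

Each resulting box could then be painted over and the new text drawn inside it with an imaging library such as Pillow, on the page image saved at the same resolution. This only covers the placement arithmetic; matching the original font, size and background is a separate problem that the sketch does not attempt.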