PDF Recreation #591

banagg · 2022-01-27T22:32:53Z

Hello,

I am using 'extract_words(keep_blank_chars=True)' to extract blocks of text from my input PDFs, and so far the text extraction is working exactly as it should (thanks for the great work done on pdfplumber).

I then take the array of extracted text and apply a transformation to it, so that the text is modified. Next, I want to put each updated piece of text back at the position its original was extracted from, so that all the background non-text items are preserved as they are. Lastly, I want to save the updated PDF (the one with the new text).
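The extract-then-transform step described above can be sketched as follows. This is only an illustration under assumptions: the helper names `transform_words` and `extract_and_transform` are hypothetical, and the transformation is taken to be a plain string-to-string function. Each word dict returned by `extract_words` carries its position (`x0`, `x1`, `top`, `bottom`), so replacing only the `text` value keeps the original placement available for later.

```python
from typing import Callable, Dict, List


def transform_words(words: List[Dict], transform: Callable[[str], str]) -> List[Dict]:
    """Return copies of the word dicts with 'text' replaced.

    Position keys such as x0, x1, top and bottom are copied unchanged,
    and the input list is not mutated.
    """
    return [{**word, "text": transform(word["text"])} for word in words]


def extract_and_transform(pdf_path: str, page_number: int,
                          transform: Callable[[str], str]) -> List[Dict]:
    """Hypothetical helper: extract words from one page and transform their text."""
    # pdfplumber is third-party; it is imported here so that
    # transform_words can be used without it installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        words = page.extract_words(keep_blank_chars=True)
    return transform_words(words, transform)
```

For example, `extract_and_transform("input.pdf", 0, str.upper)` would return the first page's words upper-cased, each still paired with its original bounding box ("input.pdf" is a placeholder path).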

I understand that the documentation specifically mentions that pdfplumber cannot be used for PDF modification or PDF writing, but can be used to save a PDF page as an image. I wanted to know if there is a workaround I can apply to modify the PDF, since saving each page in image format is not a problem for me.
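One possible shape for such an image-based workaround is sketched below. It is not a pdfplumber feature, only an illustration under several assumptions: the page is rendered with `page.to_image(resolution=...)`, the text background is white (painting over the old text would otherwise disturb the background), the page's bounding box starts at the origin, and Pillow's default font is acceptable. The names `to_pixel_box` and `render_with_replacements` are hypothetical.

```python
from typing import Dict, List, Tuple


def to_pixel_box(word: Dict, resolution: int = 150) -> Tuple[int, int, int, int]:
    """Convert a word's PDF-point box to pixel coordinates.

    Assumes 72 points per inch and a page whose bounding box starts at (0, 0).
    """
    scale = resolution / 72.0
    return tuple(round(word[key] * scale) for key in ("x0", "top", "x1", "bottom"))


def render_with_replacements(pdf_path: str, page_number: int, new_words: List[Dict],
                             out_path: str, resolution: int = 150) -> None:
    """Render one page to an image, cover each old word, and draw the new text."""
    # Third-party imports are kept local so to_pixel_box works without them.
    import pdfplumber
    from PIL import ImageDraw, ImageFont

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        image = page.to_image(resolution=resolution).original.convert("RGB")

    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()
    for word in new_words:
        box = to_pixel_box(word, resolution)
        # Assumption: white background. This hides the original text.
        draw.rectangle(box, fill="white")
        draw.text((box[0], box[1]), word["text"], fill="black", font=font)
    image.save(out_path)
```

Here `new_words` is expected to be a list of dicts with `text`, `x0`, `x1`, `top` and `bottom` keys, as produced by `extract_words`. The result is a raster image rather than a PDF with real text, and the default font will not match the original typography.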

I am able to do the above-mentioned transformation using PyPDF2, but the problem there is that PyPDF2 cannot extract text from many of the PDFs I am providing. That problem is also unlikely to get solved, since the library is no longer maintained.

Is there a solution I can apply using pdfplumber? If not, are there any other options, in any programming language, that could help solve the problem described above? (I am asking here since you are probably more aware of PDF-related tools.)

jsvine · 2022-01-29T20:07:31Z

Maintainer

I'm not aware of any out-of-the-box solution that would use pdfplumber for this purpose, unfortunately.

2 replies

banagg Feb 1, 2022
Author

Is there any other project, in any programming language, that comes close to this?

samkit-jain Feb 2, 2022
Collaborator

https://github.com/JoshData/pdf-redactor might be useful. I have used it in the past and it worked fine.
