How to remove watermark with pypdf2 #2917
-
I have converted your issue into a discussion, which fits better.

First of all: PyPDF2 has long been deprecated, and you should probably not use it anymore. Removing watermarks from PDFs may well not be legal, as there are usually reasons why they have watermarks. I am going to assume that you are only doing this on PDF files you created yourself.

For text: Your approach looks correct when the filtering is enabled. Nevertheless, there are many different ways to typeset text in PDF files: text is basically just a collection of characters, each with a specific position. You might be lucky and have text operators that are organized in groups, but this is not necessarily the case. Additionally taking the text color or transformation matrix into account might help to avoid false positives, although this further complicates the analysis.

For images: You have not specified further details about this. If the watermark is just an image, you should be able to use its |
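To illustrate the text filtering described above, here is a minimal sketch that drops text-showing operators (`Tj`/`TJ`) containing the watermark string, optionally only when the current fill color matches. It works on a plain list of `(operands, operator)` tuples, which is the shape pypdf's `ContentStream.operations` uses. The function names `shown_text` and `drop_watermark_text` are made up for this example, only the `rg` color operator is tracked, and the transformation matrix is ignored, so treat it as a starting point rather than a complete solution.

```python
from typing import Any, List, Optional, Sequence, Tuple

Operation = Tuple[Sequence[Any], bytes]


def shown_text(operands: Sequence[Any], operator: bytes) -> str:
    """Return the text shown by a Tj or TJ operation (hypothetical helper)."""
    if operator == b"Tj":
        return str(operands[0])
    # TJ takes one array mixing strings and numeric kerning adjustments.
    return "".join(item for item in operands[0] if isinstance(item, str))


def drop_watermark_text(
    operations: Sequence[Operation],
    watermark: str,
    color: Optional[Tuple[float, float, float]] = None,
    tol: float = 0.01,
) -> List[Operation]:
    """Drop Tj/TJ operations that contain `watermark`.

    If `color` is given, an operation is only dropped when the current
    RGB fill color (set via `rg`) matches it within `tol`.
    Assumption: colors set through other operators (g, k, sc, ...) are ignored.
    """
    kept: List[Operation] = []
    fill = (0.0, 0.0, 0.0)  # the initial fill color is black
    saved = []  # fill colors saved by q and restored by Q
    for operands, operator in operations:
        if operator == b"q":
            saved.append(fill)
        elif operator == b"Q" and saved:
            fill = saved.pop()
        elif operator == b"rg":
            fill = tuple(float(value) for value in operands)
        elif operator in (b"Tj", b"TJ"):
            text_matches = watermark in shown_text(operands, operator)
            color_matches = color is None or all(
                abs(a - b) <= tol for a, b in zip(fill, color)
            )
            if text_matches and color_matches:
                continue  # skip this operation, i.e. remove it
        kept.append((operands, operator))
    return kept
```

With pypdf, the input list would come from a `ContentStream` built from the page contents, and the filtered list would have to be assigned back to its `operations` before writing the page. That wiring is not shown here. Passing `color` is one way to reduce the false positives mentioned above when regular text happens to contain the watermark string.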
-
I can use the following code to remove watermarks with PyPDF2 3.0.1, but it only works in a few situations.
Most of the time, the extracted text contains both the regular text and the watermark text, and the extracted images also include the watermark images. Does anyone know how to remove watermark text/images using pypdf 5.0.1?