-
Hello Martin Thoma, I see you edited my post almost immediately. Could you please help me with an example of how to extract the text at those coordinates with the visitor?
-
The first snippet is part of the annotations documentation; the second one is about visitor functions. They are completely different, and I don't know why you think there is a link between the two.
-
Can you please provide an example of how to EXTRACT the text from the annotation coordinates?
-
OK, so the simple answer is: No, pypdf can NOT handle extracting text from highlight annotations. This also answers #701. Some libraries that CAN handle this:
-
I want to extract highlight annotations, following the example at:
https://pypdf.readthedocs.io/en/latest/user/reading-pdf-annotations.html#highlights
How do I use the coordinates from the last line of that example to extract the text?
x1, y1, x2, y2, x3, y3, x4, y4 = coords
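For context, the eight numbers are the corner points of one highlighted quadrilateral, and /QuadPoints may hold several such groups (typically one per highlighted line). A minimal sketch, assuming you only need axis-aligned bounding boxes; `quad_points_to_rects` is a hypothetical helper name, not part of pypdf:

```python
def quad_points_to_rects(quad_points):
    """Turn a flat /QuadPoints list into (x_min, y_min, x_max, y_max) boxes.

    Hypothetical helper, not a pypdf function. Each group of 8 numbers
    describes the four corners of one highlighted quadrilateral.
    """
    values = [float(v) for v in quad_points]
    if len(values) % 8 != 0:
        raise ValueError("expected a multiple of 8 numbers in /QuadPoints")

    rects = []
    for i in range(0, len(values), 8):
        xs = values[i:i + 8:2]      # x1, x2, x3, x4
        ys = values[i + 1:i + 8:2]  # y1, y2, y3, y4
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects
```

Taking the min and max avoids relying on the order in which a given PDF writer lists the corners. The boxes are in page coordinates and do not contain any text themselves.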
I understand I should use visitor_text:
https://pypdf.readthedocs.io/en/latest/user/extract-text.html?highlight=extract_text#using-a-visitor
https://pypdf.readthedocs.io/en/latest/modules/PageObject.html?highlight=visitor_text#pypdf._page.PageObject.extract_text
But the use of this function is very confusing to me, and I can't seem to wrap my head around the two examples provided (Ignore header and footer; Extract rectangles and texts into a SVG-file).
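As a rough sketch of how a visitor could be combined with a rectangle: the callback passed as visitor_text receives each text fragment together with the current transformation matrix (cm) and the text matrix (tm), so it can keep only the fragments whose starting point falls inside a box. The names `make_rect_visitor` and `extract_text_in_rect` are hypothetical, and the position formula (text matrix origin mapped through cm) is my assumption about how to get page coordinates, so treat this as an approximation and not as pypdf's way of reading highlights:

```python
def make_rect_visitor(rect, collected):
    """Build a visitor_text callback that keeps fragments starting inside rect.

    rect is (x_min, y_min, x_max, y_max) in page coordinates.
    Hypothetical helper, not a pypdf function.
    """
    x_min, y_min, x_max, y_max = rect

    def visitor(text, cm, tm, font_dict, font_size):
        # Assumption: the fragment starts at the text matrix origin
        # (tm[4], tm[5]), mapped through the current transformation matrix.
        x = tm[4] * cm[0] + tm[5] * cm[2] + cm[4]
        y = tm[4] * cm[1] + tm[5] * cm[3] + cm[5]
        if x_min <= x <= x_max and y_min <= y <= y_max:
            collected.append(text)

    return visitor


def extract_text_in_rect(page, rect):
    """Collect the text fragments of a pypdf page that start inside rect."""
    parts = []
    page.extract_text(visitor_text=make_rect_visitor(rect, parts))
    return "".join(parts)
```

With a real file this would be called as `extract_text_in_rect(reader.pages[0], rect)`. The filter only looks at where a fragment starts: a fragment that begins before the highlight is skipped, and one that begins inside it is kept whole even if it runs past the highlighted words, so the result can differ from what was actually marked.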
Would anybody be so kind as to show me the link between the following code examples: