-
Hi @ytiam, and thanks for your interest in this library. I'm not quite sure what you mean about text inside an "Image", but here's one way you could exclude text within a table:

    from pdfplumber.utils import intersects_bbox

    def get_nontable_text(page):
        # Locate every table detected on the page.
        tables = page.find_tables()

        def outside_tables(obj):
            # Keep an object only if it intersects no table's bounding box.
            return not any(intersects_bbox([obj], t.bbox) for t in tables)

        return page.filter(outside_tables).extract_text()
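If by text inside an "Image" you mean characters that overlap the bounding box of an embedded image, the same filtering idea could probably be extended to cover that case too. The sketch below is only a rough illustration under that assumption: `bboxes_overlap` and `get_body_text` are names made up for this example, not part of pdfplumber, and the overlap check is hand-written rather than taken from the library. It relies on `page.images` entries carrying `x0`, `top`, `x1` and `bottom` keys. Note that text baked into the pixels of a picture is not extracted in the first place, since that would need OCR.

```python
def bboxes_overlap(a, b):
    """Return True if two (x0, top, x1, bottom) boxes share any area."""
    ax0, atop, ax1, abottom = a
    bx0, btop, bx1, bbottom = b
    return ax0 < bx1 and bx0 < ax1 and atop < bbottom and btop < abottom


def get_body_text(page):
    """Extract text from a page, skipping anything over a table or image."""
    # Collect the bounding boxes of detected tables and embedded images.
    excluded = [t.bbox for t in page.find_tables()]
    excluded += [
        (im["x0"], im["top"], im["x1"], im["bottom"]) for im in page.images
    ]

    def outside_excluded(obj):
        # Keep an object only if it overlaps none of the excluded boxes.
        obj_bbox = (obj["x0"], obj["top"], obj["x1"], obj["bottom"])
        return not any(bboxes_overlap(obj_bbox, box) for box in excluded)

    return page.filter(outside_excluded).extract_text()
```

With a page opened through `pdfplumber.open(...)`, calling `get_body_text(pdf.pages[0])` should then return only the text that sits outside those regions. Whether that matches what you see as "paragraph" text will depend on how well table detection works on your document.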
-
Hi guys,
Is there a way to extract only the text in the paragraph lines of a page, excluding any text inside images and tables? If anyone knows an approach for this, please share it.