Possibility of text extraction using coordinates #831
Unanswered
sandeepreddy5
asked this question in
Q&A
Replies: 1 comment 2 replies
-
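Hi @sandeepreddy5; you can do this:

import pdfplumber

with pdfplumber.open("example.pdf") as pdf:
    for page in pdf.pages:
        for rect in page.rects:
            # (x0, top, x1, bottom) of the rectangle
            bbox = pdfplumber.utils.obj_to_bbox(rect)
            # Crop the page to that box and extract the text inside it
            print(page.crop(bbox).extract_text())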
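If the goal is the output shown under "Expected behavior" in the question, where each rect dict carries its text alongside its coordinates, the same crop-and-extract idea can be wrapped in a small helper. This is only a sketch: `rects_with_text` and `print_rects_with_text` are hypothetical names, not part of pdfplumber, and the bbox is built by hand as `(x0, top, x1, bottom)` on the assumption that every rect lies fully inside the page.

```python
def rects_with_text(page):
    """Return a copy of each rect dict on `page` with an added 'text' key.

    `page` is assumed to behave like a pdfplumber Page: it exposes `.rects`
    (a list of dicts) and `.crop(bbox)`, whose result has `.extract_text()`.
    """
    results = []
    for rect in page.rects:
        # pdfplumber bounding boxes are (x0, top, x1, bottom)
        bbox = (rect["x0"], rect["top"], rect["x1"], rect["bottom"])
        text = page.crop(bbox).extract_text()
        # Copy the dict so the original rect objects are left untouched
        results.append({**rect, "text": text})
    return results


def print_rects_with_text(path):
    """Hypothetical driver: print every rect, with its text, for each page."""
    import pdfplumber  # imported lazily so the helper above has no hard dependency

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for item in rects_with_text(page):
                print(item)
```

Calling `print_rects_with_text("example.pdf")` would print one dict per rect. Note that a rect covering the whole page, like the 612 x 792 one in the question, yields the text of the entire page, and rects that extend past the page edge may need clipping before they can be cropped.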
-
Describe the bug
How would I extract text using coordinates, or get the text returned along with the coordinates?
Code to reproduce the problem
with pdfplumber.open("example.pdf") as pdf:
    for page in pdf.pages:
        print(page.rects)
PDF file
example.pdf
Expected behavior
{'x0': 0.0, 'y0': 0.0, 'x1': 612.0, 'y1': 792.0, 'width': 612.0, 'height': 792.0, 'pts': [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)], 'linewidth': 0, 'stroke': False, 'fill': True, 'evenodd': False, 'stroking_color': (1, 1, 1), 'non_stroking_color': (1, 1, 1), 'object_type': 'rect', 'page_number': 1, 'top': 0.0, 'bottom': 792.0, 'doctop': 0.0, 'text': 'respective text'}
Actual behavior
{'x0': 0.0, 'y0': 0.0, 'x1': 612.0, 'y1': 792.0, 'width': 612.0, 'height': 792.0, 'pts': [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)], 'linewidth': 0, 'stroke': False, 'fill': True, 'evenodd': False, 'stroking_color': (1, 1, 1), 'non_stroking_color': (1, 1, 1), 'object_type': 'rect', 'page_number': 1, 'top': 0.0, 'bottom': 792.0, 'doctop': 0.0}
Environment