Possibility of text extraction using coordinates #831

sandeepreddy5 · 2023-03-03T14:07:57Z

Describe the bug

How can I extract text using coordinates, or extract text along with its coordinates?

Code to reproduce the problem

import pdfplumber

with pdfplumber.open("example.pdf") as pdf:
    for page in pdf.pages:
        print(page.rects)

PDF file

example.pdf

Expected behavior

{'x0': 0.0, 'y0': 0.0, 'x1': 612.0, 'y1': 792.0, 'width': 612.0, 'height': 792.0, 'pts': [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)], 'linewidth': 0, 'stroke': False, 'fill': True, 'evenodd': False, 'stroking_color': (1, 1, 1), 'non_stroking_color': (1, 1, 1), 'object_type': 'rect', 'page_number': 1, 'top': 0.0, 'bottom': 792.0, 'doctop': 0.0, 'text': 'respective text'}

Actual behavior

{'x0': 0.0, 'y0': 0.0, 'x1': 612.0, 'y1': 792.0, 'width': 612.0, 'height': 792.0, 'pts': [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)], 'linewidth': 0, 'stroke': False, 'fill': True, 'evenodd': False, 'stroking_color': (1, 1, 1), 'non_stroking_color': (1, 1, 1), 'object_type': 'rect', 'page_number': 1, 'top': 0.0, 'bottom': 792.0, 'doctop': 0.0}
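The rect dictionaries in `page.rects` describe drawn shapes, not text, which is why the actual output has no text key. Text together with its position is available from `page.extract_words()`, which returns one dict per word including `text`, `x0`, `top`, `x1` and `bottom`. A minimal sketch, assuming that output shape; the helper names `words_with_coords` and `print_words` are hypothetical:

```python
def words_with_coords(page):
    """Return (text, x0, top, x1, bottom) for every word on a pdfplumber page."""
    # extract_words() groups characters into words and keeps each word's bounding box
    return [
        (w["text"], w["x0"], w["top"], w["x1"], w["bottom"])
        for w in page.extract_words()
    ]


def print_words(path):
    """Print each word with its coordinates; assumes pdfplumber is installed."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for text, x0, top, x1, bottom in words_with_coords(page):
                print(f"{text!r}: x0={x0:.1f} top={top:.1f} x1={x1:.1f} bottom={bottom:.1f}")
```

Usage: `print_words("example.pdf")`. Coordinates are in PDF points, with `top` and `bottom` measured from the top of the page.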

Environment

pdfplumber version: [e.g., 0.5.22]
Python version: [e.g., 3.8.1]
OS: [e.g., Mac, Linux, etc.]

jsvine · 2023-03-03T15:22:05Z · Maintainer

Hi @sandeepreddy5; you can do this:

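import pdfplumber

with pdfplumber.open("example.pdf") as pdf:
    for page in pdf.pages:
        for rect in page.rects:
            # (x0, top, x1, bottom) bounding box of the rectangle
            bbox = pdfplumber.utils.obj_to_bbox(rect)
            # Extract only the text that falls inside that rectangle
            print(page.crop(bbox).extract_text())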
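To get output shaped like the "Expected behavior" above, the same crop-and-extract step can be folded back into each rect dict. A minimal sketch, assuming the bbox order `(x0, top, x1, bottom)` that `page.crop()` takes; the helper names `rects_with_text` and `print_rects_with_text` are hypothetical, and the new `"text"` key is added to copies so `page.rects` itself is left unchanged:

```python
def rects_with_text(page):
    """Return copies of page.rects, each with an added "text" key."""
    results = []
    for rect in page.rects:
        # Same (x0, top, x1, bottom) tuple that obj_to_bbox builds
        bbox = (rect["x0"], rect["top"], rect["x1"], rect["bottom"])
        # Crop to the rectangle and extract only the text inside it
        text = page.crop(bbox).extract_text()
        # Copy the rect dict rather than mutating pdfplumber's object
        results.append({**rect, "text": text})
    return results


def print_rects_with_text(path):
    """Print every rect with its text; assumes pdfplumber is installed."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for rect in rects_with_text(page):
                print(rect)
```

Usage: `print_rects_with_text("example.pdf")`. Only text that lies inside some rect is reported this way.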


sandeepreddy5 Mar 6, 2023
Author

Thanks @jsvine for the quick response. However, with the code above, the text and coordinates for the second column are completely missing; only the text and coordinates from the left-hand side are printed.
Here is a screenshot of the text that is missing.

samkit-jain Mar 6, 2023
Collaborator

This is because that column is not wrapped in a rect object, so none of the regions cropped from page.rects cover it.
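When a column is not enclosed in any rect, one workaround is to crop by geometry instead of by rect. A minimal sketch, assuming the columns split the page into equal-width vertical strips (real layouts may need hand-picked x boundaries); the helper name `column_texts` and its `n_columns` parameter are hypothetical. It reads the page bounds from `page.bbox` and crops each strip with `page.crop()`:

```python
def column_texts(page, n_columns=2):
    """Extract text from n equal-width vertical strips of a pdfplumber page."""
    x0, top, x1, bottom = page.bbox
    width = (x1 - x0) / n_columns
    texts = []
    for i in range(n_columns):
        left = x0 + i * width
        # Pin the last strip to the page edge so float rounding cannot overshoot it
        right = x1 if i == n_columns - 1 else x0 + (i + 1) * width
        texts.append(page.crop((left, top, right, bottom)).extract_text())
    return texts
```

Usage: inside the `for page in pdf.pages:` loop, `left, right = column_texts(page)` gives the left and right column text separately, including any column that has no surrounding rect.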
