
How do I extract table from this PDF? #445

Answered by samkit-jain
jakobdo asked this question in Q&A

Hi @jakobdo, thanks for your interest in the library. I would recommend a two-step process here. Taking the 3rd page as an example, if you use debug_tablefinder() with the lines strategy ({"vertical_strategy": "lines", "horizontal_strategy": "lines"}), you'll see how the table finder detects the table on that page.
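For anyone following along, here is a minimal sketch of how that debug image could be produced. The function name `save_tablefinder_debug`, the `pdf_path` argument, and the output file name are placeholders of mine, not something from the original PDF or thread, and rendering the page image may need extra imaging dependencies depending on your pdfplumber version.

```python
# Settings quoted in the answer: detect both vertical and horizontal edges from drawn lines.
LINES_SETTINGS = {"vertical_strategy": "lines", "horizontal_strategy": "lines"}


def save_tablefinder_debug(pdf_path, page_index=2, out_path="debug.png"):
    """Render one page with the table finder's lines and cells drawn on top.

    pdf_path and out_path are hypothetical; page_index=2 is the 3rd page.
    """
    import pdfplumber  # imported lazily so this snippet loads without the library

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        image = page.to_image()
        image.debug_tablefinder(LINES_SETTINGS)  # overlays detected edges and cells
        image.save(out_path)
    return out_path
```

Opening the saved image should make it easier to judge which column boundaries the lines strategy finds and which it misses.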

Step 1: Find the table using the lines strategy (the default) and store the x-coordinates of the vertical lines, taken from the cells in the first row of the table.

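# `page` is the pdfplumber page being processed (the 3rd page in this example).
tables = page.find_tables()
header_row = tables[0].rows[0].cells
# Each cell is an (x0, top, x1, bottom) box: take every left edge, plus the right edge of the last cell.
vertical_lines = [cell[0] for cell in header_row] + [header_row[-1][2]]
# Output -> [Decimal('48.625'), Decimal('399.503'), Decimal('684.000')]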
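If it helps, the same idea can be wrapped in a small helper. This is only a sketch: `vertical_lines_from_row` is a name I made up, and skipping `None` entries is my own precaution for rows where a cell is missing, not part of the snippet above.

```python
def vertical_lines_from_row(cells):
    """Return the x-coordinates of the column boundaries for one table row.

    `cells` is a list of (x0, top, x1, bottom) tuples, as in the snippet above.
    Entries that are None (missing cells) are skipped; this is an assumption.
    """
    boxes = [cell for cell in cells if cell is not None]
    if not boxes:
        return []
    # Left edge of every cell, then the right edge of the last cell.
    return [box[0] for box in boxes] + [boxes[-1][2]]


# Usage with made-up cell boxes (only the x-values match the output quoted above):
header_row = [(48.625, 100.0, 399.503, 120.0), (399.503, 100.0, 684.0, 120.0)]
print(vertical_lines_from_row(header_row))  # [48.625, 399.503, 684.0]
```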

Step 2: Run table extraction using the explicit vertical lines strategy with …
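A rough sketch of what the explicit vertical lines part could look like follows. The rest of the settings in the answer is cut off above, so this only sets the vertical keys and leaves everything else at pdfplumber's defaults, which may not match what the answer actually used. The function names are hypothetical.

```python
def explicit_vertical_settings(vertical_lines):
    """Build table settings that use the given x-coordinates as column boundaries.

    Only the vertical keys are set; the horizontal strategy is left at the
    library default because the original answer is truncated at that point.
    """
    return {
        "vertical_strategy": "explicit",
        "explicit_vertical_lines": list(vertical_lines),
    }


def extract_with_explicit_columns(page, vertical_lines):
    """Run table extraction on a pdfplumber page with explicit vertical lines."""
    return page.extract_table(explicit_vertical_settings(vertical_lines))
```

Here `page` is the same pdfplumber page as in Step 1 and `vertical_lines` is the list computed there.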

Answer selected by jakobdo