How do I fine-tune the extraction of the table from this PDF? #448

Answered by samkit-jain
jakobdo asked this question in Q&A

Hi @jakobdo. For the first case, where you are using the table settings

{
    "vertical_strategy": "text",
    "horizontal_strategy": "lines",
    "keep_blank_chars": True
}

but are failing to get the last row, you can add intersection_y_tolerance:

{
    "vertical_strategy": "text",
    "horizontal_strategy": "lines",
    "keep_blank_chars": True,
    "intersection_y_tolerance": 5
}

This should give you the table with the last row included.

You may find all the available table extraction settings at https://github.com/jsvine/pdfplumber#table-extraction-settings.
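
For reference, here is a minimal sketch of how the settings above could be passed to pdfplumber. The helper name extract_first_table, the page index, and the file name in the usage note are assumptions for illustration, not part of the original question. Depending on your pdfplumber version, keep_blank_chars may need to be spelled text_keep_blank_chars, so check the settings documentation linked above for the release you have installed.

```python
# Table settings from the answer; "intersection_y_tolerance" is the added key.
TABLE_SETTINGS = {
    "vertical_strategy": "text",
    "horizontal_strategy": "lines",
    "keep_blank_chars": True,
    "intersection_y_tolerance": 5,
}


def extract_first_table(pdf_path, page_index=0, settings=None):
    """Return the largest table found on one page as a list of rows.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Imported here so the settings can be inspected without pdfplumber installed.
    import pdfplumber

    if settings is None:
        settings = TABLE_SETTINGS

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        # extract_table returns None when no table is detected on the page.
        return page.extract_table(settings)
```

Calling extract_first_table("example.pdf") (a placeholder path) and printing the last element of the returned list is a quick way to check whether the final row is now picked up. If it is still missing, trying a somewhat larger intersection_y_tolerance is a reasonable next step.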

Unfortunately, I have not been able to come up with a table extraction strategy that works for this table without using the explicit_vertical_…

Answer selected by jakobdo