How can I fine-tune the table settings from the given pdf? #867
Closed
donburi82 started this conversation in "Ask for help with specific PDFs"
The issue here seems to be that the table does not include any vertical lines for the left and right borders. So you might try deriving those positions programmatically and passing them to the table finder as explicit_vertical_lines:

settings = {
    "explicit_vertical_lines": (
        [ r["x0"] for r in page.rects if r["width"] > 300 ] +
        [ r["x1"] for r in page.rects if r["width"] > 300 ]
    )
}
im = page.to_image(resolution=150)
im.debug_tablefinder(settings)
im.save(...)

Produces: [image: table-finder debug output, not preserved]
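The same idea can be wrapped in a small helper that also removes duplicate coordinates. This is only a sketch: the function name `border_x_positions`, the `ndigits` rounding, and the sample data are illustrative assumptions, not part of pdfplumber. The sample dicts stand in for entries of `page.rects`, using the same `x0`, `x1`, and `width` keys as the snippet above.

```python
def border_x_positions(rects, min_width=300.0, ndigits=1):
    """Return sorted, de-duplicated x0/x1 values of rects wider than min_width.

    `rects` is any iterable of dict-like objects with "x0", "x1" and "width"
    keys. Hypothetical helper, not a pdfplumber function.
    """
    xs = set()
    for r in rects:
        if r["width"] > min_width:
            # Rounding merges near-identical edges from overlapping rects.
            xs.add(round(float(r["x0"]), ndigits))
            xs.add(round(float(r["x1"]), ndigits))
    return sorted(xs)


# Stand-in data shaped like `page.rects` (values are made up).
sample_rects = [
    {"x0": 150.0, "x1": 460.0, "width": 310.0},  # wide row background
    {"x0": 150.0, "x1": 460.0, "width": 310.0},  # duplicate of the row above
    {"x0": 200.0, "x1": 220.0, "width": 20.0},   # narrow rect, ignored
]

settings = {"explicit_vertical_lines": border_x_positions(sample_rects)}
print(settings)  # {'explicit_vertical_lines': [150.0, 460.0]}
```

With a real page, you would presumably call `border_x_positions(page.rects)` instead and pass the resulting `settings` to `im.debug_tablefinder(settings)` or `page.extract_table(settings)`. The 300-point width threshold comes from the reply above and will likely need adjusting for other documents.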
I am having some trouble fine-tuning the table settings for the following PDF.
document.pdf
I first crop out the area containing the table for better extraction, using the code below:
cropped = p0.within_bbox((150, 123, p0.width-151, p0.height-190))
c = cropped.to_image()
Below are the best settings (for extracting the table) I have found so far, but I am not sure how to fine-tune them further:
table_settings = {
    "vertical_strategy": "explicit",
    "horizontal_strategy": "explicit",
    "explicit_vertical_lines": p0.curves+p0.edges,
    "explicit_horizontal_lines": p0.curves+p0.edges,
    "intersection_tolerance": 15,
}
Any hints or help would be greatly appreciated.