
Unable to detect borderless table #931

Closed Answered by cmdlineluser
bdthanh asked this question in Q&A

Perhaps there is a better approach, but this case is similar to #934.

In this case, keep_largest=True is needed, along with passing all four sides of each table's bounding box as explicit lines:

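# remove_nested_rects is not part of pdfplumber; presumably the helper from #934.
# pdf is assumed to be an already opened pdfplumber PDF.
for page in pdf.pages:
    filtered_page = remove_nested_rects(page, keep_largest=True)

    for table in filtered_page.find_tables():
        # table.bbox is (x0, top, x1, bottom): use all four sides as explicit lines
        rows = filtered_page.crop(table.bbox).extract_table(dict(
            explicit_horizontal_lines=[table.bbox[1], table.bbox[3]],
            explicit_vertical_lines=[table.bbox[0], table.bbox[2]],
        ))
        print("-" * 42)
        for row in rows:
            print(row)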
------------------------------------------
['No', 'Description 1', 'Description 2', 'Description 3']
['1', 'Scenario 1…
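
The "all sides as explicit lines" step can be pulled out into a small function. This is only a sketch: `edge_line_settings` is a hypothetical name, not a pdfplumber API, and it assumes the bbox follows pdfplumber's `(x0, top, x1, bottom)` order.

```python
def edge_line_settings(bbox):
    """Build table settings that treat all four bbox sides as explicit lines.

    Hypothetical helper for illustration; assumes bbox is (x0, top, x1, bottom).
    """
    x0, top, x1, bottom = bbox
    return {
        # Left and right edges become the outer vertical lines
        "explicit_vertical_lines": [x0, x1],
        # Top and bottom edges become the outer horizontal lines
        "explicit_horizontal_lines": [top, bottom],
    }


settings = edge_line_settings((50, 100, 400, 300))
print(settings)
```

Under the same assumptions, the loop body above could then read `filtered_page.crop(table.bbox).extract_table(edge_line_settings(table.bbox))`.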

Answer selected by bdthanh