extract table headings along with table contents #1011

Closed Answered by jsvine
poojitharamachandra asked this question in Q&A
Hi @poojitharamachandra; thanks for your interest in this library. There's no built-in functionality for what you describe, because the logic ends up being quite specific to the layout and structure of any given PDF. The general idea, though, would be to use page.find_tables() to find the tables on a page. Each table's .bbox property gives you its coordinates, which you can use with page.crop((x0, top, x1, bottom)).extract_text() to select an area above the table (optionally with page.filter(...), depending on what else is in that area). You'd determine those crop coordinates yourself, based on the spacing between the table and its heading.
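For anyone wanting a starting point, the approach above might be sketched roughly as follows. This is only an illustration under assumptions, not something the library provides: the helper names heading_bbox and tables_with_headings, the 40-point max_gap, and the file name example.pdf are all hypothetical, and the right gap will depend on your PDF's layout. The strip is clamped to the page because page.crop() can raise a ValueError for boxes that extend past the page or have zero area.

```python
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom), as in pdfplumber


def heading_bbox(table_bbox: BBox, page_bbox: BBox, max_gap: float = 40.0) -> Optional[BBox]:
    """Return the strip directly above a table, clamped to the page, or None if empty.

    `max_gap` is an assumed heading-to-table distance in points; tune it per document.
    """
    x0, top, x1, _ = table_bbox
    page_x0, page_top, page_x1, page_bottom = page_bbox
    strip = (
        max(x0, page_x0),
        max(top - max_gap, page_top),
        min(x1, page_x1),
        min(top, page_bottom),
    )
    # A zero-area (or inverted) box cannot be cropped, so report "no room" instead.
    if strip[2] <= strip[0] or strip[3] <= strip[1]:
        return None
    return strip


def tables_with_headings(page, max_gap: float = 40.0):
    """Yield (heading_text, rows) for each table found on a pdfplumber page."""
    for table in page.find_tables():
        box = heading_bbox(table.bbox, page.bbox, max_gap)
        # extract_text() may come back empty if nothing sits above the table.
        heading = (page.crop(box).extract_text() or "").strip() if box else ""
        yield heading, table.extract()


if __name__ == "__main__":
    import pdfplumber  # imported here so the helpers above stay importable without it

    with pdfplumber.open("example.pdf") as pdf:  # hypothetical file name
        for page in pdf.pages:
            for heading, rows in tables_with_headings(page):
                print(repr(heading), len(rows))
```

If the strip above a table also picks up unrelated text (page headers, the tail of a previous paragraph), narrowing max_gap or adding a page.filter(...) step before extract_text() would be the places to adjust.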

Answer selected by poojitharamachandra
Category
Q&A
Labels
feature-request All feature requests receive this label initially, can be upgraded to "enhancement"
3 participants
This discussion was converted from issue #1008 on October 12, 2023 16:06.