With the pdfplumber library, you can extract the text of a PDF page, or you can extract the tables from a PDF page. The issue is that I can't find a way to extract both text and tables together, in the order they appear on the page. Essentially, if the PDF is formatted in this way:
I would like the output to be:
If there is no way to separate the text by line, this output is also fine:
In this example you could run extract_text from pdfplumber:
but that extracts text and tables as text. You could run extract_tables, but that only gives you the tables. I need a way to extract both text and tables at the same time. Is this built into the library in some way that I don't understand? If not, is it possible? Thank you so much for any help you can provide, this library is awesome!
I did also see this issue, which explains how to extract text without tables. But I want something slightly different: I want text and tables, one after the other.
@jsvine thank you so much for your response! Currently I'm extracting the tables, then extracting the text, and then replacing the table text with the tables using a special function. It's pretty hacky, though, and probably prone to breakage; I'd imagine there's a better way of doing it. Thank you for considering this for a possible future version, and best of luck with everything! You can close this ticket.
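For readers who want to try something along these lines, here is a minimal sketch of one way to do it. It is not the special function mentioned above and not a built-in pdfplumber feature. It assumes the pdfplumber calls `page.find_tables()`, `Table.bbox`, `Table.extract()` and `page.extract_words()`; the helper names (`inside`, `words_to_lines`, `interleave`, `extract_page`, `extract_pdf`) and the 3-point line tolerance are made up for illustration. The idea is to drop the words that fall inside a table's bounding box, group the remaining words into lines, and then sort lines and tables by their vertical position.

```python
def inside(word, bbox):
    """True if the word's center lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    cx = (word["x0"] + word["x1"]) / 2
    cy = (word["top"] + word["bottom"]) / 2
    return x0 <= cx <= x1 and top <= cy <= bottom


def words_to_lines(words, tolerance=3):
    """Group word dicts into (top, text) lines; tolerance is an assumed value."""
    lines = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(w["top"] - lines[-1]["top"]) <= tolerance:
            lines[-1]["words"].append(w)
        else:
            lines.append({"top": w["top"], "words": [w]})
    return [
        (ln["top"], " ".join(w["text"] for w in sorted(ln["words"], key=lambda w: w["x0"])))
        for ln in lines
    ]


def interleave(words, tables, tolerance=3):
    """Merge text lines and tables top to bottom.

    words:  word dicts with text, x0, x1, top, bottom keys
    tables: list of (bbox, rows) pairs
    """
    outside = [w for w in words if not any(inside(w, bbox) for bbox, _ in tables)]
    items = [(top, "text", text) for top, text in words_to_lines(outside, tolerance)]
    items += [(bbox[1], "table", rows) for bbox, rows in tables]
    items.sort(key=lambda item: item[0])  # sort on vertical position only
    return [(kind, content) for _, kind, content in items]


def extract_page(page):
    """Apply interleave() to a pdfplumber page object."""
    tables = [(t.bbox, t.extract()) for t in page.find_tables()]
    return interleave(page.extract_words(), tables)


def extract_pdf(path):
    """Hypothetical entry point; needs pdfplumber installed."""
    import pdfplumber  # imported here so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        return [extract_page(page) for page in pdf.pages]
```

Each page comes back as a list of `("text", line)` and `("table", rows)` pairs in reading order. Words that sit beside a table on the same rows are still emitted as separate text lines, so the result is only clean for layouts where tables span the full width.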
For anyone who comes across this in the future, I asked a similar question on Stack Overflow, and it was answered! Note that this probably only works when there are no words to the left and right of the table itself (this was my case).
https://stackoverflow.com/q/71612119/8903959
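Since the caveat about words to the left and right of the table matters, here is a small sketch for checking whether a page meets that condition before relying on a position-based approach. It is not taken from the linked answer. The function name `words_beside_table` is hypothetical, and it assumes word dicts shaped like those returned by pdfplumber's `page.extract_words()` (with `x0`, `x1`, `top`, `bottom` keys) and a table bounding box in `(x0, top, x1, bottom)` form, as in `Table.bbox`.

```python
def words_beside_table(words, table_bbox):
    """Return words that share the table's rows but sit left or right of it."""
    x0, top, x1, bottom = table_bbox
    beside = []
    for w in words:
        cy = (w["top"] + w["bottom"]) / 2           # vertical center of the word
        in_rows = top <= cy <= bottom                # overlaps the table vertically
        outside_cols = w["x1"] <= x0 or w["x0"] >= x1  # fully left or right of it
        if in_rows and outside_cols:
            beside.append(w)
    return beside
```

An empty result suggests the table has nothing beside it, which is the case described above; a non-empty result means text and table share the same rows and a simple top-to-bottom ordering may scramble them.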