Detecting paragraphs or blank lines inside a table #736
Cristishor201 started this conversation in Ask for help with specific PDFs
Possible duplicate of #122
Hi @Cristishor201, and thanks for your interest in pdfplumber. … Depending on the specifics of the PDF (sharing it in this thread will help), that setting may not be fully sufficient, but it's a start.
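One possible angle, sketched under two assumptions that the thread does not confirm: the table still has its vertical rules, and every row begins with a number in the first column. pdfplumber's table settings accept `"horizontal_strategy": "explicit"` together with `explicit_horizontal_lines` (a list of y-coordinates), so the missing row separators can be supplied by hand. The helper `row_boundary_settings` and its parameters `first_col_x1`, `table_bottom` and `gap` are hypothetical, and the words passed in should come from the table body only (for example via `page.crop(bbox).extract_words()`), otherwise header or margin text in the first column would add spurious rules.

```python
def row_boundary_settings(words, first_col_x1, table_bottom, gap=2.0):
    """Return a table_settings dict for pdfplumber's page.extract_table().

    Hypothetical helper. Assumes the table keeps its vertical rules and that
    each row begins with a number in the first column, whose right edge is
    first_col_x1. table_bottom is the y-coordinate where the last row ends.
    words: dicts with "x1" and "top" keys, as returned by extract_words().
    """
    # One rule just above each row number, plus a closing rule at the bottom.
    row_tops = sorted({w["top"] for w in words if w["x1"] <= first_col_x1})
    rules = [top - gap for top in row_tops] + [table_bottom]
    return {
        "vertical_strategy": "lines",        # keep using the drawn vertical rules
        "horizontal_strategy": "explicit",   # row boundaries come only from the list below
        "explicit_horizontal_lines": rules,
    }


# Hypothetical usage (file name, bbox and coordinates are placeholders):
# import pdfplumber
# with pdfplumber.open("invoice.pdf") as pdf:
#     page = pdf.pages[0]
#     body_words = page.crop(bbox).extract_words()
#     settings = row_boundary_settings(body_words, first_col_x1=40, table_bottom=400)
#     table = page.extract_table(settings)
```

Because each row's rule is placed a couple of points above its row number, the wrapped description lines below it stay inside the same cell, which is the grouping the question asks for.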
So there are these questions on Stack Overflow:
- pdfplumber - How to extract table with no horizontal lines? (this one is mine)
- Use pdfplumber to extract paragraphs (this one is similar)
But I'll repost the question here.
My PDF looks like this:
[Image: screenshot of the table, with red rectangles marking where each row should be split]
As you can see, there are no horizontal lines inside the table, so I need some parameter (or another approach) to split the data in the second column into one cell per row, like this:
['PRODUCT 1\ndescription line 1\ndescription line 2', 'PRODUCT 2\ndescription line 1', 'PRODUCT 3\ndescription line 1\ndescription line 2']
for column-wise extraction (I skipped the other columns), or
[['1', 'PRODUCT 1\ndescription line 1\ndescription line 2', 'BUC', '1', '35.00', '35.00', '6.65'], ['2', 'PRODUCT 2\ndescription line 1', 'buc', '1', '7.00', '7.00', '1.33'], ['3', 'PRODUCT 3\ndescription line 1\ndescription line 2', 'buc', '1', '31.00', '31.00', '5.89']]
for row-wise extraction.
In the image, I drew red rectangles to show where the rows should be split.
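If no table setting fits, the row-wise output above can also be rebuilt from word positions. The sketch below is one possible approach, not a pdfplumber feature: it assumes the first column holds one-line row numbers (1, 2, 3) that mark where each row starts, and that the column boundaries are known. It works on dicts shaped like the output of pdfplumber's `page.extract_words()` (keys `text`, `x0`, `top`); the function `split_rows` and its `col_edges` and `tol` parameters are hypothetical.

```python
from bisect import bisect_right


def split_rows(words, col_edges, tol=1.0):
    """Rebuild table rows when the table has no horizontal rules.

    words: dicts with "text", "x0" and "top" keys, the shape returned by
           pdfplumber's page.extract_words(); only words of the table body.
    col_edges: hypothetical left x-coordinate of each column, ascending;
               the first column is assumed to hold one-line row numbers.
    tol: vertical tolerance (in points) for words on the same text line.
    """
    # Every word in the first column marks the top of a new row.
    anchors = sorted(w["top"] for w in words if bisect_right(col_edges, w["x0"]) == 1)
    table = [[[] for _ in col_edges] for _ in anchors]

    for w in words:
        row = bisect_right(anchors, w["top"] + tol) - 1  # last row starting at or above the word
        col = bisect_right(col_edges, w["x0"]) - 1       # column whose left edge precedes the word
        if row >= 0 and col >= 0:
            table[row][col].append(w)

    def cell_text(cell_words):
        # Cluster words into text lines by "top", read each line left to right,
        # and join the lines with a newline, as in the desired output.
        lines = []
        for w in sorted(cell_words, key=lambda w: w["top"]):
            if lines and abs(w["top"] - lines[-1][0]["top"]) <= tol:
                lines[-1].append(w)
            else:
                lines.append([w])
        return "\n".join(
            " ".join(w["text"] for w in sorted(line, key=lambda w: w["x0"]))
            for line in lines
        )

    return [[cell_text(cell) for cell in row] for row in table]


# Hypothetical usage; the file name, bbox and col_edges are placeholders:
# import pdfplumber
# with pdfplumber.open("invoice.pdf") as pdf:
#     body = pdf.pages[0].crop(bbox)  # bbox = (x0, top, x1, bottom) of the table body
#     rows = split_rows(body.extract_words(), col_edges=[...])
```

Keying rows off the first column avoids guessing from line spacing, which is unreliable here because a description can wrap onto one or two extra lines. The column-wise list is then just `[row[1] for row in rows]`.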