Far right columns from all tables not extracted #754
erindavisdataiku started this conversation in Ask for help with specific PDFs
Update: When I manually adjust the bbox tuple by adding 10 to both x1 and bottom, my table_settings work as expected. So my question is: when I use find_tables, is there a best practice for making it more forgiving, so that the bboxes don't cut off my far-right vertical line and bottom horizontal line?
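One way to make the manual workaround above repeatable is to pad every bounding box returned by `find_tables` before cropping, clamping the result to the page so the crop never extends past its edges. This is a minimal sketch, assuming (as the update suggests) that the bboxes come from a default-settings `find_tables` call and that the custom `table_settings` are then applied to the cropped region. `pad_bbox`, `extract_padded_tables`, and the 10-point padding are illustrative names and values, not part of pdfplumber's API:

```python
def pad_bbox(bbox, page_bbox, dx=10.0, dy=10.0):
    """Grow an (x0, top, x1, bottom) bbox by dx on the right and dy at the bottom,
    clamped so the result stays inside page_bbox."""
    x0, top, x1, bottom = bbox
    px0, ptop, px1, pbottom = page_bbox
    return (
        max(x0, px0),
        max(top, ptop),
        min(x1 + dx, px1),
        min(bottom + dy, pbottom),
    )


def extract_padded_tables(page, table_settings, dx=10.0, dy=10.0):
    """Hypothetical helper: locate tables with pdfplumber's default settings,
    pad each bbox, then re-extract inside the padded crop with table_settings."""
    results = []
    for table in page.find_tables():  # default settings, used only to locate tables
        padded = pad_bbox(table.bbox, page.bbox, dx, dy)
        results.append(page.crop(padded).extract_table(table_settings))
    return results
```

With the page from the question this would be called as `extract_padded_tables(AE_p3, table_settings)`. Clamping matters because recent pdfplumber versions reject a crop box that falls outside the page.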
I'm using the following settings to pick up the correct lines in the first table of my PDF:
And this to extract the table with those settings:
[['COVERAGES AND PREMIUM'], ['BASIC PREMIUM'], ['SECTION I: PROPERTY COVERAGE'], ['A. DWELLING'], ['B. OTHER STRUCTURES'], ['C. PERSONAL PROPERTY'], ['D. LOSS OF USE'], ['SECTION II: LIABILITY COVERAGE'], ['E. PERSONAL LIABILITY'], ['F. MEDICAL PAYMENTS TO OTHERS'], ['TOTAL BASIC PREMIUM']]
But for some reason, the last 'Premium' column is not picked up.
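A hedged guess at why the last column disappears: pdfplumber builds cells from intersections of vertical and horizontal edges, and, as far as I can tell from its source, a vertical edge only counts as crossing a horizontal edge if its x-position falls within the horizontal edge's span, widened by `intersection_x_tolerance`. If the horizontal rules stop a few points short of the far-right vertical line, that line intersects nothing and the right column gets no cells. The simplified model below illustrates the idea; `intersects` is a hypothetical restatement of that check, not pdfplumber's own function, and the coordinates are made up:

```python
def intersects(v, h, x_tol=3.0, y_tol=3.0):
    """Simplified model: does vertical edge v cross horizontal edge h?

    v has keys "x0", "top", "bottom"; h has keys "x0", "x1", "top" (PDF points).
    """
    return (
        v["top"] <= h["top"] + y_tol
        and v["bottom"] >= h["top"] - y_tol
        and h["x0"] - x_tol <= v["x0"] <= h["x1"] + x_tol
    )


# Made-up coordinates: the rule ends at x=540, the far-right line sits at x=550.
rule = {"x0": 40.0, "x1": 540.0, "top": 200.0}
right_line = {"x0": 550.0, "top": 150.0, "bottom": 400.0}

print(intersects(right_line, rule))              # False: 550 is beyond 540 + 3
print(intersects(right_line, rule, x_tol=15.0))  # True: 550 is within 540 + 15
```

If this turns out to be the cause, raising `intersection_x_tolerance` in `table_settings` may be a more forgiving alternative to padding the bbox by hand. Rendering the page with `page.to_image().debug_tablefinder(table_settings)` is one way to check visually whether the far-right line ends up with any intersections.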
The same thing happens when I try to extract the second table from this PDF with these settings:
[['ENDORSEMENT INFORMATION', ''], ['FORMS AND ENDORSEMENTS MADE PART OF THIS POLICY', ''], ['COVERAGE', 'Limit:'], ['CIC-2237 05 14 Retail Benefit', ''], ['CIC-2090 11 17 Identity Theft Resolution Service', ''], ['CIC-2193 01 20 Escaped Liquid Fuel -Section II Limitation of Liability', ''], ['HO 04 90 05 11 Personal Property Replacement Cost Loss Settlement', ''], ['HO 04 54 05 11 Earthquake', ''], ['Percentage Deductible 5%', ''], ['IL 604 CW 05 17 Technology Support Available Through Your Homeowners', ''], ['Policy', ''], ['HO 04 16 10 00 Premises Alarm or Fire Protection System', ''], ['CIC-2040 01 20 Extended Replacement Cost Coverage Endorsement', ''], ['CIC- 964 01 20 Advantage Endorsement', ''], ['HO 05 38 05 11 Limited Fungi, Wet or Dry Rot, or Bacteria Coverage –', ''], ['Massachusetts', ''], ['Section II Liability Limit $50,000', ''], ['Section I Property Limit $10,000', ''], ['HO 06 53 02 17 Home-Sharing Host Activities Amendatory Endorsement HO', ''], ['00 03', ''], ['HO 24 41 09 01 Lead Poisoning Exclusion - Massachusetts', ''], ['CIC- 907 04 96 Lead Poisoning Liability Exclusion and Coverage Option', ''], ['HO 06 48 10 15 Residence Premises Definition Endorsement', ''], ['HO 01 20 02 21 Special Provisions – Massachusetts', ''], ['HO 00 03 05 11 Homeowners 3 Special Form', ''], ['CIC- 908 10 96 Disclosure Statement Lead Poisoning', ''], ['IL502 CW 10 17 Privacy Notice', ''], ['Personal Injury', '$500,000'], ['Deductible Coverage', '$500'], ['NOTES FOR POLICY OR, IF APPLICABLE, ENDORSEMENT R', 'EASON(S)'], ['Additional Interest(s) Modification, Coverage(s) Modification, Add/Change Discou', 'nt'], ['Q009508879', '']]
Both tables are extracted correctly when I explicitly specify the horizontal lines. However, I need to do this on several PDFs, each with a different number of rows, although the columns should be in the same place. Is there any reason my far-right columns aren't being picked up with these table settings? I've attached the PDF below; I'm only trying to extract from the 3rd page:
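Since the columns stay in the same place across the PDFs while the row count varies, one option is to fix only the vertical lines and let pdfplumber detect the rows. This is a minimal sketch, assuming the column x-coordinates can be measured once from a sample page. The values in `COLUMN_XS` and the helper `fixed_column_settings` are hypothetical, and whether `"lines"` or `"text"` is the better horizontal strategy depends on whether the rows are ruled:

```python
# Hypothetical column boundaries (PDF points), measured once from a sample page.
COLUMN_XS = [36.0, 300.0, 450.0, 576.0]


def fixed_column_settings(column_xs, horizontal_strategy="lines", x_tol=5.0):
    """Build pdfplumber table_settings with explicit columns and detected rows."""
    return {
        # Columns: fixed x-positions shared by every PDF.
        "vertical_strategy": "explicit",
        "explicit_vertical_lines": list(column_xs),
        # Rows: detected per page, so a varying row count needs no changes.
        "horizontal_strategy": horizontal_strategy,
        # Allow horizontal rules that stop a few points short of the
        # outermost vertical lines to still count as intersecting them.
        "intersection_x_tolerance": x_tol,
    }


settings = fixed_column_settings(COLUMN_XS)
```

The same `settings` dict could then be passed to `extract_table` on page 3 of each PDF, e.g. `AE_p3.extract_table(settings)`.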
AE_p3 = AE_pdf.pages[2]