Want to extract information by separate two parts #571
youpengbo2018
started this conversation in
Ask for help with specific PDFs
-
Hi @youpengbo2018, and thanks for your interest in this library. Rather than treating the information as tables, you may have better luck with a more customized approach. Luckily, the PDFs appear to have an obvious vertical line.
To get the horizontal dividers, try:
import re
horizontal_dividers = [
    w for w in page.extract_words()
    if re.search(r"^-{10,}$", w["text"])
]
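Once the dividers are found, one possible next step is to turn them into bounding boxes for the regions between them, split at the vertical line. The following is only a rough sketch, not something tested against these PDFs: the helper names `divider_bands` and `split_bboxes` and the `split_x` parameter are made up for illustration, and it assumes each word is a dict with "text", "top" and "bottom" keys, as `page.extract_words()` returns.

```python
import re

# Same pattern as above: a word made only of 10 or more dashes.
DIVIDER_RE = re.compile(r"^-{10,}$")


def divider_bands(words, page_top, page_bottom):
    """Return (top, bottom) pairs for the regions between dashed dividers.

    Hypothetical helper: `words` is a list of dicts with "text", "top"
    and "bottom" keys.
    """
    dividers = sorted(
        (w for w in words if DIVIDER_RE.search(w["text"])),
        key=lambda w: w["top"],
    )
    bands = []
    cursor = page_top
    for d in dividers:
        # Keep the region above this divider if it has any height.
        if d["top"] > cursor:
            bands.append((cursor, d["top"]))
        cursor = max(cursor, d["bottom"])
    # Region after the last divider, down to the bottom of the page.
    if page_bottom > cursor:
        bands.append((cursor, page_bottom))
    return bands


def split_bboxes(bands, x0, split_x, x1):
    """Split each band into left and right (x0, top, x1, bottom) boxes.

    Hypothetical helper: `split_x` is the x-position of the vertical
    line, which has to be determined separately.
    """
    return [
        ((x0, top, split_x, bottom), (split_x, top, x1, bottom))
        for top, bottom in bands
    ]
```

Each resulting box could then be passed to `page.crop(bbox)` and the cropped region read with `extract_text()`. The value of `split_x` would have to come from the vertical line itself, for example by inspecting `page.lines` or `page.rects`, depending on how the line is drawn in the file.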
-
[China_ptent.pdf](https://github.com/jsvine/pdfplumber/files/7750238/China_ptent.pdf)
Hi, I would like to extract the data from this file by splitting it into two parts, as shown in the picture.
This is the code I have tried so far:
import pdfplumber as pr
import pandas as pd

filename = r'F:\GB2652.pdf'
pdf = pr.open(filename)
page = pdf.pages[0]
table = page.extract_tables(table_settings={
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
    "join_tolerance": 50,
})
Thank you for your help