non-standard title table extraction problem #645
BrianCKLu started this conversation in Ask for help with specific PDFs
Replies: 1 comment
-
Hi @BrianCKLu, and thanks for the kind words. If the table you want to transform will always look like the examples you have provided, you could do something like the following, and then integrate it into your code:

    def fix_table_header(table_rows):
        # Treat None and whitespace-only header cells as blank
        header_normalized = [ (x or "").strip() for x in table_rows[0] ]
        header_has_blanks = any(x == "" for x in header_normalized)
        if header_has_blanks:
            # Overwrite header cells with any non-blank cell from the second row
            for i, alt in enumerate(table_rows[1]):
                alt = (alt or "").strip()
                if alt:
                    table_rows[0][i] = alt
            # Drop the second row now that it has been merged into the header
            table_rows = table_rows[:1] + table_rows[2:]
        return table_rows

Demonstrating:

    fix_table_header([
        [ "A", "B", "To discard", None, "D"],
        [ "", "", "C1", "C2", ""],
        [ 1, 2, 3, 4, 5 ],
    ])

... returns:

    [
        ['A', 'B', 'C1', 'C2', 'D'],
        [1, 2, 3, 4, 5]
    ]
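One thing to be aware of: the function above overwrites the header row of the list passed in, and it assumes the table has at least two rows. If that matters in your code, a copy-based variant might look like the sketch below. The names `merge_header_rows` and `_clean` are hypothetical helpers made up for this illustration, not part of any library, and the sketch keeps the same assumption that the second row holds sub-headers whenever the first row has blank cells.

```python
from typing import List, Sequence


def _clean(cell: object) -> str:
    """Normalize a cell to a stripped string; None becomes an empty string."""
    return str(cell).strip() if cell is not None else ""


def merge_header_rows(table_rows: Sequence[Sequence[object]]) -> List[list]:
    """Return a new table with the second row merged into a blank-containing header.

    Hypothetical helper: the input rows are copied, so the caller's data
    is left untouched.
    """
    rows = [list(row) for row in table_rows]  # shallow copy of every row
    if len(rows) < 2:
        return rows  # nothing to merge with
    header = rows[0]
    if all(_clean(cell) for cell in header):
        return rows  # header has no blank cells, leave the table as-is
    for i, alt in enumerate(rows[1]):
        alt = _clean(alt)
        if alt and i < len(header):
            header[i] = alt  # a non-blank sub-header replaces the cell above it
    return [header] + rows[2:]  # drop the merged second row
```

Called on the same three-row example as above, this should return the same two-row result while leaving the original list unchanged.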
-
Hi, thanks for providing such a useful library.
I would like to ask whether there is a way to convert the non-standard title tables (Figure 1 and Figure 2) into the form of Figure 3
(removing "spec.").
Many thanks!