Replies: 2 comments
-
This is a typical "Discussions" item.
-
This happens when the PDF creator stored the text with insufficient gaps between words. Here, the distance between the right border of "y" in
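To illustrate why too-small gaps make words run together, here is a minimal sketch of a gap-based space heuristic. It is not MuPDF's actual algorithm: the function name `join_with_gap_threshold`, the `ratio` parameter, and the sample coordinates are all made up for illustration. The idea is only that an extractor has to infer a space from the horizontal distance between adjacent glyphs, and if the PDF places the glyphs too close together, no space is inferred.

```python
from typing import List, Tuple

# (character, x0, x1): hypothetical horizontal extents of one glyph, in points
Char = Tuple[str, float, float]


def join_with_gap_threshold(chars: List[Char], font_size: float, ratio: float = 0.2) -> str:
    """Rebuild a line of text, inserting a space wherever the gap between
    two adjacent glyphs exceeds ratio * font_size (an assumed heuristic)."""
    if not chars:
        return ""
    threshold = ratio * font_size
    out = [chars[0][0]]
    for prev, cur in zip(chars, chars[1:]):
        gap = cur[1] - prev[2]  # left edge of current minus right edge of previous
        if gap > threshold:
            out.append(" ")
        out.append(cur[0])
    return "".join(out)


# Invented sample data: the gap after "y" is 1.0 pt in `tight`, 3.0 pt in `wide`.
tight = [("y", 35.0, 40.0), ("i", 41.0, 43.0), ("n", 43.0, 48.0)]
wide = [("y", 35.0, 40.0), ("i", 43.0, 45.0), ("n", 45.0, 50.0)]

print(join_with_gap_threshold(tight, font_size=10.0))              # yin
print(join_with_gap_threshold(wide, font_size=10.0))               # y in
print(join_with_gap_threshold(tight, font_size=10.0, ratio=0.05))  # y in
```

With a 10 pt font and the assumed ratio of 0.2, a 1 pt gap falls below the 2 pt threshold and the words merge. Only a lower threshold (or larger gaps in the PDF itself) separates them, which is why flags that control how existing whitespace is reported cannot bring the missing spaces back.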
-
Description of the bug
I am encountering cases where spans in some of the text of some PDFs are collapsed together without any whitespace. Although fewer than 1% of the spans are affected, it is very noticeable where it happens.
In Googling this issue, I see that it is typically related to people specifying TEXT_INHIBIT_SPACES. However, I am not specifying any flags. I did try adding TEXT_PRESERVE_LIGATURES, TEXT_PRESERVE_WHITESPACE, and TEXT_PRESERVE_SPANS to the get_text call, but none of these had any effect.
How to reproduce the bug
Shared Colab notebook
formerlyinSection1hasbeenreorganizedinto
PyMuPDF version
1.24.13
Operating system
Linux
Python version
3.10