Why are 2 cropped areas producing same .extract_text() output when one is "empty"? #930

Answered by jsvine
cmdlineluser asked this question in Q&A
> I guess the char is considered inside if any point falls within cropped area? And then its position values are "truncated" to those that fit within the area?

Yep, that's exactly it. From the readme:

> Cropped pages retain objects that fall at least partly within the bounding box. If an object falls only partly within the box, its dimensions are sliced to fit the bounding box.

I acknowledge that this can be confusing at first, especially with char objects whose bounding boxes extend beyond their visual markings. On the other hand, I wanted crop to adhere to what "crop" means in other contexts/software. It also has some more "expected"/preferred results with things like line objects.
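To make the readme's rule concrete, here is a minimal pure-Python sketch of "retain if partly inside, then slice to fit". It is not pdfplumber's actual implementation: `clip_to_bbox` and `crop_objects` are hypothetical helper names, and the char dicts only borrow the `x0`/`top`/`x1`/`bottom` keys that pdfplumber objects carry. How objects that merely touch the box edge are treated is also an assumption here (they are dropped).

```python
# Illustrative sketch only; helper names are hypothetical, not pdfplumber API.

def clip_to_bbox(obj, bbox):
    """Return a copy of obj sliced to bbox, or None if they do not overlap."""
    x0, top, x1, bottom = bbox
    # Intersection of the object's bounding box with the crop box.
    ix0 = max(obj["x0"], x0)
    itop = max(obj["top"], top)
    ix1 = min(obj["x1"], x1)
    ibottom = min(obj["bottom"], bottom)
    # Assumption: zero-area overlap (edges only touching) counts as outside.
    if ix0 >= ix1 or itop >= ibottom:
        return None
    return dict(obj, x0=ix0, top=itop, x1=ix1, bottom=ibottom)


def crop_objects(objs, bbox):
    """Keep every object that is at least partly inside bbox, sliced to fit."""
    clipped = (clip_to_bbox(o, bbox) for o in objs)
    return [c for c in clipped if c is not None]


# Two chars whose boxes span top=10..bottom=22. The visible glyph may sit
# lower in that box, so a strip covering only top=0..12 can look "empty".
chars = [
    {"text": "A", "x0": 10, "top": 10, "x1": 20, "bottom": 22},
    {"text": "B", "x0": 40, "top": 10, "x1": 50, "bottom": 22},
]

thin_strip = crop_objects(chars, (0, 0, 100, 12))  # overlaps only 2 units
full_area = crop_objects(chars, (0, 0, 100, 30))   # contains both boxes

thin_text = "".join(c["text"] for c in thin_strip)
full_text = "".join(c["text"] for c in full_area)
print(thin_text, full_text)
```

Both crops keep both chars, so text built from them is the same; only the sliced `bottom` values differ. If only fully contained objects are wanted, the pdfplumber readme also documents `.within_bbox(...)`, which it describes as similar to `.crop` but retaining only objects that fall entirely within the bounding box.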

The …

Answer selected by cmdlineluser