How to get viewport in user coordinates of rendered images in a page? #2762

FSeidinger · 2024-07-20T11:06:42Z

We get a lot of PDFs uploaded by customers that are scanned documents or forms, so most of the time a PDF page contains only a single image.

The customers mainly use smartphones or scanners to produce the uploaded PDFs. Many of these devices embed images at the full resolution of the camera, which results in huge PDFs. It is not uncommon to see images at a native resolution of 1,200 DPI or even higher.

Before sending the images to an archive, I want to resize/resample them to a target resolution of 72 DPI.

While pypdf gives me the images in the page and their physical size, it does not give me the viewport of the rendered image in user coordinates, which I would need for the resampling step.

Is there a way to get the viewport from the PDF rendering operands in a pypdf-ish way, or can you recommend another Python library that parses the rendering operations and reports the viewport of rendered images on a page?
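For illustration, here is a minimal sketch of the arithmetic behind the "viewport" being asked about. In PDF, an image is painted into the unit square, and the current transformation matrix (CTM) in effect at the `Do` operator maps that square into user space. The function name `image_viewport` is hypothetical and not part of pypdf. The sketch assumes the CTM has already been obtained from the content stream by some other means, and that the page uses the default user unit of 1/72 inch.

```python
import math


def image_viewport(ctm, pixel_width, pixel_height):
    """Return the user-space bounding box and effective DPI of a painted image.

    ctm is the six-number matrix (a, b, c, d, e, f) in effect when the image
    is painted. Assumes the default user unit of 1/72 inch.
    """
    a, b, c, d, e, f = ctm
    # The image occupies the unit square before the CTM is applied.
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    # Lengths of the transformed unit vectors give the rendered size in points,
    # which stays correct even if the image is rotated.
    width_pt = math.hypot(a, b)
    height_pt = math.hypot(c, d)
    return {
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        "width_pt": width_pt,
        "height_pt": height_pt,
        "dpi_x": pixel_width * 72.0 / width_pt,
        "dpi_y": pixel_height * 72.0 / height_pt,
    }


# Example: a 2550 x 3300 pixel scan filling a US Letter page (612 x 792 pt).
viewport = image_viewport((612, 0, 0, 792, 0, 0), 2550, 3300)
print(viewport["bbox"], viewport["dpi_x"], viewport["dpi_y"])
```

The effective DPI is what determines how much the image can be downsampled: the ratio of the target DPI to the effective DPI is the scale factor to apply to the pixel dimensions.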

stefan6419846 · 2024-07-28T19:44:17Z

Maintainer

This is partly a duplicate of #2763. As pointed out there, it is not really clear to me why you cannot just use the regular media box of the page. Is it really that common to have PDF files where these images do not fill the whole page? (At least in the large collection of customer files I usually deal with, photos are sent either as regular images or as PDFs with the images matching the page size. Apart from this, 1200 DPI sounds quite high, and I have never really seen it used for real use cases.)
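As a rough sketch of the media box approach suggested here, the target pixel size can be derived from the page size alone, under the assumption that the image fills the whole page. The helper `resample_size` is hypothetical. In pypdf, the box size in points would typically come from `page.mediabox.width` and `page.mediabox.height`, and the sketch again assumes the default user unit of 1/72 inch.

```python
def resample_size(pixel_size, box_size_pt, target_dpi=72.0):
    """Return the pixel size an image should be resampled to.

    pixel_size is (width, height) of the embedded image in pixels.
    box_size_pt is (width, height) of the page box in points, assuming the
    image fills that box completely.
    """
    px_w, px_h = pixel_size
    box_w, box_h = box_size_pt
    # Points are 1/72 inch, so inches * target_dpi gives the target pixels.
    target_w = max(1, round(box_w / 72.0 * target_dpi))
    target_h = max(1, round(box_h / 72.0 * target_dpi))
    # Never upsample: keep the original if it is already small enough.
    if px_w <= target_w and px_h <= target_h:
        return (px_w, px_h)
    return (target_w, target_h)


# Example: a 300 DPI scan on a US Letter page, reduced to 72 DPI.
print(resample_size((2550, 3300), (612, 792)))
```

The resulting size could then be passed to an image library's resize function. This shortcut gives wrong results when the image covers only part of the page, which is the case the original question is concerned with.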
