Missing images in document with duplicated images #2472
I want to create a Microsoft Word document that uses a placeholder image, and then replace the placeholder with the proper images using pypdf after the document has been converted to PDF. I created the attached document in Microsoft Word using the same image three times. However, when I extract the images with pypdf, only one of them is returned.

Is there something that can be done to differentiate these images with pypdf, other than using distinct images for the placeholder?

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.19045-SP0
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.0.2, crypt_provider=('cryptography', '42.0.1'), PIL=8.2.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

pdf_fpath = 'placeholder.pdf'
reader = PdfReader(pdf_fpath)
page = reader.pages[0]
images = page.images
print(len(images))
The images property extracts the images attached to the page. As you noticed, an image that is called multiple times appears only once.
Thanks for the help, @pubpub-zz! If I locate the calls to the image, would it be feasible to remove those calls and insert another image in their place?
To see the calls, you have to parse the page content. The easiest way is to get it as a list of operations and look for Do operations, or BI for inline images.
Two warnings:
Do operations 'call' images but also sub-drawings (form XObjects): you have to check that the called object is an image.
Some images are included in sub-drawings.
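
A minimal sketch of that approach, not a definitive implementation. The names count_image_calls, image_calls_on_page and INLINE_KEY are hypothetical helpers made up for this example. It assumes the page carries its own /Resources entry (not inherited from a parent node) and it does not descend into sub-drawings, so images called from inside a form XObject are not counted. It also assumes that pypdf reports an inline image in the parsed operations as a single b"INLINE IMAGE" entry rather than b"BI"; both are accepted here, but please verify against your pypdf version.

```python
from collections import Counter

# Hypothetical label for inline images, which have no resource name.
INLINE_KEY = "<inline>"


def count_image_calls(operations, xobject_subtypes):
    """Count how many times each image is drawn by a content stream.

    operations: iterable of (operands, operator) pairs, the shape used by
        pypdf's ContentStream.operations (operator is bytes, e.g. b"Do").
    xobject_subtypes: mapping of XObject resource name to its /Subtype,
        e.g. {"/Image1": "/Image", "/Fm0": "/Form"}.
    """
    calls = Counter()
    for operands, operator in operations:
        if operator in (b"BI", b"INLINE IMAGE"):
            calls[INLINE_KEY] += 1
        elif operator == b"Do" and operands:
            name = str(operands[0])
            # Do also draws form XObjects (sub-drawings): keep images only.
            if xobject_subtypes.get(name) == "/Image":
                calls[name] += 1
    return calls


def image_calls_on_page(pdf_path, page_index=0):
    """Apply count_image_calls to one page of a PDF file."""
    # Imported here so the counting logic above can be tried without pypdf.
    from pypdf import PdfReader
    from pypdf.generic import ContentStream

    reader = PdfReader(pdf_path)
    page = reader.pages[page_index]

    # Assumption: the XObject resources sit directly on the page.
    subtypes = {}
    if "/Resources" in page and "/XObject" in page["/Resources"]:
        xobjects = page["/Resources"]["/XObject"]
        for name in xobjects:
            subtypes[str(name)] = str(xobjects[name]["/Subtype"])

    content = ContentStream(page.get_contents(), reader)
    return count_image_calls(content.operations, subtypes)
```

Calling image_calls_on_page('placeholder.pdf') returns a Counter keyed by resource name, so a placeholder that is drawn several times shows up as one name with a count above one, even though page.images lists it only once.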