Can not find image XObject /Matrix data #2163
Replies: 3 comments 5 replies
-
If you open the file with PDFBox in debug mode (see https://github.com/py-pdf/awesome-pdf#file-analysis--security), you can easily see that the images are stored upside down. The flip is actually done in the PDF page content, using a proper cm matrix. You should see the cm matrix if you call the extract_text() function with a visitor that looks at the operators and searches for "Do" operators.
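A minimal sketch of that visitor approach, assuming the `visitor_operand_before` callback of `extract_text()` receives the operator as bytes, its operands, the current transformation matrix, and the text matrix. The helper names (`make_do_visitor`, `is_mirrored`, `collect_xobject_matrices`) are made up for this illustration and are not part of pypdf. Note that "Do" also paints form XObjects, so not every recorded name is necessarily an image.

```python
from typing import Any, Callable, List, Sequence, Tuple

Matrix = List[float]  # PDF order: [a, b, c, d, e, f]


def make_do_visitor(found: List[Tuple[str, Matrix]]) -> Callable[[Any, Any, Any, Any], None]:
    """Build a visitor that records the current matrix at every "Do" operator."""

    def visitor(operator: Any, operands: Any, cm: Any, tm: Any) -> None:
        # The operator arrives as bytes (b"Do"); the first operand is the
        # XObject name, e.g. /Im1. The matrix is copied so later changes
        # to the graphics state cannot alter what was recorded.
        if operator == b"Do" and operands:
            found.append((str(operands[0]), [float(v) for v in cm]))

    return visitor


def is_mirrored(cm: Sequence[float]) -> bool:
    """Return True when the matrix contains a reflection (negative determinant)."""
    a, b, c, d = cm[:4]
    return a * d - b * c < 0


def collect_xobject_matrices(pdf_path: str, page_index: int = 0) -> List[Tuple[str, Matrix]]:
    """Walk the page operators via extract_text() and return (name, matrix) pairs."""
    # Imported here so the two helpers above work without pypdf installed.
    from pypdf import PdfReader

    found: List[Tuple[str, Matrix]] = []
    page = PdfReader(pdf_path).pages[page_index]
    # The extracted text itself is ignored; only the visitor's side effect matters.
    page.extract_text(visitor_operand_before=make_do_visitor(found))
    return found
```

For the file in the question, calling `collect_xobject_matrices("./jpegcompress.pdf")` and passing each recorded matrix to `is_mirrored` should indicate which XObjects are drawn with a flip. A negative determinant only says that some reflection is present; it does not by itself distinguish a vertical flip from a horizontal one combined with a rotation.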
-
As this is not an issue, I am converting it into a discussion to ease analysis.
-
@jianfan2012 Can you replace the title with a clearer one?
-
Explanation
I have a PDF file with a couple of images. When I extract the images, they come out upside down. I was able to use PyMuPDF to get the transformation matrix, but with pypdf the image XObject does not contain the transformation matrix for those images. Because of a licensing issue, we cannot use the PyMuPDF package. Is there a way to see the image transformation matrix using the pypdf package?
Code Example
from pypdf import PdfReader

pdf_file = "./jpegcompress.pdf"
doc = PdfReader(pdf_file)
p0 = doc.pages[0]
# Print the dictionary of the image XObject named /Im1 on the first page
print(p0['/Resources']['/XObject']['/Im1'])
{'/BitsPerComponent': 8, '/ColorSpace': IndirectObject(97, 0, 140707430812304), '/Decode': [0.0, 255], '/Filter': '/FlateDecode', '/Height': 560, '/Name': '/X', '/Subtype': '/Image', '/Type': '/XObject', '/Width': 890}
Why is there no matrix in the XObject? Is it stored somewhere else, outside the XObject?
jpegcompress.pdf