Great set of questions and observations on a tricky topic, @petermr! This has led me down an enjoyable rabbit hole, though I'm not 100% confident in my findings. My current understanding:
However... when I examine the
I've filed an issue in If this ends up getting fixed there, I'll aim to incorporate the color space information into
In the meantime, with many PDFs you can probably assume that an individual integer or float represents a grayscale value (0 = black, 1 = white), a 3-value color is RGB, and a 4-value color is CMYK. Not foolproof, but usually a decent first guess.
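A minimal sketch of that length-based guess, for anyone who wants RGB or a CSS string right away. The names `guess_rgb` and `to_css` are hypothetical helpers, not part of pdfplumber or pdfminer. The CMYK branch uses the naive device formula and ignores ICC profiles and the PDF's actual color space, so treat the result as an approximation.

```python
from typing import Sequence, Tuple, Union

Number = Union[int, float]


def _clamp(v: float) -> float:
    """Keep a component inside the 0..1 range."""
    return max(0.0, min(1.0, float(v)))


def guess_rgb(color: Union[Number, Sequence[Number]]) -> Tuple[float, float, float]:
    """Guess an RGB triple (components in 0..1) from an extracted color value.

    Heuristic only: the number of components stands in for the real
    color space, which is an assumption that can be wrong.
    """
    # A bare number is treated as a one-component (grayscale) color.
    components = [color] if isinstance(color, (int, float)) else list(color)

    if len(components) == 1:
        g = _clamp(components[0])
        return (g, g, g)
    if len(components) == 3:
        r, g, b = (_clamp(v) for v in components)
        return (r, g, b)
    if len(components) == 4:
        c, m, y, k = (_clamp(v) for v in components)
        # Naive device CMYK -> RGB conversion (no color management).
        return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    raise ValueError(f"cannot guess a color space for {len(components)} components")


def to_css(rgb: Tuple[float, float, float]) -> str:
    """Format a 0..1 RGB triple as a CSS rgb() string."""
    r, g, b = (round(255 * v) for v in rgb)
    return f"rgb({r}, {g}, {b})"
```

For example, `to_css(guess_rgb((1, 0, 0)))` gives a CSS red, and a 4-tuple such as `(0, 0, 0, 1)` is read as CMYK black. Colors from pattern or spot color spaces will not fit this scheme.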
-
I am extracting text characters using pdfplumber and wish to have the color as RGB (as it's supported by many tools, including CSS). Currently I get a 4-tuple for non-stroking-color, which I can't find documented and which I guess is CMKY (sic). CMKY is, I think, CMYK with K and Y reversed. (When I use PDFBox (Java) for the same file I can extract RGB.)
My code:
The non-stroking-color appears to be a light green/gray (character "J" leading the top line).
The file (PMC1421) is available at https://github.com/petermr/pyami/blob/pmr4/py4ami/resources/projects/liion4/PMC4391421/fulltext.pdf
More generally, why is this color system being used? Is it defined by the author/software or converted by PDFMiner/pdfplumber?
UPDATE:
With a different document I can extract colour = (1, 0, 0), which corresponds to bright red on the screen, i.e. I guess that RGB (scaled to 1) is probably being used. So the question morphs to:
"How can I find out what colour model is used for characters? And if not RGB, how can I convert it?"
Thanks