Replies: 3 comments 7 replies
-
I want to get rid of I would not add a read(path) function, which essentially does the same as PdfReader(path). I simply don't see the benefit of it. Any breaking change to pypdf has to have a clear benefit. A lot of people use it and I want to avoid breaking changes. We had a lot of breaking changes in 2022 in order to make the interface more consistent / pythonic. Although I think that I communicated those changes pretty well, I still see a lot of people using the old interface or struggling with the switch. I update code in lots of places (Stack Overflow, other git repositories, writing to people who wrote articles/tutorials). I simply don't want to put that much time into another change like this unless there is a big benefit. I'm still open to being convinced that there is such a big benefit (or that the change is less complex than I currently think) :-)
-
Hello @lababidi and everyone. Thank you to everyone, especially @MartinThoma, who has really breathed new life into this project. I really like that this library is being brought up to date. I appreciate all of your work and effort. I am an outsider. I think this is my first comment. So, I don't expect my opinion to count for much. However, I just wanted to say that @lababidi's idea of having a PDF class that retains the state of a PDF object makes a lot of sense to me. In my humble opinion, the concern that @lababidi raises, that writing a PDF after it has been read results in many changes to the PDF even if nothing has really changed, is a valid point theoretically. On the one hand, I can see where this may be desirable. The user documentation for pypdf / PyPDF2 even says that you can reduce the PDF size by doing this (i.e., by reading and then writing the file). However, I wonder if there might be use cases where you might want to preserve the PDF state in all but the ways explicitly operated upon by the user. That's just my two cents. Thank you to everyone who contributes and works on this project, especially @MartinThoma, who I can see has put a lot of effort into it.
-
I would like to resurrect this discussion because I really like pypdf and want to make it even more awesome :) I'm happy to help design it and to reuse already implemented functions. I think that having a PDF class isn't as daunting as one would think once you dig into it. I concur with everything @alexwgee said, and I think something could be done. Again, I'm happy to help implement and design, but I would love to connect first to align.
-
Explanation
Currently there are multiple classes (PdfReader, PdfWriter, PdfMerger) that are action-oriented. Typically, classes represent objects, not actions. Functions would represent the needed actions more succinctly and effectively (pypdf.read(), pypdf.PDF.write(), pypdf.merge()). By the same token, the resulting object that is read or written is simply a PDF, which could be a class PDF(). This would allow for a clearer API and allow PDFs to be read in and written out easily, with modifications applied directly to the PDF object. Additionally, this would allow metadata to persist, along with fonts etc.

This would be somewhat similar to the pandas interface (pd.read_csv, pd.DataFrame). I think that model would work well for PDFs.

I wanted to present this and discuss it before I began prototyping something. Would love to know your thoughts, @MartinThoma et al!
Code Example
How would your feature be used? (Remove this if it is not applicable.)
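To make the shape of the proposed interface concrete, here is a rough sketch. It is only an illustration of the "object plus functions" idea from the explanation above, not pypdf code and not a finished design: the PDF class and the read() and merge() functions are hypothetical, pages are plain strings, and a JSON file stands in for a real PDF so that the example runs without any PDF parsing.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class PDF:
    """Hypothetical stateful document object (NOT part of pypdf)."""

    # Stand-ins: real pages would be page objects, not strings.
    pages: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def write(self, path) -> None:
        """Serialize the whole state so metadata survives a round trip."""
        state = {"pages": self.pages, "metadata": self.metadata}
        Path(path).write_text(json.dumps(state))


def read(path) -> PDF:
    """Module-level action, analogous in spirit to pd.read_csv."""
    state = json.loads(Path(path).read_text())
    return PDF(pages=state["pages"], metadata=state["metadata"])


def merge(*docs: PDF) -> PDF:
    """Concatenate pages in order.

    Assumption for this sketch: the first document's metadata wins.
    """
    merged = PDF(metadata=dict(docs[0].metadata) if docs else {})
    for doc in docs:
        merged.pages.extend(doc.pages)
    return merged
```

With something like this, usage would read as doc = read("in.json"), then edits on doc.pages or doc.metadata, then merge(doc, other).write("out.json"). Whether a real implementation could keep fonts and other document state intact in the same way is exactly the open question in this thread.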