I am running on [...]. It seems that an error occurs due to a loop of recursive calls when parallel processing using [...]. Here is the code where the error occurs. It occurs in pypdf 4.1.0 or 4.2.0.

from concurrent.futures import ProcessPoolExecutor
from pypdf import PdfReader, PdfWriter
from multiprocessing import freeze_support

parallel_num = 4


def _page_numbering_pdf(pdf_file: PdfReader):
    pdf_file_writer = PdfWriter()
    for page in pdf_file.pages:
        pdf_file_writer.add_page(page)
    return pdf_file_writer


merge_base = PdfReader("test.pdf")

if __name__ == '__main__':
    freeze_support()
    with ProcessPoolExecutor(max_workers=parallel_num) as executor:
        pdf_file_writer = PdfWriter()
        result_pdf_writer_list = executor.map(
            _page_numbering_pdf,
            [merge_base]
        )

The file can be anything. The following is the error.
Replies: 2 comments 3 replies
You are defining [...]

The real issue seems to be that you are passing a shared reader to the parallel code. There is no real way to ensure that each worker catches the reader in the same state, most likely leading to the issue you see.

I just did a quick test which initializes the reader inside [...]

I did not yet do any further research on what changed here in pypdf, but I personally would refrain from using such a pattern altogether.
Since no one seems to be particularly interested in this, I'll close this question.
An infinite loop occurs when using multiprocessing with PdfReader or PdfWriter. [...] PdfReader in multiple threads, but it is necessary to save it as a pdf first.