Fix parallel download with asyncio #204
base: master
Another approach that I tried successfully, though it was far less elegant (and more resource-consuming), was creating a copy of the session for each new process.
Hi @diegorodriguezv, thank you for your amazing spot.
Hi Lorenzo. I tested it on Ubuntu 19.10 with Python 3.7.5 and on Windows 10 with Python 3.8.2. I downloaded several books and everything looked fine.
It raises
Tested on Windows 10 with Python 3.8.3 and it works well. I guess the macOS error should still be fixed before merging.
The error mentioned by @bborysenko is not related to the asyncio implementation. Please see the following discussion for more information: Also, a possible fix for the qsize issue for macOS can be taken from here:
Hi. Thanks for this amazing piece of software.
This is a fix for bugs #163, #174 and #199.
The bug happens because SSL uses the process ID for encryption, so if the same requests session is used in more than one process, it will fail.
This fix solves the problem by using asyncio instead of multiprocessing. This way the parallelism is handled in-process, without the need to fork. It uses less CPU and less memory, and it solves the problem on Windows since no IPC pickling is necessary. Besides, it speeds up execution when the book contains several images or CSS files.
For an easy explanation of why this approach is better for IO bound problems (networking) see this: https://timber.io/blog/multiprocessing-vs-multithreading-in-python-what-you-need-to-know/
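To illustrate the idea, here is a minimal sketch of in-process parallel downloads with asyncio. It is not the code from this PR: `fetch` and `download_all` are hypothetical names, and `fetch` only simulates a blocking request with `time.sleep` where the real code would call something like `session.get(url)`.

```python
import asyncio
import time


def fetch(url):
    """Hypothetical stand-in for a blocking call such as session.get(url)."""
    time.sleep(0.1)  # simulate network latency
    return f"content of {url}"


async def download_all(urls):
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool of this process,
    # so nothing is forked and nothing has to be pickled.
    tasks = [loop.run_in_executor(None, fetch, url) for url in urls]
    # gather() returns the results in the same order as the input URLs.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = [f"https://example.com/image{i}.png" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(download_all(urls))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} files in {elapsed:.2f} s")
```

Since all the workers live in the same process, a single session object could be shared between them in this kind of setup, which is what avoids the per-process SSL problem described above.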
A good example is book 9780135262047, which includes 1614 images. The execution time went from 30 minutes to 12 on my machine.
I hope this helps.