Fundamental incompatibility of Threading and Multiprocessing (Python) #230
Comments
Yeah, I doubt this can or should be solved without a major redesign of the whole queuing mechanism. But I would guess it simply isn't worth it. Multiprocessing support was originally introduced to solve a performance bottleneck, which happened in two scenarios:
That's what I suspect, at least. Whether that holds true in practice remains to be tested. If not, it's redesign time.
Hi Samuel, thanks for your explanation.
Well, I can tell you that we once did a config rollout across 600 routers in 5 minutes with 20 concurrent connections and an account pool of 20 accounts, and yes, that was on our old environment, working fine with "multiprocessing" for the CLI tool. On the other hand, I agree that nowadays threading shouldn't be much of a bottleneck in most I/O-bound scenarios when automating tasks on network elements. Maybe other users can comment on this for their setups? I would expect that nowadays expensive pre- or post-processing of data is outsourced to dedicated multiprocessing tools/scripts instead of trying to solve everything with the "old" (but still fine ;-)) Exscript Queue. So yes, I also doubt that a redesign would make much sense nowadays, at least for our use cases, and I would also reject any quick hacks on this just to save effort. So the consequence would be to drop official multiprocessing support in Exscript for now (just making the "multiprocessing" mode invalid), maybe until someone has the time/passion for such a redesign... ;-) What do you think?
Hi @knipknap, hi community,
over the last few days, after switching some old environments from Python 2.7 with Exscript 2.6.3/2.6.22 to Python 3.11 with Exscript 2.6.28, we have encountered some very strange and unpredictable deadlocks while using the `exscript` script to run some Exscript templates. First we thought these issues were related to our new environment, then maybe to some bugs with Locks handled by Exscript, but finally we found out that this is a fundamental issue in Python! More about this later...

Exscript is designed internally to rely on threading for some of its major functionality, like the MainLoop in the WorkQueue, Jobs and some pipeline-related stuff. See https://github.com/search?q=repo%3Aknipknap%2Fexscript%20threading.Thread&type=code. Additionally, around 12 years ago, support for multiprocessing was introduced (e.g., e43f6b7, 0baf858). From that point on, Exscript by default behaved differently for the two major use cases:

1. When using the `exscript` script to just execute Exscript templates, Exscript was started in `multiprocessing` mode. This means essentially:
   a. one main process
   b. many internal threads
   c. one multiprocessing.Process for each job (=host)
2. In the other use case, Exscript was started in `threading` mode by default, which was good, because then we get essentially:
   a. one main process
   b. many internal threads
   c. one threading.Thread for each job (=host)
So, now you might wonder why all this is a big deal. ;-) Well, the problem is that, following the latest changes in Python 3.12, we found out that the combination of (fork-based) multiprocessing and threading is not safe/stable on any POSIX system and is also not considered stable by the CPython implementation! Rather than replicate everything I found out about this fundamental issue in Python, I would like to share these articles/posts that I found; they should explain the problem:
- `'fork'` is broken: change to `'forkserver' || 'spawn'` (python/cpython#84559)

Quick solution:
Now, coming back to our issue, we were able to eliminate the deadlocks by changing the default in the `exscript` script to `threading`.

See: exscript/scripts/exscript, Line 182 in 9d5b035
If most of you have not encountered such deadlocks in use case 1 until now, then you may have been as lucky as we were for the past 10 years running our older environments! But that does not change anything about the fundamental issue here.
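To make the underlying hazard concrete, here is a minimal sketch of forking while another thread holds a lock. It is not Exscript code: `shared_lock`, `_hold_lock` and `child_can_acquire_lock` are hypothetical names made up for illustration, and the sketch assumes a POSIX system where `os.fork()` is available. The forked child inherits the lock in its locked state, but not the thread that would release it.

```python
import os
import threading

# Hypothetical lock standing in for any lock a library holds internally.
shared_lock = threading.Lock()


def _hold_lock(acquired, release):
    """Worker thread: take the lock and keep it until told to let go."""
    with shared_lock:
        acquired.set()
        release.wait()


def child_can_acquire_lock(timeout=1.0):
    """Fork while another thread holds shared_lock.

    Returns True if the forked child acquired the lock within `timeout`
    seconds, False otherwise. POSIX only, because it uses os.fork().
    """
    acquired, release = threading.Event(), threading.Event()
    holder = threading.Thread(target=_hold_lock, args=(acquired, release))
    holder.start()
    acquired.wait()  # make sure the lock is really held before forking

    # Python 3.12+ emits a DeprecationWarning here (multi-threaded fork).
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied. The holder thread does
        # not exist here, yet the lock was copied in its locked state.
        got_it = shared_lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)

    # Parent: let the holder thread finish, then collect the child's result.
    release.set()
    holder.join()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status) == 0


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_lock())
```

The timeout is only there so the demonstration terminates. A child that calls `acquire()` without a timeout would block forever, which matches the kind of deadlock described above.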
Long-term solution:
The change above affects only the `exscript` script and should be good enough as a quick fix. In the long term, @knipknap @bigmars86, you should think about a fundamental fix, if that is possible at all, or about dropping multiprocessing support for Exscript. What I found out so far regarding possible solutions:

- The default start method is `fork`, relying on `os.fork()`, which creates a copy of the parent process with almost(!) all states/data; see the articles above explaining everything. This will be dropped as the default in Python 3.14, making `forkserver` the new default on POSIX platforms (macOS and Windows already default to `spawn`). But `spawn` and `forkserver` create clean child processes and fully rely on e.g. pickling to move data down to these children, which is a problem in case Exscript fundamentally relies on full process copies.
- I tried the `exscript` script with the current code base according to https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method with both alternate methods and failed. Exscript is then unable to pickle some local scope objects, among possible other issues, as I stopped digging here... Example:
- Moving to the `spawn` or `forkserver` method for multiprocessing will cost some bigger effort, unless you already know where to get your hands dirty.

That's it from my side. Hope this helps in any way.
Cheers, Martin
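The pickling limitation mentioned under the long-term solution can be illustrated with a short sketch. This is not Exscript code: `module_level_job`, `make_local_job` and `is_picklable` are hypothetical names chosen for this example. The `spawn` and `forkserver` start methods have to pickle the process target and its arguments, so anything defined in a local scope cannot be handed to the child, while a module-level function can.

```python
import multiprocessing as mp
import pickle


def module_level_job(host):
    """A job defined at module level: pickled by reference to its name."""
    return f"connected to {host}"


def make_local_job():
    """Return a job defined in a local scope, as closures and callbacks often are."""
    def local_job(host):
        return f"connected to {host}"
    return local_job


def is_picklable(obj):
    """Report whether pickle can serialize obj, without raising."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("module-level job picklable:", is_picklable(module_level_job))
    print("local job picklable:", is_picklable(make_local_job()))

    # With an explicit "spawn" context, only the module-level job can be
    # used as a process target; the local one would fail while being pickled.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=module_level_job, args=("router1",))
    proc.start()
    proc.join()
```

Using `mp.get_context("spawn")` instead of `set_start_method()` keeps the choice local to this sketch, which makes it easier to try one start method without changing the global default.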