Use multiprocessing spawn mode instead of fork
Summary:
In some cases, `multiprocessing.Pool` seems to get stuck while starting its workers.
This might be related to threads that were started before the `multiprocessing.Pool` was created, which do not play well with forking.
Let's use `multiprocessing.get_context("spawn")`, since it avoids those problems by using `fork` followed immediately by `execve`.
Also note that the current `fork` strategy will be deprecated in Python 3.14 anyway.
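
For illustration, a minimal sketch of the spawn-based pattern described above; the worker function and file names are hypothetical, not taken from SAPP:

```python
import multiprocessing


def parse_file(path: str) -> str:
    # Hypothetical worker standing in for SAPP's per-file parse callable.
    return path.upper()


if __name__ == "__main__":
    # "spawn" launches each worker as a fresh interpreter instead of forking
    # the parent, so threads or locks already held by the parent cannot leave
    # the children in a stuck state.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=None) as pool:
        for result in pool.imap_unordered(parse_file, ["a.json", "b.json"]):
            print(result)
```

One caveat with spawn: the callable handed to the pool must be picklable (typically a module-level function), and pool creation should sit under an `if __name__ == "__main__":` guard when run as a script.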

Reviewed By: compositor

Differential Revision: D53471525

fbshipit-source-id: 3ea502ad9cd7b470da80bf48809d2d5c32910418
alexblanck authored and facebook-github-bot committed Jun 1, 2024
1 parent 7d60fbd commit 74c2f19
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions sapp/pipeline/parallel_parser.py
--- a/sapp/pipeline/parallel_parser.py
+++ b/sapp/pipeline/parallel_parser.py
@@ -6,7 +6,7 @@
 # pyre-strict
 
 import logging
-from multiprocessing import Pool
+import multiprocessing
 from typing import Iterable, List, Set, Tuple, Type, Union
 
 from ..analysis_output import AnalysisOutput, Metadata
@@ -50,7 +50,7 @@ def parse(
         initial_rss = get_rss_in_gb()
         log.info(f"RSS before parsing: {initial_rss:.2f} GB")
 
-        with Pool(processes=None) as pool:
+        with multiprocessing.get_context("spawn").Pool(processes=None) as pool:
             for idx, f in enumerate(pool.imap_unordered(parse, args)):
                 if idx % 10 == 0:
                     cur = idx + 1
