Add check before setting multiprocessing context to prevent the RuntimeError #4620

Open: xuyumeng wants to merge 3 commits into master

Conversation


@xuyumeng xuyumeng commented Feb 5, 2024

When pycbc is used together with a multiprocessing-based package such as dask, importing pycbc tries to set the multiprocessing context again after it has already been set, which raises RuntimeError: context has already been set. This fix checks the context before setting it, so that it is only set once.

Error message

  File "/Users/yumengxu/miniforge3/envs/pycbc_context_error/lib/python3.11/site-packages/pycbc/__init__.py", line 174, in <module>
    multiprocessing.set_start_method('fork')
  File "/Users/yumengxu/miniforge3/envs/pycbc_context_error/lib/python3.11/multiprocessing/context.py", line 247, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
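
The traceback comes from the standard library: multiprocessing.set_start_method raises RuntimeError whenever the context has already been fixed, even if the requested method is the same. A minimal sketch of the kind of guard described above follows. It is not the actual patch; the helper name set_start_method_once is hypothetical, and it assumes that skipping the call is acceptable when a start method is already in place.

```python
import multiprocessing


def set_start_method_once(method="fork"):
    """Set the global start method only if none has been fixed yet.

    Returns the start method in effect afterwards, or None if the requested
    method is unavailable on this platform and nothing was set.
    """
    # allow_none=True reports None instead of fixing the default context.
    current = multiprocessing.get_start_method(allow_none=True)
    if current is None and method in multiprocessing.get_all_start_methods():
        multiprocessing.set_start_method(method)
    return multiprocessing.get_start_method(allow_none=True)


if __name__ == "__main__":
    set_start_method_once("fork")
    # The second call is a no-op, so no RuntimeError is raised.
    print(set_start_method_once("fork"))
```

Note that this guard leaves a start method chosen by another package (for example spawn) untouched, so code that relies on fork still has to cope with that case.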

Example to reproduce the error

OS: macOS 14.2.1

Create a temporary env

conda create -n pycbc_context_error pycbc prefect 
conda activate pycbc_context_error
pip install prefect_dask

Simple script test.py to reproduce the error

import pycbc

from prefect import flow, task, get_run_logger
from prefect_dask.task_runners import DaskTaskRunner


@flow(task_runner=DaskTaskRunner(), log_prints=True)
def analysis():
    print('This is a test')


if __name__ == "__main__":
    analysis()

Run the script:

python test.py

Member

@ahnitz ahnitz left a comment

@xuyumeng Thanks for this. I think the change is pretty clear. Once the CI has a chance to run (and assuming no issues seem to arise), I think we can merge this.

Author

xuyumeng commented Feb 7, 2024

@ahnitz Thanks! I'm currently using pycbc as a toolkit in the pycwb+prefect development. This PR lets the pipeline run in script mode, but it throws the error again in service mode. I guess the prefect server runs the child processes in spawn mode, which causes the conflict again.

Does this multiprocessing context setting affect the functionality of pycbc as a toolkit? If not, is it possible to change the raised error to a warning, so that we can use pycbc within other task management systems like prefect?

Member

ahnitz commented Feb 7, 2024

@xuyumeng For toolkit functionality it is not usually needed. It is needed in some inference-related cases though, as fork is the only method which reliably preserves copy-on-write (faster, and it reduces memory bloat because common data never needs to be copied or duplicated). Yes, for your use, it should be fine to change the default start method.

I think your intent here is right, i.e. if a choice has already been made, don't try to set it again. I guess somehow that didn't quite pick up the case here, even though the context was already chosen? Do you have the error? It might provide some clues as to why your patch didn't work. Naively it looks like it would have handled this case.

If that turns out not to be possible for some reason, I think the solution is for us to set this in more limited cases, i.e. remove the global configuration for the package and set it specifically in the cases where we know it is needed.
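
As a rough illustration of setting fork only where it is needed, the standard library's multiprocessing.get_context returns a context object bound to one start method without fixing the global one. The sketch below is not pycbc code; the function names are hypothetical, and it assumes a platform where fork is available (Linux or macOS).

```python
import multiprocessing


def _store_answer(shared):
    # Runs in the forked child and writes into memory shared with the parent.
    shared.value = 42


def run_in_forked_child():
    """Run a tiny job in a forked child without touching the global start method."""
    # The context object has the same API as the multiprocessing module.
    # get_context raises ValueError where fork is unavailable.
    ctx = multiprocessing.get_context("fork")
    shared = ctx.Value("i", 0)
    child = ctx.Process(target=_store_answer, args=(shared,))
    child.start()
    child.join()
    return shared.value


if __name__ == "__main__":
    print(run_in_forked_child())
    # The global start method is still whatever it was before the call.
    print(multiprocessing.get_start_method(allow_none=True))
```

With this approach, a host application such as dask can keep its own global start method, while the fork-dependent code path still benefits from copy-on-write.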

Secondarily, I think prefect uses dask. Have you tried using the fork method with dask? That may not be the better fix though.

Author

xuyumeng commented Feb 7, 2024

@ahnitz The error occurs because Dask is using spawn when the flow runs with .serve(). For example, the following code

import pycbc
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def test_pycbc():
    print("pycbc version: ", pycbc.__version__)
    return pycbc.__version__


@flow(task_runner=DaskTaskRunner(cluster_kwargs={"n_workers": 4, "processes": True, "threads_per_worker": 1}),
      log_prints=True, retries=1)
def pycbc_flow():
    for i in range(4):
        test_pycbc.submit()


if __name__ == "__main__":
    pycbc_flow.serve(name="pycbc-flow")

triggers line 193 in my patch, since the context is already set to spawn.

I don't know why there is no such issue if I just change the last line to pycbc_flow().

It would be nice if the context were limited to the places where it is needed. I will also have a look at prefect-dask to see if I can change the default multiprocessing context.
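
One way to investigate the difference between pycbc_flow() and pycbc_flow.serve(...) would be to log which start method is in effect inside the worker before pycbc is imported there. The helper below is a hypothetical diagnostic built only on the standard library; it does not explain the cause by itself.

```python
import multiprocessing


def start_method_report():
    """Collect facts about the current process for a log line."""
    parent = multiprocessing.parent_process()  # None in the main process
    return {
        "process": multiprocessing.current_process().name,
        "is_child": parent is not None,
        # allow_none=True avoids fixing the context as a side effect.
        "start_method": multiprocessing.get_start_method(allow_none=True),
    }


if __name__ == "__main__":
    print(start_method_report())
```

Calling it at the top of a task, in both script mode and service mode, would show whether the start method is already fixed when the import happens.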

Member

ahnitz commented Feb 8, 2024

@xuyumeng I think in this case, you can turn the error into a warning. That way it will make a good-faith attempt to set the context, but issue a warning otherwise. I think it would be fine for this to be a logging warning message, so that it can also be silenced if someone knows what they are doing at a higher level.
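
A minimal sketch of that suggestion, using only the standard library, could look like the following. The function name and the logger name are hypothetical, and the merged code may differ.

```python
import logging
import multiprocessing

logger = logging.getLogger("start_method_sketch")  # hypothetical logger name


def try_set_start_method(method="fork"):
    """Try to set the global start method; warn instead of raising.

    Returns True if `method` is in effect afterwards, False otherwise.
    """
    current = multiprocessing.get_start_method(allow_none=True)
    if current is None:
        if method not in multiprocessing.get_all_start_methods():
            logger.warning("Start method %r is not available on this platform", method)
            return False
        multiprocessing.set_start_method(method)
        return True
    if current != method:
        # Someone else already chose a different method: leave it alone.
        logger.warning(
            "Multiprocessing start method is already %r; not changing it to %r",
            current, method,
        )
        return False
    return True
```

A caller who knows the mismatch is harmless can silence the message with logging.getLogger("start_method_sketch").setLevel(logging.ERROR).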

Member

ahnitz commented Mar 12, 2024

@xuyumeng I think you just need to rebase this from master so we can check the tests complete. Also can you confirm this solves the issue on your side? If so, once the tests pass, please merge this.

@xuyumeng
Author

@ahnitz Hi, yes, this solves my issue. I have a technical question: I used merge to update this branch from master last time, and switching to rebase will require a force push. Is it safe to do this for a PR?

Member

ahnitz commented May 2, 2024

@xuyumeng Could you update the branch to master, so we can double check that the tests pass before merging? That's the only thing preventing this from being added. You can use merge or rebase. It's just better practice to rebase rather than merge, as a merge creates problems when comparing between branches.

…me error: context has already been set

When `pycbc` is used with some multiprocessing package such as `dask`, importing `pycbc` will cause setting the multiprocessing context repeatedly. The `RuntimeError: context has already been set` will happen. This fix will check the context before setting it.
Change raising error to warning
xuyumeng force-pushed the multiprocessing_context_fix branch from a29c883 to bcd0850 on May 3, 2024 14:50
Author

xuyumeng commented May 4, 2024

@ahnitz Hi, I updated the branch to master with rebase.
