fix: limit numpy version to < 2.0 and fix missing dependency #10537

Merged: 4 commits, Jun 20, 2024

@0x404 0x404 commented Jun 19, 2024

Recently, numpy released version 2.0, which caused oneflow to crash:

$ python3 -m oneflow

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "~/miniconda3/envs/dev3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "~/miniconda3/envs/dev3/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "~/miniconda3/envs/dev3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "~/miniconda3/envs/dev3/lib/python3.9/site-packages/oneflow/__init__.py", line 34, in <module>
    oneflow._oneflow_internal.InitNumpyCAPI()
Traceback (most recent call last):
  File "~/miniconda3/envs/dev3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "~/miniconda3/envs/dev3/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "~/miniconda3/envs/dev3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "~/miniconda3/envs/dev3/lib/python3.9/site-packages/oneflow/__init__.py", line 34, in <module>
    oneflow._oneflow_internal.InitNumpyCAPI()
oneflow._oneflow_internal.exception.Exception: . Unable to import Numpy array, try to upgrade Numpy version!
  File "oneflow/extension/python/numpy.cpp", line 124, in InitNumpyCAPI
    CHECK_EQ_OR_RETURN(_import_array(), 0)
Error Type: oneflow.ErrorProto.check_failed_error
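
The crash above comes from a compiled extension built against NumPy 1.x being loaded under NumPy 2.0. As a minimal sketch (not part of this PR, and not OneFlow's actual code), a pre-import check could inspect the installed NumPy version and report the mismatch early. The helper names `installed_version` and `is_numpy_1x` are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_numpy_1x(version: str) -> bool:
    """True when the major version component is below 2 (e.g. '1.26.4')."""
    major = int(version.split(".", 1)[0])
    return major < 2


if __name__ == "__main__":
    found = installed_version("numpy")
    if found is None:
        print("numpy is not installed")
    elif not is_numpy_1x(found):
        # Assumption: the extension in question was compiled against NumPy 1.x.
        print(f"numpy {found} found; this build expects numpy<2")
    else:
        print(f"numpy {found} is compatible with a NumPy 1.x build")
```

Pinning `numpy<2` in the package requirements, as this PR does, makes such a runtime check unnecessary for fresh installs; the sketch is only useful for diagnosing an existing environment.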

Also, we rely on the typing-extensions library:

from typing_extensions import TypeAlias

We depend on the rich library, and rich depends on typing-extensions only when running on Python < 3.9 (https://github.com/Textualize/rich/blob/master/pyproject.toml#L31C1-L31C18). As a result, typing-extensions is installed transitively when installing oneflow on Python < 3.9, but on Python >= 3.9 nothing pulls it in and it is missing.

Another issue is that we install packaging only if Python < 3.8, but I hit a missing packaging module error when using Python == 3.8:

$ python3 -m oneflow
Traceback (most recent call last):
  File "~/miniconda3/envs/dev3.8/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "~/miniconda3/envs/dev3.8/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "~/miniconda3/envs/dev3.8/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "~/miniconda3/envs/dev3.8/lib/python3.8/site-packages/oneflow/__init__.py", line 505, in <module>
    import oneflow.mock_torch
  File "~/miniconda3/envs/dev3.8/lib/python3.8/site-packages/oneflow/mock_torch/__init__.py", line 16, in <module>
    from .mock_importer import ModuleWrapper, enable, disable
  File "~/miniconda3/envs/dev3.8/lib/python3.8/site-packages/oneflow/mock_torch/mock_importer.py", line 31, in <module>
    from .mock_utils import MockEnableDisableMixin
  File "~/miniconda3/envs/dev3.8/lib/python3.8/site-packages/oneflow/mock_torch/mock_utils.py", line 33, in <module>
    from packaging.requirements import Requirement
ModuleNotFoundError: No module named 'packaging'
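
Both missing-dependency failures described above only show up when `oneflow` is imported. A small sketch of how one might check an environment for the two modules named in this PR (`typing_extensions` and `packaging`) without importing `oneflow` itself; the helper `missing_modules` is a hypothetical name, not something provided by OneFlow.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the imports shown in the PR description.
    absent = missing_modules(["typing_extensions", "packaging"])
    if absent:
        print("missing modules:", ", ".join(absent))
    else:
        print("all required modules are importable")
```

`find_spec` only locates the module and does not execute it, so the check is cheap and has no import side effects for top-level names.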

This PR:

  • Limits the numpy version to < 2.0.
  • Explicitly adds the missing dependency typing-extensions.
  • Changes the condition for installing packaging from Python < 3.8 to Python <= 3.8.
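
The three changes listed above can be sketched as PEP 508 requirement strings. This is an illustration under assumptions: the requirement strings and the helper `packaging_selected` are hypothetical, and the exact entries in OneFlow's requirements may be written differently. The helper models why Python 3.8 was left without packaging under the old condition.

```python
from typing import Tuple

# Hypothetical requirement strings mirroring the three changes in this PR.
REQUIREMENTS = [
    "numpy<2.0",                            # stay on the NumPy 1.x ABI
    "typing-extensions",                    # no longer rely on rich to pull it in
    'packaging; python_version <= "3.8"',   # previously: python_version < "3.8"
]


def packaging_selected(python_version: Tuple[int, int], fixed: bool) -> bool:
    """Model the install condition for packaging before and after the fix."""
    if fixed:
        return python_version <= (3, 8)
    return python_version < (3, 8)


if __name__ == "__main__":
    for version in [(3, 7), (3, 8), (3, 9)]:
        print(version,
              "old:", packaging_selected(version, fixed=False),
              "new:", packaging_selected(version, fixed=True))
```

Only the (3, 8) row differs between the old and new conditions, which matches the failure reported for Python == 3.8.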

CLAassistant commented Jun 19, 2024

All committers have signed the CLA.

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.7ms (= 4368.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.8ms (= 5779.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.8ms / 43.7ms)

OneFlow resnet50 time: 26.1ms (= 2606.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 3812.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.46 (= 38.1ms / 26.1ms)

OneFlow resnet50 time: 18.9ms (= 3772.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.4ms (= 7075.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.88 (= 35.4ms / 18.9ms)

OneFlow resnet50 time: 17.2ms (= 3439.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.5ms (= 6297.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.83 (= 31.5ms / 17.2ms)

OneFlow resnet50 time: 16.4ms (= 3284.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 30.1ms (= 6027.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.84 (= 30.1ms / 16.4ms)

OneFlow swin dataloader time: 0.199s (= 39.895s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.629s / 200, num_workers=1)
Relative speed: 0.642 (= 0.128s / 0.199s)

OneFlow swin dataloader time: 0.054s (= 10.801s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.486s / 200, num_workers=4)
Relative speed: 0.601 (= 0.032s / 0.054s)

OneFlow swin dataloader time: 0.030s (= 6.082s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.403s / 200, num_workers=8)
Relative speed: 0.560 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 50.0ms (= 5001.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.9ms (= 6585.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 65.9ms / 50.0ms)

OneFlow resnet50 time: 37.4ms (= 3742.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.5ms (= 4554.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 45.5ms / 37.4ms)

OneFlow resnet50 time: 27.8ms (= 5566.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.0ms (= 8195.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 41.0ms / 27.8ms)

OneFlow resnet50 time: 25.2ms (= 5043.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.2ms (= 7644.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 38.2ms / 25.2ms)

OneFlow resnet50 time: 24.9ms (= 4970.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.7ms (= 7144.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 35.7ms / 24.9ms)
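
For reading the numbers above: each relative speed is the PyTorch time divided by the OneFlow time, so values above 1 mean OneFlow was faster. A short sketch of that arithmetic, using total times copied from the report; the function name `relative_speed` is hypothetical, and the rounding precision is inferred from the printed values.

```python
def relative_speed(pytorch_time: float, oneflow_time: float) -> float:
    """Ratio of PyTorch time to OneFlow time; > 1 means OneFlow was faster."""
    return pytorch_time / oneflow_time


if __name__ == "__main__":
    # resnet50, input_shape=[16, 3, 224, 224]: total ms over 100 iterations
    print(round(relative_speed(5779.9, 4368.3), 2))
    # swin dataloader, num_workers=1: total seconds over 200 iterations
    print(round(relative_speed(25.629, 39.895), 3))
```

Using the unrounded totals rather than the rounded per-iteration times reproduces the reported dataloader ratio of 0.642.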

@0x404 0x404 merged commit 850b4ad into master Jun 20, 2024
25 checks passed
@0x404 0x404 deleted the fix_deps branch June 20, 2024 06:21