PyTorch "Undefined symbol" error when importing SAM ONNX models to cluster #161

Closed
marias65 opened this issue Apr 14, 2024 · 7 comments
Labels: bug, local cluster, notebooks

Comments

@marias65

I am trying to follow the Segment Anything notebook and run sentinel2_segmentation.ipynb. When I try to import SAM's ONNX models to the cluster with ! python ../../scripts/export_sam_models.py --models vit_b, I get this error: "ImportError: /home/msbksan/micromamba/envs/segment_anything_cpu/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent"

@github-actions github-actions bot added the triage Issues still not triaged by team label Apr 14, 2024
@rafaspadilha rafaspadilha self-assigned this Apr 15, 2024
@rafaspadilha rafaspadilha added notebooks Issues encountered while running the notebooks local cluster Issues encountered in local cluster and removed triage Issues still not triaged by team labels Apr 15, 2024
@rafaspadilha
Contributor

rafaspadilha commented Apr 15, 2024

Hi, @marias65. I couldn't reproduce your error on my machine, but I found a few similar issues (here and here) suggesting that the cause might be installing PyTorch via conda, and that a possible solution is to point to the CPU wheel during installation.

Quick question: are you able to import PyTorch in the segment_anything_cpu environment?

$ python -c "import torch; print(torch.__version__)"
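If the one-liner above fails, the full error text is the useful part. A minimal sketch of the same check, written so that the message is captured instead of ending in a traceback. The helper name try_import is hypothetical and not part of the repository:

```python
import importlib


def try_import(module_name):
    """Return (ok, detail): the module version on success, the error text on failure."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # A broken shared library (for example an undefined symbol in
        # libtorch_cpu.so) is raised as ImportError, so keep its message.
        return False, str(exc)
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = try_import("torch")
    print("torch import", "succeeded:" if ok else "failed:", detail)
```

Running this inside the activated environment prints either the installed version or the ImportError message, which can be pasted into the issue as is.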

I was able to set up a new environment with the latest version of PyTorch and run the script to export the model to ONNX files. Could I ask you to try on your end as well?

Please change env_cpu.yaml, renaming the environment and commenting out the pytorch and pip lines as below:

name: new_segment_anything_cpu
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python==3.8.*
  - geopandas~=0.11.1
  - ipython~=8.5.0
  - ipywidgets~=8.0.2
  - jupyter~=1.0.0
  - matplotlib~=3.6.0
  - numpy~=1.23.3
  # - pytorch=2.0.0=py3.8_cpu_0
  # - torchvision=0.15.0=py38_cpu
  # - torchaudio=2.0.0=py38_cpu
  - pip~=22.2.0
  - pandas~=1.5.0
  - rasterio~=1.3.2
  - shapely~=1.8.4
  - tqdm~=4.64.1
  - scikit-image~=0.20.0
  # - pip:
  #     - git+https://github.com/facebookresearch/segment-anything.git
  #     - ../../src/vibe_core
  #     - cartopy~=0.21.0
  #     - xarray~=2022.10.0
  #     - ipympl~=0.9.3
  #     - onnx~=1.14.0
  #     - onnxruntime~=1.15.0

Once the env is created, please activate it and install the pip packages:

$ micromamba activate new_segment_anything_cpu
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
$ pip install git+https://github.com/facebookresearch/segment-anything.git 
$ pip install ../../src/vibe_core 
$ pip install cartopy~=0.21.0 xarray~=2022.10.0 ipympl~=0.9.3 onnx~=1.14.0 onnxruntime~=1.15.0

Make sure the path to src/vibe_core is correct in the $ pip install ../../src/vibe_core command.
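One way to check that path before running pip is sketched below. It assumes only pip's general rule that a local directory is installable when it contains a setup.py or a pyproject.toml. The helper name is hypothetical, and the relative path is the one from the command above, so it depends on the directory you run it from:

```python
from pathlib import Path


def is_installable_package_dir(path):
    """Return True if `path` is a directory containing setup.py or pyproject.toml."""
    directory = Path(path).resolve()
    return directory.is_dir() and any(
        (directory / name).is_file() for name in ("setup.py", "pyproject.toml")
    )


if __name__ == "__main__":
    candidate = "../../src/vibe_core"  # relative to the current working directory
    print(Path(candidate).resolve(), "->", is_installable_package_dir(candidate))
```

If this prints False, adjust the relative path (or use an absolute one) before calling pip install.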

Please let me know if you are able to run the export script in this new environment.

@rafaspadilha rafaspadilha changed the title Error when importing SAM ONNX models to cluster PyTorch "Undefined symbol" error when importing SAM ONNX models to cluster Apr 15, 2024
@marias65
Author

Thank you for your response! I was not able to run $ python -c "import torch; print(torch.__version__)" as it gave me the same iJIT_NotifyEvent error while in the segment_anything_cpu environment.

I was able to create the new_segment_anything_cpu environment and install all the pip packages you listed, but when I ran $ python -c "import torch; print(torch.__version__)" or ! python ../../scripts/export_sam_models.py --models vit_b, I still got the same iJIT_NotifyEvent error.

image

@rafaspadilha
Contributor

rafaspadilha commented Apr 19, 2024

Hi, @marias65. I was able to replicate your issue.

Installing PyTorch 2.1 with the appropriate (CPU) wheel within the segment_anything_cpu environment solved the problem for me.

In summary, what I did was:

  • Create the segment_anything_cpu environment with the yaml that is currently available in the repo.
  • Run pip install torch~=2.1.0 --index-url https://download.pytorch.org/whl/cpu

After that, I was able to import torch:

$ python -c "import torch; print(torch.__version__)"
2.1.2+cpu

Could you please let me know if this works for you?

I will fix the environment yaml files in the next release.

@rafaspadilha rafaspadilha added the bug Something isn't working label Apr 19, 2024
@rafaspadilha
Contributor

rafaspadilha commented Apr 19, 2024

Another possibility that worked for me (and won't change the pytorch version) was creating the environment with the following yaml:

name: segment_anything_cpu
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python==3.8.*
  - geopandas~=0.11.1
  - ipython~=8.5.0
  - ipywidgets~=8.0.2
  - jupyter~=1.0.0
  - matplotlib~=3.6.0
  - numpy~=1.23.3
  - pip~=22.2.0
  - pandas~=1.5.0
  - rasterio~=1.3.2
  - shapely~=1.8.4
  - tqdm~=4.64.1
  - scikit-image~=0.20.0
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cpu
      - torch~=2.0.0
      - torchvision~=0.15.0
      - torchaudio~=2.0.0
      - git+https://github.com/facebookresearch/segment-anything.git
      - ../../src/vibe_core
      - cartopy~=0.21.0
      - xarray~=2022.10.0
      - ipympl~=0.9.3
      - onnx~=1.14.0
      - onnxruntime~=1.15.0

by running:

$ micromamba env create -f notebooks/segment_anything/env_cpu.yaml

With the environment activated:

$ python -c "import torch; print(torch.__version__)"
2.0.1+cpu
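The version strings printed in this thread (2.1.2+cpu and 2.0.1+cpu) carry two pieces of information: the release that the ~= specifier resolved to, and a "+cpu" local label showing the wheel came from the CPU index. The sketch below is a simplified illustration of the compatible-release rule for purely numeric versions, not pip's actual resolver, and the function names are hypothetical:

```python
def split_local(version):
    """Split '2.1.2+cpu' into the public version '2.1.2' and the local label 'cpu'."""
    public, _, local = version.partition("+")
    return public, local


def matches_compatible_release(version, base):
    """Simplified check of `~=base` for numeric versions such as 2.1.0.

    `~=X.Y.Z` means: at least X.Y.Z, with every component except the last
    held fixed (so ~=2.1.0 accepts 2.1.2 but rejects 2.2.0).
    """
    public, _ = split_local(version)  # the local label is ignored when matching
    got = tuple(int(part) for part in public.split("."))
    want = tuple(int(part) for part in base.split("."))
    return got[: len(want) - 1] == want[:-1] and got >= want


if __name__ == "__main__":
    for version, base in (("2.1.2+cpu", "2.1.0"), ("2.0.1+cpu", "2.0.0")):
        public, local = split_local(version)
        print(version, "satisfies ~=" + base + ":",
              matches_compatible_release(version, base), "| build label:", local)
```

This is why torch~=2.1.0 installed 2.1.2 and torch~=2.0.0 installed 2.0.1: both are the newest patch releases within the pinned minor version.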

@marias65
Author

Thank you! I rebuilt farmvibes-ai and followed your latest solution, and that seems to have helped!

image

Right now I receive this message, but looking into it further suggests that it is due to limited memory on the machine I am currently using. Otherwise, I would say that it worked. Thank you!

@rafaspadilha
Contributor

I'm glad that error is fixed.

For this new one, the script doesn't require that much memory, especially with the vit_b model. What are your specs (memory and disk space)?
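A quick way to collect those numbers is sketched below. It assumes a Linux machine, where os.sysconf exposes SC_PHYS_PAGES and SC_PAGE_SIZE. The function name report_specs is hypothetical:

```python
import os
import shutil


def report_specs(path="."):
    """Return total physical memory and disk usage for `path`, in GiB."""
    disk = shutil.disk_usage(path)
    # Assumption: Linux-style sysconf names; other platforms may not define them.
    memory_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    gib = 2 ** 30
    return {
        "memory_total_gib": memory_bytes / gib,
        "disk_total_gib": disk.total / gib,
        "disk_free_gib": disk.free / gib,
    }


if __name__ == "__main__":
    for name, value in report_specs().items():
        print(f"{name}: {value:.1f}")
```

Note that this reports total memory, not the amount currently free, so it only gives an upper bound on what the export script can use.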

The script also logs a few messages (e.g., when it loads the encoder/decoder model and when it starts converting them), but these didn't show up, which I find odd.

Are you able to import onnxruntime and onnx?

import onnx
import onnxruntime
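If either import fails, it can also help to see which versions pip actually installed without importing the packages at all. A minimal sketch using the standard library's importlib.metadata (available from Python 3.8). The helper name installed_versions is hypothetical, and the package names are the ones pinned in the yaml above:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["onnx", "onnxruntime", "torch"]).items():
        print(name, version if version is not None else "not installed")
```

A package that shows a version here but still fails on import points to a broken binary rather than a missing install.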

@rafaspadilha
Contributor

Closing this issue for now. @marias65, let me know if you are still facing this error.
