Segmentation fault in python package #192

Open
ynanli opened this issue May 27, 2023 · 6 comments


ynanli commented May 27, 2023

Hi,

I am trying to use steinbock on an HPC cluster, where it is installed in a Python 3.8 conda environment. steinbock and its dependencies were installed through pip using the provided requirements.txt:
https://github.com/BodenmillerGroup/steinbock/blob/main/requirements.txt

However, I am stuck at the segmentation step: no mask images are generated.

(steinbock) [rec82ces@hilbert214 rec82ces]$ steinbock segment deepcell --minmax
2023-05-27 11:52:55.095298: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/rec82ces/.local/lib/python3.8/site-packages/cv2/../../lib64:/software/conda/3//lib:/lib64:/usr/lib64:/usr/local/lib64:/usr/X11R6/lib64
2023-05-27 11:52:55.095337: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Segmentation fault
(steinbock) [rec82ces@hilbert214 rec82ces]$ ls
images.csv  img  masks  panel.csv  raw
(steinbock) [rec82ces@hilbert214 rec82ces]$ ls masks
(steinbock) [rec82ces@hilbert214 rec82ces]$ ls img
Fibro_SSc194994_20220207_Aleix_001.tiff

May I ask what might be the problem?
Many thanks!!

Contributor

Milad4849 commented May 30, 2023

Hi ynanli,

Your error has to do with TensorFlow and the configuration of GPU computation on your HPC cluster (see here). To confirm this, you can try segmentation via ilastik/CellProfiler or Cellpose in steinbock (see here); these do not require TensorFlow and should run without issue.
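
One way to check whether importing TensorFlow alone triggers the crash is to import it in a child interpreter, so that a segfault only kills the child process. The sketch below is an illustration, not part of steinbock; the helper name `try_import` is made up here. On Linux, a return code of -11 means the child was killed by SIGSEGV.

```python
import subprocess
import sys


def try_import(module_name):
    """Import a module in a child interpreter and report how the child exited.

    Hypothetical helper for illustration; not part of steinbock.
    Returns (returncode, stderr_text).
    """
    result = subprocess.run(
        # -X faulthandler makes the child print a Python traceback on a segfault
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # On POSIX, a negative return code -N means "terminated by signal N"
    return result.returncode, result.stderr
```

Calling `try_import("tensorflow")`, `try_import("deepcell")` and `try_import("cellpose")` in turn should show which import, if any, crashes the interpreter; the faulthandler traceback in the returned stderr text points at the Python frame that was active at the time.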

Author

ynanli commented May 31, 2023

Hi @Milad4849 ,

Thanks a lot for the suggestion. We tried to resolve the error by running on a GPU node and loading the CUDA packages. No more errors pop up, but the deepcell segmentation still does not run.

(steinbock) [rec82ces@hilbert300 rec82ces]$ steinbock --version
steinbock, version 0.16.1
(steinbock) [rec82ces@hilbert300 rec82ces]$ steinbock segment deepcell --minmax
Segmentation fault (core dumped)

ilastik/CellProfiler seems to work, as it generated a CellProfiler pipeline file, but I have not tried the segmentation itself yet.

Cellpose does not work. As soon as cellpose is installed, steinbock does not work at all.

  Successfully installed cellpose-2.2.2
(steinbock) [rec82ces@hilbert300 rec82ces]$ steinbock segment deepcell 
Segmentation fault (core dumped)
(steinbock) [rec82ces@hilbert300 rec82ces]$ steinbock --version
Segmentation fault (core dumped)
(steinbock) [rec82ces@hilbert300 rec82ces]$ pip uninstall cellpose
  Successfully uninstalled cellpose-2.2.2
(steinbock) [rec82ces@hilbert300 rec82ces]$ steinbock --version
steinbock, version 0.16.0

Other than segmentation, the rest of the steinbock functions seem to work quite well, e.g., measuring intensities/neighbors and exporting csv.
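
Since installing cellpose changes the behaviour of the whole environment, it may help to see which packages pip touched. The sketch below compares two saved `pip freeze` outputs; it is only an illustration, and the helper names `parse_freeze` and `diff_freeze` are made up here.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {package_name: version} dict.

    Only plain `name==version` lines are considered; editable installs,
    URL requirements and comments are skipped.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            packages[name.lower()] = version
    return packages


def diff_freeze(before_text, after_text):
    """Return {name: (old_version, new_version)} for every package that differs.

    A version of None means the package is absent on that side.
    """
    before = parse_freeze(before_text)
    after = parse_freeze(after_text)
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes
```

To use it, save `pip freeze > before.txt` before installing cellpose and `pip freeze > after.txt` afterwards, then pass the contents of both files to `diff_freeze`. Any dependency of tensorflow or deepcell that shows up with a changed version is a candidate for the incompatibility.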

Contributor

jwindhager commented Jun 1, 2023

Hi @ynanli,

I assume you are installing steinbock as a Python package, instead of using the steinbock Docker container. If you do so, you need to make sure that deepcell and tensorflow packages (and potentially the GPU driver/CUDA library versions and the GPU) are compatible.

It is likely that the original error you observed was because your tensorflow package was linked against CUDA (typical on cluster environments with GPU support), but you didn't load the CUDA module on the machine. This you correctly resolved by loading the CUDA module.

The current error likely appears because of package version incompatibilities in your environment. Could you please let us know what versions of tensorflow (pip list) and CUDA are installed/loaded, and what GPU model you are using?

Another way this might go wrong is that by loading the CUDA module on your cluster, you implicitly load some tensorflow module "over" the tensorflow package installed in your environment, causing incompatibilities.

It may be a good idea to involve your system administrator at this point. Alternatively, you could use the steinbock Docker container (GPU-enabled or not), or try running steinbock with Singularity (if your cluster does not support docker; undocumented & untested, but should work).
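
To check whether a cluster module shadows the packages installed in the environment, one can look up where Python would load each package from without actually importing it (which matters when the import itself segfaults). This is a minimal sketch using only the standard library; the helper name `describe_package` is made up here and is not part of steinbock.

```python
import importlib.util
from importlib import metadata  # standard library since Python 3.8


def describe_package(name):
    """Report the installed version and file location of a top-level package.

    Hypothetical helper for illustration. For a top-level name, find_spec
    locates the package without executing its code.
    """
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    return {
        "name": name,
        "version": version,
        "origin": spec.origin if spec is not None else None,
    }
```

Running `describe_package("tensorflow")` once before and once after `module load CUDA/11.4.3` and comparing the `origin` paths should show whether loading the module changes which tensorflow copy Python would pick up. An origin outside the conda environment's site-packages directory would point to such shadowing.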

Author

ynanli commented Jun 6, 2023

Hi @jwindhager,

Thanks for the information. Here are the package versions:

(steinbock) [rec82ces@hpc-storage-14k-1 rec82ces]$ pip list | grep tensorflow
tensorflow                    2.8.4
tensorflow-addons             0.16.1
tensorflow-estimator          2.8.0
tensorflow-io-gcs-filesystem  0.32.0
[rec82ces@hpc-storage-14k-1 rec82ces]$ module load CUDA/11.4.3

  CUDA Toolkit 11.4.3

The GPU model is Nvidia GTX 1080 Ti.
I think we might have pinpointed the problem with our system administrator. Due to our firewall policy, the cluster is not connected to the internet, so we cannot download the DeepCell model from the Amazon cloud. However, I heard it might be possible to download the file manually?
Do you, by any chance, have some experience with it?

You guessed right: Docker is not supported on our cluster. Thanks for the suggestion to use Singularity. Do you think it would work to install steinbock from the Singularity file and run it without internet access?

Thanks a lot!!

Contributor

jwindhager commented Jun 6, 2023

I don't think that this would explain the segfault, but you will indeed need to download the model locally if you don't have an internet connection. You can have a look at how this is solved in the steinbock Docker container (which ships with a copy of the model):

steinbock/Dockerfile

Lines 167 to 168 in bc10207

RUN mkdir -p /opt/keras/models && \
    curl -SsL https://deepcell-data.s3-us-west-1.amazonaws.com/saved-models/MultiplexSegmentation-9.tar.gz | tar -C /opt/keras/models -xzf -

Then, if you use the steinbock Python package on the command line, you can specify the --modeldir parameter of the steinbock segment deepcell command (defaults to /opt/keras/models, matching above download instructions).
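
On a cluster without internet access, the archive from the URL above could be downloaded on another machine, copied over, and unpacked into a directory of your choice. The following is a minimal sketch of the unpacking step; the helper name `extract_model_archive` and all paths are assumptions for illustration, and the internal layout of the archive is not verified here.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, model_dir):
    """Unpack a .tar.gz model archive into model_dir and list what was created.

    Hypothetical helper for illustration; not part of steinbock.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject
            # unsafe members (absolute paths, links escaping the target)
            archive.extractall(model_dir, filter="data")
        else:
            archive.extractall(model_dir)
    # Return the top-level entries now present in model_dir
    return sorted(p.name for p in model_dir.iterdir())
```

After something like `extract_model_archive("MultiplexSegmentation-9.tar.gz", "/path/to/models")`, the same directory would be passed on the command line, for example `steinbock segment deepcell --minmax --modeldir /path/to/models`.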

If you use the steinbock Python package from within a Python script/notebook, you can use the optional model argument to specify the Keras model instance, for example:

model = None
if model_path_or_name is not None:
    from tensorflow.keras.models import load_model  # type: ignore

    if Path(model_path_or_name).exists():
        model = load_model(model_path_or_name, compile=False)
    elif Path(keras_model_dir).joinpath(model_path_or_name).exists():
        model = load_model(
            Path(keras_model_dir).joinpath(model_path_or_name),
            compile=False,
        )

In theory, using steinbock with Singularity should work, but is untested (#159). Maybe the current maintainer of steinbock, @Milad4849, can comment on whether this will be tested anytime in the foreseeable future? Otherwise, if you are willing to give this a try, it would be very helpful to know if this works for you!

@Milad4849
Contributor

Closing due to inactivity
