Breaking change in numpy changes the output of PyLanczos! #5

Open
LUK4S-B opened this issue Dec 31, 2024 · 4 comments

Comments

@LUK4S-B

LUK4S-B commented Dec 31, 2024

NumPy was updated to version 2.x. This introduced breaking changes that apparently affect PyLanczos.
This is the output I get when running PyLanczos 2.1.1 with the older NumPy 1.26.4:

>>> import numpy as np
>>> np.version.version
'1.26.4'
>>> from pylanczos import PyLanczos
>>> M = np.array([[2.0, 0],[0,2]])
>>> M
array([[2., 0.],
       [0., 2.]])
>>> v, U = PyLanczos(-M, True, 2).run()
>>> v
array([-2., -2.])
>>> U
array([[-0.98037217, -0.1971558 ],
       [ 0.1971558 , -0.98037217]])

And this is the output when running the same PyLanczos 2.1.1 with the newer NumPy 2.0.2:

>>> import numpy as np
>>> np.version.version
'2.0.2'
>>> from pylanczos import PyLanczos
>>> M = np.array([[2.0, 0],[0,2]])
>>> M
array([[2., 0.],
       [0., 2.]])
>>> v, U = PyLanczos(-M, True, 2).run()
>>> v
array([-1.11022302e-16, -2.00000000e+00])
>>> U
array([[-1.11022302e-16,  1.00000000e+00],
       [-1.00000000e+00, -2.22044605e-16]])

The new results are wrong: both eigenvalues of -M should be -2, but one of them comes out as approximately 0. This needs to be fixed urgently. It took me a long time to figure out that my code was malfunctioning because of this issue. Presumably it is related to the changes described in the NumPy migration guide.
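Because -M has a doubly degenerate eigenvalue, any orthonormal basis is a valid set of eigenvectors, so differing U matrices between runs are not by themselves a sign of a bug. A check that does not depend on the choice of basis is the residual of each eigenpair. The following is a minimal sketch using only NumPy; the helper names `eigenpair_residuals` and `looks_valid` are made up for illustration and are not part of PyLanczos.

```python
import numpy as np

def eigenpair_residuals(A, v, U):
    """Return ||A u_i - v_i u_i|| for each column u_i of U."""
    A = np.asarray(A, dtype=float)
    v = np.asarray(v, dtype=float)
    U = np.asarray(U, dtype=float)
    # U * v scales column i of U by v[i] via broadcasting.
    return np.linalg.norm(A @ U - U * v, axis=0)

def looks_valid(A, v, U, tol=1e-6):
    """True if every residual is small and the columns of U are orthonormal."""
    U = np.asarray(U, dtype=float)
    residual_ok = np.all(eigenpair_residuals(A, v, U) < tol)
    orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]), atol=tol)
    return bool(residual_ok and orthonormal)

M = np.array([[2.0, 0], [0, 2]])

# Values copied from the NumPy 1.26.4 session above.
v_old = np.array([-2.0, -2.0])
U_old = np.array([[-0.98037217, -0.1971558],
                  [0.1971558, -0.98037217]])

# Values copied from the NumPy 2.0.2 session above.
v_new = np.array([-1.11022302e-16, -2.0])
U_new = np.array([[-1.11022302e-16, 1.0],
                  [-1.0, -2.22044605e-16]])

print(looks_valid(-M, v_old, U_old))
print(looks_valid(-M, v_new, U_new))
```

In the second result the first column of U is still a unit vector, but it is paired with an eigenvalue near 0 instead of -2, so its residual is about 2.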

@mrcdr
Owner

mrcdr commented Jan 1, 2025

@LUK4S-B
Thank you for reporting this issue. I'm investigating it but have not yet managed to reproduce the problem, even in a numpy 2.x environment (both 2.0.2 and the latest, 2.2.1). If pylanczos was installed (built) with pybind11<2.12, which only supports numpy 1.x, that might cause problems after upgrading numpy from 1.x to 2.x (I should specify the pybind11 version in the build config, though).
Some OS-specific condition may also cause the compatibility problem. Could you tell me which pybind11 version was used to build pylanczos, and your environment (OS, etc.)?

Log

$ pip list
Package   Version
--------- -------
numpy     2.2.1
pip       24.2
pybind11  2.13.6
pylanczos 2.1.1
$ python
Python 3.12.5 (main, Aug  9 2024, 08:20:41) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> from pylanczos import PyLanczos
>>> M = np.array([[2.0, 0],[0,2]])
>>> M
array([[2., 0.],
       [0., 2.]])
>>> v, U = PyLanczos(-M, True, 2).run()
>>> v
array([-2., -2.])
>>> U
array([[ 0.40808049, -0.91294595],
       [ 0.91294595,  0.40808049]])
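The version and platform details asked for above can be gathered in one step with the standard library. This is only a sketch: the function name `collect_versions` is made up here, and the pybind11 version it reports is the one currently installed, which may differ from the one actually used when pylanczos was built (for example if pip built it in an isolated build environment).

```python
import platform
from importlib import metadata

def collect_versions(packages=("numpy", "pybind11", "pylanczos")):
    """Return interpreter, platform, and installed package versions as a dict."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info[name] = None
    return info

if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```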

@LUK4S-B
Author

LUK4S-B commented Jan 6, 2025

Thanks for getting back to this so quickly. You are right that the issue is more subtle. I have now created a minimal environment that exhibits the issue, and I list the pybind11 version below. For installation, I proceeded as follows:

conda create -n "lanczos_test"
conda activate lanczos_test
conda install numpy=2.0.2
conda install pip
pip install pylanczos

However, I noticed that the issue only arises when performing these installations on a remote server. I am not yet sure why, but this might mean that the issue depends on the system. Alternatively, the reason might be that I use miniforge conda for installation on the server.

conda env export > env.yml produces this output:

name: lanczos_test
channels:
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=h4bc722e_7
  - ca-certificates=2024.12.14=hbcca054_0
  - ld_impl_linux-64=2.43=h712a8e2_2
  - libblas=3.9.0=26_linux64_openblas
  - libcblas=3.9.0=26_linux64_openblas
  - libexpat=2.6.4=h5888daf_0
  - libffi=3.4.2=h7f98852_5
  - libgcc=14.2.0=h77fa898_1
  - libgcc-ng=14.2.0=h69a702a_1
  - libgfortran=14.2.0=h69a702a_1
  - libgfortran5=14.2.0=hd5240d6_1
  - libgomp=14.2.0=h77fa898_1
  - liblapack=3.9.0=26_linux64_openblas
  - liblzma=5.6.3=hb9d3cd8_1
  - libnsl=2.0.1=hd590300_0
  - libopenblas=0.3.28=pthreads_h94d23a6_1
  - libsqlite=3.47.2=hee588c1_0
  - libstdcxx=14.2.0=hc0a3c3a_1
  - libuuid=2.38.1=h0b41bf4_0
  - libxcrypt=4.4.36=hd590300_1
  - libzlib=1.3.1=hb9d3cd8_2
  - ncurses=6.5=he02047a_1
  - numpy=2.0.2=py312h58c1407_1
  - openssl=3.4.0=hb9d3cd8_0
  - pip=24.3.1=pyh8b19718_2
  - python=3.12.8=h9e4cc4f_1_cpython
  - python_abi=3.12=5_cp312
  - readline=8.2=h8228510_1
  - setuptools=75.6.0=pyhff2d567_1
  - tk=8.6.13=noxft_h4845f30_101
  - tzdata=2024b=hc8b5060_0
  - wheel=0.45.1=pyhd8ed1ab_1
  - pip:
      - pybind11==2.13.6
      - pylanczos==2.1.1

However, when I copied this into a file lanc_env.yml and ran the following on my local machine

conda env create -f lanc_env.yml
conda activate lanczos_test

the issue did not occur. So I am no longer sure why it happens on the remote server.
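One way to confirm that the local and remote environments really resolved to the same packages is to export both and compare them. The sketch below is a deliberately simple line-based parser for the `conda env export` format shown above; it is not a full YAML parser, and the function names `parse_conda_deps` and `diff_envs` are hypothetical.

```python
def parse_conda_deps(yml_text):
    """Map package name -> version (plus build string for conda packages)."""
    deps = {}
    for line in yml_text.splitlines():
        item = line.strip()
        if not item.startswith("- "):
            continue
        spec = item[2:].strip()
        if "==" in spec:
            # pip-style entry, e.g. "pybind11==2.13.6"
            name, version = spec.split("==", 1)
            deps[name] = version
        elif "=" in spec:
            # conda-style entry, e.g. "numpy=2.0.2=py312h58c1407_1"
            name, rest = spec.split("=", 1)
            deps[name] = rest
        # Entries without "=" (channel names, "pip:") are skipped.
    return deps

def diff_envs(yml_a, yml_b):
    """Return {name: (version_in_a, version_in_b)} for every mismatch."""
    a, b = parse_conda_deps(yml_a), parse_conda_deps(yml_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

if __name__ == "__main__":
    import sys
    # Usage (file names are placeholders): python diff_envs.py remote.yml local.yml
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for name, (va, vb) in diff_envs(fa.read(), fb.read()).items():
            print(f"{name}: {va} != {vb}")
```

An empty diff would suggest that the difference lies outside the package set, for instance in the hardware or system libraries.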

Platform info of remote machine:

  OS: Linux (x86_64-linux-gnu)
  CPU: 144 × Intel(R) Xeon(R) CPU E7-8867 v4 @ 2.40GHz
  WORD_SIZE: 64

Maybe you can try to reproduce the issue using miniforge, and if this also does not give the same results, then it might be necessary to check how things might depend on blas library versions or other dependencies.

@mrcdr
Owner

mrcdr commented Jan 14, 2025

@LUK4S-B

Sorry for the late reply.
I tried your lanc_env.yml with miniforge, but I still obtained the correct eigenvalues. As you pointed out, pylanczos uses BLAS through numpy's dot function (numpy doc; used here). I suspect that numerical errors due to multi-threaded BLAS might be causing the issue, but I need to investigate further.
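A quick way to test the multi-threaded BLAS hypothesis is to rerun the reproduction with BLAS limited to a single thread and see whether the result changes. The sketch below runs a snippet in a fresh interpreter with the usual thread-count environment variables set, since they generally need to be in place before numpy is imported. The helper name `run_with_blas_threads` is made up for illustration, and which variable takes effect depends on the BLAS build in use (the exported environment above lists a pthreads OpenBLAS build).

```python
import os
import subprocess
import sys

def run_with_blas_threads(snippet, n_threads):
    """Run a Python snippet in a subprocess with BLAS thread counts pinned."""
    env = dict(os.environ)
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        env[var] = str(n_threads)
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# The reproduction from this thread, as a string to run in the subprocess.
REPRO = """
import numpy as np
from pylanczos import PyLanczos
M = np.array([[2.0, 0], [0, 2]])
v, U = PyLanczos(-M, True, 2).run()
print(v)
"""

if __name__ == "__main__":
    print(run_with_blas_threads(REPRO, 1))
```

If the single-threaded run gives the correct eigenvalues on the affected server while the default run does not, that would point toward the BLAS threading setup rather than pylanczos itself.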

@LUK4S-B
Author

LUK4S-B commented Jan 14, 2025

Thanks for getting back to this. Okay, if you tried exactly this lanc_env.yml with miniforge, and the issue was not reproduced, then I am also not sure anymore what causes the problem. But at least it excludes miniforge and leaves only specific dependencies as possible causes. I am sorry, this seems to be a more subtle issue than I expected.
