Fastparquet raises on import with numpy 2.0 rc #923

Closed
phofl opened this issue May 6, 2024 · 5 comments · Fixed by #922

Comments

@phofl

phofl commented May 6, 2024

Describe the issue:

>>> import fastparquet

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0rc1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/__init__.py", line 4, in <module>
    from fastparquet.writer import write, update_file_custom_metadata
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/writer.py", line 15, in <module>
    from fastparquet.api import ParquetFile, partitions, part_ids
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/api.py", line 11, in <module>
    from fastparquet import core, schema, converted_types, encoding, dataframe, writer
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/core.py", line 4, in <module>
    from fastparquet import encoding
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/encoding.py", line 4, in <module>
    from fastparquet.speedups import unpack_byte_array
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/__init__.py", line 4, in <module>
    from fastparquet.writer import write, update_file_custom_metadata
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/writer.py", line 15, in <module>
    from fastparquet.api import ParquetFile, partitions, part_ids
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/api.py", line 11, in <module>
    from fastparquet import core, schema, converted_types, encoding, dataframe, writer
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/core.py", line 4, in <module>
    from fastparquet import encoding
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/encoding.py", line 4, in <module>
    from fastparquet.speedups import unpack_byte_array
  File "fastparquet/speedups.pyx", line 1, in init fastparquet.speedups
ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).

Minimal Complete Verifiable Example:

mamba create -n fastparquet python=3.11
mamba activate fastparquet
pip install "numpy==2.0.0rc1"
pip install fastparquet

import fastparquet

Anything else we need to know?:

Environment:

  • Dask version:
  • Python version: 3.11
  • Operating System: Mac
  • Install method (conda, pip, source): pip
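
When reporting a failure like the one above, it can help to capture the installed versions and the import error in one step. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `try_import` are made up for illustration and are not part of fastparquet or NumPy.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def try_import(module_name):
    """Attempt an import and return (succeeded, error text or None)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a failing compiled extension are both reported here.
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


if __name__ == "__main__":
    print("numpy:", installed_version("numpy"))
    print("fastparquet:", installed_version("fastparquet"))
    print("import fastparquet:", try_import("fastparquet"))
```

In the environment from this report, the last line would be expected to show the `ImportError` from `fastparquet.speedups` rather than a successful import.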
@martindurant
Member

I also saw this when trying to investigate #921.

We might need to wait for a conda-ready numpy 2 and dependencies like pandas.

@bnavigator, how did you manage to build fastparquet and run the tests?

  • A naive pip-install of pandas does not get numpy 2
  • force-installing 2.0.0rc1 brings about the error mentioned in this thread, and
  • following the instructions to call import_array() resulted in the dreaded "size of dtype changed" exception.

@bnavigator

mamba create -n fastparquet python=3.11
mamba activate fastparquet
pip install "numpy==2.0.0rc1"
pip install fastparquet

import fastparquet

Of course you have to compile fastparquet with numpy 2.0.0rc1, not just install the older wheel from PyPI.

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0rc1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
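
The rule in the quoted message can be written down as a small check. This is only a sketch of that one rule: the function names are hypothetical, only major versions are compared, and the minimum supported 1.x runtime and other ABI details are deliberately ignored.

```python
def major(version):
    """Extract the major version number from a string such as '2.0.0rc1'."""
    return int(version.split(".")[0])


def wheel_runs_on(build_numpy, runtime_numpy):
    """Simplified compatibility rule taken from the NumPy message.

    Assumption: only the major version matters. Real wheels also have a
    minimum supported runtime version, which is not modelled here.
    """
    if major(build_numpy) >= 2:
        # Compiled against NumPy 2.x: expected to load on 1.x and 2.x.
        return True
    # Compiled against NumPy 1.x: only safe on a 1.x runtime.
    return major(runtime_numpy) == 1


# The PyPI wheel in this issue was built against 1.x and run on 2.0.0rc1.
print(wheel_runs_on("1.26.4", "2.0.0rc1"))
# A wheel rebuilt against 2.0.0rc1 and run on an older NumPy.
print(wheel_runs_on("2.0.0rc1", "1.26.4"))
```

The version `1.26.4` is just an example of a 1.x release; the thread does not say which 1.x version the published wheel was built against.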

The logs in #921 are from the rpm build in https://build.opensuse.org/package/show/home:bnavigator:numpy/python-fastparquet, before I applied the fix from #922.

Here is a way to build fastparquet with numpy 2 and check in a plain venv:

bump-numpy.patch
diff --git a/pyproject.toml b/pyproject.toml
index fd80deb..61c1f63 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,2 +1,2 @@
 [build-system]
-requires = ["setuptools", "wheel", "Cython >= 0.29.23", "oldest-supported-numpy", "pytest-runner"]
+requires = ["setuptools", "setuptools_scm", "Cython >= 0.29.23", "numpy>=2.0.0rc1"]
diff --git a/requirements.txt b/requirements.txt
index 384b66b..251278d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,5 @@
 pandas>=1.5.0
-numpy>=1.20.3
+numpy>=2.0.0rc1
 cramjam>=2.3
 fsspec
 packaging
diff --git a/setup.py b/setup.py
index d3053c6..b07c16f 100644
--- a/setup.py
+++ b/setup.py
@@ -53,13 +53,6 @@ setup(
         'local_scheme': 'no-local-version',
         'write_to': 'fastparquet/_version.py'
     },
-    setup_requires=[
-        'setuptools>18.0',
-        'setuptools-scm>1.5.4',
-        'Cython',
-        'pytest-runner',
-        'oldest-supported-numpy'
-    ],
     description='Python support for Parquet file format',
     author='Martin Durant',
     author_email='mdurant@anaconda.com',
git clone https://github.com/dask/fastparquet.git
cd fastparquet
patch -p1 < ../bump-numpy.patch
pip wheel -v .
cd ..
python3 -m venv fp_np2
fp_np2/bin/python3 -m pip install fastparquet/fastparquet-2024.2.1.dev1-*.whl
fp_np2/bin/python3
Python 3.11.9 (main, Apr 08 2024, 06:18:15) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastparquet
>>> import numpy
>>> numpy.__version__
'2.0.0rc1'
>>> fastparquet.__version__
'2024.2.1.dev1'
>>>
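
The key change in the patch is the build requirement: `oldest-supported-numpy` is replaced by `numpy>=2.0.0rc1`. A rough way to check a `requires` list for this is sketched below. The function names are hypothetical, the matching is plain string and regex work rather than a real requirement parser, and it assumes `oldest-supported-numpy` resolves to a NumPy 1.x release.

```python
import re

# Matches entries that start with either NumPy requirement name.
_NUMPY_ENTRY = re.compile(r"^(oldest-supported-numpy|numpy)\b", re.IGNORECASE)


def numpy_build_requirements(requires):
    """Return the entries of a build `requires` list that concern NumPy."""
    return [req for req in requires if _NUMPY_ENTRY.match(req.strip())]


def builds_against_numpy2(requires):
    """Guess whether a build `requires` list compiles against NumPy 2.

    Assumption: 'oldest-supported-numpy' means a 1.x build, and only a
    lower bound of the form 'numpy>=2...' counts as a NumPy 2 build.
    """
    entries = [e.strip().lower() for e in numpy_build_requirements(requires)]
    if any(e.startswith("oldest-supported-numpy") for e in entries):
        return False
    return any(re.match(r"^numpy\s*>=\s*2", e) for e in entries)


# The two `requires` lists from the pyproject.toml hunk in the patch.
before = ["setuptools", "wheel", "Cython >= 0.29.23",
          "oldest-supported-numpy", "pytest-runner"]
after = ["setuptools", "setuptools_scm", "Cython >= 0.29.23",
         "numpy>=2.0.0rc1"]

print(builds_against_numpy2(before))
print(builds_against_numpy2(after))
```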

@martindurant
Member

Ah

-        'oldest-supported-numpy'

of course ...

@martindurant
Member

Thanks, @bnavigator, I can build and run the test suite like that and see the failures you fixed in the other PR (I still get warnings on import).

Is, then, the recommendation to build a new set of wheels for release built with the rc1, and expect these should work with older numpy too?

@bnavigator

Is, then, the recommendation to build a new set of wheels for release built with the rc1, and expect these should work with older numpy too?

Exactly: https://numpy.org/devdocs/dev/depending_on_numpy.html#numpy-2-0-specific-advice
