[BUG] multi-dimensional arrays are being accepted as inputs to column constructor #14151

Closed
galipremsagar opened this issue Sep 21, 2023 · 0 comments
Assignees: galipremsagar
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

Describe the bug
A multi-dimensional array is accepted as input to a column constructor if it implements __cuda_array_interface__. Instead of raising an error, the constructor silently flattens the array into a 1-dimensional column.

Steps/Code to reproduce bug

In [25]: import numpy as np

In [26]: import cudf

In [27]: arr = np.array([(1, 2), (3, 4)])

In [28]: import pandas as pd

In [29]: pd.Series(arr)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[29], line 1
----> 1 pd.Series(arr)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/series.py:470, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    468         data = data.copy()
    469 else:
--> 470     data = sanitize_array(data, index, dtype, copy)
    472     manager = get_option("mode.data_manager")
    473     if manager == "block":

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/construction.py:647, in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
    644             subarr = cast(np.ndarray, subarr)
    645             subarr = maybe_infer_to_datetimelike(subarr)
--> 647 subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
    649 if isinstance(subarr, np.ndarray):
    650     # at this point we should have dtype be None or subarr.dtype == dtype
    651     dtype = cast(np.dtype, dtype)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/construction.py:698, in _sanitize_ndim(result, data, dtype, index, allow_2d)
    696     if allow_2d:
    697         return result
--> 698     raise ValueError("Data must be 1-dimensional")
    699 if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
    700     # i.e. PandasDtype("O")
    702     result = com.asarray_tuplesafe(data, dtype=np.dtype("object"))

ValueError: Data must be 1-dimensional

In [30]: cudf.Series(arr)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[30], line 1
----> 1 cudf.Series(arr)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
     98 @wraps(func)
     99 def inner(*args, **kwargs):
    100     libnvtx_push_range(self.attributes, self.domain.handle)
--> 101     result = func(*args, **kwargs)
    102     libnvtx_pop_range(self.domain.handle)
    103     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/series.py:602, in Series.__init__(self, data, index, dtype, name, copy, nan_as_null)
    584 if not isinstance(data, ColumnBase):
    585     # Using `getattr_static` to check if
    586     # `data` is on device memory and perform
   (...)
    592     # be expensive or mark a buffer as
    593     # unspillable.
    594     has_cai = (
    595         type(
    596             inspect.getattr_static(
   (...)
    600         is property
    601     )
--> 602     data = column.as_column(
    603         data,
    604         nan_as_null=nan_as_null,
    605         dtype=dtype,
    606         length=len(index) if index is not None else None,
    607     )
    608     if copy and has_cai:
    609         data = data.copy(deep=True)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:2115, in as_column(arbitrary, nan_as_null, dtype, length)
   2113 # CUDF assumes values are always contiguous
   2114 if len(shape) > 1:
-> 2115     raise ValueError("Data must be 1-dimensional")
   2117 arbitrary = np.asarray(arbitrary)
   2119 # Handle case that `arbitrary` elements are cupy arrays

ValueError: Data must be 1-dimensional

In [31]: cp.array(arr)
Out[31]: 
array([[1, 2],
       [3, 4]])

In [32]: cudf.Series(cp.array(arr))
Out[32]: 
0    1
1    2
2    3
3    4
dtype: int64

In [33]: cp.array(arr).shape
Out[33]: (2, 2)

Expected behavior
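cudf.Series should raise ValueError("Data must be 1-dimensional") for a multi-dimensional CuPy array, as pandas does and as cuDF already does for the equivalent NumPy array.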
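
For illustration, the kind of guard that would reject the input in In [32] can be sketched in plain Python. This is a minimal sketch, not cuDF's actual fix: `check_one_dimensional` and `FakeDeviceArray` are hypothetical names, and the fake class only mimics the "shape" entry of `__cuda_array_interface__` so the example runs without a GPU.

```python
class FakeDeviceArray:
    """Hypothetical stand-in for a device array such as a CuPy ndarray."""

    def __init__(self, shape):
        # Only "shape" matters for this check; the other keys are
        # placeholder values that mirror the interface's usual fields.
        self.__cuda_array_interface__ = {
            "shape": tuple(shape),
            "typestr": "<i8",
            "data": (0, False),
            "version": 3,
        }


def check_one_dimensional(obj):
    """Raise ValueError if obj advertises more than one dimension.

    Hypothetical helper: inspects the shape exposed through
    __cuda_array_interface__ before any column is constructed.
    """
    iface = getattr(obj, "__cuda_array_interface__", None)
    if iface is not None and len(iface["shape"]) > 1:
        raise ValueError("Data must be 1-dimensional")
    return obj


# A 1-D array passes through unchanged; a (2, 2) array is rejected.
check_one_dimensional(FakeDeviceArray((4,)))
try:
    check_one_dimensional(FakeDeviceArray((2, 2)))
except ValueError as err:
    print(err)
```

With a check like this on the device-array path, the (2, 2) input from In [33] would fail with the same message the NumPy path raises in In [30].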

Environment overview

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]

galipremsagar added the bug and Python labels on Sep 21, 2023
galipremsagar self-assigned this on Sep 21, 2023
github-project-automation (bot) moved this from In Progress to Done in cuDF/Dask/Numba/UCX on Sep 30, 2023