properties struct
dtype: object
Traceback (most recent call last):
... SNIP ...
TypeError: Merlin doesn't provide a mapping from struct (<class 'cudf.core.dtypes.StructDtype'>) to a Merlin dtype. If you'd like to provide one, you can use `merlin.dtype.register()`.
Expected behavior
Since it's a standard cuDF data type, I'd expect NVT to process it correctly, or at least to fall back gracefully.
Environment details (please complete the following information):
Updated repro that illustrates workflow issues in addition to Dataset creation.
import cudf
import nvtabular as nvt
from nvtabular.ops import LambdaOp
from nvtabular.ops.operator import ColumnSelector


def f_to_pandas(col, df):
    # Round-trip the column through pandas and back to cuDF.
    pd_series = col.to_pandas()
    return cudf.from_pandas(pd_series)


def test_cudf_struct_type_conversion():
    input_df = cudf.read_json("example.json")  # different error if we use pd.read_json
    single_op = ColumnSelector("properties") >> LambdaOp(f=f_to_pandas)
    workflow = nvt.Workflow(single_op)
    ds = nvt.Dataset(input_df)
    result = workflow.fit_transform(ds).to_ddf().compute()
    print(result)
This is related to a lower-level issue that happens when converting cuDF struct columns that contain both nulls and empty structs to Pandas. It can be worked around by exploding structs into separate columns with series.struct.explode() before passing data into NVT.
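A minimal sketch of that workaround, for illustration only. It assumes that `series.struct.explode()` returns a DataFrame with one column per struct field, as cuDF's struct accessor does. The helper name `explode_struct_columns` and the `parent_field` column naming are hypothetical and not part of cuDF or NVT, and I have not checked that the flattened columns avoid the error in every case.

```python
def explode_struct_columns(df, struct_columns, sep="_"):
    """Return a new frame with each struct column replaced by flat columns.

    Hypothetical helper: it relies only on `df[name].struct.explode()`
    returning a DataFrame with one column per struct field.
    """
    # Drop the struct columns first; `drop` returns a new frame.
    out = df.drop(columns=list(struct_columns))
    for name in struct_columns:
        fields = df[name].struct.explode()
        for field in fields.columns:
            # Prefix with the parent column name so that fields from
            # different structs cannot collide.
            out[f"{name}{sep}{field}"] = fields[field]
    return out
```

With the repro above, this would be called as `flat_df = explode_struct_columns(input_df, ["properties"])` before building `nvt.Dataset(flat_df)`, so that NVT only sees flat columns.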
Describe the bug
Attempting to create an NVT Dataset from a cuDF DataFrame that contains a struct dtype column fails.
Steps/Code to reproduce bug
Create a test file:
reproducer.py
output