mask values by isnan for dataframe groupby with dropna=True #3762

Open
stress-tess opened this issue Sep 11, 2024 · 0 comments · May be fixed by #3766
@stress-tess
Member

In [32]: df = ak.DataFrame({"A":[1,2,2,np.nan],"B":[3,4,5,6]})

In [33]: df.groupby("A").count()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

ValueError: Attempt to group array using key array of different length

>>> df.groupby("A",dropna=False).count()
     B
A
1.0  1
2.0  2
NaN  1 (3 rows x 1 columns)

Doing a dataframe groupby on a column containing NaNs causes an error because we are trying to do the aggregation on the full column length. We need to mask out the values in the NaN segment. For comparison, pandas drops the NaN group by default:

>>> df.to_pandas().groupby("A").count()
     B
A     
1.0  1
2.0  2
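
The masking idea can be sketched in plain NumPy. This is only an illustration of the approach, not the Arkouda implementation or the change in the linked PR; the variable names (`keep`, `masked_keys`, `masked_values`) are made up for the example. The mask is built once from the key column and applied to both keys and values, so the two arrays keep the same length before aggregating.

```python
import numpy as np

# Same data as the report: "A" is the key column, "B" the value column.
keys = np.array([1.0, 2.0, 2.0, np.nan])
values = np.array([3, 4, 5, 6])

# Build the mask from the keys (True where the key is not NaN).
keep = ~np.isnan(keys)

# Apply the same mask to keys AND values so their lengths still match.
masked_keys = keys[keep]
masked_values = values[keep]

# Group the masked keys; `inverse` maps each row to its group index.
unique_keys, inverse = np.unique(masked_keys, return_inverse=True)

# count(): number of rows in each group.
counts = np.bincount(inverse, minlength=len(unique_keys))

# sum(): shows that the masked values line up with the groups.
sums = np.bincount(inverse, weights=masked_values, minlength=len(unique_keys))

print(unique_keys, counts, sums)
```

Here `counts` comes out as `[1, 2]` for keys `[1.0, 2.0]`, matching the pandas result above. If only the keys were masked, the values array would still have four elements against three keys, which is the kind of length mismatch the error message describes.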
@stress-tess stress-tess added the bug Something isn't working label Sep 11, 2024
@stress-tess stress-tess self-assigned this Sep 11, 2024
stress-tess added a commit to stress-tess/arkouda that referenced this issue Sep 11, 2024
…ontain `NaN`s

This PR (fixes Bears-R-Us#3762) fixes an issue where using dataframe groupby with keys that contain `NaN`s would cause the aggregations to fail. To resolve this, we mask out the values that belong to the `NaN` segment.