Shape mismatch when there are two columns and the aggfunc is count or sum #163

Closed
mahaalbashir opened this issue Oct 3, 2023 · 1 comment · Fixed by #167
Comments

@mahaalbashir
Contributor

table = acro.crosstab(df.year, [df.grant_type, df.survivor], values=df.inc_grants, aggfunc="aggfunc", margins=True)
This command, with aggfunc set to count or sum, raises the error ValueError: Array conditional must be same shape as self. The test for this case is the function test_crosstab_with_sum in the test-initial.

Explanation
When pd.crosstab is used, the outcome depends on the aggfunc:

  1. Sum or count
    columns with zeros are not deleted
  2. Mean or std
    columns with zeros are deleted

The threshold mask is created using the count function, which by default (in the pandas version) doesn’t delete a column that is all zeros. So the threshold mask is originally created with the columns that have zeros in all cells. After the threshold mask has been created, every column that is all zeros is deleted from it.
The p-ratio and nk-rule masks by default don’t show columns with zeros. When the command is run:

  1. If the agg function is mean or std, the resulting table has no columns with zeros, so the masks and the table are the same shape.
  2. If the agg function is sum or count, the resulting table keeps the columns with zeros, so the masks and the table are not the same shape.
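
A minimal sketch of this behaviour in plain pandas, not the ACRO code itself. It assumes a tiny made-up dataset in which one (grant_type, survivor) combination has only missing inc_grants values; the data, the threshold of 2, and the use of to_numpy() to pass the mask as an array are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: the ("N", "dead") combination has only missing values.
df = pd.DataFrame(
    {
        "year": [2010, 2010, 2011, 2011, 2010, 2011],
        "grant_type": ["G", "G", "G", "G", "N", "N"],
        "survivor": ["alive", "alive", "alive", "alive", "dead", "dead"],
        "inc_grants": [1.0, 2.0, 3.0, 4.0, np.nan, np.nan],
    }
)


def make_table(aggfunc):
    """Build the two-column crosstab with the given aggregation."""
    return pd.crosstab(
        df.year,
        [df.grant_type, df.survivor],
        values=df.inc_grants,
        aggfunc=aggfunc,
    )


# sum (and count) turn the all-missing group into zeros, so its column stays.
sum_table = make_table("sum")
# mean turns the same group into NaN, and crosstab drops all-NaN columns.
mean_table = make_table("mean")

# A mask built from the narrower table no longer fits the wider one.
mean_mask = mean_table < 2  # illustrative threshold, not an ACRO rule
try:
    sum_table.where(~mean_mask.to_numpy())
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(sum_table.shape, mean_table.shape)
print(error_message)
```

If this sketch is representative, whether the error appears would depend on whether some column combination has only missing values in the values column.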
@mahaalbashir
Contributor Author

I think the above explanation is not entirely accurate. I tested the same scenario with different columns. This command doesn't work and throws an error:
table = acro.crosstab(df.year, [df.grant_type, df.survivor], values=df.inc_grants, aggfunc="aggfunc", margins=True)
However, replacing the second column df.survivor with df.status seems to work fine:
table = acro.crosstab(df.year, [df.grant_type, df.status], values=df.inc_grants, aggfunc="aggfunc", margins=True)

The suggested solution, deleting the columns with zeros from the table before applying the masks, will work fine and solve the issue, but I am not sure why different columns resulted in different behavior. @jim-smith
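
A rough sketch of the suggested solution, assuming the table and the mask are pandas DataFrames. The helper name drop_all_zero_columns and the toy table and mask are hypothetical, and this is not necessarily how the linked pull request implements it.

```python
import pandas as pd


def drop_all_zero_columns(table):
    """Return the table without columns whose cells are all zero (hypothetical helper)."""
    keep = (table != 0).any(axis=0)
    return table.loc[:, keep]


# Toy table with one all-zero column, and a mask that lacks that column.
table = pd.DataFrame({"a": [3.0, 7.0], "b": [0.0, 0.0]}, index=[2010, 2011])
mask = pd.DataFrame({"a": [False, True]}, index=[2010, 2011])

# After trimming, the table and the mask have the same shape again.
trimmed = drop_all_zero_columns(table)
masked = trimmed.where(~mask.to_numpy())
print(masked)
```

One caveat with this approach: a column whose true values are all zero would be dropped as well, so it only suits cases where such columns carry no information.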

mahaalbashir linked a pull request Oct 5, 2023 that will close this issue