updating crosstab #167

Merged
merged 6 commits into from
Oct 10, 2023

Conversation

mahaalbashir
Contributor

Solving the problem of shape mismatch when there are two columns and the aggfunc is count or sum

@codecov

codecov bot commented Oct 5, 2023

Codecov Report

Merging #167 (79e2852) into main (eb1a405) will increase coverage by 0.79%.
Report is 6 commits behind head on main.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   98.82%   99.61%   +0.79%     
==========================================
  Files           9        9              
  Lines        1020     1038      +18     
==========================================
+ Hits         1008     1034      +26     
+ Misses         12        4       -8     
Files Coverage Δ
acro/acro_tables.py 99.76% <100.00%> (+1.98%) ⬆️

@mahaalbashir
Contributor Author

I now delete the empty columns from the table regardless of the aggregation function. Although the errors only occurred when the aggfunc is count or sum, the masks always drop the columns that contain only zeros, so I think we always want those columns deleted from the table as well.
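A minimal sketch of the idea, not the code in `acro/acro_tables.py`: the helper name `drop_empty_columns` and the toy table are hypothetical, and "empty" is assumed here to mean a column whose cells are all zero or NaN.

```python
import pandas as pd


def drop_empty_columns(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose cells are all zero or NaN."""
    # True wherever a cell holds a real, non-zero value.
    nonempty = table.fillna(0) != 0
    # Keep only the columns that have at least one such cell.
    return table.loc[:, nonempty.any(axis=0)]


# Toy count table: nothing was observed in the "dead" column.
table = pd.DataFrame({"alive": [3, 5], "dead": [0, 0]}, index=[2010, 2011])
cleaned = drop_empty_columns(table)
print(cleaned)
```

With this applied, the table keeps the same columns as a mask that has already lost its all-zero columns, so the two shapes should agree. Swapping the axes (`any(axis=1)` as a row selector) would give the equivalent treatment for rows.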

@jim-smith
Contributor

@mahaalbashir
So it looks like your solution always deletes empty columns from tables, even if suppress==False.

A couple of comments:

  1. It looks like Stata does this by default for frequency tables (but not for interaction coefficients), so that won't be too unexpected for researchers.
  2. Can you confirm the circumstances under which this is the default behaviour for crosstab anyway, please?
  • I think from what you said it already does this for mean and standard deviation, but not for count and sum?
  3. Your code only does this for columns. Does this never apply to rows?

@mahaalbashir
Contributor Author

mahaalbashir commented Oct 9, 2023

@jim-smith

  1. The solution always deletes empty columns from tables, even if suppress==False, because in the current version of the code the masks are applied to the table and the suppressed table is calculated even if suppress==False. Then, if suppress is true, the returned table is the suppressed table; otherwise it is the original table.

  2. The circumstances under which this is the default behaviour for crosstab:

  • What I noticed while running the pandas version of crosstab with different aggfuncs is that, if the survivor column is used and the aggfunc is mean or std, the empty cells are represented as NaN. Therefore, a column whose values are all empty is deleted. However, if the aggfunc is count or sum, the empty cells are represented as zeros. Therefore, an empty column is not deleted and all its values are zeros.
  • If the status column is used, the empty cells are represented as NaN regardless of the aggfunc, and any columns whose values are all empty are deleted.
  • The difference between the status and the survivor columns is that the status column is of type object while the survivor column is of type category.
  3. It happens for rows as well. I will include that in the code.
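The dtype effect described above can be reproduced in a small sketch. It uses `groupby` with an explicit `observed=False` rather than `pd.crosstab` itself, because how crosstab treats unobserved categories may differ between pandas versions; the toy data is hypothetical, and only the column names `survivor` and `status` come from the discussion.

```python
import pandas as pd

# Toy data: the "dead" category is declared but never observed.
df = pd.DataFrame(
    {
        "survivor": pd.Categorical(
            ["alive", "alive", "alive", "alive"], categories=["alive", "dead"]
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
# Same labels, but stored as plain objects instead of a category.
df["status"] = df["survivor"].astype(object)

# Category dtype: the unobserved group is kept in the result.
cat_sum = df.groupby("survivor", observed=False)["value"].sum()
cat_mean = df.groupby("survivor", observed=False)["value"].mean()

# Object dtype: only the labels that actually occur form groups.
obj_sum = df.groupby("status")["value"].sum()

print(cat_sum)   # "dead" is present with a sum of 0
print(cat_mean)  # "dead" is present with a mean of NaN
print(obj_sum)   # "dead" does not appear at all
```

An all-NaN column is removed by a `dropna(how="all")` step, whereas an all-zero column survives it, which is consistent with the difference between mean/std and count/sum reported here.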

Maha Albashir and others added 5 commits October 9, 2023 19:53
Added text to docstring

Signed-off-by: Jim-smith <jim-smith@users.noreply.github.com>
fixing typos

Signed-off-by: Jim-smith <jim-smith@users.noreply.github.com>
@jim-smith jim-smith merged commit 193754e into main Oct 10, 2023
4 checks passed
@jim-smith jim-smith deleted the updating_crosstab branch October 10, 2023 16:42
Development

Successfully merging this pull request may close these issues.

Shape mismatch when there are two columns and the aggfunc is count or sum
2 participants