Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Remove requirement for equal {participants x sessions} across pipelines/assessments #144

Merged
merged 16 commits into from
Mar 18, 2024

Conversation

alyssadai
Copy link
Collaborator

@alyssadai alyssadai commented Mar 6, 2024

Changes proposed in this pull request:

  • Removed unnecessary check for equal {participants x sessions} across pipelines (for imaging CSVs) or assessments (for phenotypic CSVs)
    • When pivoting the long-format input to wide, pandas will simply create empty cells for any combinations that do not exist in the original data, which we think is fine for now
  • Switched to using a method for reshaping the long input data to wide (pd.pivot -> pd.pivot_table) that has finer control over the aggregation method when there are duplicates, and values used to fill missing values in the resulting pivoted table
    • EDIT: Sticking with pd.pivot due to known problems with pd.pivot_table silently dropping NaNs
  • Added input check for duplicate entries for a given participant-session and pipeline/assessment combination (i.e., when there is >1 value for a participant-session for a specific pipeline or phenotypic measure)
    • When this happens, error out for now (pivot_table could theoretically handle this by keeping only the first occurrence in the resulting pivoted table, but this is not intuitive and may mean relevant data is lost - rather, we probably want to flag and tell users to fix duplicate observations)

Checklist

This section is for the PR reviewer

  • PR has an interpretable title with a prefix ([ENH], [FIX], [REF], [TST], [CI], [MNT], [INF], [MODEL], [DOC]) (see our Contributing Guidelines for more info)
  • PR links to GitHub issue with mention Closes #XXXX
  • Tests pass
  • Checks pass

For new features:

  • Tests have been added

For bug fixes:

  • There is at least one test that would fail under the original bug conditions.

@surchs surchs self-requested a review March 6, 2024 20:21
Copy link
Contributor

@surchs surchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alyssadai! I have not tested this locally, but the code makes sense to me! If you'd like me to test it, could you provide some files to try the new functionality out with?

I left some comments, otherwise looks 🧑‍🍳

digest/utility.py Outdated Show resolved Hide resolved
digest/utility.py Show resolved Hide resolved
digest/utility.py Outdated Show resolved Hide resolved
digest/utility.py Outdated Show resolved Hide resolved
digest/utility.py Outdated Show resolved Hide resolved
tests/test_utility.py Show resolved Hide resolved
@alyssadai
Copy link
Collaborator Author

alyssadai commented Mar 10, 2024

Thanks for your comments @surchs!

I've made some additional changes including reverting back to using pivot instead of pivot_table due to some (documented) issues w/ the latter dropping NaNs silently that I didn't notice before. Tests and comments have also been updated to prevent this from coming back, and to capture some of the current peculiarities in the long-to-wide tabular data transform.

Let me know if the changes/comments make sense 🙂

@alyssadai alyssadai requested a review from surchs March 10, 2024 19:30
Copy link
Contributor

@surchs surchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, thanks @alyssadai !

🧑‍🍳

digest/app.py Show resolved Hide resolved
@alyssadai alyssadai merged commit 08381a8 into main Mar 18, 2024
2 checks passed
@alyssadai alyssadai deleted the remove-equal-records-requirement branch March 18, 2024 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] extra requirements for same subjects and sessions across pipelines seems too restrictive
2 participants