Phenotypic columns in parsed datatable do not have the appropriate dtype #93

Closed
2 tasks done
alyssadai opened this issue Oct 7, 2023 · 0 comments · Fixed by #94
Labels
flag:blocker (flag that issue is blocking at least one other issue from being completed)

Comments

@alyssadai
Collaborator

alyssadai commented Oct 7, 2023

Since the dashboard relies on the input tabular data being in long format, an input column that contains values from multiple tasks, with a distinct value type per task, gets read in by pandas with dtype object. This means that when the values for individual tasks are extracted from that column to form separate columns of the final datatable shown in the dashboard, those columns all still have dtype object. 😞
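To make the failure mode concrete, here is a minimal sketch. It assumes a hypothetical long-format phenotypic CSV (only `assessment_score` comes from this issue; `participant_id`, `assessment_tool` and the tool names are illustrative) and assumes the per-task extraction behaves like a pandas `pivot`, which is not necessarily how the dashboard implements it:

```python
import io

import pandas as pd

# Hypothetical long-format input: one row per participant per task.
# Column and tool names are illustrative, not the actual bagel schema.
CSV = """participant_id,assessment_tool,assessment_score
sub-01,tool_a,7
sub-01,tool_b,true
sub-02,tool_a,12
sub-02,tool_b,false
"""

long_df = pd.read_csv(io.StringIO(CSV))

# The column mixes integers and true/false, so pandas cannot settle on a
# numeric or boolean dtype and falls back to a generic object/string dtype.
print(long_df["assessment_score"].dtype)

# Spread each task into its own column, as a wide datatable would.
wide_df = long_df.pivot(
    index="participant_id", columns="assessment_tool", values="assessment_score"
)

# Each per-task column inherits the non-specific dtype of the source column,
# even though tool_a only holds integers and tool_b only holds booleans.
print(wide_df.dtypes)
```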

E.g., suppose the expected assessment_score column of a phenotypic .csv contains scores from two different assessments: one whose participant scores are integers and one whose participant scores are true/false. When the .csv is read in, every value in the column is read as a string. These string values are what ultimately get stored for other back-end data operations in the dashboard, which is problematic whenever we need the original type of the data, e.g., for plotting.
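A short sketch of why keeping these values as strings is a problem downstream (the values below are illustrative, not real participant data): strings sort lexicographically rather than numerically, and every non-empty string, including `"false"`, is truthy.

```python
import pandas as pd

# Scores as they come out of a mixed-type long-format column: plain strings.
scores_as_read = pd.Series(["7", "12", "9"], dtype=object)
scores_as_ints = scores_as_read.astype("int64")

# Strings sort character by character, so "12" lands before "7" and "9".
print(scores_as_read.sort_values().tolist())  # ['12', '7', '9']
print(scores_as_ints.sort_values().tolist())  # [7, 9, 12]

# A naive cast to bool treats any non-empty string as True, including "false".
flags_as_read = pd.Series(["true", "false"], dtype=object)
print(flags_as_read.astype(bool).tolist())  # [True, True]
```

A plot built on `scores_as_read` would therefore order (or bin) the scores incorrectly, which is the kind of breakage described above.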

This is mainly a problem for the more recently supported phenotypic bagels, which have a more liberal column schema; with imaging bagels, all the values in a given input column are generally expected to have the same type.

Steps to fix

  • After processing the input, add a helper function that tries to convert the object-dtype columns of the processed dataframe into more appropriate types
  • Test above function using a toy dataframe
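The two steps above could look roughly like the following minimal sketch. The helper name `convert_object_columns` and its try-numeric-then-boolean order are hypothetical choices for illustration, not necessarily what the fix in #94 does:

```python
import pandas as pd

# Hypothetical set of accepted boolean spellings (compared case-insensitively).
BOOL_STRINGS = {"true": True, "false": False}


def convert_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: give non-numeric columns a more specific dtype.

    Each column that is not already numeric/boolean is tried, in order, as:
      1. numeric (via pd.to_numeric), then
      2. boolean (every non-missing value spelled true/false),
    and is otherwise left unchanged. Missing values are skipped when deciding
    and stay missing in the result.
    """
    converted = df.copy()
    for col in converted.columns:
        series = converted[col]
        if pd.api.types.is_numeric_dtype(series) or pd.api.types.is_bool_dtype(series):
            continue  # already has a specific type
        non_missing = series.dropna()
        if non_missing.empty:
            continue  # nothing to infer from

        # 1. Numeric: succeeds only if every non-missing value parses as a number.
        try:
            converted[col] = pd.to_numeric(series)
            continue
        except (ValueError, TypeError):
            pass

        # 2. Boolean: succeeds only if every non-missing value is true/false.
        normalized = non_missing.astype(str).str.strip().str.lower()
        if normalized.isin(list(BOOL_STRINGS)).all():
            # Map the non-missing values, then reinsert missing ones as <NA>.
            as_bool = normalized.map(BOOL_STRINGS).reindex(series.index)
            converted[col] = as_bool.astype("boolean")
    return converted


# Toy dataframe standing in for the processed datatable: all columns are object.
toy = pd.DataFrame(
    {
        "participant_id": ["sub-01", "sub-02", "sub-03"],
        "tool_a": ["7", "12", None],
        "tool_b": ["true", "False", "true"],
    },
    dtype=object,
)
result = convert_object_columns(toy)
print(result.dtypes)
```

In this toy case `tool_a` ends up as float64 rather than an integer dtype because of the missing value; if a nullable integer is preferred, calling `convert_dtypes()` on the already-converted result would turn it into `Int64`. Free-text columns such as `participant_id` are left as they were.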

Note: neither of the pandas functions convert_dtypes() and infer_objects() is sufficient here, since neither one parses numbers or booleans out of string values.
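A quick sketch of that limitation on an illustrative column of numeric strings: `infer_objects()` leaves the strings as strings, and `convert_dtypes()` maps them to a string dtype, so only explicit parsing such as `pd.to_numeric` recovers the integers.

```python
import pandas as pd

# A per-task column as extracted from the mixed-type input: numbers as strings.
scores = pd.Series(["7", "12", "9"], dtype=object)

# infer_objects() only "soft"-converts objects that are already Python
# numbers/bools; string values are not parsed, so the result stays non-numeric.
print(scores.infer_objects().dtype)

# convert_dtypes() infers the values as strings and picks a string dtype,
# not an integer one.
print(scores.convert_dtypes().dtype)

# Explicit parsing is needed to get the integers back.
print(pd.to_numeric(scores).dtype)
```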

@alyssadai added the bug:functional and flag:blocker labels Oct 7, 2023
@alyssadai moved this to Implement - Active in Neurobagel Oct 7, 2023
@alyssadai self-assigned this Oct 7, 2023
@alyssadai moved this from Implement - Active to Implement - Done in Neurobagel Oct 8, 2023
@alyssadai moved this from Implement - Done to Implement - Active in Neurobagel Oct 8, 2023
@alyssadai moved this from Implement - Active to Implement - Done in Neurobagel Oct 10, 2023
@rmanaem moved this from Implement - Done to Review - Active in Neurobagel Oct 10, 2023
@github-project-automation (bot) moved this from Review - Active to Review - Done in Neurobagel Oct 10, 2023
Projects
Archived in project
1 participant