Refine query results download #376

Open
1 of 2 tasks
alyssadai opened this issue Nov 25, 2024 · 5 comments
Labels
Epic A collection of issues that are related by topic and can be addressed together. flag:discuss Flag issue that needs to be discussed before it can be implemented.

Comments

@alyssadai
Contributor

alyssadai commented Nov 25, 2024

After trying to integrate the new query results, I'm realizing that some things about the current files are confusing (see https://docs.google.com/spreadsheets/d/1rLgbzZv1AqgYTGglI3yuQVdOM1bbYvvY8ijmbTXsfHg/edit?usp=sharing)

  • we treat one as the "machine-readable" version and one the "human-readable" version, but the machine-readable version currently only contains columns about imaging data. Phenotypic variables also have URIs - why are they not included in the machine-readable file?
    • in other words, the two files have different subject-level attributes represented, but there's no intuitive logic to the missing attributes in the machine-readable file
    • at the same time, both files are technically "machine-readable" in that they are TSVs that can be handled by any tabular data processing tool
  • the order of columns that appear in both files differs between files
  • both files have a session path column (and thus could theoretically be used for data access), and yet we recommend the machine-readable version specifically for data access/dataget - I think this can be confusing

minor:

  • modalities and pipelines probably shouldn't have spaces in them, particularly for the "machine-readable" file

One option that could be more intuitive is to have the same columns in both files, but with exclusively URIs in one, and descriptive labels in the other. This would make the differences between the two files much easier to convey, and we can focus on the fact that one contains the data in a linked data format rather than it being "machine-readable" (which is currently a little bit of a superfluous description).

a user is just someone who wants to find and then use data

What we would like a user to be able to do with

  • the machine file:
    • consume non-path information in expanded URI form, so I can look up other cool things about these controlled terms with code
  • human readable file
    • understand my cohort better, what are their characteristics, so I don't have to look up silly URIs first
    • maybe do some preliminary / testing analysis across samples in my cohort, so I don't have to first harmonize phenotypic data myself
    • find where the session paths are located, so that I can
      • download them
      • navigate to them when I have these files locally

use cases

  • learn more about cohort
    • via human readable info
    • via more detailed info about controlled terms
  • access cohort

decisions

  • the human file is the primary file
    • it both supports data access and provides more detail on the cohort
  • the other file contains the same information (column names and order), but where applicable uses the expanded URIs of controlled terms as values
  • the data access tools/scripts should work with either of them
  • we will recommend that users download both files
  • there will be two buttons next to each other (like in the previous version), one for each file
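
As a rough illustration of the decision above (same column names and order in both files, differing only in URI vs. label values), here is a minimal sketch. The column names, the example URIs, and the `URI_TO_LABEL` lookup are all hypothetical placeholders; the actual columns are defined in the linked GDoc, and this is not how the query tool necessarily produces the files.

```python
import csv
import io

# Hypothetical lookup from expanded URI to human-readable label.
# The real vocabulary and labels are not specified in this issue.
URI_TO_LABEL = {
    "http://example.org/vocab/sex/female": "Female",
    "http://example.org/vocab/assessment/moca": "Montreal Cognitive Assessment",
}


def uri_tsv_to_label_tsv(uri_tsv: str) -> str:
    """Return a TSV with identical columns and order, with known URIs replaced by labels."""
    reader = csv.DictReader(io.StringIO(uri_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Values that are not controlled terms (IDs, session paths) pass through unchanged.
        writer.writerow({col: URI_TO_LABEL.get(val, val) for col, val in row.items()})
    return out.getvalue()


# Hypothetical one-row example of the URI-encoded file.
uri_tsv = (
    "SubjectID\tSex\tAssessment\tSessionPath\n"
    "sub-01\thttp://example.org/vocab/sex/female\t"
    "http://example.org/vocab/assessment/moca\t/data/ds/sub-01/ses-01\n"
)
label_tsv = uri_tsv_to_label_tsv(uri_tsv)
print(label_tsv)
```

Because the session path column is untouched by the mapping, a data access script that only reads that column would behave the same on either file.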

Other questions

  • how do we communicate the differences between the two files to the user

Other tasks:

  • Update files in neurobagel_examples
  • Confirm that dataget and our BIC script still work
@alyssadai alyssadai added flag:discuss Flag issue that needs to be discussed before it can be implemented. flag:schedule Flag issue that should go on the roadmap or backlog. labels Nov 25, 2024
@surchs surchs removed the flag:schedule Flag issue that should go on the roadmap or backlog. label Nov 28, 2024
@surchs
Contributor

surchs commented Nov 29, 2024

could have a hover-tip for the download buttons to explain what they are about

@alyssadai
Contributor Author

Related to #381

@surchs surchs moved this from Backlog to Specify - Active in Neurobagel Dec 2, 2024
@rmanaem
Contributor

rmanaem commented Dec 3, 2024

@neurobagel/dev Does this issue require further discussion, or is what we discussed during stand-up and what's in the issue description sufficient?

@surchs
Contributor

surchs commented Dec 3, 2024

To me it's clear, as in

  • the GDoc specifies order and names of columns
  • the issue specifies that both files are identical in shape and content, but different in encoding (i.e. URI vs human labels)
  • UI is a bit TBD, but we also define that two buttons should exist for the user to click, one per file
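
One way to check the "identical in shape, different in encoding" requirement is a small consistency test like the sketch below. It assumes hypothetical column names and a hypothetical `same_shape` helper; the real column list lives in the GDoc, and this is only an illustration of the idea, not an existing Neurobagel utility.

```python
import csv
import io


def read_tsv(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Parse TSV text into (column names in order, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def same_shape(tsv_a: str, tsv_b: str, passthrough_columns: list[str]) -> bool:
    """True if both files share column names, column order, and row count,
    and agree on the columns that are not URI/label encoded (e.g. paths)."""
    cols_a, rows_a = read_tsv(tsv_a)
    cols_b, rows_b = read_tsv(tsv_b)
    if cols_a != cols_b or len(rows_a) != len(rows_b):
        return False
    # passthrough_columns must be present in the files; they are hypothetical here.
    return all(
        row_a[col] == row_b[col]
        for row_a, row_b in zip(rows_a, rows_b)
        for col in passthrough_columns
    )


# Hypothetical example files: same shape, different encoding of the Sex column.
uri_file = "SubjectID\tSex\tSessionPath\nsub-01\thttp://example.org/vocab/sex/female\t/data/sub-01\n"
label_file = "SubjectID\tSex\tSessionPath\nsub-01\tFemale\t/data/sub-01\n"
# Same content, but with a different column order (the situation the issue describes).
reordered_file = "Sex\tSubjectID\tSessionPath\nFemale\tsub-01\t/data/sub-01\n"

matches = same_shape(uri_file, label_file, ["SubjectID", "SessionPath"])
mismatch = same_shape(uri_file, reordered_file, ["SubjectID", "SessionPath"])
print(matches, mismatch)
```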

@alyssadai alyssadai changed the title Rethink columns in query result files Refine query results download Dec 3, 2024
@alyssadai alyssadai added the Epic A collection of issues that are related by topic and can be addressed together. label Dec 3, 2024
@alyssadai
Contributor Author

I've changed this into an epic with sub-issues for the different parts of this update that we discussed. If you folks agree, we can add these to the board.

@alyssadai alyssadai removed the Epic A collection of issues that are related by topic and can be addressed together. label Dec 3, 2024
@surchs surchs added the Epic A collection of issues that are related by topic and can be addressed together. label Dec 4, 2024
@surchs surchs removed the status in Neurobagel Dec 19, 2024
Projects
Status: No status
Development

No branches or pull requests

3 participants