Create script for accessing cohort query matches on BIC #188

Closed
4 of 8 tasks
surchs opened this issue Nov 11, 2024 · 0 comments · Fixed by neurobagel/internal_deployment#2

surchs commented Nov 11, 2024

As a query user at the BIC, when I use the query tool to find a cohort of subjects whose data are stored at the BIC, I want to be able to follow a few easy steps to start working with these data locally, so that I can analyze this cohort.

Steps that are needed here:

  • Decide how a local BIC user (assuming they have access to the data) should proceed after finding a cohort of interest, in order to start accessing and working with the data we show them
  • A simple CLI or script (maybe already installed on the BIC) that takes a query tool output and generates symlinks in a target directory that point to the data of the desired subjects
  • An easy way to download the input for the script that still works when my cohort includes remote (i.e., inaccessible) datasets
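
One way to make the script input robust to remote datasets is to filter the downloaded TSV down to rows whose paths sit under a locally accessible root before doing anything else. The following is a minimal sketch, not a settled design: the function name filter_local_rows and the path column name SessionFilePath are assumptions and should be checked against the actual query tool output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the header plus only those rows whose path column
# points under a local dataset root. Assumes a tab-separated file with a header
# row; the default column name SessionFilePath is an assumption.
filter_local_rows() {
    local tsv="$1" root="$2" col="${3:-SessionFilePath}"
    awk -F'\t' -v root="${root%/}/" -v col="$col" '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            # Locate the path column by name in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Keep the row only if its path starts with "<root>/".
        index($idx, root) == 1 { print }
    ' "$tsv"
}
```

For example, `filter_local_rows results.tsv /data/pd > local_results.tsv` would drop rows from datasets stored elsewhere while keeping the header, so the filtered file can be passed on unchanged.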

Assumptions:

  • User has access to the root dataset folder, e.g. /data/pd
    • usually, this means they will need to be added to a specific user group
  • Different datasets are stored on the same server (e.g., Calgary dataset on CC)
  • Dataset results are disaggregated

Limitations:

  • Cannot provide specific access to pheno data for now since we don't store the path of the TSV (and it doesn't have a guaranteed location within the dataset)
  • Cannot provide specific access to derivatives for now

Desired outcome:

  • A simple bash script, without argparse-style option parsing, that takes at least 2 positional args:
    • participants results TSV from query tool
    • a target location for the symlinks
    • dataset root? or name? (how to know where the directory tree begins)?
  • Output should be a skeleton BIDS-like dataset directory tree going down to the level of the subject or session (whichever is the lowest level in the provided paths); that lowest-level entry is a symlink to the actual directory in the data storage location
  • Script should check that the directory each symlink points to exists; if it does not, report an error and exclude that subject-session (?)
  • Script should also create a simple README.md that includes a disclaimer that the created directory tree is not an actual valid BIDS dataset (and that BIDS validation should be skipped if using BIDS app pipelines)
  • Script should live under /data/pd for now
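
The desired outcome above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the final script: the function name link_cohort, the path column name SessionFilePath, the choice to pass the dataset root as a third positional argument, and the README wording are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Usage: link_cohort <results.tsv> <target_dir> <dataset_root>
# The column name below is an assumption about the query tool output.
link_cohort() {
    local tsv="$1" target="$2" root="${3%/}"
    local col="SessionFilePath"
    local idx

    # Find the 1-based index of the path column in the header row.
    idx=$(head -n 1 "$tsv" | tr -d '\r' | tr '\t' '\n' \
        | grep -nxF -- "$col" | head -n 1 | cut -d: -f1)
    if [ -z "$idx" ]; then
        echo "ERROR: column '$col' not found in $tsv" >&2
        return 1
    fi

    mkdir -p "$target" || return 1

    # One unique path per line; each is the lowest level (subject or session).
    tail -n +2 "$tsv" | tr -d '\r' | cut -f "$idx" | sort -u |
    while IFS= read -r path; do
        path="${path%/}"
        [ -n "$path" ] || continue
        case "$path" in
            "$root"/*) ;;
            *)
                echo "ERROR: $path is not under $root, skipping" >&2
                continue
                ;;
        esac
        if [ ! -d "$path" ]; then
            echo "ERROR: $path does not exist, skipping" >&2
            continue
        fi
        rel="${path#"$root"/}"      # e.g. <dataset>/sub-01/ses-01
        dest="$target/$rel"
        # Never write through an existing link or overwrite earlier output.
        if [ -e "$dest" ] || [ -L "$dest" ]; then
            echo "WARNING: $dest already exists, skipping" >&2
            continue
        fi
        mkdir -p "$(dirname "$dest")"
        ln -s "$path" "$dest"
    done

    cat > "$target/README.md" <<'EOF'
# Cohort symlink tree

This directory tree was generated from query tool results.
It is NOT a valid BIDS dataset. If you run BIDS app pipelines on it,
skip BIDS validation.
EOF
}
```

A call such as `link_cohort results.tsv ~/my_cohort /data/pd` would then produce `~/my_cohort/<dataset>/sub-XX[/ses-YY]` entries that are symlinks into the storage location. Skipping paths that are missing or outside the root also means rows from remote datasets are ignored instead of aborting the run.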
surchs added the "flag:schedule" label (Flag issue that should go on the roadmap or backlog.) on Nov 11, 2024
alyssadai changed the title from "Provide tool to symlink a cohort on the BIC" to "Discuss how to instruct users on accessing cohort results" on Nov 11, 2024
alyssadai removed the "flag:schedule" label on Nov 11, 2024
alyssadai moved this to Backlog in Neurobagel on Nov 11, 2024
alyssadai self-assigned this on Nov 15, 2024
alyssadai changed the title from "Discuss how to instruct users on accessing cohort results" to "Create script for accessing cohort query matches on BIC" on Nov 15, 2024
github-project-automation bot moved this from Review - Active to Review - Done in Neurobagel on Nov 18, 2024