Ingest VirusSeq data #421

Open
chaoran-chen opened this issue Dec 7, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@chaoran-chen

VirusSeq is a Canadian data portal that hosts over 500,000 SARS-CoV-2 sequences that are not on GenBank. It would be amazing if they could be incorporated into the ncov open dataset.

I talked to Sally Otto and Justin Jia (@bfjia) who work on VirusSeq and they support this. Users of sequences from VirusSeq should properly acknowledge the data generators. The policy is explained on this website and @bfjia can provide further information.

Once the data are in the open dataset, they can be ingested into LAPIS open. Sally and Justin are interested in subsequently fetching data from LAPIS, for example for Duotang.

@chaoran-chen chaoran-chen added the enhancement New feature or request label Dec 7, 2023
@tsibley
Member

tsibley commented Jan 10, 2024

Unfortunately, it seems (to me at least) that the usage policy precludes incorporation in our Open dataset. The policy is closer to GISAID's (requires acknowledgement and co-operation with submitters) than INSDC's (no restrictions). We wouldn't be able to meet or pass along those requirements once the data are incorporated into the Open dataset. (Obviously we strongly encourage acknowledgement and co-operation with data generators/submitters regardless of data source, but there's a difference between expecting courtesy and requiring it.)

Presumably these usage restrictions are why the dataset is not part of INSDC already.

@bfjia

bfjia commented Jan 15, 2024

@tsibley I think we can address this open data policy issue. I will raise it at our next biweekly meeting and provide you with an update shortly after. Would this link (https://www.insdc.org/policy/) be a good summary of the policy that I can forward to the VirusSeq team? Thank you.

@tsibley
Member

tsibley commented Jan 17, 2024

@bfjia Ah, excellent. The INSDC policy you link to would work, but there are other options too, such as a CC-BY license (with attribution to CanCOGeN VirusSeq rather than individual submitters).

3 participants