Pull data directly from COG-UK Data #329
We discussed a couple of options to address this during triage:
@joverlee521 will continue work on the latter scripts and then revisit this issue.
Prompted by @corneliusroemer, this is my general idea of how to switch to directly pulling data from COG-UK instead of relying on their submissions to GenBank:
Context
There has been a significant drop-off in sequences from the UK in the NCBI data since ~April 2022 (the issue was originally raised in Slack).
Description
We can update the pipeline to pull metadata and sequences directly from COG-UK Data instead of waiting on them to submit to NCBI.
We would have to use the ena_sample.secondary_accession column in their accessions TSV to drop duplicates from GenBank via the BioSample accession.
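A minimal Python sketch of that deduplication step might look like the following. It assumes the COG-UK accessions file is a plain tab-separated file with a header row containing ena_sample.secondary_accession (the column named above), and that the GenBank metadata rows are dicts with a BioSample column. The GenBank column name biosample_accession and both function names are hypothetical placeholders, not part of the existing pipeline. Records that come directly from COG-UK take precedence, and any GenBank record sharing one of their BioSample accessions is dropped.

```python
import csv
from typing import Iterable, Iterator

# Column named in this issue for the COG-UK accessions TSV.
COGUK_BIOSAMPLE_COLUMN = "ena_sample.secondary_accession"
# Hypothetical placeholder: adjust to the real BioSample field in the GenBank metadata.
GENBANK_BIOSAMPLE_COLUMN = "biosample_accession"


def load_coguk_biosamples(tsv_lines: Iterable[str]) -> set[str]:
    """Collect the non-empty BioSample accessions from a COG-UK accessions TSV.

    tsv_lines can be an open file handle or any iterable of TSV lines.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    biosamples = set()
    for row in reader:
        # Short rows yield None for missing fields, so normalise to "".
        value = (row.get(COGUK_BIOSAMPLE_COLUMN) or "").strip()
        if value:
            biosamples.add(value)
    return biosamples


def drop_genbank_duplicates(
    genbank_rows: Iterable[dict[str, str]], coguk_biosamples: set[str]
) -> Iterator[dict[str, str]]:
    """Yield GenBank rows whose BioSample is not already covered by COG-UK data."""
    for row in genbank_rows:
        biosample = (row.get(GENBANK_BIOSAMPLE_COLUMN) or "").strip()
        if biosample and biosample in coguk_biosamples:
            continue  # duplicate of a record pulled directly from COG-UK
        yield row  # no BioSample match (or no BioSample at all): keep it
```

Usage would be along the lines of `with open(path, newline="") as handle: biosamples = load_coguk_biosamples(handle)`, then streaming the GenBank metadata through `drop_genbank_duplicates`. GenBank rows without a BioSample accession are kept, since they cannot be matched against COG-UK either way.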