Building DB of Representative Genomes #295

Open
ktmbiome-niaid opened this issue Dec 12, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@ktmbiome-niaid

Hi!

I'm interested in creating a database of representative genomes. I tried using the command you provided on your GitHub page:

ganon build --source refseq --organism-group archaea bacteria fungi viral \
            --threads 48 --representative-genomes --db-prefix abfv_rs_rg

Unfortunately, during filtering of the assembly_summary.txt file, all genomes get filtered out, so no database is built. My guess is that there is some miscommunication with the updater script, but I'm not sure. I can build databases with several of the other commands on the page, so the problem seems to be specific to representative genomes.

Thank you so much!

@pirovc
Owner

pirovc commented Dec 13, 2024

Hi! Indeed, this is not working at the moment, but it's not related to ganon or genome_updater (the script used to download sequences from NCBI). It's something on the NCBI side: the current assembly_summary.txt (col. 5) no longer lists which genomes are representatives :/ I could not find any information about recent changes, so I guess it's a bug.

That's a bummer, since the representative genome subset was a great reference set to work with. I hope they fix it soon; otherwise I will try to implement an alternative way to obtain this subset with ganon.
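
One way to check what col. 5 of assembly_summary.txt currently contains is to tally its values. The snippet below is only a minimal sketch: the function name `tally_refseq_category` is hypothetical, the sample rows are made up and truncated to six columns, and it assumes the file is tab-separated with header lines starting with `#`.

```python
from collections import Counter


def tally_refseq_category(lines, column=5):
    """Count the values found in one tab-separated column (1-based index)."""
    counts = Counter()
    for line in lines:
        # Header/comment lines in assembly_summary.txt start with '#'.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts


# Made-up rows for illustration only (not real NCBI records).
SAMPLE = [
    "#assembly_accession\tbioproject\tbiosample\twgs_master\trefseq_category\ttaxid\n",
    "GCF_000000001.1\tPRJNA1\tSAMN1\tna\treference genome\t1111\n",
    "GCF_000000002.1\tPRJNA2\tSAMN2\tna\tna\t2222\n",
    "GCF_000000003.1\tPRJNA3\tSAMN3\tna\tna\t3333\n",
]

counts = tally_refseq_category(SAMPLE)
print(counts.most_common())
```

On a real download, the same function can be given an open file handle instead of the sample list. If "representative genome" never shows up in the tally, a filter on that exact string would discard every row, which matches the behaviour reported above.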

@ktmbiome-niaid
Author

I found this article from September 2024: https://ncbiinsights.ncbi.nlm.nih.gov/2024/09/25/updated-terminology-reference-genome/. It suggests they're now calling them reference genomes instead of representatives, and I do see this terminology in the 5th column as well.

One thing I tried in my testing yesterday, thinking this was a possibility, was to use the -u option to change genome_updater's -c option, but nothing I tried was read correctly by the script (it kept telling me it only expects one argument).

@pirovc
Owner

pirovc commented Dec 15, 2024

Thanks for the link, I was not aware of the change. I will update the tools accordingly. For now, you can add the following to ganon build:

--genome-updater "\-c 'reference genome'"

This selects the old representatives, now called reference genomes.
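
For anyone scripting the build, the workaround can be combined with the options from the original command. This is a rough sketch, not a confirmed recipe: `build_ganon_command` is a hypothetical helper, and leaving out `--representative-genomes` is an assumption, since the thread does not say whether that flag should stay alongside `--genome-updater`.

```python
def build_ganon_command(db_prefix, groups, threads, refseq_category):
    """Assemble a ganon build argument list that forwards -c to genome_updater."""
    # The leading backslash is kept verbatim from the workaround quoted above.
    updater_args = "\\-c '" + refseq_category + "'"
    return [
        "ganon", "build",
        "--source", "refseq",
        "--organism-group", *groups,
        "--threads", str(threads),
        "--db-prefix", db_prefix,
        # Assumption: --representative-genomes is omitted in favour of this option.
        "--genome-updater", updater_args,
    ]


cmd = build_ganon_command(
    db_prefix="abfv_rs_rg",
    groups=["archaea", "bacteria", "fungi", "viral"],
    threads=48,
    refseq_category="reference genome",
)
print(cmd)
```

Passing `cmd` to `subprocess.run` as a list avoids a shell, so the value after `--genome-updater` reaches ganon exactly as it would from the double-quoted form typed in a terminal.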

pirovc added the bug label Dec 15, 2024

@ktmbiome-niaid
Author

Thank you so much! Also, confirming that the genome-updater option works!
