How to get the full taxa of predicted hosts #13

Open
WUD2018 opened this issue Apr 2, 2024 · 5 comments

WUD2018 commented Apr 2, 2024

Hi developer,

I noticed that CHERRY predicts hosts for phage sequences, but it only reports species names. How can I get the full taxonomic lineage of each predicted host? Is there a taxonomy list available?

Thanks.

PS: There are two files (virus.csv and prokaryote.csv) in the './database/cherry' directory. Which one should I use?

@KennthShang
Owner

Hi,

Thanks for using our tools. The full taxonomy can be found in the file "prokaryote.csv".

You can also search for it with the ETE3 toolkit: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html
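For readers who want to try the ETE3 route, here is a minimal sketch of a lineage lookup. It relies on four `NCBITaxa` methods described in the linked tutorial (`get_name_translator`, `get_lineage`, `get_rank`, `get_taxid_translator`). The helper names `full_lineage` and `load_ncbi` and the `RANKS` tuple are made up for this illustration and are not part of PhaBOX or ETE3. It also assumes the species name reported by CHERRY matches an NCBI scientific name exactly.

```python
# Ranks to keep. The top-level rank is labelled "superkingdom" or "domain"
# depending on the version of the NCBI taxonomy dump, so both are accepted.
RANKS = ("superkingdom", "domain", "phylum", "class",
         "order", "family", "genus", "species")


def full_lineage(ncbi, species_name, ranks=RANKS):
    """Return {rank: name} for a species name, or None if the name is unknown.

    `ncbi` is expected to behave like an ete3.NCBITaxa instance.
    """
    hits = ncbi.get_name_translator([species_name])  # {name: [taxid, ...]}
    if species_name not in hits:
        return None
    taxid = hits[species_name][0]
    lineage = ncbi.get_lineage(taxid)                # taxids from root to species
    rank_of = ncbi.get_rank(lineage)                 # {taxid: rank}
    name_of = ncbi.get_taxid_translator(lineage)     # {taxid: scientific name}
    return {rank_of[t]: name_of[t] for t in lineage if rank_of[t] in ranks}


def load_ncbi():
    """Create the ETE3 taxonomy handle (hypothetical convenience wrapper).

    The first call downloads the NCBI taxonomy dump, so it needs network access.
    """
    from ete3 import NCBITaxa
    return NCBITaxa()
```

Calling `full_lineage(load_ncbi(), "Escherichia coli")` would return one entry per named rank. Names that NCBI does not recognise return `None`, so they can be collected and checked by hand.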

Best,
Jiayu


WUD2018 commented Apr 3, 2024

Thanks Jiayu,

One more suggestion: GTDB-Tk has updated its database, which uses a new taxonomic naming system. Would it be possible to update PhaBOX so that the predicted host names follow the GTDB-Tk taxonomy (for example, v2.3 with release 214)?

@KennthShang
Owner

Good to know.

We will consider updating to the GTDB-Tk taxonomy in the near future. (Sorry, we have deadlines to catch up on at the moment.)

If you are in a hurry, you can convert the results yourself. We provide a script that converts the current results from NCBI taxonomy to GTDB (in the GTDB folder). If there is a table that aligns GTDB with GTDB-Tk, the conversion can be done easily.

Alternatively, if you can share such a table with us, it would help us write a script for the conversion.

Best,
Jiayu


WUD2018 commented Apr 5, 2024

Thanks, Jiayu,

Here is the taxonomy table (v2.3.0, release 214):

gtdb_taxonomy.txt

@KennthShang
Owner

Hi there,

I am sorry, but it seems the provided taxonomy table cannot be used to convert RefSeq taxonomy into GTDB. Many of the RefSeq sequences have no corresponding entry in the provided file (many accession mappings are missing).
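A coverage check of this kind can be sketched as follows. This is not the script in the GTDB folder. It assumes the taxonomy table uses the two-column, tab-separated layout of the official GTDB taxonomy files (an accession prefixed with `RS_` or `GB_`, then a `d__...;p__...;...;s__...` lineage string), and that the accessions being looked up are the same kind of identifier as those in the table. The function names and the sample line are hypothetical.

```python
import csv
import io


def parse_gtdb_taxonomy(handle):
    """Map accession -> {rank_prefix: name} from 'accession<TAB>lineage' lines."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        accession, lineage = row[0], row[1]
        # Assumed convention: "RS_" marks RefSeq and "GB_" marks GenBank entries.
        if accession.startswith(("RS_", "GB_")):
            accession = accession[3:]
        table[accession] = dict(
            part.split("__", 1) for part in lineage.split(";") if "__" in part
        )
    return table


def split_by_coverage(accessions, table):
    """Return (found, missing): lineages for mapped accessions, and the unmapped rest."""
    found = {a: table[a] for a in accessions if a in table}
    missing = [a for a in accessions if a not in table]
    return found, missing


# Illustrative input only; this line is invented, not copied from a GTDB release.
sample = io.StringIO(
    "RS_GCF_000000001.1\td__Bacteria;p__ExamplePhylum;s__Example species\n"
)
table = parse_gtdb_taxonomy(sample)
found, missing = split_by_coverage(["GCF_000000001.1", "GCF_999999999.1"], table)
print(f"mapped: {len(found)}, missing: {len(missing)}")
```

A large `missing` list indicates that the table cannot serve as a complete conversion map, which is the situation described above.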

However, I found that the scripts I provided in the GTDB folder had some problems in the previous release. I have fixed them and added a readme file. I hope this helps you convert the RefSeq taxonomy into the GTDB taxonomy you want. The GTDB taxonomy used there was downloaded from the official website.

Best,
Jiayu
